Miles per Gallon - Preprocessing: Polynomial Features
1. Dataset analysis
Preprocessing: Polynomial Features - One variable
Preprocessing: Polynomial Features - Two variables
1. Dataset analysis
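import seaborn as sns
# Load the Auto MPG dataset from seaborn's sample datasets (downloaded and cached on first use)
df = sns.load_dataset('mpg')
df.head()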
    mpg  cylinders  displacement  ...  model_year  origin                       name
0  18.0          8         307.0  ...          70     usa  chevrolet chevelle malibu
1  15.0          8         350.0  ...          70     usa          buick skylark 320
2  18.0          8         318.0  ...          70     usa         plymouth satellite
3  16.0          8         304.0  ...          70     usa              amc rebel sst
4  17.0          8         302.0  ...          70     usa                ford torino

[5 rows x 9 columns]
Analysis of the mpg variable
df['mpg']
0      18.0
1      15.0
2      18.0
3      16.0
4      17.0
       ... 
393    27.0
394    44.0
395    32.0
396    28.0
397    31.0
Name: mpg, Length: 398, dtype: float64
Preprocessing: Polynomial Features - One variable
PolynomialFeatures documentation (scikit-learn)
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
Polynomial Features (1 variable, degree 2)
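from sklearn.preprocessing import PolynomialFeatures

X = df[['mpg']]  # 2-D input (one-column DataFrame), as scikit-learn expects
# Degree 2 with a bias column: outputs 1, x, x²
polynomial_transform = PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
print(result)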
[[1.000e+00 1.800e+01 3.240e+02]
 [1.000e+00 1.500e+01 2.250e+02]
 [1.000e+00 1.800e+01 3.240e+02]
 ...
 [1.000e+00 3.200e+01 1.024e+03]
 [1.000e+00 2.800e+01 7.840e+02]
 [1.000e+00 3.100e+01 9.610e+02]]
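The cell above calls fit() and then transform() separately. As a minimal sketch, assuming a small NumPy array of sample mpg values (18, 15, 32 from the output above) in place of the full DataFrame so it runs without downloading the dataset, the two-step version gives the same result as the single fit_transform() call. The variable names X_sample, two_step and one_step are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Sample mpg values taken from the output above, as a single 2-D column
X_sample = np.array([[18.0], [15.0], [32.0]])

# Two steps: fit() only learns the number of input features, not statistics from the values
two_step = PolynomialFeatures(degree=2, include_bias=True).fit(X_sample).transform(X_sample)

# One step: fit_transform() fits and transforms in a single call
one_step = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X_sample)

print(np.array_equal(two_step, one_step))  # True
```

Keeping fit() and transform() separate is mainly useful when a transformer fitted once must be applied again to new data, for example a test set.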
Polynomial Features (cast to integers)
result.astype(int)
[[   1   18  324]
 [   1   15  225]
 [   1   18  324]
 ...
 [   1   32 1024]
 [   1   28  784]
 [   1   31  961]]
1st column: 1 (the bias term)
2nd column: the original value
3rd column: the original value squared
→ Polynomial features of degree 2 on a single variable: 1, x, x²
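To make the three columns concrete, here is a minimal sketch, assuming a plain NumPy array of sample mpg values (18, 15, 32 from the output above) instead of the DataFrame, that builds 1, x and x² by hand and checks them against PolynomialFeatures. The names x, manual and auto are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Sample mpg values taken from the output above
x = np.array([18.0, 15.0, 32.0])

# Build the three columns by hand: bias (1), x, and x squared
manual = np.column_stack([np.ones_like(x), x, x ** 2])

# PolynomialFeatures expects a 2-D array of shape (n_samples, n_features)
auto = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x.reshape(-1, 1))

print(np.allclose(manual, auto))  # True
```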
Polynomial Features (1 variable, degree 4)
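X = df[['mpg']]
# Degree 4 without the bias column: outputs x, x², x³, x⁴
polynomial_transform = PolynomialFeatures(degree=4, interaction_only=False, include_bias=False)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
print(result.astype(int))  # display as integers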
[[     18     324    5832  104976]
 [     15     225    3375   50625]
 [     18     324    5832  104976]
 ...
 [     32    1024   32768 1048576]
 [     28     784   21952  614656]
 [     31     961   29791  923521]]
→ Polynomial features of degree 4 on a single variable (no bias column): x, x², x³, x⁴
Preprocessing: Polynomial Features - Two variables
Polynomial Features (2 variables, degree 2)
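X = df[['mpg', 'cylinders']]
# Degree 2 on two inputs (no bias column): x1, x2, x1², x1·x2, x2²
polynomial_transform = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
print(result.astype(int))  # display as integers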
[[  18    8  324  144   64]
 [  15    8  225  120   64]
 [  18    8  324  144   64]
 ...
 [  32    4 1024  128   16]
 [  28    4  784  112   16]
 [  31    4  961  124   16]]
→ Polynomial features of degree 2 on two variables (x₁ = mpg, x₂ = cylinders): x₁, x₂, x₁², x₁·x₂, x₂²
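The number of output columns grows quickly with the degree. With n input variables and degree d, there are C(n + d, d) monomials of total degree at most d, the constant 1 included. The five columns above are therefore consistent with degree 2, while degree 4 on the same two inputs would give 14 columns. A minimal sketch checking this count against scikit-learn's n_output_features_ attribute; n_poly_features is a hypothetical helper written for this illustration, and the sample rows stand in for the DataFrame.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_inputs: int, degree: int, include_bias: bool) -> int:
    """Hypothetical helper: count monomials of total degree <= `degree` in `n_inputs` variables."""
    total = comb(n_inputs + degree, degree)  # includes the constant term 1
    return total if include_bias else total - 1


# Sample (mpg, cylinders) rows taken from the output above
X_sample = np.array([[18.0, 8.0], [15.0, 8.0], [32.0, 4.0]])

for degree in (2, 4):
    poly = PolynomialFeatures(degree=degree, include_bias=False).fit(X_sample)
    print(degree, poly.n_output_features_, n_poly_features(2, degree, include_bias=False))
# 2 5 5
# 4 14 14
```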
Polynomial Features (2 variables, degree 2, interaction terms only)
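X = df[['mpg', 'cylinders']]
# interaction_only=True drops pure powers (x1², x2²) and keeps products of distinct inputs
polynomial_transform = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
print(result.astype(int))  # display as integers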
[[ 18   8 144]
 [ 15   8 120]
 [ 18   8 144]
 ...
 [ 32   4 128]
 [ 28   4 112]
 [ 31   4 124]]
→ Interaction features of degree 2 on two variables: x₁, x₂, x₁·x₂