Miles per Gallon - Preprocessing: Polynomial Features

1. Dataset analysis
Preprocessing: Polynomial Features - One variable
Preprocessing: Polynomial Features - Two variables



1. Dataset analysis

Dataset analysis
import seaborn as sns

# Load the mpg (miles per gallon) dataset bundled with seaborn
df = sns.load_dataset('mpg')
df.head()

    mpg  cylinders  displacement  ...  model_year  origin                       name
0  18.0          8         307.0  ...          70     usa  chevrolet chevelle malibu
1  15.0          8         350.0  ...          70     usa          buick skylark 320
2  18.0          8         318.0  ...          70     usa         plymouth satellite
3  16.0          8         304.0  ...          70     usa              amc rebel sst
4  17.0          8         302.0  ...          70     usa                ford torino

[5 rows x 9 columns]

Analysis of the mpg variable
df['mpg']

0      18.0
1      15.0
2      18.0
3      16.0
4      17.0
       ... 
393    27.0
394    44.0
395    32.0
396    28.0
397    31.0
Name: mpg, Length: 398, dtype: float64




Preprocessing: Polynomial Features - One variable

PolynomialFeatures documentation
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

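Polynomial Features (1 variable, degree 2)
from sklearn.preprocessing import PolynomialFeatures

X = df[['mpg']]
# degree=2 with include_bias=True yields the columns 1, x, x^2
polynomial_transform = PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)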

[[1.000e+00 1.800e+01 3.240e+02]
 [1.000e+00 1.500e+01 2.250e+02]
 [1.000e+00 1.800e+01 3.240e+02]
 ...
 [1.000e+00 3.200e+01 1.024e+03]
 [1.000e+00 2.800e+01 7.840e+02]
 [1.000e+00 3.100e+01 9.610e+02]]

Polynomial Features (integers)
result.astype(int)

[[   1   18  324]
 [   1   15  225]
 [   1   18  324]
 ...
 [   1   32 1024]
 [   1   28  784]
 [   1   31  961]]

1st column: 1 (bias term)
2nd column: original value
3rd column: original value squared

→ Polynomial features of degree 2 with a single variable: 1, x, x^2

Polynomial Features (1 variable, degree 4)
X = df[['mpg']]
# include_bias=False drops the constant column of 1s
polynomial_transform = PolynomialFeatures(degree=4, interaction_only=False, include_bias=False)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
result.astype(int)

[[     18     324    5832  104976]
 [     15     225    3375   50625]
 [     18     324    5832  104976]
 ...
 [     32    1024   32768 1048576]
 [     28     784   21952  614656]
 [     31     961   29791  923521]]

→ Polynomial features of degree 4 with a single variable: x, x^2, x^3, x^4
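
To make the single-variable case concrete, here is a small pure-Python sketch that rebuilds the same rows by hand. It is only an illustration of the arithmetic, not how scikit-learn computes the result. The helper `power_features` is a hypothetical name introduced for this example.

```python
def power_features(x, degree, include_bias=False):
    """Return [x, x^2, ..., x^degree], optionally preceded by the constant 1."""
    start = 0 if include_bias else 1
    return [x ** k for k in range(start, degree + 1)]

mpg_values = [18, 15, 32]  # values taken from the mpg column shown above
rows = [power_features(v, degree=4) for v in mpg_values]
for row in rows:
    print(row)
```

Calling `power_features(18, 2, include_bias=True)` gives the degree-2 row with the leading 1, which corresponds to the `include_bias=True` setting.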




Preprocessing: Polynomial Features - Two variables

Polynomial Features (2 variables, degree 2)
X = df[['mpg', 'cylinders']]
# degree=2 matches the five columns shown in the output below
polynomial_transform = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
result.astype(int)

[[  18    8  324  144   64]
 [  15    8  225  120   64]
 [  18    8  324  144   64]
 ...
 [  32    4 1024  128   16]
 [  28    4  784  112   16]
 [  31    4  961  124   16]]

→ Polynomial features of degree 2 with 2 variables: x1, x2, x1^2, x1·x2, x2^2
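
The list of terms above can be enumerated mechanically. The sketch below uses only the standard library and assumes that each output column is a product of input columns chosen with repetition, which reproduces the column order seen in the output above. The helpers `polynomial_terms` and `expand_row` are hypothetical names for this illustration, not scikit-learn functions.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def polynomial_terms(n_features, degree):
    """Index tuples for every monomial of total degree 1..degree (no bias)."""
    return [
        combo
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(range(n_features), d)
    ]

def expand_row(row, degree):
    """Multiply the selected entries of row for each monomial."""
    return [prod(row[i] for i in combo) for combo in polynomial_terms(len(row), degree)]

terms = polynomial_terms(2, 2)
expanded = expand_row([18, 8], degree=2)  # first row: mpg=18, cylinders=8
n_columns = comb(2 + 2, 2) - 1  # C(n + d, d) monomials, minus the bias column
print(terms)
print(expanded)
print(n_columns)
```

With two variables, degree 2 yields 5 columns, whereas degree 4 would yield C(6, 4) - 1 = 14. The column count is therefore a quick check that the chosen degree matches the output.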

Polynomial Features (2 variables, degree 2, interactions only)
X = df[['mpg', 'cylinders']]
# interaction_only=True keeps only products of distinct variables (no squares)
polynomial_transform = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
polynomial_transform.fit(X)
result = polynomial_transform.transform(X)
result.astype(int)

[[ 18   8 144]
 [ 15   8 120]
 [ 18   8 144]
 ...
 [ 32   4 128]
 [ 28   4 112]
 [ 31   4 124]]

→ Polynomial features of degree 2 with 2 variables, interactions only: x1, x2, x1·x2
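
The effect of keeping interactions only can be sketched the same way, this time choosing input columns without repetition so that no variable is multiplied by itself. This is a standard-library illustration under that assumption. `interaction_terms` and `expand_interactions` are hypothetical helper names.

```python
from itertools import combinations
from math import prod

def interaction_terms(n_features, degree):
    """Index tuples for products of distinct features, up to `degree` factors."""
    # combinations() never repeats an index, so no x1^2 or x2^2 terms appear
    return [
        combo
        for d in range(1, degree + 1)
        for combo in combinations(range(n_features), d)
    ]

def expand_interactions(row, degree):
    """Multiply the selected entries of row for each interaction term."""
    return [prod(row[i] for i in combo) for combo in interaction_terms(len(row), degree)]

row_deg2 = expand_interactions([18, 8], degree=2)  # mpg=18, cylinders=8
row_deg4 = expand_interactions([18, 8], degree=4)
print(row_deg2)
print(row_deg4)
```

In this sketch, two variables can form only one product of distinct factors, so raising the degree beyond 2 adds no columns. More input variables would be needed for higher-order interactions to appear.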