Vehicle Fuel Consumption Prediction with Polynomial Regression in ML
Fleet managers and vehicle engineers need to predict a car’s fuel consumption (L/100 km) from easily measured trip features before actual driving. Historical telematics data show that consumption depends nonlinearly on average speed, ambient temperature, trip distance, and cabin temperature. A simple linear model underestimates the curvature—e.g. aerodynamic drag grows with speed squared—while an unconstrained high‑degree polynomial overfits.
By applying Polynomial Regression (i.e., linear regression on polynomially expanded features) with Ridge regularisation, we can model smooth, nonlinear dependencies and deliver accurate, interpretable fuel estimates for route planning and eco‑driving coaching.
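To make the curvature argument concrete, here is a minimal sketch on synthetic data, not the telematics dataset. Consumption is generated as a U-shaped function of speed with made-up coefficients, and a straight-line fit is compared with a degree-2 fit. The curve, noise level, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic trips: speeds between 20 and 120 km/h (assumed range)
speed = rng.uniform(20, 120, size=200)
# Hypothetical U-shaped consumption curve with its minimum near 60 km/h, plus noise
consumption = 5.0 + 0.001 * (speed - 60.0) ** 2 + rng.normal(0, 0.2, size=200)
X_demo = speed.reshape(-1, 1)


def train_rmse(degree):
    """Fit an unregularised polynomial of the given degree and return its training RMSE."""
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_demo, consumption)
    residuals = consumption - model.predict(X_demo)
    return float(np.sqrt(np.mean(residuals ** 2)))


rmse_linear = train_rmse(1)
rmse_quadratic = train_rmse(2)
print(f"Degree 1 training RMSE: {rmse_linear:.3f}")
print(f"Degree 2 training RMSE: {rmse_quadratic:.3f}")
```

On data like this the straight line leaves a large systematic error that the quadratic term removes. Training error alone cannot reveal overfitting at higher degrees, which is why the tutorial relies on cross-validation later on.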
Dataset
Step-by-Step Code Implementation
1. Libraries Required
import pandas as pd                      # data loading & handling
import numpy as np                       # numerical operations
import matplotlib.pyplot as plt          # plotting
import seaborn as sns                    # enhanced visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data & Libraries
import pandas as pd
import numpy as np
# Load the CSV (adjust path as needed)
df = pd.read_csv("data/car-consume.csv")
# Preview
df.head()[['distance_km','consume_L_per_100km','avg_speed_kmh','temp_inside_C','temp_outside_C']]
3. Exploratory Data Analysis
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter: speed vs consumption
sns.scatterplot(x='avg_speed_kmh', y='consume_L_per_100km', data=df, alpha=0.5)
plt.title("Avg Speed vs Fuel Consumption")
plt.xlabel("Average Speed (km/h)")
plt.ylabel("Fuel Consumption (L/100 km)")
plt.show()
4. Define Features & Target
PolynomialFeatures augments inputs with squared and interaction terms—e.g. avg_speed_kmh², distance_km × temp_outside_C—capturing drag effects and temperature interactions.
# Features known before trip starts
X = df[['distance_km','avg_speed_kmh','temp_inside_C','temp_outside_C']]
y = df['consume_L_per_100km']
5. Build Polynomial Regression Pipeline
- StandardScaler standardises each input feature before the polynomial expansion, so features measured in large units (such as distance) do not dominate the expanded terms or the Ridge penalty.
- Ridge regression (ℓ²) applies shrinkage to control overfitting from the expanded feature space.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
('scale', StandardScaler()),
('poly', PolynomialFeatures(include_bias=False)),
('ridge', Ridge(random_state=42))
])
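The shrinkage effect of the Ridge penalty can be seen directly by fitting the same pipeline with increasing alpha and watching the size of the coefficient vector. The sketch below uses synthetic data with an invented relationship; the sample size, noise level, and alpha values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for four trip features (60 trips)
X_demo = rng.normal(size=(60, 4))
# Hypothetical target: quadratic in the second feature, linear in the fourth
y_demo = (6.0 + 0.8 * X_demo[:, 1] ** 2 - 0.3 * X_demo[:, 3]
          + rng.normal(0, 0.3, size=60))

coef_norms = []
for alpha in [0.01, 1.0, 100.0]:
    model = Pipeline([
        ('scale', StandardScaler()),
        ('poly', PolynomialFeatures(degree=3, include_bias=False)),
        ('ridge', Ridge(alpha=alpha)),
    ])
    model.fit(X_demo, y_demo)
    # L2 norm of all 34 polynomial coefficients (intercept excluded)
    norm = float(np.linalg.norm(model.named_steps['ridge'].coef_))
    coef_norms.append(norm)
    print(f"alpha={alpha:>6}: ||coef||_2 = {norm:.3f}")
```

Larger alpha pulls the coefficients toward zero, trading a little bias for lower variance; the grid search in the next step chooses that trade-off from the data.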
6. Train/Test Split & Hyperparameter Search
GridSearchCV explores polynomial degrees (1–3) and regularisation strengths α (10⁻³–10³) via 5‑fold cross‑validation, optimising for the lowest RMSE on held‑out folds.
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
param_grid = {
'poly__degree' : [1, 2, 3],
'ridge__alpha' : np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
pipe, param_grid,
cv=5,
scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best parameters:", gs.best_params_)
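Beyond the single best setting, the full grid can be inspected through the `cv_results_` attribute. Because the scorer is a negated RMSE, the sign has to be flipped to read it in L/100 km. The following is a self-contained sketch on synthetic data (the `_demo` names and the data-generating formula are assumptions); with the tutorial's fitted `gs` object the same two steps would apply.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic data with a genuinely quadratic signal in the first feature
X_demo = rng.normal(size=(120, 2))
y_demo = 6.0 + X_demo[:, 0] ** 2 + rng.normal(0, 0.3, size=120)

pipe_demo = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
grid_demo = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7),
}
gs_demo = GridSearchCV(pipe_demo, grid_demo, cv=5,
                       scoring='neg_root_mean_squared_error')
gs_demo.fit(X_demo, y_demo)

# One row per (degree, alpha) combination: 3 x 7 = 21 rows
results = pd.DataFrame(gs_demo.cv_results_)
results['cv_rmse'] = -results['mean_test_score']  # flip sign back to a positive RMSE

top = results[['param_poly__degree', 'param_ridge__alpha', 'cv_rmse']]
print(top.sort_values('cv_rmse').head())

best_cv_rmse = -gs_demo.best_score_
print(f"Best cross-validated RMSE: {best_cv_rmse:.3f}")
```

Looking at the runner-up rows is a useful sanity check: if several degree and alpha combinations score almost identically, the simpler one is usually the safer choice.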
7. Evaluate Model
from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
# squared=False is deprecated/removed in recent scikit-learn releases, so take the root explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f} L/100 km")
print(f"Test R² : {r2:.3f}")
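For readers who want to see what the two metrics measure, here is a small worked example computed by hand with NumPy and checked against scikit-learn. The four consumption values are made up for the arithmetic and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted consumption values (L/100 km)
y_true = np.array([5.0, 6.0, 7.0, 8.0])
y_hat = np.array([5.5, 6.0, 6.5, 8.0])

# RMSE: square root of the mean squared error
mse = np.mean((y_true - y_hat) ** 2)        # (0.25 + 0 + 0.25 + 0) / 4 = 0.125
rmse_manual = np.sqrt(mse)

# R²: 1 minus residual sum of squares over total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)           # 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 5.0
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.3f}")   # RMSE: 0.354
print(f"R²  : {r2_manual:.3f}")     # R²  : 0.900

# Cross-check against scikit-learn
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

RMSE is expressed in the target's own units, so it reads directly as a typical error in L/100 km, while R² compares the model against always predicting the mean consumption.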
8. Inspect Key Polynomial Coefficients
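Inspecting the coefficients shows which nonlinear or interaction terms carry the most weight in the fitted model (for example squared speed, or speed × outside temperature), which can inform eco‑driving recommendations. Because the inputs are standardised before expansion but the expanded terms are not re‑standardised, treat the magnitudes as a rough ranking rather than exact effect sizes.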
# Retrieve feature names after polynomial expansion
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
# Retrieve Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_
import pandas as pd
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
# Plot top 10
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Fuel Consumption")
plt.xlabel("Coefficient magnitude")
plt.tight_layout()
plt.show()
Summary
By integrating polynomial feature engineering with Ridge regularisation in a concise pipeline, we produce a model that:
- Predicts fuel consumption from trip‑planning metrics, with accuracy checked by RMSE and R² on a held‑out test set.
- Captures key nonlinear effects (drag, temperature interactions), with cross‑validated Ridge regularisation limiting overfitting.
- Provides interpretable drivers that guide route and speed recommendations for improved fuel economy.