Vehicle Fuel Consumption Prediction with Polynomial Regression in ML


Fleet managers and vehicle engineers need to predict a car's fuel consumption (L/100 km) from easily measured trip features before the trip begins. Historical telematics data show that consumption depends nonlinearly on average speed, ambient temperature, trip distance, and cabin temperature. A simple linear model misses this curvature (for example, aerodynamic drag grows with the square of speed), while an unconstrained high-degree polynomial overfits.

By applying Polynomial Regression (i.e., linear regression on polynomially expanded features) with Ridge regularisation, we can model smooth, nonlinear dependencies and deliver accurate, interpretable fuel estimates for route planning and eco‑driving coaching.
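To make the curvature argument concrete, here is a minimal sketch on synthetic data. The data are an assumption for illustration, not the Car Fuel Consumption dataset: consumption is generated to grow with the square of speed, and a straight line is compared with a degree-2 polynomial fit by training RMSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data (NOT the real dataset): consumption rises with
# speed squared, a stand-in for aerodynamic drag, plus Gaussian noise.
rng = np.random.default_rng(0)
speed = rng.uniform(20, 130, size=300).reshape(-1, 1)                          # km/h
consumption = 3.0 + 0.0005 * speed[:, 0] ** 2 + rng.normal(0, 0.3, size=300)  # L/100 km

def train_rmse(model):
    """Fit the model on the synthetic data and return its training RMSE."""
    model.fit(speed, consumption)
    return np.sqrt(mean_squared_error(consumption, model.predict(speed)))

linear_rmse = train_rmse(LinearRegression())
quadratic_rmse = train_rmse(
    make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
)

print(f"linear RMSE:    {linear_rmse:.3f} L/100 km")
print(f"quadratic RMSE: {quadratic_rmse:.3f} L/100 km")
```

Because the quadratic model contains the linear one as a special case, its training RMSE can never be higher; whether the extra terms actually generalise is what the held-out test set and cross-validation later in this tutorial are for.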

Dataset

Car Fuel Consumption

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                      # data loading & handling  
import numpy as np                       # numerical operations  

import matplotlib.pyplot as plt          # plotting  
import seaborn as sns                    # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Libraries

import pandas as pd
import numpy as np

# Load the CSV (adjust path as needed)
df = pd.read_csv("data/car-consume.csv")

# Preview 
df[['distance_km','consume_L_per_100km','avg_speed_kmh','temp_inside_C','temp_outside_C']].head()

3. Exploratory Data Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Scatter: speed vs consumption
sns.scatterplot(x='avg_speed_kmh', y='consume_L_per_100km', data=df, alpha=0.5)
plt.title("Avg Speed vs Fuel Consumption")
plt.xlabel("Average Speed (km/h)")
plt.ylabel("Fuel Consumption (L/100 km)")
plt.show()

4. Define Features & Target

PolynomialFeatures augments the inputs with squared and interaction terms, e.g. avg_speed_kmh² or distance_km × temp_outside_C, so that a linear model can capture the speed-squared drag effect and temperature interactions.

# Trip features that are known, or can be estimated, before the trip starts
X = df[['distance_km','avg_speed_kmh','temp_inside_C','temp_outside_C']]
y = df['consume_L_per_100km']

5. Build Polynomial Regression Pipeline

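  • StandardScaler standardises each input feature (zero mean, unit variance) so that the size of each coefficient, and hence its Ridge penalty, does not depend on the units the feature is measured in. Because scaling happens before the expansion, the squared and interaction terms are not themselves unit-variance, so the penalty is only approximately even across them.
  • PolynomialFeatures expands the scaled inputs into polynomial and interaction terms; its degree is tuned in step 6.
  • Ridge regression (ℓ₂ penalty) shrinks the coefficients to control overfitting in the expanded feature space.
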
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
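    ('scale', StandardScaler()),                       # standardise each raw input
    ('poly', PolynomialFeatures(include_bias=False)),  # squared + interaction terms (degree tuned below); Ridge supplies the intercept
    ('ridge', Ridge(random_state=42))                  # ℓ₂-penalised linear model; random_state only matters for the 'sag'/'saga' solvers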
])
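For intuition about the ℓ₂ penalty, the sketch below solves the Ridge objective ||y − Xw||² + α||w||² in closed form, w = (XᵀX + αI)⁻¹Xᵀy, on synthetic standardised data (an assumption, not the car dataset). It checks the result against scikit-learn's Ridge with fit_intercept=False and prints how the coefficient norm shrinks as α grows. ridge_closed_form is a hypothetical helper written only for this illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic, roughly standardised design matrix (illustrative only).
rng = np.random.default_rng(42)
X_demo = rng.standard_normal((200, 5))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.5, size=200)

def ridge_closed_form(X, y, alpha):
    """Minimise ||y - Xw||^2 + alpha * ||w||^2 (no intercept) by solving the normal equations."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

for alpha in [0.001, 1.0, 100.0, 1000.0]:
    w_manual = ridge_closed_form(X_demo, y_demo, alpha)
    w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo).coef_
    print(f"alpha={alpha:>7}: ||w|| = {np.linalg.norm(w_manual):.3f}, "
          f"matches sklearn: {np.allclose(w_manual, w_sklearn)}")
```

In the tutorial's pipeline Ridge also fits an intercept, which scikit-learn leaves unpenalised; the closed form here omits it to keep the algebra short.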

6. Train/Test Split & Hyperparameter Search

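GridSearchCV explores polynomial degrees 1 to 3 and seven log-spaced regularisation strengths α from 10⁻³ to 10³ using 5-fold cross-validation on the training set, and selects the combination with the lowest mean RMSE on the held-out folds. scikit-learn always maximises a score, so RMSE is passed in negated form ('neg_root_mean_squared_error').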

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree' : [1, 2, 3],
    'ridge__alpha' : np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
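To show what the grid search does internally, here is a minimal hand-rolled equivalent on synthetic data (an assumption; X_demo and y_demo are hypothetical stand-ins for X_train and y_train, with made-up feature ranges). For every degree/α pair it computes the mean 5-fold RMSE with cross_val_score, keeps the lowest, and then cross-checks the choice against GridSearchCV on the same data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical synthetic data: columns mimic distance, speed, inside temp, outside temp.
rng = np.random.default_rng(1)
X_demo = rng.uniform([1, 20, 15, -5], [100, 130, 25, 30], size=(200, 4))
y_demo = 3 + 0.0005 * X_demo[:, 1] ** 2 - 0.03 * X_demo[:, 3] + rng.normal(0, 0.3, 200)

def manual_grid_search(X, y, degrees, alphas, cv=5):
    """Exhaustive search over degree x alpha; returns (best_params, best mean CV RMSE)."""
    best_params, best_rmse = None, np.inf
    for degree in degrees:
        for alpha in alphas:
            pipe = Pipeline([
                ("scale", StandardScaler()),
                ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
                ("ridge", Ridge(alpha=alpha)),
            ])
            # One negative-RMSE score per fold; flip the sign to get RMSE.
            fold_scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_root_mean_squared_error")
            mean_rmse = -fold_scores.mean()
            if mean_rmse < best_rmse:  # strict '<' keeps the first of any tie
                best_params = {"poly__degree": degree, "ridge__alpha": alpha}
                best_rmse = mean_rmse
    return best_params, best_rmse

best_params, best_rmse = manual_grid_search(X_demo, y_demo, [1, 2, 3], np.logspace(-3, 3, 7))
print("Manual search:", best_params, f"(CV RMSE {best_rmse:.3f})")

# Cross-check: GridSearchCV over the same grid and the same (unshuffled) 5 folds.
gs_demo = GridSearchCV(
    Pipeline([("scale", StandardScaler()),
              ("poly", PolynomialFeatures(include_bias=False)),
              ("ridge", Ridge())]),
    {"poly__degree": [1, 2, 3], "ridge__alpha": np.logspace(-3, 3, 7)},
    cv=5, scoring="neg_root_mean_squared_error",
).fit(X_demo, y_demo)
print("GridSearchCV: ", gs_demo.best_params_, f"(CV RMSE {-gs_demo.best_score_:.3f})")
```

Note the sign convention: with this scoring string, gs.best_score_ is the negative of the best mean cross-validated RMSE.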

7. Evaluate Model

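import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# RMSE = sqrt(MSE); the squared=False argument was deprecated (and later removed) in scikit-learn
rmse = np.sqrt(mean_squared_error(y_test, y_pred))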
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} L/100 km")
print(f"Test R²  : {r2:.3f}")
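The two metrics printed above can be reproduced by hand. This small sketch uses made-up values (hypothetical, in L/100 km) to spell out the formulas and check them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values (hypothetical), in L/100 km.
y_true = np.array([5.0, 6.0, 7.0])
y_hat = np.array([5.5, 6.0, 6.5])

# RMSE: square root of the mean squared residual, in the target's units.
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R²: 1 - (residual sum of squares) / (total sum of squares around the mean).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.4f}  (sklearn: {np.sqrt(mean_squared_error(y_true, y_hat)):.4f})")
print(f"R²  : {r2_manual:.4f}  (sklearn: {r2_score(y_true, y_hat):.4f})")
```

Here the residual sum of squares is 0.5 and the total sum of squares is 2, so R² = 1 − 0.5/2 = 0.75 and RMSE = √(0.5/3) ≈ 0.408 L/100 km. RMSE is expressed in L/100 km like the target, whereas R² is unitless.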

8. Inspect Key Polynomial Coefficients

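Coefficient inspection highlights which nonlinear or interaction features carry the largest weights, e.g. squared speed or speed × outside temperature, which can inform eco-driving recommendations. Because the inputs are standardised before expansion, coefficient sizes are roughly comparable across features. The chart shows absolute values, so it ranks terms by size but does not say whether a term raises or lowers consumption.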

# Retrieve feature names after polynomial expansion
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)

# Retrieve Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

# Plot top 10
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Fuel Consumption")
plt.xlabel("Coefficient magnitude")
plt.tight_layout()
plt.show()

Summary

By integrating polynomial feature engineering with Ridge regularisation in a concise pipeline, we produce a model that:

  1. Predicts fuel consumption from trip-planning features, with accuracy measured by test-set RMSE and R².
  2. Captures key nonlinear effects (drag, temperature interactions) while Ridge shrinkage and cross-validated degree selection keep overfitting in check.
  3. Provides interpretable drivers that guide route and speed recommendations for improved fuel economy.


ProjectGurukul Team

ProjectGurukul Team specializes in creating project-based learning resources for programming, Java, Python, Android, AI, web development and machine learning. Our mission is to help learners build practical skills through engaging, hands-on projects. We also offer free major and minor projects with source code for engineering students.
