Fuel Consumption Prediction using Linear Regression in ML

Vehicle engineers and policy makers rely on accurate fuel‑consumption estimates to design efficient engines, set emissions targets, and inform buyers. Given basic engine specifications (size, cylinders), vehicle class, transmission type, and fuel type, we want to predict combined fuel consumption in L / 100 km for new light‑duty vehicles using an interpretable linear‑regression baseline. A transparent model reveals which specs drive efficiency and offers a benchmark before moving to more complex algorithms.

Libraries Required

pandas # tabular wrangling
numpy # numeric helpers
matplotlib.pyplot # quick EDA charts
seaborn # tidy correlation heatmaps (optional)
scikit‑learn # train/test split, model, metrics
joblib # save trained model

Dataset Link

Canadian Fuel Consumption & CO₂

Step by Step Code Implementation

Why linear regression?

Inside typical design limits, combined fuel use scales roughly linearly with engine displacement and cylinder count. A linear model exposes these first‑order correlations and provides coefficients that engineers can read in minutes.
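To make "coefficients that engineers can read" concrete, here is a minimal sketch on made-up numbers. The values and the linear rule used to generate the target are assumptions for illustration, not figures from the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative inputs, NOT taken from the dataset:
# column 0 is engine size (L), column 1 is cylinder count.
X_toy = np.array([[1.5, 4], [2.0, 4], [3.0, 6], [3.5, 6], [5.0, 8]], dtype=float)

# Synthetic target generated from an assumed exact linear rule
y_toy = 4.0 + 1.2 * X_toy[:, 0] + 0.5 * X_toy[:, 1]

toy_model = LinearRegression().fit(X_toy, y_toy)

# Each coefficient is the change in L/100 km per unit of that feature,
# with the other feature held fixed.
print("intercept:", toy_model.intercept_)
print("per litre of displacement, per cylinder:", toy_model.coef_)
```

Because the toy target is exactly linear, the fit recovers the assumed rule. Real data will be noisier, but the coefficients are read the same way.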

Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns                          # optional
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

Load data

df = pd.read_csv("FuelConsumptionCo2.csv")
print(df.head())
print(df.shape)
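Published copies of this dataset may not all use the same header names, so it can help to confirm that the columns used later are present before going further. The helper below is a hypothetical sketch (`missing_columns` is not part of pandas), shown on a tiny stand-in frame rather than the real file.

```python
import pandas as pd

# Column names this walkthrough relies on in the later steps
REQUIRED_COLS = [
    'Engine Size(L)', 'Cylinders', 'Fuel Type',
    'Vehicle Class', 'Transmission',
    'Fuel Consumption Comb (L/100 km)',
]

def missing_columns(frame, required=REQUIRED_COLS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Stand-in frame with hypothetical values; in practice pass the loaded df
demo = pd.DataFrame({'Engine Size(L)': [2.0], 'Cylinders': [4]})
print(missing_columns(demo))
```

An empty list means every expected column is available. Otherwise, rename the headers or adjust the column lists used below.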

Minimal cleaning

# drop exact duplicate rows, then rows missing any field the model uses
df = df.drop_duplicates()
df = df.dropna(subset=['Engine Size(L)', 'Cylinders',
                       'Fuel Type', 'Vehicle Class', 'Transmission',
                       'Fuel Consumption Comb (L/100 km)']).copy()

Define features & label

target   = 'Fuel Consumption Comb (L/100 km)'
num_cols = ['Engine Size(L)', 'Cylinders']
cat_cols = ['Fuel Type', 'Vehicle Class', 'Transmission']

X = df[num_cols + cat_cols]
y = df[target]

Pre‑processing & model pipeline

One‑hot encoding avoids treating labels like “SUV” or “Automatic” as ordinal numbers: each category becomes its own binary column, so the model learns a separate additive adjustment per category.
ColumnTransformer + Pipeline bundles preprocessing and model into one object, which prevents training/inference mismatches and simplifies export to production.

# one‑hot encode the categorical columns
ohe = OneHotEncoder(handle_unknown='ignore')

preproc = ColumnTransformer([
        ('cat', ohe, cat_cols)
    ], remainder='passthrough')               # numeric columns pass through unchanged

lin_model = LinearRegression()

pipeline = Pipeline(steps=[
        ('prep', preproc),
        ('model', lin_model)
])

Train/test split and training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

pipeline.fit(X_train, y_train)

Evaluation

MAE in L / 100 km is intuitive: if MAE ≈ 0.5, predictions miss the true value by half a litre per 100 km on average, which helps when deciding whether the model is “good enough” for showroom labels.

y_pred = pipeline.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.2f} L/100 km")
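As a quick check on what the two metrics measure, this sketch computes them by hand on three made-up predictions and compares the result with scikit-learn. The numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted consumption values in L/100 km
y_true = np.array([8.0, 10.0, 12.0])
y_hat = np.array([8.5, 9.5, 12.5])

# MAE: average size of the miss, in the target's own units
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, mean_absolute_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

Every prediction here is off by exactly 0.5, so the MAE is 0.5 L/100 km, while R² reports how much of the spread in the true values the predictions account for.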

Inspect coefficients (top drivers)

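The coefficient table shows which transmissions, fuel types, or classes are associated with lower (negative) or higher (positive) consumption, with the other features held fixed. Because every category gets its own column alongside the intercept, compare coefficients within a category group rather than reading each one as an absolute effect. Product teams can use this to prioritise efficiency tweaks.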

# recover feature names from OneHotEncoder
ohe_feats = pipeline.named_steps['prep']\
                    .named_transformers_['cat']\
                    .get_feature_names_out(cat_cols)

# ColumnTransformer emits the encoded columns first, then the passthrough numerics
all_feats = list(ohe_feats) + num_cols

coefs = pd.Series(pipeline.named_steps['model'].coef_,
                  index=all_feats).sort_values()

print("Most efficient specs (negative coefficients):")
print(coefs.head(10))
print("\nGas‑guzzling specs (positive coefficients):")
print(coefs.tail(10))

Persist the trained pipeline

joblib.dump(pipeline, "fuel_consumption_linreg.pkl")
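Loading the saved file later should give back an object that accepts raw rows and predicts directly, since the preprocessing is stored with the model. The round trip below is a sketch on toy data with made-up values and a hypothetical file name, written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real training set
toy = pd.DataFrame({
    'Engine Size(L)': [1.5, 2.0, 3.0, 3.5, 5.0, 2.5],
    'Fuel Type': ['X', 'Z', 'X', 'Z', 'Z', 'X'],
})
toy_y = [7.0, 8.5, 9.8, 11.5, 14.0, 9.0]

toy_pipe = Pipeline([
    ('prep', ColumnTransformer(
        [('cat', OneHotEncoder(handle_unknown='ignore'), ['Fuel Type'])],
        remainder='passthrough')),
    ('model', LinearRegression()),
])
toy_pipe.fit(toy, toy_y)

# A new vehicle described with the same raw columns used in training
new_car = pd.DataFrame({'Engine Size(L)': [2.4], 'Fuel Type': ['X']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_linreg.pkl')   # hypothetical file name
    joblib.dump(toy_pipe, path)
    restored = joblib.load(path)

before = toy_pipe.predict(new_car)
after = restored.predict(new_car)
print(before, after)
```

The restored pipeline should reproduce the original prediction. It is safest to load the file with the same scikit-learn version that saved it.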

Summary

This walkthrough shows how to turn a publicly available vehicle-spec dataset into a transparent fuel-consumption estimator with little more than pandas and scikit-learn. After minimal cleaning and one‑hot encoding, the linear model captures the core efficiency drivers (engine size, cylinder count, vehicle class) and reports its error as an MAE in L / 100 km that planners can act on. The same preprocessing pipeline can later feed regularised or more flexible models (Ridge, Gradient Boosting, XGBoost), but starting with this interpretable baseline grounds the project in solid engineering insight.
