Fuel Consumption Prediction using Linear Regression in ML


Vehicle engineers and policy makers rely on accurate fuel‑consumption estimates to design efficient engines, set emissions targets, and inform buyers. Given basic engine specifications (size, cylinders), vehicle class, transmission type, and fuel type, we want to predict combined fuel consumption in L / 100 km for new light‑duty vehicles using an interpretable linear‑regression baseline. A transparent model reveals which specs drive efficiency and offers a benchmark before moving to more complex algorithms.

Libraries Required

  • pandas: tabular wrangling
  • numpy: numeric helpers
  • matplotlib.pyplot: quick EDA charts
  • seaborn: tidy correlation heatmaps (optional)
  • scikit-learn: train/test split, model, metrics
  • joblib: save the trained model

Dataset Link

Canadian Fuel Consumption & CO₂

Step-by-Step Code Implementation

Why linear regression?

Inside typical design limits, combined fuel use scales roughly linearly with engine displacement and cylinder count. A linear model exposes these first‑order correlations and provides coefficients that engineers can read in minutes.
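Here is a minimal sketch of the "read the coefficients" idea. It uses synthetic numbers only: the slopes 1.8 and 0.3 are invented for illustration and do not come from the dataset. When the data really are linear, LinearRegression recovers those slopes, and each one reads directly in L/100 km per unit of input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): consumption = 4.0 + 1.8*engine + 0.3*cylinders + noise
rng = np.random.default_rng(0)
engine = rng.uniform(1.0, 6.0, size=200)        # displacement in litres
cylinders = rng.integers(3, 9, size=200)        # 3 to 8 cylinders
y = 4.0 + 1.8 * engine + 0.3 * cylinders + rng.normal(0.0, 0.2, size=200)

model = LinearRegression().fit(np.column_stack([engine, cylinders]), y)

# Each coefficient reads as "change in L/100 km per unit increase, other inputs fixed"
print(f"intercept         : {model.intercept_:.2f}")
print(f"per litre         : {model.coef_[0]:.2f}")
print(f"per extra cylinder: {model.coef_[1]:.2f}")
```

The fitted values land close to 1.8 and 0.3. Real vehicle data are noisier and only approximately linear, so expect a looser fit there.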

Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns                          # optional
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

Load data

df = pd.read_csv("FuelConsumptionCo2.csv")
print(df.head())
print(df.shape)
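The rest of the code indexes columns by name, so it can help to confirm the headers before going further. Copies of this dataset found online may spell the headers differently (for example, all-caps names without units). The sketch below uses a hypothetical helper, `missing_columns`, to list any expected column that is absent. The demo frame is a made-up stand-in; with the real data you would call `missing_columns(df)`.

```python
import pandas as pd

# Columns the rest of this walkthrough expects (taken from the code in this article)
REQUIRED = ['Engine Size(L)', 'Cylinders', 'Fuel Type', 'Vehicle Class',
            'Transmission', 'Fuel Consumption Comb (L/100 km)']

def missing_columns(frame: pd.DataFrame, required=REQUIRED) -> list:
    """Hypothetical helper: return the required column names absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Made-up frame with differently spelled headers, for illustration only
demo = pd.DataFrame({'ENGINESIZE': [2.0], 'CYLINDERS': [4]})
gaps = missing_columns(demo)
if gaps:
    print("Rename or map these columns before continuing:", gaps)
```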

Minimal cleaning

# drop rows with blanks in any modelling column, then exact duplicate rows
df = df.dropna(subset=['Engine Size(L)', 'Cylinders',
                       'Fuel Type', 'Vehicle Class', 'Transmission',
                       'Fuel Consumption Comb (L/100 km)'])
df = df.drop_duplicates().copy()
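To see what the cleaning step removes, you can run it on a few made-up rows. The sketch drops rows with blanks and then exact duplicates, matching the intent of the cleaning comment. One duplicated row and one row with a missing engine size disappear.

```python
import numpy as np
import pandas as pd

# Made-up rows: row 1 duplicates row 0, row 3 has a blank engine size
toy = pd.DataFrame({
    'Engine Size(L)': [2.0, 2.0, 3.5, np.nan],
    'Cylinders': [4, 4, 6, 6],
    'Fuel Type': ['X', 'X', 'Z', 'X'],
    'Vehicle Class': ['COMPACT', 'COMPACT', 'SUV - SMALL', 'PICKUP TRUCK - STANDARD'],
    'Transmission': ['AS6', 'AS6', 'A8', 'M6'],
    'Fuel Consumption Comb (L/100 km)': [8.1, 8.1, 11.2, 13.0],
})

critical = list(toy.columns)                  # every modelling column counts here
clean = toy.dropna(subset=critical).drop_duplicates().copy()
print(len(toy), "->", len(clean))             # → 4 -> 2
```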

Define features & label

target   = 'Fuel Consumption Comb (L/100 km)'
num_cols = ['Engine Size(L)', 'Cylinders']
cat_cols = ['Fuel Type', 'Vehicle Class', 'Transmission']

X = df[num_cols + cat_cols]
y = df[target]

Pre‑processing & model pipeline

  • One‑hot encoding avoids treating labels like “SUV” or “Automatic” as ordinal numbers—each becomes its own binary column, letting the model learn a clean adjustment per category.
  • ColumnTransformer + Pipeline bundles preprocessing and model into one object, preventing training/inference mismatches and easing export to production.
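
# one-hot encode the categorical columns; drop='first' keeps one reference
# category per column, which avoids perfectly collinear dummy columns so each
# category coefficient is measured against that reference
# (combining drop with handle_unknown='ignore' needs scikit-learn >= 1.0)
ohe = OneHotEncoder(drop='first', handle_unknown='ignore')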

preproc = ColumnTransformer([
        ('cat', ohe, cat_cols)
    ], remainder='passthrough')               # numeric columns pass through unchanged

lin_model = LinearRegression()

pipeline = Pipeline(steps=[
        ('prep', preproc),
        ('model', lin_model)
])

Train/test split and training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

pipeline.fit(X_train, y_train)

Evaluation

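MAE is expressed in L/100 km, so it is easy to interpret. An MAE of 0.5 means predictions miss the true combined rating by half a litre per 100 km on average, which helps decide whether the model is "good enough" for showroom labels. R² complements it by reporting the share of the variance in consumption that the model explains.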

y_pred = pipeline.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.2f} L/100 km")
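Both metrics can be checked by hand on three made-up predictions, each off by exactly 0.5 L/100 km. MAE averages the absolute misses. R² compares the sum of squared misses with the spread of the true values around their mean.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hand-picked numbers (not from the dataset) so the arithmetic is easy to follow
y_true = np.array([8.0, 10.0, 12.0])     # L/100 km
y_hat = np.array([8.5, 9.5, 12.5])       # every prediction is off by 0.5

mae = mean_absolute_error(y_true, y_hat)          # (0.5 + 0.5 + 0.5) / 3
ss_res = np.sum((y_true - y_hat) ** 2)            # 0.25 * 3 = 0.75
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # 4 + 0 + 4 = 8
r2_manual = 1 - ss_res / ss_tot                   # 1 - 0.75 / 8 = 0.90625

print(f"MAE = {mae:.2f} L/100 km")                # → MAE = 0.50 L/100 km
print(f"R² (by hand) = {r2_manual:.5f}, R² (sklearn) = {r2_score(y_true, y_hat):.5f}")
```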

Inspect coefficients (top drivers)

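Each coefficient is the estimated change in combined consumption (L/100 km) for a one-unit change in that input, holding the others fixed: one extra litre of displacement, one extra cylinder, or switching a category flag on. The table therefore shows which transmissions, fuel types, or classes are associated with lower (negative) or higher (positive) consumption. For one-hot columns, the sign is only meaningful relative to the other categories of the same feature. Product teams can use this to prioritise efficiency tweaks.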

# recover feature names from OneHotEncoder
ohe_feats = pipeline.named_steps['prep']\
                    .named_transformers_['cat']\
                    .get_feature_names_out(cat_cols)

all_feats = list(ohe_feats) + num_cols

coefs = pd.Series(pipeline.named_steps['model'].coef_,
                  index=all_feats).sort_values()

print("Most efficient specs (negative coefficients):")
print(coefs.head(10))
print("\nGas‑guzzling specs (positive coefficients):")
print(coefs.tail(10))

Persist the trained pipeline

joblib.dump(pipeline, "fuel_consumption_linreg.pkl")
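The encoder travels inside the pipeline, so a saved file can later be loaded and fed raw, unencoded specs. To keep the sketch self-contained, it fits a small stand-in pipeline on made-up rows and round-trips it through a temporary file. With the real model, you would call `joblib.load("fuel_consumption_linreg.pkl")` instead.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in pipeline fitted on made-up rows (assumption, for illustration only)
train = pd.DataFrame({'Engine Size(L)': [1.5, 2.0, 3.0, 4.0, 2.5],
                      'Fuel Type': ['X', 'Z', 'X', 'Z', 'X']})
stand_in = Pipeline([
    ('prep', ColumnTransformer(
        [('cat', OneHotEncoder(handle_unknown='ignore'), ['Fuel Type'])],
        remainder='passthrough')),
    ('model', LinearRegression()),
]).fit(train, [6.5, 8.0, 9.0, 11.5, 8.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fuel_model.pkl")
    joblib.dump(stand_in, path)       # same call as above, different file name
    restored = joblib.load(path)      # returns the fitted Pipeline object

# Raw specs go straight in (same column order as in training);
# the pipeline re-applies the same encoding
new_cars = pd.DataFrame({'Engine Size(L)': [2.2], 'Fuel Type': ['Z']})
print(restored.predict(new_cars))
```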

Summary

This walkthrough shows how to turn a publicly available vehicle-spec dataset into a transparent fuel-consumption estimator with little more than pandas and scikit-learn. After minimal cleaning and one-hot encoding, the linear model captures the core efficiency drivers (engine size, cylinder count, vehicle class) and reports its typical error in L/100 km, a figure planners can act on. The same pipeline can later feed more flexible models (Ridge, Gradient Boosting, XGBoost) by swapping its final step, but starting with this interpretable baseline grounds the project in solid engineering insight.
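Because the estimator is the last named step ('model'), trying another regressor only means replacing that step with `set_params`. The sketch compares LinearRegression and Ridge on synthetic stand-in data. The StandardScaler step is an addition for this demo: the Ridge penalty depends on feature scale, so inputs are often standardised first.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), just to show the swap mechanics
rng = np.random.default_rng(7)
X_demo = rng.uniform(1.0, 6.0, size=(400, 2))
y_demo = 3.0 + 1.5 * X_demo[:, 0] + 0.4 * X_demo[:, 1] + rng.normal(0, 0.3, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

pipe = Pipeline([('scale', StandardScaler()), ('model', LinearRegression())])

results = {}
for estimator in (LinearRegression(), Ridge(alpha=1.0)):
    pipe.set_params(model=estimator)          # replace only the final step
    pipe.fit(X_tr, y_tr)
    results[type(estimator).__name__] = mean_absolute_error(y_te, pipe.predict(X_te))

for name, mae in results.items():
    print(f"{name:16s} MAE = {mae:.3f}")
```

On the article's pipeline, the equivalent is `pipeline.set_params(model=Ridge(alpha=1.0))` followed by `pipeline.fit(X_train, y_train)`.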


ProjectGurukul Team

ProjectGurukul Team specializes in creating project-based learning resources for programming, Java, Python, Android, AI, web development and machine learning. Our mission is to help learners build practical skills through engaging, hands-on projects. We also offer free major and minor projects with source code for engineering students.
