Traffic Speed Curve Prediction with Polynomial Regression in ML


Transportation planners and traffic‑management centres need to predict average vehicle speed on arterial corridors at future time points so they can adjust signal timings and traveller information proactively. The only inputs are early‑week historical speed observations, time of day, day of week, and weather conditions. Empirical studies show that speed exhibits nonlinear diurnal and weekly patterns (rush‑hour dips, mid‑day plateaus, weekend shifts) and interacts with weather: rainfall, for example, amplifies the morning slowdown. A straight‑line regression underfits these curves, while an unregularised high‑degree polynomial overfits transient noise. Applying Polynomial Regression with Ridge regularisation to engineered time‑and‑weather features captures the smooth curvature of the speed pattern while keeping the coefficients interpretable enough to support real‑time traffic control.

Dataset

Traffic Prediction 2019

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Preliminary Cleanup

The weather table (temperature, rainfall_mm) adds environmental effects: rain often slows traffic nonlinearly. It is joined to the speed table on timestamp.

# Load speed and weather tables
speeds = pd.read_csv("train.csv", parse_dates=["timestamp"])
weather = pd.read_csv("weather_data.csv", parse_dates=["timestamp"])

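# Merge on timestamp, and also on station_id when the weather table has that column
keys = ["timestamp", "station_id"] if "station_id" in weather.columns else ["timestamp"]
df = speeds.merge(weather, on=keys, how="left")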

# Example columns: ['timestamp','station_id','speed','temperature',
#                   'rainfall_mm','humidity']
df.dropna(subset=["speed"], inplace=True)

3. Feature Engineering & EDA

Time features (hour, dow) are included directly so their interactions (via polynomial terms) model diurnal and weekly cycles.
Lagged speed (speed_lag1) captures inertia: traffic speed rarely jumps instantaneously.

# Extract time features as plain integers (no sin/cos encoding is applied here)
df["hour"] = df["timestamp"].dt.hour
df["dow"]  = df["timestamp"].dt.dayofweek  # 0=Mon…6=Sun

# Quick plot: speed vs hour (cap the sample size so small datasets do not raise)
sns.lineplot(x="hour", y="speed", data=df.sample(min(5000, len(df)), random_state=42))
plt.title("Hourly Speed Pattern")
plt.xlabel("Hour of Day")
plt.ylabel("Speed (km/h)")
plt.show()
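The pipeline in this tutorial feeds hour and dow to the model as plain integers, so hour 23 and hour 0 look far apart even though they are adjacent on the clock. One common alternative, shown here only as a minimal sketch and not as part of the original workflow, is a sine/cosine encoding. The helper name add_cyclical_features and the new column names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_cyclical_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with sin/cos encodings of 'hour' (period 24) and 'dow' (period 7)."""
    out = frame.copy()
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    out["dow_sin"] = np.sin(2 * np.pi * out["dow"] / 7)
    out["dow_cos"] = np.cos(2 * np.pi * out["dow"] / 7)
    return out


# Tiny demo frame; in the tutorial the same call would be made on df
demo = pd.DataFrame({"hour": [0, 6, 12, 23], "dow": [0, 3, 6, 6]})
encoded = add_cyclical_features(demo)
print(encoded.round(3))
```

In the encoded space, hour 23 sits next to hour 0, which a raw integer column cannot express. Whether this improves the fit on this dataset would need to be checked with the same cross-validation used later.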

4. Define Features & Target

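# Use lagged speed to capture inertia (previous interval at the same station).
# Sort first so that shift(1) follows time order within each station.
df = df.sort_values(["station_id", "timestamp"])
df["speed_lag1"] = df.groupby("station_id")["speed"].shift(1)

feature_cols = ["speed_lag1", "hour", "dow", "temperature", "rainfall_mm"]
# Ridge cannot handle NaN: drop each station's first row and rows without weather
df = df.dropna(subset=feature_cols)

X = df[feature_cols]
y = df["speed"]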
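To see what the per-station lag does, here is a small sketch on made-up numbers (the timestamps, station labels, and speeds are illustrative, not taken from the dataset). It also shows why row order matters: shift(1) takes the previous row, so the frame has to be sorted by time within each station first.

```python
import pandas as pd

# Toy frame with rows deliberately out of time order for station A
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-01 08:10", "2019-01-01 08:00",
        "2019-01-01 08:00", "2019-01-01 08:10",
    ]),
    "station_id": ["A", "A", "B", "B"],
    "speed": [42.0, 50.0, 61.0, 58.0],
})

# Sort so that "previous row" really means "previous interval at this station"
toy = toy.sort_values(["station_id", "timestamp"]).reset_index(drop=True)
toy["speed_lag1"] = toy.groupby("station_id")["speed"].shift(1)
print(toy)
```

The first row of each station has no predecessor, so its speed_lag1 is NaN and the row has to be dropped before fitting.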

5. Build a Polynomial Regression Pipeline

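StandardScaler z‑scores the continuous inputs (speed_lag1, temperature, rainfall_mm) so that the Ridge penalty treats them on a comparable scale. In this pipeline hour and dow are passed through unscaled, and the polynomial terms are generated after scaling, so the expanded terms are not all on the same scale.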
PolynomialFeatures generates all squares and cross‑products up to the chosen degree, encoding curvature and interactions (e.g., rainfall_mm², speed_lag1×hour).

# Continuous inputs are scaled; hour/dow are integer time features passed through as-is
num_cols = ["speed_lag1","temperature","rainfall_mm"]
cat_cols = ["hour","dow"]

preproc = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", "passthrough", cat_cols)  # treat hour/dow as numeric; polynomial will expand
])

pipe = Pipeline([
    ("prep", preproc),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(random_state=42))
])

6. Train/Test Split & Hyperparameter Search

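GridSearchCV tunes the polynomial degree (1–3) and regularisation strength α (0.01–100) with 5‑fold CV, selecting the combination with the lowest RMSE. Note that train_test_split shuffles rows by default, so with a lagged feature the hold‑out set is interleaved in time with the training data; the resulting scores measure interpolation between known intervals rather than forecasting into an unseen period.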

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-2, 2, 5)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best params:", gs.best_params_)
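If the goal is to score the model on periods it has not seen, one option is to replace the shuffled folds with scikit-learn's TimeSeriesSplit, which always validates on rows that come after the training rows. The sketch below uses synthetic data and its own small pipeline (pipe_demo, X_demo, y_demo are hypothetical names), and it assumes the rows are already sorted by time.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic hourly series with a morning dip around 08:00 (illustrative only)
rng = np.random.default_rng(0)
n = 300
hour = np.arange(n) % 24
X_demo = np.column_stack([hour, rng.normal(size=n)])
y_demo = 50 - 10 * np.exp(-((hour - 8) ** 2) / 8) + rng.normal(scale=1.0, size=n)

pipe_demo = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])

# Each fold trains on an initial segment and validates on the rows right after it
tscv = TimeSeriesSplit(n_splits=5)
gs_demo = GridSearchCV(
    pipe_demo,
    {"poly__degree": [1, 2, 3], "ridge__alpha": [0.1, 1.0, 10.0]},
    cv=tscv,
    scoring="neg_root_mean_squared_error",
)
gs_demo.fit(X_demo, y_demo)
print("Best params:", gs_demo.best_params_)
print("CV RMSE:", -gs_demo.best_score_)
```

In the tutorial's code the same idea would mean sorting df by timestamp, holding out the last 20% of rows instead of a random 20%, and passing cv=TimeSeriesSplit(n_splits=5) to GridSearchCV.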

7. Model Evaluation

y_pred = gs.predict(X_test)
# squared=False was deprecated and later removed in scikit-learn; take the root explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Hold‑out RMSE: {rmse:.2f} km/h | R²: {r2:.3f}")
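An RMSE figure is easier to judge next to a naive reference. Because the model already uses speed_lag1, a natural reference is the persistence forecast, which predicts that the next speed equals the previous one. The sketch below uses made-up numbers to show the comparison; the rmse helper and the *_demo arrays are hypothetical.

```python
import numpy as np


def rmse(y_true, y_hat) -> float:
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))


# Illustrative values only, not results from the dataset
y_true_demo = np.array([50.0, 48.0, 42.0, 45.0])      # observed speeds
lag_demo = np.array([51.0, 50.0, 48.0, 42.0])         # previous-interval speeds
model_pred_demo = np.array([50.5, 47.0, 44.0, 44.0])  # hypothetical model output

baseline_rmse = rmse(y_true_demo, lag_demo)
model_rmse = rmse(y_true_demo, model_pred_demo)
print(f"Persistence RMSE: {baseline_rmse:.2f} | Model RMSE: {model_rmse:.2f}")
```

With the tutorial's variables, the baseline would be rmse(y_test, X_test["speed_lag1"]). The polynomial model is only adding value if its hold-out RMSE is clearly below that number.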

8. Interpret Key Polynomial Coefficients

# Retrieve names after expansion
poly = gs.best_estimator_.named_steps["poly"]
# Rebuild the input names in ColumnTransformer output order:
# scaled numeric columns first, then the passthrough hour/dow columns
input_features = num_cols + cat_cols
feat_names = poly.get_feature_names_out(input_features)

coefs = gs.best_estimator_.named_steps["ridge"].coef_
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

plt.figure(figsize=(8,4))
imp.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Speed")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

This Polynomial Regression workflow forecasts traffic speed by:

Capturing inertia and nonlinear patterns of daily and weekly cycles.
Accounting for weather impacts that amplify rush‑hour congestion.
Balancing model complexity through Ridge regularisation, which keeps the coefficients interpretable and helps operators adjust signal timing and traveller alerts proactively.
