Exam Preparation Time Prediction using Linear Regression in ML

Educators often advise students to “study more,” yet few can translate that advice into the number of hours an individual needs to achieve a desired score.

Using a public “study‑hours ⇄ exam‑score” dataset, we fit a simple linear‑regression model that captures the relationship between preparation time and marks. The resulting equation lets us work forwards (predict the score a student might obtain for a given number of study hours) or backwards (estimate how many hours a student should plan to reach a target grade). The model serves as a transparent baseline before exploring richer, personalised recommendations.

Libraries Required

pandas # tabular wrangling
numpy # numerical helpers
matplotlib.pyplot # quick scatter & fit line
scikit‑learn # model, split, metrics
joblib # save the trained model

Dataset Link

Study Hours vs Exam Scores

Step by Step Code Implementation

Why linear regression? Within normal ranges, exam scores often rise roughly proportionally with extra study time; a straight‑line fit supplies an interpretable first‑order model.

1. Import essentials

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the data

df = pd.read_csv("study_hours_vs_exam_scores.csv")
print(df.head())
# Expected columns:  'Hours'  (float)  and  'Scores'  (percentage)
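
Before modelling, it can help to confirm that the file really has the two expected columns and no unusable rows. The sketch below is one possible check, not part of the original workflow: the helper name validate_study_data and the tiny sample frame are hypothetical, and the 0 to 100 score range is an assumption based on the scores being percentages.

```python
import pandas as pd

def validate_study_data(df):
    """Return a cleaned copy with numeric 'Hours' and 'Scores' and no unusable rows."""
    required = {"Hours", "Scores"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    # Coerce non-numeric entries to NaN, then drop incomplete rows
    clean = df[["Hours", "Scores"]].apply(pd.to_numeric, errors="coerce").dropna()
    # Keep only plausible values: non-negative hours, scores on a 0-100 scale (assumption)
    clean = clean[(clean["Hours"] >= 0) & clean["Scores"].between(0, 100)]
    return clean.reset_index(drop=True)

# Made-up rows standing in for the CSV contents
sample = pd.DataFrame({
    "Hours": [2.5, 5.1, None, 8.5, -1.0],
    "Scores": [21, 47, 30, 175, 20],
})
clean_sample = validate_study_data(sample)
print(clean_sample)
```

With the real file, the same helper could be applied as df = validate_study_data(df) right after loading.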

3. Basic sanity check

A scatter plot of the raw data shows at a glance whether hours and scores move together in a roughly straight line; large deviations or a curved pattern would signal the need for polynomial terms or a different algorithm.

# Plot raw relationship
plt.scatter(df['Hours'], df['Scores'])
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score (%)")
plt.title("Study Hours vs Exam Score")
plt.show()
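
A number can back up the visual impression. As a rough sketch, the Pearson correlation coefficient measures how close the points lie to a straight line (values near 1 suggest a strong positive linear relationship). The arrays below are made-up values used only for illustration, not rows from the dataset.

```python
import numpy as np

# Made-up values for illustration only
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 25.0, 31.0, 44.0, 50.0])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r between the two arrays
r = np.corrcoef(hours, scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

On the loaded DataFrame, the equivalent call would be df['Hours'].corr(df['Scores']).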

4. Prepare features & label

X = df[['Hours']]          # double brackets keep X two‑dimensional, as scikit‑learn expects
y = df['Scores']

5. Train‑test split

Train‑test split keeps 20 % of the records unseen during fitting, giving an honest estimate of predictive performance.

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
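
On a small dataset, a single 20 % hold-out can give a noisy estimate because only a handful of rows end up in the test set. One optional complement, sketched here on synthetic data, is k-fold cross-validation, which rotates the held-out portion. The data-generating numbers (slope 9, intercept 3, noise level 4) are arbitrary assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the real data (assumed values, not the dataset)
rng = np.random.default_rng(0)
hours = rng.uniform(1, 10, size=30).reshape(-1, 1)
scores = 9 * hours.ravel() + 3 + rng.normal(0, 4, size=30)

# Five folds: each row is held out exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)
r2_per_fold = cross_val_score(LinearRegression(), hours, scores, cv=cv, scoring="r2")

print("R² per fold:", r2_per_fold.round(3))
print("Mean R²    :", round(r2_per_fold.mean(), 3))
```

A wide spread across folds would suggest that the single-split score in this tutorial should be read with some caution.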

6. Model training

linreg = LinearRegression()
linreg.fit(X_train, y_train)

7. Evaluation

R² and MAE tell different stories: R² is the share of score variance the model explains, while MAE is the typical absolute error in score points. Together they help tutors decide whether the rule of thumb is actionable.

y_pred = linreg.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.2f} percentage points")
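
To make the two metrics concrete, the sketch below computes them by hand with NumPy on four made-up true and predicted scores. The formulas are the standard definitions that r2_score and mean_absolute_error implement; the numbers are purely illustrative.

```python
import numpy as np

# Made-up true and predicted scores for illustration
y_true = np.array([20.0, 45.0, 60.0, 85.0])
y_hat = np.array([25.0, 40.0, 65.0, 80.0])

# MAE: average size of the miss, in score points
mae = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (unexplained variation / total variation around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE : {mae:.2f}")   # MAE : 5.00
print(f"R²  : {r2:.3f}")    # R²  : 0.955
```

Every prediction here misses by exactly 5 points, yet R² stays high because the true scores vary far more than that.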

8. Inspect the fitted line

Single‑feature simplicity makes the maths clear: Score = m × Hours + b. Once m (slope) and b (intercept) are learned, you can invert the equation to find the required hours for any realistic score goal.

coef  = linreg.coef_[0]       # slope
inter = linreg.intercept_     # y‑intercept

print(f"Score = {coef:.2f} × Hours  +  {inter:.2f}")

# overlay regression line on scatterplot
plt.scatter(df['Hours'], df['Scores'], label="Actual")
x_line = pd.DataFrame({'Hours': np.linspace(0, df['Hours'].max(), 100)})  # DataFrame keeps the feature name used in training
plt.plot(x_line, linreg.predict(x_line), color='red', label="Fitted line")
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score (%)")
plt.legend()
plt.show()
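
For a single feature, the slope and intercept have a simple closed form, so the fit can be reproduced without scikit-learn. The sketch below applies the ordinary least-squares formulas to made-up points that lie exactly on Score = 10 × Hours + 5, then inverts the line for a target score. The data and the target of 85 are assumptions for illustration.

```python
import numpy as np

# Made-up points lying exactly on Score = 10 * Hours + 5
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([15.0, 25.0, 35.0, 45.0, 55.0])

# Ordinary least squares for one feature:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b = mean_y - m * mean_x
dx = hours - hours.mean()
m = np.sum(dx * (scores - scores.mean())) / np.sum(dx ** 2)
b = scores.mean() - m * hours.mean()
print(f"Score = {m:.2f} × Hours + {b:.2f}")   # Score = 10.00 × Hours + 5.00

# Inverting the line: Hours = (Score - b) / m
target = 85
print(f"Hours for {target} %: {(target - b) / m:.1f}")   # Hours for 85 %: 8.0
```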

9. Utility helper – predict hours for a target score

Helper function hours_needed() wraps the inversion step so web apps or dashboards can surface personalised study‑time recommendations instantly.

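def hours_needed(target_score):
    """
    Estimate study hours required for a desired percentage.
    Returns None if the target is unrealistic for the model.
    """
    # No usable positive slope, or target outside the 0–100 % scale
    if coef <= 0 or not 0 <= target_score <= 100:
        return None
    return max((target_score - inter) / coef, 0)

hours = hours_needed(85)
if hours is None:             # avoid formatting None as a float
    print("Target is unrealistic for this model")
else:
    print(f"≈ Hours needed for 85 %: {hours:.1f}")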
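
A straight line will happily extrapolate beyond the study times seen in the data, where the estimate is much less trustworthy. The variant below is a hypothetical sketch (the name hours_needed_checked and its parameters are not from the original code) that takes the slope, intercept and largest observed study time explicitly, and flags answers that fall outside the observed range.

```python
def hours_needed_checked(target_score, slope, intercept, max_observed_hours):
    """
    Return (hours, extrapolated) for a target score.
    hours is None when the slope is not positive; extrapolated is True
    when the estimate exceeds the largest study time seen in the data.
    """
    if slope <= 0:
        return None, False
    hours = max((target_score - intercept) / slope, 0.0)
    return hours, hours > max_observed_hours

# Illustrative coefficients (assumed): Score = 10 * Hours + 5, data observed up to 9 hours
hours_85, flag_85 = hours_needed_checked(85, slope=10.0, intercept=5.0, max_observed_hours=9.0)
hours_100, flag_100 = hours_needed_checked(100, slope=10.0, intercept=5.0, max_observed_hours=9.0)

print(hours_85, flag_85)     # 8.0 False
print(hours_100, flag_100)   # 9.5 True
```

With the fitted model, the call might pass coef, inter and df['Hours'].max() for the three parameters.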

10. Persist the model

Model persistence with joblib allows the same coefficients to serve real‑time advice in a classroom portal without retraining.

joblib.dump(linreg, "exam_prep_time_linreg.pkl")
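
Reloading is the other half of persistence. The sketch below fits a tiny model on made-up rows, saves it to a temporary directory, restores it with joblib.load and predicts a score for a new study time. The training rows and the 6-hour query are assumptions for illustration; in the tutorial's setting, only the load and predict lines would be needed.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up rows on the line Score = 10 * Hours + 5, standing in for the real data
train = pd.DataFrame({"Hours": [1.0, 2.0, 3.0, 4.0],
                      "Scores": [15.0, 25.0, 35.0, 45.0]})
model = LinearRegression().fit(train[["Hours"]], train["Scores"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "exam_prep_time_linreg.pkl")
    joblib.dump(model, path)        # save the fitted model
    restored = joblib.load(path)    # load it back without retraining

# Use the same column name as in training to avoid feature-name warnings
predicted = restored.predict(pd.DataFrame({"Hours": [6.0]}))[0]
print(f"Predicted score for 6 hours: {predicted:.1f}")   # Predicted score for 6 hours: 65.0
```

Pickle-based files should only be loaded from sources you trust, and it is safest to load them with the same scikit-learn version that saved them.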

Summary

This compact workflow turns a transparent linear fit into a practical calculator for study planning. Teachers can plug in a target score and hand back a round‑number estimate of preparation hours, backed by real data instead of guesswork. While individual learning rates vary, starting with this interpretable baseline establishes trust, highlights outliers for further mentoring, and lays the groundwork for more nuanced models that fold in subject difficulty, prior knowledge, and learning style.
