Online Learning Score Prediction with Polynomial Regression in ML


E‑learning platforms and instructional designers need to forecast a learner’s final course score (0–100) from early‑course indicators, such as weekly study time, number of forum posts, quiz attempt counts, and chosen learning style, before assessments conclude, so that they can intervene with personalised guidance. Empirical data show nonlinear dependencies: extra study time has diminishing returns, forum activity helps most up to a point, and learning styles interact with activity metrics. A simple linear model underfits these curves, while a naïve high‑degree polynomial overfits noise. By applying Polynomial Regression to engineered features with Ridge (ℓ2) regularisation, we learn a smooth, interpretable mapping from early behaviours to expected final score, enabling timely, targeted support.

Dataset

Student Performance & Learning Style

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                            # data loading & handling  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder  
from sklearn.compose import ColumnTransformer  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Inspect

import pandas as pd

df = pd.read_csv("data/student_performance_learning_style.csv")
# Preview relevant columns
df[['weekly_study_hours','forum_posts','quiz_attempts','learning_style','final_score']].head()
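Before modelling, it can help to confirm that the expected columns are present, that missing values are counted, and that scores fall in the 0–100 range. The sketch below is one possible check, not part of the original workflow: `check_frame` and the three-row `demo` frame are hypothetical stand-ins, since the CSV itself is not reproduced here.

```python
import pandas as pd

# Column names taken from the preview above
REQUIRED_COLUMNS = ['weekly_study_hours', 'forum_posts', 'quiz_attempts',
                    'learning_style', 'final_score']

def check_frame(df):
    """Hypothetical helper: summarise basic data-quality issues."""
    missing_cols = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing_cols:
        raise KeyError(f"Missing expected columns: {missing_cols}")
    scores = df['final_score'].dropna()  # judge the range on known scores only
    return {
        'n_rows': len(df),
        'missing_values': df[REQUIRED_COLUMNS].isna().sum().to_dict(),
        'score_out_of_range': int((~scores.between(0, 100)).sum()),
    }

# Tiny made-up frame standing in for the real dataset
demo = pd.DataFrame({
    'weekly_study_hours': [5.0, 12.0, None],
    'forum_posts': [3, 10, 0],
    'quiz_attempts': [2, 4, 1],
    'learning_style': ['Visual', 'Sensory', 'Visual'],
    'final_score': [62.0, 88.0, 105.0],
})
report = check_frame(demo)
print(report)
```

On the real data, the same call would be `check_frame(df)`; rows with missing values or out-of-range scores would then need to be fixed or dropped before fitting.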

3. Exploratory Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Check nonlinear trend: study hours vs score
sns.scatterplot(
    x='weekly_study_hours', y='final_score',
    hue='learning_style', data=df, alpha=0.6
)
plt.title("Weekly Study Hours vs Final Score (by Learning Style)")
plt.xlabel("Weekly Study Hours")
plt.ylabel("Final Score")
plt.show()
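A scatter plot can be hard to read when points overlap, so a numeric check of the same trend may be useful. The sketch below bins study hours and compares the mean score per bin. It runs on synthetic data with an assumed saturating relationship (the curve, noise level, and bin edges are all illustrative choices, not properties of the real dataset).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic stand-in: scores rise quickly at first, then level off (assumed shape)
demo = pd.DataFrame({'weekly_study_hours': rng.uniform(0, 20, 500)})
demo['final_score'] = (
    90 - 50 * np.exp(-demo['weekly_study_hours'] / 6) + rng.normal(0, 4, 500)
).clip(0, 100)

# Mean score per 5-hour bin
bins = [0, 5, 10, 15, 20]
demo['hours_bin'] = pd.cut(demo['weekly_study_hours'], bins=bins, include_lowest=True)
bin_means = demo.groupby('hours_bin', observed=True)['final_score'].mean()

# Gain in mean score from one bin to the next; shrinking gains suggest diminishing returns
gains = bin_means.diff().dropna()
print(bin_means.round(1))
print(gains.round(1))
```

If the gains shrink from bin to bin on the real data as well, that supports including a squared study-hours term.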

4. Define Features & Target

The numeric and categorical columns selected here are the inputs from which the pipeline later generates squared and interaction terms (e.g., weekly_study_hours², forum_posts×quiz_attempts, study_hours×learning_style_Sensory) to capture nonlinear and style‑dependent effects.

# Target: final course score
y = df['final_score']

# Features: numeric and categorical
numeric_features   = ['weekly_study_hours','forum_posts','quiz_attempts']
categorical_features = ['learning_style']

X = df[numeric_features + categorical_features]

5. Build a Polynomial Regression Pipeline

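StandardScaler standardizes weekly study hours, forum posts, and quiz attempts (zero mean, unit variance) so the Ridge penalty treats each numeric feature on a comparable scale.
OneHotEncoder transforms learning styles into binary flags; drop='first' omits one style, which then serves as the reference category.
PolynomialFeatures expands the preprocessed columns into powers and pairwise products; its degree is left at the default here and tuned in the next step.
Ridge applies an ℓ2 penalty (controlled by α) to shrink noisy high‑order coefficients and reduce overfitting.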

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_features),
    ('cat', OneHotEncoder(drop='first'), categorical_features)
])

pipe = Pipeline([
    ('prep', preprocessor),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])

6. Train/Test Split & Hyperparameter Search

GridSearchCV explores polynomial degree (1–3) and Ridge α (10⁻³…10³, seven log‑spaced values) via 5‑fold cross‑validation on the training split, selecting the combination that minimizes RMSE.
The 20% test split is held out from the search and used only for the final evaluation.

from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best polynomial degree:", gs.best_params_['poly__degree'])
print("Best Ridge α          :", gs.best_params_['ridge__alpha'])
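Beyond the single best setting, the full grid of cross-validated scores shows how sensitive the model is to degree and α. The sketch below builds such a table from `cv_results_`. It runs on a synthetic one-feature problem with a simplified pipeline (`demo_pipe`, `X_demo`, and `y_demo` are illustrative names and data, not the article's dataset).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X_demo = rng.uniform(0, 20, size=(200, 1))  # synthetic study hours
y_demo = 90 - 50 * np.exp(-X_demo[:, 0] / 6) + rng.normal(0, 4, 200)

demo_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
demo_gs = GridSearchCV(
    demo_pipe,
    {'poly__degree': [1, 2, 3], 'ridge__alpha': np.logspace(-3, 3, 7)},
    cv=5, scoring='neg_root_mean_squared_error',
)
demo_gs.fit(X_demo, y_demo)

# Scores are negated RMSE, so flip the sign before tabulating
results = pd.DataFrame(demo_gs.cv_results_)
results['degree'] = results['param_poly__degree'].astype(int)
results['alpha'] = results['param_ridge__alpha'].astype(float)
results['cv_rmse'] = -results['mean_test_score']

rmse_table = results.pivot(index='degree', columns='alpha', values='cv_rmse')
print(rmse_table.round(2))
```

With the fitted search from this step, replacing `demo_gs` with `gs` gives the same table for the real data. A flat region around the minimum suggests the choice of α is not critical.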

7. Evaluate Model

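import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument was
# deprecated in scikit-learn 1.4 and later removed
rmse = np.sqrt(mean_squared_error(y_test, y_pred))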
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} points")
print(f"Test R²  : {r2:.3f}")
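An RMSE figure is easier to judge next to simple baselines. The sketch below compares a predict-the-mean baseline with a plain linear model on the same held-out split. It uses synthetic data with an assumed saturating curve, and `holdout_rmse` is a hypothetical helper introduced for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X_demo = rng.uniform(0, 20, size=(400, 1))  # synthetic study hours
y_demo = 90 - 50 * np.exp(-X_demo[:, 0] / 6) + rng.normal(0, 4, 400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

def holdout_rmse(model):
    """Fit on the training split and return RMSE on the held-out split."""
    model.fit(X_tr, y_tr)
    return float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))

baseline_rmse = holdout_rmse(DummyRegressor(strategy='mean'))  # always predicts the mean
linear_rmse = holdout_rmse(LinearRegression())                 # degree-1 reference

print(f"Mean baseline RMSE: {baseline_rmse:.2f}")
print(f"Linear model RMSE : {linear_rmse:.2f}")
```

On the real data, the tuned polynomial model is worth its extra complexity only if its test RMSE is clearly below both of these reference values.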

8. Inspect Key Polynomial Coefficients

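The largest‑magnitude coefficients, such as quiz_attempts² or weekly_study_hours×learning_style_Visual, point to the nonlinear terms that contribute most to the predicted final score, which can guide interventions (e.g., encouraging additional quizzes for certain learner types). Because polynomial terms are correlated with one another and shrunk by the Ridge penalty, treat the ranking as a guide rather than a causal claim.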

# Reconstruct feature names after preprocessing
prep = gs.best_estimator_.named_steps['prep']
num_feats = numeric_features
cat_feats = prep.named_transformers_['cat'] \
                .get_feature_names_out(categorical_features).tolist()
input_feats = num_feats + cat_feats

poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=input_feats)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
imp.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Final Score")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

By combining polynomial feature engineering with Ridge regularisation in a clean pipeline, we aim for:

Nonlinear prediction of final course scores from early‑week behaviours, with accuracy checked on held‑out data through RMSE and R².
Controlled model complexity, limiting overfitting while capturing essential curves and interactions.
Actionable insights, with interpretable polynomial terms indicating which combinations of study habits and learning styles most affect predicted outcomes, enabling personalized support in online learning environments.
