Gym Session Cost Trend Prediction with Polynomial Regression in ML

Gym-operations managers need to forecast the week-over-week percentage change in average per-session cost (USD/session) using only early-week indicators (prior-week session cost, session volume, membership tier mix, and promotional spend) so they can adjust pricing and offers before the week concludes. Historical gym-usage logs show nonlinear dynamics: cost per session plateaus at high volumes, premium-tier membership reduces unit cost in bulk, and promotional spend interacts with volume in complex ways. A simple linear model underfits these curvatures, while an unrestricted high-degree polynomial overfits noise in week-to-week fluctuations. Fitting a polynomial regression on engineered features with Ridge (ℓ2) regularisation is a middle path: it can capture smooth, interpretable cost-trend curves while keeping the coefficients in check. The code in this tutorial uses prior-week cost, prior-week volume, and tier mix as predictors; promotional spend is not among the assumed dataset columns and can be added as a further feature where it is available.
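The underfitting claim can be checked on a small synthetic curve. The sketch below is only an illustration: the plateauing cost curve and its constants are made up, and it uses an unregularised NumPy least-squares fit rather than the Ridge pipeline built later.

```python
import numpy as np

# Synthetic, made-up data: cost falls with volume and then plateaus.
volume = np.linspace(50, 500, 40)
cost = 5.0 + 20.0 * np.exp(-volume / 120.0)

def sse_for_degree(x, y, degree):
    """Least-squares polynomial fit; returns the sum of squared residuals."""
    # Standardise x to keep the fit well conditioned.
    x_scaled = (x - x.mean()) / x.std()
    coeffs = np.polyfit(x_scaled, y, deg=degree)
    residuals = y - np.polyval(coeffs, x_scaled)
    return float(np.sum(residuals ** 2))

sse_linear = sse_for_degree(volume, cost, 1)
sse_quadratic = sse_for_degree(volume, cost, 2)
print(f"SSE degree 1: {sse_linear:.3f}")
print(f"SSE degree 2: {sse_quadratic:.3f}")
```

On a curved relationship like this one, the straight line leaves a larger residual than the quadratic. On noisy real data the same comparison has to be made on held-out weeks, which is what the grid search in step 5 is for.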

Dataset

Gym Membership Dataset

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                            # data loading & manipulation  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Compute Weekly Metrics

Features built in this step and the next:

  • cost_prev (lagged): captures momentum in the session cost.
  • volume_prev (lagged): captures demand saturation effects.
  • Tier mix features (mix_Basic, mix_Premium, …): the share of records in each membership tier per week, modelling how bulk tiers affect unit cost.

import pandas as pd
import numpy as np

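# Load the membership data (adjust path); parse week_start as a date so sorting is chronological
df = pd.read_csv("data/gym_membership.csv", parse_dates=["week_start"])

# Suppose the dataset has columns:
# ['member_id','tier','monthly_fee','weekly_sessions','week_start']
# Compute cost per session for each record, approximating a month as 4 weeks.
# Records with zero weekly sessions become NaN (not infinity) and are skipped by the weekly mean.
df['cost_per_session'] = df['monthly_fee'] / (df['weekly_sessions'].replace(0, np.nan) * 4)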

# Aggregate to weekly level
weekly = df.groupby('week_start').agg(
    avg_cost=('cost_per_session', 'mean'),  # average cost/session
    volume=('member_id', 'count'),          # session volume proxy
).reset_index()

# Tier mix: share of each tier per week (each row sums to 1)
tier_df = (
    pd.crosstab(df['week_start'], df['tier'], normalize='index')
      .add_prefix('mix_')
      .reset_index()
)
weekly = weekly.merge(tier_df, on='week_start', how='left')
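As a sanity check on what the tier-mix columns contain, here is a minimal sketch that computes per-week tier shares with pd.crosstab. The four-row frame and its dates are made up for illustration.

```python
import pandas as pd

# Made-up records: two weeks, two tiers.
toy = pd.DataFrame({
    "week_start": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-08"],
    "tier": ["Basic", "Basic", "Premium", "Premium"],
})

# normalize="index" divides each row by its total, giving per-week shares.
tier_mix = pd.crosstab(
    toy["week_start"], toy["tier"], normalize="index"
).add_prefix("mix_")
print(tier_mix)
```

Each row of tier_mix sums to 1, so the mix columns are proportions rather than one-hot indicators. A tier that is absent in a given week gets a share of 0.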

3. Feature Engineering & Target

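PolynomialFeatures, used in the pipeline of step 4, expands the inputs into powers and interactions (at degree 2: cost_prev², cost_prev×volume_prev, volume_prev×mix_Premium, and so on) to capture curvature and synergies in cost dynamics. The code in this step prepares the lagged inputs and the growth target that the expansion operates on.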

# Sort chronologically and create lag features
weekly = weekly.sort_values('week_start')
weekly['cost_prev']   = weekly['avg_cost'].shift(1)
weekly['volume_prev'] = weekly['volume'].shift(1)
weekly.dropna(subset=['cost_prev','volume_prev'], inplace=True)

# Compute week‑over‑week cost growth (%)
weekly['cost_growth_pct'] = (
    (weekly['avg_cost'] - weekly['cost_prev']) / weekly['cost_prev'] * 100
)

# Define feature matrix and target
feature_cols = ['cost_prev','volume_prev'] + [c for c in weekly.columns if c.startswith('mix_')]
X = weekly[feature_cols]
y = weekly['cost_growth_pct']
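The lag and growth arithmetic above can be traced on three made-up weekly averages. The sketch also inverts the growth definition, which is how a predicted percentage can be turned back into a forecast cost.

```python
import pandas as pd

# Made-up weekly averages (USD/session).
toy = pd.DataFrame({"avg_cost": [10.0, 11.0, 9.9]})
toy["cost_prev"] = toy["avg_cost"].shift(1)  # previous week's cost
toy["cost_growth_pct"] = (
    (toy["avg_cost"] - toy["cost_prev"]) / toy["cost_prev"] * 100
)

# Inverting the definition turns a growth figure back into a cost.
toy["cost_rebuilt"] = toy["cost_prev"] * (1 + toy["cost_growth_pct"] / 100)
print(toy)
```

The first week has no predecessor, so its lag and growth are NaN; this is why the tutorial drops rows with missing lag values before modelling.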

4. Build a Polynomial Regression Pipeline

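  • StandardScaler: centres each predictor to zero mean and scales it to unit variance before expansion, so the polynomial terms are built from inputs on comparable scales and Ridge's ℓ2 penalty does not favour a predictor merely because of its units.
  • Ridge regression: applies ℓ2 regularisation (strength alpha) to shrink noisy high-order coefficients, limiting overfitting to week-to-week noise.

from sklearn.pipeline import Pipeline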
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])
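The shrinking effect of alpha can be illustrated with the closed-form ridge solution. This is a simplified sketch, not what scikit-learn runs internally: it omits the intercept, and the data and true coefficients are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up standardised design matrix and target.
X = rng.normal(size=(30, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=30)

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alphas = [0.001, 1.0, 1000.0]
norms = [float(np.linalg.norm(ridge_coefficients(X, y, a))) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:g}: coefficient norm = {n:.3f}")
```

The coefficient norm falls as alpha grows. A very small alpha behaves almost like ordinary least squares, while a very large one pushes every coefficient towards zero.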

5. Train/Test Split & Hyperparameter Search

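GridSearchCV: tunes polynomial degree (1–3) and regularisation strength α (10⁻³…10³), optimising for lowest RMSE on held-out growth predictions. Because the rows are ordered in time, cross-validation uses TimeSeriesSplit with 5 splits, so every validation fold lies after the weeks used to fit it; a plain 5-fold split would validate some models on weeks that precede their training data.

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
import numpy as np

# Time-aware split (no shuffle): first 80% of weeks for training, last 20% for testing
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # forward-chaining folds
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)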
gs.fit(X_train, y_train)

print("Best params:", gs.best_params_)
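Since the weekly rows are time-ordered, it helps to see how a forward-chaining splitter assigns rows to folds. The sketch below prints the fold indices that scikit-learn's TimeSeriesSplit produces for eight made-up rows; the row count and number of splits are chosen only to keep the output short.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Eight made-up weekly rows, already in chronological order.
X_toy = np.arange(8).reshape(-1, 1)

folds = list(TimeSeriesSplit(n_splits=3).split(X_toy))
for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train={train_idx.tolist()} validate={val_idx.tolist()}")
```

In every fold the validation rows come strictly after the training rows, which mirrors how the model would be used: fitted on past weeks and asked about a later one.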

6. Evaluate Model

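import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument was deprecated
# in scikit-learn 1.4 and removed in later releases.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE       : {rmse:.2f} percentage points of growth")
print(f"Test R²         : {r2:.3f}")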
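An RMSE figure is easier to judge against a naive baseline. The sketch below, with made-up growth values standing in for y_train and y_test, scores two simple baselines: "no change from last week" and "always predict the training mean". The rmse helper is defined here for the example and is not part of the tutorial's pipeline.

```python
import numpy as np

def rmse(y_true, y_hat):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

# Made-up growth figures (%), standing in for y_train / y_test.
y_train_toy = np.array([2.0, -1.0, 0.5, 1.5])
y_test_toy = np.array([1.0, -1.0, 2.0])

# Baseline 1: assume no change from last week (0% growth).
rmse_zero = rmse(y_test_toy, np.zeros_like(y_test_toy))
# Baseline 2: always predict the mean growth seen in training.
rmse_mean = rmse(y_test_toy, np.full_like(y_test_toy, y_train_toy.mean()))

print(f"Zero-growth baseline RMSE : {rmse_zero:.3f}")
print(f"Train-mean baseline RMSE  : {rmse_mean:.3f}")
```

If the tuned model's test RMSE is not clearly below both baselines, the polynomial terms are adding little predictive value on that dataset.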

7. Inspect Key Polynomial Coefficients

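Coefficient inspection: ranking the coefficients by absolute size shows which nonlinear or interaction terms (such as volume_prev² or cost_prev×mix_Premium) carry the most weight in the predicted cost growth, which can inform pricing strategy. The magnitudes are only roughly comparable: the inputs are standardised before expansion, but the expanded terms themselves are not rescaled, and Ridge shrinkage spreads weight across correlated terms.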

poly       = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=feature_cols)
coefs      = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
import matplotlib.pyplot as plt

coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Cost Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
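Taking absolute values, as the plot above does, hides whether a term pushes growth up or down. A small variation, sketched here on made-up coefficients, orders by magnitude while keeping the sign.

```python
import pandas as pd

# Made-up coefficients keyed by expanded feature name.
coef_toy = pd.Series({
    "cost_prev": -3.0,
    "volume_prev^2": 1.0,
    "cost_prev mix_Premium": 2.0,
})

# Order by absolute size but keep the original signed values.
order = coef_toy.abs().sort_values(ascending=False).index
ranked = coef_toy.reindex(order)
print(ranked)
```

Applied to the fitted model, the same two lines could replace the .abs() call on the coefficient series, so that negative bars in the chart mark terms associated with falling cost.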

Summary

This Polynomial Regression pipeline with Ridge regularisation enables gym managers to:

  • Forecast nonlinear cost-trend dynamics, capturing diminishing returns and tier synergies; the test RMSE and R² from step 6 show how well this works on a given dataset.
  • Control model complexity, avoiding overfitting through hyperparameter tuning of degree and α.
  • Gain interpretable insights into which features (such as squared prior cost or interactions with premium-tier mix) most influence session-cost growth, supporting data-driven dynamic pricing and promotional decisions.

ProjectGurukul Team
