Concert Ticket Sales Growth Prediction with Polynomial Regression in ML


Live-event promoters and venue operators need to forecast year-over-year growth in concert ticket sales (%) to inform budgeting and capacity planning before setting next season's tour dates. Historical data show that annual ticket volumes depend nonlinearly on prior-year sales (momentum or saturation), average ticket price (price elasticity), and total box-office revenue (market demand), with diminishing returns and threshold effects. A simple linear model underfits these curves, while an unconstrained polynomial overfits noise in year-to-year fluctuations. Applying Polynomial Regression to engineered features with Ridge (ℓ2) regularisation lets us model smooth growth dynamics while keeping the fitted coefficients small enough to interpret.

Libraries Required

import pandas as pd                             # data loading & handling  
import numpy as np                              # numerical operations  

import matplotlib.pyplot as plt                 # plotting  
import seaborn as sns                           # visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

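The Annual Ticket Sales dataset, stored as data/AnnualTicketSales.csv, with columns including YEAR, TICKETS SOLD, TOTAL BOX OFFICE, and AVERAGE TICKET PRICE.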

Step-by-Step Code Implementation

Load Data & Initial Inspection

import pandas as pd

# Load the CSV
df = pd.read_csv("data/AnnualTicketSales.csv")

# Preview relevant columns
df.head()[['YEAR','TICKETS SOLD','TOTAL BOX OFFICE','AVERAGE TICKET PRICE']]
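Before moving on, it is worth checking df.dtypes. If the CSV stores numbers as formatted text (thousands separators, a currency sign), the arithmetic in later steps would fail. Whether this file does so is an assumption, not something stated above. A minimal sketch of a cleaning step, using a hypothetical helper to_number and invented values:

```python
import pandas as pd

def to_number(series):
    """Strip '$' and ',' then convert to float; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

# Invented rows that mimic text-formatted numbers (not real data)
raw = pd.DataFrame({
    "YEAR": [2001, 2002],
    "TICKETS SOLD": ["1,200,000", "950,000"],
    "TOTAL BOX OFFICE": ["$11,400,000", "$9,500,000"],
    "AVERAGE TICKET PRICE": ["$9.50", "$10.00"],
})

clean = raw.copy()
for col in ["TICKETS SOLD", "TOTAL BOX OFFICE", "AVERAGE TICKET PRICE"]:
    clean[col] = to_number(clean[col])

print(clean.dtypes)
```

If the columns are already numeric, the helper leaves their values unchanged, so applying it is harmless.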

Feature Engineering & Target Definition

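We define the target, Growth_Pct, as the percentage change in TICKETS SOLD from the previous year, and use three features:

1. Tickets_Prev: prior-year ticket sales, capturing momentum or saturation.

2. TOTAL BOX OFFICE: overall market demand.

3. AVERAGE TICKET PRICE: price elasticity effects.

Note that TOTAL BOX OFFICE and AVERAGE TICKET PRICE are taken from the same year as the target, so they would not be known when forecasting a future season. For a genuine forecast, lagging them by one year is the safer choice.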

# Compute year-over-year growth in tickets sold
df = df.sort_values('YEAR').reset_index(drop=True)
df['Tickets_Prev'] = df['TICKETS SOLD'].shift(1)
df['Growth_Pct']   = (df['TICKETS SOLD'] - df['Tickets_Prev']) / df['Tickets_Prev'] * 100

# Drop the first year (NaN growth)
df = df.dropna(subset=['Growth_Pct'])

# Define features and target
X = df[['Tickets_Prev','TOTAL BOX OFFICE','AVERAGE TICKET PRICE']]
y = df['Growth_Pct']
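One way to keep every predictor strictly in the past is to lag all three columns, not only ticket sales. The sketch below does this on a toy frame. The column names BoxOffice_Prev and Price_Prev and all values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy data with invented values, only to illustrate the lagging step
toy = pd.DataFrame({
    "YEAR": [2015, 2016, 2017, 2018],
    "TICKETS SOLD": [100.0, 110.0, 99.0, 108.9],
    "TOTAL BOX OFFICE": [800.0, 902.0, 831.6, 936.54],
    "AVERAGE TICKET PRICE": [8.0, 8.2, 8.4, 8.6],
}).sort_values("YEAR").reset_index(drop=True)

# Same growth target as in the article, written with pct_change
toy["Growth_Pct"] = toy["TICKETS SOLD"].pct_change() * 100

# Lag every predictor by one year so only past information is used
toy["Tickets_Prev"] = toy["TICKETS SOLD"].shift(1)
toy["BoxOffice_Prev"] = toy["TOTAL BOX OFFICE"].shift(1)
toy["Price_Prev"] = toy["AVERAGE TICKET PRICE"].shift(1)

# The first year has no previous year, so it is dropped
toy = toy.dropna(subset=["Growth_Pct"])
X_lagged = toy[["Tickets_Prev", "BoxOffice_Prev", "Price_Prev"]]
y_toy = toy["Growth_Pct"]
print(X_lagged.assign(Growth_Pct=y_toy))
```

With lagged inputs, each row of X_lagged only contains values that would already be available when the next season is being planned.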

Exploratory Visualization

import seaborn as sns
import matplotlib.pyplot as plt

# Scatter: prior sales vs growth
sns.scatterplot(x='Tickets_Prev', y='Growth_Pct', data=df, alpha=0.7)
plt.title("Prior Year Tickets vs Growth Rate")
plt.xlabel("Tickets Sold (prev year)")
plt.ylabel("Growth (%)")
plt.show()

Build a Polynomial Regression Pipeline

StandardScaler puts the inputs on a common scale before expansion, so the Ridge penalty is not dominated by features measured in large units (ticket counts and revenue versus prices).
PolynomialFeatures expands inputs into polynomial and interaction terms, modelling curvature (e.g., diminishing returns on large prior sales) and synergy (e.g., high price × box-office interactions).
Ridge applies an ℓ2 penalty that shrinks noisy high-order coefficients, which limits overfitting.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),           # normalize scales
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])

Train/Test Split & Hyperparameter Search

The grid search tunes two hyperparameters:
degree (1 = linear … 3 = cubic), to capture the appropriate curvature, and
alpha (10⁻³ … 10³), which controls regularisation strength,
using 5-fold cross-validation on the training years.
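The effect of alpha can be seen directly on synthetic data. The sketch below fits the same cubic pipeline at three alpha values and reports the size of the coefficient vector. The data-generating function and noise level are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic one-feature data: a gentle curve plus noise (invented, not ticket data)
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(25, 1))
y_syn = 1.5 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.5, size=25)

norms = {}
for alpha in [1e-3, 1.0, 1e3]:
    model = Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("ridge", Ridge(alpha=alpha)),
    ])
    model.fit(x, y_syn)
    # Euclidean norm of the fitted coefficients: smaller means stronger shrinkage
    norms[alpha] = float(np.linalg.norm(model.named_steps["ridge"].coef_))
    print(f"alpha={alpha:g}: coefficient norm = {norms[alpha]:.3f}")
```

Larger alpha values pull the coefficients toward zero, trading some flexibility for stability. The grid search picks the point on that trade-off that scores best in cross-validation.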

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
import numpy as np

# Split chronologically to avoid look-ahead bias
train_idx = df['YEAR'] < 2018
X_train, X_test = X[train_idx], X[~train_idx]
y_train, y_test = y[train_idx], y[~train_idx]

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

# Time-ordered folds: each fold validates on years after its training years,
# so cross-validation respects the same chronology as the split above
gs = GridSearchCV(
    pipe, param_grid,
    cv=TimeSeriesSplit(n_splits=5),
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
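Because the observations are yearly, a time-ordered cross-validation scheme keeps each validation fold strictly after its training years. A minimal sketch of how scikit-learn's TimeSeriesSplit carves up a sequence, using twelve hypothetical training years:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

years = np.arange(2000, 2012)  # 12 hypothetical training years, already sorted
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for train_pos, val_pos in tscv.split(years):
    folds.append((years[train_pos], years[val_pos]))
    print("train:", years[train_pos][0], "-", years[train_pos][-1],
          "| validate:", years[val_pos][0], "-", years[val_pos][-1])
```

A splitter like this can be passed to GridSearchCV through its cv argument. The rows must be sorted by year first, because the splitter works on row positions, not on dates.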

Evaluate Model

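import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument was deprecated
# in scikit-learn 1.4 and later removed
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} percentage points of growth")
print(f"Test R²  : {r2:.3f}")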
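An RMSE is easier to judge next to a naive baseline, such as always predicting the mean growth seen in training. The sketch below computes both metrics by hand on invented numbers. The helper names rmse_score and r2_manual are hypothetical.

```python
import numpy as np

y_true = np.array([2.0, -1.0, 3.0])  # invented test-year growth values (%)
y_hat = np.array([1.0, 0.0, 2.0])    # invented model predictions
train_mean = 0.5                      # invented mean growth over training years

def rmse_score(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r2_manual(actual, predicted):
    """R² = 1 - SS_res / SS_tot, relative to the mean of the actual values."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1.0 - ss_res / ss_tot)

model_rmse = rmse_score(y_true, y_hat)
baseline_rmse = rmse_score(y_true, np.full_like(y_true, train_mean))
model_r2 = r2_manual(y_true, y_hat)

print(f"model RMSE   : {model_rmse:.3f}")
print(f"baseline RMSE: {baseline_rmse:.3f}")
print(f"model R²     : {model_r2:.3f}")
```

A model that does not beat the baseline RMSE adds little, whatever its absolute error. With only a handful of test years, both metrics are noisy and should be read with caution.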

Inspect Key Polynomial Coefficients

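Coefficient inspection shows which nonlinear terms, such as (Tickets_Prev)² or Tickets_Prev × AVERAGE TICKET PRICE, most influence predicted growth. Because the inputs are standardised before expansion, the magnitudes are roughly comparable across terms. Polynomial terms are correlated with one another, however, so the ranking is best read as indicative guidance for pricing and marketing strategies, not as a causal statement.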

# Get feature names after polynomial expansion
poly        = gs.best_estimator_.named_steps['poly']
feat_names  = poly.get_feature_names_out(input_features=X.columns)
coefs       = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

# Plot top 10
import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Sales Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

By combining polynomial feature expansion with Ridge regularisation in a concise pipeline, we aim for:

1. Nonlinear forecasts of ticket-sales growth, judged by RMSE and R² on the held-out years.

2. Controlled model complexity, avoiding overfitting to year-to-year noise.

3. Actionable insights: the top polynomial features point to key dynamics, such as momentum thresholds and price × demand interactions, that can guide data-driven pricing and marketing decisions.
