Retail Price Impact Prediction using Bayesian Regression in ML
Before implementing markdowns or list-price adjustments, retail merchandisers and pricing analysts need to quantify how sensitive unit sales are to changes in retail price, using features such as current price per unit, a promotion flag, a store traffic index, and competitor price. Demand curves often exhibit nonlinear elasticity (e.g., steep drop-offs beyond certain price thresholds), and forecasts carry uncertainty from unobserved shopper behaviour. Bayesian Regression gives both a point estimate of the price impact coefficient and a credible interval that captures the uncertainty around it, so pricing decisions can be made with explicit risk bounds.
Libraries Required
import pandas as pd                # data loading & handling
import numpy as np                 # numerical operations
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # enhanced visualization
import pymc3 as pm                 # Bayesian modeling
import arviz as az                 # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load supermarket sales data
df = pd.read_csv("data/Supermarket Sales.csv")
# Rename the columns used in this tutorial to snake_case
df = df.rename(columns={
    'Unit price': 'unit_price',
    'Quantity': 'quantity_sold',
    'Total': 'total_revenue',
    'Customer type': 'cust_type',
    'City': 'store'
})
# Preview relevant columns
df[['unit_price','quantity_sold','total_revenue','cust_type','store','Date']].head()
Feature Engineering & Train/Test Split
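We model log(quantity) vs log(price) so that β₀ can be read as the price elasticity (the percentage change in units sold per one percent change in price). Because the code uses log(quantity + 1) to guard against zero counts, β₀ approximates the elasticity rather than matching it exactly.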
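To see why the log-log slope is the elasticity, consider a constant-elasticity demand curve q = A·p^ε: taking logs gives log q = log A + ε·log p. The following is a minimal sketch on noise-free synthetic points, not on the supermarket data. The values of A and ε and the helper `ols_slope` are assumptions made for illustration. It recovers ε by ordinary least squares and shows that the log(q + 1) variant yields a slope of smaller magnitude.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y on x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Assumed constant-elasticity demand curve: q = A * p**eps
A, eps = 500.0, -1.5
prices = [10.0 + 5.0 * i for i in range(19)]   # 10, 15, ..., 100
qty = [A * p ** eps for p in prices]

log_p = [math.log(p) for p in prices]
slope_exact = ols_slope(log_p, [math.log(q) for q in qty])
slope_plus1 = ols_slope(log_p, [math.log(q + 1) for q in qty])

print(f"slope of log(q) on log(p):     {slope_exact:.3f}")
print(f"slope of log(q + 1) on log(p): {slope_plus1:.3f}")
```

The gap between the two slopes grows as quantities get small, which is worth keeping in mind when interpreting β₀ for low-volume items.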
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Compute log-sales and log-price for elasticity
df['log_qty'] = np.log(df['quantity_sold'] + 1)
df['log_price'] = np.log(df['unit_price'])
# Promotion flag proxy: unit_price below the overall median price (not a per-item median)
median_price = df['unit_price'].median()
df['promo'] = (df['unit_price'] < median_price).astype(int)
# Traffic index proxy: day‐of‐week average footfall (here: weekday vs. weekend)
df['weekday'] = pd.to_datetime(df['Date']).dt.weekday
df['weekend_flag'] = (df['weekday'] >= 5).astype(int)
# Select predictors & target
features = ['log_price','promo','weekend_flag']
X = df[features].values
y = df['log_qty'].values # log‐quantity = intercept + β·log_price + ...
# Train/test split (80/20 random)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Standardize non‐log features for sampling stability
scaler = StandardScaler().fit(X_train[:,1:])
X_train_s = X_train.copy()
X_train_s[:,1:] = scaler.transform(X_train[:,1:])
X_test_s = X_test.copy()
X_test_s[:,1:] = scaler.transform(X_test[:,1:])
Define & Fit Bayesian Regression Model
Priors (α, β, σ): α and β get weakly informative Normal(0,1) priors, expressing initial uncertainty but centering around zero; σ gets a HalfNormal(1) prior because the noise scale must be positive.
Regression model:
- μ = α + β₀·log_price + β₁·promo + β₂·weekend_flag.
- Observations: log_qty ∼ Normal(μ, σ).
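MCMC sampling: each chain keeps 2,000 posterior draws after 1,000 tuning steps, with target_accept=0.9 to reduce the risk of divergences. These settings do not guarantee convergence on their own, so the diagnostics in the posterior summary (such as r_hat) still need to be checked.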
import pymc3 as pm
with pm.Model() as price_elasticity_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1)  # intercept for log-sales
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    # β[0]: price elasticity; β[1]: promo; β[2]: weekend
    σ = pm.HalfNormal("σ", sigma=1)  # residual noise
    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
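Since sampler settings cannot by themselves confirm convergence, it helps to know what a diagnostic such as R-hat measures. The sketch below implements the classic Gelman-Rubin statistic in plain Python on synthetic chains. It is a simplified illustration: the helper `gelman_rubin` and the chain values are assumptions, and ArviZ's own `r_hat` uses a rank-normalized split variant by default, so its numbers will differ slightly.

```python
import random
import statistics

def gelman_rubin(chains):
    """Classic (non-split) R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the variance of the chain means
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5

rng = random.Random(0)
# Four synthetic chains exploring the same distribution
mixed = [[rng.gauss(-1.2, 0.1) for _ in range(1000)] for _ in range(4)]
# Four synthetic chains stuck around two different values
stuck = [[rng.gauss(m, 0.1) for _ in range(1000)]
         for m in (-1.2, -0.8, -1.2, -0.8)]

print(f"R-hat (well mixed): {gelman_rubin(mixed):.3f}")
print(f"R-hat (stuck):      {gelman_rubin(stuck):.3f}")
```

Values close to 1 indicate that the chains agree with each other; values well above 1 suggest the chains have not mixed and the posterior summaries should not be trusted yet.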
Posterior Analysis & Point Estimates
- Posterior analysis: We extract the posterior mean and the 94% credible interval (computed as a highest-density interval, HDI) for β₀, our primary metric of price sensitivity.
- Prediction: Posterior predictive sampling yields in‐sample MAE for log‐quantity; the same pipeline can forecast on held‐out data.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior (means, HDIs, r_hat, effective sample sizes)
print(az.summary(trace, round_to=2))
# Select β[0] by position: the dimension name is derived from the
# variable name, so it is looked up here rather than hard-coded
beta_post = trace.posterior['β']
elasticity_post = beta_post.isel({beta_post.dims[-1]: 0})
# Posterior mean and 94% HDI of the price elasticity
elasticity_mean = elasticity_post.mean().item()
elasticity_hpd = az.hdi(elasticity_post, hdi_prob=0.94)['β']
hdi_low = elasticity_hpd.sel(hdi='lower').item()
hdi_high = elasticity_hpd.sel(hdi='higher').item()
print(f"Estimated Price Elasticity (β₀): {elasticity_mean:.2f}")
print(f"94% Credible Interval: [{hdi_low:.2f}, {hdi_high:.2f}]")
# Posterior predictive & MAE
with price_elasticity_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
y_pred = ppc['Y_obs'].mean(axis=0)
print("Train MAE (log-qty):", mean_absolute_error(y_train, y_pred))
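The same idea extends to held-out data: for each posterior draw, compute α + β·x for a test row, then average over draws. The sketch below shows that arithmetic in plain Python so it runs without the fitted model. All numbers are hypothetical stand-ins for the posterior draws and test rows, and `predict_log_qty` is an illustrative helper, not a PyMC3 function.

```python
import math

def predict_log_qty(alpha_draws, beta_draws, x_row):
    """Posterior-mean prediction of log(qty + 1) for one feature row."""
    preds = [
        a + sum(b_j * x_j for b_j, x_j in zip(b, x_row))
        for a, b in zip(alpha_draws, beta_draws)
    ]
    return sum(preds) / len(preds)

# Hypothetical posterior draws (stand-ins for the sampled α and β)
alpha_draws = [4.9, 5.0, 5.1]
beta_draws = [[-1.1, 0.20, 0.05], [-1.2, 0.25, 0.00], [-1.3, 0.30, -0.05]]

# Hypothetical held-out rows: [log_price, promo (scaled), weekend_flag (scaled)]
X_test_rows = [[3.0, 1.0, -0.6], [3.8, -1.0, 1.6]]
y_test_log = [1.6, 0.3]

pred_log = [predict_log_qty(alpha_draws, beta_draws, row) for row in X_test_rows]
mae_log = sum(abs(t - p) for t, p in zip(y_test_log, pred_log)) / len(y_test_log)

# Undo the log(qty + 1) transform to express predictions in units
pred_units = [math.expm1(p) for p in pred_log]

print("Test MAE (log-qty):", round(mae_log, 3))
print("Predicted units:", [round(u, 2) for u in pred_units])
```

Note that applying `expm1` to the averaged log prediction is a simple back-transform; it is not the posterior mean of the unit quantity, which would require transforming each draw before averaging.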
Visualise Price Elasticity Posterior
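The kernel density plot of β₀'s posterior shows both its central tendency and its spread: a narrow density means the elasticity is well determined by the data, while a wide one signals that pricing decisions based on it carry more risk.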
import arviz as az
import seaborn as sns
import matplotlib.pyplot as plt
# Flatten chains and draws into one array of β₀ (price elasticity) samples
elasticity_samples = trace.posterior['β'].values[..., 0].ravel()
# Mean and 94% HDI computed from the flattened samples
elasticity_mean = elasticity_samples.mean()
hdi_low, hdi_high = az.hdi(elasticity_samples, hdi_prob=0.94)
# Plot posterior distribution of β₀
sns.kdeplot(elasticity_samples, fill=True)
plt.axvline(elasticity_mean, color='k', linestyle='--', label='Mean')
plt.axvline(hdi_low, color='red', linestyle=':', label='94% CI')
plt.axvline(hdi_high, color='red', linestyle=':')
plt.title("Posterior of Price Elasticity (β₀)")
plt.xlabel("Elasticity")
plt.legend()
plt.tight_layout()
plt.show()
Summary
Using Bayesian Regression to model retail price impact provides:
1. Direct elasticity estimate (β₀) with a credible interval, quantifying how sensitive demand is to price changes.
2. Uncertainty quantification in both elasticity and demand forecasts—crucial for risk‑aware markdown strategies.
3. Actionable insights: Merchandisers can set prices knowing both expected sales lift/drop and the confidence bounds, optimising revenue under uncertainty.
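As an illustration of the third point, elasticity draws can be turned into an expected sales change for a proposed price move. Under a log-log model, a relative price change Δ implies a relative change in units of (1 + Δ)^β₀ − 1. The sketch below is a hedged example: the draws are simulated rather than taken from the fitted model, the 10% markdown is an assumed scenario, and the interval is a simple equal-tailed percentile interval rather than an HDI.

```python
import random

def pct_change_in_units(elasticity, price_change):
    """Relative change in units for a relative price change (log-log model)."""
    return (1.0 + price_change) ** elasticity - 1.0

def percentile(sorted_vals, q):
    """Nearest-rank style percentile on a pre-sorted list (hypothetical helper)."""
    idx = min(len(sorted_vals) - 1, max(0, round(q * (len(sorted_vals) - 1))))
    return sorted_vals[idx]

rng = random.Random(42)
# Simulated elasticity draws standing in for the posterior samples of β₀
elasticity_draws = [rng.gauss(-1.5, 0.2) for _ in range(4000)]

markdown = -0.10  # assumed scenario: 10% price cut
lifts = sorted(pct_change_in_units(e, markdown) for e in elasticity_draws)

mean_lift = sum(lifts) / len(lifts)
low, high = percentile(lifts, 0.03), percentile(lifts, 0.97)

print(f"Expected unit lift: {mean_lift:.1%}")
print(f"94% equal-tailed interval: [{low:.1%}, {high:.1%}]")
```

Transforming every draw before summarising keeps the uncertainty in β₀ attached to the sales forecast, instead of plugging in only the posterior mean.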