Resort Occupancy Growth Prediction with Polynomial Regression in ML


Resort operators and revenue‑management teams need to forecast week‑over‑week occupancy growth (%) for room inventory from early‑week signals such as prior occupancy, average lead time, booking pace (daily arrivals), promotional status, and seasonal indicators, ahead of mid‑week rate adjustments. These relationships tend to be nonlinear: a small increase in lead time can sharply boost occupancy during off‑peak periods but yields diminishing returns near full capacity, and promotions interact with seasonality. A plain linear model underfits such curved responses, while an unregularised high‑degree polynomial overfits noise. Polynomial Regression with Ridge regularisation sits between the two: it fits smooth growth curves whose coefficients can still be inspected. The walkthrough below uses weekly booking counts as a proxy for occupancy and models previous‑week bookings, average lead time, repeat‑guest share, and month of arrival; promotional status is not modelled in the code.
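As a sketch of what this model fits (the notation is ours, introduced only for illustration), a degree‑2 polynomial regression on p standardised features predicts growth with linear, squared, and pairwise interaction terms. Ridge then picks the coefficients by minimising squared error plus an ℓ² penalty of strength α, with the intercept left unpenalised:

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \sum_{j=1}^{p} \sum_{k=j}^{p} \beta_{jk}\, x_j x_k
\qquad
\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
  + \alpha \Bigl( \sum_{j=1}^{p} \beta_j^2 + \sum_{j \le k} \beta_{jk}^2 \Bigr)
```

Setting α = 0 recovers ordinary least squares on the expanded terms; larger α shrinks every non‑intercept coefficient toward zero.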

Dataset

Hotel Booking Demand

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd  
import numpy as np  

import matplotlib.pyplot as plt  
import seaborn as sns  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.compose import ColumnTransformer  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Import Libraries & Load Data

import pandas as pd  
# Load and filter for resort hotels only  
df = pd.read_csv("data/hotel_bookings.csv")  
df = df[df['hotel'] == 'Resort Hotel']  

3. Feature Engineering & Exploratory Analysis

Data aggregation: group daily bookings to weekly totals for resort hotels; compute prior‑week bookings and growth_pct as our target.

import seaborn as sns  
import matplotlib.pyplot as plt  

# Build an arrival date from the year, month-name, and day columns
df['arrival_date'] = pd.to_datetime(
    df['arrival_date_year'].astype(str) + '-'
    + df['arrival_date_month'] + '-'
    + df['arrival_date_day_of_month'].astype(str),
    format='%Y-%B-%d')
# Key each week by ISO year AND ISO week number; grouping on the week number
# alone would merge the same calendar week from different years
iso = df['arrival_date'].dt.isocalendar()
df['iso_year'] = iso['year']
df['week'] = iso['week']
# Group to weekly booking metrics (booking counts stand in for occupancy)
weekly = (df.groupby(['iso_year', 'week'])
            .agg({
                'is_canceled': 'count',         # total bookings (cancelled ones included)
                'lead_time': 'mean',            # avg lead time
                'arrival_date': 'count',        # proxy arrivals (same count as bookings)
                'arrival_date_month': 'first',  # season
                'is_repeated_guest': 'mean'     # repeat share (0-1)
            })
            .rename(columns={'is_canceled':'bookings'})
            .reset_index())                     # rows are sorted by (iso_year, week)
# Compute growth relative to the prior week
weekly['bookings_prev'] = weekly['bookings'].shift(1)
weekly['growth_pct'] = ((weekly['bookings'] - weekly['bookings_prev'])
                        / weekly['bookings_prev']) * 100
weekly.dropna(subset=['growth_pct'], inplace=True)

# Visualize nonlinear trend: lead time vs growth  
sns.scatterplot(x='lead_time', y='growth_pct', data=weekly, alpha=0.6)  
plt.title("Lead Time vs Occupancy Growth")  
plt.xlabel("Average Lead Time (days)")  
plt.ylabel("Growth (%)")  
plt.show()  
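To make the growth_pct target concrete, here is a minimal worked example on made‑up weekly counts (the numbers are hypothetical, not drawn from the dataset). It repeats the shift‑based calculation used above and checks it against pandas' built‑in pct_change.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly booking counts, for illustration only
toy = pd.DataFrame({"bookings": [200, 250, 225, 270]})

# Same calculation as in the tutorial: change relative to the previous week, in percent
toy["bookings_prev"] = toy["bookings"].shift(1)
toy["growth_pct"] = (toy["bookings"] - toy["bookings_prev"]) / toy["bookings_prev"] * 100

# pct_change() computes the same ratio in a single call
toy["growth_pct_alt"] = toy["bookings"].pct_change() * 100

print(toy.round(1))
```

The first week has no predecessor, so its growth is NaN and is dropped; the remaining weeks come out at 25%, -10%, and 20%.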

4. Define Features & Target

Feature matrix: includes bookings_prev, lead_time, is_repeated_guest, plus one‑hot seasonal dummies for months.

# One‑hot encode month  
weekly = pd.get_dummies(weekly, columns=['arrival_date_month'], drop_first=True)  

feature_cols = (  
    ['bookings_prev','lead_time','is_repeated_guest'] +  
    [c for c in weekly.columns if c.startswith('arrival_date_month_')]  
)  
X = weekly[feature_cols]  
y = weekly['growth_pct']  
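The effect of drop_first=True is easiest to see on a tiny frame. The sketch below uses hypothetical rows; the column name matches the tutorial, the values do not come from the dataset.

```python
import pandas as pd

# Hypothetical weekly rows, for illustration only
toy = pd.DataFrame({
    "lead_time": [40.0, 55.0, 62.0, 30.0],
    "arrival_date_month": ["July", "August", "July", "September"],
})

# Categories are ordered alphabetically and the first one (August) is dropped,
# so it becomes the baseline that the remaining dummies are measured against
encoded = pd.get_dummies(toy, columns=["arrival_date_month"], drop_first=True)

print(encoded.columns.tolist())
# ['lead_time', 'arrival_date_month_July', 'arrival_date_month_September']
```

On the real data the baseline is whichever month sorts first alphabetically among those present, so month coefficients should be read relative to that month.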

5. Build Polynomial Regression Pipeline

  • StandardScaler z‑scores the inputs before expansion, so features on very different scales (booking counts vs. a 0–1 repeat share) produce polynomial terms of comparable magnitude under the ℓ² penalty. The expanded terms themselves are not re‑standardised.
  • PolynomialFeatures generates squares and interactions (e.g. lead_time², bookings_prev×is_repeated_guest) to model saturation and synergy effects.
  • Ridge regression (ℓ²) shrinks noisy high‑order coefficients, preventing overfitting in the expanded feature space.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  

pipe = Pipeline([  
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])  

6. Train/Test Split & Hyperparameter Search

from sklearn.model_selection import train_test_split, GridSearchCV  
import numpy as np  

X_train, X_test, y_train, y_test = train_test_split(  
    X, y, test_size=0.2, random_state=42  
)  

param_grid = {  
    'poly__degree': [1, 2, 3],  
    'ridge__alpha': np.logspace(-3, 3, 7)  
}  

gs = GridSearchCV(  
    pipe, param_grid,  
    cv=5,  
    scoring='neg_root_mean_squared_error',  
    n_jobs=-1, verbose=1  
)  
gs.fit(X_train, y_train)  
print("Best params:", gs.best_params_)  
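Beyond best_params_, it can help to look at how close the other grid points were. The sketch below runs the same kind of search on synthetic data (generated here purely for illustration, with a known quadratic relationship) and lists the cross‑validated RMSE per combination from cv_results_. All names ending in _demo are ours.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a quadratic effect in the first feature (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 2))
y_demo = 1.5 * X_demo[:, 0] ** 2 - X_demo[:, 1] + rng.normal(scale=0.1, size=60)

pipe_demo = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
grid_demo = {"poly__degree": [1, 2, 3], "ridge__alpha": np.logspace(-3, 3, 7)}

gs_demo = GridSearchCV(pipe_demo, grid_demo, cv=5, scoring="neg_root_mean_squared_error")
gs_demo.fit(X_demo, y_demo)

# Scores are negated RMSE, so flip the sign to read them as errors
results = pd.DataFrame(gs_demo.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]
summary = results[["param_poly__degree", "param_ridge__alpha", "cv_rmse"]].sort_values("cv_rmse")
print(summary.head())
```

Applied to the tutorial's gs object, the same three lines after the fit show whether the chosen degree and α win clearly or only by a small margin.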

7. Evaluate Model

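The grid search in step 6 tunes polynomial degree (1–3) and regularisation strength α (10⁻³ to 10³) via 5‑fold CV, selecting the combination with the lowest cross‑validated RMSE. GridSearchCV then refits that best pipeline on the full training set, so gs.predict can score it directly on the held‑out test rows.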

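y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument was deprecated
# in scikit-learn 1.4 and later removed
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} percentage points of growth")
print(f"Test R²  : {r2:.3f}")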
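A test RMSE is easier to judge next to a naive reference. One possible baseline, sketched here as a hypothetical helper (baseline_rmse is our name, not part of the tutorial), always predicts the mean growth seen in training, using scikit-learn's DummyRegressor.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

def baseline_rmse(y_train, y_test):
    """RMSE obtained by always predicting the mean of the training target."""
    dummy = DummyRegressor(strategy="mean")
    # DummyRegressor ignores the features, so placeholder columns are enough
    dummy.fit(np.zeros((len(y_train), 1)), y_train)
    pred = dummy.predict(np.zeros((len(y_test), 1)))
    return float(np.sqrt(mean_squared_error(y_test, pred)))

# Hypothetical growth values: training mean is 3.0, test errors are 0 and 6
demo_value = baseline_rmse([10.0, -5.0, 4.0], [3.0, 9.0])
print(round(demo_value, 4))
```

With the tutorial's variables this would be called as baseline_rmse(y_train, y_test); the polynomial model is only adding value if its test RMSE is clearly below that figure.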

8. Inspect Key Polynomial Coefficients

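Coefficient inspection: ranks the polynomial terms with the largest absolute Ridge coefficients, pointing revenue managers to the lever combinations (e.g. lead time squared or past bookings × repeat‑rate) most strongly associated with growth. Because the inputs were standardised before expansion, magnitudes are roughly comparable across terms, but they describe association within this fitted model rather than causal impact.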

poly = gs.best_estimator_.named_steps['poly']  
feat_names = poly.get_feature_names_out(input_features=feature_cols)  
coefs = gs.best_estimator_.named_steps['ridge'].coef_  

import pandas as pd  
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)  

plt.figure(figsize=(8,5))  
imp.plot(kind='barh')  
plt.gca().invert_yaxis()  
plt.title("Top Polynomial Features Driving Occupancy Growth")  
plt.xlabel("Coefficient Magnitude")  
plt.tight_layout()  
plt.show()  
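The bar chart above uses absolute values, which hides whether a term pushes growth up or down. A small variation, sketched here with a hypothetical helper and made‑up coefficients, ranks by magnitude but keeps the sign.

```python
import pandas as pd

def top_signed_coefs(names, coefs, k=10):
    """Return the k coefficients with the largest magnitude, signs preserved."""
    series = pd.Series(coefs, index=names)
    order = series.abs().sort_values(ascending=False).index[:k]
    return series.loc[order]

# Made-up coefficients, for illustration only
demo_names = ["lead_time", "lead_time^2", "bookings_prev"]
demo_coefs = [0.5, -2.0, 1.2]
top_demo = top_signed_coefs(demo_names, demo_coefs, k=2)
print(top_demo)
```

With the tutorial's objects the call would be top_signed_coefs(feat_names, coefs); a negative coefficient on a squared term such as lead_time^2 is the kind of pattern that would be consistent with diminishing returns.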

Summary

This Polynomial Regression approach with Ridge regularisation:

  • Models nonlinear occupancy growth, capturing curvature in lead time and seasonal effects through squared and interaction terms; how accurate it is should be judged from the test RMSE and R² above.
  • Controls complexity through α tuning, avoiding spurious high‑order effects.
  • Yields interpretable insights: the top polynomial features point to candidate levers, such as lead‑time thresholds and repeat‑guest interactions, that can inform dynamic pricing and marketing decisions.


ProjectGurukul Team
