Crop Growth Cost Prediction using Bayesian Regression in ML
We offer you a brighter future with FREE online courses - Start Now!!
Agricultural planners and farm managers need to predict the market cost per ton of harvested crop—using early‑season indicators such as expected yield (from soil and weather models), fertilizer application rates, irrigation volume, and seasonality—before harvest. Crop prices display nonlinear trends driven by supply projections, input‑cost pass‑through, and seasonal demand. Additionally, price forecasts are subject to uncertainty from weather volatility and market dynamics. By applying Bayesian Regression, we obtain both a point estimate and credible intervals for per‑ton crop cost, enabling risk‑aware budgeting and forward‑contract decisions.
Libraries Required
import pandas as pd # data loading & manipulation import numpy as np # numerical operations import matplotlib.pyplot as plt # plotting import seaborn as sns # visualization import pymc3 as pm # Bayesian modeling import arviz as az # posterior analysis from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from sklearn.metrics import mean_absolute_error
Dataset
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load CSV: columns include 'Week','Crop','Price','Rainfall_mm','AvgTemp_C','AreaPlanted_ha'
df = pd.read_csv("data/crop_price_prediction_dataset.csv", parse_dates=["Week"])
df.head()
Preprocessing & Train/Test Split
from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler # Focus on a single crop (e.g., Wheat) crop = df[df['Crop']=='Wheat'].dropna() # Define predictors and target features = ['Rainfall_mm','AvgTemp_C','AreaPlanted_ha'] X = crop[features].values y = crop['Price'].values # USD per ton # Train/test split (80/20 chronological) train_cut = int(len(crop) * 0.8) X_train, X_test = X[:train_cut], X[train_cut:] y_train, y_test = y[:train_cut], y[train_cut:] # Standardize for stable MCMC scaler = StandardScaler().fit(X_train) X_train_s = scaler.transform(X_train) X_test_s = scaler.transform(X_test)
Define & Fit Bayesian Regression Model
- Priors (α, β, σ): We choose weakly informative normals for intercept and slopes to reflect moderate prior uncertainty, and a half‑normal for observation noise.
- Linear predictor: μ = α + β·X_standardized models price as a linear function of standardised environmental and acreage features.
- Likelihood: Observed prices are treated as normally distributed around μ with standard deviation σ.
- MCMC sampling: We draw 2,000 posterior samples (after 1,000 tuning) using target_accept=0.9 to ensure stable convergence.
import pymc3 as pm
with pm.Model() as model:
# Priors for intercept and coefficients
α = pm.Normal("α", mu=0, sigma=10)
β = pm.Normal("β", mu=0, sigma=5, shape=X_train_s.shape[1])
σ = pm.HalfNormal("σ", sigma=5)
# Linear predictor
μ = α + pm.math.dot(X_train_s, β)
# Likelihood
Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
# Sample posterior
trace = pm.sample(
draws=2000, tune=1000,
target_accept=0.9,
return_inferencedata=True
)
Posterior Analysis & Point Predictions
- Posterior predictive: Sampling from the posterior predictive distribution yields both point forecasts and credible intervals.
- Point predictions: Posterior means of α and β provide point estimates; we evaluate mean absolute error on held‑out data.
import arviz as az
# Summarize the posterior
az.summary(trace, round_to=2)
# Posterior predictive sampling
with model:
ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate with MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per ton")
Visualise Predictions & Credible Intervals
Plotting price vs rainfall with the posterior mean and 94% credible bands illustrates both the central tendency and the uncertainty in forecasts.
# Vary rainfall while holding others at median
rain_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.column_stack([
rain_grid,
np.full_like(rain_grid, X_train_s[:,1].mean()),
np.full_like(rain_grid, X_train_s[:,2].mean())
])
with model:
pm.set_data({"X": grid})
ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
preds = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)
# Transform rainfall back
rain_orig = scaler.inverse_transform(
np.column_stack([rain_grid, grid[:,1:],])
)[:,0]
plt.figure(figsize=(8,5))
plt.plot(rain_orig, mean_pred, color="blue", label="Posterior mean")
plt.fill_between(rain_orig, hpd[:,0], hpd[:,1], color="blue", alpha=0.3,
label="94% CI")
plt.scatter(crop['Rainfall_mm'].iloc[train_cut:], y_test,
color="k", alpha=0.5, label="Test data")
plt.xlabel("Weekly Rainfall (mm)")
plt.ylabel("Price (USD/ton)")
plt.title("Bayesian Prediction: Crop Price vs. Rainfall")
plt.legend()
plt.show()
Summary
This Bayesian Regression framework for crop price (cost) prediction delivers:
1. Point estimates of crop price per ton based on early‑season indicators.
2. Credible intervals that quantify forecast uncertainty driven by weather and market variability.
3. Actionable insights: planners can use both the expected price and its uncertainty bounds to set forward contracts, allocate storage, and manage financial risk with greater confidence.