Crop Growth Cost Prediction using Bayesian Regression in ML

FREE Online Courses: Dive into Knowledge for Free. Learn More!

Agricultural planners and farm managers need to predict the market cost per ton of harvested crop—using early‑season indicators such as expected yield (from soil and weather models), fertilizer application rates, irrigation volume, and seasonality—before harvest. Crop prices display nonlinear trends driven by supply projections, input‑cost pass‑through, and seasonal demand. Additionally, price forecasts are subject to uncertainty from weather volatility and market dynamics. By applying Bayesian Regression, we obtain both a point estimate and credible intervals for per‑ton crop cost, enabling risk‑aware budgeting and forward‑contract decisions.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Crop Price Prediction Dataset

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load CSV: columns include 'Week','Crop','Price','Rainfall_mm','AvgTemp_C','AreaPlanted_ha'
df = pd.read_csv("data/crop_price_prediction_dataset.csv", parse_dates=["Week"])
df.head()

Preprocessing & Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Focus on a single crop (e.g., Wheat)
crop = df[df['Crop']=='Wheat'].dropna()

# Define predictors and target
features = ['Rainfall_mm','AvgTemp_C','AreaPlanted_ha']
X = crop[features].values
y = crop['Price'].values  # USD per ton

# Train/test split (80/20 chronological)
train_cut = int(len(crop) * 0.8)
X_train, X_test = X[:train_cut], X[train_cut:]
y_train, y_test = y[:train_cut], y[train_cut:]

# Standardize for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)

Define & Fit Bayesian Regression Model

Priors (α, β, σ): We choose weakly informative normals for intercept and slopes to reflect moderate prior uncertainty, and a half‑normal for observation noise.
Linear predictor: μ = α + β·X_standardized models price as a linear function of standardised environmental and acreage features.
Likelihood: Observed prices are treated as normally distributed around μ with standard deviation σ.
MCMC sampling: We draw 2,000 posterior samples (after 1,000 tuning) using target_accept=0.9 to ensure stable convergence.

import pymc3 as pm

with pm.Model() as model:
    # Priors for intercept and coefficients
    α = pm.Normal("α", mu=0, sigma=10)
    β = pm.Normal("β", mu=0, sigma=5, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=5)
    
    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)
    
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    
    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )

Posterior Analysis & Point Predictions

Posterior predictive: Sampling from the posterior predictive distribution yields both point forecasts and credible intervals.
Point predictions: Posterior means of α and β provide point estimates; we evaluate mean absolute error on held‑out data.

import arviz as az

# Summarize the posterior
az.summary(trace, round_to=2)

# Posterior predictive sampling
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate with MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per ton")

Visualise Predictions & Credible Intervals

Plotting price vs rainfall with the posterior mean and 94% credible bands illustrates both the central tendency and the uncertainty in forecasts.

# Vary rainfall while holding others at median
rain_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.column_stack([
    rain_grid,
    np.full_like(rain_grid, X_train_s[:,1].mean()),
    np.full_like(rain_grid, X_train_s[:,2].mean())
])

with model:
    pm.set_data({"X": grid})
    ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

preds = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)

# Transform rainfall back
rain_orig = scaler.inverse_transform(
    np.column_stack([rain_grid, grid[:,1:],])
)[:,0]

plt.figure(figsize=(8,5))
plt.plot(rain_orig, mean_pred, color="blue", label="Posterior mean")
plt.fill_between(rain_orig, hpd[:,0], hpd[:,1], color="blue", alpha=0.3,
                 label="94% CI")
plt.scatter(crop['Rainfall_mm'].iloc[train_cut:], y_test,
            color="k", alpha=0.5, label="Test data")
plt.xlabel("Weekly Rainfall (mm)")
plt.ylabel("Price (USD/ton)")
plt.title("Bayesian Prediction: Crop Price vs. Rainfall")
plt.legend()
plt.show()

Summary

This Bayesian Regression framework for crop price (cost) prediction delivers:

1. Point estimates of crop price per ton based on early‑season indicators.

2. Credible intervals that quantify forecast uncertainty driven by weather and market variability.

3. Actionable insights: planners can use both the expected price and its uncertainty bounds to set forward contracts, allocate storage, and manage financial risk with greater confidence.

Your 15 seconds will encourage us to work even harder
Please share your happy experience on Google | Facebook