Soil Nutrient Cost Prediction using Bayesian Regression in ML


Agronomists and farm managers need to estimate the cost of soil nutrient amendments before ordering fertilisers, using early-season soil tests as predictors. Key inputs include soil nitrogen (N), phosphorus (P), potassium (K), pH, and organic matter content (OM). The amendment cost per hectare is a nonlinear function of these soil metrics (e.g., deeper deficiencies require bulk applications, pH corrections have threshold dosing) and fluctuates with fertiliser market prices. A model that returns only a point estimate says nothing about forecast uncertainty, which risks under- or over-budgeting. By applying Bayesian Regression, we derive both:

1. A point estimate of the amendment cost per hectare.

2. A credible interval that quantifies uncertainty, enabling risk-aware budget planning and procurement.

Libraries Required

import pandas as pd                              # data I/O & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Crop Recommender Dataset with Soil Nutrients

Step-by-Step Code Implementation

Data Loading & Synthetic Cost Computation

We translate nutrient deficiency (difference from agronomic targets) into required fertiliser rates and multiply them by unit prices to define our target amend_cost.

import numpy as np   # needed for np.clip below
import pandas as pd

# Load soil‐nutrient data
df = pd.read_csv("data/soil_nutrients.csv")

# Assume we have per‐kg prices for N, P, K fertilizers
# (in reality, merge in real price series)
price_N = 1.20  # USD per kg of N
price_P = 0.80  # USD per kg of P2O5
price_K = 0.60  # USD per kg of K2O

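# Compute required amendment (kg/ha) to raise to target levels:
# e.g., target N=25 mg/kg, P=15, K=20
# The factor 10 is a rough, assumed mg/kg → kg/ha conversion; adjust it
# for your sampling depth and soil bulk density
df['req_N'] = np.clip(25 - df['N'], 0, None) * 10   # zero when N already meets the target
df['req_P'] = np.clip(15 - df['P'], 0, None) * 10
df['req_K'] = np.clip(20 - df['K'], 0, None) * 10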

# Synthetic per‐ha cost
df['amend_cost'] = df['req_N'] * price_N \
                 + df['req_P'] * price_P \
                 + df['req_K'] * price_K

# Features & target
features = ['N','P','K','pH','OM','Rainfall','req_N','req_P','req_K']
X = df[features].values
y = df['amend_cost'].values
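To make the cost formula concrete, here is a small worked example for a single field. The soil-test values are hypothetical and chosen only for illustration; the targets, the factor of 10, and the prices are the same assumed values used in the snippet above.

```python
# Hypothetical soil test for one field (mg/kg); illustration only
soil = {"N": 18.0, "P": 15.0, "K": 12.0}

# Targets, conversion factor, and prices copied from the snippet above
targets = {"N": 25.0, "P": 15.0, "K": 20.0}
prices = {"N": 1.20, "P": 0.80, "K": 0.60}   # USD per kg
MG_TO_KG_HA = 10                              # same rough factor as above

# The deficit is clipped at zero: a surplus never yields a negative rate
req = {k: max(targets[k] - soil[k], 0.0) * MG_TO_KG_HA for k in soil}

# Cost is the sum of rate times unit price over the three nutrients
cost = sum(req[k] * prices[k] for k in soil)

print(req)                    # N: 70 kg/ha, P: 0 kg/ha, K: 80 kg/ha
print(f"{cost:.2f} USD/ha")   # 70*1.20 + 0*0.80 + 80*0.60 = 132.00
```

Because P already meets its target, it contributes nothing to the cost; only the N and K deficits are charged.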

Preprocessing & Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random 80/20 split (use a chronological split instead if samples are time-ordered)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize predictors
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
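The scaler is fitted on the training rows only, so the test set never influences the scaling statistics. The sketch below reproduces the same arithmetic with plain NumPy on made-up data (the `_demo` arrays are hypothetical stand-ins, not the soil dataset) to show what that implies.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=20.0, scale=5.0, size=(80, 2))
X_test_demo = rng.normal(loc=20.0, scale=5.0, size=(20, 2))

# Statistics come from the training rows only
mu = X_train_demo.mean(axis=0)
sd = X_train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

train_s = (X_train_demo - mu) / sd
test_s = (X_test_demo - mu) / sd   # reuses the training mean and std

# Training columns are exactly centred with unit spread ...
print(train_s.mean(axis=0).round(6), train_s.std(axis=0).round(6))
# ... while test columns are only approximately so, which is expected
print(test_s.mean(axis=0).round(3), test_s.std(axis=0).round(3))
```

With standardized predictors, each regression coefficient reads as the change in cost (USD/ha) per one standard deviation of that feature, which also makes a single prior scale for all coefficients easier to justify.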

Define & Fit Bayesian Regression Model

Priors:

α ∼ Normal(0, 100) for intercept (the second argument is a standard deviation, in USD/ha).
β ∼ Normal(0, 50) for each standardized feature.
σ ∼ HalfNormal(50) for residual noise.

Model: We assume amend_cost ∼ Normal(α + β·X, σ).
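Before fitting, it can help to check what these priors imply for the cost itself. The following is a minimal prior-predictive sketch in plain NumPy, not PyMC3 code; `X_demo` is a hypothetical stand-in for the standardized training matrix, and nine features are assumed to match the feature list used earlier.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_obs, n_features = 1000, 50, 9

# Stand-in for standardized predictors (assumption: roughly unit-normal columns)
X_demo = rng.standard_normal((n_obs, n_features))

# Draw parameters from the priors listed above
alpha = rng.normal(0.0, 100.0, size=n_draws)
beta = rng.normal(0.0, 50.0, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, 50.0, size=n_draws))   # HalfNormal(50)

# Simulate costs: one row per prior draw, one column per observation
mu = alpha[:, None] + beta @ X_demo.T
y_prior = rng.normal(mu, sigma[:, None])

lo, hi = np.percentile(y_prior, [3, 97])
print(f"94% of prior-predictive costs fall in [{lo:.0f}, {hi:.0f}] USD/ha")
```

The simulated range spans negative values, because a Normal likelihood with zero-centred priors does not know that costs cannot be below zero. For a quick tutorial that is usually acceptable, but it is worth keeping in mind when reading intervals for fields with little or no deficiency.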

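Inference: We draw 2,000 posterior samples per chain (after 1,000 tuning steps) with target_accept=0.9, which makes the NUTS sampler take smaller steps and tends to reduce divergences. Convergence still has to be checked afterwards, for example via the r_hat column of the posterior summary.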

import pymc3 as pm

with pm.Model() as soil_cost_model:
    # Weakly informative priors
    α = pm.Normal("α", mu=0, sigma=100)
    β = pm.Normal("β", mu=0, sigma=50, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=50)

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
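Sampling normally runs more than one chain, and a common convergence check compares the spread between chains with the spread within them. The sketch below implements the classic Gelman-Rubin statistic in NumPy on synthetic chains; it is a simplified illustration, not the rank-normalized split version that ArviZ reports as r_hat, and the `good` and `bad` arrays are made up.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * W + B / n           # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(2)

# Four chains sampling the same distribution: R-hat should be close to 1
good = rng.normal(0.0, 1.0, size=(4, 2000))

# Two of the four chains stuck around a different value: R-hat well above 1
bad = good + np.array([0.0, 0.0, 3.0, 3.0])[:, None]

print(f"well-mixed chains: {gelman_rubin(good):.3f}")
print(f"disagreeing chains: {gelman_rubin(bad):.3f}")
```

Values near 1 suggest the chains agree; values clearly above 1 mean they are exploring different regions and the posterior summaries should not be trusted yet.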

Posterior Analysis & Point Predictions

Posterior predictive: Sampling yields a full predictive distribution for each observation, so every forecast can carry a credible interval.

Evaluation: Mean Absolute Error quantifies the average per‑ha cost error on held‑out soil samples.

import arviz as az
from sklearn.metrics import mean_absolute_error

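# Posterior summary (means, HDIs, r_hat, effective sample sizes)
print(az.summary(trace, round_to=2))

# Posterior predictive sampling
# (these draws are for the training rows the model was built on)
with soil_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])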

# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point forecasts
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per hectare")
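The MAE only scores the point forecasts. To also check the intervals on held-out rows, one option is to push the posterior draws through the linear predictor and count how often the true cost lands inside the interval. The sketch below does this in NumPy with synthetic draws; `predictive_interval` is a hypothetical helper, the `_demo` data are made up, and the interval is equal-tailed rather than an HDI.

```python
import numpy as np

def predictive_interval(alpha_draws, beta_draws, sigma_draws, X, prob=0.94, seed=0):
    """Equal-tailed predictive interval for each row of X from posterior draws."""
    rng = np.random.default_rng(seed)
    mu = alpha_draws[:, None] + beta_draws @ X.T      # (n_draws, n_obs)
    y_sim = rng.normal(mu, sigma_draws[:, None])      # add residual noise
    tail = (1.0 - prob) / 2.0 * 100.0
    lower, upper = np.percentile(y_sim, [tail, 100.0 - tail], axis=0)
    return lower, upper

# Synthetic held-out data from a known linear model
rng = np.random.default_rng(3)
X_demo = rng.standard_normal((200, 2))
true_beta = np.array([30.0, -10.0])
y_demo = 100.0 + X_demo @ true_beta + rng.normal(0.0, 5.0, size=200)

# Synthetic "posterior draws" scattered tightly around the true parameters
alpha_draws = rng.normal(100.0, 0.5, size=1000)
beta_draws = true_beta + rng.normal(0.0, 0.5, size=(1000, 2))
sigma_draws = np.abs(rng.normal(5.0, 0.3, size=1000))

lower, upper = predictive_interval(alpha_draws, beta_draws, sigma_draws, X_demo)
coverage = np.mean((y_demo >= lower) & (y_demo <= upper))
print(f"Empirical coverage of the 94% interval: {coverage:.2f}")
```

With the fitted model, the draws would come from the trace instead, flattened over chains and draws, and X would be the standardized test matrix. Coverage far below the nominal 94% would suggest the intervals are too narrow for budgeting purposes.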

Visualise Predictions & Credible Intervals

By varying one soil metric (e.g. N) and holding the others fixed at their medians, we plot the posterior mean cost curve and its 94% Highest Posterior Density interval, showing both how the expected cost scales and how uncertain it is.

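import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Sweep soil N (column 0) while holding the other features at their median
# (req_N is held fixed too, although in the data it depends on N)
N_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,0] = N_grid

# The model was built on X_train_s, so sample_posterior_predictive would
# return draws for the training rows, not for the grid. Predict on the grid
# directly from the posterior draws instead.
post    = trace.posterior
α_draws = post["α"].values.reshape(-1)                  # (n_samples,)
β_draws = post["β"].values.reshape(-1, grid.shape[1])   # (n_samples, n_features)
σ_draws = post["σ"].values.reshape(-1)                  # (n_samples,)

rng   = np.random.default_rng(42)
μ_g   = α_draws[:, None] + β_draws @ grid.T             # (n_samples, 100)
preds = rng.normal(μ_g, σ_draws[:, None])               # add residual noise

mean_pred = preds.mean(axis=0)
# Leading axis marks a single "chain": (chain, draw, point) -> (100, 2)
hpd       = az.hdi(preds[None, :, :], hdi_prob=0.94)

# Back-transform N
N_orig = scaler.inverse_transform(grid)[:,0]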

plt.figure(figsize=(8,5))
plt.plot(N_orig, mean_pred, label="Posterior mean")
plt.fill_between(N_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Soil Nitrogen (mg/kg)")
plt.ylabel("Amendment Cost (USD/ha)")
plt.title("Bayesian Regression: Cost vs. Soil N")
plt.legend()
plt.tight_layout()
plt.show()
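The shaded band in the plot is a Highest Posterior Density interval: the narrowest interval that contains the requested share of the samples. The sketch below shows one common way to compute it for a single set of draws in NumPy; `hdi_1d` is a hypothetical helper written for illustration, and the skewed samples are synthetic.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval spanning a fraction `prob` of the sorted samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))        # index distance spanned by the interval
    widths = x[k:] - x[: n - k]        # width of every candidate interval
    i = int(np.argmin(widths))         # start index of the narrowest one
    return x[i], x[i + k]

rng = np.random.default_rng(4)
skewed = rng.exponential(scale=1.0, size=5000)   # right-skewed draws

lo, hi = hdi_1d(skewed, prob=0.94)
eq_lo, eq_hi = np.percentile(skewed, [3, 97])    # equal-tailed, for comparison

print(f"HDI:          [{lo:.2f}, {hi:.2f}]  width {hi - lo:.2f}")
print(f"Equal-tailed: [{eq_lo:.2f}, {eq_hi:.2f}]  width {eq_hi - eq_lo:.2f}")
```

For symmetric distributions the two intervals nearly coincide. For skewed ones the HDI shifts toward the dense region and is shorter, which is why the choice matters when the interval is used to set a budget ceiling.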

Summary

This Bayesian Regression workflow for Soil Nutrient Cost Prediction provides:

1. Point forecasts of fertiliser amendment cost per hectare from soil‐test metrics.

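2. Credible intervals quantifying uncertainty in the fitted coefficients and the residual noise across soil samples. Fertiliser prices are fixed constants in this example, so price fluctuations are only reflected once a real price series is merged in.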

3. Actionable insights: agronomists can budget fertiliser purchases and plan soil-management strategies with full awareness of cost risk.
