Crop Nutrient Response Prediction with Polynomial Regression in ML


Agronomists and precision-agriculture platforms need to predict crop yield (tons per hectare) as a smooth function of applied nutrient rates (nitrogen, phosphorus, potassium) and key environmental factors (soil moisture, rainfall, and temperature) before fertiliser recommendations are finalised. Experimental trials show that yield gains diminish at high nutrient rates and that nutrient effects interact with moisture and temperature. For example, high nitrogen boosts yield only when moisture is sufficient, and excessive phosphorus can inhibit uptake. A simple linear model cannot represent such curvature and synergy, while a high-degree polynomial without regularisation overfits trial noise. By fitting a polynomial regression with engineered interaction and power terms, and controlling complexity via Ridge (ℓ²) regularisation, we can learn a smooth, interpretable response surface for precise nutrient management.

Dataset

Crop Yield Prediction Using Soil and Weather

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                            # data loading & handling  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # visualization enhancements  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Libraries

import pandas as pd

# Adjust path if necessary
df = pd.read_csv("data/crop_yield_soil_weather.csv")

# Preview relevant columns
df.head()[[
    'soil_moisture','soil_nitrogen','soil_phosphorus',
    'soil_potassium','rainfall_mm','avg_temp_c','yield_t_ha'
]]

3. Exploratory Data Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Visualize diminishing returns: nitrogen vs yield
sns.scatterplot(x='soil_nitrogen', y='yield_t_ha', data=df, alpha=0.5)
plt.title("Soil Nitrogen vs Yield")
plt.xlabel("Soil Nitrogen (mg/kg)")
plt.ylabel("Yield (t/ha)")
plt.show()

4. Define Features & Target

PolynomialFeatures: at degree 2, generates all squared terms (e.g. soil_nitrogen²) and pairwise interactions (e.g. soil_nitrogen×rainfall_mm) of the predictors selected below, enabling the model to learn diminishing returns and synergies (e.g., nitrogen uptake boosted by moisture).

# Predictor matrix: nutrient rates + environmental factors
X = df[[
    'soil_nitrogen','soil_phosphorus','soil_potassium',
    'soil_moisture','rainfall_mm','avg_temp_c'
]]
y = df['yield_t_ha']

5. Build a Polynomial Regression Pipeline

StandardScaler: z-scores each input before expansion, so that the polynomial terms are built from comparably scaled variables and Ridge's ℓ² penalty is not dominated by features measured in large units (e.g. rainfall in mm).

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),            # z‑scale inputs
    ('poly', PolynomialFeatures(
        degree=2,                           # include squares & interactions
        include_bias=False
    )),
    ('ridge', Ridge(random_state=42))       # ℓ² regularisation
])
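The degree setting controls how quickly the expanded feature space grows, which is why regularisation matters here. The sketch below counts the terms produced from the six predictors using only the standard library; `n_poly_terms` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
from math import comb

def n_poly_terms(n_inputs: int, degree: int) -> int:
    """Number of monomials of total degree 1..degree in n_inputs variables.

    comb(n_inputs + degree, degree) counts all monomials up to `degree`
    including the constant; subtracting 1 mirrors include_bias=False.
    """
    return comb(n_inputs + degree, degree) - 1

# Six predictors, as in the feature matrix X
for degree in (1, 2, 3):
    print(f"degree={degree}: {n_poly_terms(6, degree)} features")
```

With six inputs the model has 6 features at degree 1, 27 at degree 2 and 83 at degree 3, so a cubic fit has many more coefficients to estimate from the same trials.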

6. Train/Test Split & Hyperparameter Search

Ridge regression: controls overfitting from the expanded feature space by shrinking coefficients via α.
GridSearchCV: tunes the polynomial degree (1–3) and regularisation strength α (10⁻³…10³) across 5‑fold CV, optimising for lowest RMSE on held‑out folds.

from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Tune polynomial degree and regularisation α
param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best params:", gs.best_params_)
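To illustrate what α does, the following sketch solves ridge regression in closed form on synthetic data and prints the coefficient norm for each α in the same grid. It is a simplified illustration: the data are random, `ridge_coefs` is a hypothetical helper, and no intercept is fitted (scikit-learn's Ridge fits an unpenalised intercept by default).

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic design matrix and target (not the crop dataset)
X_demo = rng.normal(size=(50, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=50)

def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution without intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alphas = np.logspace(-3, 3, 7)
norms = [np.linalg.norm(ridge_coefs(X_demo, y_demo, a)) for a in alphas]

for a, nrm in zip(alphas, norms):
    print(f"alpha={a:g}  ||w||={nrm:.3f}")
```

The coefficient norm shrinks as α increases: small values leave the fit close to ordinary least squares, while large values pull all coefficients towards zero.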

7. Evaluate Model

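import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument is deprecated in newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))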
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} t/ha")
print(f"Test R²  : {r2:.3f}")
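For reference, both metrics can be computed by hand. The sketch below uses three made-up yield values (not results from the dataset) to show what RMSE and R² measure.

```python
import numpy as np

# Hypothetical observed and predicted yields in t/ha
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 8.0])

residuals = y_true - y_hat

# RMSE: square root of the mean squared residual, in the units of the target
rmse_manual = np.sqrt(np.mean(residuals**2))

# R²: 1 minus the ratio of residual to total sum of squares
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y_true - y_true.mean())**2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE = {rmse_manual:.3f} t/ha")
print(f"R2   = {r2_manual:.5f}")
```

RMSE is expressed in t/ha, so it can be read directly as a typical prediction error, whereas R² is unitless and reports the share of yield variance the model explains.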

8. Inspect Key Polynomial Coefficients

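Coefficient inspection: ranking the coefficients by absolute magnitude highlights which nutrients and interactions the fitted model weights most heavily, which can guide agronomic recommendations (e.g., optimal nitrogen × moisture regimes). Because the inputs are standardised before expansion, the magnitudes are roughly comparable across terms, but they describe the fitted model rather than causal effects.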

# Retrieve feature names after expansion
poly      = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs     = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
# Rank by absolute magnitude (note: .abs() discards the sign of each coefficient)
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

# Plot top 10 drivers
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Yield")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
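The magnitude ranking above hides the sign of each coefficient, yet the sign carries the agronomic meaning: a negative coefficient on a squared term indicates diminishing returns. The sketch below fits the same kind of pipeline to synthetic data with a known concave response (yield = 2·rate − 0.5·rate², peaking at rate 2) and recovers the rate at which predicted yield is highest. The data, variable names and α value are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic single-nutrient trial with a concave response peaking at rate = 2
n_rate = rng.uniform(0.0, 4.0, size=300)
yield_obs = 2.0 * n_rate - 0.5 * n_rate**2 + rng.normal(0.0, 0.05, size=300)

demo = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("ridge", Ridge(alpha=1e-3)),
])
demo.fit(n_rate.reshape(-1, 1), yield_obs)

# Signed coefficients on the standardised input z and on z^2
b_lin, b_quad = demo.named_steps["ridge"].coef_

# Vertex of the fitted parabola in z, mapped back to the original units
scaler = demo.named_steps["scale"]
z_opt = -b_lin / (2.0 * b_quad)
rate_opt = scaler.mean_[0] + scaler.scale_[0] * z_opt

print(f"quadratic coefficient: {b_quad:.3f}")
print(f"estimated optimal rate: {rate_opt:.2f}")
```

With several predictors the same idea applies term by term, although interactions mean the optimum for one nutrient then depends on the levels of the others.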

Summary

By blending polynomial feature engineering with Ridge regularisation in a streamlined pipeline, this workflow:

Models nonlinear nutrient–yield responses, capturing diminishing returns and environment interactions, with accuracy checked on held-out data via test RMSE and R².
Balances flexibility and generalisation by tuning the polynomial degree and α, limiting over-fitting to trial variability.
Yields interpretable insights: the largest polynomial coefficients (for example an interaction such as soil_nitrogen×soil_moisture or a squared term such as rainfall_mm²) can point to actionable nutrient-moisture regimes and support data-driven fertiliser strategies.
