Crop Nutrient Response Prediction with Polynomial Regression in ML
Agronomists and precision-agriculture platforms need to predict crop yield (tons per hectare) as a smooth function of applied nutrient rates (nitrogen, phosphorus, potassium) and key environmental factors (soil moisture, rainfall, and temperature) before fertiliser recommendations are finalised. Experimental trials show that yield gains diminish at high nutrient rates and that nutrient effects interact with moisture and temperature. For example, high nitrogen boosts yield only when moisture is sufficient, and excessive phosphorus can inhibit uptake. A simple linear model cannot represent this curvature or these interactions, while a high-degree polynomial without regularisation overfits trial noise. By fitting a polynomial regression with engineered interaction and power terms, and controlling its complexity with Ridge (ℓ²) regularisation, we can learn a smooth, interpretable response surface for precise nutrient management.
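Written out for degree 2, the model described above can be sketched as follows. The symbols are notation introduced here for illustration: x_1 … x_p are the (standardised) inputs, β the coefficients, and α the Ridge penalty strength.

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
        + \sum_{j=1}^{p} \beta_{jj} x_j^{2}
        + \sum_{1 \le j < k \le p} \beta_{jk} x_j x_k ,
\qquad
\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^{2}
        + \alpha \Bigl( \sum_{j} \beta_j^{2} + \sum_{j} \beta_{jj}^{2} + \sum_{j<k} \beta_{jk}^{2} \Bigr)
```

With the six predictors used in this tutorial (p = 6), a degree-2 expansion has 6 linear, 6 squared, and 15 interaction terms, 27 in total. The intercept β_0 is not penalised.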
Dataset
Crop Yield Prediction Using Soil and Weather
Step-by-Step Code Implementation
1. Libraries Required
import pandas as pd                # data loading & handling
import numpy as np                 # numerical operations
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # visualization enhancements
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data
import pandas as pd
# Adjust path if necessary
df = pd.read_csv("data/crop_yield_soil_weather.csv")
# Preview relevant columns
df.head()[[
    'soil_moisture', 'soil_nitrogen', 'soil_phosphorus',
    'soil_potassium', 'rainfall_mm', 'avg_temp_c', 'yield_t_ha'
]]
3. Exploratory Data Analysis
import seaborn as sns
import matplotlib.pyplot as plt
# Visualize diminishing returns: nitrogen vs yield
sns.scatterplot(x='soil_nitrogen', y='yield_t_ha', data=df, alpha=0.5)
plt.title("Soil Nitrogen vs Yield")
plt.xlabel("Soil Nitrogen (mg/kg)")
plt.ylabel("Yield (t/ha)")
plt.show()
4. Define Features & Target
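PolynomialFeatures: with degree=2, generates every squared term (e.g. soil_nitrogen²) and every pairwise interaction (e.g. soil_nitrogen×rainfall_mm). These terms let the model learn diminishing returns and synergies, such as nitrogen uptake boosted by moisture. The expansion itself happens inside the pipeline built in step 5; here we only select the raw predictors and the target.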
# Predictor matrix: nutrient rates + environmental factors
X = df[[
    'soil_nitrogen', 'soil_phosphorus', 'soil_potassium',
    'soil_moisture', 'rainfall_mm', 'avg_temp_c'
]]
y = df['yield_t_ha']
5. Build a Polynomial Regression Pipeline
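StandardScaler: z-scores each input before the polynomial expansion, so no raw variable dominates Ridge's ℓ² penalty simply because of its units or variance. The squared and interaction terms are built from the scaled inputs and are not re-standardised themselves.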
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
    ('scale', StandardScaler()),        # z-scale inputs
    ('poly', PolynomialFeatures(
        degree=2,                       # include squares & interactions
        include_bias=False
    )),
    ('ridge', Ridge(random_state=42))   # ℓ² regularisation
])
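The effect of scaling before expanding can be checked directly. The sketch below uses synthetic inputs on deliberately different scales (the ranges are assumptions, not dataset values) and shows that the scaled columns have mean 0 and standard deviation 1, while the squared terms built from them have mean 1 rather than 0.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical inputs on very different scales (made-up ranges).
nitrogen = rng.uniform(0, 200, size=200)   # e.g. mg/kg
temperature = rng.uniform(15, 35, size=200)  # e.g. degrees C
X_toy = np.column_stack([nitrogen, temperature])

# Same order as the pipeline: scale first, then expand.
z = StandardScaler().fit_transform(X_toy)
expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(z)

print("raw std:       ", X_toy.std(axis=0).round(2))
print("scaled mean:   ", z.mean(axis=0).round(6))
print("scaled std:    ", z.std(axis=0).round(6))
# Expanded columns are: z0, z1, z0^2, z0*z1, z1^2
print("expanded means:", expanded.mean(axis=0).round(3))
```

Because the squared columns are not centred, the penalty does not treat every expanded term identically. This is usually acceptable in practice, but it is worth keeping in mind when comparing coefficient magnitudes.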
6. Train/Test Split & Hyperparameter Search
- Ridge regression: controls overfitting from the expanded feature space by shrinking coefficients via α.
- GridSearchCV: tunes the polynomial degree (1–3) and regularisation strength α (10⁻³…10³) across 5‑fold CV, optimising for lowest RMSE on held‑out folds.
from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Tune polynomial degree and regularisation α
param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
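Beyond the single best setting, the full cross-validation table shows how sensitive the error is to degree and α. The sketch below runs the same kind of search on synthetic data, because the real dataset is not bundled here. The response function, its coefficients, and the names `X_syn`, `y_syn`, and `grid` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
n = 500
X_syn = pd.DataFrame({
    "soil_nitrogen": rng.uniform(0, 200, n),
    "soil_moisture": rng.uniform(10, 40, n),
})
# Made-up response: diminishing returns in nitrogen plus a
# nitrogen x moisture synergy, with Gaussian noise.
y_syn = (
    1.0
    + 0.05 * X_syn["soil_nitrogen"]
    - 0.0002 * X_syn["soil_nitrogen"] ** 2
    + 0.0006 * X_syn["soil_nitrogen"] * X_syn["soil_moisture"]
    + rng.normal(0, 0.2, n)
)

pipe_syn = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
grid = GridSearchCV(
    pipe_syn,
    {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 1.0, 100.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_syn, y_syn)

# The scorer is negated RMSE (higher is better), so flip the sign for reading.
cv_table = (
    pd.DataFrame(grid.cv_results_)[
        ["param_poly__degree", "param_ridge__alpha",
         "mean_test_score", "std_test_score"]
    ]
    .assign(cv_rmse=lambda d: -d["mean_test_score"])
    .drop(columns="mean_test_score")
    .sort_values("cv_rmse")
    .reset_index(drop=True)
)
print(cv_table)
print("Best params:", grid.best_params_)
```

On data with real curvature, the degree-1 rows should sit at the bottom of the table. If degree 2 and degree 3 score within one standard deviation of each other, the simpler degree is often the safer choice.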
7. Evaluate Model
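import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument was deprecated in scikit-learn 1.4
rmse = np.sqrt(mean_squared_error(y_test, y_pred))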
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f} t/ha")
print(f"Test R² : {r2:.3f}")
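For readers who want to see what the two metrics measure, here is a small worked example that computes them by hand and checks the result against scikit-learn. The four yield values are invented for the arithmetic and are not model output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical observed and predicted yields (t/ha), chosen for easy arithmetic.
y_true = np.array([3.0, 4.0, 5.0, 6.0])
y_hat = np.array([2.5, 4.5, 5.0, 7.0])

# RMSE: square root of the mean squared residual.
residuals = y_true - y_hat                 # [0.5, -0.5, 0.0, -1.0]
ss_res = np.sum(residuals ** 2)            # 0.25 + 0.25 + 0 + 1 = 1.5
rmse_manual = np.sqrt(ss_res / len(y_true))

# R²: 1 minus residual sum of squares over total sum of squares.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 5.0
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.3f} t/ha")   # RMSE: 0.612 t/ha
print(f"R²  : {r2_manual:.3f}")          # R²  : 0.700

# Cross-check against scikit-learn.
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
r2_sklearn = r2_score(y_true, y_hat)
```

RMSE is in the same units as the target, so it reads directly as a typical error in t/ha, while R² is the unitless share of yield variance the model explains.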
8. Inspect Key Polynomial Coefficients
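Coefficient inspection: ranking the coefficients by absolute value highlights which nutrients and interactions the fitted model weights most heavily, which can guide agronomic recommendations (e.g., optimal nitrogen × moisture regimes). Because the inputs are z-scored before expansion, magnitudes are roughly comparable across terms, but a large coefficient indicates model weight, not statistical significance, and taking absolute values hides the sign of each effect.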
# Retrieve feature names after expansion
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs = gs.best_estimator_.named_steps['ridge'].coef_
import pandas as pd
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
# Plot top 10 drivers
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Yield")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
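Coefficient magnitudes do not show the shape of the response. A complementary check is to sweep one nutrient over a grid while holding the other inputs fixed and read the predicted curve. The sketch below does this on synthetic data. The response function, the fixed α, and the names `train`, `model`, and `sweep` are assumptions for illustration; with the real data, `gs.best_estimator_` would take the place of `model`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(7)
n = 500
train = pd.DataFrame({
    "soil_nitrogen": rng.uniform(0, 200, n),
    "soil_moisture": rng.uniform(10, 40, n),
})
# Made-up response with a yield peak inside the sampled nitrogen range.
y_syn = (
    1.0
    + 0.05 * train["soil_nitrogen"]
    - 0.0002 * train["soil_nitrogen"] ** 2
    + 0.0006 * train["soil_nitrogen"] * train["soil_moisture"]
    + rng.normal(0, 0.2, n)
)

model = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("ridge", Ridge(alpha=1.0)),
]).fit(train, y_syn)

# Sweep nitrogen from 0 to 200 while holding moisture at its median.
sweep = pd.DataFrame({
    "soil_nitrogen": np.linspace(0, 200, 201),
    "soil_moisture": train["soil_moisture"].median(),
})
sweep["predicted_yield"] = model.predict(sweep[["soil_nitrogen", "soil_moisture"]])

best_row = sweep.loc[sweep["predicted_yield"].idxmax()]
best_n = float(best_row["soil_nitrogen"])
print(f"Predicted yield peaks near nitrogen = {best_n:.0f} at median moisture")
```

Repeating the sweep at a low and a high moisture level should move the peak, which is the nitrogen × moisture interaction expressed in agronomic terms. Predictions outside the sampled range of the inputs should not be trusted, since polynomials extrapolate poorly.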
Summary
By blending polynomial feature engineering with Ridge regularisation in a streamlined pipeline, this workflow:
- Models nonlinear nutrient-yield responses, capturing diminishing returns and environment interactions; the test RMSE and R² from step 7 show how well it does so on a given dataset.
- Balances flexibility and generalisation by tuning the polynomial degree and α with cross-validation, which guards against overfitting to trial variability.
- Yields interpretable insights: the top polynomial features (for example a nitrogen × moisture interaction or a squared rainfall term) point to nutrient-moisture regimes worth investigating, supporting data-driven fertiliser strategies.