Crop Growth Rate Prediction using Linear Regression in ML
Modern greenhouses and open‑field farms capture a stream of sensor readings—temperature, humidity, light intensity, soil moisture—as well as periodic measurements of plant height or biomass. Knowing how fast a crop is growing, instead of waiting for the final yield, lets agronomists fine-tune irrigation, fertiliser, and climate control on the fly, boosting productivity and cutting waste.
In this guided project, we build a simple linear regression model that predicts a plant’s daily growth rate (cm day⁻¹) from easily logged environmental features. Although more sophisticated models can capture non‑linear effects, starting with linear regression reveals which factors have the most substantial first‑order influence and provides an interpretable baseline for growers and data teams.
Libraries Required
- pandas # data wrangling
- numpy # numerical helpers
- matplotlib.pyplot # quick visual checks
- seaborn # cleaner correlation plots (optional)
- scikit‑learn # model + metrics
- joblib # persist the trained model
Dataset Link
Greenhouse Plant Growth Metrics
Step by Step Code Implementation
1. Import core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # optional
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib
2. Load the data
df = pd.read_csv("greenhouse_plant_growth_metrics.csv")
3. Quick exploration and cleaning
print(df.head())
df.info()  # prints its own summary and returns None, so no print() wrapper
df = df.dropna()  # simple strategy; refine if needed
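One possible refinement of the blanket dropna() is to count missing values per column first and then drop rows only when a column the model actually needs is empty. The sketch below uses a small made-up frame; the column names (including the hypothetical notes column) are assumptions and should be swapped for whatever the real CSV contains.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV; all column names are assumptions.
toy = pd.DataFrame({
    "plant_id": [1, 1, 2, 2],
    "plant_height_cm": [10.0, np.nan, 8.0, 9.5],
    "soil_moisture": [0.31, 0.29, np.nan, 0.35],
    "notes": [None, None, "repotted", None],  # sparse, not needed for modelling
})

# Count missing values per column before deciding what to drop.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop a row only when a column the model needs is missing,
# instead of dropping every row that has any NaN at all.
required = ["plant_id", "plant_height_cm", "soil_moisture"]
cleaned = toy.dropna(subset=required)

print(len(toy.dropna()), "rows survive a blanket dropna()")
print(len(cleaned), "rows survive the targeted dropna()")
```

A sparse free-text column can otherwise wipe out most of the dataset without anyone noticing, so it is worth checking the counts before dropping.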
4. Feature engineering – compute growth rate
- Data ordering & shift: Grouping by plant_id then shifting gives us the previous measurement for each individual, essential for an accurate rate.
- Growth‑rate label: Dividing height gain by days automatically normalises measurements taken at irregular intervals.
- Why linear regression: Growth rate often responds roughly proportionally to light, temperature, and moisture within normal operating ranges; linear models highlight these main effects and provide coefficients that agronomists can read at a glance.
The dataset records plant_height_cm as a series of dated snapshots for each plant. We convert these snapshots into a per‑day growth rate:
# parse dates first so sorting is chronological rather than alphabetical
df['date'] = pd.to_datetime(df['date'])
# ensure data are sorted by plant ID and date
df = df.sort_values(['plant_id', 'date'])
# previous measurement for the same plant
df['prev_height'] = df.groupby('plant_id')['plant_height_cm'].shift(1)
df['prev_date'] = df.groupby('plant_id')['date'].shift(1)
# days between measurements
df['days_elapsed'] = (df['date'] - df['prev_date']).dt.days
# drop first readings (no previous date) and same-day repeats (division by zero)
df = df[df['days_elapsed'] > 0].copy()
# height gain divided by elapsed days
df['growth_rate'] = (df['plant_height_cm'] - df['prev_height']) / df['days_elapsed']
# keep rows where we have a valid rate
df = df.dropna(subset=['growth_rate'])
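To see what the group-and-shift step does, it can help to run it on a handful of rows whose answer is easy to check by hand. The sketch below uses two hypothetical plants measured at irregular intervals; the values are invented for illustration, and the column names follow the ones assumed in this step.

```python
import pandas as pd

# Two hypothetical plants measured at irregular intervals.
toy = pd.DataFrame({
    "plant_id": [1, 1, 1, 2, 2],
    "date": ["2024-03-01", "2024-03-03", "2024-03-04",
             "2024-03-01", "2024-03-05"],
    "plant_height_cm": [10.0, 11.0, 11.8, 20.0, 22.0],
})
toy["date"] = pd.to_datetime(toy["date"])
toy = toy.sort_values(["plant_id", "date"])

# shift(1) within each plant fetches that plant's previous measurement,
# so the first row of every plant has no predecessor (NaN / NaT).
by_plant = toy.groupby("plant_id")
prev_height = by_plant["plant_height_cm"].shift(1)
prev_date = by_plant["date"].shift(1)

days_elapsed = (toy["date"] - prev_date).dt.days
toy["growth_rate"] = (toy["plant_height_cm"] - prev_height) / days_elapsed

print(toy[["plant_id", "date", "plant_height_cm", "growth_rate"]])
```

Plant 1 gains 1.0 cm over two days and plant 2 gains 2.0 cm over four days, so both come out at 0.5 cm per day even though the raw height gains differ. The first row of each plant has no rate because there is nothing earlier to compare it with.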
5. Select predictors & label
features = ['soil_moisture', 'air_temperature', 'air_humidity',
'light_intensity', 'soil_nutrients'] # adjust to your columns
X = df[features]
y = df['growth_rate']
6. Train‑test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
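A random row-level split can place readings from the same plant in both the training and the test set, which may make the test score look better than it would on genuinely unseen plants. One alternative, sketched here on synthetic data, is scikit-learn's GroupShuffleSplit with plant_id as the grouping key. Whether this matters depends on how the model will be used, and the toy columns below are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 10 plants with 5 readings each.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "plant_id": np.repeat(np.arange(10), 5),
    "soil_moisture": rng.uniform(0.2, 0.5, 50),
    "growth_rate": rng.uniform(0.1, 1.0, 50),
})
X = toy[["soil_moisture"]]
y = toy["growth_rate"]
groups = toy["plant_id"]

# Hold out 20% of the plants (not 20% of the rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

train_plants = set(groups.iloc[train_idx])
test_plants = set(groups.iloc[test_idx])
print("Plants shared between train and test:", train_plants & test_plants)
```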
7. Model training
linreg = LinearRegression()
linreg.fit(X_train, y_train)
8. Evaluation
- Evaluation metrics: R² indicates the fraction of variance explained, while MAE expresses the typical error in the same units (cm day⁻¹), making it tangible for field teams.
y_pred = linreg.predict(X_test)
print(f"R² : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.3f} cm/day")
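For readers who want to see what the two metrics measure, the following sketch recomputes them by hand on four made-up growth rates and checks the result against scikit-learn. The numbers are illustrative only and do not come from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up observed and predicted growth rates (cm/day).
y_true = np.array([0.5, 0.8, 1.0, 0.7])
y_hat = np.array([0.6, 0.7, 0.9, 0.8])

# MAE: average absolute miss, in the same units as the target.
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (squared error of the model) / (squared error of
# always predicting the mean of y_true).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE manual={mae_manual:.3f} sklearn={mean_absolute_error(y_true, y_hat):.3f}")
print(f"R²  manual={r2_manual:.3f} sklearn={r2_score(y_true, y_hat):.3f}")
```

Every prediction here misses by 0.1 cm/day, so the MAE is 0.1. R² is lower than that might suggest because the observed rates themselves vary little around their mean.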
9. Interpreting coefficients
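- Coefficient table: Printing the sorted coefficients shows the direction and per‑unit size of each variable's estimated effect. Each coefficient is expressed per unit of its own feature, so magnitudes are directly comparable only when the features share a scale.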
coef_df = pd.DataFrame({
'feature': features,
'coefficient': linreg.coef_
}).sort_values('coefficient', ascending=False)
print(coef_df)
Positive coefficients indicate factors associated with faster growth when the other features are held fixed; negative values flag likely inhibitors.
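Raw coefficients depend on each feature's units, so a variable measured in large numbers (light in lux, say) can get a tiny coefficient even when it drives most of the variation. A common way to make magnitudes comparable is to standardise the features before fitting. The sketch below illustrates this on synthetic data, where the feature ranges and the true effects are invented assumptions, using scikit-learn's StandardScaler in a pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: ranges and true effects are invented for illustration.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "light_intensity": rng.uniform(10_000, 50_000, n),  # lux
    "soil_moisture": rng.uniform(0.2, 0.5, n),          # volumetric fraction
})
y = (0.00002 * X["light_intensity"]
     + 1.0 * X["soil_moisture"]
     + rng.normal(0, 0.01, n))

# Fit once on raw features and once on standardised features.
raw = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

coef_df = pd.DataFrame({
    "feature": X.columns,
    "raw_coefficient": raw.coef_,
    "standardised_coefficient": scaled.named_steps["linearregression"].coef_,
}).sort_values("standardised_coefficient",
               key=lambda s: s.abs(), ascending=False)
print(coef_df)
```

In this toy setup soil moisture has by far the larger raw coefficient, yet light intensity explains more of the variation once both are put on a common scale. A standardised coefficient reads as the change in growth rate per one standard deviation of the feature.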
10. Save the model
- Model persistence: Saving with joblib lets you schedule daily predictions without retraining.
joblib.dump(linreg, "crop_growth_rate_linreg.pkl")
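Reloading the saved file for a later prediction run might look like the sketch below. It trains a throwaway model on synthetic data so that it runs on its own, and it writes to a temporary directory; the two feature names and the sensor values are assumptions. New readings need the same columns, in the same order, as the training data.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Throwaway model on synthetic data so the example is self-contained.
features = ["soil_moisture", "air_temperature"]  # assumed column names
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0, 1, (50, 2)), columns=features)
y = 0.4 * X["soil_moisture"] + 0.2 * X["air_temperature"]
model = LinearRegression().fit(X, y)

# Save, then load back as a scheduled job would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crop_growth_rate_linreg.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# New sensor readings must use the training columns in the same order.
new_readings = pd.DataFrame([[0.35, 0.6]], columns=features)
prediction = restored.predict(new_readings)
print(f"Predicted growth rate: {prediction[0]:.3f} cm/day")
```

Files written by joblib are pickles, so they should be loaded only from trusted sources and, ideally, with the same scikit-learn version that produced them.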
Summary
This mini-project demonstrates how a straightforward linear-regression baseline can convert raw greenhouse sensor data into a real-time indicator of plant vigour. By predicting daily growth rates instead of final yields, farmers can correct suboptimal conditions early, ultimately increasing output and resource efficiency. The workflow—clean data → engineer growth‑rate label → fit, evaluate, interpret—scales seamlessly to more advanced models later (polynomial terms, ensembles, or temporal networks) once the linear benchmark and data pipeline are in place. Tailor the feature list and refresh the model as new seasons’ data roll in to keep predictions sharp.