Crop Growth Rate Prediction using Linear Regression in ML


Modern greenhouses and open‑field farms capture a stream of sensor readings—temperature, humidity, light intensity, soil moisture—as well as periodic measurements of plant height or biomass. Knowing how fast a crop is growing, instead of waiting for the final yield, lets agronomists fine-tune irrigation, fertiliser, and climate control on the fly, boosting productivity and cutting waste.

In this guided project, we build a simple linear regression model that predicts a plant’s daily growth rate (cm day⁻¹) from easily logged environmental features. Although more sophisticated models can capture non‑linear effects, starting with linear regression reveals which factors have the most substantial first‑order influence and provides an interpretable baseline for growers and data teams.

Libraries Required

pandas # data wrangling
numpy # numerical helpers
matplotlib.pyplot # quick visual checks
seaborn # cleaner correlation plots (optional)
scikit‑learn # model + metrics
joblib # persist the trained model

Dataset Link

Greenhouse Plant Growth Metrics

Step by Step Code Implementation

1. Import core libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns                   # optional
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the data

df = pd.read_csv("greenhouse_plant_growth_metrics.csv")

3. Quick exploration and cleaning

print(df.head())
df.info()                                # prints its own summary; wrapping it in print() would also print "None"
df = df.dropna()                         # simple strategy; refine if needed
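
One possible refinement of the blanket dropna() is to look at where the gaps are and drop only rows that lack a column the model needs. The sketch below uses a small made-up frame; the notes column and all values are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "notes" is an optional free-text column that is mostly empty
demo = pd.DataFrame({
    "plant_height_cm": [10.0, 11.0, np.nan, 14.0],
    "soil_moisture": [35.0, np.nan, 40.0, 42.0],
    "notes": [None, None, "replanted", None],
})

# Count missing values per column before deciding what to drop
missing_per_column = demo.isna().sum()
print(missing_per_column)

# Blanket dropna() discards any row with a gap in ANY column
blanket = demo.dropna()

# Targeted dropna() only requires the columns the model will use
required = ["plant_height_cm", "soil_moisture"]
cleaned = demo.dropna(subset=required)

print(f"rows kept (blanket): {len(blanket)}, rows kept (targeted): {len(cleaned)}")
```

Here an optional column that is rarely filled in would wipe out every row under the blanket approach, while the targeted version keeps the rows that are complete where it matters.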

4. Feature engineering – compute growth rate

Data ordering & shift: Sorting by plant_id and date, then shifting within each plant_id group, gives the previous measurement for the same plant, which is essential for an accurate rate.
Growth‑rate label: Dividing height gain by days automatically normalises measurements taken at irregular intervals.
Why linear regression: Growth rate often responds roughly proportionally to light, temperature, and moisture within normal operating ranges; linear models highlight these main effects and provide coefficients that agronomists can read at a glance.

The dataset records plant_height_cm on successive measurement dates, which need not be evenly spaced. We convert these snapshots into a per‑day growth rate:

# parse dates first so sorting and subtraction are chronological, not string-based
df['date'] = pd.to_datetime(df['date'])

# ensure data are sorted by plant ID and date
df = df.sort_values(['plant_id', 'date'])

# previous height and date for the same plant
df['prev_height']  = df.groupby('plant_id')['plant_height_cm'].shift(1)
df['prev_date']    = df.groupby('plant_id')['date'].shift(1)

# days between measurements
df['days_elapsed'] = (df['date'] - df['prev_date']).dt.days

# skip same-day repeats, which would otherwise divide by zero
df = df[df['days_elapsed'] > 0].copy()

df['growth_rate']  = (df['plant_height_cm'] - df['prev_height']) / df['days_elapsed']

# keep rows where we have a valid rate
df = df.dropna(subset=['growth_rate'])
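
To see what the shift and division produce, here is a small worked example on made-up data. The two plants, their dates, and their heights are hypothetical; only the column names follow the snippet above.

```python
import pandas as pd

# Hypothetical toy data: two plants measured at irregular intervals
toy = pd.DataFrame({
    "plant_id": ["A", "A", "A", "B", "B"],
    "date": ["2024-03-01", "2024-03-03", "2024-03-08", "2024-03-01", "2024-03-05"],
    "plant_height_cm": [10.0, 11.0, 14.0, 20.0, 22.0],
})
toy["date"] = pd.to_datetime(toy["date"])
toy = toy.sort_values(["plant_id", "date"])

# Shift within each plant so a row never borrows another plant's history
grouped = toy.groupby("plant_id")
height_gain = toy["plant_height_cm"] - grouped["plant_height_cm"].shift(1)
days_elapsed = (toy["date"] - grouped["date"].shift(1)).dt.days

# Height gain per day; the first row of each plant has no predecessor, so it is NaN
toy["growth_rate"] = height_gain / days_elapsed
print(toy)
```

Plant A gains 1 cm over 2 days and then 3 cm over 5 days, giving rates of 0.5 and 0.6 cm/day. Plant B gains 2 cm over 4 days, giving 0.5 cm/day. The first measurement of each plant has no rate and is the row that the final dropna removes.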

5. Select predictors & label

features = ['soil_moisture', 'air_temperature', 'air_humidity',
            'light_intensity', 'soil_nutrients']          # adjust to your columns
X = df[features]
y = df['growth_rate']

6. Train‑test split

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
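
The random split above treats every row as independent. Because several rows come from the same plant, a more cautious alternative (not part of the original workflow) is to split by plant_id so that no plant contributes rows to both sets. The sketch below uses scikit-learn's GroupShuffleSplit on synthetic stand-in data; the frame and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical stand-in data: 10 plants with 6 measurements each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "plant_id": np.repeat(np.arange(10), 6),
    "soil_moisture": rng.uniform(20, 60, 60),
    "growth_rate": rng.uniform(0.1, 1.0, 60),
})

# Hold out roughly 20% of the plants (not 20% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(demo, groups=demo["plant_id"]))

train, test = demo.iloc[train_idx], demo.iloc[test_idx]

# No plant should appear on both sides of the split
shared = set(train["plant_id"]) & set(test["plant_id"])
print(f"train rows: {len(train)}, test rows: {len(test)}, shared plants: {len(shared)}")
```

If the test score drops noticeably under this split, the random split was probably benefiting from seeing the same plants during training.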

7. Model training

linreg = LinearRegression()
linreg.fit(X_train, y_train)

8. Evaluation

Evaluation metrics: R² indicates the fraction of variance explained, while MAE expresses the typical error in the same units (cm day⁻¹), making it tangible for field teams.

y_pred = linreg.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.3f} cm/day")
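
To make the two metrics concrete, the sketch below computes them by hand on four made-up growth rates and checks the result against scikit-learn. The numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical observed and predicted growth rates in cm/day
y_true = np.array([0.50, 0.60, 0.40, 0.70])
y_hat = np.array([0.55, 0.50, 0.45, 0.65])

# MAE: average absolute miss, in the same units as the label
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The hand-rolled values should agree with scikit-learn's
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_hat))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
print(f"MAE: {mae_manual:.4f} cm/day, R²: {r2_manual:.2f}")
```

In this example the predictions miss by 0.0625 cm/day on average and explain 65% of the variance in the observed rates.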

9. Interpreting coefficients

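Coefficient table: Printing the sorted coefficients shows the direction and per-unit size of each feature's effect. Because each coefficient is expressed per unit of its own feature (per degree, per lux, per percentage point), the sorted order is a fair ranking of influence only when the features are on comparable scales.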

coef_df = pd.DataFrame({
    'feature': features,
    'coefficient': linreg.coef_
}).sort_values('coefficient', ascending=False)

print(coef_df)

Positive coefficients mark factors associated with faster growth; negative values mark factors associated with slower growth, with the other features held fixed.
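
One way to put coefficients on a common footing is to standardise the features before fitting, so each coefficient reads as the change in growth rate per one standard deviation of that feature. The sketch below is an illustration on synthetic data, not part of the original workflow: the two features, their units, and the generating formula are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (values are hypothetical)
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "air_temperature": rng.normal(24, 3, n),        # degrees C
    "light_intensity": rng.normal(30000, 8000, n),  # lux
})
# Synthetic label: light matters more per standard deviation, less per raw unit
y_demo = (0.02 * X_demo["air_temperature"]
          + 0.00002 * X_demo["light_intensity"]
          + rng.normal(0, 0.02, n))

# Raw fit: coefficients are per degree and per lux
raw = LinearRegression().fit(X_demo, y_demo)

# Standardised fit: coefficients are per one standard deviation of each feature
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)
std_coef = scaled.named_steps["linearregression"].coef_

coef_table = pd.DataFrame({
    "feature": X_demo.columns,
    "raw_coefficient": raw.coef_,
    "standardised_coefficient": std_coef,
})
print(coef_table)
```

In this synthetic case the raw coefficients rank temperature first, simply because lux values are large numbers, while the standardised coefficients rank light first. Predictions from the two fits are the same; only the reading of the coefficients changes.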

10. Save the pipeline

Model persistence: Saving with joblib lets you schedule daily predictions without retraining.

joblib.dump(linreg, "crop_growth_rate_linreg.pkl")
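
A saved model can later be reloaded and applied to fresh sensor readings. The sketch below is self-contained, so it trains a throwaway model on random data and writes it to a temporary directory; the reading values are hypothetical, and the column names are assumed to match the features list from step 5.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ['soil_moisture', 'air_temperature', 'air_humidity',
            'light_intensity', 'soil_nutrients']

# Throwaway model on random data, standing in for the trained linreg
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.uniform(0, 1, (50, len(features))), columns=features)
y_demo = 0.1 * X_demo.sum(axis=1)
model = LinearRegression().fit(X_demo, y_demo)

# Save, then reload as a scheduled job would
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crop_growth_rate_linreg.pkl")
    joblib.dump(model, path)
    loaded = joblib.load(path)

# Hypothetical new reading; columns must be in the training order
new_reading = pd.DataFrame([{
    'soil_moisture': 0.4, 'air_temperature': 0.6, 'air_humidity': 0.5,
    'light_intensity': 0.7, 'soil_nutrients': 0.3,
}])[features]

pred = loaded.predict(new_reading)[0]
print(f"Predicted growth rate: {pred:.3f} cm/day")
```

Load model files only from sources you trust, since joblib files are pickle-based, and reload them with the same scikit-learn version used for training where possible.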

Summary

This mini-project demonstrates how a straightforward linear-regression baseline can convert raw greenhouse sensor data into an early indicator of plant vigour. By predicting daily growth rates instead of waiting for final yields, farmers can correct suboptimal conditions early, which can raise output and improve resource efficiency. The workflow (clean data → engineer growth‑rate label → fit, evaluate, interpret) carries over to more advanced models later, such as polynomial terms, ensembles, or temporal networks, once the linear benchmark and data pipeline are in place. Tailor the feature list to your own columns and refit the model as each new season's data arrives to keep predictions accurate.
