Crop Growth Rate Prediction using Linear Regression in ML


Modern greenhouses and open‑field farms capture a stream of sensor readings—temperature, humidity, light intensity, soil moisture—as well as periodic measurements of plant height or biomass. Knowing how fast a crop is growing, instead of waiting for the final yield, lets agronomists fine-tune irrigation, fertiliser, and climate control on the fly, boosting productivity and cutting waste.

In this guided project, we build a simple linear regression model that predicts a plant’s daily growth rate (cm day⁻¹) from easily logged environmental features. Although more sophisticated models can capture non‑linear effects, starting with linear regression reveals which factors have the most substantial first‑order influence and provides an interpretable baseline for growers and data teams.

Libraries Required

  • pandas # data wrangling
  • numpy # numerical helpers
  • matplotlib.pyplot # quick visual checks
  • seaborn # cleaner correlation plots (optional)
  • scikit‑learn # model + metrics
  • joblib # persist the trained model

Dataset Link

Greenhouse Plant Growth Metrics

Step by Step Code Implementation

 1. Import core libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns                   # optional
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

 2. Load the data

df = pd.read_csv("greenhouse_plant_growth_metrics.csv")

 3. Quick exploration and cleaning

print(df.head())
print(df.info())
df = df.dropna()                         # simple strategy; refine if needed
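One way to refine the blanket dropna() is to check where values are missing and drop only rows that lack a column the model actually needs. The sketch below uses a small made-up table; the column names (including the hypothetical free-text notes column) are assumptions for illustration, not the real dataset's schema.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "notes" is an optional column that is mostly empty
raw = pd.DataFrame({
    "plant_id": ["A", "A", "B", "B"],
    "plant_height_cm": [10.0, np.nan, 20.0, 21.0],
    "air_temperature": [22.0, 23.0, np.nan, 24.0],
    "notes": [None, None, None, "repotted"],
})

# Count missing values per column before deciding what to drop
missing_per_column = raw.isna().sum()
print(missing_per_column)

# Drop rows only when a column the model needs is missing
required = ["plant_id", "plant_height_cm", "air_temperature"]
cleaned = raw.dropna(subset=required)

print(len(raw.dropna()), "rows survive a blanket dropna()")
print(len(cleaned), "rows survive the targeted version")
```

Restricting dropna() to the required columns keeps rows that are only missing optional fields, which matters when the table is small.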

4. Feature engineering – compute growth rate

  • Data ordering & shift: Grouping by plant_id then shifting gives us the previous measurement for each individual, essential for an accurate rate.
  • Growth‑rate label: Dividing height gain by days automatically normalises measurements taken at irregular intervals.
  • Why linear regression: Growth rate often responds roughly proportionally to light, temperature, and moisture within normal operating ranges; linear models highlight these main effects and provide coefficients that agronomists can read at a glance.

The dataset records plant_height_cm as periodic snapshots, which may or may not be evenly spaced. We convert these snapshots into a per‑day growth rate:

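# parse dates first so sorting and subtraction are chronological, not alphabetical
df['date'] = pd.to_datetime(df['date'])

# ensure data are sorted by plant ID and date
df = df.sort_values(['plant_id', 'date'])

# previous measurement for the same plant
df['prev_height']  = df.groupby('plant_id')['plant_height_cm'].shift(1)
df['prev_date']    = df.groupby('plant_id')['date'].shift(1)

# days between measurements
df['days_elapsed'] = (df['date'] - df['prev_date']).dt.days

# drop each plant's first row (no previous measurement) and same-day repeats,
# which would otherwise cause a division by zero
df = df[df['days_elapsed'] > 0].copy()

# height gain divided by elapsed days
df['growth_rate']  = (df['plant_height_cm'] - df['prev_height']) / df['days_elapsed']

# keep rows where we have a valid rate
df = df.dropna(subset=['growth_rate'])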
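To see what the shift-and-divide step produces, here is a small worked example on made-up measurements taken at uneven intervals. The plant IDs, dates, and heights are hypothetical; only the column names follow the step above.

```python
import pandas as pd

# Hypothetical toy data: two plants measured at irregular intervals
toy = pd.DataFrame({
    "plant_id": ["A", "A", "A", "B", "B"],
    "date": ["2024-03-01", "2024-03-03", "2024-03-08",
             "2024-03-01", "2024-03-02"],
    "plant_height_cm": [10.0, 11.0, 14.0, 20.0, 20.5],
})
toy["date"] = pd.to_datetime(toy["date"])
toy = toy.sort_values(["plant_id", "date"])

# Previous height and date for the same plant
grouped = toy.groupby("plant_id")
prev_height = grouped["plant_height_cm"].shift(1)
prev_date = grouped["date"].shift(1)

# Height gain divided by elapsed days
days_elapsed = (toy["date"] - prev_date).dt.days
toy["growth_rate"] = (toy["plant_height_cm"] - prev_height) / days_elapsed

print(toy[["plant_id", "date", "plant_height_cm", "growth_rate"]])
```

Plant A gains 1 cm over 2 days and then 3 cm over 5 days, so its two rates are 0.5 and 0.6 cm/day even though the gaps differ. The first row of each plant has no previous measurement, so its rate is NaN and would be dropped.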

5. Select predictors & label

features = ['soil_moisture', 'air_temperature', 'air_humidity',
            'light_intensity', 'soil_nutrients']          # adjust to your columns
X = df[features]
y = df['growth_rate']

6. Train‑test split

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
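Because each plant contributes several rows, a purely random split can place measurements from the same plant in both the training and the test set, which may make the test score look better than it would on unseen plants. One possible alternative, sketched here on synthetic data, is scikit-learn's GroupShuffleSplit with plant_id as the group. The demo table and its values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real table: 10 plants, 6 rows each
demo = pd.DataFrame({
    "plant_id": np.repeat(np.arange(10), 6),
    "soil_moisture": rng.uniform(20, 60, 60),
    "growth_rate": rng.uniform(0.1, 1.0, 60),
})

# Hold out 20% of the plants (not 20% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(demo, groups=demo["plant_id"]))

train_df = demo.iloc[train_idx]
test_df = demo.iloc[test_idx]

train_plants = set(train_df["plant_id"])
test_plants = set(test_df["plant_id"])
print("plants in train:", len(train_plants), "| plants in test:", len(test_plants))
print("overlap:", train_plants & test_plants)
```

Whether this matters depends on the goal: for forecasting known plants a time-based split may be more appropriate, while for generalising to new plants a group-based split is the safer check.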

7. Model training

linreg = LinearRegression()
linreg.fit(X_train, y_train)

8. Evaluation

  • Evaluation metrics: R² indicates the fraction of variance in growth rate that the model explains, while MAE is the average absolute error in the same units as the label (cm day⁻¹), making it tangible for field teams.

y_pred = linreg.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.3f} cm/day")
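To make the two metrics concrete, the sketch below computes them by hand with NumPy and checks the result against scikit-learn. It also compares MAE with a naive baseline that always predicts the mean. The observed and predicted values are invented for illustration; on real data the baseline mean should come from the training set.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical observed and predicted growth rates (cm/day)
y_true = np.array([0.40, 0.55, 0.30, 0.70, 0.50])
y_hat = np.array([0.45, 0.50, 0.35, 0.60, 0.50])

# MAE: mean of the absolute errors
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Naive baseline: always predict the mean growth rate
baseline = np.full_like(y_true, y_true.mean())
mae_baseline = mean_absolute_error(y_true, baseline)

print(f"MAE (manual)  : {mae_manual:.3f}")
print(f"MAE (sklearn) : {mean_absolute_error(y_true, y_hat):.3f}")
print(f"R²  (manual)  : {r2_manual:.3f}")
print(f"R²  (sklearn) : {r2_score(y_true, y_hat):.3f}")
print(f"MAE (baseline): {mae_baseline:.3f}")
```

A model is only useful if its MAE is clearly below the baseline's; an R² near zero means it does little better than predicting the mean.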

9. Interpreting coefficients

  • Coefficient table: Printing the sorted coefficients shows the direction and per‑unit size of each feature's estimated effect, a starting point for agronomy decisions.

coef_df = pd.DataFrame({
    'feature': features,
    'coefficient': linreg.coef_
}).sort_values('coefficient', ascending=False)

print(coef_df)

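Positive coefficients mark factors associated with faster growth as they increase; negative values flag factors associated with slower growth. Each coefficient is the expected change in growth rate (cm day⁻¹) per one‑unit increase in that feature with the others held fixed, so magnitudes are only comparable between features measured on similar scales.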
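If the goal is to rank features by influence, one common option is to standardise the features before fitting so that each coefficient refers to a one-standard-deviation change. The sketch below illustrates the difference on synthetic data; the feature ranges, the units, and the label formula are all assumptions made up for the demonstration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200

# Synthetic, hypothetical features on very different scales
demo_X = pd.DataFrame({
    "light_intensity": rng.uniform(5_000, 40_000, n),  # e.g. lux
    "soil_moisture": rng.uniform(0.2, 0.6, n),         # e.g. fraction
})

# Synthetic label: both features matter about equally over their ranges
demo_y = (0.00001 * demo_X["light_intensity"]
          + 1.0 * demo_X["soil_moisture"]
          + rng.normal(0, 0.02, n))

# Fit once on raw features and once on standardised features
raw_model = LinearRegression().fit(demo_X, demo_y)
scaled_model = make_pipeline(StandardScaler(), LinearRegression()).fit(demo_X, demo_y)
std_coefs = scaled_model.named_steps["linearregression"].coef_

coef_table = pd.DataFrame({
    "feature": demo_X.columns,
    "raw_coef": raw_model.coef_,
    "standardised_coef": std_coefs,
})
# Rank by absolute standardised effect
coef_table["abs_standardised"] = coef_table["standardised_coef"].abs()
coef_table = coef_table.sort_values("abs_standardised", ascending=False)
print(coef_table)
```

On the raw scale the light coefficient looks negligible next to soil moisture purely because of its units; after standardising, the two effects come out at a similar size. Standardised coefficients remain a rough guide and can still mislead when features are strongly correlated.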

 10. Save the model

  • Model persistence: Saving the fitted model with joblib lets you schedule daily predictions without retraining.

joblib.dump(linreg, "crop_growth_rate_linreg.pkl")
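A scheduled job would then load the saved file and score fresh sensor readings. The following is a minimal round-trip sketch, assuming the same five feature names as in step 5. The training data are random placeholders and the new reading is hypothetical, so the predicted value itself carries no meaning.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["soil_moisture", "air_temperature", "air_humidity",
            "light_intensity", "soil_nutrients"]

# Placeholder training data so the example is self-contained
rng = np.random.default_rng(2)
train_X = pd.DataFrame(rng.uniform(0, 1, (50, len(features))), columns=features)
train_y = train_X.sum(axis=1) + rng.normal(0, 0.05, 50)
model = LinearRegression().fit(train_X, train_y)

# Save and reload, as a scheduled prediction job would
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "crop_growth_rate_linreg.pkl")
    joblib.dump(model, model_path)
    reloaded = joblib.load(model_path)

# Hypothetical new reading; columns must match the training features and order
new_reading = pd.DataFrame([{
    "soil_moisture": 0.4, "air_temperature": 0.6, "air_humidity": 0.5,
    "light_intensity": 0.7, "soil_nutrients": 0.3,
}])[features]

predicted_rate = reloaded.predict(new_reading)
print(f"Predicted growth rate: {predicted_rate[0]:.3f} cm/day")
```

Selecting columns with the stored feature list guards against a reordered or extended sensor table silently feeding the wrong values to the model. Only load model files from sources you trust, since joblib files are pickle-based.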

Summary

This mini-project demonstrates how a straightforward linear-regression baseline can convert raw greenhouse sensor data into an up-to-date indicator of plant vigour. By predicting daily growth rates instead of final yields, farmers can correct suboptimal conditions early, which can raise output and resource efficiency. The same workflow (clean data → engineer growth‑rate label → fit, evaluate, interpret) carries over to more advanced models later (polynomial terms, ensembles, or temporal networks) once the linear benchmark and data pipeline are in place. Tailor the feature list and refresh the model as new seasons’ data roll in to keep predictions sharp.


ProjectGurukul Team
