House Size Price Prediction using Linear Regression in ML

In real estate, the size of a house has a significant impact on its market value, and buyers, sellers, and agents all benefit from understanding that effect. In this machine learning project, we will build a linear regression model that predicts a home’s selling price from its living area (square footage) alone. By fitting a straight line to historical sales data, the model quantifies how much each additional square foot adds to the price. This lets users estimate property values quickly and make informed decisions.

Libraries Required

  • Pandas: for data ingestion and manipulation
  • NumPy: for numerical computations
  • Matplotlib: for plotting data and results
  • Scikit-learn: for model training, prediction, and evaluation

Dataset Link

Housing Price Prediction

Step by Step Code Implementation

1. Importing Libraries

We import pandas and numpy to manage and process data arrays, matplotlib to visualize relationships, and scikit-learn modules for model creation and metrics.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

2. Loading the Dataset

The CSV file (“house_data.csv”) contains two columns:

  • Size: living area in square feet
  • Price: selling price in US dollars

We load the data into a DataFrame for inspection and analysis.

# Assume 'house_data.csv' contains columns 'Size' (in sq ft) and 'Price' (in USD)
data = pd.read_csv('house_data.csv')
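Before moving on, it can help to confirm that the file really has the two expected columns and no missing values. The sketch below is one possible check; the CSV text is made-up stand-in data (including one deliberately blank Size), not the tutorial's dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'house_data.csv'; these rows are invented for illustration.
csv_text = """Size,Price
1000,200000
1500,275000
2000,350000
,410000
"""
data = pd.read_csv(io.StringIO(csv_text))

# Confirm the expected columns are present before going further.
missing_columns = {"Size", "Price"} - set(data.columns)
assert not missing_columns, f"Missing columns: {missing_columns}"

# Count missing values per column, then drop rows with a missing Size or Price.
print(data.isna().sum())
clean = data.dropna(subset=["Size", "Price"])
print(f"Rows before: {len(data)}, after: {len(clean)}")
```

Dropping incomplete rows is only one option; whether it is appropriate depends on how much data is missing.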

3. Exploratory Data Analysis

A scatter plot lets us check whether price generally increases with size. A roughly straight-line pattern suggests that linear regression is a reasonable choice.

# Quick glimpse of data
print(data.head())

# Scatter plot: Size vs Price
plt.scatter(data['Size'], data['Price'])
plt.title('House Size vs. Price')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price (USD)')
plt.show()
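Alongside the plot, a single number can summarise how linear the relationship looks. As a rough sketch, the Pearson correlation between the two columns can be computed with pandas; the rows below are made up for illustration and are not the tutorial's dataset.

```python
import pandas as pd

# Made-up illustrative rows, not the tutorial's dataset.
data = pd.DataFrame({
    "Size": [850, 1200, 1500, 1800, 2400],
    "Price": [155000, 210000, 260000, 300000, 410000],
})

# Pearson correlation: values near +1 suggest a strong positive linear trend.
corr = data["Size"].corr(data["Price"])
print(f"Correlation between Size and Price: {corr:.3f}")
```

A high correlation supports the linear assumption, but it does not replace looking at the scatter plot, since outliers or curvature can hide behind a single number.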

4. Defining Features and Target

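We designate the Size column as our independent variable (feature) and Price as the dependent variable (target). Wrapping Size in double brackets keeps it as a two-dimensional DataFrame, which is the shape scikit-learn estimators expect for the feature matrix; single brackets would return a one-dimensional Series.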

X = data[['Size']]     # Feature: house size
y = data['Price']      # Target: sale price
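The difference between single and double brackets is easy to see by printing the shapes. This is a small sketch on made-up rows, not the tutorial's dataset.

```python
import pandas as pd

# Made-up illustrative rows.
data = pd.DataFrame({"Size": [1000, 1500, 2000], "Price": [200000, 275000, 350000]})

X = data[["Size"]]   # double brackets: DataFrame with shape (3, 1)
y = data["Price"]    # single brackets: Series with shape (3,)

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```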

5. Splitting into Training and Test Sets

To gauge how well our model generalises, we reserve 25% of the dataset for testing. Setting random_state=0 ensures our results are reproducible.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
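To see what the split produces, we can check the sizes of the two sets and confirm that the same random_state gives the same split twice. The sketch below uses 20 made-up houses, so a 25% test share corresponds to 5 rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 20 houses with sizes 800, 900, ..., 2700 sq ft.
sizes = np.arange(800, 2800, 100)
data = pd.DataFrame({"Size": sizes, "Price": 50000 + 150 * sizes})
X = data[["Size"]]
y = data["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(f"Training rows: {len(X_train)}, test rows: {len(X_test)}")

# Repeating the call with the same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
print("Same split:", X_train.index.equals(X_train_again.index))
```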

6. Training the Linear Regression Model

We instantiate LinearRegression and call its fit() method on the training data. The model learns a single coefficient (the slope, in dollars per square foot) and an intercept, which together define the best-fitting line.

model = LinearRegression()
model.fit(X_train, y_train)
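After fitting, the learned slope and intercept are available as coef_ and intercept_. As a sanity check, the sketch below fits a model on made-up data that follows Price = 50,000 + 150 × Size exactly (an assumed relationship chosen for illustration), so the fitted values should come out close to those numbers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up, exactly linear data: Price = 50,000 + 150 * Size (assumed values).
sizes = np.array([1000, 1500, 2000, 2500])
data = pd.DataFrame({"Size": sizes, "Price": 50000 + 150 * sizes})

model = LinearRegression()
model.fit(data[["Size"]], data["Price"])

slope = model.coef_[0]        # price change per additional square foot
intercept = model.intercept_  # predicted price at Size = 0
print(f"Slope:     {slope:.2f} USD per sq ft")
print(f"Intercept: {intercept:,.2f} USD")
```

On real data the slope is the quantity of interest: it is the model's estimate of what one extra square foot adds to the price.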

7. Predicting on the Test Set

Using predict(), we generate price estimates for homes in the test set based on their sizes.

y_pred = model.predict(X_test)
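The same predict() call works for homes that are not in the dataset at all. A minimal sketch, assuming a model trained on made-up rows: the new input is passed as a DataFrame with the same Size column used during training.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training rows for illustration.
train = pd.DataFrame({
    "Size": [1000, 1500, 2000, 2500],
    "Price": [200000, 275000, 350000, 425000],
})
model = LinearRegression().fit(train[["Size"]], train["Price"])

# New homes must be passed as 2D input with the same column name as in training.
new_homes = pd.DataFrame({"Size": [1200, 1800]})
estimates = model.predict(new_homes)

for size, price in zip(new_homes["Size"], estimates):
    print(f"{size} sq ft -> ${price:,.0f}")
```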

8. Evaluating Model Performance

We assess the predictions with three standard regression metrics:

  • MAE (Mean Absolute Error): average absolute difference between predicted and actual prices; lower is better, and the value reads directly in dollars.
  • MSE (Mean Squared Error): average of the squared differences, which penalizes larger errors more heavily; its units are squared dollars.
  • R² Score: proportion of the variance in price explained by size; values nearer 1.0 indicate a stronger linear fit.

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error:  ${mae:,.2f}")
print(f"Mean Squared Error:   {mse:,.2f} (USD squared)")
print(f"R² Score:             {r2:.3f}")
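To make the three metrics concrete, the sketch below recomputes them by hand with NumPy on three made-up prices and compares the results with scikit-learn's functions. It also takes the square root of the MSE (often called RMSE), which brings the error back into dollars.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted prices for illustration.
y_true = np.array([200000.0, 300000.0, 400000.0])
y_hat = np.array([210000.0, 290000.0, 430000.0])

errors = y_true - y_hat                      # per-house prediction errors
mae_manual = np.mean(np.abs(errors))         # mean of absolute errors
mse_manual = np.mean(errors ** 2)            # mean of squared errors
rmse = np.sqrt(mse_manual)                   # back in dollars

# R² = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE:  {mae_manual:,.2f} (sklearn: {mean_absolute_error(y_true, y_hat):,.2f})")
print(f"MSE:  {mse_manual:,.2f} (sklearn: {mean_squared_error(y_true, y_hat):,.2f})")
print(f"RMSE: {rmse:,.2f}")
print(f"R²:   {r2_manual:.3f} (sklearn: {r2_score(y_true, y_hat):.3f})")
```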

9. Visualizing Regression Line

We plot the regression line atop training data points to visually assess the model’s fit and identify any systematic deviations.

plt.scatter(X_train, y_train, color='lightgray', label='Training data')
plt.plot(X_train, model.predict(X_train), color='blue', linewidth=2, label='Fit line')
plt.title('Linear Regression Fit on Training Data')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()
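Systematic deviations are often easier to spot in a residual plot, where each point shows actual price minus predicted price. The sketch below is one way to draw it; the data is synthetic (a straight line plus random noise, with assumed parameters), so the residuals should scatter evenly around zero. A curve or funnel shape on real data would suggest that a straight line is missing something.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data: a straight line plus random noise (all parameters are assumed).
rng = np.random.default_rng(0)
sizes = np.linspace(800, 3000, 40)
prices = 50000 + 150 * sizes + rng.normal(0, 15000, size=sizes.size)

X_train = pd.DataFrame({"Size": sizes})
y_train = pd.Series(prices, name="Price")

model = LinearRegression().fit(X_train, y_train)

# Residual = actual price minus predicted price for each training point.
residuals = y_train - model.predict(X_train)

plt.scatter(X_train["Size"], residuals, color="gray")
plt.axhline(0, color="blue", linewidth=1)  # reference line at zero error
plt.title("Residuals on Training Data")
plt.xlabel("Size (sq ft)")
plt.ylabel("Residual (USD)")
plt.show()
```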

Summary

This project demonstrates how to predict house prices from size alone using a linear regression model. In practice, this approach provides a transparent baseline; professionals can refine it with larger datasets or explore polynomial and regularised regression methods to capture non-linear relationships.

ProjectGurukul Team
