Detecting Parkinson’s Disease with Machine Learning

What is Parkinson’s Disease?

Parkinson’s disease is a central nervous system disorder. Its symptoms occur because of low dopamine levels in the brain. Four Primary symptoms are tremor, rigidity, slow movement and balance problems. Till now no cure for Parkinson’s Disease is known, treatment aims to reduce the effects of the symptoms.

About Detecting Parkinson’s Disease Project:

In this Python machine learning project, we will build a model to detect Parkinson’s disease using one of the Classifier techniques known as RandomForestClassifier as our output contains only 1’s and 0’s. We’ll load the dataset, get the features and targets, split them into training and testing sets, and finally pass them to RandomForestClassifier for prediction.

Project Prerequisites:

Install the following libraries using pip :

pip install numpy , pandas , matplotlib , sklearn 

The versions which are used in this parkinson’s Disease Detection project for python and its corresponding modules are as follows:

1) python: 3.8.5
2) numpy: 1.19.5
3) pandas: 1.1.5
4) matplotlib: 3.2.2
5) sklearn: 0.24.2

Download ML Project Code for Detecting Parkinson’s Disease

Please download the source code of detecting parkinson’s disease with machine learning: Detecting Parkinson’s Disease Project Code

Project file Structure:

parkinsons.csv: dataset file for our project. python file where we will train our model and predict the output.

Steps for detecting Parkinson’s disease:

1) Import Libraries

Firstly we will import all the required libraries which have been shared in the prerequisites section.


#load all modules for Parkinson's Disease Detection Project
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier

2) Preprocessing

Read the ‘parkinsons.csv’ file in dataframe using the ‘pandas’ library


#read csv file of parkinson's dataset

dataset file :

parkinson's disease dataset file

Fetch the features and targets from the dataframe. Features will be all columns except ‘name’ and ‘status’. Therefore we will drop these two columns. And our target will be ‘status’ column which contains 0’s(no parkinson’s disease) and 1’s(has parkinson’s disease)


#extract features but we don't want 'name' and 'status' columns
#so we will drop these two columns and store remainings
#axis 1 is for columns
features = df.drop(['name','status'],axis=1)

#our target data will be 'status'
target = df.loc[:,'status']

3) Normalization

We will scale our feature data in the range of -1(minimum value) and 1(maximum value). Scaling is important because variables at different scales do not contribute equal fitting to the model which may end up creating bias. For that we will be using ‘MinMaxScaler()’ to fit and then transform the feature data.


#scale all the features data in the range between -1,1
scaler= MinMaxScaler((-1,1))

4) Training and Testing

Split dataset into 80:20 ratio where 80 rows for training and 20 rows for testing purposes. For this we will pass scaled features and target data to ‘train_test_split()’.


#split the dataset into training and testing sets where 20% data for testing purpose. 
x_train,x_test,y_train,y_test = train_test_split(features_c, target ,

5) Building the classifier model

We will use Random forest Classifier for the classification of our data points. So let’s see what random forest is.

What is a Random Forest Algorithm?

Random forest is a supervised learning algorithm. It can be used for both Classification and Regression problems. It uses an ensemble learning method known as ‘bagging’ (Bootstrap Aggregation) which is a process of combining multiple classifiers to solve a complex problem.

Random forest creates various random subsets of the given dataset and passes them to different numbers of decision trees and takes the prediction from each tree. Based on the majority votes of prediction, random forest takes the average to predict the output and also to increase the accuracy and overall result. As there are greater numbers of trees in random forests, it prevents the problem of overfitting.

Random forest searches for the most important feature while splitting a node which helps in building a better model.

Random forest example:

random forest example

As you can see in the above image we are building a random forest for the topic “Favourite movie”. We have used three trees for the classification. Let’s say these three trees are three different persons. So after asking different questions and applying the conditions we can see that the first two people said their favorite movie is “The Shawshank Redemption” and one said “Forrest Gump”.

As Random forest classifier uses majority voting technique, we get “The Shawshank Redemption” as our Final Class output.

Initialize ‘RandomForestClassifier’ and train the model.


#initialize the random forest classifier and fit the datas
model= RandomForestClassifier(random_state=2),y_train)


random forest classifier

After we train our model we can see how RandomForestClassifier fits our features and targets of the dataset. We will plot the first 5 trees of our classifier using the ‘matplotlib’ library.


#plot the RandomForestClassifier’s first 5 trees 
fig, axes = plt.subplots(nrows = 1,ncols = 5,figsize = (10,2), dpi=900)
for index in range(0, 5):
    tree.plot_tree(model.estimators_[index],feature_names= features.columns,
 class_names= 'status',filled = True,ax = axes[index])
    axes[index].set_title('Estimator: ' + str(index+1), fontsize = 11)
fig.savefig('Random Forest 5 Trees.png')

random forest trees

6) Prediction and Accuracy

Now we will predict our output(y_pred) for testing data(x_test) which is 20% of the dataset using the model which we have trained. Also, we will calculate accuracy, mean absolute error and root mean square error of our model.


#predict the output for x_test

#calculate accuracy,root mean squared error
print('Accuracy :',accuracy_score(y_test, y_pred))
print('Mean Absolute Error:', mean_absolute_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, y_pred)))


model accuracy and error

7) Building the Prediction System.

Finally, we will take the user’s input and check whether data has Parkinson’s disease or not.


#input data and transform into numpy array
in_data= np.asarray(tuple(map(float,input().rstrip().split(','))))
#reshape and scale the input array
in_data_re = in_data.reshape(1,-1)
in_data_sca = scaler.transform(in_data_re)

#print the predicted output for input array
print("Parkinson's Disease Detected" if model.predict(in_data_sca) else "No Parkinson's Disease Detected")

Detect Parkinson’s Disease Output

detect parkinson's disease output prediction


In this Machine learning project, we developed a model using the RandomForestClassifier of the sklearn module of python to detect if an individual has Parkinson’s Disease or not. We got the machine learning model with 97.43% accuracy, which is good as our dataset contains less records.

Did you like our efforts? If Yes, please give ProjectGurukul 5 Stars on Google | Facebook

Leave a Reply

Your email address will not be published. Required fields are marked *