Complete Linear Regression & its Python Implementation
Linear Regression is one of the first and most basic algorithms you will learn when starting a career in Machine Learning. It originated in statistics and was later adopted in Machine Learning for predicting values from data.

In this module, we will learn Linear Regression and its implementation in Python. Let’s start with what Linear Regression is. It is a statistical technique used to predict one numeric variable (the dependent variable) from a given input variable (the independent variable).
In Machine Learning, the model can be used in many real-life situations to predict values, for example:
- Consider a situation where a builder wants to sell the flats in a newly constructed building. The builder can take the prices at which flats were previously sold in the same area and use them to decide the market price of the new flats. Here the builder can build a linear regression model from the rates of earlier sales in the area and get an estimate of the price at which the current flats can be sold.
- Consider another situation where a cab company wants to estimate the bill for a customer before the ride. The company analyses the bill values recorded for various distances in its historic data and then calculates an estimate for a new distance value. This, too, can be achieved using Linear Regression.
A Simple Linear Regression is of the form:
y = mx + c
Here x is the explanatory variable (independent variable), y is the dependent variable, m is the slope of the line and c is the intercept (the value of y when x = 0).
Let’s discuss simple linear regression using the previously stated example of cab fare calculation.
Fig. Graph Representing the fare value for corresponding distance value
The above example has been plotted for the following values:
X-Values (Distance in km) | Y-Values (Price in Rupees) |
7 | 91 |
11 | 121 |
22 | 253 |
35 | 437.5 |
As shown in the table, these values have been plotted in the graph. Using these values and the concept of linear regression, we can predict the fare for a given distance in km.
Let’s Implement this example of Linear Regression using Python:
- We will implement this example using the scikit-learn and NumPy libraries.
- First, we will import the libraries into our code:
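A minimal sketch of the import step, assuming the model used throughout is scikit-learn's LinearRegression class from the sklearn.linear_model module:

```python
# NumPy holds the data arrays; scikit-learn provides the regression model
import numpy as np
from sklearn.linear_model import LinearRegression
```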
Note: If you do not have the NumPy and scikit-learn libraries installed on your computer, follow these steps:
For NumPy (Windows):
- NumPy can be installed with Conda, with Pip, or with a package manager on macOS and Linux.
- If using Conda, run: conda install numpy
- If using Pip, run: pip install numpy
- For more details, you can visit https://numpy.org/install/
For scikit-learn Installation (Windows):
- scikit-learn can be installed with Conda, with Pip, or with a package manager on macOS and Linux.
- If using Conda, run: conda install scikit-learn
- If using Pip, run: pip install -U scikit-learn
- For more details, you can visit https://scikit-learn.org/stable/install.html
Moving towards the Implementation:
- After importing the libraries, we create two NumPy arrays to store the distance values and their corresponding fares:
The reshape method is used to turn the distance values into a two-dimensional array, because the model expects its input in the form of one column and any number of rows.
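The two arrays could be created as in the sketch below. The variable names distance and fare are illustrative choices; the values come from the table above.

```python
import numpy as np

# Distances in km; reshape(-1, 1) gives one column and as many rows as needed
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
# Fares in rupees, one per distance value
fare = np.array([91, 121, 253, 437.5])

print(distance.shape)  # (4, 1): four rows, one column
print(fare.shape)      # (4,): a plain one-dimensional array
```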
- Now we will create a linear regression model and fit this data into that model.
Statement (1) creates an instance of Linear Regression and statement (2) fits the data into that instance.
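The two statements might look like this sketch, with the statement numbers marked in comments. The name model is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])

model = LinearRegression()   # (1) create an instance of LinearRegression
model.fit(distance, fare)    # (2) fit the data into that instance
```

Since fit() returns the fitted instance, the two statements can also be chained as LinearRegression().fit(distance, fare).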
There are various parameters of this model such as:
- fit_intercept, a Boolean that is True by default. It decides whether the intercept (‘c’ in our case) is calculated; if it is set to False, the intercept is fixed at 0.
- normalize, a Boolean that was False by default and decided whether to normalize the input. Note that this parameter was deprecated in scikit-learn 1.0 and removed in version 1.2, so it is only available in older releases.
- copy_X, a Boolean that is True by default. It decides whether the input is copied (True) or may be overwritten (False).
- n_jobs, an integer that is None by default. It sets the number of jobs to use for parallel computation.
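To see the effect of fit_intercept, the sketch below fits the same data twice, once with the default settings and once with the intercept switched off. The names default_model and no_intercept_model are hypothetical and used only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])

# Default settings: the intercept is estimated from the data
default_model = LinearRegression().fit(distance, fare)

# fit_intercept=False: the fitted line is forced through the origin
no_intercept_model = LinearRegression(fit_intercept=False).fit(distance, fare)

print("intercept (default):", default_model.intercept_)
print("intercept (fit_intercept=False):", no_intercept_model.intercept_)
```

With fit_intercept=False the reported intercept is 0.0, and the slope changes to compensate.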
- Now we use the score() method to find out the coefficient of determination of our Linear Regression model.
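A sketch of this step, assuming the model is scored on the same data it was fitted on. The name r_sq is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(distance, fare)

# score() returns the coefficient of determination (R squared)
r_sq = model.score(distance, fare)
print("Coefficient of determination:", r_sq)
```

A value close to 1 means the fitted line explains most of the variation in the fares.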
- After this, the slope and intercept values (‘m’ and ‘c’ respectively in our case) are retrieved as follows:
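In scikit-learn the fitted slope is stored in the coef_ attribute and the intercept in intercept_. A minimal sketch, with m and c as illustrative variable names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(distance, fare)

m = model.coef_[0]      # slope: coef_ holds one value per input column
c = model.intercept_    # intercept: the predicted fare when distance is 0
print("Slope (m):", m)          # roughly 12.53 for this data
print("Intercept (c):", c)      # roughly -9.35 for this data
```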
Now we predict the fare values (dependent variable) again from the distance values (independent variable) using the predict() method:
In statement (1), the predict() method is used to predict the values, and in statement (2), the print function is used with the ‘sep’ parameter set to the newline character so that each printed item appears on its own line.
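The two statements could be written as in this sketch. The name y_pred and the label text are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(distance, fare)

y_pred = model.predict(distance)              # (1) predict a fare for each distance
print("Predicted fares:", y_pred, sep="\n")   # (2) label and values on separate lines
```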
- This is one way of calculating the predictions. Another is to apply the linear regression formula stated above directly:
y = mx + c
Here y = fare, x = distance, m = slope and c = intercept.
So, it will be done as follows:
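A sketch of the manual calculation, which should give the same numbers as predict(). The name y_manual is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(distance, fare)

# y = m * x + c, applied to every distance at once
y_manual = model.coef_ * distance + model.intercept_
print("Fares from the formula:", y_manual, sep="\n")

# The result is a column (4 rows, 1 column); flatten it to compare with predict()
print(np.allclose(y_manual.ravel(), model.predict(distance)))
```

Note that this version returns a two-dimensional array, while predict() returns a one-dimensional one.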
- The real purpose of the model is to predict fares for new values, which can be done as follows:
- We take a set of new values for x and follow the same procedure:
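A sketch of predicting for unseen distances. The new distances (5, 15 and 25 km) are made-up values chosen only for illustration, as are the names x_new and y_new.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the table above
distance = np.array([7, 11, 22, 35]).reshape(-1, 1)
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(distance, fare)

# Hypothetical new distances in km, reshaped to one column like the training input
x_new = np.array([5, 15, 25]).reshape(-1, 1)
y_new = model.predict(x_new)
print("Estimated fares:", y_new, sep="\n")
```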
Linear Regression with Multiple Variables
What we have done so far uses a single independent variable, but an output rarely depends on one thing only; there may be several factors. In our cab fare case, the fare also depends on the time taken to reach the destination, so time becomes another input on which the fare depends. Now let’s use Linear Regression with multiple variables.
Consider the following two charts:
Fig. Graph Representing the fare value for corresponding distance value
The above example has been plotted for the following values:
X-Values (Distance in km) | Y-Values (Price in Rupees) |
7 | 91 |
11 | 121 |
22 | 253 |
35 | 437.5 |
Fig. Graph Representing the value for the corresponding time value
The above example has been plotted for the following values:
Time in Minutes | Amount in Rupees |
20 | 91 |
32 | 121 |
55 | 253 |
60 | 437.5 |
Let’s Implement this example of Linear Regression using Python:
- We will implement this example using the same scikit-learn and NumPy libraries.
- First, we will import the libraries into our code:
- After importing the libraries, we create two NumPy arrays: one holding the distance and time values for each ride (one column per input variable) and one holding the corresponding fares:
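A sketch of the import and data setup for two input variables. The names X and fare are illustrative; each row of X pairs a distance from the first table with the time from the second table that has the same fare.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20],
              [11, 32],
              [22, 55],
              [35, 60]])
# Fares in rupees, one per ride
fare = np.array([91, 121, 253, 437.5])

print(X.shape)  # (4, 2): four rides, two input variables
```

No reshape is needed here because X is already two-dimensional.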
- Now we will create a linear regression model and fit this data into that model.
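Creating and fitting the model works the same way as with one variable. A minimal sketch, with model as an assumed name:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20], [11, 32], [22, 55], [35, 60]])
fare = np.array([91, 121, 253, 437.5])

model = LinearRegression()  # create the model
model.fit(X, fare)          # fit it on both input columns at once
```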
- Now we use the score() method to find out the coefficient of determination of our Linear Regression model.
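A sketch of the scoring step on the two-variable data, with r_sq as an illustrative name:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20], [11, 32], [22, 55], [35, 60]])
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(X, fare)

# Coefficient of determination on the training data
r_sq = model.score(X, fare)
print("Coefficient of determination:", r_sq)
```

With only four rides and two input variables the score on the training data is bound to be very high, so it says little about how the model would do on new rides.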
- After this, the slopes (one weight per input variable) and the intercept are retrieved as follows:
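With two input columns, coef_ holds two weights, in the same order as the columns. A sketch, where m_distance, m_time and c are hypothetical names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20], [11, 32], [22, 55], [35, 60]])
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(X, fare)

m_distance, m_time = model.coef_   # one weight per column of X
c = model.intercept_
print("Weight for distance:", m_distance)
print("Weight for time:", m_time)
print("Intercept:", c)
```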
- Now we predict the fare values (dependent variable) again from the distance and time values (independent variables) using the predict() method:
- This is one way of calculating the predictions. Another is to apply the linear regression formula, extended to two input variables:
y = m1x1 + m2x2 + c
Here y = fare, x1 = distance, x2 = time, m1 and m2 = slopes (weights) and c = intercept.
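A sketch of the prediction step on the two-variable data, with y_pred as an illustrative name:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20], [11, 32], [22, 55], [35, 60]])
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(X, fare)

# One predicted fare per row of X
y_pred = model.predict(X)
print("Predicted fares:", y_pred, sep="\n")
```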
So, it will be done as follows:
Here we multiply each slope (i.e. weight) by the corresponding column of the input, take the sum of the products, and then add the intercept to get the final output.
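The description above can be sketched with NumPy broadcasting. The name y_manual is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20], [11, 32], [22, 55], [35, 60]])
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(X, fare)

# Multiply each column by its weight, sum across the columns of each row,
# then add the intercept
y_manual = (model.coef_ * X).sum(axis=1) + model.intercept_
print("Fares from the formula:", y_manual, sep="\n")

# Should match what predict() returns
print(np.allclose(y_manual, model.predict(X)))
```

The same result can be obtained with a matrix product, X @ model.coef_ + model.intercept_.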
- The real purpose of the model is to predict fares for new values, which can be done as follows:
- We take a set of new values for x and follow the same procedure:
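A sketch of predicting for new rides. The distance and time pairs in X_new are made-up values chosen only for illustration, and the names X_new and y_new are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one ride: [distance in km, time in minutes]
X = np.array([[7, 20], [11, 32], [22, 55], [35, 60]])
fare = np.array([91, 121, 253, 437.5])
model = LinearRegression().fit(X, fare)

# Hypothetical new rides, with columns in the same order as the training data
X_new = np.array([[5, 15],
                  [15, 40],
                  [25, 58]])
y_new = model.predict(X_new)
print("Estimated fares:", y_new, sep="\n")
```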
Hope you all understood what Linear Regression is and how to implement it in Python.
Thanks for Reading!!!!!!!!!