Linear Regression From Scratch

An implementation of Linear Regression from scratch in python 📈. Includes the hypothesis function, partial differentiation of variables, parameter update rule and gradient descent.

Getting Started

Download the project on your computer. git clone https://github.com/sarvasvkulpati/LinearRegression or download the ZIP file
Go to the directory with the file: cd LinearRegression
Download the required packages: pip install -r requirements.txt
Run the project: python3 LinearRegression.py

How It Works

What Is Linear Regression?

Linear regression is a method for approximating a linear relationship between two variables. While that may sound complicated, all it really means is that it takes some input variable, like the age of a house, and finds out how it's related to another variable, for example, the price it sells at.

We use it when the data has a linear relationship, which means that when you plot the points on a graph, the data lies approximately in the shape of a straight line.

Linear Regression involves a couple of steps:

Randomly initialising parameters for the hypothesis function
Computing the mean squared error
Calculating the partial derivatives
Updating the parameters based on the derivatives and the learning rate
Repeating from 2 until the error is minimised

The Hypothesis Function

The linear equation is the standard form that represents a straight line on a graph, where m represents the gradient, and b represents the y-intercept.

The Hypothesis Function is the exact same function in the notation of Linear Regression.

$h_{\theta}(x) = \theta_{1}x + \theta_{0}$

The two variables we can change – m and b – are represented as parameters theta1 and theta0

In the beginning, we randomly initialize our parameters, which means we give theta1 and theta0 random values to begin with. This will output a random line, maybe something like this:

The Error Function

Clearly the line drawn in the graph above is wrong. But how wrong is it? That's what the error function is for - it calculates the total error of your graph. We'll be using an error function called the Mean Squared Error function, or MSE, represented by the letter J.

$J(\theta_{0}, \theta_{1}) = \frac{1}{2}\sum_{i=1}^{m}(h_{\theta}(x_{i}) - y_{i})^{2}$

Now while that may look complicated, what it's doing is actually quite simple. The function J takes our parameters theta0, and theta1 as an input. Then, from every point from 1 to m, where m is the number of points, it calculates the distance between the x value and the value predicted by our function and squares it. Squaring it ensures that the error is always positive, and that the error is greater the further away the predicted value is from the actual value.

Now that we have a value for how wrong our function is, we need to adjust the function to reduce this error.

Calculating Derivatives

If we graph our parameters against the error (i.e graphing the cost function), we'll find that it forms something similar to the graph below. Our goal is to find the lowest point of that graph, where the error is at its lowest. This is called minimising the cost function.

To do this, we need to consider what happens at the bottom of the graph - the gradient is zero. So to minimize the cost function, we need to get the gradient to zero.

The gradient is given by the derivative of the function, and the partial derivatives of the functions are

$\frac{\delta }{\delta \theta_{0}} = h_{\theta}(x_{i})- y_{i}$

$\frac{\delta }{\delta \theta_{0}} = (h_{\theta}(x_{i})- y_{i})x_{i}$

Updating The Parameters Based On The Learning Rate

Now we need to update our parameters to reduce the gradient. To do this, we use the gradient update rule

$\theta_{0} = \theta_{0} - \alpha\frac{\delta}{\delta \theta_{0}}$

$\theta_{1} = \theta_{1} - \alpha\frac{\delta}{\delta \theta_{1}}$

Alpha is what we call the Learning rate, which is a small number that allows the parameters to be updated by a small amount. As mentioned above, we are trying to update the gradient such that it's closer to zero (the bottom). The Learning rate helps guide the network to the lowest point on the curve by small amounts.

Minimising the Cost Function

Now we repeat these steps - checking the error, calculating the derivatives, and updating the weights, until the error is as low as possible. This is called minimising the cost funtion.

Once your error is minimised, your line should now be the best possible fit to approximate the data!

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
images		images
.DS_Store		.DS_Store
LinearRegression.py		LinearRegression.py
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Linear Regression From Scratch

Getting Started

How It Works

What Is Linear Regression?

The Hypothesis Function

The Error Function

Calculating Derivatives

Updating The Parameters Based On The Learning Rate

Minimising the Cost Function

About

Releases

Packages

Languages

Priya-Pathak/LinearRegression

Folders and files

Latest commit

History

Repository files navigation

Linear Regression From Scratch

Getting Started

How It Works

What Is Linear Regression?

The Hypothesis Function

The Error Function

Calculating Derivatives

Updating The Parameters Based On The Learning Rate

Minimising the Cost Function

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages