Day 8: What is Linear Regression with Derivation.

Introduction to Linear Regression and a derivation of the Ordinary Least Squares method.

By Nandeshwar

Jun 08, 2021



Simple Linear Regression / Univariate Linear Regression

As the name suggests, this algorithm is used for regression problems, and again from the name we can infer that Linear Regression is a linear model. This means that the algorithm establishes a linear relationship between the input variable (X) and the single output variable (Y). It is one of the simplest models in Machine Learning.

When there are multiple input variables (X), it is called Multiple Linear Regression.

Wanna jump right to the code? Check out the complete code on GitHub.

In this algorithm we consider one input variable, X, and one output variable, Y, and we have to establish a linear relationship between them. The linear relationship can be defined as follows:

$$Y = \beta_0 + \beta_1 X$$

  • β1 is called the scale factor, slope, or coefficient
  • β0 is called the intercept or bias coefficient

This equation is similar to the equation of a line, i.e. $y = mx + b$, where $m = \beta_1$ (the slope of the line) and $b = \beta_0$ (the intercept). Hence we can state that in the Simple Linear Regression model we want to draw a line between X and Y which defines the relationship between them.
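
As a tiny illustration, here is what this line looks like in code; the values of β0 and β1 below are made up purely for the example:

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
beta_0 = 2.0   # intercept
beta_1 = 0.5   # slope

x = np.array([1.0, 2.0, 3.0, 4.0])

# The linear model: y_hat = beta_0 + beta_1 * x
y_hat = beta_0 + beta_1 * x
print(y_hat)   # [2.5 3.  3.5 4. ]
```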

Assumptions for Linear Regression

  1. A linear relationship between features and the target variable.
  2. Additivity means that the effect of a change in one feature on the target variable does not depend on the values of the other features. For example, suppose a model for predicting the revenue of a company has two features - the number of items "a" sold and the number of items "b" sold. When the company sells more of item "a" the revenue increases, and this is independent of the number of items "b" sold. But if customers who buy "a" stop buying "b", the additivity assumption is violated.
  3. No correlation between features (no collinearity). Collinear features severely affect the model's coefficient estimates (a quick check is sketched after this list).
  4. Homoscedasticity - the variance of the errors should be constant across all values of the features.
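
Assumption 3 can be checked quickly before fitting anything. A minimal sketch, assuming a NumPy feature matrix; the variable names and toy data below are purely illustrative:

```python
import numpy as np

# Toy feature matrix: three features, 100 samples (illustrative only)
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
c = 2 * a + rng.normal(scale=0.1, size=100)   # "c" is almost collinear with "a"

X = np.column_stack([a, b, c])

# Pairwise Pearson correlations between the features (columns)
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
# An off-diagonal value close to +/-1 (here between "a" and "c") signals collinearity.
```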

Different types of Linear Regression

  1. Univariate Linear Regression or Simple Linear Regression - Single-variable linear regression is a technique used to model the relationship between a single input independent variable (feature variable) and an output dependent variable using a linear model, i.e. a line.
  2. Multiple Linear Regression - It is a statistical technique that uses several explanatory variables to predict the outcome of a single dependent variable.
  3. Ridge Regression - Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces these standard errors.
  4. Lasso Regression - Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). A short scikit-learn sketch of these variants follows this list.
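
For a rough feel of how these variants are used in practice, here is a minimal, illustrative scikit-learn sketch on synthetic data; the true coefficients, noise level, and alpha values are assumptions for the example, not tuned choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data: three features, a made-up true relationship plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

models = {
    "Multiple linear regression": LinearRegression(),
    "Ridge (L2 shrinkage)": Ridge(alpha=1.0),
    "Lasso (L1 shrinkage, sparse)": Lasso(alpha=0.1),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, "->", np.round(model.coef_, 2), "intercept:", round(model.intercept_, 2))
```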

How does Simple Linear Regression work?

The main task of this algorithm is to find the best line which fits the given input data.

The hypothesis is defined by

$$h_\theta(x_i) = \beta_0 + \beta_1 x_i$$

which can be rewritten as

$$\hat{y}_i = \beta_0 + \beta_1 x_i \tag{1}$$

where $h_\theta(x_i)$ represents the predicted response for the ith observation.

Ordinary Least Squares

One of the most common methods is the Ordinary Least Squares (OLS) method.

Let us create a dataset first

day8-fig1.png
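
The exact numbers behind the figure are not reproduced here, but a similar dataset can be generated as follows; the true slope, intercept, and noise level are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# x: a single input variable, y: a noisy linear response (all values assumed for illustration)
x = np.linspace(0, 10, 50)
y = 3.0 + 1.8 * x + rng.normal(scale=2.0, size=x.shape)   # "true" line: y = 3 + 1.8x, plus noise

print(x[:5])
print(y[:5])
```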

Our goal is to find a model whose line fits best with respect to the given data.

There should be minimal error between the predicted values and the actual values.

The error is the distance (d) of an actual point from the model's fitted line. The sum of all the squared errors can be denoted by E, so we can say

$$E = \sum_{i=1}^{n} \left(y_{\text{actual}} - y_{\text{predicted}}\right)^2$$

which, for the ith value, can be written as

$$E = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{2}$$

where n is the total number of inputs.

The square is taken because some points lie above the line and some lie below it. Squaring the values eliminates the negative signs.

The algorithm's goal is to minimize this error function.
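
In code, this error is just the sum of squared differences between the actual and the predicted values. A minimal sketch, using the same synthetic data as above and an arbitrary (non-optimal) candidate line:

```python
import numpy as np

# Same synthetic data as in the sketch above (any x, y arrays work here)
rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
y = 3.0 + 1.8 * x + rng.normal(scale=2.0, size=x.shape)

# An arbitrary candidate line - a guess, not the optimal fit
beta_0, beta_1 = 2.0, 2.0

y_pred = beta_0 + beta_1 * x    # predicted values from the candidate line
errors = y - y_pred             # signed distance d of each actual point from the line
sse = np.sum(errors ** 2)       # squaring removes the sign, as in equation (2)

print("Sum of squared errors:", round(sse, 2))
```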

Derivation

  1. Given n inputs and outputs $(x_i, y_i)$
  2. Find the best fit for the line in equation 1, $\hat{y}_i = \beta_0 + \beta_1 x_i$
  3. The best fit line should have minimum error. To achieve this, we have to minimize the error function defined above

Using equation 1 and equation 2

$$E = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \tag{3}$$

To minimize the above cost function, as we learn in calculus, a univariate optimization involves taking the derivative and setting it equal to 0. Similarly, this minimization problem is solved by setting the partial derivatives equal to 0. That is, we take the derivative of (3) with respect to β0 and set it equal to 0, and then do the same for β1. This gives us

$$\frac{\partial E}{\partial \beta_0} = \sum_{i=1}^{n} -2\left(y_i - \beta_0 - \beta_1 x_i\right) = 0 \tag{4}$$

and

$$\frac{\partial E}{\partial \beta_1} = \sum_{i=1}^{n} -2x_i\left(y_i - \beta_0 - \beta_1 x_i\right) = 0 \tag{5}$$

respectively.

Before going further, we know for a fact that
$$\sum_{i=1}^{n} y_i = n\bar{y} \tag{6}$$
where $\bar{y}$ is the mean of $y$.
Now let us find the value of β0 first, from equation 4.

We can simply divide both sides by -2, which gives

$$\sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right) = 0$$

Distributing the sum and using equation 6 (together with the analogous fact $\sum_{i=1}^{n} x_i = n\bar{x}$), we get

$$n\beta_0 = n\bar{y} - n\beta_1\bar{x}$$

We simply divide everything by n and, amazingly:

$$\beta_0 = \bar{y} - \beta_1\bar{x} \tag{7}$$

Now let us find the value of β1 from equation 5

We can simply divide both sides by -2 again, which gives

$$\sum_{i=1}^{n} \left(x_i y_i - \beta_0 x_i - \beta_1 x_i^2\right) = 0$$

Using equation 7, we can substitute β0 in the above equation:

$$\sum_{i=1}^{n} \left(x_i y_i - (\bar{y} - \beta_1\bar{x})\,x_i - \beta_1 x_i^2\right) = 0$$

Note that the summation applies to everything in the above equation. We can distribute the sum to each term to get

$$\sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i + \beta_1\bar{x}\sum_{i=1}^{n} x_i - \beta_1\sum_{i=1}^{n} x_i^2 = 0$$

Using $\sum_{i=1}^{n} x_i = n\bar{x}$ (the analogue of equation 6 for x) and solving for β1, we get

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$

You can either look up or derive for yourself that $\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$. You can also easily derive that $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$. Both follow from straightforward algebra; give it a shot yourself. Substituting them into the expression for β1 gives


$$\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Voila! We are done.
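
Putting the two closed-form results together, the fit takes only a few lines of NumPy. This is a minimal sketch of the derivation above (equation 7 for β0 and the final expression for β1), again on assumed synthetic data; the comparison with np.polyfit is just a sanity check:

```python
import numpy as np

# Synthetic data (true line: y = 3 + 1.8x; values assumed purely for illustration)
rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
y = 3.0 + 1.8 * x + rng.normal(scale=2.0, size=x.shape)

x_mean, y_mean = x.mean(), y.mean()

# beta_1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# beta_0 = y_bar - beta_1 * x_bar   (equation 7)
beta_0 = y_mean - beta_1 * x_mean

print("beta_0:", round(beta_0, 3), "beta_1:", round(beta_1, 3))

# Sanity check: np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print("np.polyfit:", round(intercept, 3), round(slope, 3))
```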

Get the complete code on GitHub.

Tags

Machine Learning
Beginner
Maths