Machine Learning-Linear Regression

“Data is a powerful resource, and machine learning is the art of extracting useful information from a data set”

To practice this art, machine learning offers a range of techniques and algorithms. A machine learning problem is defined by three attributes: Performance, Task, and Experience. Performance on a task improves with past data, or experience.

Machine learning is classified into three major types:

1) Supervised Learning

When a data set has a predefined set of labeled inputs/outputs, it is easier to train a model to find relationships between the different entities. Examples: house price prediction, email spam classification.

2) Unsupervised Learning

When a data set doesn’t have a predefined set of labeled inputs/outputs, we can train a model to cluster the data based on its characteristics and similarities. Examples: anomaly/cancer detection, fraudulent transactions.

3) Reinforcement Learning

When the required data isn’t given, a series of experiments is performed and the data is collected along the way. The collected data must represent the whole population in order to give accurate results. Example: playing computer games and improving accuracy through a system of reward and punishment.

Data Pre-Processing:


Data preprocessing in machine learning is the process of preparing (cleaning and organizing) raw data so that it is suitable for building and training machine learning models.

  • Collecting a large amount of relevant data
  • Replacing null values with the column median and keeping a consistent data type within each column
  • Dropping unnecessary features
  • Representing characters/strings as numbers using LabelEncoder
  • Data normalization: This technique is required when the features are on different scales. For example, human-readable quantities such as a person's age = 25 or the speed of a car = 80 km/h have very different ranges, and the larger values can dominate the model. Normalization solves this problem. The formula used is shown below.

[Figure: normalization formula]

After pre-processing, the values lie in the range -1 to +1.

  • Separating input and output from raw data
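
The steps above can be sketched with NumPy. This is a minimal illustration, not the original post's code: the data values and column names are hypothetical, the normalization is assumed to be mean normalization, (x - mean) / (max - min), which is consistent with the stated -1 to +1 range, and `np.unique` stands in for scikit-learn's `LabelEncoder`.

```python
import numpy as np

# Hypothetical raw data: age (with one missing value), speed in km/h, a text label.
age = np.array([25.0, 32.0, np.nan, 41.0, 38.0])
speed = np.array([80.0, 60.0, 100.0, 120.0, 90.0])
city = np.array(["Pune", "Delhi", "Pune", "Mumbai", "Delhi"])

# Replace nulls with the column median (nanmedian ignores the NaN entry).
age = np.where(np.isnan(age), np.nanmedian(age), age)

# Represent strings as integers. np.unique sorts the labels and returns each
# row's index into that sorted list, the same idea as LabelEncoder.
classes, city_codes = np.unique(city, return_inverse=True)

def mean_normalize(column):
    """Assumed formula: (x - mean) / (max - min), giving values within -1..+1."""
    return (column - column.mean()) / (column.max() - column.min())

age_scaled = mean_normalize(age)
speed_scaled = mean_normalize(speed)

# Separate input (X) and output (y); speed is arbitrarily treated as the output.
X = np.column_stack([age_scaled, city_codes])
y = speed_scaled
```

Here the missing age becomes 35.0 (the median of the four known values), and the cities are encoded in alphabetical order as Delhi = 0, Mumbai = 1, Pune = 2.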


Libraries used:

  • Pandas: Manipulating raw data.
  • Numpy: Mathematical calculations.
  • Matplotlib (pyplot): Plotting graphs.


Why matrices: A list is represented as a vector, and multiple vectors form a matrix. In this form, calculations are easier, especially when the data set is large.

[Figures: matrix addition, matrix multiplication, matrix transpose]
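
As a small illustration of these three operations, here is a NumPy sketch. The matrices `A` and `B` are made-up values, not taken from the original figures.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

added = A + B       # element-wise addition      -> [[6, 8], [10, 12]]
product = A @ B     # matrix multiplication      -> [[19, 22], [43, 50]]
transposed = A.T    # rows become columns        -> [[1, 3], [2, 4]]
```

Note that `A * B` in NumPy would multiply element by element; `@` is the operator for the row-by-column matrix product.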

Why differentiation: To find the small changes while building our model. The derivative applies to single-variable functions, and the partial derivative applies to multivariate functions. To calculate a partial derivative, we change the value of one variable while keeping the others constant.
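
To make "change one variable while keeping the others constant" concrete, here is a minimal numerical sketch. The function `f` and the helper names are hypothetical, chosen only for illustration; the estimate uses a central difference.

```python
def f(x, y):
    """Hypothetical example function: f(x, y) = x**2 + 3*x*y."""
    return x ** 2 + 3 * x * y

def partial_x(func, x, y, h=1e-6):
    """Estimate df/dx by nudging x only; y is held constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    """Estimate df/dy by nudging y only; x is held constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

# Analytically, df/dx = 2x + 3y and df/dy = 3x.
dx = partial_x(f, 2.0, 1.0)   # close to 2*2 + 3*1 = 7
dy = partial_y(f, 2.0, 1.0)   # close to 3*2 = 6
```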

Supervised Learning can be classified into two types:

  • Regression: The output is a continuous range of values.
  • Classification: The output is discrete; the model predicts which one of a set of groups an input belongs to.

Every supervised learning model works on data that is divided into two parts: Input (X) and Output (Y). To understand the data and find a suitable model for it, start with a scatter plot.

Regression Model:

Looking at the data points below, it is natural to describe the data with a line.

[Figure: data points]

[Figure: Equation of a line]

To predict accurate values, we need a best-fit line that represents the whole data set. The variables are then renamed to follow the notation commonly used in machine learning.

[Figure: line equation in machine learning notation]

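  • x: Input parameter
  • Θ0: Bias (intercept): The predicted output when x = 0; it shifts the line up or down. This is not the same as inductive bias, which means choosing one generalization over another from the set of possible generalizations.
  • Θ1: Weight (slope): A model parameter learned from the data. Unlike a hyperparameter such as the learning rate, it is not set by hand.
  • y: Output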
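
The original equation images are not reproduced here. Assuming they showed the usual slope-intercept form and its machine learning counterpart, the renaming would look roughly like this (the symbols m, c, and h_Θ are assumptions, not taken from the images):

```latex
y = m x + c
\quad\longrightarrow\quad
h_\Theta(x) = \Theta_0 + \Theta_1 x
```

Under that reading, the intercept c becomes Θ0 and the slope m becomes Θ1.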

Error Function/Cost:

[Figure: error function]

The objective of the model is to bring the error as close to 0 as possible.

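[Figure: Minimize the error]

To calculate the total error over all data points, we use a squared error function. Squaring keeps positive and negative errors from cancelling each other out.

[Figure: squared error function]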

Model: X*Θ^T
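
A minimal sketch of the matrix form X*Θ^T, assuming X carries a leading column of ones so that Θ0 acts as the bias, and Θ is a row vector [Θ0, Θ1]. The numbers are illustrative only.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Prepend a column of ones so that theta0 is multiplied by 1 in every row.
X = np.column_stack([np.ones_like(x), x])   # shape (3, 2)
theta = np.array([[1.0, 2.0]])              # row vector [theta0, theta1], shape (1, 2)

predictions = X @ theta.T                   # shape (3, 1): 1 + 2*x for each row
```

With these values the predictions are 3, 5, and 7, the same as evaluating Θ0 + Θ1·x one row at a time.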

Minimize Error: We can minimize the error by trying out different Θ1 values in our model.
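
Trying out different Θ1 values can be sketched as follows. The cost used here is the common form J = 1/(2m) · Σ(prediction − y)², which is an assumption because the original formula image is missing; the data and the candidate values are hypothetical.

```python
import numpy as np

# Hypothetical data that follows y = 2x exactly, so theta1 = 2 is the best fit.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

def squared_error(theta0, theta1, x, y):
    """Assumed cost: J = 1/(2m) * sum((theta0 + theta1*x - y)**2)."""
    m = len(x)
    predictions = theta0 + theta1 * x
    return float(np.sum((predictions - y) ** 2) / (2 * m))

# Try a few theta1 values with theta0 fixed at 0 and keep the lowest cost.
candidates = [0.0, 1.0, 2.0, 3.0]
costs = {t1: squared_error(0.0, t1, x, y) for t1 in candidates}
best_theta1 = min(costs, key=costs.get)
```

The costs come out as 15.0, 3.75, 0.0, and 3.75, so the search picks Θ1 = 2. Guessing by hand like this does not scale, which is the motivation for gradient descent.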

Gradient Descent:

[Figure: squared error for different theta values]

The graph plots the squared error value against different theta values.

No matter how high the error value is, we need to bring it down to a minimum. To do that, we can use the slope of the curve (∂y/∂x, the partial derivative of y with respect to x), as shown below. Stepping in the direction of the negative slope ensures we move down the curve, and ‘C’ is the learning rate, which controls how far we move theta at each step.

[Figure: Gradient descent formula]

Choosing a ‘learning rate’ is important for a model, and it is usually found by trial and error: too small a rate makes progress slow, and too large a rate can overshoot the minimum. Gradient descent automates the search for theta, and the same update rule is used by many machine learning models.
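
A minimal sketch of gradient descent for the line model, under stated assumptions: the update rule is taken to be Θj := Θj − C · ∂J/∂Θj with the cost J = 1/(2m) · Σ(prediction − y)², since the original formula image is missing. The function name, the data, and the default values are hypothetical; `learning_rate` plays the role of ‘C’.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, steps=2000):
    """Fit y ~ theta0 + theta1*x by batch gradient descent on the
    assumed cost J = 1/(2m) * sum((prediction - y)**2)."""
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(steps):
        error = theta0 + theta1 * x - y
        grad0 = error.sum() / m          # dJ/dtheta0
        grad1 = (error * x).sum() / m    # dJ/dtheta1
        # Step against the slope; both gradients use the same old thetas.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Hypothetical data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
theta0, theta1 = gradient_descent(x, y)
```

On this data the result settles very close to Θ0 = 1 and Θ1 = 2. Raising `learning_rate` too far makes the updates diverge instead of converge, which is one way to see the trial-and-error aspect described above.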
