## Machine Learning: Linear Regression

“Data is a remarkable resource, and machine learning is the art of extracting useful information from a data set.”

To practice this art, machine learning offers a range of techniques and algorithms. A machine learning problem is defined by three attributes: Performance, Task, and Experience. Performance on a task improves with past data, that is, with experience.

Machine learning is divided into three major types:

## 1) Supervised Learning

When a data set comes with a predefined set of labeled inputs and outputs, it is easier to train a model to find relationships between the variables. Examples: house price prediction, email spam classification.

## 2) Unsupervised Learning

When a data set does not have a predefined set of labeled inputs and outputs, we can train a model to cluster the data based on its features and similarities. Examples: anomaly detection, such as flagging abnormal (for instance cancerous) samples or fraudulent transactions when no labels are available.

## 3) Reinforcement Learning

When the relevant data is not given up front, a series of experiments is performed and the data is collected along the way. The collected data must represent the whole population in order to give accurate results. Example: playing computer games and improving accuracy through a reward and punishment system.

## Data Pre-Processing:

Data preprocessing in machine learning refers to preparing (cleaning and organizing) the raw data so that it is suitable for building and training machine learning models. Typical steps are:

- Collecting a large amount of relevant data
- Replacing null values with the column median and keeping a consistent data type within each column
- Dropping unnecessary features
- Representing characters/strings as numbers using LabelEncoder
- Data normalization: This technique is required when features are on different scales. For example, human-readable values such as a person's age = 25 or the speed of a car = 80 km/h have very different magnitudes, which can mislead a model. Normalization solves this by rescaling every feature to a comparable range.

After pre-processing, the data will be in the range -1 to +1.

- Separating input and output from raw data
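
The steps in this list can be sketched with pandas. This is only an illustration under several assumptions: the data frame, its column names, and the `preprocess` helper are hypothetical; pandas category codes stand in for `LabelEncoder`; and the scaling uses mean normalization, `(x - mean) / (max - min)`, which is one common formula consistent with the stated range of -1 to +1.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: column names and values are made up for illustration.
raw = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "speed_kmh": [80.0, 60.0, 100.0, np.nan],
    "city": ["Pune", "Delhi", "Pune", "Agra"],
    "row_id": [1, 2, 3, 4],              # carries no signal, dropped below
    "price": [10.0, 12.0, 15.0, 11.0],   # output column
})

def preprocess(df, target="price", drop=("row_id",)):
    # Drop unnecessary features (returns a new frame, the input is untouched).
    df = df.drop(columns=list(drop))

    # Replace nulls with the column median (numeric columns only).
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Encode strings as integer codes (a stand-in for LabelEncoder).
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = df[col].astype("category").cat.codes

    # Separate input and output before scaling, so the target keeps its units.
    X = df.drop(columns=[target]).astype(float)
    y = df[target]

    # Mean normalization (assumed formula); every value ends up in [-1, 1].
    # Note: a constant column would divide by zero and needs special handling.
    X = (X - X.mean()) / (X.max() - X.min())
    return X, y

X, y = preprocess(raw)
```

Scaling only the inputs is a design choice made here so that predictions stay in the original units of the output.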

## Frameworks:

- pandas: manipulating raw data.
- NumPy: mathematical calculations.
- Matplotlib (pyplot): plotting graphs.

## Mathematics:

*Why matrices:* A list of values is represented as a vector, and multiple vectors together form a matrix. In this form, calculations are easier to express and faster to run, especially on large data sets.
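
As a small illustration of this point, the sketch below computes the same result twice with NumPy: once row by row, and once as a single matrix-vector product. The numbers in `X` and `theta` are arbitrary example values.

```python
import numpy as np

# Each row of X is one sample (a vector); stacking the rows gives a matrix.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
theta = np.array([0.5, 2.0])   # hypothetical parameter values

# Loop version: one dot product per row, written out by hand.
loop_result = np.array(
    [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]
)

# Matrix version: a single matrix-vector product does the same work.
matrix_result = X @ theta
```

Both arrays hold the same values; the matrix form is shorter and lets NumPy run the arithmetic in optimized code instead of a Python loop.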

## Supervised Learning can be classified into two types:

- Regression: the output is a continuous range of values.
- Classification: the output is discrete, and each input is predicted to belong to one of a fixed set of groups.

Every supervised learning model works on data that should be divided into two parts: input (X) and output (Y). To understand the data and find a suitable model for it, use a **scatter plot**.
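
A scatter plot of input against output might be drawn as follows. The data here is synthetic (a noisy linear trend generated for the example), and the axis labels are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data for illustration: a noisy linear trend.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, size=x.size)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("Input (X)")
ax.set_ylabel("Output (Y)")
ax.set_title("Scatter plot of the data")
# plt.show()  # uncomment to display the figure interactively
```

If the points cluster around a straight line, as they do for this synthetic data, a linear regression model is a reasonable first choice.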

## Regression Model:

When the data points in a scatter plot follow a roughly linear trend, we describe the data with a line.

To predict accurate values, we need a best-fit line that represents the whole data set. Rewriting the line equation y = mx + c with the symbols commonly used in machine learning gives y = Θ0 + Θ1·x, where:

- x: input parameter
- Θ0: bias (intercept). The predicted output when x = 0; it shifts the line up or down.
- Θ1: weight (slope). A model parameter learned from the data. It is not a hyperparameter; hyperparameters, such as the learning rate, are set before training.
- y: output
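
In code, and assuming the line has the usual form y = Θ0 + Θ1·x, the model is a one-line function. The name `predict` and the parameter values below are chosen only for illustration.

```python
def predict(x, theta0, theta1):
    """Return the line's prediction y = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameter values, chosen only to illustrate the call.
print(predict(4.0, theta0=1.0, theta1=2.0))  # 1 + 2 * 4 = 9.0
```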

## Error Function/Cost:

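Every ML model's objective is to make the error as small as possible, ideally close to 0.

To combine the errors over all data points into a single number, we use a squared error function: each error (prediction minus actual output) is squared and the squares are summed, so positive and negative errors do not cancel each other out.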

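Model: X·Θ^T, where each row of X is one sample with a leading 1 (so that Θ0 acts as the bias) and Θ = [Θ0, Θ1].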
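
A minimal sketch of the squared error cost for this model is shown below. The helper names are hypothetical, and the 1/(2m) scaling (m being the number of samples) is a common convention assumed here rather than something the text specifies.

```python
import numpy as np

def add_bias_column(x):
    """Build X with a leading column of ones so that theta0 multiplies 1."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])

def squared_error_cost(X, y, theta):
    """Sum of squared errors, scaled by 1/(2m) (an assumed convention)."""
    m = len(y)
    errors = X @ theta.T - y      # predictions minus actual outputs
    return float(np.sum(errors ** 2) / (2 * m))

x = [1.0, 2.0, 3.0]
y = np.array([2.0, 4.0, 6.0])     # toy data lying exactly on y = 2x
X = add_bias_column(x)

perfect = squared_error_cost(X, y, np.array([0.0, 2.0]))  # the true line
worse = squared_error_cost(X, y, np.array([0.0, 1.0]))    # slope too small
```

On this toy data the true line gives a cost of zero, while the line with the wrong slope gives a positive cost, which is the behavior the cost function is meant to capture.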

Minimize Error: We can minimize the error by trying out different Θ1 values in our model and keeping the one that gives the lowest cost.
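
The idea of trying out different Θ1 values can be sketched as a brute-force sweep. The toy data, the fixed Θ0 = 0, the candidate grid, and the 1/(2m) scaling of the cost are all assumptions made for this example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # toy data generated from y = 2x

def cost(theta1, theta0=0.0):
    """Squared error cost of the line theta0 + theta1 * x on the toy data."""
    errors = theta0 + theta1 * x - y
    return float(np.sum(errors ** 2) / (2 * len(y)))

# Try theta1 = 0.0, 0.1, ..., 4.0 and keep the value with the lowest cost.
candidates = np.linspace(0.0, 4.0, 41)
costs = np.array([cost(t) for t in candidates])
best_theta1 = float(candidates[np.argmin(costs)])
```

A sweep like this works for one parameter on a small grid, but it becomes expensive quickly as parameters are added, which is why a more systematic search is normally used.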

## Gradient Descent: