Machine Learning-Linear Regression
“Data is a wonderful thing, and machine learning is the art of extracting useful information from a data set.”
To practice this art, machine learning offers a variety of techniques and algorithms. A machine learning problem is defined by three attributes: Performance, Task, and Experience. The performance on a task improves based on past data or experience.
Machine learning is classified into three major categories:
1) Supervised Learning
When a given data set has a predefined set of labeled inputs/outputs, it is easier to train a model to find relationships between the different entities. Examples: house price prediction, email spam classification.
2) Unsupervised Learning
When a given data set doesn’t have a predefined set of labeled inputs/outputs, we can train a model to cluster the data based on its characteristics and similarities. Examples: anomaly/cancer detection, fraudulent transactions.
3) Reinforcement Learning
When the relevant data isn’t given, a series of experiments is performed and the data is collected. The collected data must represent the whole population in order to give accurate results. Example: playing computer games and improving accuracy through a reward and punishment system.
Data preprocessing in machine learning refers to preparing (cleaning and organizing) the raw data to make it suitable for building and training machine learning models. Typical steps include:
- Collecting a large amount of relevant data
- Replacing null values with the column median and keeping a single, consistent data type in each column
- Dropping unnecessary features
- Representing characters/strings as numbers using LabelEncoder
- Data normalization: This technique is required when the features are measured on different scales. For example, human-readable values such as a person's age = 25 or the speed of a car = 80 km/h sit on very different numeric ranges, and the feature with the larger numbers can dominate the model. Normalization rescales every feature to a comparable range.
After pre-processing, the data will lie in the range -1 to +1.
- Separating input and output from raw data
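The preprocessing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names and values are made up, pandas category codes stand in for LabelEncoder, and mean normalization, (x - mean) / (max - min), is assumed as the scaling formula because it keeps values within -1 to +1.

```python
import pandas as pd

# Hypothetical raw data; column names and values are made up for illustration.
raw = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "speed_kmh": [80.0, 60.0, None, 100.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "row_id": [1, 2, 3, 4],              # carries no information, dropped below
    "price": [10.0, 12.0, 20.0, 15.0],   # the output we want to predict
})

def preprocess(df):
    # Drop a feature that is not useful for prediction.
    df = df.drop(columns=["row_id"])

    # Replace nulls in numeric columns with the column median.
    num_cols = ["age", "speed_kmh"]
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Turn strings into integer codes (a stand-in for LabelEncoder;
    # codes follow the sorted order of the distinct values).
    df["city"] = df["city"].astype("category").cat.codes

    # Mean normalization (assumed formula): (x - mean) / (max - min).
    features = ["age", "speed_kmh", "city"]
    X = df[features].astype(float)
    X = (X - X.mean()) / (X.max() - X.min())

    # Separate input (X) from output (y).
    y = df["price"]
    return X, y

X, y = preprocess(raw)
```

A column holding a single constant value would make `max - min` zero, so a real pipeline would need to handle that case before dividing.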
Python libraries commonly used for these steps:
- Pandas: manipulating raw data.
- NumPy: mathematical calculations.
- Matplotlib (pyplot): plotting graphs.
Why matrices: A list of values is represented as a vector, and multiple vectors together form a matrix. In this form, calculations are easier and faster, especially on large data sets.
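As a small illustration of the point about vectors and matrices, the sketch below stacks two lists into a NumPy matrix and compares one matrix-vector product with an explicit loop. The numbers and the `weights` vector are arbitrary values chosen for the example.

```python
import numpy as np

# Two features for three samples; the numbers are arbitrary.
ages = [25, 32, 47]
speeds = [80, 60, 100]

# Each list becomes a vector; stacking the vectors as columns gives a matrix.
X = np.column_stack([ages, speeds])      # shape (3, 2)
weights = np.array([0.5, 0.1])           # hypothetical weights, one per feature

# One matrix-vector product replaces an explicit loop over the rows.
vectorized = X @ weights
looped = [sum(w * v for w, v in zip(weights, row)) for row in X]
```

Both approaches give the same numbers; the matrix form is shorter and, on large data, typically much faster because the loop runs inside NumPy.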
Supervised Learning can be classified into two types:
- Regression: the output is a continuous range of values.
- Classification: the output is discrete, and each prediction falls into one of a fixed set of groups.
Every supervised learning data set should be divided into two parts: input (X) and output (Y). To find a suitable model for the data, start by plotting Y against X in a scatter plot.
When the data points in the scatter plot roughly follow a straight path, we tend to describe such data by a line.
To predict accurate values, we need a best-fit line that represents the whole data set. Using the variable names common in machine learning, the line is written as y = Θ0 + Θ1·x, where:
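One possible way to split a data set into input and output and look at it in a scatter plot is sketched below. The data values and the helper name `scatter_xy` are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data set: column 0 is the input, column 1 is the output.
data = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]])
X = data[:, 0]   # input
Y = data[:, 1]   # output

def scatter_xy(x, y):
    """Return a figure with y plotted against x as individual points."""
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("X (input)")
    ax.set_ylabel("Y (output)")
    return fig

fig = scatter_xy(X, Y)
```

Calling `plt.show()` afterwards displays the figure in an interactive session.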
- x: input parameter
- Θ0: bias (intercept). The value the model predicts when x = 0; it shifts the line up or down.
- Θ1: weight (slope). A model parameter, not a hyperparameter: it is learned from the training data and sets how much y changes per unit change in x.
- y: output
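The terms listed above combine into a straight-line model, y = Θ0 + Θ1·x. A minimal sketch, where the function name `predict` and the sample parameter values are chosen only for illustration:

```python
import numpy as np

def predict(x, theta0, theta1):
    """Straight-line model: y = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

# With theta0 = 1 and theta1 = 2 the line passes through (0, 1), (1, 3), (2, 5).
y_hat = predict([0, 1, 2], theta0=1.0, theta1=2.0)
```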
The objective of training is to make the prediction error as small as possible, ideally close to 0.
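To calculate the total error over all data points, we use a squared error function: the difference between each predicted and actual value is squared, and the squares are summed.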
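A squared error function for the straight-line model might be written as below. The division by 2m (with m the number of data points) is a common convention and is an assumption here, as is the function name.

```python
import numpy as np

def squared_error_cost(theta0, theta1, x, y):
    """Sum of squared prediction errors, scaled by 1 / (2 * m)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = (theta0 + theta1 * x) - y    # predicted minus actual
    return np.sum(errors ** 2) / (2 * len(x))

# Toy data generated from y = 2x, so theta0 = 0 and theta1 = 2 fit exactly.
x = [1, 2, 3]
y = [2, 4, 6]
perfect = squared_error_cost(0.0, 2.0, x, y)   # every error is zero
worse = squared_error_cost(0.0, 1.0, x, y)     # errors are -1, -2, -3
```

Squaring keeps positive and negative errors from cancelling each other and penalizes large errors more heavily than small ones.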
Minimize error: We can minimize the error by trying out different Θ1 values in our model and keeping the one that gives the smallest squared error.
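The idea of trying different Θ1 values can be sketched as a simple sweep. The assumptions here are toy data generated with Θ1 = 2, Θ0 held fixed at 0, and a hand-picked grid of candidate values. In practice an optimizer such as gradient descent usually replaces this brute-force search.

```python
import numpy as np

# Toy data generated from y = 2x (theta0 = 0, theta1 = 2).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

def cost(theta1, theta0=0.0):
    """Squared error of the line theta0 + theta1 * x, scaled by 1 / (2 * m)."""
    errors = (theta0 + theta1 * x) - y
    return np.sum(errors ** 2) / (2 * len(x))

# Try 41 evenly spaced candidate values between 0 and 4 (step 0.1).
candidates = np.linspace(0.0, 4.0, 41)
costs = np.array([cost(t) for t in candidates])

# Keep the candidate with the smallest error.
best_theta1 = candidates[np.argmin(costs)]
```

Plotting `costs` against `candidates` gives a bowl-shaped curve whose lowest point sits at the best Θ1.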