most fundamental steps in data preprocessing. • The goal of this technique is to make sure all features are on roughly the same scale. • Most real-world datasets contain features that vary widely in magnitude, units and range.
This technique also makes it easier for ML algorithms to process the data, resolves issues that several algorithms face with unscaled features, and helps them train and converge faster.
• Features on different scales affect the step size of gradient descent. • Scaling the features helps gradient descent converge more quickly towards the minimum, as the sketch below illustrates. • Example algorithms: linear regression, logistic regression, neural networks, etc.
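As a rough illustration of this point, here is a minimal NumPy sketch (not taken from the original notes; the data, learning rates and step counts are arbitrary choices). Plain batch gradient descent is run on a two-feature linear regression in which one feature is about a thousand times larger than the other: with raw features the learning rate must be tiny to stay stable, so one weight barely moves, while the standardised version converges with a single learning rate in far fewer steps.

```python
import numpy as np

# Illustrative sketch (not from the original notes): batch gradient descent
# on a two-feature linear regression. Feature 0 is on a scale of ~1 and
# feature 1 on a scale of ~1000, so the loss surface is a long narrow valley.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.normal(0, 1, n),        # small-scale feature
                     rng.normal(0, 1000, n)])    # large-scale feature
true_w = np.array([3.0, 0.002])                  # arbitrary true weights
y = X @ true_w + rng.normal(0, 0.1, n)

def gradient_descent(X, y, lr, steps):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)    # gradient of the MSE loss
        w -= lr * grad
    return w

# Unscaled: the learning rate must be tiny to stay stable in the
# large-scale direction, so the small-scale weight hardly moves.
w_raw = gradient_descent(X, y, lr=1e-7, steps=1000)

# Standardised features: a single learning rate suits both directions
# and far fewer steps are needed.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
w_std = gradient_descent(X_std, y, lr=0.1, steps=100)

print("weights on raw features:         ", w_raw)
print("weights on standardised features:", w_std)
```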
• Distance-based algorithms are heavily affected by features with large values, because we calculate the distance between data points to determine their similarity. • Unscaled features therefore give different weights to different features. • By scaling the features, we ensure that all of them contribute equally to the result (see the sketch below). • Example algorithms: KNN, K-means, and SVM.
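The effect on a distance-based model can be checked with an off-the-shelf example. The sketch below is an illustration rather than part of the original notes: it uses scikit-learn's wine dataset, where the proline feature is in the hundreds while most others are single digits, and compares a plain KNN classifier with one wrapped in a StandardScaler pipeline.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wine data: features span very different ranges, so raw Euclidean
# distances are dominated by the large-valued columns (e.g. proline).
X, y = load_wine(return_X_y=True)

knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

print("KNN accuracy, unscaled:", cross_val_score(knn_raw, X, y, cv=5).mean())
print("KNN accuracy, scaled:  ", cross_val_score(knn_scaled, X, y, cv=5).mean())
```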
• For algorithms that are not distance-based, feature scaling is unimportant; these include Naive Bayes, Linear Discriminant Analysis, and tree-based models (gradient boosting, random forest, etc.).
• Normalisation is a good choice when we do not know the distribution of the data, or when it does not follow a Gaussian distribution. • It is useful for algorithms that do not make assumptions about the data distribution. • A typical example is image data. • It is also the choice when you need the data to be between 0 and 1 (see the sketch below).
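A small sketch of min-max normalisation, assuming scikit-learn's MinMaxScaler and a made-up two-column feature matrix; the image example is just the usual divide-by-255 convention for 8-bit pixels.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: two columns on very different ranges.
X = np.array([[50.0, 30000.0],
              [20.0, 90000.0],
              [35.0, 60000.0]])

# Min-max normalisation rescales each column to [0, 1]:
#   x' = (x - min) / (max - min)
X_norm = MinMaxScaler().fit_transform(X)
print(X_norm)

# For images the same idea is usually just a division by the maximum
# possible pixel value: uint8 pixels in [0, 255] become floats in [0, 1].
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
print(pixels / 255.0)
```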
• Standardisation transforms a feature by subtracting the mean and dividing by the standard deviation. • As a result, all the values are centred around zero with a unit standard deviation. • Each feature is standardised separately, since the features are on different scales. • It handles outliers better than min-max normalisation.
• The result is features with zero mean and unit standard deviation, as the sketch below shows. • If the feature distribution is normal or Gaussian, we can use standardisation, although this does not have to be strictly true.
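A short sketch of standardisation, again using scikit-learn's StandardScaler on a hypothetical age/income table; printing the column means and standard deviations confirms they come out as roughly 0 and 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: age in years and income in dollars.
X = np.array([[25.0,  40000.0],
              [32.0,  65000.0],
              [47.0, 120000.0],
              [51.0,  90000.0]])

# Standardisation: z = (x - mean) / std, applied column by column.
X_std = StandardScaler().fit_transform(X)

print(X_std)
print("column means:", X_std.mean(axis=0))  # ~0 for every feature
print("column stds: ", X_std.std(axis=0))   # ~1 for every feature
```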