70% missing values.
• Fill missing values with median → numerical data
• Fill missing values with mode → categorical data
[Figure: Missing values distribution. x-axis: percent of missing values (0 to 100); y-axis: density]
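The two fill rules above can be sketched roughly as follows. This is a minimal illustration, assuming the data sits in a pandas DataFrame; the helper name `impute_missing` and the toy columns `area` and `zone` are hypothetical and do not come from the slides.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with their median and other columns with their mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().all():
            continue  # no observed values to estimate a median or mode from
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical data: the median is robust to outliers
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical data: use the most frequent value
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "area": [50.0, np.nan, 70.0, 90.0],
    "zone": ["A", "B", None, "B"],
})
filled = impute_missing(toy)
print(filled)
```

If a column has several equally frequent values, `mode()` returns them sorted and this sketch simply takes the first one.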
• One-hot encoding → categorical data
• Train : test ratio = 80% : 20%
[Figure: Number of columns distributed by data types. Axes: number of columns (0 to 100) vs. data type (int64, object, float64)]
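A rough sketch of these two steps, assuming pandas is used. The toy columns `area` and `zone`, the random seed, and the choice of a shuffled split via `DataFrame.sample` are illustrative assumptions; the slides do not say how the split was actually performed.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "area": [50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0],
    "zone": ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C"],
})

# One-hot encode every non-numeric (categorical) column
cat_cols = toy.select_dtypes(exclude="number").columns.tolist()
encoded = pd.get_dummies(toy, columns=cat_cols)

# 80 : 20 train/test split: sample 80% of the rows, keep the rest for testing
train = encoded.sample(frac=0.8, random_state=42)  # seed is an arbitrary choice
test = encoded.drop(train.index)

print(encoded.columns.tolist())
print(len(train), len(test))
```

Encoding before splitting keeps the same indicator columns in both subsets. With a real dataset, a category that appears only in unseen data would still need separate handling.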