need to recompile the source code
• Can combine Python with other languages
• Supports different programming styles
• Flexibility reduces the possibility of errors
first and know your goal
• It gives hints for data cleaning
• It gives ideas for feature engineering
• Data size? Number of features? Type of each feature? Target variable?
• Data distribution plots
• Correlation plots
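The questions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed workflow: the toy `rows` dataset, the choice of `price` as target variable, and the helper names `summarize`, `pearson`, and `histogram` are all assumptions made for this example. The text histogram stands in for a distribution plot, and the Pearson coefficient is the number a correlation plot would display.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dataset; "price" is assumed to be the target variable.
rows = [
    {"area": 50.0, "rooms": 2, "city": "A", "price": 150.0},
    {"area": 80.0, "rooms": 3, "city": "B", "price": 240.0},
    {"area": 65.0, "rooms": 3, "city": "A", "price": 200.0},
    {"area": 120.0, "rooms": 5, "city": "B", "price": 360.0},
]
target = "price"


def summarize(rows, target):
    """Answer: data size? number of features? type of each feature?"""
    columns = list(rows[0].keys())
    return {
        "n_rows": len(rows),
        "n_features": len([c for c in columns if c != target]),
        "types": {c: type(rows[0][c]).__name__ for c in columns},
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def histogram(values, bin_width):
    """Count values per bin (keyed by the bin's lower edge)."""
    return Counter(int(v // bin_width) * bin_width for v in values)


summary = summarize(rows, target)
areas = [r["area"] for r in rows]
prices = [r[target] for r in rows]
area_price_corr = pearson(areas, prices)
area_hist = histogram(areas, bin_width=50)

print(summary)
print(f"corr(area, price) = {area_price_corr:.3f}")
for edge in sorted(area_hist):
    print(f"{edge:>4}+ | {'#' * area_hist[edge]}")
```

On a real dataset, a dataframe library and a plotting library would normally do this work; the sketch only shows what each question measures.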
typical challenge is feeding the right data
• Better data beats fancier algorithms
• Garbage in, garbage out
• The goal of data cleaning is to identify and remove errors such as:
    • Missing data
    • Outliers
    • Bad data and duplicates
    • Irrelevant features
    • Standardization
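The cleaning steps listed above can be sketched as a small pipeline. This is one possible illustration under stated assumptions, not a definitive recipe: the `raw` records, the valid age range of 0 to 120, the 1.5 × IQR outlier rule, and the z-score form of standardization are all choices made for this example, and `clean` and `standardize` are hypothetical helper names.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical raw records with typical problems mixed in.
raw = [
    {"age": 25, "income": 30000.0, "note": "x"},
    {"age": 32, "income": 42000.0, "note": "y"},
    {"age": 32, "income": 42000.0, "note": "y"},    # duplicate
    {"age": None, "income": 38000.0, "note": "z"},  # missing value
    {"age": -4, "income": 35000.0, "note": "w"},    # bad data (impossible age)
    {"age": 41, "income": 51000.0, "note": "v"},
    {"age": 38, "income": 47000.0, "note": "u"},
    {"age": 29, "income": 900000.0, "note": "t"},   # outlier
    {"age": 45, "income": 55000.0, "note": "s"},
]


def clean(rows, keep=("age", "income")):
    # 1. Irrelevant features: keep only the columns of interest.
    rows = [{k: r[k] for k in keep} for r in rows]
    # 2. Missing data: drop rows that contain any missing value.
    rows = [r for r in rows if all(v is not None for v in r.values())]
    # 3. Bad data: drop rows with impossible values (assumed valid range).
    rows = [r for r in rows if 0 <= r["age"] <= 120]
    # 4. Duplicates: keep the first occurrence of each record.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r[k] for k in keep)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 5. Outliers: drop incomes outside the 1.5 * IQR fences.
    q1, _, q3 = quantiles([r["income"] for r in unique], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [r for r in unique if low <= r["income"] <= high]


def standardize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


cleaned = clean(raw)
incomes = [r["income"] for r in cleaned]
scaled_incomes = standardize(incomes)

print(f"{len(raw)} raw rows -> {len(cleaned)} cleaned rows")
print([round(z, 2) for z in scaled_incomes])
```

Dropping rows is the simplest treatment of missing values and outliers. Imputing or capping them is an equally common choice, and which one fits depends on the data.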
future events, while statistics involves the analysis of the frequency of past events.
• A collection of tools used to answer important questions about data
• Descriptive and inferential statistical methods
• Descriptive statistics help to gain knowledge from the raw data by summarizing it
• Inferential statistics help to draw conclusions about a larger population from a sample of data
• Difference between statistical models and machine learning
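The split between descriptive and inferential statistics can be illustrated with Python's standard `statistics` module. This is a minimal sketch under assumptions: the `sample` values are invented, `describe` and `mean_confidence_interval` are hypothetical helper names, and the confidence interval uses a normal approximation, which is only rough for a sample this small (a t-distribution would be the more careful choice).

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample of ten measurements drawn from a larger population.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]


def describe(xs):
    """Descriptive statistics: summarize the data that was actually observed."""
    return {"n": len(xs), "mean": mean(xs), "median": median(xs), "stdev": stdev(xs)}


def mean_confidence_interval(xs, level=0.95):
    """Inferential statistics: estimate a range for the population mean.

    Uses the normal approximation: mean +/- z * s / sqrt(n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(xs) / sqrt(len(xs))
    centre = mean(xs)
    return centre - half_width, centre + half_width


summary = describe(sample)
ci_low, ci_high = mean_confidence_interval(sample)

print(summary)
print(f"95% CI for the population mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

`describe` makes statements only about the ten observed values, while `mean_confidence_interval` makes a hedged statement about the population they were drawn from.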