Introduction to Statistics with Python

barrachri

April 16, 2016

Transcript

  1. THE STORY OF THIS TALK: 3 DAYS BEFORE THE CONFERENCE

    VALERIO: CHRISTIAN, WE HAVE A FREE SLOT AND WE NEED A TALK
    CHRISTIAN: I CAN’T IN 3 DAYS…
    VALERIO: YOU MUST.
  2. CONTENT

    1. What is STATISTICS?
    2. Variable types
    3. Univariate distribution
    4. Frequencies
    5. M^3 (Mean, Median, Mode)
    6. Variance and Standard Deviation
    7. Multivariate distribution
    8. Covariance and Correlation
  3. “… THE BRANCH OF SCIENCE OR MATHEMATICS CONCERNED WITH THE ANALYSIS AND INTERPRETATION OF NUMERICAL DATA AND APPROPRIATE WAYS OF GATHERING SUCH DATA.”

    (Oxford English Dictionary)
  4. “STATISTICS IS THE SCIENCE OF LEARNING FROM DATA, AND OF MEASURING, CONTROLLING, AND COMMUNICATING UNCERTAINTY; AND IT THEREBY PROVIDES THE NAVIGATION ESSENTIAL FOR CONTROLLING THE COURSE OF SCIENTIFIC AND SOCIETAL ADVANCES”

    (American Statistical Association)
  5. “THE BEST THING ABOUT BEING A STATISTICIAN IS THAT YOU GET TO PLAY IN EVERYONE ELSE'S BACKYARD.”

    (John Tukey, Bell Labs, Princeton University)
  6. “THERE ARE THREE KINDS OF LIES: LIES, DAMNED LIES, AND STATISTICS.”

    (Mark Twain)
  7. 4 KINDS OF VARIABLES

    • QUANTITATIVE VARIABLES
      • CONTINUOUS
      • DISCRETE
    • CATEGORICAL VARIABLES
      • ORDINAL
      • NOMINAL
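
The four kinds of variables map fairly naturally onto pandas dtypes. The sketch below is only an illustration: the column names and values are hypothetical and are not taken from the talk's dataset.

```python
import pandas as pd

# Hypothetical data: none of these columns come from the talk's dataset.
df = pd.DataFrame({
    "price": [9.99, 14.50, 3.25, 14.50],        # quantitative, continuous
    "units_sold": [3, 1, 7, 2],                 # quantitative, discrete
    "size": ["S", "L", "M", "S"],               # categorical, ordinal
    "colour": ["red", "blue", "red", "green"],  # categorical, nominal
})

# Ordinal: the categories have a meaningful order, so min/max and comparisons work.
df["size"] = pd.Categorical(df["size"], categories=["S", "M", "L"], ordered=True)

# Nominal: the categories are labels only, with no order.
df["colour"] = pd.Categorical(df["colour"])

print(df.dtypes)
print(df["size"].min(), df["size"].max())
```

Declaring the order explicitly matters: without `ordered=True`, pandas refuses to compute `min()` or `max()` on a categorical column.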
  8. DIFFERENT TYPES OF FREQUENCY

    • ABSOLUTE FREQUENCY (n_i): number of observations for each value of the variable
    • ABSOLUTE CUMULATIVE FREQUENCY (N_i): N_i = N_{i-1} + n_i
    • RELATIVE FREQUENCY (f_i): absolute frequency divided by the total number of observations (N), f_i = n_i / N
    • RELATIVE CUMULATIVE FREQUENCY (F_i): F_i = F_{i-1} + f_i
    • % FREQUENCY: f_i * 100
    • % CUMULATIVE FREQUENCY: F_i * 100
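
The definitions above can be checked by hand on a small sample. This is a minimal sketch using only the standard library; the `observations` list is made up for illustration and the variable names simply mirror the symbols on the slide.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical observations of one discrete variable (not the talk's data).
observations = [2, 3, 1, 2, 2, 3, 1, 2, 4, 2]

counts = Counter(observations)
values = sorted(counts)            # distinct values, in ascending order
N = len(observations)              # total number of observations

n = [counts[v] for v in values]    # absolute frequency n_i
N_cum = list(accumulate(n))        # N_i = N_{i-1} + n_i
f = [n_i / N for n_i in n]         # relative frequency f_i = n_i / N
# F_i = F_{i-1} + f_i; computed as N_i / N to avoid accumulating float error.
F = [N_i / N for N_i in N_cum]

print("value  n_i  N_i    f_i    F_i")
for v, n_i, N_i, f_i, F_i in zip(values, n, N_cum, f, F):
    print(f"{v:>5} {n_i:>4} {N_i:>4} {f_i:>6.1%} {F_i:>6.1%}")
```

The last cumulative values act as a sanity check: N_i must end at N and F_i must end at 1.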
  9. 3 MAIN CONCEPTS

    • OBSERVATIONAL UNITS: entities whose characteristics we measure or observe (ALIAS ROWS)
    • VARIABLE: feature, characteristic of the OBSERVATIONAL UNITS (ALIAS COLUMNS)
    • FREQUENCY: number of OBSERVATIONAL UNITS with the same value of a VARIABLE
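
In pandas terms, the three concepts correspond to rows, columns, and `value_counts()`. The following sketch assumes a made-up `orders` table; its index labels, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical dataset: each row is one observational unit (an order),
# each column is one variable measured on that unit.
orders = pd.DataFrame(
    {
        "product": ["tea", "coffee", "tea", "juice", "tea"],
        "quantity": [1, 2, 1, 3, 2],
    },
    index=["order_1", "order_2", "order_3", "order_4", "order_5"],
)

# Rows are observational units, columns are variables.
n_units, n_variables = orders.shape

# Frequency: how many observational units share each value of a variable.
freq = orders["product"].value_counts()

print(n_units, "observational units,", n_variables, "variables")
print(freq)
```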
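  10. import numpy as np
      import pandas as pd
      import matplotlib.pyplot as plt
      %matplotlib inline  # IPython/Jupyter magic, not valid in a plain .py file

      # df is assumed to be a DataFrame defined elsewhere;
      # its construction does not appear in this transcript.
      # Absolute frequency of each value of "Product (X1)".
      univariate = pd.DataFrame(df["Product (X1)"].value_counts())
      univariate.columns = ["Absolute Frequency (ni)"]
      univariate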
  11. # Absolute frequency of each value of "Stock (X6)", sorted by value.
      univariate_stocks = pd.DataFrame(df["Stock (X6)"].value_counts())
      univariate_stocks = univariate_stocks.sort_index()
      univariate_stocks.columns = ["Absolute Frequency (ni)"]

      # Relative frequency: fi = ni / N.
      univariate_stocks["Relative Frequency (fi)"] = (
          univariate_stocks["Absolute Frequency (ni)"]
          / univariate_stocks["Absolute Frequency (ni)"].sum()
      )

      # Relative cumulative frequency: running sum of fi.
      univariate_stocks["Relative Cumulative Frequency (Fi)"] = (
          univariate_stocks["Relative Frequency (fi)"].cumsum()
      )
      univariate_stocks
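
The slide code relies on a `df` whose construction is not part of this transcript. To try the same steps end to end, a self-contained sketch with a stand-in `df` might look like the following. Only the column names "Product (X1)" and "Stock (X6)" come from the slides; every value is hypothetical, and the extra columns simply apply the remaining frequency definitions from the deck.

```python
import pandas as pd

# Hypothetical stand-in for the talk's `df`; only the column names come from the slides.
df = pd.DataFrame({
    "Product (X1)": ["A", "B", "A", "C", "A", "B"],
    "Stock (X6)": [10, 20, 10, 30, 20, 10],
})

ni = "Absolute Frequency (ni)"
fi = "Relative Frequency (fi)"
Fi = "Relative Cumulative Frequency (Fi)"

# Absolute frequencies, sorted by the value of the variable.
table = df["Stock (X6)"].value_counts().sort_index().to_frame(ni)

# Remaining frequency types, derived from the absolute frequencies.
table["Absolute Cumulative Frequency (Ni)"] = table[ni].cumsum()
table[fi] = table[ni] / table[ni].sum()
table[Fi] = table[fi].cumsum()
table["% Frequency"] = table[fi] * 100
table["% Cumulative Frequency"] = table[Fi] * 100

print(table)
```

Using `to_frame(name)` sets the column name in one step, which avoids depending on the default column name that `value_counts()` produces in a given pandas version.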