Introduction to Statistics with Python

barrachri

April 16, 2016

Transcript

  1. THE STORY OF THIS TALK: 3 DAYS BEFORE THE CONFERENCE

    VALERIO: CHRISTIAN, WE HAVE A FREE SLOT AND WE NEED A TALK.
    CHRISTIAN: I CAN’T IN 3 DAYS…
    VALERIO: YOU MUST.
  2. CONTENT

    1. What is STATISTICS?
    2. Variable types
    3. Univariate distribution
    4. Frequencies
    5. M^3 (Mean, Median, Mode)
    6. Variance and Standard Deviation
    7. Multivariate distribution
    8. Covariance and Correlation
  3. “… THE BRANCH OF SCIENCE OR MATHEMATICS CONCERNED WITH THE ANALYSIS AND INTERPRETATION OF NUMERICAL DATA AND APPROPRIATE WAYS OF GATHERING SUCH DATA.”

    — Oxford English Dictionary
  4. “STATISTICS IS THE SCIENCE OF LEARNING FROM DATA, AND OF MEASURING, CONTROLLING, AND COMMUNICATING UNCERTAINTY; AND IT THEREBY PROVIDES THE NAVIGATION ESSENTIAL FOR CONTROLLING THE COURSE OF SCIENTIFIC AND SOCIETAL ADVANCES.”

    — American Statistical Association
  5. “THE BEST THING ABOUT BEING A STATISTICIAN IS THAT YOU GET TO PLAY IN EVERYONE ELSE'S BACKYARD.”

    — John Tukey, Bell Labs, Princeton University
  6. “THERE ARE THREE KINDS OF LIES: LIES, DAMNED LIES, AND STATISTICS.”

    — Mark Twain
  7. 4 KINDS OF VARIABLES

    • QUANTITATIVE VARIABLES
        • CONTINUOUS
        • DISCRETE
    • CATEGORICAL VARIABLES
        • ORDINAL
        • NOMINAL
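One way to represent the four kinds of variables in pandas, as a minimal sketch on hypothetical toy data (the column names and values are invented for illustration): floats for continuous variables, integers for discrete ones, an ordered `pd.Categorical` for ordinal variables and an unordered one for nominal variables.

```python
import pandas as pd

# Hypothetical toy data: one column per kind of variable (values invented)
variables = pd.DataFrame({
    # Quantitative, continuous: can take any value in an interval
    "weight_kg": [1.2, 0.8, 2.5, 1.9],
    # Quantitative, discrete: countable values
    "items_sold": [3, 0, 7, 2],
    # Categorical, ordinal: categories with a natural order
    "size": pd.Categorical(
        ["small", "large", "medium", "small"],
        categories=["small", "medium", "large"],
        ordered=True,
    ),
    # Categorical, nominal: categories with no order
    "color": pd.Categorical(["red", "blue", "red", "green"]),
})

print(variables.dtypes)

# Ordinal variables support order-based comparisons and min/max
print(variables["size"] > "small")
print(variables["size"].max())  # highest category present, following the declared order
```

Comparing the nominal `color` column with `>` raises a `TypeError` instead, which mirrors the distinction on the slide: nominal categories can only be tested for equality.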
  8. DIFFERENT TYPES OF FREQUENCY

    • ABSOLUTE FREQUENCY ($n_i$): number of observations taking the i-th value of the variable
    • ABSOLUTE CUMULATIVE FREQUENCY ($N_i$): $N_i = N_{i-1} + n_i$
    • RELATIVE FREQUENCY ($f_i$): absolute frequency divided by the total number of observations $N$, i.e. $f_i = n_i / N$
    • RELATIVE CUMULATIVE FREQUENCY ($F_i$): $F_i = F_{i-1} + f_i$
    • % FREQUENCY: $f_i \times 100$
    • % CUMULATIVE FREQUENCY: $F_i \times 100$
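All six frequency columns follow directly from these definitions. A minimal pandas sketch, assuming a hypothetical variable `x` (for example, number of children per household; the values are invented):

```python
import pandas as pd

# Hypothetical observations of a single variable (values invented)
x = pd.Series([0, 1, 1, 2, 2, 2, 3, 1, 0, 2])

# Absolute frequency n_i, one row per distinct value, sorted by value
freq = x.value_counts().sort_index().to_frame(name="ni")
freq["Ni"] = freq["ni"].cumsum()            # N_i = N_{i-1} + n_i
freq["fi"] = freq["ni"] / freq["ni"].sum()  # f_i = n_i / N
freq["Fi"] = freq["fi"].cumsum()            # F_i = F_{i-1} + f_i
freq["%"] = freq["fi"] * 100                # percentage frequency
freq["cum %"] = freq["Fi"] * 100            # cumulative percentage frequency

print(freq)
```

The last $N_i$ equals the total $N$ and the last $F_i$ equals 1 (up to floating-point rounding), which is a quick sanity check on any frequency table.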
  9. 3 MAIN CONCEPTS

    • OBSERVATIONAL UNITS: entities whose characteristics we measure or observe (i.e. the ROWS of a table)
    • VARIABLE: a feature or characteristic of the OBSERVATIONAL UNITS (i.e. the COLUMNS of a table)
    • FREQUENCY: number of OBSERVATIONAL UNITS with the same value of a VARIABLE
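In a pandas `DataFrame` the three concepts map onto rows, columns and counts. A minimal sketch with hypothetical survey data (names and values invented): each row is one respondent (an observational unit), each column is a variable, and `value_counts()` gives the frequency of each value.

```python
import pandas as pd

# Hypothetical survey: each row is an observational unit (one respondent),
# each column is a variable measured on that respondent
survey = pd.DataFrame({
    "eye_color": ["brown", "blue", "brown", "green", "brown", "blue"],
    "height_cm": [172, 165, 180, 158, 175, 169],
})

n_units, n_variables = survey.shape  # (rows, columns)
print(n_units, "observational units,", n_variables, "variables")

# Frequency: how many observational units share each value of a variable
eye_color_freq = survey["eye_color"].value_counts()
print(eye_color_freq)
```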
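  10.

    ```python
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    # Jupyter magic: render plots inside the notebook
    %matplotlib inline

    # df: the talk's dataset, assumed already loaded (not shown in the slides)
    # Absolute frequency (ni) of each product
    univariate = pd.DataFrame(df["Product (X1)"].value_counts())
    univariate.columns = ["Absolute Frequency (ni)"]
    univariate  # displayed as the cell output in Jupyter
    ```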
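  11.

    ```python
    # Continues the notebook: pd and df are already defined
    # Absolute frequency (ni) of each stock level, sorted by stock value
    univariate_stocks = pd.DataFrame(df["Stock (X6)"].value_counts())
    univariate_stocks = univariate_stocks.sort_index()
    univariate_stocks.columns = ["Absolute Frequency (ni)"]
    ni = univariate_stocks["Absolute Frequency (ni)"]
    # Relative frequency: fi = ni / N
    univariate_stocks["Relative Frequency (fi)"] = ni / ni.sum()
    # Relative cumulative frequency: Fi = Fi-1 + fi (running sum of fi)
    univariate_stocks["Relative Cumulative Frequency (Fi)"] = univariate_stocks["Relative Frequency (fi)"].cumsum()
    univariate_stocks  # displayed as the cell output in Jupyter
    ```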
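pandas can also produce relative frequencies in one step: `Series.value_counts(normalize=True)` divides each count by the total. A minimal sketch on a hypothetical stand-in for a stock column (the values are invented, not the talk's data):

```python
import pandas as pd

# Hypothetical stock levels standing in for a column such as df["Stock (X6)"]
stock = pd.Series([5, 3, 5, 0, 3, 5, 8, 3])

# normalize=True returns relative frequencies (fi) directly, so no manual division
fi = stock.value_counts(normalize=True).sort_index()
Fi = fi.cumsum()  # relative cumulative frequency

table = pd.DataFrame({
    "Relative Frequency (fi)": fi,
    "Relative Cumulative Frequency (Fi)": Fi,
})
print(table)
```

This skips the manual division, but the absolute counts are no longer in the table; a plain `value_counts()` is still needed if the table should show them too.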