
Statistical Rethinking 2023 - Lecture 10


Richard McElreath

February 01, 2023


Transcript

  1. Generalized Linear Models

     Linear Models: Expected value is an additive ("linear") combination of parameters:
     Y_i ∼ Normal(μ_i, σ)
     μ_i = α + β_X X_i + β_Z Z_i

     Generalized Linear Models: Expected value is some function f(p_i) of an additive combination of parameters:
     Y_i ∼ Bernoulli(p_i)
     f(p_i) = α + β_X X_i + β_Z Z_i
  2. Generalized Linear Models: Expected value is some function of an additive combination of parameters

     logit(p_i) = α + β x_i

     Uniform changes in the predictor do not produce uniform changes in the prediction
     All predictor variables interact, moderate one another
     Influences predictions & uncertainty of predictions
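     A minimal sketch of this non-uniformity (not on the slide): the same +1 step on the log-odds scale moves the probability by very different amounts depending on where you start.

     library(rethinking)
     # same +1 change on the logit scale, different changes in probability
     lp <- c( -4 , 0 , 4 )                   # three values of alpha + beta*x
     inv_logit( lp + 1 ) - inv_logit( lp )   # approx 0.03 , 0.23 , 0.01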
  3. [DAG: G → D, G → A, D → A, with unobserved ability u → D and u → A]

     set.seed(12)
     N <- 2000  # number of applicants
     # even gender distribution
     G <- sample( 1:2 , size=N , replace=TRUE )
     # sample ability, high (1) to average (0)
     u <- rbern( N , 0.1 )
     # gender 1 tends to apply to department 1, gender 2 to department 2
     # and G=1 with greater ability tend to apply to department 2 as well
     D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
     # matrix of acceptance rates [dept,gender]
     p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
     p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
     p_u <- list( p_u0 , p_u1 )
     # simulate acceptance
     p <- sapply( 1:N , function(i) p_u[[ 1+u[i] ]][ D[i] , G[i] ] )
     A <- rbern( N , p )
  6. Cross-tabulation of the simulated applicants by gender, department, and ability:

     > table(G,D,u)
     , , u = 0
        D
     G     1   2
       1 908   0
       2 220 645

     , , u = 1
        D
     G     1   2
       1   0 114
       2  28  85
  7. Acceptance rate matrices [dept,gender] for average (u=0) and high (u=1) ability:

     > p_u0
          [,1] [,2]
     [1,]  0.1  0.1
     [2,]  0.1  0.3
     > p_u1
          [,1] [,2]
     [1,]  0.3  0.5
     [2,]  0.3  0.5

     Department 2 discriminates against gender 1
  8. dat_sim <- list( A=A , D=D , G=G )

     # total effect of gender
     m1 <- ulam(
         alist(
             A ~ bernoulli(p),
             logit(p) <- a[G],
             a[G] ~ normal(0,1)
         ), data=dat_sim , chains=4 , cores=4 )

     # direct effects - now confounded!
     m2 <- ulam(
         alist(
             A ~ bernoulli(p),
             logit(p) <- a[G,D],
             matrix[G,D]:a ~ normal(0,1)
         ), data=dat_sim , chains=4 , cores=4 )

     > precis(m1,depth=3)
           mean   sd  5.5% 94.5% n_eff Rhat4
     a[1] -1.82 0.09 -1.96 -1.67  1483     1
     a[2] -0.89 0.07 -1.00 -0.78  1562     1

     > precis(m2,depth=3)
             mean   sd  5.5% 94.5% n_eff Rhat4
     a[1,1] -2.05 0.11 -2.23 -1.89  2482     1
     a[1,2] -0.66 0.18 -0.96 -0.38  2369     1
     a[2,1] -1.77 0.19 -2.06 -1.47  2278     1
     a[2,2] -0.68 0.08 -0.81 -0.56  2210     1

     Total effect shows disadvantage; direct effect is confounded
  9. post2 <- extract.samples(m2)
     dens( inv_logit(post2$a[,1,1]) , lwd=3 , col=4 , xlim=c(0,0.5) ,
         xlab="probability admission" )
     dens( inv_logit(post2$a[,2,1]) , lwd=3 , col=4 , lty=2 , add=TRUE )
     dens( inv_logit(post2$a[,1,2]) , lwd=3 , col=2 , add=TRUE )
     dens( inv_logit(post2$a[,2,2]) , lwd=3 , col=2 , lty=2 , add=TRUE )

     [Figure: posterior densities of admission probability for D1,G1 / D1,G2 / D2,G1 / D2,G2]
  10. You guessed it: Collider bias

     Stratifying by D opens a non-causal path through u
     Can estimate the total causal effect of G, but that isn't what we want
     Cannot estimate the direct effect of D or G
     Worst place in the world: Collider Bias country
  11. You guessed it: Collider bias

     More intuitive explanation:
     High-ability G1s apply to the discriminatory department anyway
     G1s in that department are higher ability on average than G2s
     High ability compensates for discrimination => masks evidence
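     This masking can be checked directly in the simulated data above; a minimal sketch:

     # mean hidden ability by gender and department:
     # in department 2, G=1 applicants are mostly high ability (u=1),
     # which compensates for the lower acceptance rate there
     aggregate( u , list( G=G , D=D ) , mean )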
  12. Citation networks

     Citation networks of members of the NAS
     Women associated with lower lifetime citation rate
     [DAG: gender G → citations C]
     Lerman et al 2022, Gendered citation patterns among the scientific elite
  13. Membership

     Elections to the NAS
     Women associated with a 3-15 times higher election rate, controlling for citations
     [DAG: gender G, citations C, NAS member M]
     Card et al 2022, Gender Gaps at the Academies
  14. [DAG: gender G, citations C, NAS member M, quality q (hidden)]

     Restrict the sample to NAS members, examine citations
     If men are less likely to be elected, then elected men must have higher q and C to compensate
  15. [DAG: gender G, citations C, NAS member M, quality q (hidden)]

     Control for citations, examine elections to the NAS
     G is the "treatment"; C is a post-treatment variable!
     If women are less likely to be cited (bias), then women are more likely to be elected, because they have higher q than indicated by C
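     A minimal sketch of this selection logic, assuming simple Gaussian and logistic forms for quality, citations, and election (the variable names and effect sizes are illustrative, not from Card et al):

     library(rethinking)
     set.seed(10)
     N <- 5000
     G <- rbern(N)                              # 1 = women
     q <- rnorm(N)                              # hidden quality
     C <- rnorm( N , q - 0.5*G )                # citation bias against women
     M <- rbern( N , inv_logit( q + C - 2 ) )   # election depends on q and C
     # controlling for citations, women are more likely to be elected,
     # because C understates their q
     summary( glm( M ~ G + C , family=binomial ) )$coef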
  16. No causes in; no causes out

     Hyped papers with vague estimands, unwise adjustment sets
     Policy design through collider bias? We can do better
     Strong assumptions required
     Qualitative data useful
  17. Sensitivity analysis

     What are the implications of what we don't know?
     Assume the confound exists; model its consequences for different strengths/kinds of influence
     How strong must the confound be to change conclusions?
  18. Sensitivity analysis

     A_i ∼ Bernoulli(p_i)
     logit(p_i) = α[G_i, D_i] + β_G[i] u_i
  19. Sensitivity analysis

     A_i ∼ Bernoulli(p_i)
     logit(p_i) = α[G_i, D_i] + β_G[i] u_i

     (D_i = 2) ∼ Bernoulli(q_i)
     logit(q_i) = δ[G_i] + γ_G[i] u_i
  20. A_i ∼ Bernoulli(p_i)
     logit(p_i) = α[G_i, D_i] + β_G[i] u_i
     (D_i = 2) ∼ Bernoulli(q_i)
     logit(q_i) = δ[G_i] + γ_G[i] u_i
     u_j ∼ Normal(0,1)

     datl <- dat_sim
     datl$D2 <- ifelse( datl$D==2 , 1 , 0 )
     datl$N <- length(datl$D)
     datl$b <- c(1,1)
     datl$g <- c(1,0)
     mGDu <- ulam(
         alist(
             # A model
             A ~ bernoulli(p),
             logit(p) <- a[G,D] + b[G]*u[i],
             matrix[G,D]:a ~ normal(0,1),
             # D model
             D2 ~ bernoulli(q),
             logit(q) <- delta[G] + g[G]*u[i],
             delta[G] ~ normal(0,1),
             # declare unobserved u
             vector[N]:u ~ normal(0,1)
         ), data=datl , chains=4 , cores=4 )
  21. [Figure: posterior densities of admission probability for D1,G1 / D1,G2 / D2,G1 / D2,G2; left panel ignores the confound, right panel assumes it]
  22. Sensitivity analysis

     What are the implications of what we don't know?
     Somewhere between pure simulation and pure analysis
     Vary confound strength over a range and show how the results change, or vary the other effects and estimate the confound strength
     Confounds persist; don't pretend
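     A sketch of the first option, reusing the mGDu specification above: re-fit with several assumed strengths of u's effect on acceptance (datl$b) and watch the department-2 gender contrast (each iteration is a full MCMC run).

     for ( bu in c(0.5,1,2) ) {
         datl$b <- c(bu,bu)
         m_s <- ulam(
             alist(
                 A ~ bernoulli(p),
                 logit(p) <- a[G,D] + b[G]*u[i],
                 matrix[G,D]:a ~ normal(0,1),
                 D2 ~ bernoulli(q),
                 logit(q) <- delta[G] + g[G]*u[i],
                 delta[G] ~ normal(0,1),
                 vector[N]:u ~ normal(0,1)
             ), data=datl , chains=4 , cores=4 )
         post <- extract.samples(m_s)
         # G1 minus G2 admission probability in department 2
         contrast <- inv_logit(post$a[,1,2]) - inv_logit(post$a[,2,2])
         print( c( b=bu , contrast=mean(contrast) ) )
     }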
  23. More parameters (2006) than observations (2000)!

     A_i ∼ Bernoulli(p_i)
     logit(p_i) = α[G_i, D_i] + β_G[i] u_i
     (D_i = 2) ∼ Bernoulli(q_i)
     logit(q_i) = δ[G_i] + γ_G[i] u_i
     u_j ∼ Normal(0,1)
  24. Oceanic Technology

     How is technological complexity related to population size? To social structure?
     data(Kline)
     Influence of population size and contact on total tools
     Kline & Boyd 2010, Population size predicts technological complexity in Oceania
  25. Oceanic Technology: data(Kline)

        culture    population contact total_tools mean_TU
     1  Malekula         1100     low          13     3.2
     2  Tikopia          1500     low          22     4.7
     3  Santa Cruz       3600     low          24     4.0
     4  Yap              4791    high          43     5.0
     5  Lau Fiji         7400    high          33     5.0
     6  Trobriand        8000    high          19     4.0
     7  Chuuk            9200    high          40     3.8
     8  Manus           13000     low          28     6.6
     9  Tonga           17500    high          55     5.4
     10 Hawaii         275000     low          71     6.6

     Kline & Boyd 2010, Population size predicts technological complexity in Oceania
  26. [DAG: population P, contact C, tools T, location L]

     Adjustment set for P: L
     Also want to stratify by C, to study the interaction
  27. Modeling tools

     Tool count is not binomial: no maximum
     Poisson distribution: very high maximum and very low probability of each success
     Here: many, many possible technologies, very few realized in any one place
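     The binomial-limit claim is easy to verify numerically; a minimal sketch:

     # binomial with huge N and tiny p is essentially Poisson with lambda = N*p
     n_trials <- 1e5
     p_success <- 2e-4                          # implies lambda = 20
     round( dbinom( 18:22 , size=n_trials , prob=p_success ) , 4 )
     round( dpois( 18:22 , lambda=n_trials*p_success ) , 4 )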
  28. Poisson link is log

     Poisson distribution takes its shape from the expected value, which must be positive
     Exponential scaling can be surprising!

     Y_i ∼ Poisson(λ_i)
     log(λ_i) = α + β x_i
     λ_i = exp(α + β x_i)
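     A minimal sketch of why the scaling surprises: with a log link, each unit of x multiplies the expected count by exp(β) rather than adding to it.

     a0 <- 1 ; b0 <- 0.5
     lambda <- exp( a0 + b0*(0:6) )
     lambda                   # 2.7 4.5 7.4 12.2 20.1 33.1 54.6
     lambda[-1] / lambda[-7]  # constant ratio exp(0.5), about 1.65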
  29. Poisson (poison) priors

     Exponential scaling can be surprising

     α ∼ Normal(0,10)
     log(λ_i) = α

     [Figure: implied prior density of the mean number of tools]
  30. Poisson (poison) priors

     Exponential scaling can be surprising

     α ∼ Normal(0,10)  vs  α ∼ Normal(3,0.5)
     log(λ_i) = α

     [Figure: implied prior densities of the mean number of tools under both priors]
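     The two prior densities can be reproduced with a prior predictive simulation; a minimal sketch, using the fact that λ = exp(α) makes a normal prior on α a lognormal prior on the mean count:

     library(rethinking)
     curve( dlnorm( x , 0 , 10 ) , from=0 , to=100 , n=300 ,
         xlab="mean number of tools" , ylab="Density" )   # alpha ~ Normal(0,10): explosive
     curve( dlnorm( x , 3 , 0.5 ) , add=TRUE , col=2 )    # alpha ~ Normal(3,0.5): sensible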
  31. Poisson priors

     α ∼ Normal(0,1), β ∼ Normal(0,10)  vs  α ∼ Normal(3,0.5), β ∼ Normal(0,0.2)
     log(λ_i) = α + β x_i

     [Figure: prior predictive trends of expected count against x value under both prior sets]
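     The trend panels come from drawing (α, β) pairs from the priors and plotting exp(α + βx); a minimal sketch for the tighter priors:

     library(rethinking)
     n_lines <- 50
     a_prior <- rnorm( n_lines , 3 , 0.5 )
     b_prior <- rnorm( n_lines , 0 , 0.2 )
     plot( NULL , xlim=c(-2,2) , ylim=c(0,100) ,
         xlab="x value" , ylab="expected count" )
     for ( i in 1:n_lines )
         curve( exp( a_prior[i] + b_prior[i]*x ) , add=TRUE , col=grau() )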
  32. data(Kline)
     d <- Kline
     d$P <- scale( log(d$population) )
     d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
     dat <- list(
         T = d$total_tools ,
         P = d$P ,
         C = d$contact_id )

     # intercept only
     m11.9 <- ulam(
         alist(
             T ~ dpois( lambda ),
             log(lambda) <- a,
             a ~ dnorm( 3 , 0.5 )
         ), data=dat , chains=4 , log_lik=TRUE )

     # interaction model
     m11.10 <- ulam(
         alist(
             T ~ dpois( lambda ),
             log(lambda) <- a[C] + b[C]*P,
             a[C] ~ dnorm( 3 , 0.5 ),
             b[C] ~ dnorm( 0 , 0.2 )
         ), data=dat , chains=4 , log_lik=TRUE )

     compare( m11.9 , m11.10 , func=PSIS )

     Y_i ∼ Poisson(λ_i)
     log(λ_i) = α_C[i] + β_C[i] log(P_i)
     α_j ∼ Normal(3,0.5)
     β_j ∼ Normal(0,0.2)
  33. > compare( m11.9 , m11.10 , func=PSIS )
     Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
     Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
             PSIS    SE dPSIS   dSE pPSIS weight
     m11.10  85.9 13.50   0.0    NA   7.3      1
     m11.9  141.3 33.69  55.4 33.13   8.0      0
  34. pPSIS is the "penalty", the effective number of parameters.

     How can a model with more parameters (m11.10) have fewer effective parameters?
     m11.9 changes more when individual societies are dropped
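     To see which societies are responsible, a minimal sketch: inspect the pointwise Pareto k values for both models.

     # large k values flag influential societies (e.g. Hawaii)
     k9  <- PSIS( m11.9  , pointwise=TRUE )$k
     k10 <- PSIS( m11.10 , pointwise=TRUE )$k
     cbind( culture=as.character(d$culture) , k9=round(k9,2) , k10=round(k10,2) )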
  35. [Figure: total tools against log population (std), points scaled by leverage; Yap, Trobriand, Tonga, Hawaii labeled]

     k <- PSIS( m11.10 , pointwise=TRUE )$k
     plot( dat$P , dat$T , xlab="log population (std)" , ylab="total tools" ,
         col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
         ylim=c(0,75) , cex=1+normalize(k) )
     # set up the horizontal axis values to compute predictions at
     P_seq <- seq( from=-1.4 , to=3 , len=100 )
     # predictions for C=1 (low contact)
     lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
     lmu <- apply( lambda , 2 , mean )
     lci <- apply( lambda , 2 , PI )
     lines( P_seq , lmu , lty=2 , lwd=1.5 )
     shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) )
     # predictions for C=2 (high contact)
     lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
     lmu <- apply( lambda , 2 , mean )
     lci <- apply( lambda , 2 , PI )
     lines( P_seq , lmu , lty=1 , lwd=1.5 )
     shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3) )
  36. [Figure: posterior predictions on the log scale (total tools vs log population, std) and on the natural scale (total tools vs population); Yap, Trobriand, Tonga, Hawaii labeled]
  37. This model is wack

     Two immediate ways to improve the model:
     (1) Use a robust model: gamma-Poisson (neg-binomial)
     (2) A principled scientific model
  38. ΔT = α P^β − γT

     ΔT: change in tools
     α: innovation rate
     β: diminishing returns (elasticity)
  39. ΔT = α P^β − γT

     ΔT: change in tools
     α: innovation rate
     β: diminishing returns (elasticity)
     γ: rate of loss
  40. ΔT = α_C P^β_C − γT

     Innovation and diminishing returns now depend upon contact
  41. ΔT = α_C P^β_C − γT

     Difference equation: says how tools change, not the expected number
     What is the equilibrium?
  42. ΔT = α_C P^β_C − γT

     f <- function(a=0.02,b=0.5,g=0.2,P=1e4,t_max=50) {
         T <- rep(0,t_max)
         for ( i in 2:t_max ) T[i] <- T[i-1] + a*P^b - g*T[i-1]
         return(T)
     }
     plot( NULL , xlim=c(0,50) , ylim=c(0,10) , xlab="time" , ylab="Tools" )
     T <- f(P=1e3)
     lines( 1:50 , T , lwd=3 , col=2 )
     T <- f(P=1e4)
     lines( 1:50 , T , lwd=3 , col=2 )

     [Figure: simulated tool trajectories over time for P = 1e3 and P = 1e4]
  43. ΔT = α_C P^β_C − γT = 0

     Solve for the equilibrium number of tools T
  44. ΔT = α_C P^β_C − γT = 0

     T̂ = α_C P^β_C / γ
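     A quick numeric check of the equilibrium, a minimal sketch: iterate the difference equation and compare with α P^β / γ.

     al <- 0.02 ; be <- 0.5 ; ga <- 0.2 ; Pop <- 1e4
     Tt <- 0
     for ( i in 1:500 ) Tt <- Tt + al*Pop^be - ga*Tt
     Tt                 # simulated long-run value
     al*Pop^be / ga     # analytical equilibrium: 0.02*100/0.2 = 10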
  45. T̂ = α_C P^β_C / γ

     T_i ∼ Poisson(λ_i)
     λ_i = T̂
  46. # innovation/loss model
     dat2 <- list( T=d$total_tools, P=d$population, C=d$contact_id )
     m11.11 <- ulam(
         alist(
             T ~ dpois( lambda ),
             lambda <- exp(a[C])*P^b[C]/g,
             a[C] ~ dnorm(1,1),
             b[C] ~ dexp(1),
             g ~ dexp(1)
         ), data=dat2 , chains=4 , cores=4 )

     T̂ = α_C P^β_C / γ
     All parameters must be positive. Two ways to do this:
     (1) use exp()
     (2) use an appropriate prior
  47. > precis(m11.11,2)
          mean   sd  5.5% 94.5% n_eff Rhat4
     a[1] 0.85 0.68 -0.26  1.90   698     1
     a[2] 0.93 0.83 -0.39  2.31   902     1
     b[1] 0.26 0.03  0.21  0.32  1149     1
     b[2] 0.29 0.10  0.12  0.45   711     1
     g    1.11 0.70  0.32  2.43   862     1
  48. [Figure: posterior predictions of the innovation/loss model m11.11 on the natural scale (total tools vs population); Tonga and Hawaii labeled]
  49. [Figure: innovation/loss model vs generalized linear model, posterior predictions on the natural scale; Tonga and Hawaii labeled]

     Still have to deal with location as a confound
  50. Count GLMs

     Distributions from constraints
     Maximum entropy priors: Binomial, Poisson, and extensions
     More event types: Multinomial and categorical
     Robust regressions: Beta-binomial, gamma-Poisson
  51. Course Schedule

     Week 1   Bayesian inference                 Chapters 1, 2, 3
     Week 2   Linear models & Causal Inference   Chapter 4
     Week 3   Causes, Confounds & Colliders      Chapters 5 & 6
     Week 4   Overfitting / MCMC                 Chapters 7, 8, 9
     Week 5   Generalized Linear Models          Chapters 10, 11
     Week 6   Mixtures & ordered categories      Chapters 11 & 12
     Week 7   Multilevel models I                Chapter 13
     Week 8   Multilevel models II               Chapter 14
     Week 9   Measurement & Missingness          Chapter 15
     Week 10  Generalized Linear Madness         Chapter 16

     https://github.com/rmcelreath/stat_rethinking_2023
  52. Simpson's Pandora's Box

     Simpson's paradox: reversal of an association when groups are combined or separated
     No way to know which association (combined/separated) is correct without causal assumptions
     Infinite evils unleashed

     Simpson 1951, The Interpretation of Interaction in Contingency Tables
     [Cartoon: "Pandora opens a spreadsheet. The container HELD every imaginable estimate. Simpson's paradox plagues us to this day."]
  53. Berkeley Paradox

     Unconditional on department: women admitted at a lower rate
     Conditional on department: women admitted slightly more often
     Which is correct? No way to know without assumptions

        dept admit reject applications gender
     1  A      512    313          825 male
     2  A       89     19          108 female
     3  B      353    207          560 male
     4  B       17      8           25 female
     5  C      120    205          325 male
     6  C      202    391          593 female
     7  D      138    279          417 male
     8  D      131    244          375 female
     9  E       53    138          191 male
     10 E       94    299          393 female
     11 F       22    351          373 male
     12 F       24    317          341 female
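     These two rates can be computed directly; a minimal sketch using the UCBadmit data bundled with the rethinking package:

     library(rethinking)
     data(UCBadmit)
     d2 <- UCBadmit
     # unconditional on department: admission rate by gender
     with( d2 , tapply(admit,applicant.gender,sum) /
                tapply(applications,applicant.gender,sum) )
     # conditional on department: admission rate by dept and gender
     round( with( d2 , tapply(admit,list(dept,applicant.gender),sum) /
                       tapply(applications,list(dept,applicant.gender),sum) ) , 2 )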
  54. Berkeley Paradox

     Which is correct? No way to know without assumptions
     Mediator (department); collider + confound (ability)
  55. [DAGs: The Pipe (X → Z → Y), The Fork (X ← Z → Y), The Collider (X → Z ← Y), The Descendant (collider with descendant A of Z)]
  56. [DAG: X ← Z → Y]

     cols <- c(4,2)
     N <- 300
     Z <- rbern(N)
     X <- rnorm( N , 2*Z-1 )
     Y <- rnorm( N , 2*Z-1 )
     plot( X , Y , col=cols[Z+1] , lwd=3 )
     abline( lm(Y[Z==1]~X[Z==1]) , col=2 , lwd=3 )
     abline( lm(Y[Z==0]~X[Z==0]) , col=4 , lwd=3 )
     abline( lm(Y~X) , lwd=3 )

     [Figure: X vs Y scatter colored by Z = 0 / Z = 1, with within-group and pooled regression lines]
  57. [DAG: X → Z → Y]

     cols <- c(4,2)
     N <- 300
     X <- rnorm(N)
     Z <- rbern( N , inv_logit(X) )
     Y <- rnorm( N , 2*Z-1 )
     plot( X , Y , col=cols[Z+1] , lwd=3 )
     abline( lm(Y[Z==1]~X[Z==1]) , col=2 , lwd=3 )
     abline( lm(Y[Z==0]~X[Z==0]) , col=4 , lwd=3 )
     abline( lm(Y~X) , lwd=3 )

     [Figure: X vs Y scatter colored by Z = 0 / Z = 1, with within-group and pooled regression lines]
  58. [DAG: X → Z ← Y]

     cols <- c(4,2)
     N <- 300
     X <- rnorm(N)
     Y <- rnorm(N)
     Z <- rbern( N , inv_logit(2*X+2*Y-2) )
     plot( X , Y , col=cols[Z+1] , lwd=3 )
     abline( lm(Y[Z==1]~X[Z==1]) , col=2 , lwd=3 )
     abline( lm(Y[Z==0]~X[Z==0]) , col=4 , lwd=3 )
     abline( lm(Y~X) , lwd=3 )

     [Figure: X vs Y scatter colored by Z = 0 / Z = 1, with within-group and pooled regression lines]
  59. Non-linear haunting

     In event models, effect reversal can arise in other ways
     Example: base rate differences
     [DAG: treatment X → outcome Y, covariate Z → Y]
  60. # base rate differences erasing effect of X
     N <- 1000
     X <- rnorm(N)
     Z <- rbern(N)
     p <- inv_logit( X + ifelse(Z==1,-1,5) )
     Y <- rbern(N,p)

     mX <- quap(
         alist(
             Y ~ dbern(p),
             logit(p) <- a + b*X,
             a ~ dnorm(0,1),
             b ~ dnorm(0,1)
         ) , data=list(Y=Y,X=X) )

     mXZ <- quap(
         alist(
             Y ~ dbern(p),
             logit(p) <- a + b[Z]*X,
             a ~ dnorm(0,1),
             b[Z] ~ dnorm(0,1)
         ) , data=list(Y=Y,X=X,Z=Z+1) )
  61. Model mX ignores Z:

     logit(p_i) = α + β X_i
  62. Model mXZ stratifies the slope by Z:

     logit(p_i) = α + β X_i          (mX)
     logit(p_i) = α + β_Z[i] X_i     (mXZ)
  63. [Figure: outcome (Y) vs treatment (X) prediction curves for logit(p_i) = α + β X_i (Z collapsed) and for logit(p_i) = α + β_Z[i] X_i (Z = 1 and Z = 2 separately)]
  64. [Figure: outcome (Y) vs treatment (X) prediction curves for logit(p_i) = α + β_Z[i] X_i with Z = 1 and Z = 2, plus posterior densities of beta[Z]]
  65. mXZ2 <- quap(
         alist(
             Y ~ dbern(p),
             logit(p) <- a[Z] + b[Z]*X,
             a[Z] ~ dnorm(0,1),
             b[Z] ~ dnorm(0,1)
         ) , data=list(Y=Y,X=X,Z=Z+1) )

     logit(p_i) = α + β X_i               (mX)
     logit(p_i) = α + β_Z[i] X_i          (mXZ)
     logit(p_i) = α_Z[i] + β_Z[i] X_i     (mXZ2)
  66. [Figure: outcome (Y) vs treatment (X) prediction curves for mXZ (logit(p_i) = α + β_Z[i] X_i) and mXZ2 (logit(p_i) = α_Z[i] + β_Z[i] X_i), each for Z = 1 and Z = 2]
  67. [Figure: the same prediction curves for mXZ and mXZ2, now with the posterior densities of beta[Z] for each model]
  68. logit(p_i) = α_Z[i] + β_Z[i] X_i

     [Figure: posterior densities of beta[Z] for Z = 1 and Z = 2 under mXZ2]
  69. Simpson's Pandora's Box

     No paradox, because almost anything can produce it
     People do not have intuitions about coefficient reversals
     Stop naming statistical paradoxes; start teaching scientific logic

     Simpson 1951, The Interpretation of Interaction in Contingency Tables
     [Cartoon: "Pandora opens a spreadsheet. The container HELD every imaginable estimate. Simpson's paradox plagues us to this day."]