Advances and Challenges in Wearables Research
Julia Wrobel, PhD, Keynote Speaker
Friday, November 3, 10:00 AM to 3:00 PM
REGISTER: bit.ly/BERD2023
In-Person: Morehouse School of Medicine, Building A, 4th Floor
Sr. Biostatistician
Virtual: Zoom
• Active individuals tend to live longer and healthier lives
• Traditionally, physical activity has been measured using retrospective questionnaires
• Accelerometers have become hugely popular
  • Objective
  • Collection “in the wild”
  • High resolution
for debate
• Consider moderate-to-vigorous physical activity (MVPA)
  • How are “activity counts” generated?
  • How are cut points formed (no PA / light PA / MVPA)?
  • Are these consistent across devices? Age groups? Placements?
• Some general recommendations
  • Keep data in the rawest form possible
  • Process using non-proprietary software
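A minimal sketch of how cut points turn minute-level activity counts into intensity categories, and how much the MVPA total depends on where the threshold sits. The cut-point values, the function names, and the toy counts are all hypothetical illustrations, not thresholds recommended in the talk or used by any device.

```python
from bisect import bisect_right

# Hypothetical cut points in counts per minute. Real thresholds vary by
# device, wear location, and age group, which is the concern raised above.
CUT_POINTS = [100, 2000]          # assumed boundaries, not a recommendation
LABELS = ["no PA", "light PA", "MVPA"]


def classify_minute(count, cut_points=CUT_POINTS, labels=LABELS):
    """Map one minute-level activity count to an intensity label."""
    return labels[bisect_right(cut_points, count)]


def minutes_in_mvpa(counts, cut_points=CUT_POINTS):
    """Count the minutes labeled MVPA under the given cut points."""
    return sum(classify_minute(c, cut_points) == "MVPA" for c in counts)


# Toy minute-level counts for one person (made-up numbers).
counts = [0, 50, 150, 1800, 2100, 2500, 90, 1999, 2000]

mvpa_default = minutes_in_mvpa(counts)               # MVPA threshold at 2000
mvpa_lower = minutes_in_mvpa(counts, [100, 1500])    # MVPA threshold at 1500
print(mvpa_default, mvpa_lower)
```

The same nine minutes yield different MVPA totals under the two thresholds, which is one reason to keep the data in a rawer form than a pre-thresholded summary.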
24-hour periods: the exact focus of functional data analysis (FDA)!
• In FDA, the outcome is a curve or function $Y_i(t)$
• For accelerometer data, $Y_i(t)$ is a 24-hour activity profile
[Figure: activity profile $Y_i(t)$ plotted against $t$ (hour)]
raw data
• Less information is discarded
• Better ways of imputing data
  • Missing data is a big problem in wearables
• Time-dependent interpretations
  • Timing and consistency
  • Does it matter when and how regularly someone moves?
• Functional outcome, scalar predictors (e.g. age)
  • UK Biobank Accelerometry Study
    • 80,000+ participants
• Generalized functional principal components analysis (gFPCA)
  • National Health and Nutrition Examination Survey (NHANES)
    • 4,000+ participants (2011-2014 wave)
• Registration
  • How does timing of wake/sleep, PA differ across people?
  • Baltimore Longitudinal Study on Aging (BLSA)
    • 500+ participants
daily activity patterns across ages from functional regression
• Left panel shows males, right panel shows females
J. Wrobel, J. Muschelli, and A. Leroux (2021). Sensors.
regression model exponential family functional data using a (GLM)-like framework
$$g\left(E[Y_i(s)]\right) = \eta_i(s) = \beta_0(s) + b_i(s) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(s)$$
• $Y_i \sim$ exponential family; $g(\cdot)$ is a link function
• $\beta_0(s)$ is a population mean function
• $\phi_k(s)$ are population-level eigenfunctions
• $\xi_{ik}$ are subject-specific scores
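To make the model concrete, here is a small sketch that evaluates the linear predictor $\eta_i(s) = \beta_0(s) + \sum_k \xi_{ik}\phi_k(s)$ for binary data with a logit link. The mean function, the two Fourier eigenfunctions, and the scores are invented for illustration; in practice all of these are estimated from data.

```python
import math


def beta0(s):
    """Hypothetical population mean on the logit scale, s in [0, 1]."""
    return -1.0 + 2.0 * math.sin(math.pi * s)


# Hypothetical orthonormal eigenfunctions on [0, 1] (Fourier basis).
def phi1(s):
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * s)


def phi2(s):
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * s)


EIGENFUNCTIONS = [phi1, phi2]


def eta(s, scores):
    """Linear predictor eta_i(s) = beta_0(s) + sum_k xi_ik * phi_k(s)."""
    return beta0(s) + sum(xi * phi(s) for xi, phi in zip(scores, EIGENFUNCTIONS))


def inverse_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_active(s, scores):
    """E[Y_i(s)] for binary data when g is the logit link."""
    return inverse_logit(eta(s, scores))


scores_i = [0.5, -0.3]   # made-up subject-specific scores xi_i1, xi_i2
# Probability of being active at each of the 1440 minutes of the day.
profile = [prob_active(m / 1440.0, scores_i) for m in range(1440)]
```

A subject whose scores are all zero follows the population mean exactly; nonzero scores shift that subject's curve along the shapes of the eigenfunctions.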
National Health and Nutrition Examination Survey (NHANES)
• Accelerometer data from the 2011-2014 wave, released in 2021
• Accelerometer data over multiple days from > 4000 subjects
• 1440 minutes per day of PA measurement
• Goal is to understand population patterns in sedentary behavior
• Existing FDA methods cannot handle data of this size
• We proposed a fast, general-purpose algorithm for generalized FPCA
Four-step fast GFPCA algorithm
$$g\left(E[Y_i(s)]\right) = \eta_i(s) = \beta_0(s) + b_i(s) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(s)$$
1. Bin the data along the functional domain $s$ into $L$ bins
2. Estimate separate local GLMMs in each bin to obtain $\eta_i(s_{m_l})$ at each bin midpoint
3. Estimate FPCA on local latent estimates $\eta_i(s_{m_l})$ to obtain eigenfunctions $\boldsymbol{\phi}(s)$
4. Estimate global model conditioning on eigenfunctions $\boldsymbol{\phi}(s)$ by re-estimating subject-specific scores $\xi_{ik}$
A. Leroux, C. Crainiceanu, and J. Wrobel (2023+). Fast generalized functional principal components analysis. Under review.
• Variational Bayes binary FPCA (Wrobel, 2019), bfpca
  • Can’t estimate Poisson or other distributions
• Two-step conditional model (Gertheiss, 2017), tsGFPCA
  • Breaks for N > 100
• fastGFPCA is
  • More accurate than tsGFPCA for binary and Poisson data
    • Order of magnitude faster
  • As accurate as or more accurate than bfpca for binary data
    • Comparable computation time
• Step 1: Estimates template to which curves are registered
  • uses fast, novel variational EM algorithm for binary functional data
• Step 2: Estimates warping function for each subject
  • uses constrained maximum likelihood estimation
• Implemented in R package registr
  • Implemented in C++
• Wrobel, Goldsmith (2019). Registration for exponential family functional data. Biometrics.
• Wrobel (2018). registr: Registration for exponential family functional data. Journal of Open Source Software. 3.
Columbia Biostatistics Functional Data Analysis Working Group
• Jeff Goldsmith
Johns Hopkins School of Public Health WIT: Wearable and Implantable Technology
• Vadim Zipunnikov
• Jennifer Schrack
• John Muschelli
• Ciprian Crainiceanu
• Xinkai Zhou
$s_{m_l}$ is the midpoint of bin $l \in \{1, \dots, L\}$
Considerations
• Bin width: for simplicity, equidistant and non-overlapping
• Number of bins
  • Too many bins: bin width is too small, identifiability issues
  • Too few bins: bin width is too big, doesn't capture the shape of the underlying function
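The binning step can be sketched as follows for a minute-level day of 1440 observations. This is an illustration only: the function name, the 0-based index convention, and the restriction that the number of bins divides the domain evenly are assumptions made to keep the example short.

```python
def make_bins(domain_size, n_bins):
    """Split indices 0..domain_size-1 into equal-width, non-overlapping bins.

    Returns a list of (start, stop, midpoint) tuples with stop exclusive.
    Assumes n_bins divides domain_size evenly, for simplicity.
    """
    if domain_size % n_bins != 0:
        raise ValueError("this sketch assumes n_bins divides domain_size")
    width = domain_size // n_bins
    bins = []
    for l in range(n_bins):
        start = l * width
        stop = start + width
        midpoint = (start + stop - 1) / 2.0   # center of the covered indices
        bins.append((start, stop, midpoint))
    return bins


# 24 one-hour bins over a 1440-minute day.
bins = make_bins(1440, 24)
print(bins[0], bins[-1])
```

Raising `n_bins` shrinks each bin, leaving fewer observations per local model, while lowering it averages over more of the curve's shape; this is the trade-off listed in the considerations above.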
Fit separate GLMM in each bin to get latent estimates
• $g\left(E[Y_i(s_{m_l})]\right) = \beta_0(s_{m_l}) + b_i(s_{m_l}) = \eta_i(s_{m_l})$
  • $s_{m_l}$: time $s$ at the midpoint of bin $l$
  • $\beta_0(s_{m_l})$: fixed effect mean
  • $b_i(s_{m_l})$: subject-specific random effect
  • $\eta_i(s_{m_l})$: linear predictor, local latent estimates
• Estimates are not on the original domain
  • On domain defined by bin midpoints
• Model assumes constant effect for $\beta_0$, $b_i$ across each bin
• Used for estimating covariance matrix and eigenfunctions
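The sketch below illustrates the idea of one latent value per subject per bin, using a deliberately crude stand-in: a smoothed empirical logit computed separately for each subject. This is not a GLMM. A GLMM pools subjects within a bin through a shared fixed effect and shrinks the random effects, which this example does not do. The function names and the toy curve are hypothetical.

```python
import math


def empirical_logit(successes, trials):
    """Smoothed log-odds; the 0.5 offsets keep it finite at 0 or all successes."""
    return math.log((successes + 0.5) / (trials - successes + 0.5))


def local_latent_estimates(y, n_bins):
    """One value per bin for one subject's binary curve y (list of 0/1).

    Crude stand-in for eta_i(s_{m_l}): the log-odds are treated as constant
    within each bin. Assumes n_bins divides len(y) evenly.
    """
    width = len(y) // n_bins
    estimates = []
    for l in range(n_bins):
        window = y[l * width:(l + 1) * width]
        estimates.append(empirical_logit(sum(window), len(window)))
    return estimates


# Toy subject: inactive for the first half of the day, active for the second.
y_i = [0] * 720 + [1] * 720
eta_hat = local_latent_estimates(y_i, 24)
```

The output has 24 values rather than 1440, so it lives on the grid of bin midpoints and not on the original minute-level domain.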
Step 3
• $g\left(E[Y_i(s) \mid \hat{\boldsymbol{\phi}}(s)]\right) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik}\,\hat{\phi}_k(s)$
• Eigenfunctions are orthogonal basis functions
  • Reduces number of covariance parameters that need to be estimated for random effects
• Simple implementation
  • mgcv::bam()