Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Data-driven discovery in the astronomical time ...

Data-driven discovery in the astronomical time domain

Probabilistic modeling, open source, interdisciplinary collaboration, etc.

Interdisciplinary colloquium at The Flatiron Institute.

Dan Foreman-Mackey

May 04, 2018
Tweet

More Decks by Dan Foreman-Mackey

Other Decks in Science

Transcript

  1. noise models, algorithm development, and open source scientific software dan

    foreman-mackey cca@flatiron dfm.io github.com/dfm @exoplaneteer
  2. my goals for today Discuss the role of open source

    software in astrophysics motivated by the study of time domain astronomical surveys. Demonstrate the impact of interdisciplinary collaboration on this research.
  3. time domain astronomy Measure [something] as a function of time.

    where [something] = position velocity brightness color ...
  4. source: The Open Exoplanet Catalogue 2000 2005 2010 2015 year

    of discovery 0 250 500 750 1000 1250 number of exoplanets transit RV microlensing direct imaging timing
  5. 1 10 100 orbital period [days] 1 10 planet radius

    [R ] data: NASA Exoplanet Archive
  6. 1 10 100 1000 10000 orbital period [days] 1 10

    planet radius [R ] data: NASA Exoplanet Archive
  7. 1.0 0.5 0.0 0.5 1.0 time since transit [days] 100

    50 0 relative brightness [ppm]
  8. kepler 190k targets 30 min cadence 4 year baseline 5,000

    planet "candidates"  *note: all numbers are approximate
  9. MCMC emcee: The MCMC Hammer DANIEL FOREMAN-MACKEY,1 DAVID W. HOGG,1,2

    DUSTIN LANG,3,4 AND JONATHAN GOODMAN5 Received 2013 January 09; accepted 2013 January 30; published 2013 February 25 ABSTRACT. We introduce a stable, well tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ∼N2 for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2. Note: If you want to get started immediately with the emcee package, start at Appendix A or visit the online documentation at http://dan.iel.fm/emcee. If you are sampling with emcee and having low-acceptance-rate or other issues, there is some advice in § 4. PUBLICATIONS OF THE ASTRONOMICAL SOCIETY OF THE PACIFIC, 125:306–312, 2013 March © 2013. The Astronomical Society of the Pacific. All rights reserved. Printed in U.S.A. emcee: The MCMC Hammer DANIEL FOREMAN-MACKEY,1 DAVID W. HOGG,1,2 DUSTIN LANG,3,4 AND JONATHAN GOODMAN5 Received 2013 January 09; accepted 2013 January 30; published 2013 February 25 ABSTRACT. We introduce a stable, well tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ∼N2 for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2. Note: If you want to get started immediately with the emcee package, start at Appendix A or visit the online documentation at http://dan.iel.fm/emcee. If you are sampling with emcee and having low-acceptance-rate or other issues, there is some advice in § 4. 1. INTRODUCTION of dimensions. This has proven useful in too many research PUBLICATIONS OF THE ASTRONOMICAL SOCIETY OF THE PACIFIC, 125:306–312, 2013 March © 2013. The Astronomical Society of the Pacific. All rights reserved. Printed in U.S.A.
  10. MCMC – 8 – Algorithm 3 The parallel stretch move

    update step 1: for i ∈ {0, 1} do 2: for k = 1, . . . , K/2 do 3: // This loop can now be done in parallel for all k 4: Draw a walker Xj at random from the complementary ensemble S(∼i)(t) 5: Xk ← S(i) k 6: z ← Z ∼ g(z), Equation (10) 7: Y ← Xj + z [Xk (t) − Xj ] 8: q ← zn−1 p(Y )/p(Xk (t)) 9: r ← R ∼ [0, 1] 10: if r ≤ q, Equation (9) then 11: Xk (t + 1 2 ) ← Y 12: else 13: Xk (t + 1 2 ) ← Xk (t) 14: end if 15: end for 16: t ← t + 1 2 17: end for acceptance fraction af . This is the fraction of proposed steps that are accepted. There appears to be no agreement on the optimal acceptance rate but it is clear that both extrema are unacceptable. If af ∼ 0, then nearly all proposed steps are rejected, so the chain will have very few independent samples and the sampling will not be representative of the target density. Conversely, if af ∼ 1 then nearly all steps are accepted and the chain is performing a random walk with no regard for the target density so this will also not produce representative samples. As a rule of thumb, the acceptance fraction should be between 0.2 and 0.5 (for example, Gelman, Roberts, & Gilks 1996). For the M–H algorithm, these effects can generally be counterbalanced by decreasing (or increasing, respectively) the eigenvalues of the proposal distribution covariance. For the stretch move, the parameter a effectively controls the step size so it can be used to similar effect. In our tests, it has never been necessary to use a value of a other than 2, but we make no guarantee that this is the optimal from: DFM, Hogg, Lang, Goodman (2013)
  11. + planet star spacecraft detector observation + + = a

    Gaussian Process (incl. physics) physics GPs
  12. GPs

  13. 1 r everest rotation 2390 2400 2410 2420 2430 2440

    2450 2460 time [days] 0.4 0.2 0.0 0.2 0.4 de-trended flux [ppt] 16.5 17.0 period [days] marginalized posterior probability EPIC 212509747 GPs
  14. 102 103 104 number of datapoints 10 4 10 3

    10 2 10 1 100 computational cost [s] experiment O(N3) GPs
  15. 102 103 104 number of datapoints 10 4 10 3

    10 2 10 1 100 computational cost [s] experiment O(N3) GPs
  16. GPs 1 Fast Direct Methods for Gaussian Processes Sivaram Ambikasaran,

    Daniel Foreman-Mackey, Leslie Greengard, Member, IEEE, David W. Hogg, and Michael O’Neil, Member, IEEE Abstract—A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) dis- tribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the n-dimensional setting, however, it requires the inversion of an n ⇥ n covariance matrix, C, as well as the evaluation of its determinant, det(C). In many cases, such as regression using Gaussian processes, the covariance matrix is of the form C = 2I + K, where K is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix C is typically dense, causing standard direct methods for inversion and determinant evaluation to require O(n3) work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix C evaluation of p(✓|x, y) / 1 | det(C(x; ✓))|1/2 e 1 2 ytC 1(x;✓)y p(✓), (1) where C(x; ✓) is an n ⇥ n symmetric, positive-definite co- variance matrix. The explicit dependence of C on particular parameters ✓ is shown here, and may be dropped in the proceeding discussion. In the one-dimensional case, C is simply the scalar variance. Thus, computing the probability requires only the evaluation of the corresponding Gaussian. In the n-dimensional setting, however, C is typically dense, so that its inversion requires O(n3) work as does the evaluation of its determinant det(C). This cost is prohibitive for large n. 4 Apr 2015 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 38, NO. 2, 2015
  17. GPs

  18. GPs Fast and Scalable Gaussian Process Modeling with Applications to

    Astronomical Time Series Daniel Foreman-Mackey1,2,6 , Eric Agol1,7 , Sivaram Ambikasaran3 , and Ruth Angus4,5 1 Astronomy Department, University of Washington, Seattle, WA, USA 2 Center for Computational Astrophysics, Flatiron Institute, 162 5th Avenue, 6th Floor, New York, NY 10010, USA 3 Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India 4 Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027, USA Received 2017 March 27; revised 2017 October 9; accepted 2017 October 10; published 2017 November 9 Abstract The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large data sets. Gaussian processes (GPs) are a popular class of models used for this purpose, but since the computational cost scales, in general, as the cube of the number of data points, their application has been limited to small data sets. In this paper, we present a novel method for GPs modeling in one dimension where the computational requirements scale linearly with the size of the data set. We demonstrate the method by applying it to simulated and real astronomical time series data sets. These demonstrations are examples of probabilistic inference of stellar rotation periods, asteroseismic oscillation spectra, and transiting planet parameters. The method exploits structure in the problem when the covariance function is expressed as a mixture of complex exponentials, without requiring evenly spaced observations or uniform noise. This form of covariance arises naturally when the process is a mixture of stochastically driven damped harmonic oscillators—providing a physical motivation for and interpretation of this choice—but we also demonstrate that it can be a useful effective The Astronomical Journal, 154:220 (21pp), 2017 December https://doi.org/10.3847/1538-3881/a © 2017. The American Astronomical Society. All rights reserved. Fast and Scalable Gaussian Process Modeling with Applications to Astronomical Time Series Daniel Foreman-Mackey1,2,6 , Eric Agol1,7 , Sivaram Ambikasaran3 , and Ruth Angus4,5 1 Astronomy Department, University of Washington, Seattle, WA, USA 2 Center for Computational Astrophysics, Flatiron Institute, 162 5th Avenue, 6th Floor, New York, NY 10010, USA 3 Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India 4 Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027, USA Received 2017 March 27; revised 2017 October 9; accepted 2017 October 10; published 2017 November 9 Abstract The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large data sets. Gaussian processes (GPs) are a popular class of models used for this purpose, but since the computational cost scales, in general, as the cube of the number of data points, their application has been limited to small data sets. In this paper, we present a novel method for GPs modeling in one dimension where the computational requirements scale linearly with the size of the data set. We demonstrate the method by applying it to simulated and real astronomical time series data sets. These demonstrations are examples of probabilistic inference of stellar rotation periods, asteroseismic oscillation spectra, and transiting planet parameters. The method exploits structure in the problem when the covariance function is expressed as a mixture of complex exponentials, without requiring evenly spaced observations or uniform noise. This form of covariance arises naturally when the process is a mixture of stochastically driven damped harmonic oscillators—providing a physical motivation for and interpretation of this choice—but we also demonstrate that it can be a useful effective model in some other cases. We present a mathematical description of the method and compare it to existing scalable GP methods. The method is fast and interpretable, with a range of potential applications within astronomical data analysis and beyond. We provide well-tested and documented open-source implementations of The Astronomical Journal, 154:220 (21pp), 2017 December https://doi.org/10.3847/1538-3881/aa9332 © 2017. The American Astronomical Society. All rights reserved.
  19. GPs 102 103 104 105 number of data points [N]

    10 5 10 4 10 3 10 2 10 1 100 computational cost [seconds] 1 2 4 8 16 32 64 128 256 direct O(N) 100 101 102 number of terms [J] 64 256 1024 4096 16384 65536 262144 O(J2) from: DFM, Agol, Ambikasaran, Angus (2017)
  20. GPs 102 103 104 105 number of data points [N]

    10 5 10 4 10 3 10 2 10 1 100 computational cost [seconds] 1 2 4 8 16 32 64 128 256 direct O(N) 100 101 102 number of terms [J] 64 256 1024 4096 16384 65536 262144 O(J2) from: DFM, Agol, Ambikasaran, Angus (2017)
  21. GPs

  22. 1 10 100 1000 10000 orbital period [days] 1 10

    planet radius [R ] source: NASA Exoplanet Archive
  23. DFM+ (2016); arXiv:1607.08237 1 10 100 1000 10000 orbital period

    [days] 1 10 planet radius [R ] source: NASA Exoplanet Archive
  24. 2.00 ± 0.72 planets per Sun-like star expected number of

    exoplanets in the range: 2 – 25 years, 0.1 – 1 RJ DFM+ (2016); arXiv:1607.08237
  25. take homes & themes Modern astrophysics requires the development (and

    implementation) of new algorithms. This really benefits from interdisciplinary collaboration.
  26. take homes & themes Every project described here is open

    source software with an associated journal article.
  27. take homes & themes Every project described here is open

    source software with an associated journal article. Is this a hack?
  28. references dfm/emcee dfm/george dfm/celerite dfm/peerless gradient-free MCMC in Python simple

    Gaussian processes in Python fast & scalable Gaussian processes long period transiting exoplanets dan foreman-mackey cca@flatiron dfm.io github.com/dfm @exoplaneteer