
TrendCalculus: A data science for trends

Andrew Morgan, CEO @byteSumo talk at @ds_ldn meetup

Data Science London

June 03, 2015


Transcript

  1. A data science for studying trends.

    2015-04-21. Data Science London, Queen Mary. TrendCalculus. Copyright © 2015 ByteSumo Limited 2015. All rights reserved.
  2. Trends?!

    Data science seems so focused on the micro scale: deeper granularity, higher frequency…
  3. Trends?!

    My focus is broad patterns and big flock behaviours, and my objective is long-range predictions. Trends are a natural way to think, explain, and forecast. Yet we lack tools to understand trends scientifically. TrendCalculus is my unfinished research to that end.
  4. What’s a trend?

    “A Trend is defined by a shift in behaviour or mentality that influences a significant amount of people.” - Salomé Areias
    “A Trend is the slow variation over a longer period of time, usually several years, generally associated with the structural causes affecting the phenomenon being measured.” - Eurostat
  5. 400+ years of trend discussion

    [Chart: “Trend” - Wolfram Alpha] What do you see? Perhaps a shift in behaviour or mentality? Maybe a drift in language use? How do we quantify and study the trend?
  6. 400+ years of trend discussion

    [Chart: “Trend” - Wolfram Alpha] What might cause Trend as a topic to be losing popularity?
  7. 400+ years of trend discussion

    [Chart: “Trend” - Wolfram Alpha] What might cause Trend as a topic to be losing popularity? Maybe traditional trend analysis is flawed and the collective knows it.
  8. What if we could do better?

    What would you do if you really understood trends and when they reversed?
  9. TrendCalculus is our new trend reversal detection algorithm for streamed numeric data.

    It produces trendwise partitioning over all timeframes in a tree. It’s fast.
  10. What does output look like?

    [Chart: AAPL_Daily_N=21_2008_, 2008-01-01 to 2013-11-05, price axis 100 to 700. Last 525.45; mrev: 531.910; yrev: 702.100.] Two timeframes: Yearly Trend Reversal and Monthly Trend Reversal. Chart annotations: yearly downtrend, yearly uptrend, yearly downtrend.
  11. What does output look like?

    [Chart: trend_calc 1, 2000-01-03 to 2014-01-10, price axis 4000 to 7000. Last 6739.9; p1: 6717.900; p2: 6440.000; p3: 6625.400; p4: 6840.300; p5: 4944.400; p6: 3512.100.] Multiple stacked timeframes build up into long-term structures.
  12. What does output look like?

    [Chart: trend_calc 1, 2000-01-03 to 2014-01-10, price axis 4000 to 7000. Last 6739.9; p1: 6717.900; p2: 6440.000; p3: 6625.400; p4: 6840.300; p5: 4944.400; p6: 3512.100.] Multiple stacked timeframes build up into long-term structures. Partitions build, bottom up, into a hierarchical structure...
  13. Time Series Representations

    [Taxonomy diagram. Author: Eamonn Keogh, Professor, Computer Science & Engineering Department, University of California - Riverside, Riverside, CA 92521.]
    Data Adaptive: Sorted Coefficients; Piecewise Polynomial (Piecewise Linear Approximation: Interpolation, Regression; Adaptive Piecewise Constant Approximation); Singular Value Approximation; Symbolic (Natural Language; Strings, including Symbolic Aggregate Approximation, Non Lower Bounding, Value Based, Slope Based); Trees.
    Non Data Adaptive: Wavelets (Orthonormal: Haar, Daubechies dbn n > 1, Coiflets, Symlets; Bi-Orthonormal); Random Mappings; Spectral (Discrete Fourier Transform, Discrete Cosine Transform, Chebyshev Polynomials); Piecewise Aggregate Approximation.
    Model Based: Hidden Markov Models; Statistical Models.
    Data Dictated: Grid; Clipped Data.
    Where does this fit? Under a new branch: Trendwise Partitioning. TrendCalculus is a multi-scale, bottom-up, trend-reversal-detecting Trendwise Approximation that produces a hierarchical time series partitioning.
  14. What does MTA offer?

    It offers rich time series methods: to better predict; to correlate time series; to index and compress; to do cross-scale retrieval of “motifs”; to build ‘episodic memory’ stores; to normalise signal extraction and reduce noise; to convert sub-symbolic data to rich symbolic data.
    The MTA paper is a good read: Multiscale Trend Analysis, Ilya Zaliapin, Andrei Gabrielov, and Vladimir Keilis-Borok. Revised: February 02, 2004.
  15. What are multi scale trends?

    A time series is decomposed into local linear trends.
    [Figure 1 of the MTA paper: Scheme of the Multiscale Trend Decomposition. Panels a) to c) show X(t), L_0(t), X_1(t) = X(t) - L_0(t), L_1(t), and the resulting upward and downward trends. The caption is cut off on the slide.]
    * these pictures are from the paper
  16. From the MTA paper…

    [Figure 3 of the MTA paper: Decomposition of a Fractional Brownian walk with Hurst exponent H = 0… (caption cut off on the slide). Panel a) MTD for Brownian walk; panel b) the corresponding hierarchy of upward and downward trends. Both panels are plotted against Time, from 0 to 1. Levels shown: Level 0, 1 interval; Level 1, 3 intervals; Level 2, 7 intervals; Level 10, 23 intervals.]
    * these pictures are from the paper
  17. The idea is to ignore noise

    [Figure from the MTA paper, panels b) to d).] Reversals found on the scale of interest.
    * these pictures are from the paper
  18. Build a tree of local trends

    The trends are stacked in a hierarchy. Like a b-tree, we index time series data into a shallow tree which isn’t balanced per se, but whose partitions are interpretable and meaningful (not necessarily stationary).
    [Figure 12 of the MTA paper: “Three levels of detail in MTA description of a time series. a) Topol… …tric, based on the interval partition. c) e-metric, based on local linear fit” (caption cropped on the slide).]
    * these pictures are from the paper
  19. Trends are signed integers

    [Figure 14 of the MTA paper: “Signed partition corresponding to a piecewise linear approximation (pan… …on of signed partitions (panel b), and triplet (a, b, c) for an interval of a union” (caption cropped on the slide).]
    Trending Together? = a correlation measure. Multiply the trend signs at time t. If the answer is +1 they are trending together… Given the partitions of X and the partitions of Y, compare by multiplying signs, or compare by the ratio of overlapping portions.
    * these pictures are from the paper
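The sign-multiplication idea on slide 19 can be sketched in a few lines of Python. This is only an illustration, not code from the talk or the MTA paper: the function names `trending_together` and `agreement_ratio` are made up here, and the two series are assumed to be already reduced to one trend sign (+1 or -1) per time step.

```python
def trending_together(signs_x, signs_y):
    """Multiply trend signs point by point: +1 means both series trend the same way."""
    if len(signs_x) != len(signs_y):
        raise ValueError("sign series must have equal length")
    return [sx * sy for sx, sy in zip(signs_x, signs_y)]


def agreement_ratio(signs_x, signs_y):
    """Share of time steps at which the two series trend together (hypothetical helper)."""
    products = trending_together(signs_x, signs_y)
    if not products:
        raise ValueError("sign series must not be empty")
    return sum(1 for p in products if p == 1) / len(products)


# Toy data, invented for the example.
x = [+1, +1, -1, -1, +1]
y = [+1, -1, -1, +1, +1]
print(trending_together(x, y))   # [1, -1, 1, -1, 1]
print(agreement_ratio(x, y))     # 0.6
```

The ratio is one simple way to read "ratio of overlapping portions"; the paper's own measure may be defined differently.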
  20. What does ByteSumo bring?

    We created a Bottom Up algorithm that detects Trend Reversals, aka “Knots”, at a Scale, based on a window, N. Stacked, it creates multi-scale partitions over a stream of time series data. It’s fast, because we changed the definition of a Trend (?!) Yes - we abandoned linear regressions… Our definition is:
    Rising = Higher Highs, Higher Lows
    Falling = Lower Lows, Lower Highs
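A minimal Python sketch of the definition on slide 20, under stated assumptions. The slide gives only the rule (higher highs and higher lows, or lower lows and lower highs) and the window N. Everything else here is hypothetical: the function names, the use of non-overlapping windows, the sign 0 for mixed cases, and reporting a reversal at the window where the sign flips. It is not the author's Lua implementation.

```python
def window_extremes(values, n):
    """Split values into consecutive full windows of n points; return (high, low) per window."""
    return [(max(values[i:i + n]), min(values[i:i + n]))
            for i in range(0, len(values) - n + 1, n)]


def trend_signs(values, n):
    """Compare each window with the previous one.

    +1: higher high and higher low (rising)
    -1: lower high and lower low (falling)
     0: mixed case (ASSUMPTION: the slide does not say how these are handled)
    """
    ext = window_extremes(values, n)
    signs = []
    for (prev_high, prev_low), (high, low) in zip(ext, ext[1:]):
        if high > prev_high and low > prev_low:
            signs.append(1)
        elif high < prev_high and low < prev_low:
            signs.append(-1)
        else:
            signs.append(0)
    return signs


def reversals(signs):
    """Indices where a non-zero sign differs from the last non-zero sign before it."""
    found, last = [], 0
    for i, s in enumerate(signs):
        if s != 0:
            if last != 0 and s != last:
                found.append(i)
            last = s
    return found


# Toy series: rises to 6, then falls. Window size 2.
series = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0]
signs = trend_signs(series, 2)
print(signs)             # [1, 1, -1, -1, -1]
print(reversals(signs))  # [2]
```

Each window needs only a max, a min, and two comparisons, with no regression fit, which is consistent with the speed claim on the slide.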
  21. Let’s see it in action!

    Let’s try the FTSE 100, extended back to 1935 via the FTSE 30 data. Time series length: 21,499 records (daily closes). This run uses a window size of n=200 (market days). The process in Lua creates lots of intermediate calculations for each window size from n down to 1, so it should be slow… Total run time is ~13 seconds on my Mac. Output is shown left: 51 major trend reversals found that approximate the time series. Alternatively, we could say we have “generalised the time series” into 51 important change points. It’s true LuaJIT can speed this up… but how else might we be able to speed it up?
  22. Let’s see it in action!

    Let’s try another way. Stacking the calculations: i.e. pipe the output back through the algo again x3. There is practically an order of magnitude improvement in performance when stacking. With a setting of N=5, I just processed the stack of 4 runs in less than 2 seconds using straight Lua on my Mac for 21,499 input records. That’s ~10k streamed records per second. With LuaJIT the run time will drop further! The partitions in the trend tree we calculated are:
    level 4 = 0 trend reversals
    level 3 = 28 trend reversals
    level 2 = 249 trend reversals
    level 1 = 2,079 trend reversals
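The figures on slide 22 can be checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide. The one assumption, flagged in the code, is that each stacked pass reads the reversals emitted by the pass below it, which is how "pipe the output back through the algo" is read here.

```python
# Figures quoted on the slide.
records = 21_499                  # daily closes fed into the first pass
seconds = 2.0                     # "less than 2 seconds", so an upper bound on run time
reversals = [2_079, 249, 28, 0]   # reversals found at levels 1 to 4

# Lower bound on throughput, since the run took less than 2 seconds.
throughput = records / seconds

# ASSUMPTION: pass k reads the reversals produced by pass k-1.
inputs = [records] + reversals[:-1]
total_processed = sum(inputs)

# How much each pass shrinks its input (passes that found no reversals are skipped).
shrink = [i / o for i, o in zip(inputs, reversals) if o > 0]

print(f"throughput >= {throughput:,.0f} records/s")
print(f"inputs per pass: {inputs}, total {total_processed:,}")
print("shrink per pass:", [round(s, 1) for s in shrink])
```

Under that assumption each pass shrinks its input roughly tenfold, so the four passes together touch only about 11% more records than the first pass alone. This would explain why stacking with a small N is much cheaper than one run with a large window.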
  23. Let’s see it in action!

    Here is the last 14 years of the stacked output. The 3 levels of partitions are seen nested. [Chart labels: level 1, level 2, level 3.]
  24. Let’s see it in action!

    Here is the last 14 years of the stacked output. The 3 levels of partitions are seen nested. [Chart labels: level 1, level 2, level 3.]
  25. Let’s see it in action!

    Here is the last 14 years of the stacked output. The 3 levels of partitions are seen nested. [Chart labels: level 1, level 2, level 3.]
  26. Let’s see it in action!

    Zoom out. Here is from 1945 to the present. We see the 28 “level 3” partitions as red knots. [Chart labels: level 1, level 2, level 3.]
  27. Let’s see it in action!

    Zoom in. Here is 2013. Here we can see some of the 2,079 fine grain “level 1” reversals up close. [Chart label: level 1.]
  28. Let’s see it in action!

    Zoom in. Here is 2013. Here we can see some of the 2,079 fine grain “level 1” reversals up close.
  29. Let’s see it in action!

    Zoom in. Here is 2013. Here we can see some of the 2,079 fine grain “level 1” reversals up close. While we didn’t set out to generate piecewise linear regressions (we abandoned regression, remember), you can see the results are often not bad if we judge them on that basis.
  30. Let’s see it in action!

    Zoom in. Here is 2013. Here we can see some of the 2,079 fine grain “level 1” reversals up close. All these partitions were created in that 2 second run, at ~10k data points per second.
  31. A Rolling Trend score?

    This involves moving away from fixed windows of N to rolling arrays for all timeframes up to N. The information revealed is not the trend reversals, but the underlying data used in their calculation. I will output my internal arrays to feed deep learning algorithms as a form of “trend feature generator”. For display, I turn values into symbols, and we can see rich patterns emerging from the trends across all scales. Quants who reviewed this said: “ah, it shows the relationship of the price to the Pivot Points”
  32. Rolling Trends - all timeframes

    [Chart: $DJI closes above a trend map. Timeframe N axis: 10, 40, 80, 120, 160, 200, 240, 280, 320, 360. Legend: Bullish, Bearish, neutral bearish, neutral bullish.]
    Here I present the trend score as a symbol for each timeframe up to a max N, to build a “multi scale trend map”.
  33. Rolling Trends - all timeframes

    The columns are symbols representing the value of the rolling channels I calculate in my array for a value n. A timeframe becomes vertical stripes on the “trend map” from 1 .. n. [Timeframe N axis: 10, 40, 80, 120, 160, 200, 240, 280, 320, 360.]
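The slides do not say how the rolling trend score is computed, so the following Python sketch is a guess at the general shape of a "multi scale trend map", not the author's method. The scoring rule is an assumption made for illustration, prompted by the quants' remark about price versus pivot points: a close above the high of the previous n closes counts as an uptrend, below their low as a downtrend, and a close inside that channel is neutral, leaning bullish or bearish by which side of the midpoint it falls. The symbols follow the legend shown with the trend maps (# downtrend, * uptrend, : neutral bearish, . neutral bullish). The names `channel_symbol` and `trend_map_row` are hypothetical.

```python
SYMBOLS = {"down": "#", "up": "*", "neutral_bearish": ":", "neutral_bullish": "."}


def channel_symbol(closes, t, n):
    """Symbol for the close at index t against the channel of the n closes before it.

    ASSUMPTION: this high/low/midpoint rule is invented for the sketch.
    """
    window = closes[t - n:t]
    high, low = max(window), min(window)
    price = closes[t]
    if price > high:
        return SYMBOLS["up"]
    if price < low:
        return SYMBOLS["down"]
    mid = (high + low) / 2
    return SYMBOLS["neutral_bullish"] if price >= mid else SYMBOLS["neutral_bearish"]


def trend_map_row(closes, t, max_n):
    """One string for time t, with one symbol per timeframe n = 1..max_n."""
    if t < max_n:
        raise ValueError("need at least max_n closes before index t")
    return "".join(channel_symbol(closes, t, n) for n in range(1, max_n + 1))


# Toy closes, invented for the example.
closes = [10, 12, 8, 11]
print(trend_map_row(closes, 3, 3))   # *..
```

Stacking one such row per time step gives a character matrix with time on one axis and timeframe on the other, which is the kind of structure that could be fed to a learning algorithm as features.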
  34. Trend Maps of all timeframes.

    The next steps are to use all these rich inputs to see if we can make long range predictions, for instance by feeding deep learning algorithms with all these trends to predict future trend reversals. It means I’ll use TrendCalculus to generate interesting trend features. Lots of potential for further work.
    Legend: # Downtrend; * Uptrend; : Neutral - bearish; . Neutral - bullish.
    [Table: the identified trend reversals, as an outer join back to the time series.]
  35. Consider the rolling trend matrix (on all timeframes) when shown aligned under the chart.

    It’s drawn using a levelplot in R… See the fractal shapes? See the legacy of the credit crunch in the top right? [Chart labels: Training Targets, Feature Matrix.]
  36. The trend maps show “Top - Bottom - Top” motif shapes, similar on different time frames.

    Small shapes under big ones.
  37. The trend maps show “Top - Bottom - Top” motif shapes, similar on different time frames.

    Small shapes under big ones.
  38. Deep learning the matrix patterns against the reversals offers interesting avenues for prediction.
  39. A “data science” for trends requires a community of data scientists.

    I am launching a community of practitioners to build on my work to develop a science of trend. It’s a job far bigger than one person. If this work interests you, and you want to have a go, please join:
  40. Trendwise Research London

    http://www.meetup.com/Trendwise-Research-London
    We will build a data science for trends. We will publish methods, results and papers. We will complete and open source the code.
  41. THANK YOU.

    You want more details, naturally. Then join the open research team. Be a contributor, not a consumer. This offer isn’t open forever.
  42. [email protected]

    Andrew Morgan is a practicing Senior Enterprise Data Architect and Data Scientist, and is currently designing a data science practice and platform for a top 4 audit firm client. He is also the CEO of ByteSumo, a data science consultancy. He is a specialist in data processing languages, data platform design, emerging data technologies, exotic data structures, data science methods, technical architecture, and data security systems. He founded ByteSumo to build a data science led consultancy that has the experts and tools needed to transform and disrupt traditional enterprises.
    (curr. client role), 2014 - 2015: Interim Head of Data Science
    ByteSumo, 2013 - present: CEO
    Capgemini, 2010 - 2013: Senior Enterprise Architect, BIM
    Thomson Reuters, 2006 - 2010: Architect, Senior Technologist
    Aprimo (now Teradata), 2005 - 2006: Senior Consultant
    Acxiom Corporation, 2000 - 2005: Business Solutions Architect
    dunnhumby, 1999 - 2000: Database Consultant
    Elf Gas & Power UK, 1995 - 1999: Operational Dev. Executive
    Gov’t of Ontario, 1994 - 1994: Jnr. Planner, GIS systems
    Bachelor of Arts, Geography. University of Toronto. 1994
    Data Science, Data Architecture, Big Data Engineering.
  43. Attribution

    Salomé Areias: http://salomeareias.com/what-is-a-trend/
    Eurostat: http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/Glossary:Trend_cycle
    MultiScale Trend Analysis: I. Zaliapin, A. Gabrielov, V. Keilis-Borok. Multiscale trend analysis for time series. Fractals, v.12, p.275-292, 2004. http://www.math.purdue.edu/~agabriel/mta.pdf
    Eamonn Keogh: Time Series Representations, a slide from the tutorials found here: http://www.cs.ucr.edu/~eamonn/tutorials.html