Hacking the Rail: Ingesting, analysing & visualising realtime streaming data

Charles Cai, Chief Architect & Head of Data Science at a major oil & gas company; talk at the @ds_ldn meetup

Data Science London

June 04, 2015

Transcript

  1. Hacking the Rail
     Ingesting, analysing & visualising real-time streaming rail systems data
     Charles Cai
     21 April 2015
     Data Science London Meetup
     http://www.meetup.com/Data-Science-London/events/221885254/
  2. About me
     Hashtag bio:
     #Intrapreneur #Innovator #Disruptor #DataScientist
     #ETRM (Energy Trading & Risk Management) #IB #FO #MO #BO
     #BigData #MachineLearning #Cloud #UI #UX
     Twitter: @caidong
     https://www.linkedin.com/in/charlescai
     GitHub: charles-cai
  3. HackTrain – our challenges for you
     Incident duration estimates
     •  Currently based on ‘best guess’ and experience
     •  Seeking smart solution
     •  Using historical data
     •  Accuracy needs to improve
     •  Incident ‘types’ are the challenge
     Journey check
     •  Currently based on ‘best guess’ and experience
     •  Seeking smart solution
     •  Using historical data
     •  Accuracy needs to improve
     •  Incident ‘types’ are the challenge
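
The incident duration challenge on slide 3 asks for estimates driven by historical data and incident type rather than 'best guess'. One very simple baseline, offered only as a sketch and not as the approach any HackTrain team used, is the median historical duration per incident type. The incident types, durations and function names below are made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical historical incidents: (incident_type, duration_minutes).
# Both the type names and the durations are invented for illustration.
HISTORY = [
    ("signal failure", 45),
    ("signal failure", 60),
    ("signal failure", 90),
    ("trespass", 20),
    ("trespass", 30),
]


def build_estimator(history):
    """Return a function that estimates duration from the per-type median."""
    by_type = defaultdict(list)
    for incident_type, minutes in history:
        by_type[incident_type].append(minutes)
    medians = {t: median(ms) for t, ms in by_type.items()}
    # Fall back to the overall median for incident types never seen before.
    overall = median(m for _, m in history)

    def estimate(incident_type):
        return medians.get(incident_type, overall)

    return estimate


estimate = build_estimator(HISTORY)
print(estimate("signal failure"))
print(estimate("trespass"))
print(estimate("flooding"))  # unseen type, uses the overall median
```

A median is used instead of a mean so that a few very long incidents do not dominate the estimate; any real solution would need far more history and richer features than the incident type alone.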
  4. National Public Transport Access Nodes (NaPTAN)
     NaPTAN is Britain's national system for uniquely identifying points of access to public transport.
     It is a core component of the national transport information infrastructure and is used by a number of other UK standards and information systems.
     There is a NaPTAN record for every bus stop, railway station, airport, ferry terminal etc. in England, Scotland and Wales. Record attributes include co-ordinates (OSGR and Lat-Long), NPTG locality reference, name components and SMS code.
     200MB CSV / 600MB XML!
     https://www.gov.uk/government/publications/national-public-transport-access-node-schema
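
Because the NaPTAN CSV is around 200MB, it can help to stream it row by row instead of loading it whole. The sketch below shows that pattern on a tiny in-memory stand-in. The column names (`ATCOCode`, `CommonName`, `StopType`, `Latitude`, `Longitude`) and the `RLY` stop type code are assumptions for illustration and should be checked against the header row of the real file.

```python
import csv
import io

# Tiny stand-in for the NaPTAN stops CSV. The column names and the "RLY"
# stop type code are assumptions; the rows themselves are invented.
SAMPLE_CSV = """ATCOCode,CommonName,StopType,Latitude,Longitude
0000A,Example High Street,BCT,51.5000,-0.1000
0000B,Example Rail Station,RLY,51.5100,-0.1200
0000C,Example Ferry Terminal,FER,51.5200,-0.1300
"""


def iter_stops(lines, stop_type):
    """Yield (name, lat, lon) for rows of one stop type, one row at a time."""
    for row in csv.DictReader(lines):
        if row["StopType"] == stop_type:
            yield row["CommonName"], float(row["Latitude"]), float(row["Longitude"])


stations = list(iter_stops(io.StringIO(SAMPLE_CSV), "RLY"))
print(stations)
```

For a real file, the `io.StringIO` object would be replaced by something like `open(path, newline="")`, so that only one row is held in memory at a time.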
  5. •  About us
        – We run, maintain and develop Britain’s rail tracks, signalling, bridges, tunnels, level crossings, viaducts and 18 key stations
     http://www.networkrail.co.uk/data-feeds/
  6. Network Rail and Open Data
     | Name | Description | Frequency |
     | BPLAN | Train planning data, including locations and sectional running times. | Twice a year |
     | Corpus | Location reference data. | Monthly |
     | Movement | Train positioning and movement event data. | Real-time |
     | RTPPM (real time public performance measure) | Performance of trains against the timetable, measured as the percentage of trains arriving at their destination on-time. | One message / minute |
     | SMART | Train describer berth offset data used for train reporting. | Monthly |
     | TD | Train positioning data at signalling berth level. | Real-time |
     | TSR (Temporary speed restrictions) | Details of temporary reductions in permissible speed across the rail network. | Once a week / Friday morning |
     | VSTP (Very short term plan) | Train schedules created via the very short term plan process which are not available via the Schedule feed. | Real-time |
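
The RTPPM row describes performance as the percentage of trains arriving at their destination on time. A minimal sketch of that arithmetic follows. The lateness threshold is an assumed parameter that is not given in the slides, the sample values are invented, and the sketch ignores cancellations, so it is a simplification and not the official measure.

```python
def ppm(lateness_minutes, threshold_minutes=5):
    """Percentage of trains whose arrival lateness is below the threshold.

    `lateness_minutes` holds one value per train at its destination.
    The threshold is an assumed parameter, not taken from the slides.
    """
    if not lateness_minutes:
        raise ValueError("no trains to measure")
    on_time = sum(1 for late in lateness_minutes if late < threshold_minutes)
    return 100.0 * on_time / len(lateness_minutes)


# Made-up lateness values (minutes) for eight trains.
sample = [0, 1, 3, 4, 7, 12, 2, 0]
print(ppm(sample))
```

Six of the eight sample trains are under the assumed five-minute threshold, so the function reports 75.0 for this input.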
  7. ATOC brings together the 23 train companies that serve the length and breadth of the UK, to preserve and enhance the benefits for passengers of Britain’s national rail network.
  8. | Name | Description | Frequency |
     | Timetable Feed | Full Timetable File: details of all national rail passenger train services, CIF format; Manual Train File: (Z Trains File); Master Stations Names File: all location specific data relevant to FTF; Fixed Link File / Set File / Report File: … | Weekly |
     | Fares Feed | Train fares, including promotional fares, corrections, under strict rules by government | January, May and September |
     | London Terminals Feed | Valid London railway station for any fare advertised with a destination of ‘London Terminals’ | - |
     | Avantix Fares Application | | |
     http://data.atoc.org
  9. •  Darwin – a complex application – taking data from a wide range of industry sources
     •  Uses predictive and heuristic technology to convert data into useful predictions of train running
     •  Scheduled timetable and movement data by train company and National Rail Communication Centre
     •  Taking GPS data directly from trains with Wi-Fi + trains with GPS locators
     •  Darwin CIS – Customer Information Systems (completed by April 2015)
     •  Real time display throughout UK
     •  NRE App
     •  Nationalrail.co.uk
     •  NRE telephone / Mobile channels
     *HackTrain: a full copy of March 2015 SQL RTTI Database Dump
     - 9 Million messages
     - Half million forecast messages with 7 reasons
     http://www.nationalrail.co.uk/46391.aspx
  10. Identifying Locations:
     STANOX – Station Number
     TIPLOC – Timing Point Location
     NLC – National Location Code
     3-Alpha Code for CRS – Computer Reservation System or NRS – National Reservation System
     •  Knowledgebase & XML Feeds
     •  Incident XML (Service Disruption)
     •  Incident XML (Engineering work)
     •  National Service Indicator (NSI)
     •  Stations, Promotions, Ticket Types, TOCs
     •  Darwin Push-Port
     •  Continuous streaming of train schedule + train running predictions
     •  Area of interest / Entire country
     •  Extremely high-volume
     •  Darwin Timetables
     •  Schedule changes (delta)
     http://nrodwiki.rockshore.net/index.php/Main_Page
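
Since the feeds identify the same place by different codes (STANOX, TIPLOC, NLC and the 3-alpha CRS code), joining them usually needs a cross-reference table. The following is a minimal sketch of such a lookup. The `Location` class, the `translate` helper and every code value are placeholders invented for illustration; real mappings would come from reference data such as the Corpus feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    stanox: str
    tiploc: str
    nlc: str
    crs: str


# Placeholder records; none of these names or codes are real.
LOCATIONS = [
    Location("Example Central", "00001", "EXMPCTL", "100100", "EXC"),
    Location("Sample Junction", "00002", "SMPLJN", "100200", "SMJ"),
]

# One dictionary per code scheme, so a code from any feed resolves to the
# same Location record.
INDEX = {
    scheme: {getattr(loc, scheme): loc for loc in LOCATIONS}
    for scheme in ("stanox", "tiploc", "nlc", "crs")
}


def translate(code, source, target):
    """Translate a code between schemes, or return None if it is unknown."""
    loc = INDEX[source].get(code)
    return getattr(loc, target) if loc else None


print(translate("EXC", "crs", "stanox"))
print(translate("SMPLJN", "tiploc", "crs"))
```

Codes are kept as strings so that leading zeros survive, which matters when a numeric-looking code is used as a join key across feeds.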
  11. https://github.com/openraildata
     Peter Hicks from NRE manages the GitHub repo. Peter also manages http://www.openraildata.info
     I’ll upload my Darwin code to stomp-client-python soon.
     Please contact me via Twitter / GitHub or LinkedIn if you are interested in Visualization, Predictions, and Mobile Apps using rail data as well as other data sets (e.g. MET Weather, Twitter and Facebook etc)!
  12. Appendix: our team’s project in HackTrain
     •  And you can find more information on the winning teams’ work here: http://hacktrain.com