Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Watt-Wise Web: Architecting a Responsive and En...

Yuhao Zhu
October 26, 2020

Watt-Wise Web: Architecting a Responsive and Energy-Efficient Mobile Web

Guest lecture given at University of Utah on Web browsing

Yuhao Zhu

October 26, 2020
Tweet

More Decks by Yuhao Zhu

Other Decks in Research

Transcript

  1. Measure Power (W) 0 2 4 6 8 Year 2009

    2010 2011 2012 2013 2014 2015 Mobile CPU’s Rise to Power [HPCA 2016] 3
  2. Measure Power (W) 0 2 4 6 8 Year 2009

    2010 2011 2012 2013 2014 2015 0 2 4 6 8 2009 2010 2011 2012 2013 2014 2015 Mobile CPU’s Rise to Power [HPCA 2016] 3
  3. Measure Power (W) 0 2 4 6 8 Year 2009

    2010 2011 2012 2013 2014 2015 0 2 4 6 8 2009 2010 2011 2012 2013 2014 2015 Mobile CPU’s Rise to Power [HPCA 2016] 3 In pursuit of high performance
  4. Measure Power (W) 0 2 4 6 8 Year 2009

    2010 2011 2012 2013 2014 2015 0 2 4 6 8 2009 2010 2011 2012 2013 2014 2015 Mobile CPU’s Rise to Power [HPCA 2016] 3 Throttling In pursuit of high performance
  5. Measure Power (W) 0 2 4 6 8 Year 2009

    2010 2011 2012 2013 2014 2015 0 2 4 6 8 2009 2010 2011 2012 2013 2014 2015 Mobile CPU’s Rise to Power [HPCA 2016] 3 Throttling In pursuit of high performance SoC Thermal Design Power (TDP)
  6. 4 Mobile Processor Design “Strategy” 2007 (In-order) 2011 (Out-of-order) 2012

    (Multi-core) 2013 (Asymmetric Multi-core) Performance Power
  7. 4 Mobile Processor Design “Strategy” 2007 (In-order) 2011 (Out-of-order) 2012

    (Multi-core) 2013 (Asymmetric Multi-core) Performance Power Squeeze 3 decades of desktop CPU techniques into a 6-year span
  8. “Improving” Energy Capacity 5 Screen Size (inches) Battery Capacity (mAh)

    600 smartphone from 2006 to 2014 on http://www.gsmarena.com/makers.php3             
  9. “Improving” Energy Capacity 5 Screen Size (inches) Battery Capacity (mAh)

    600 smartphone from 2006 to 2014 on http://www.gsmarena.com/makers.php3             
  10. “Improving” Energy Capacity 5 Screen Size (inches) Battery Capacity (mAh)

    600 smartphone from 2006 to 2014 on http://www.gsmarena.com/makers.php3             
  11. Mobile Applications 8 ^ Web Monthly Unique Mobile Users Non-Web

    Web 3.3 M 8.9 M comScore Mobile Metrix, U.S., June 2015
  12. Mobile Applications 8 ^ Web Monthly Unique Mobile Users 2

    4 6 8 10 12 Jun
 2014 Sep
 2014 Dec
 2014 Mar
 2015 Jun
 2015 Sep
 2015 Dec
 2015 Mar
 2016 Jun
 2016 Million Web Non-Web comScore 2016 U.S. Mobile App Report
  13. 38 32 26 20 14 8 2 Load time (s)

    10 2 3 4 5 6 7 8 100 2 3 4 5 6 7 8 1000 2 Network RTT (ms) 10 Is This (Just) a Network Issue? [IEEE MICRO 2015]
  14. 38 32 26 20 14 8 2 Load time (s)

    10 2 3 4 5 6 7 8 100 2 3 4 5 6 7 8 1000 2 Network RTT (ms) 10 LTE 3G Adverse 3G 2G Wi-Fi Is This (Just) a Network Issue? [IEEE MICRO 2015]
  15. 38 32 26 20 14 8 2 Load time (s)

    10 2 3 4 5 6 7 8 100 2 3 4 5 6 7 8 1000 2 Network RTT (ms) 10 LTE 3G Adverse 3G 2G Wi-Fi Is This (Just) a Network Issue? [IEEE MICRO 2015]
  16. 38 32 26 20 14 8 2 Load time (s)

    10 2 3 4 5 6 7 8 100 2 3 4 5 6 7 8 1000 2 Network RTT (ms) 10 LTE 3G Adverse 3G 2G Wi-Fi Is This (Just) a Network Issue? [IEEE MICRO 2015]
  17. 38 32 26 20 14 8 2 Load time (s)

    10 2 3 4 5 6 7 8 100 2 3 4 5 6 7 8 1000 2 Network RTT (ms) 10 Compute LTE 3G Adverse 3G 2G Wi-Fi Is This (Just) a Network Issue? [IEEE MICRO 2015]
  18. Measured Power (W) 0.0 3.0 6.0 9.0 2009 2010 2011

    2012 2013 2014 2015 Screen Radio CPU 11 Compute Is This (Just) a Network Issue? [IEEE MICRO 2015]
  19. 12 Compute Is This (Just) a Network Issue? [IEEE MICRO

    2015] Measured Power (W) 0.0 3.0 6.0 9.0 2009 2010 2011 2012 2013 2014 2015 Screen Radio CPU
  20. Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security

    Local Storage User Input Layout Render Runtime 15 Web Browsing from a Compute Perspective Compute Architecture Application
  21. Runtime 16 Cross-Layer Optimizations Architecture Application WebCore Web-specific Processor Architecture

    GreenWeb Language Support for Quality-of-Experience [PLDI 2016] [ISCA 2014] [TOCS 2017]
  22. Runtime 16 Cross-Layer Optimizations Architecture Application WebCore Web-specific Processor Architecture

    GreenWeb Language Support for Quality-of-Experience [PLDI 2016] [ISCA 2014] [TOCS 2017]
  23. Runtime 16 Cross-Layer Optimizations Architecture Application WebCore Web-specific Processor Architecture

    GreenWeb Language Support for Quality-of-Experience [PLDI 2016] [ISCA 2014] [TOCS 2017]
  24. Runtime 16 Cross-Layer Optimizations Architecture Application WebCore Web-specific Processor Architecture

    WebRT Fast, Energy-Efficient Mobile Web Runtime GreenWeb Language Support for Quality-of-Experience [PLDI 2016] [ISCA 2014] [TOCS 2017] [HPCA 2013] [HPCA 2015] [ISCA 2019]
  25. Runtime 16 Cross-Layer Optimizations Architecture Application WebCore Web-specific Processor Architecture

    WebRT Fast, Energy-Efficient Mobile Web Runtime GreenWeb Language Support for Quality-of-Experience [PLDI 2016] [ISCA 2014] [TOCS 2017] [HPCA 2013] [HPCA 2015] [ISCA 2019]
  26. 17 Mobile Applications are Event-Driven Touching Moving Interactions Events click

    touchstart touchmove scroll Event Loop Event Queue CPU
  27. 17 Mobile Applications are Event-Driven Touching Moving Interactions Events click

    touchstart touchmove scroll Event is the atomic unit of execution; optimize latency/energy at the event-level. Event Loop Event Queue CPU
  28. ▸ Observation: Events have different latency slacks that enable energy

    optimizations 17 Mobile Applications are Event-Driven Touching Moving Interactions Events click touchstart touchmove scroll Event Loop Event Queue CPU
  29. ▸ Observation: Events have different latency slacks that enable energy

    optimizations 18 Event-based Web Runtime Optimization
  30. ▸ Observation: Events have different latency slacks that enable energy

    optimizations 18 ▸ Mechanism: Provide just enough energy to meet QoE requirement for different events Event-based Web Runtime Optimization
  31. ▸ Observation: Events have different latency slacks that enable energy

    optimizations 18 ▸ Mechanism: Provide just enough energy to meet QoE requirement for different events ▸ Implementation: Map events to different heterogeneous hardware configurations Event-based Web Runtime Optimization
  32. Event-Level Characterization !19 150 100 50 0 Event Latency (ms)

    Events Large Slack change Small Slack keyup
  33. Event-Level Characterization !19 150 100 50 0 Event Latency (ms)

    Events Large Slack change Small Slack click keyup
  34. Event-Level Characterization !19 150 100 50 0 Event Latency (ms)

    Events Large Slack change Small Slack No Slack click keyup
  35. Event-Level Characterization !19 150 100 50 0 Event Latency (ms)

    Events Large Slack change Small Slack No Slack click keyup ▸ Wide distribution of event latencies. Events exhibit different slacks. ▹ How to exploit event slacks?
  36. !20 Event-based Scheduler (EBS) ▸ Goal: For each event, find

    the most energy-efficient hardware configuration that meets the latency target
  37. !20 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling Throughput Fairness

    Event-based Scheduler Events-based Scheduling Event Latency Event Energy Event Queue
  38. !22 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space ▸ Widely used in commodity devices Energy Consumption Performance Big Core Small Core
  39. !22 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space ▸ Widely used in commodity devices Energy Consumption Performance Big Core Small Core Voltage/ Frequency Levels
  40. !22 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space ▸ Widely used in commodity devices Energy Consumption Performance Big Core Small Core
  41. !22 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space ▸ Widely used in commodity devices Energy Consumption Performance Big Core Small Core
  42. !23 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03
  43. !23 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency =
  44. !23 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency = Tmemory +
  45. !23 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency = Tmemory + Ndependent / f
  46. !23 Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off

    space Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency = Tmemory + Ndependent / f
  47. Evaluation Methodology ▸ Implemented inside Google Chromium Web browser ▸

    Representative hardware platform ▹ Exynos 5410 SoC (A15 + A7) ▸UI-level record and replay for reproducibility. ▹ Mosaic [ISPASS 2015] https://github.com/Matthalp/mosaic 24
  48. 25 Evaluation Results 0 0.2 0.4 0.6 0.8 1 1.2

    Norm. Energy 0 0.2 0.4 0.6 0.8 1 Perf
  49. 26 Evaluation Results 0 0.2 0.4 0.6 0.8 1 1.2

    Norm. Energy 0 0.2 0.4 0.6 0.8 1 Perf 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4 0.6 0.8 1 Android
  50. 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0

    0.2 0.4 0.6 0.8 1 Perf 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4 0.6 0.8 1 Android 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4 0.6 0.8 1 EBS 27 Evaluation Results 29.2% Energy Savings
  51. 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4

    0.6 0.8 1 Android 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4 0.6 0.8 1 EBS 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 Perf 28 Evaluation Results Norm. QoS Violation 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1
  52. 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4

    0.6 0.8 1 Android 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4 0.6 0.8 1 EBS 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 Perf Norm. QoS Violation 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 Norm. QoS Violation 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 29 Evaluation Results
  53. 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4

    0.6 0.8 1 Android 0 0.2 0.4 0.6 0.8 1 1.2 0 0.2 0.4 0.6 0.8 1 EBS 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 Perf Norm. QoS Violation 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 Norm. QoS Violation 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 Norm. QoS Violation 0 0.2 0.4 0.6 0.8 1 1.2 Norm. Energy 0 0.2 0.4 0.6 0.8 1 30 Evaluation Results 0.8% More QoS violations
  54. Rethink Event-based Scheduling 31 Event-based Scheduler Event Latency Event Energy

    Pending Events onclick GC task Timer event Event Queue Time Event Queue Pending Events
  55. Rethink Event-based Scheduling 31 Event-based Scheduler Event Latency Event Energy

    Reactive Strategy Pending Events onclick GC task Timer event Event Queue Time Event Queue Pending Events
  56. Rethink Event-based Scheduling 31 Event-based Scheduler Event Latency Event Energy

    Reactive Strategy Pending Events onclick GC task Timer event ontouchmove onsubmit Event Queue Time Event Queue Pending Events
  57. Rethink Event-based Scheduling 31 Event-based Scheduler Event Latency Event Energy

    Proactive Strategy Pending & Speculative Events onclick GC task Timer event ontouchmove onsubmit Event Queue Time Event Queue Pending Events Speculative Events
  58. Inefficiency of Current Schedulers 32 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 slack OS
  59. Inefficiency of Current Schedulers 32 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 E2 slack OS
  60. Inefficiency of Current Schedulers 33 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 E2 EBS OS
  61. Inefficiency of Current Schedulers 33 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 E2 E1 EBS OS
  62. Inefficiency of Current Schedulers 33 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 E2 E1 E2 EBS OS
  63. Inefficiency of Current Schedulers 33 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 E2 E1 E2 ? EBS OS
  64. Inefficiency of Current Schedulers 34 Time Oracle input 1 input

    2 QoS Deadline 1 QoS Deadline 2 E1 E2 E1 E2 EBS OS
  65. Inefficiency of Current Schedulers 34 Time Oracle input 1 input

    2 QoS Deadline 1 QoS Deadline 2 E1 E2 E1 E2 E1 E2 EBS OS
  66. Inefficiency of Current Schedulers 35 Time input 1 input 2

    QoS Deadline 1 QoS Deadline 2 E1 E2 Schedule across both pending and future events Oracle
  67. Prediction Model 40 Prediction Model The distance of click The

    number of scrolls The number of navigations Features encoding past interactions …
  68. Prediction Model 42 Features encoding past interactions Click ScrollUp ScrollDown

    ZoomIn ZoomOut … Prediction Model 0.10 0.12 0.58 0.07 0.02
  69. Prediction Model 42 Features encoding past interactions Click ScrollUp ScrollDown

    ZoomIn ZoomOut … Prediction Model All Event Types
  70. Prediction Model 42 Features encoding past interactions Click ScrollUp ScrollDown

    ZoomIn ZoomOut … Prediction Model Filter Unlikely Events!
  71. 43 <collapsible> <href> <html> <body> <div> … <div> <div> <href>

    <href> … Program State Analysis DOM Tree
  72. 43 <collapsible> <href> <html> <body> <div> … <div> <div> <href>

    <href> … Program State Analysis DOM Tree Viewport
  73. 43 <collapsible> <href> <html> <body> <div> … <div> <div> <href>

    <href> … Viewport Program State Analysis DOM Tree Viewport
  74. 43 <collapsible> <href> <html> <body> <div> … <div> <div> <href>

    <href> … Viewport Program State Analysis DOM Tree Viewport
  75. 43 <collapsible> <href> <html> <body> <div> … <div> <div> <href>

    <href> … Viewport Program State Analysis DOM Tree Viewport
  76. 43 <collapsible> <href> <html> <body> <div> … <div> <div> <href>

    <href> … Viewport Program State Analysis DOM Tree Viewport
  77. Integrating Program States 45 Features encoding past interactions Click ScrollUp

    ScrollDown ZoomIn ZoomOut … Current
 application states + + Prediction Model
  78. Overview of the Predictor 46 Click ScrollUp ScrollDown ZoomIn ZoomOut

    … + Prediction Model ~10 us Features encoding past interactions Current
 application states +
  79. Overview of the Predictor 46 Click ScrollUp ScrollDown ZoomIn ZoomOut

    … + Prediction Model Stop until the cumulative confidence of the predicted event sequence is below a threshold ~10 us Features encoding past interactions Current
 application states +
  80. Constrained Optimization Formulation 48 E1 E2 E3 Time ▸ Goal:

    minimize total energy while meeting deadlines
  81. Objective: Constrained Optimization Formulation 48 E1 E2 E3 Time ▸

    Goal: minimize total energy while meeting deadlines N ∑ i Min. Energy (i)
  82. Objective: Constrained Optimization Formulation 48 E1 E2 E3 Time △Texe

    1 △Texe 2 △Texe 3 ▸ Goal: minimize total energy while meeting deadlines N ∑ i △Texe (i) x Min.
  83. Objective: Constrained Optimization Formulation 48 E1 E2 E3 Time △Texe

    1 △Texe 2 △Texe 3 ▸ Goal: minimize total energy while meeting deadlines N ∑ i △Texe (i) x Power (i) Min.
  84. Objective: Constraints: Constrained Optimization Formulation 48 E1 E2 E3 Time

    △Texe 1 △Texe 2 △Texe 3 ▸ Goal: minimize total energy while meeting deadlines N ∑ i △Texe (i) x Power (i) Min.
  85. Objective: Constraints: Constrained Optimization Formulation 48 E1 E2 E3 Time

    △Texe 1 △Texe 2 △Texe 3 ▸ Goal: minimize total energy while meeting deadlines Order: ≤ Tend (i) Tstart (i+1) N ∑ i △Texe (i) x Power (i) Min.
  86. Objective: Constraints: Constrained Optimization Formulation 48 E1 E2 E3 Time

    Tstart 1 Tstart 2 Tstart 3 TQoS 1 TQoS 2 TQoS 3 △Texe 1 △Texe 2 △Texe 3 ▸ Goal: minimize total energy while meeting deadlines Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + N ∑ i △Texe (i) x Power (i) Min.
  87. Objective: Constraints: Constrained Optimization Formulation 48 E1 E2 E3 Time

    Tstart 1 Tstart 2 Tstart 3 TQoS 1 TQoS 2 TQoS 3 △Texe 1 △Texe 2 △Texe 3 ▸ Goal: minimize total energy while meeting deadlines Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + Scheduling knobs: Big/little + DVFS for each event. N ∑ i △Texe (i) x Power (i) Min.
  88. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Power (i) Min. △Texe (i) = Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) +
  89. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Power (i) Min. △Texe (i) = Tmemory + Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) +
  90. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Power (i) Min. Tcpu △Texe (i) = Tmemory + Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) +
  91. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Power (i) Min. △Texe (i) = Tmemory + Ncycles / f (i) Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) +
  92. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Power (i) Min. △Texe (i) = Tmemory + Constants Ncycles / f (i) Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) +
  93. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Min. △Texe (i) = Tmemory + Constants Ncycles / f (i) Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + Pmap (i) Pmap (i) = Freq2Power (f(i))
  94. Each Event: Objective: Constraints: Problem Formulation 49 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Min. △Texe (i) = Tmemory + Constants Offline profile Ncycles / f (i) Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + Pmap (i) Pmap (i) = Freq2Power (f(i))
  95. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) +
  96. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) ⍺ (i, j) in {0,1}
  97. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) ⍺ (i, j) in {0,1} Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0
  98. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) ⍺ (i, j) in {0,1} Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  99. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) ⍺ (i, j) in {0,1} Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0 Only one DVFS setting is chosen for each event
  100. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  101. Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem

    → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Integer Linear Programing! Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  102. Overhead: Each Event: Objective: Constraints: Problem Formulation 50 ▸ Scheduling

    Problem → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Integer Linear Programing! Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  103. Overhead: 10 DVFS configurations Each Event: Objective: Constraints: Problem Formulation

    50 ▸ Scheduling Problem → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Integer Linear Programing! Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  104. Overhead: 10 DVFS configurations 8 events look ahead Each Event:

    Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Integer Linear Programing! Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  105. Overhead: 80 variables in ILP Each Event: Objective: Constraints: Problem

    Formulation 50 ▸ Scheduling Problem → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Integer Linear Programing! Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0
  106. Overhead: Runtime overhead: 10ms 80 variables in ILP Each Event:

    Objective: Constraints: Problem Formulation 50 ▸ Scheduling Problem → Constrained Optimization. N ∑ i △Texe (i) x Pmap (i) Min. Order: ≤ Tend (i) Tstart (i+1) Deadline: ≤ Tstart (i) △Texe (i) TQoS (i) + { Ncycles / f (i, j) } △Texe (i) =Tmemory +M ∑ j=0 * ⍺ (i, j) Integer Linear Programing! Pmap (i) = Freq2Power ( f(i, j) ) * ⍺ (i, j) M ∑ j=0 1 = ⍺ (i, j) With the constraint: M ∑ j=0 Dynamic programing
  107. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Events Speculative Schedules Scheduler Predictions Time Prediction Schedule
  108. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Events Speculative Schedules Scheduler Predictions Time Prediction Schedule F1 Pending Frame Buffer F2 F3
  109. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Controller Events Speculative Schedules Pending Frames Scheduler Predictions Time Prediction Schedule F1 Pending Frame Buffer F2 F3
  110. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Controller Events Speculative Schedules Pending Frames Scheduler Predictions Time Prediction Schedule F1 Pending Frame Buffer F2 F3
  111. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Controller Events Speculative Schedules Pending Frames Scheduler Predictions Commit Time Prediction Schedule F1 Pending Frame Buffer F2 F3 ✔ ✔
  112. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Controller Events Speculative Schedules Pending Frames Recover Scheduler Predictions Commit Time Prediction Schedule F1 Pending Frame Buffer F2 F3 ✔ ✔ ✘
  113. PES Putting Things Together 51 Web Application Rendering Engine Predictor

    Controller Events Speculative Schedules Pending Frames Recover Scheduler Predictions Commit Time Prediction Schedule F1 Pending Frame Buffer F2 F3 ✔ ✔ ✘ F3 ✔
  114. Evaluation 52 52 ▸Traces ▹Training: 10 users, more than 100

    traces. ▹Testing: 36 traces, 3 for each Web application. ▸Baseline Mechanisms ▹Interactive governor (Interactive) — Android default ▹EBS: a state-of-the-art reactive event-based scheduler ▹Oracle: optimal scheduler ▸Metrics ▹Energy consumption ▹QoS violation
  115. High Prediction Accuracy, Low Mis-Prediction Penalty 53 Prediction Accuracy (%)

    60 70 80 90 100 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter
  116. High Prediction Accuracy, Low Mis-Prediction Penalty 53 Prediction Accuracy (%)

    60 70 80 90 100 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Mis-prediction Waste (ms) 0 10 20 30 40 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter
  117. High Prediction Accuracy, Low Mis-Prediction Penalty 53 Prediction Accuracy (%)

    60 70 80 90 100 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Mis-prediction Waste (ms) 0 10 20 30 40 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter
  118. High Prediction Accuracy, Low Mis-Prediction Penalty 53 Prediction Accuracy (%)

    60 70 80 90 100 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Mis-prediction Waste (ms) 0 10 20 30 40 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter 92% Prediction Accuracy and 20 ms of Mis-Prediction Waste
  119. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  120. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  121. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  122. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  123. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  124. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle QoS Violation 0 0.15 0.3 0.45 0.6 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  125. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle QoS Violation 0 0.15 0.3 0.45 0.6 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  126. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle QoS Violation 0 0.15 0.3 0.45 0.6 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  127. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle QoS Violation 0 0.15 0.3 0.45 0.6 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle
  128. Experimental Result 54 Norn. Energy 0 0.25 0.5 0.75 1

    163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle QoS Violation 0 0.15 0.3 0.45 0.6 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Interactive EBS PES Oracle 61% less QoS Violation and 26% of Energy Reduction
  129. Event-driven Processing is a Fundamental and Prevalent Paradigm Edge Computing

    Database Crowdsourced Sensor Network Mobile Applications Virtual Reality Internet-of-things Cloud Computing
  130. “Microkernel”-based Browser ▸ A web app/website exercises only a small

    portion of the Web specification. There are “hot” tags, CSS properties, etc. ▸ Design a minimalistic browser core, and load the rest only when requested by a specific webpage. 57 https://www.advancedwebranking.com/html/ https://maqentaer.com/devopera-static-backup/http/dev.opera.com/articles/view/mama-css-syntax/index.html
  131. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. 58
  132. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX 58
  133. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc. 58
  134. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc. ▸ Still, focus on performance analysis and inspection without giving developers directly control over user-perceived performance. 58
  135. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc. ▸ Still, focus on performance analysis and inspection without giving developers directly control over user-perceived performance. ▹ Developers are given a set of high-level, coarse-grained guidelines 58
  136. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc. ▸ Still, focus on performance analysis and inspection without giving developers directly control over user-perceived performance. ▹ Developers are given a set of high-level, coarse-grained guidelines ▸ Proposal: empower developers to directly express their requirements of user-centric performance goals (up to the browser to deliver). 58
  137. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc. ▸ Still, focus on performance analysis and inspection without giving developers directly control over user-perceived performance. ▹ Developers are given a set of high-level, coarse-grained guidelines ▸ Proposal: empower developers to directly express their requirements of user-centric performance goals (up to the browser to deliver). ▹ Paint order: developers identify “hero” elements and specify the order in which hero elements are painted. div#hero {paint-order:1} 58
  138. User-Centric Language (Extensions) ▸ Web performance is a comprehensive user-centric

    objective that can be not captured by one single metric. ▹ Old days: time when unload is fired, doesn’t corresponds to UX ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc. ▸ Still, focus on performance analysis and inspection without giving developers directly control over user-perceived performance. ▹ Developers are given a set of high-level, coarse-grained guidelines ▸ Proposal: empower developers to directly express their requirements of user-centric performance goals (up to the browser to deliver). ▹ Paint order: developers identify “hero” elements and specify the order in which hero elements are painted. div#hero {paint-order:1} ▹ Interactive state: developers specify the kind of interaction that should ideally be granted when a particular element is painted. div#menu{istate:touchstart,50} 58