
Yuhao Zhu
February 17, 2016

Proposal talk: Energy-Efficient Mobile Web Computing

Ph.D. Proposal

Transcript

  5. Architects Make Mobile Processors Faster

    ▸ In-order (2007)
    ▸ Out-of-order (2010)
    ▸ Multi-core (2010)
    ▸ Asymmetric Multi-core (2014)
    [Figure: Performance and Power for each generation]
    ▸ At the Expense of Excessive Power
  7. Thesis Statement

    A mobile computing system that satisfies user QoS requirements on a mobile energy budget for the mobile Web
    ▸ Conflicting requirements: Responsiveness (user QoS) and Energy-Efficiency (energy budget)
  12. Achieving Mobile Web Performance

    Mobile Client, Cellular Network, Cloud Web Servers
    [MICRO 2015] (Top Picks Honorable Mention)
  14. Isn’t Responsiveness a Network Issue?

    [HotMobile’11, WWW’12], 100+ citations
    ▸ Client compute doesn’t matter much
    ▸ Resource loading is the bottleneck
    ▸ Conclusions circa 2010!
  20. Isn’t Responsiveness a Network Issue? A Year 2015 Experiment!

    [Figure: Load time (s) vs. Network RTT (ms), with RTT regions labelled LTE, 3G, Adverse 3G, 2G and Wi-Fi, and an annotation "Circa 2010"]
    ▸ Samsung Galaxy S4 smartphone.
    ▸ Hot webpages from Alexa [1].
    ▸ Time measured using Navigation Timing API [2].
    [1] http://www.alexa.com/
    [2] https://www.w3.org/TR/navigation-timing-2/
  28. Traditional Approach

    Stack: Application (Frameworks and Libraries; Language: HTML, JavaScript, CSS), Runtime (Styling, Security, Local Storage, User Input, Layout, Render), Architecture
    ▸ Runtime: Parallelize browser computation
    ▸ Application inputs: Ignored!
    ▸ Architecture: Voltage/frequency scaling on general-purpose processors
      ▹ End of Dennard Scaling!
      ▹ Diminishing return
  36. My Approach

    Stack: Application (Frameworks and Libraries; Language: HTML, JavaScript, CSS), Runtime (Styling, Security, Local Storage, User Input, Layout, Render), Architecture
    ▸ Architecture: WebCore, Web-specific Architecture
    ▸ Application: GreenWeb, QoS Language Extensions
    ▸ Runtime: WebRT, Energy-aware Web Runtime
    Application inputs ignored by parallelizing browser computation alone:
    ▸ Lost page-level diversity
    ▸ Lost user QoS requirements
  39. My Approach: My Research Scope

    ▸ Application: GreenWeb, QoS Language Extensions
    ▸ Runtime: WebRT, Energy-aware Web Runtime
    ▸ Architecture: WebCore, Web-specific Architecture
    Publications: [PLDI 2016] [ISCA 2014] [HPCA 2013] [HPCA 2015] [CAL 2014] (Best of CAL)
  40. WebCore: a Web-Specific Mobile Architecture

    [Figure: Energy vs. Execution Time, positioning General-Purpose Designs and ASIC]
    ASIC? Extremely challenging
    ▸ Chrome: 17M LoC, 29 languages
      ▹ cf. H264 codec: 0.13M LoC, 6 languages
    ▸ Code base is very irregular
      ▹ No fine-grained parallelism
  41. WebCore Philosophy

    Claim: Instead of jumping directly to full specialization, we must take it step by step
  43. WebCore Philosophy

    Web Software on: General-purpose Processor (GPP) → Customized GPP → Customized GPP + Specialization (WebCore)
    ▸ Customization: Tune uarch parameters
    ▸ Specialization: Accelerate key kernels
  46. Customization: Find an Ideal General Purpose Architecture for the Mobile Web

    ▸ What is a proper general purpose baseline architecture?
      ▹ Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
      ▹ Are existing general purpose mobile designs ideal?
    ▸ Exhaustive design space exploration.
  47. Design Space Exploration (DSE) Setup

    ▸ Search space of over 3 billion design points
      ▹ Leverage statistical inference models to increase search speed
    ▸ Use integrated simulators
      ▹ McPAT for Power
      ▹ Marss86 for Performance (x86 full-system simulator)
    ▸ Chromium Web browser
  49. Understand the Difference Using Kernel Knowledge

    [Figure: per-kernel results for an in-order design and an out-of-order design]
    ▸ In-order designs show strong kernel variance
    ▸ An out-of-order design can accommodate kernel variance
  54. Customization: Identifying Major Sources of Energy Inefficiency

    Parameter              P1     P2     ARM A15
    Issue width            1      3      3
    # Function units       2      3      8
    Load queue size        4      16     16
    Store queue size       4      16
    BTB size               1024   128    256
    ROB size               128    128    40+
    L1 I-$ size (KB)       64     128    32
    # Physical registers   128    140    ?
    L1 D-$ size (KB)       8      64     32
    L2-$ size (KB)         256    1024   <4096

    [Figure: design points P1 and P2]
    ▸ Instruction supply
    ▸ Data feeding
  57. Specialization: Fixing the Pending Inefficiencies

    ▸ Instruction supply
      ▹ Pack more operations in one instruction
    ▸ Data feeding
      ▹ Move operands closer to operations
  59. Style Resolution Kernel

    ▸ Choose the Style kernel as the specialization target
    [Figure: two pie charts over the kernels Render, Style, Other, Layout and DOM. Execution time breakdown slices: 10%, 13%, 17%, 25%, 35%. Energy breakdown slices: 12%, 14%, 16%, 18%, 40%.]
  65. Style Resolution Kernel

    ▸ Choose the Style kernel as the specialization target

    for (each rule in matchedRules) {          // Rule-level Parallelism (RLP)
      for (each property in rule) {            // Property-level Parallelism (PLP)
        switch (property.id) {
          case Font:
            Style[Font] = Handler(property.value, DOMNode);
            break;
          case N: ...
        }
      }
    }

    ▸ Exploiting the parallelism to increase the arithmetic intensity
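The loop nest on the slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the WebCore or Chromium implementation: `HANDLERS`, `resolve_style`, and the rule representation (a list of `(property id, value)` pairs, with rules ordered from lowest to highest priority) are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a rule is a list of (property id, value) pairs.
Rule = List[Tuple[str, str]]

# Hypothetical per-property handlers standing in for the slide's
# Handler(property.value, DOMNode).
HANDLERS: Dict[str, Callable[[str, str], str]] = {
    "font": lambda value, node: value.lower(),
    "padding": lambda value, node: value,
    "margin": lambda value, node: value,
    "width": lambda value, node: value,
}


def resolve_style(matched_rules: List[Rule], dom_node: str) -> Dict[str, str]:
    """Apply every property of every matched rule; later rules overwrite earlier ones."""
    style: Dict[str, str] = {}
    for rule in matched_rules:          # outer loop: rule-level parallelism (RLP)
        for prop_id, value in rule:     # inner loop: property-level parallelism (PLP)
            handler = HANDLERS[prop_id]  # plays the role of the switch on property.id
            style[prop_id] = handler(value, dom_node)
    return style


rules = [[("font", "Arial"), ("margin", "0")], [("font", "Helvetica")]]
print(resolve_style(rules, "div"))
```

The dictionary lookup replaces the `switch` statement; each handler is independent of the others, which is what makes the inner loop a candidate for parallel execution.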
  75. Style Resolution Kernel

    ▸ A running example from www.cnn.com

    Style Rules
    Rule   Property 1         Property 2
           id        value    id       value
    1      padding   0        margin   0
    2      padding   6 px     width    36 px

    Final Style Info (Property 1, Property 2, Property 3; each an id and a value): padding, width, margin. Both rules set padding (0 and 6 px); the value from the high-priority rule is kept.
    ▸ Order Matters in RLP
    ▸ Order Does Not Matter in PLP
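The two ordering claims can be checked on the slide's www.cnn.com rules with a small Python sketch. The transcript does not say which rule is the high-priority one, so the code assumes it is rule 2; `apply_rules` and the "later rule overwrites earlier rule" convention are also assumptions made for illustration.

```python
from itertools import permutations

# Rules from the slide's www.cnn.com example, as (property id, value) pairs.
RULE_1 = [("padding", "0"), ("margin", "0")]
RULE_2 = [("padding", "6px"), ("width", "36px")]


def apply_rules(rules):
    """Apply rules in the given order; a later rule overwrites an earlier one."""
    style = {}
    for rule in rules:
        for prop_id, value in rule:
            style[prop_id] = value
    return style


# Assumption: RULE_2 has the higher priority, so it is applied last.
expected = apply_rules([RULE_1, RULE_2])

# PLP: reordering the properties inside each rule does not change the result.
for p1 in permutations(RULE_1):
    for p2 in permutations(RULE_2):
        assert apply_rules([list(p1), list(p2)]) == expected

# RLP: swapping the two rules changes the conflicting property (padding).
swapped = apply_rules([RULE_2, RULE_1])
assert swapped["padding"] != expected["padding"]
print(expected)
```

Properties within one rule never conflict with each other, so they can be processed in any order; two rules can set the same property, so their relative priority has to be preserved.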
  83. Style Resolution Unit

    [Figure: Style Resolution Unit. An Input Scratchpad holds the rules (Rule i, Rule j), each with an id and start/end pointers to its properties (Prop k, Prop l, Prop m). Conflict Resolution keeps the higher-priority copy of a property set by both rules (Prop m). Compute Lanes process the remaining properties, and an Output Scratchpad holds the results (Style k, Style l, Style m).]
    ▸ Order Matters in RLP
    ▸ Order Does Not Matter in PLP
  93. Evaluation Results

    [Figure: Energy (J) vs. Load Time (s) for the A15-like design, Customization, and Specialization; the individual steps are annotated with 18.6%, 22.2%, 9.2% and 22.2%, and the overall improvement with 29.2% and 47.0%]
    ▸ Fully synthesized using Synopsys 28 nm toolchain
    ▸ Cost of specialization: 0.59 mm² area overhead
      ▹ SoC die area is 122 mm² in Samsung Galaxy S4, so the overhead is about 0.5% of the die
  94. WebCore in SoC

    [Figure: SoC with CPUs, GPUs, Specialized Logics, Memory, and WebCore]
    ▸ One of the cores in the multicore SoC
    ▸ Becomes “dark” when other applications are executing
  96. My Approach: My Research Scope

    ▸ Application: GreenWeb, QoS Language Extensions
    ▸ Runtime: WebRT, Energy-aware Web Runtime
    ▸ Architecture: WebCore, Web-specific Architecture
  99. WebRT: Energy-aware Web Runtime

    ▸ Why ACMP (asymmetric chip multiprocessor)? It offers a large performance-energy trade-off space for energy optimizations
      ▹ Different microarchitectures (in-order + out-of-order)
      ▹ Different frequency settings
    ▸ Idea: Provide just enough energy to meet the performance target
    ▸ Approach: Systematically understand user interactions and bridge the gap between user behavior and system execution.
  102. Interacting With a Mobile Web Application

    Interactions        WebRT Component
    Loading             Proactive Mechanism
    Touching, Moving    History-based Mechanism (repetitive in a usage session)
  105. Optimizing for Loading

    ▸ Observation: Web applications have different characteristics that lead to different loading times and energy consumption
    ▸ Mechanism: Predict the ideal ACMP configuration (<core, frequency>) and schedule application loading accordingly
    ▸ Effect: Properly provision the hardware resources based on application characteristics
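One way to read the mechanism above is as a constrained selection: among the <core, frequency> configurations whose predicted load time meets a target, take the one with the lowest predicted energy. The sketch below illustrates that reading only. `pick_config`, the prediction table, and the 3-second target are hypothetical and are not values or APIs from the talk.

```python
from typing import Dict, Optional, Tuple

Config = Tuple[str, int]  # (core type, frequency in MHz)


def pick_config(predictions: Dict[Config, Tuple[float, float]],
                deadline_s: float) -> Optional[Config]:
    """Return the lowest-energy configuration whose predicted load time meets the deadline.

    predictions maps a configuration to (predicted load time in s, predicted energy in J).
    Returns None when no configuration meets the deadline.
    """
    feasible = {cfg: te for cfg, te in predictions.items() if te[0] <= deadline_s}
    if not feasible:
        return None
    return min(feasible, key=lambda cfg: feasible[cfg][1])


# Hypothetical predictions for one page (not measurements from the talk).
predicted = {
    ("big", 1800): (1.5, 4.0),
    ("big", 1000): (2.4, 2.8),
    ("little", 600): (3.5, 1.6),
}
print(pick_config(predicted, deadline_s=3.0))  # ('big', 1000)
```

Returning `None` leaves the fallback policy (for example, running at the fastest configuration) to the caller, since the slides do not specify one.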
  108. Big/Little Setup

    ODroid XU+E development board, which contains an Exynos 5410 SoC used in Samsung Galaxy S4.
    ▸ Big core cluster: ARM Cortex A15, out-of-order (OoO), 3-issue
      ▹ DVFS: 800 MHz ~ 1.8 GHz at a 100 MHz granularity
    ▸ Little core cluster: ARM Cortex A7, in-order, 2-issue
      ▹ DVFS: 350 MHz ~ 600 MHz at a 50 MHz granularity
    ▸ Overhead:
      ▹ Frequency switch: 100 µs
      ▹ Core migration: 20 µs
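The DVFS ranges on this slide define a small, enumerable configuration space. A short Python sketch, assuming both end points of each range are valid operating points (`frequency_steps`, `BIG`, `LITTLE` and `CONFIGS` are illustrative names):

```python
def frequency_steps(low_mhz: int, high_mhz: int, step_mhz: int) -> list:
    """All frequencies from low to high (inclusive) at the given granularity."""
    return list(range(low_mhz, high_mhz + 1, step_mhz))


# Big cluster: Cortex A15, 800 MHz to 1.8 GHz in 100 MHz steps.
BIG = [("A15", f) for f in frequency_steps(800, 1800, 100)]
# Little cluster: Cortex A7, 350 MHz to 600 MHz in 50 MHz steps.
LITTLE = [("A7", f) for f in frequency_steps(350, 600, 50)]

CONFIGS = LITTLE + BIG
print(len(BIG), len(LITTLE), len(CONFIGS))  # 11 6 17
```

With only 17 <core, frequency> pairs, a scheduler can afford to evaluate a prediction for every configuration before choosing one.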
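  109. Power and Energy Measurements

    [Figure: measurement circuit. A 15 mΩ sense resistor sits on the supply path between the VRM and the SoC (labelled ARM Cortex A9). The voltage across it (Vin+, Vin-) is amplified with gain ×50, and the output (Vout, GND) is probed by a Data Acquisition (DAQ) unit.]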
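With a sense resistor and a fixed-gain amplifier, the rail current follows from Ohm's law, I = Vout / (gain × R), and energy is power integrated over time. The sketch below works through that arithmetic with the 15 mΩ and ×50 values from the slide. The rail voltage, the sample period, and the function names are assumptions, since the transcript does not give them.

```python
SENSE_RESISTOR_OHM = 0.015   # 15 mΩ sense resistor from the slide
AMPLIFIER_GAIN = 50.0        # x50 gain from the slide


def current_a(vout_v: float) -> float:
    """Rail current recovered from the amplified voltage drop across the sense resistor."""
    return vout_v / (AMPLIFIER_GAIN * SENSE_RESISTOR_OHM)


def energy_j(vout_samples_v, rail_voltage_v: float, sample_period_s: float) -> float:
    """Integrate power over time with a rectangle rule: E = sum(V_rail * I * dt)."""
    return sum(rail_voltage_v * current_a(v) * sample_period_s for v in vout_samples_v)


# Hypothetical reading: 0.75 V at the DAQ corresponds to about 1 A through the resistor.
print(current_a(0.75))
```

The rectangle rule is the simplest integration choice; a real measurement setup would also need to account for the rail voltage varying with the DVFS setting.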
  113. Performance-Energy Trade-off: www.autoblog.com

    [Figure: Energy Consumption (J) vs. Load time (s) for Big Core and Small Core configurations]
  117. Performance-Energy Trade-off: www.newegg.com

    [Figure: Energy Consumption (J) vs. Load time (s) for Big Core and Small Core configurations, annotated with 30%]
  121. Performance-Energy Trade-off: www.adobe.com

    [Figure: Energy Consumption (J) vs. Load time (s) for Big Core and Small Core configurations, annotated with 80%]
  124. Breaking Down the Computations

    Web Primitives
    ▸ HTML (Structure): DOM Tree, Tag, Attribute
    ▸ CSS (Style): Selector, Property
  125. HTML Tag Analysis

    [Figure: Number of Tags (K) per webpage]
    ▸ Web applications have different tag counts
  131. Tag Processing Overhead: Root-cause of Web Application Variance

    [Figure: Load time (ms) and Energy (mJ) for processing the h3, table and img tags]
    ▸ Web applications have different tag counts
    ▸ Tags have different processing overheads
  132. Predicting Loading Performance & Energy

    Idea: predict load time & energy (responses) based on Web primitives (predictors)
  135. Predicting Loading Performance & Energy

    ▸ Identify Predictors: training using the hottest 2,500 webpages
      ▹ Predictors (HTML, CSS)
      ▹ Responses (Time, Energy)
    ▸ Model Construction & Refinement: refine the linear model
      ▹ Linear Regression
      ▹ Model Non-Linearity
      ▹ Mitigate Over-fitting
    ▸ Model Validation: validating on another 2,500 webpages
      ▹ Loading Time Model
      ▹ Energy Model
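The starting point of this pipeline, a linear model from Web primitives to load time, can be sketched with ordinary least squares in plain Python. This shows only the basic linear-regression step, not the refined model from the talk. The two predictors (tag count and CSS rule count), the synthetic training data, and the names `fit_linear` and `predict` are all hypothetical.

```python
from typing import List, Sequence


def fit_linear(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Ordinary least squares: returns [intercept, w1, ..., wk] via the normal equations."""
    rows = [[1.0, *x] for x in xs]  # prepend a constant term to each sample
    n = len(rows[0])
    # Build the augmented system (X^T X | X^T y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one page."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Hypothetical training data: (number of HTML tags, number of CSS rules) -> load time (s).
pages = [(100, 20), (400, 50), (250, 200), (800, 120), (600, 300)]
times = [0.5 + 0.002 * tags + 0.004 * rules for tags, rules in pages]
weights = fit_linear(pages, times)
print([round(w, 4) for w in weights])
```

Because the synthetic responses are exactly linear in the predictors, the fit recovers the generating coefficients up to rounding; real page data would leave a residual, which is where the slide's non-linearity and over-fitting refinements come in.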
  136. Predicting Loading Performance & Energy

    [Figure: prediction error (0.00 to 0.20) for the performance and energy models]
    ▸ Median prediction error is less than 5%
  140. Evaluation

    ▸ Highest performance (Perf)
      ▹ Highest frequency on big core
      ▹ Standard to guarantee responsiveness
    ▸ OS DVFS strategies (OS)
      ▹ Ondemand governor (across big and little cores)
    ▸ Metrics:
      ▹ Energy Savings
      ▹ QoS Violations
    83.0% energy savings over Perf, 4.1% more QoS violations
    8.6% energy savings over OS, 0.1% more QoS violations
  141. Optimizing Post-loading Interactions

    [Diagram labels: Interactions (Touching, Moving); Events (click, touchstart, touchmove, scroll); Event Queue; Event Loop]
    Optimize post-loading at an event-granularity
  145. Optimizing Post-loading Interactions

    ▸ Observation: Events have different execution latencies that enable energy optimizations
    ▸ Mechanism: Event-based scheduling to predict the ACMP configuration that exploits event slacks and saves energy
    ▸ Effect: Properly provision the hardware resources based on event characteristics
  149. Event-Level Characterization

    [Figure: event latency (ms, 0 to 150) across events such as change, click, and keyup, annotated with Large Slack, Small Slack, and No Slack]
    ▸ Wide distribution of event latencies. Events exhibit different slacks.
      ▹ How to exploit event slacks?
  150. Event-based Scheduler (EBS)

    ▸ Goal: For each event, find the most energy-efficient ACMP configuration that meets the latency target
  151. Event-based Scheduler (EBS)

    [Diagram: Thread-based Scheduler (Thread Scheduling; Throughput, Fairness) vs. Event-based Scheduler (Event-based Scheduling; Event Latency, Event Energy; Event Queue)]
  156. Predicting Event Latency

    [Diagram: event latency split into memory operations (T_memory) and CPU operations (N_dependent, at frequency f)]
    Event Latency = T_memory + N_dependent / f
    Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03
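
As a worked illustration of the latency model, the two unknowns (T_memory and N_dependent) can be recovered from two latency measurements of the same event at two different frequencies, then used to predict latency at a third. This is a sketch under that assumption; the slides do not say how the parameters are obtained, and the function names and numbers below are hypothetical.

```python
def fit_latency_model(f1_hz, t1_s, f2_hz, t2_s):
    """Solve t = T_memory + N_dependent / f from two (frequency, latency) samples."""
    if f1_hz == f2_hz:
        raise ValueError("the two samples must use different frequencies")
    # Subtracting the two equations cancels T_memory.
    n_dependent = (t1_s - t2_s) / (1.0 / f1_hz - 1.0 / f2_hz)
    t_memory = t1_s - n_dependent / f1_hz
    return t_memory, n_dependent


def predict_latency(t_memory, n_dependent, f_hz):
    """Event Latency = T_memory + N_dependent / f."""
    return t_memory + n_dependent / f_hz


# Hypothetical measurements: 120 ms at 1.0 GHz and 70 ms at 2.0 GHz.
T_MEMORY, N_DEPENDENT = fit_latency_model(1.0e9, 0.120, 2.0e9, 0.070)
print("T_memory (s):", T_MEMORY)
print("N_dependent (cycles):", N_DEPENDENT)
print("predicted latency at 1.5 GHz (s):", predict_latency(T_MEMORY, N_DEPENDENT, 1.5e9))
```

Only the CPU term scales with frequency, which is why doubling the frequency in this example does not halve the latency.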
  158. Event-based Scheduler

    [Diagram components: Events, Event-Based Scheduler, Model, Model Constructor, QoS Monitor, Big/Little Hardware; scheduler output: <core, freq>; feedback path: Recalibrate]
    ▸ Fine-tune the model when it over- or under-predicts
    ▸ Recalibrate if it mispredicts too often
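
The selection step of such a scheduler can be sketched as follows: for each candidate <core, freq> pair, predict the event latency, discard pairs that miss the latency target, and keep the one with the lowest predicted energy. Everything concrete here is an assumption for illustration (the `Config` type, the power numbers, the per-core model parameters, and the fall-back to the fastest configuration); the fine-tuning and recalibration feedback path is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    core: str        # "big" or "little"
    freq_hz: float   # operating frequency
    power_w: float   # assumed average power at this operating point


def predict_latency(t_memory, n_dependent, freq_hz):
    """Event Latency = T_memory + N_dependent / f."""
    return t_memory + n_dependent / freq_hz


def pick_config(configs, model, target_s):
    """Return the lowest-energy config whose predicted latency meets target_s.

    `model` maps a core type to its (T_memory, N_dependent) pair.
    If no config meets the target, fall back to the fastest one.
    """
    best = None
    for cfg in configs:
        latency = predict_latency(*model[cfg.core], cfg.freq_hz)
        if latency > target_s:
            continue  # would violate the QoS target
        energy = cfg.power_w * latency
        if best is None or energy < best[0]:
            best = (energy, cfg)
    if best is None:
        return min(configs, key=lambda c: predict_latency(*model[c.core], c.freq_hz))
    return best[1]


# Hypothetical operating points and per-core model parameters.
CONFIGS = [
    Config("little", 0.6e9, 0.2),
    Config("little", 1.2e9, 0.5),
    Config("big", 1.0e9, 1.0),
    Config("big", 1.8e9, 2.5),
]
MODEL = {"little": (0.02, 6.0e7), "big": (0.01, 3.0e7)}

for target in (0.10, 0.05, 0.01):
    print(target, pick_config(CONFIGS, MODEL, target))
```

A looser target lets the event run on the little core, while a tight target pushes it to the big core, which is the slack the scheduler exploits.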
  161. Evaluation

    ▸ Baseline Mechanisms
      ▹ Highest performance (Perf): standard to guarantee responsiveness
      ▹ Minimal energy (Energy): minimize energy consumption
      ▹ Interactive governor (Interactive): Android default
    ▸ Metrics
      ▹ Energy Savings
      ▹ QoS Violations
    37.9% - 41.2% energy savings, 0.1% more QoS violations
  163. My Approach

    [Diagram layers: Application, Runtime, Architecture]
    My Research Scope:
    ▸ GreenWeb: QoS Language Extensions
    ▸ WebRT: Energy-aware Web Runtime
    ▸ WebCore: Web-specific Architecture
  176. GreenWeb: QoS Web Language Extensions

    Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions
    [Figure: QoS Experience vs. Performance Degradation, with Imperceptible, Tolerable, and Unusable regions; annotated with Energy Savings]
  179. GreenWeb: QoS Web Language Extensions

    Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions
    [Figure: QoS Experience vs. Performance Degradation, with Imperceptible, Tolerable, and Unusable regions]
    ▸ QoS Type: performance metric
    ▸ QoS Target: threshold performance values
  180. GreenWeb: QoS Web Language Extensions

    Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions
    ▸ QoS Type: performance metric
    ▸ QoS Target: threshold performance values
    Syntax (CSS Compatible): element {event: Type, Target}
    Semantics: When event is triggered on element, the QoS type and QoS target are Type and Target, respectively.
  181. Future Work

    ▸ Automatic GreenWeb Annotation
      ▹ Empower the developers, but not overburden them!
    ▸ GreenWeb Composability
      ▹ Can GreenWeb programs be safely integrated with other code?
      ▹ How to compose comprehensive QoS abstractions?
    ▸ Integrating WebRT with GreenWeb
      ▹ How can WebRT adapt to different QoS constraints?
  182. Timeline

    Months: FEB, MAR, APR, MAY, JUNE, JULY, AUG
    Key Tasks:
    ▸ Program-level Composability Study (Goal: Improve the composability and flexibility of GreenWeb extensions.)
    ▸ Automatic Annotation System for GreenWeb (Goal: Explore the feasibility of automatically applying GreenWeb annotations.)
    ▸ WebRT Adaptivity Study (Goal: Evaluate the sensitivity of WebRT with respect to different QoS constraints.)
    ▸ Thesis Writing
  185. Retrospective: Three Principles Learnt

    [Diagram layers: Application, Runtime, Architecture]
    ▸ Empowering the Developers
      ▹ GreenWeb Language Extensions Provide QoS Abstractions
    ▸ Exposing Hardware Complexities
      ▹ WebRT Leverages Core Type and Core Frequency
    ▸ General-purpose vs. Specialization
      ▹ WebCore combines general-purpose customization with domain specialization
  186. Publications (groups: GreenWeb, WebRT, WebCore, Motivational Studies, Server Microarch)

    [PLDI 2016] Yuhao Zhu, Vijay Janapa Reddi, “GreenWeb: Language Extensions for Energy-Efficient Mobile Web Computing”
    [HPCA 2015] Yuhao Zhu, Matthew Halpern, Vijay Janapa Reddi, “Event-Based Scheduling for Energy-Efficient QoS (eQoS) in Mobile Web Applications”
    [HPCA 2013] Yuhao Zhu, Vijay Janapa Reddi, “High-Performance and Energy-Efficient Mobile Web Browsing on Big/Little Systems”
    [CAL 2012] Yuhao Zhu, Aditya Srikanth, Jingwen Leng, Vijay Janapa Reddi, “Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing” (Best of CAL)
    [ISCA 2014] Yuhao Zhu, Vijay Janapa Reddi, “WebCore: Architectural Support for Mobile Web Browsing”
    [IEEE MICRO 2015] Yuhao Zhu, Matthew Halpern, Vijay Janapa Reddi, “The Role of the CPU in Energy-Efficient Mobile Web Browsing”
    [HPCA 2016] Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi, “Mobile CPU’s Rise to Power: Quantifying the Impact of Generational Mobile CPU Design Trends on Performance, Energy, and User Satisfaction”
    [MICRO 2015] Yuhao Zhu, Daniel Richins, Matthew Halpern, Vijay Janapa Reddi, “Microarchitectural Implications of Event-driven Server-side Web Applications” (Top Picks Honorable Mention)
  187. Other Publications (groups: GPGPU & IP Routing, Architecture Tools, Reliability)

    [DAC 2011] Yuhao Zhu, Yangdong Deng, Yubei Chen, “Hermes: An Integrated CPU/GPU Microarchitecture for IP Routing.”
    [DAC 2010] Bo Wang, Yuhao Zhu, Yangdong Deng, “Distributed Time, Conservative Parallel Logic Simulation on GPUs.”
    [TODAES 2011] Yuhao Zhu, Bo Wang, Yangdong Deng, “Massively Parallel Logic Simulation with GPUs.”
    [ISPASS 2015] Matthew Halpern, Yuhao Zhu, Ramesh Peri, and Vijay Janapa Reddi, “Mosaic: Cross-platform User-interaction Record and Replay for the Fragmented Android Ecosystem.”
    [IRPS 2014] Chen Zhou, Xiaofei Wang, Weichao Xu, Yuhao Zhu, Vijay Janapa Reddi, Chris Kim, “Estimation of Instantaneous Frequency Fluctuation in a Fast DVFS Environment Using an Empirical BTI Stress-Relaxation Model.”
  188. Coursework

    Name | Instructor | Semester | SUP | Grade
    COMPILERS | Keshav Pingali | Fall 2010 | | A
    ADV EMBED MICROCONTROL SYS | Mark McDermott | Spring 2011 | | A-
    MEMORY MANAGEMENT | Kathryn McKinley | Spring 2011 | Y | A
    VLSI I | Jacob Abraham | Fall 2011 | | A-
    COMP ARCH: PARALLISM/LOCLTY | Mattan Erez | Fall 2011 | | A
    MICROARCHITECTURE | Yale Patt | Spring 2012 | | B
    DYNAMIC COMPILATION | Vijay Janapa Reddi | Spring 2012 | | A-
    COMP PERF EVAL/BENCHMARKING | Lizy John | Fall 2012 | | B+
    PARALLEL COMP ARCHITECTURE | Derek Chiou | Spring 2013 | | B+
    HUMAN COMPUT & CROWDSRCING | Matt Lease | Fall 2015 | Y | A-
  199. Scheduling Results

    [Figure: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS, using a performance-oriented strategy as the baseline]
    83.0% energy savings over Perf, 4.1% more QoS violations
    8.6% energy savings over OS, 0.1% more QoS violations
  205. Evaluation Methodology

    ▸ Baseline Mechanisms
      ▹ Highest performance (Perf): standard to guarantee responsiveness
      ▹ Minimal energy (Energy): minimize energy consumption
      ▹ Interactive governor (Interactive): Android default
      ▹ On-demand governor (Ondemand)
    ▸ Scheduling Scenarios
      ▹ Scheduling for imperceptibility
      ▹ Scheduling for tolerability
    [Figure: QoS Experience vs. Performance, with Unusable, Tolerable, and Imperceptible regions]
  206. Evaluation Results

    [Figure: QoS Violations (%) for emberjs, gwt, jquery, backbone, paperjs, sina, google, and ebay under EBS, Perf, Interactive, Ondemand, and Energy]
  207. Evaluation Results

    [Figure: QoS Violations (%) for the same applications under EBS, Perf, Interactive, and Energy; annotation: No QoS Violations]
  215. Evaluation Results

    [Figure: QoS Violations (%) for emberjs, gwt, jquery, backbone, paperjs, sina, google, and ebay under EBS, Perf, Interactive, and Energy; value labels 9.4, 17.8, 58.1, 6.9]
    [Figure: Energy (J) for the same applications; value labels 8.2, 7.7]
    37.9% - 41.2% energy savings, 0.1% more QoS violations
  220. GreenWeb: QoS Web Language Extensions

    Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions
    [Figure: QoS Experience vs. Performance Degradation, with Imperceptible, Tolerable, and Unusable regions]
    ▸ QoS Type: performance metric
      ▹ Single (frame latency) vs. Continuous (frame throughput)
    ▸ QoS Target: threshold performance values
      ▹ Imperceptible target (Ti) vs. Usable target (Tu)
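
One way to read the two targets is as the boundaries between the three QoS regions. The sketch below assumes a latency-style metric where lower is better, inclusive boundaries, and illustrative threshold values; none of these details are stated on the slide, and `classify_qos` is a hypothetical helper name.

```python
def classify_qos(frame_latency, t_i, t_u):
    """Map a measured frame latency to a QoS region.

    t_i is the imperceptible target (Ti) and t_u the usable target (Tu),
    in the same (unspecified) unit as frame_latency.
    """
    if t_i > t_u:
        raise ValueError("Ti must not exceed Tu")
    if frame_latency <= t_i:
        return "imperceptible"
    if frame_latency <= t_u:
        return "tolerable"
    return "unusable"


# Illustrative thresholds only.
for latency in (10, 50, 150):
    print(latency, classify_qos(latency, t_i=20, t_u=100))
```

Under this reading, a runtime has room to save energy as long as it keeps the latency inside the region the developer asked for.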
  228. GreenWeb: QoS Web Language Extensions

    Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions
    ▸ QoS Type: performance metric
      ▹ Single (frame latency) vs. Continuous (frame throughput)
    ▸ QoS Target: threshold performance values
      ▹ Imperceptible target (Ti) vs. Usable target (Tu)
    Example: button:QoS {onclick: single}
      ▹ Selector: button:QoS
      ▹ QoS Declaration: {onclick: single}
      ▹ Semantics: QoS is evaluated by a single frame latency when clicking the button
  233. GreenWeb: QoS Web Language Extensions

    Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions
    ▸ QoS Type: performance metric
      ▹ Single (frame latency) vs. Continuous (frame throughput)
    ▸ QoS Target: threshold performance values
      ▹ Imperceptible target (Ti) vs. Usable target (Tu)
    Use default QoS targets:
      button:QoS {onclick: single}
      button:QoS {onclick: continuous}
    Overwrite default targets:
      button:QoS {onclick: continuous, 20, 100}
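
To make the annotation shape concrete, here is a small sketch that parses the three example rules into a structured record. It is not the GreenWeb implementation: the regular expression, the `QoSRule` type, and the reading of the two numbers as Ti followed by Tu are assumptions, and omitted targets are left as `None` because the slides do not give the default values.

```python
import re
from typing import NamedTuple, Optional


class QoSRule(NamedTuple):
    selector: str
    event: str
    qos_type: str                           # "single" or "continuous"
    imperceptible_target: Optional[float]   # Ti, None means "use the default"
    usable_target: Optional[float]          # Tu, None means "use the default"


_RULE = re.compile(
    r"^\s*(?P<selector>[^:{\s]+):QoS\s*"
    r"\{\s*(?P<event>\w+)\s*:\s*(?P<body>[^}]*)\}\s*$"
)


def parse_rule(text):
    """Parse 'selector:QoS {event: type[, Ti, Tu]}' into a QoSRule."""
    match = _RULE.match(text)
    if match is None:
        raise ValueError(f"not a QoS rule: {text!r}")
    parts = [p.strip() for p in match.group("body").split(",")]
    qos_type = parts[0]
    if qos_type not in ("single", "continuous"):
        raise ValueError(f"unknown QoS type: {qos_type!r}")
    if len(parts) == 1:
        t_i = t_u = None                     # defaults apply
    elif len(parts) == 3:
        t_i, t_u = float(parts[1]), float(parts[2])
    else:
        raise ValueError("expected either no targets or both Ti and Tu")
    return QoSRule(match.group("selector"), match.group("event"), qos_type, t_i, t_u)


for rule in (
    "button:QoS {onclick: single}",
    "button:QoS {onclick: continuous}",
    "button:QoS {onclick: continuous, 20, 100}",
):
    print(parse_rule(rule))
```

Keeping the rule CSS-shaped means a selector plus a declaration block is all a developer has to write, which matches the "CSS Compatible" syntax goal.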
  238. Design Space Exploration (DSE) Setup

    ▸ Webpages selected by Principal Component Analysis (PCA)
      ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
    [Figure: webpages plotted by PC1 (-5 to 5) and PC2 (log scale, 10^-4 to 10^1); successive builds add the annotations "dominated by # webpage elements" and "dominated by IPC"]
  249. Design Considerations

    ▸ How large should the scratchpad memory be? ~1 KB
    ▸ How many compute lanes should an SRU have? 32 Lanes
    [Diagram labels: Rule i, Rule j, Rule i.id, Rule j.id; Prop k, Prop l, Prop m; Style k, Style l, Style m; start, end]
    [Figure: Total Coverage (%) vs. RLP (0 to 16)]
    [Figure: Total CSS Properties (%) vs. PLP (0 to 96)]
  250. SRU Integration

    [Diagram: pipeline stages IF, ID, EX, MEM, WB with functional units ALU, MUL, FPU, and SRU]
    Layers: Hardware Layer, API Layer, Runtime Layer
    Labels: Software Failsafe, SRU Access, ISA support
    API example: Style_apply(DOMNodeId, matchedRules);
  251. Evaluation Methodology

    ▸ Fully synthesized using Synopsys 28 nm toolchain
    ▸ 24 representative webpages (desktop and mobile versions):
      www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk, www.twitter.com, www.espn.go.com, www.bbc.co.uk, www.slashdot.org, www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com
  265. Evaluation Results

    [Figure: Energy (J) vs. Load Time (s) for the A15-like design, Customization, and Specialization; annotations in successive builds: 18.6%, 22.2%, 9.2%, 22.2%, then 29.2%, 47.0%; scaling-up comparison points: I$, D$, I+D$]
    ▸ Fully synthesized using Synopsys 28 nm toolchain
    ▸ Cost of specialization: 0.59 mm² area overhead
    ▸ Better than scaling-up approaches
  266. Energy-Efficiency Plateaued

    [Figure: energy efficiency by smartphone model: Motorola Droid (2009), Galaxy S (2010), Nexus (2011), Galaxy S3 (2012), Galaxy S4 (2013), Galaxy S5 (2014), Galaxy S6 (2015)]