WebCore: Architectural Support for Mobile Web Browsing

ISCA 2014 Main talk

Yuhao Zhu

June 18, 2014

Transcript

  1. WebCore: Architectural Support for Mobile Web Browsing

    Yuhao Zhu, Vijay Janapa Reddi. Department of Electrical and Computer Engineering, The University of Texas at Austin. ISCA Main Talk, June 18th, 2014
  6. The Fundamental Challenges

    How to achieve high performance with low energy?
    ▸Achieving high performance demanded by end-users
    ▸Conserving energy due to limited battery capacity
    ▸Conflicting requirements
    A mobile architecture: WebCore
  7. Executive Summary

    [Figure: energy vs. time, with general-purpose designs marked]
    An ASIC? Extremely challenging, given the size of browser code bases:
    ‣Chrome: 7M LoC, 29 languages
    ‣Firefox: 10M LoC, 33 languages
  8. Executive Summary

    [Figure: energy vs. time design space with points for general-purpose designs, customizing µarch parameters, specialized FU and memory, and the WebCore goal]
  12. Agenda of Today’s Talk

    ▸Motivation of our work: energy efficiency of the mobile Web
    ▸How does WebCore improve energy efficiency?
      ▹Customization
      ▹Specialization
    ▸Evaluation Results
    ▸Related Work
  18. Customization: Find the Ideal General-Purpose Baseline Architecture

    ▸Why customization?
    ▸What is a proper general-purpose baseline architecture?
      ▹Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
      ▹Are existing general-purpose mobile designs ideal?
    ▸Exhaustive design space exploration
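The slides do not show how the explored design points are summarized. One common way to read an exhaustive sweep with two objectives is to keep only the Pareto-optimal designs, those that no other design beats in both load time and energy. The C++17 sketch below assumes each simulated configuration has already been reduced to one load time and one energy number; `DesignPoint` and `paretoFrontier` are hypothetical names, not part of McPAT or Marss86.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One simulated design point: a hypothetical microarchitecture configuration
// together with its measured load time and energy.
struct DesignPoint {
    std::string name;    // hypothetical label, e.g. "ooo-3wide"
    double loadTimeSec;  // lower is better
    double energyJ;      // lower is better
};

// Returns the Pareto-optimal points, sorted by load time, so the result traces
// the energy/delay frontier. Exact duplicates collapse to a single entry.
std::vector<DesignPoint> paretoFrontier(std::vector<DesignPoint> points) {
    std::sort(points.begin(), points.end(),
              [](const DesignPoint& a, const DesignPoint& b) {
                  if (a.loadTimeSec != b.loadTimeSec) return a.loadTimeSec < b.loadTimeSec;
                  return a.energyJ < b.energyJ;
              });
    std::vector<DesignPoint> frontier;
    for (const DesignPoint& p : points) {
        // Every point seen so far is at least as fast as p. p survives only if it
        // uses strictly less energy than all of them, i.e. than the last kept point.
        if (frontier.empty() || p.energyJ < frontier.back().energyJ) {
            frontier.push_back(p);
        }
    }
    return frontier;
}
```

Sorting by time first turns the dominance check into a single pass, since the kept points have strictly decreasing energy.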
  20. Design Space Exploration (DSE) Setup

    ▸Integrated power model (McPAT) with an x86 full-system performance simulator (Marss86)
    ▸WebKit engine in the Chromium Web browser
  25. Design Space Exploration (DSE) Setup

    ▸Webpage selection using principal component analysis (PCA)
      ▹PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
    [Figure: webpages plotted on PC1 (linear, -5 to 5) vs. PC2 (log scale, 10^-4 to 10^1); annotations mark the components dominated by the number of webpage elements and by IPC]
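The deck selects webpages with PCA over roughly 400 features but does not show the computation. As a hedged illustration, the C++17 sketch below standardizes a feature matrix (rows are webpages, columns are features), extracts the first principal component by power iteration on the covariance matrix, and projects each page onto it. The helper names are hypothetical, and a real study would typically use a linear-algebra library and keep several components.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = webpages, cols = features

// Standardizes each feature column to zero mean and unit variance, so features in
// very different units (element counts vs. IPC) contribute comparably.
Matrix standardize(Matrix x) {
    assert(!x.empty());
    const std::size_t n = x.size(), d = x[0].size();
    for (std::size_t j = 0; j < d; ++j) {
        double mean = 0.0, var = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += x[i][j];
        mean /= n;
        for (std::size_t i = 0; i < n; ++i) var += (x[i][j] - mean) * (x[i][j] - mean);
        const double sd = std::sqrt(var / n);
        for (std::size_t i = 0; i < n; ++i) x[i][j] = sd > 0 ? (x[i][j] - mean) / sd : 0.0;
    }
    return x;
}

// First principal component (a unit vector) of standardized data, found by
// power iteration on the covariance matrix C = Z^T Z / n.
std::vector<double> firstPrincipalComponent(const Matrix& z, int iterations = 200) {
    const std::size_t n = z.size(), d = z[0].size();
    Matrix cov(d, std::vector<double>(d, 0.0));
    for (const auto& row : z)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) cov[a][b] += row[a] * row[b] / n;

    std::vector<double> v(d, 1.0);  // start vector, assumed not orthogonal to PC1
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(d, 0.0);
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) w[a] += cov[a][b] * v[b];
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        assert(norm > 0.0);
        for (std::size_t a = 0; a < d; ++a) v[a] = w[a] / norm;
    }
    return v;
}

// Score of each webpage along a component: dot product with its standardized features.
std::vector<double> project(const Matrix& z, const std::vector<double>& pc) {
    std::vector<double> scores;
    for (const auto& row : z) {
        double s = 0.0;
        for (std::size_t a = 0; a < row.size(); ++a) s += row[a] * pc[a];
        scores.push_back(s);
    }
    return scores;
}
```

Pages that spread out along the leading components cover more of the workload space; how the authors picked their final page set from such a plot is not stated on the slides.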
  26. Design Space Exploration (DSE) Findings

    ▸Out-of-order µarchitecture is much more flexible
    ▸In-order cores are acceptable if end-users can tolerate latency
  28. Understand the Difference Using Kernel Knowledge

    ▸In-order designs show strong kernel variance
    ▸An out-of-order design can accommodate kernel variance
    [Figure panels: In-order design; Out-of-order design]
  35. Customization: Identifying Major Sources of Energy Inefficiency

    Parameter               P1      P2      ARM A15
    Issue width             1       3       3
    # Function units        2       3       8
    Load queue size         4       16      16
    Store queue size        4       16      16
    BTB size                1024    128     256
    ROB size                128     128     40+
    L1 I-$ size (KB)        64      128     32
    # Physical registers    128     140     ?
    L1 D-$ size (KB)        8       64      32
    L2-$ size (KB)          256     1024    <4096

    [Figure: design points P1 and P2]
    Major sources of energy inefficiency:
    ▸Instruction delivery
    ▸Data feeding
  36. Agenda of Today’s Talk

    ▸Motivation of our work: energy efficiency of the mobile Web
    ▸How does WebCore improve energy efficiency?
      ▹Customization
      ▹Specialization
        -Mitigating instruction-delivery inefficiency: Style resolution unit (SRU)
        -Improving data feeding: Browser engine cache
    ▸Evaluation Results
    ▸Related Work
  44. WebCore Specialization Overview

    ▸Hardware Layer: customized core (pipeline stages IF, ID, MEM, WB; execution units ALU, MUL, FPU and SRU), L1 D-cache, and browser engine cache
    ▸API Layer: DOM_LD(Id, &attr); DOM_ST(Id, &attr); Style_apply(Id);
    ▸Runtime Layer: cache management, SRU access, software failsafe
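The slide names the API but not its behavior. The C++17 model below is a hypothetical sketch, assuming that DOM_LD/DOM_ST read and write per-node attributes through a small browser engine cache, and that the runtime falls back to ordinary memory whenever the cache cannot serve a request. `BrowserEngineCacheModel`, `Attr`, `domLoad` and `domStore` are illustrative names; the capacity, write-through policy and eviction rule are placeholders, not WebCore's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using NodeId = std::uint32_t;
struct Attr { int width = 0; int height = 0; };  // placeholder attribute payload

// Hypothetical software model of DOM_LD / DOM_ST backed by a small cache.
class BrowserEngineCacheModel {
public:
    explicit BrowserEngineCacheModel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // DOM_LD(Id, &attr): try the specialized cache first; on a miss, take the
    // ordinary software path (the "failsafe") and cache the result.
    Attr domLoad(NodeId id) {
        if (auto it = cache_.find(id); it != cache_.end()) { ++hits_; return it->second; }
        ++misses_;
        Attr a = backing_[id];  // slow path: regular memory
        insert(id, a);
        return a;
    }

    // DOM_ST(Id, &attr): write-through, so the software copy is always valid.
    void domStore(NodeId id, const Attr& a) {
        backing_[id] = a;
        if (cache_.count(id)) cache_[id] = a;
        else insert(id, a);
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    // Naive eviction of an arbitrary entry when full; a stand-in for cache management.
    void insert(NodeId id, const Attr& a) {
        if (cache_.size() >= capacity_) cache_.erase(cache_.begin());
        cache_[id] = a;
    }

    std::size_t capacity_;
    std::unordered_map<NodeId, Attr> cache_;
    std::unordered_map<NodeId, Attr> backing_;
    std::size_t hits_ = 0, misses_ = 0;
};
```

Write-through keeps the software copy authoritative, which is one simple way to guarantee that the failsafe path always sees correct data; the real design may choose differently.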
  46. Style Resolution Unit

    ▸Style kernel is the most critical kernel
    [Figure panels: Execution time breakdown; Energy consumption breakdown]
  52. Style Resolution Unit

    ▸Style kernel is the most critical kernel

        for (each rule in matchedRules) {           // Rule-level Parallelism (RLP)
          for (each property in rule) {             // Property-level Parallelism (PLP)
            switch (property.id) {
              case Font:
                Style[Font] = Handler(property.value, DOMNode);
                break;
              case N: ...
            }
          }
        }

    ▸Exploiting this parallelism increases arithmetic intensity and reduces the instruction footprint
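A runnable counterpart to the loop nest can make the two loop levels concrete. The C++17 sketch below is an assumption-laden stand-in, not WebKit's code: `Property`, `Rule`, `ComputedStyle` and `resolveStyle` are hypothetical, the per-property `Handler(property.value, DOMNode)` dispatch is replaced by a plain copy, and `matchedRules` is assumed to be sorted from lowest to highest priority so that later rules overwrite earlier ones.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the browser's CSS structures.
struct Property { std::string id; std::string value; };
struct Rule { std::vector<Property> properties; };
using ComputedStyle = std::map<std::string, std::string>;  // property id -> resolved value

// Sequential style resolution mirroring the slide's loop nest.
ComputedStyle resolveStyle(const std::vector<Rule>& matchedRules) {
    ComputedStyle style;
    for (const Rule& rule : matchedRules) {          // RLP: rule order decides overrides
        for (const Property& p : rule.properties) {  // PLP: properties within a rule are
                                                     // assumed distinct, so order is irrelevant
            // The real kernel switches on p.id and calls a per-property handler
            // that also consults the DOM node; this sketch just copies the value.
            style[p.id] = p.value;
        }
    }
    return style;
}
```

Each inner iteration touches a different output slot, which is the independence PLP exploits; the outer loop carries the override dependence between rules.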
  63. Style Resolution Unit (2): A running example from www.cnn.com

    Style Rules
    Rule    Property 1 (id, value)    Property 2 (id, value)
    1       padding, 0                margin, 0
    2       padding, 6 px             width, 36 px

    Final Style Info: three properties (id, value): padding (0 or 6 px, decided by the rule marked High priority), width 36 px, margin 0
    ▸Order matters in RLP
    ▸Order does not matter in PLP
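The two claims on this slide can be checked in software. The C++17 sketch below encodes the two rules from the example, assumes rules are applied lowest priority first and that rule 2 is the higher-priority one (the extracted slide does not make this unambiguous), and then tests both claims: reversing the properties inside each rule leaves the final style unchanged, while reversing the rules does not. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal types: a rule is a list of (property id, value) pairs.
using Rule = std::vector<std::pair<std::string, std::string>>;
using Style = std::map<std::string, std::string>;

// Applies rules from lowest to highest priority (assumed ordering): later rules win.
Style applyRules(const std::vector<Rule>& rules) {
    Style s;
    for (const Rule& r : rules)
        for (const auto& [id, value] : r) s[id] = value;
    return s;
}

// The slide's www.cnn.com rules, assuming rule 2 has the higher priority.
std::vector<Rule> cnnRules() {
    return {
        {{"padding", "0"}, {"margin", "0"}},       // rule 1
        {{"padding", "6 px"}, {"width", "36 px"}}  // rule 2 (assumed higher priority)
    };
}

// PLP: reversing property order inside every rule should not change the result.
bool propertyOrderIrrelevant(std::vector<Rule> rules) {
    const Style before = applyRules(rules);
    for (Rule& r : rules) std::reverse(r.begin(), r.end());
    return applyRules(rules) == before;
}

// RLP: reversing rule order changes the result when rules share a property.
bool ruleOrderMatters(std::vector<Rule> rules) {
    const Style before = applyRules(rules);
    std::reverse(rules.begin(), rules.end());
    return applyRules(rules) != before;
}
```

Under the stated assumption, the final style is padding 6 px, width 36 px, margin 0; only the shared `padding` property makes rule order observable.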
  71. Style Resolution Unit (3)

    ▸Order matters in RLP
    ▸Order does not matter in PLP
    [Figure: SRU datapath. An input scratchpad memory holds the matched rules, each delimited by start/end pointers (Rule i with Prop k and Prop m; Rule j with Prop m and Prop l; one rule marked Higher Priority). A conflict-resolution stage reduces the two Prop m entries to one. Compute lanes then process the properties and write Style k, Style l and Style m to an output scratchpad memory]
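As a rough functional model of this datapath (not its hardware design), the C++17 sketch below treats the input scratchpad as a list of (priority, property, value) entries, resolves conflicts by keeping the highest-priority entry for each property, and then applies a per-property handler to every survivor. The entry layout, the integer priority encoding, the tie rule and the identity handler are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical input-scratchpad entry: one property of one matched rule.
struct Entry {
    int priority;        // higher value = higher-priority rule (assumed encoding)
    std::string propId;
    int value;           // numeric payload, e.g. a length in px
};

struct StyleOut { std::string propId; int computed; };

// Conflict resolution: for each property id keep only the entry from the
// highest-priority rule (ties keep the first entry). Afterwards no two entries
// share a property, so the survivors can be processed in any order.
std::vector<Entry> resolveConflicts(const std::vector<Entry>& input) {
    std::unordered_map<std::string, std::size_t> winner;  // propId -> index into input
    for (std::size_t i = 0; i < input.size(); ++i) {
        auto it = winner.find(input[i].propId);
        if (it == winner.end() || input[i].priority > input[it->second].priority)
            winner[input[i].propId] = i;
    }
    std::vector<Entry> survivors;
    for (std::size_t i = 0; i < input.size(); ++i)
        if (winner.at(input[i].propId) == i) survivors.push_back(input[i]);
    return survivors;
}

// Compute lanes: iterations are independent, so each could map to its own lane
// in hardware; here they run sequentially. The handler is a hypothetical identity.
std::vector<StyleOut> computeLanes(const std::vector<Entry>& survivors) {
    std::vector<StyleOut> out;  // stands in for the output scratchpad
    out.reserve(survivors.size());
    for (const Entry& e : survivors) out.push_back({e.propId, e.value});
    return out;
}
```

Resolving conflicts before the lanes is what makes the lane stage order-independent: once each property appears at most once, no two handlers write the same output slot.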
  73. Evaluations

    ▸Fully synthesized using the Synopsys 28 nm toolchain
    ▸24 representative webpages: desktop and mobile versions of www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk, www.twitter.com, www.espn.go.com, www.bbc.co.uk, www.slashdot.org, www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com
  82. Evaluations

    [Figure: energy (J, 0.55 to 1.1) vs. load time (s, 1.6 to 2.4) for an A15-like design, the customized design, and the specialized design. Annotated improvements: 18.6% and 22.2% for customization; 22.2% and 9.2% for specialization; 29.2% and 47.0% overall, from the A15-like design to the specialized design]
  87. Evaluations

    [Figure: the same energy vs. load time plot, adding scaled-up alternatives labeled I$, D$ and I+D$]
    ▸Cost of specialization: 0.59 mm² area overhead
    ▸Better than scaling-up approaches
  91. Related Work

    [Figure: prior work arranged by hardware vs. software and by focus on performance vs. focus on energy efficiency]
    ▸Parallelization / algorithm-level: Zoomm, Mozilla Servo
    ▸ASIC: Tegra 4, WebRTC accelerator, SiChrome
    ▸System-level optimizations: redundancy removal, prefetching, big/little scheduling
    ▸WebCore
  92. Conclusions

    ▸The Web browser has become a general-purpose platform that supports a wide range of mobile Web applications
    ▸Customization allows us to find the ideal general-purpose baseline architecture
    ▸Hardware/software collaborative specialization leverages application knowledge to mitigate inefficiencies in general-purpose architectures