WebCore: Architectural Support for Mobile Web Browsing

ISCA 2014 Main talk

Yuhao Zhu

June 18, 2014

Transcript

  1. WebCore: Architectural Support for Mobile Web Browsing
    Yuhao Zhu, Vijay Janapa Reddi
    Department of Electrical and Computer Engineering, The University of Texas at Austin
    ISCA Main Talk, June 18th, 2014
  6. The Fundamental Challenges
    How to achieve high performance with low energy?
    Achieving High Performance Demanded by End-User
    Conserving Energy Due to Limited Battery Capacity
    Conflicting requirements
    WebCore: A mobile architecture
  7. Executive Summary
    [Figure: Energy vs. Time plot showing General Purpose Designs]
    ASIC? Extremely challenging
    ‣Chrome: 7M LoC, 29 languages
    ‣Firefox: 10M LoC, 33 languages
  8. Executive Summary
    [Figure: Energy vs. Time plot with labels General Purpose Designs, Customizing µarch Parameters, Specialized FU and Memory, and WebCore Goal]
  12. Agenda of Today’s Talk
    ▸Motivation of our work: energy-efficiency of the mobile Web
    ▸How does WebCore improve the energy-efficiency?
      ▹Customization
      ▹Specialization
    ▸Evaluation Results
    ▸Related Work
  18. Customization: Find the Ideal General Purpose Baseline Architecture
    ▸Why customization?!?
    ▸What is a proper general purpose baseline architecture?
      ▹Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
      ▹Are existing general purpose mobile designs ideal?
    ▸Exhaustive design space exploration
  19. Design Space Exploration (DSE) Setup
    ▸Integrated power (McPAT) and performance x86 full-system simulator (Marss86)
    ▸WebKit engine in the Chromium Web browser
  25. Design Space Exploration (DSE) Setup
    ▸Webpages selection using PCA
      ▹PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
    [Figure: scatter plot of webpages in PC space, axes PC1 (-5 to 5) and PC2 (log scale, 10^-4 to 10^1). Build annotations mark one component as dominated by # webpage elements and another as dominated by IPC]
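The slide names PCA as the selection method but does not spell out the procedure. The following is a minimal sketch of one plausible reading, assuming that pages are projected onto the first principal component of their standardized features and that representatives are then picked evenly along that axis. The page names, the two features, and the even-spacing rule are all assumptions made for illustration; the talk uses roughly 400 features and more than one component.

```python
import math

def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def first_pc(rows, iters=200):
    """Power iteration on the covariance matrix of standardized rows.

    Assumes at least one feature varies, so the norm below is never zero.
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[a] * r[b] for r in rows) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] + [0.0] * (d - 1)  # arbitrary start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, pc):
    """Score of each row along the given principal component."""
    return [sum(x * p for x, p in zip(r, pc)) for r in rows]

def pick_representatives(names, scores, k):
    """Pick k >= 2 names evenly spaced along the sorted scores."""
    order = sorted(range(len(names)), key=lambda i: scores[i])
    step = (len(order) - 1) / (k - 1)
    return [names[order[round(i * step)]] for i in range(k)]

# Hypothetical pages and features: (number of DOM elements, measured IPC).
pages = ["a", "b", "c", "d", "e"]
features = [[100, 1.0], [200, 1.2], [300, 1.1], [400, 1.5], [500, 1.4]]
z = standardize(features)
pc1 = first_pc(z)
scores = project(z, pc1)
chosen = pick_representatives(pages, scores, k=3)
```

With these made-up inputs the two features are positively correlated, so the first component weights them equally and the chosen pages span the low, middle, and high ends of that axis.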
  26. Design Space Exploration (DSE) Findings
    ▸Out-of-order µarchitecture is much more flexible
    ▸In-order cores are acceptable if end-users can tolerate latency
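The transcript does not say how the explored designs were compared. A common way to summarize an exhaustive exploration over load time and energy is a Pareto frontier, sketched below as an assumption rather than as the authors' method. The `Design` type, the design names, and every number are invented for illustration.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    load_time_s: float
    energy_j: float

def dominates(a, b):
    """True if a is no worse than b on both metrics and better on one."""
    return (a.load_time_s <= b.load_time_s and a.energy_j <= b.energy_j
            and (a.load_time_s < b.load_time_s or a.energy_j < b.energy_j))

def pareto_frontier(designs):
    """Keep only designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

def lowest_energy_within(designs, max_load_time_s):
    """Cheapest design that still meets a latency budget, or None."""
    ok = [d for d in designs if d.load_time_s <= max_load_time_s]
    return min(ok, key=lambda d: d.energy_j) if ok else None

# Made-up design points, not measurements from the talk.
designs = [
    Design("in-order-small", 3.0, 0.6),
    Design("ooo-small", 2.2, 0.8),
    Design("ooo-big", 1.8, 1.1),
    Design("ooo-wasteful", 2.5, 1.2),
]
frontier = pareto_frontier(designs)
tight = lowest_energy_within(frontier, 2.5)   # strict latency budget
loose = lowest_energy_within(frontier, 3.5)   # user tolerates latency
```

In this toy data the in-order point only wins once the latency budget is relaxed, which mirrors the second finding on the slide.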
  28. Understand the Difference Using Kernel Knowledge
    [Figure: two panels, In-order design and Out-of-order design]
    ▸In-order designs show strong kernel variance
    ▸An Out-of-order design can accommodate kernel variance
  35. Customization: Identifying Major Sources of Energy Inefficiency
      Parameter              P1     P2     ARM A15
      Issue width            1      3      3
      # Function units       2      3      8
      Load queue size        4      16     16
      Store queue size       4      16     16
      BTB size               1024   128    256
      ROB size               128    128    40+
      L1 I-$ size (KB)       64     128    32
      # Physical registers   128    140    ?
      L1 D-$ size (KB)       8      64     32
      L2-$ size (KB)         256    1024   <4096
    [Figure: plot marking design points P1 and P2]
    ▸Instruction delivery
    ▸Data feeding
  36. Agenda of Today’s Talk
    ▸Motivation of our work: energy-efficiency of the mobile Web
    ▸How does WebCore improve the energy-efficiency?
      ▹Customization
      ▹Specialization
        -Mitigate instruction delivery: Style resolution unit (SRU)
        -Improving data feeding: Browser engine cache
    ▸Evaluation Results
    ▸Related Work
  44. WebCore Specialization Overview
    [Figure: layered block diagram]
    ▸Hardware Layer: Customized core (IF, ID, MEM, WB; ALU, MUL, FPU, SRU), L1 D-cache, Browser Engine Cache
    ▸API Layer: Style_apply(Id); DOM_LD(Id, &attr); DOM_ST(Id, &attr);
    ▸Runtime Layer: Cache Management, SRU Access, Software Failsafe
  46. Style Resolution Unit
    ▸Style kernel is the most critical kernel
    [Figure: two charts, Execution time breakdown and Energy consumption breakdown]
  52. Style Resolution Unit
    ▸Style kernel is the most critical kernel
        for (each rule in matchedRules) {          // Rule-level Parallelism (RLP)
          for (each property in rule) {            // Property-level Parallelism (PLP)
            switch (property.id) {
              case Font:
                Style[Font] = Handler(property.value, DOMNode);
                break;
              case N: ...
            }
          }
        }
    ▸Exploiting the parallelism to increase the arithmetic intensity and reduce instruction footprint
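The pseudocode on this slide can be made concrete with a small runnable sketch. It assumes that rules arrive sorted from lowest to highest priority and that the switch statement maps each property id to a handler. The `parse_length` handler, the `HANDLERS` table, and the sample rules (borrowed from the cnn.com example later in the talk) are illustrative only.

```python
def parse_length(value, dom_node):
    """Hypothetical handler: turn '6 px' or '0' into a number of pixels."""
    return float(value.replace("px", ""))

# A dispatch table plays the role of the switch statement.
HANDLERS = {
    "padding": parse_length,
    "margin": parse_length,
    "width": parse_length,
}

def resolve_style(matched_rules, dom_node, handlers=HANDLERS):
    """Apply rules in order; a later rule overwrites an earlier one."""
    style = {}
    for rule in matched_rules:              # outer loop: rule level
        for prop_id, value in rule:         # inner loop: property level
            style[prop_id] = handlers[prop_id](value, dom_node)
    return style

# Assumed ordering: the second rule has the higher priority.
rules = [
    [("padding", "0"), ("margin", "0")],
    [("padding", "6 px"), ("width", "36 px")],
]
final_style = resolve_style(rules, dom_node=None)
```

Each iteration does little work beyond dispatching to a handler, which is the kind of control-heavy pattern the last bullet on the slide targets.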
  63. Style Resolution Unit (2)
    ▸A running example from www.cnn.com
    Style Rules:
      Rule   Property 1 (id, value)   Property 2 (id, value)
      1      padding, 0               margin, 0
      2      padding, 6 px            width, 36 px
    [Figure: animation merging the two rules into a Final Style Info table with three id/value slots (Property 1 to Property 3); one rule is marked High priority]
    ▸Order Matters in RLP
    ▸Order Does Not Matter in PLP
  71. Style Resolution Unit (3)
    [Figure: SRU block diagram. Input Scratchpad Memory holds entries for Rule i and Rule j with start and end pointers into a property list (Prop k, Prop m, Prop m, Prop l); Conflict Resolution keeps the Higher Priority copy of Prop m; Compute Lanes process the surviving properties; Output Scratchpad Memory holds Style l, Style m, Style k]
    ▸Order Matters in RLP
    ▸Order Does Not Matter in PLP
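The two bullets suggest a two-phase flow: settle conflicts between rules first, then process the surviving properties independently. A minimal software sketch of that flow follows, assuming rules are given lowest priority first and using a thread pool as a stand-in for the compute lanes. The function names and the `to_pixels` handler are hypothetical, and real hardware lanes would not be threads.

```python
from concurrent.futures import ThreadPoolExecutor

def resolve_conflicts(rules_by_priority):
    """Keep, per property id, the value from the highest-priority rule."""
    winners = {}
    for rule in rules_by_priority:  # lowest priority first: order matters
        winners.update(rule)
    return winners

def to_pixels(value):
    """Hypothetical per-property handler."""
    return float(value.replace("px", ""))

def compute_lanes(winners, handler, lanes=2):
    """Run the handler on every surviving property; order does not matter."""
    ids = list(winners)
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        results = list(pool.map(handler, (winners[i] for i in ids)))
    return dict(zip(ids, results))

# Rules from the cnn.com example; the second is assumed higher priority.
rules = [
    {"padding": "0", "margin": "0"},
    {"padding": "6 px", "width": "36 px"},
]
winners = resolve_conflicts(rules)
styles = compute_lanes(winners, to_pixels)
```

Swapping the two rules changes which padding value survives, while reordering properties inside either rule leaves the result unchanged.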
  73. Evaluations
    ▸Fully synthesized using Synopsys 28 nm toolchain
    ▸24 representative webpages
      www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk, www.twitter.com, www.espn.go.com, www.bbc.co.uk, www.slashdot.org, www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com
      Desktop and mobile versions
  82. Evaluations
    [Figure: Energy (J) vs. Load Time (s) plot, axis ticks 0.55 to 1.1 and 1.6 to 2.4, comparing three design points: A15-like design, Customization, and Specialization. Annotations in build order: 18.6% and 22.2% once Customization is shown, 22.2% and 9.2% once Specialization is shown, and 29.2% and 47.0% on the final build]
  87. Evaluations
    [Figure: the same Energy (J) vs. Load Time (s) plot with A15-like design, Customization, and Specialization, plus points labeled I$, D$, and I+D$]
    Cost of specialization: 0.59 mm² area overhead
    Better than scaling-up approaches
  91. Related Work
    [Figure: grid with axes Hardware vs. Software and Focus on Performance vs. Focus on Energy-Efficiency]
    ▸Parallelization: Algorithm-level, Zoomm, Mozilla Servo
    ▸ASIC: Tegra 4 WebRTC accelerator, SiChrome
    ▸System-level Optimizations: Redundancy Removal, Prefetching, Big/little Scheduling
    ▸WebCore
  92. Conclusions
    ▸The Web browser has become a general purpose platform that supports a wide range of mobile Web applications
    ▸Customization allows us to find the ideal general-purpose baseline architecture
    ▸Hardware/software collaborative specialization leverages application knowledge to mitigate inefficiencies in general-purpose architectures