Write-Rationing Garbage Collection for Hybrid Memories

Virtual Machine Meetup

October 05, 2017

Transcript

  1. Write-Rationing Garbage Collection for Hybrid Memories

    Shoaib Akram, Jennifer B. Sartor, Kathryn S. McKinley, Lieven Eeckhout
    Ghent University, Belgium
  2. PCM cells have limited write endurance, shortening PCM lifetime

    [Figure: current (temperature) versus time for the PCM pulses Read, Reset to amorphous, and Set to crystalline; temperature marks at 610°C and 350°C]
  3. Hybrid memory is the best of DRAM and PCM

    Memory   Speed   Lifetime   Energy   Density
    DRAM     ++      ++
    PCM                         ++       ++
  4. Garbage collection: key advantage of using a managed language

    •  Memory automatically reclaimed for reuse
    •  More than just reclaim, stuff better organized
  5. Our Contribution: Write-Rationing Garbage Collection

    •  Memory automatically reclaimed for reuse
    •  More than just reclaim, stuff better organized
    Ration (verb): to restrict the consumption of (a commodity, food, etc.) (dictionary.com)
  6. Modern collectors segregate young and old objects

    •  Nursery objects die quickly
    •  When nursery is full
       – Traced for live objects
       – Survivors copied to mature space
    [Diagram: Nursery (young) and Mature (old) spaces]
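The nursery collection described on this slide can be sketched as follows. This is only a minimal illustration under stated assumptions, not the collector from the talk: the `Obj` class, the `minor_collect` function, and the list-based spaces are hypothetical names, and "copying" is modelled as moving an object from one list to another. The roots are assumed to already include every reference into the nursery.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """Hypothetical heap object: a name plus outgoing references."""
    name: str
    refs: list = field(default_factory=list)


def minor_collect(roots, nursery, mature):
    """Trace live nursery objects from the roots and move survivors to mature.

    Returns the names of the survivors in the order they were found.
    """
    in_nursery = {id(o) for o in nursery}
    survivors, seen = [], set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        # Skip objects already visited and objects outside the nursery.
        if id(obj) in seen or id(obj) not in in_nursery:
            continue
        seen.add(id(obj))
        survivors.append(obj)
        worklist.extend(obj.refs)
    mature.extend(survivors)  # survivors are "copied" to the mature space
    nursery.clear()           # whatever was not reached is garbage
    return [o.name for o in survivors]
```

For example, with three nursery objects where only `a` is rooted and `a` points to `b`, the call returns `a` and `b` as survivors and drops the unreachable third object.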
  7. Young objects incur 70% of all writes in Java applications

    [Bar chart: % of writes (0 to 100); Nursery: 70]
  8. Old objects incur 30% of all writes in Java applications

    [Bar chart: % of writes (0 to 100); Mature: 30, Nursery: 70]
  9. The top 10% most-written old objects receive 29% of writes

    [Bar chart: % of writes (0 to 100); Mature: 30, Nursery: 70, Top 10%]
  10. The top 2% most-written old objects receive 24% of writes

    [Bar chart: % of writes (0 to 100); Mature: 30, Nursery: 70, Top 10%, Top 2%: 24]
  11. The two enemies of PCM lifetime: nursery and writers

    •  Enemy no. 1: Nursery
    •  Enemy no. 2: Writers
    [Diagram: Nursery (young) and Mature (old) spaces]
  12. Kingsguard-Nursery: protect PCM from nursery

    [Diagram: JVM heap spaces (nursery, mature, meta data, large), mutators, and runtime GC, mapped by the OS onto DRAM and PCM]
  13. Kingsguard-Writers: protect PCM from nursery and writers

    [Diagram: heap spaces nursery, observer, mature (DRAM), mature (PCM), meta data, and large; mutators]
    How to monitor writes to observers?
  14. Write barrier: extra work on each pointer update outside nursery

    •  Don’t want to scan all objects in mature
    •  Remember mature to young pointers in a set
    [Diagram: Nursery (young), Mature (old), Roots, live objects, and the remembered set]
  15. Kingsguard-Writers uses write barriers for monitoring writes

    [Diagram labels: Primitive Fields, Reference Fields, Object Meta-Data, Write Intensity]
    reference_barrier (src_object, dst_object): if (dst_object outside nursery) add src_object to remembered set; update_write_intensity(src_object)
    primitive_barrier (src_object): if (src_object outside nursery) update_write_intensity(src_object)
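A small sketch may help make the two barriers on this slide concrete. It rests on several assumptions: the `Heap` class and its fields are hypothetical, write intensity is modelled as a simple per-object counter, and the reference barrier uses the conventional generational condition (source outside the nursery, target inside it), which is one reading of the slide rather than a confirmed detail of Kingsguard-Writers.

```python
class Heap:
    """Toy heap that records nursery membership by object identity (hypothetical)."""

    def __init__(self):
        self.nursery = set()          # ids of objects currently in the nursery
        self.remembered_set = set()   # ids of non-nursery objects pointing into it
        self.write_intensity = {}     # id -> number of monitored writes

    def in_nursery(self, obj):
        return id(obj) in self.nursery

    def update_write_intensity(self, obj):
        key = id(obj)
        self.write_intensity[key] = self.write_intensity.get(key, 0) + 1

    def reference_barrier(self, src, dst):
        """Run on a reference store of dst into a field of src."""
        if not self.in_nursery(src):
            if self.in_nursery(dst):
                # Assumed condition: remember old-to-young pointers only.
                self.remembered_set.add(id(src))
            self.update_write_intensity(src)

    def primitive_barrier(self, src):
        """Run on a primitive store into a field of src."""
        if not self.in_nursery(src):
            self.update_write_intensity(src)
```

Writes to nursery objects are not counted in this sketch, which matches the slide's `outside nursery` checks: only objects that have left the nursery are monitored.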
  16. Kingsguard-Writers performs additional optimizations to protect PCM

    1.  Placing PCM meta-data in DRAM
    2.  Allocating large objects in nursery
  17. Kingsguard-Writers optimization 2: Placing PCM meta-data in DRAM

    [Diagram: mature graph traversal over objects A, B, C, D, ... in Mature (PCM); labels: LLC, PCM, Writes to PCM]
  18. Kingsguard-Writers optimization 2: Placing PCM meta-data in DRAM

    [Diagram: Mature (PCM) next to a separate Meta (DRAM) region]
  19. Kingsguard-Writers optimization 2: Placing PCM meta-data in DRAM

    [Diagram: Mature (PCM) next to a separate Meta (DRAM) region]
  20. Kingsguard-Writers optimization 2: Placing PCM meta-data in DRAM

    [Diagram: Mature (PCM) next to a separate Meta (DRAM) region; pointer P = start_address + index_of_object]
  21. Kingsguard-Writers optimization 2: Meta-data in DRAM overhead

    •  Smallest object = 4 B: 25% overhead
    •  Common case > 16 B: 6.25% overhead
    •  For objects smaller than 16 B: use header byte
    •  On allocation, set a bit in header if object size < 16 B
    [Diagram: Mature (PCM) and Meta (DRAM); pointer P = start_address + index_of_object]
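The overhead figures on this slide are consistent with one byte of meta-data per entry: 1 B per 4 B is 25%, and 1 B per 16 B is 6.25%. The sketch below works through that arithmetic and the `start_address + index_of_object` lookup. The one-byte entry size, the 16 B granule, and the way `index_of_object` is derived from the object's address are assumptions made for illustration; the function names are hypothetical.

```python
META_BYTES_PER_ENTRY = 1  # assumption: one byte of meta-data per entry


def metadata_overhead(granule_bytes):
    """Side-table size as a fraction of the mature space it describes."""
    return META_BYTES_PER_ENTRY / granule_bytes


def metadata_address(meta_start, mature_start, object_address, granule_bytes=16):
    """Locate an object's meta-data entry as start_address + index_of_object.

    Assumes the index is the object's offset in the mature space divided
    by the granule size.
    """
    index_of_object = (object_address - mature_start) // granule_bytes
    return meta_start + index_of_object * META_BYTES_PER_ENTRY
```

With these assumptions, `metadata_overhead(4)` gives 0.25 and `metadata_overhead(16)` gives 0.0625, which is why handling objects smaller than 16 B through a header byte keeps the DRAM side table at the lower figure.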
  22. Kingsguard-Writers: protect PCM from nursery and writers

    [Diagram: heap spaces nursery, observer, mature (DRAM), mature (PCM), meta data, and large; mutators; runtime meta data]
  23. Kingsguard-Writers optimization: Allocating large objects in nursery

    if (alloc_rate_large >> alloc_rate_nursery) then trigger_large_opt()
    [Diagram: Nursery (DRAM), with ½ of remaining nursery marked, and Large (PCM)]
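The trigger condition on this slide can be sketched as below. The talk does not say how much larger `>>` means, so the factor of 4 is a placeholder assumption. Reading "½ of remaining nursery" as a size limit for large objects placed in the nursery is also an interpretation, and both function names are hypothetical.

```python
RATE_FACTOR = 4  # assumption: stands in for ">>"; the talk gives no number


def should_trigger_large_opt(alloc_rate_large, alloc_rate_nursery, factor=RATE_FACTOR):
    """True when large-object allocation far outpaces nursery allocation."""
    return alloc_rate_large > factor * alloc_rate_nursery


def place_large_object(size, nursery_free, large_opt_enabled):
    """Choose a space for a large object: "nursery" (DRAM) or "large" (PCM).

    Assumed policy: with the optimization on, a large object goes to the
    nursery only if it fits in half of the remaining nursery space.
    """
    if large_opt_enabled and size <= nursery_free // 2:
        return "nursery"
    return "large"
```

Under this reading, short-lived large objects can die in DRAM instead of generating writes to the PCM-backed large object space.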
  24. Kingsguard-Writers: protect PCM from nursery and writers

    [Diagram: heap spaces nursery, observer, mature (DRAM), mature (PCM), meta data, and large; mutators; runtime meta data]
  25. KG-N & KG-W summary

    KG-N
    •  Hybrid memory aware JVM
    •  Nursery in DRAM
    KG-W
    •  Observers
    •  Mature DRAM and mature PCM
    •  Large object opt
    •  Meta-data opt
  26. Experimental Setup

    •  Fast architecture simulator
    •  Helped us gain insight. Example: track origin of writes
    •  Easy to modify
    •  Framework to control non-determinism
  27. Benchmark Characteristics

    Benchmark   Function             Heap size (MB)   Allocation (MB)   Nursery survival (%)
    Xalan       XML parser           108              980               17
    Pmd         Code analysis        98               364               23
    Lusearch    Text search          68               4294              4
    Bloat       Bytecode optimizer   66               1246              15
    Antlr       Parser generator     48               246               4
    Summary     Real world apps      80 avg           1000 avg          13 avg
  28. Benchmark Characteristics

    (repeats the table from slide 27)
  29. Nursery and observer sizing

    [Diagram: heap spaces nursery, observer, mature (DRAM), mature (PCM), meta data, and large; mutators; runtime meta data; size labels 4 MB and 8 MB]
  30. PCM parameters

    Parameter       Relative to DRAM
    Read latency    4X
    Write latency   10X
    Read power      0.9X
    Write power     4X
    Standby power   ~0X
    Refresh power   0X
  31. PCM lifetime model

    (1) First line of defence: hardware line wear levelling
    [Diagram: memory lines 0, 1, 2, 3, 4, … N]
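  32. PCM lifetime model

    (1) First line of defence: hardware line wear levelling
    (2) OS policy: on page failure, give a new page to the JVM
    (3) Analytical model to compute lifetime:
        $\text{years} = \dfrac{\text{bytes} \times \text{endurance}}{\text{bytes\_per\_sec} \times 2^{25}}$
    [Diagram: memory lines 0, 1, 2, 3, 4, … N, shown in original order and rotated (N, 0, 1, 2, 3, 4, …)]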
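The analytical model on this slide divides the total number of byte writes the PCM can absorb by the write rate, with 2^25 (33,554,432) standing in for the roughly 31.5 million seconds in a year. The sketch below evaluates that formula. The function and parameter names are hypothetical, and the model implicitly assumes that wear levelling spreads writes evenly over all bytes.

```python
SECONDS_PER_YEAR_APPROX = 2 ** 25  # power-of-two approximation used on the slide


def pcm_lifetime_years(pcm_bytes, endurance_writes, write_rate_bytes_per_sec):
    """Lifetime in years = (bytes * endurance) / (bytes_per_sec * 2^25)."""
    total_writable_bytes = pcm_bytes * endurance_writes
    return total_writable_bytes / (write_rate_bytes_per_sec * SECONDS_PER_YEAR_APPROX)


# Illustrative numbers only (not from the talk): 32 GB of PCM, an endurance
# of 10 million writes per cell, and a sustained write rate of 1 GB/s.
example_years = pcm_lifetime_years(32 * 2**30, 10**7, 2**30)
```

With these made-up inputs the result is about 9.5 years. Halving the write rate doubles the lifetime, which is why reducing PCM writes translates directly into lifetime gains under this model.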
  33. Two baseline heap configurations

    [Diagram: DRAM-Only places nursery, mature, meta data, large, and runtime GC in DRAM; PCM-Only places the same spaces in PCM]
  34. Proposed heap configuration (2)

    [Diagram: KG-W heap with nursery, observer, mature (DRAM), mature (PCM), meta data, and large spaces; mutators; runtime meta data]
  35. Simulated main memory systems

    Physical memory            Heap organization
    32 GB DRAM                 DRAM-Only
    32 GB PCM                  PCM-Only
    1 GB DRAM + 32 GB PCM      KG-N & KG-W
  36. PCM-Only has an average lifetime of up to 2 years, making it impractical!

    [Chart: PCM lifetime in years (0 to 30) versus main memory capacity (4 GB, 8 GB, 16 GB, 32 GB) for PCM-only, KG-N, and KG-W]
  37. KG-N improves PCM lifetime by 5X compared to PCM-Only

    [Chart: PCM lifetime in years (0 to 30) versus main memory capacity (4 GB, 8 GB, 16 GB, 32 GB) for PCM-only, KG-N, and KG-W]
  38. KG-W improves PCM lifetime by 11X compared to PCM-Only

    [Chart: PCM lifetime in years (0 to 30) versus main memory capacity (4 GB, 8 GB, 16 GB, 32 GB) for PCM-only, KG-N, and KG-W]
  39. OS write partitioning (WP) moves frequently written pages to DRAM

    [Diagram: ranks 0 to 3 with thresholds 2^Rank = 1, 2, 4, 8; page P0 shown at each rank]
  40. OS write partitioning (WP) moves frequently written pages to DRAM

    [Diagram: ranks 0 to 3 with thresholds 2^Rank = 1, 2, 4, 8; pages P0 to P7 placed across PCM and DRAM; timeline with promotion quantum]
  41. OS write partitioning (WP) moves frequently written pages to DRAM

    [Diagram: ranks 0 to 3 with thresholds 2^Rank = 1, 2, 4, 8; pages P0 to P7 placed across PCM and DRAM; timeline with promotion quantum and demotion quantum]
  42. OS write partitioning (WP) moves frequently written pages to DRAM

    [Diagram: ranks 0 to 3 with thresholds 2^Rank = 1, 2, 4, 8; pages P0 to P6 placed across PCM and DRAM; timeline with promotion quantum and demotion quantum]
  43. Summary of OS write partitioning

    •  Keep often written pages in DRAM using a promotion mechanism
    •  Maximize the use of PCM using a demotion mechanism
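The promotion and demotion mechanisms summarized here can be illustrated with a rough sketch. The slides show only ranks 0 to 3 with thresholds of 2^rank writes, a promotion quantum, and a demotion quantum. Everything else below is an assumption: that pages reaching the top rank are promoted at the end of a promotion quantum, that DRAM pages with no writes in a demotion quantum are demoted, and all of the function names.

```python
def page_rank(write_count, num_ranks=4):
    """Highest rank whose threshold 2**rank the write count has reached."""
    rank = 0
    while rank + 1 < num_ranks and write_count >= 2 ** (rank + 1):
        rank += 1
    return rank


def pages_to_promote(write_counts, num_ranks=4):
    """Assumed policy: at the end of a promotion quantum, move top-rank pages to DRAM."""
    top = num_ranks - 1
    return sorted(p for p, w in write_counts.items() if page_rank(w, num_ranks) == top)


def pages_to_demote(dram_pages, written_in_quantum):
    """Assumed policy: at the end of a demotion quantum, return idle DRAM pages to PCM."""
    return sorted(p for p in dram_pages if p not in written_in_quantum)
```

For instance, with write counts of 9, 3, and 8 for pages `P0`, `P1`, and `P7`, only `P0` and `P7` reach rank 3 and would be promoted. If `P7` then sees no writes during a demotion quantum, it would go back to PCM.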
  44. WP results in 22% additional writes to PCM compared to KG-W

    [Chart: writes normalized to PCM-Only (0 to 0.4) for KG-N, KG-W, and WP; legend: User + JVM, Demotion; WP configuration: 8 queues, 10 ms promotion, 50 ms demotion]
  45. KG-N and KG-W reduce EDP by 36% over DRAM-Only

    [Chart: EDP relative to DRAM-only (0.0 to 2.0) for DRAM-Only, PCM-Only, KG-N, and KG-W]
  46. KG-W improves performance by 30% over PCM-Only

    [Chart: performance relative to DRAM-only (1.0 to 2.0) for PCM-Only, KG-N, and KG-W; value labels 1.4 and 1.7]
  47. PCM access latency dominates other overheads in KG-W

    [Chart: % overhead (0 to 50) for extra pointer tracking, observer GC, monitoring writes, PCM latency, and other]
  48. Conclusions

    •  Monitor fine-grained write behaviour of objects in hybrid memory
    •  Exploit managed runtime for organizing objects in hybrid memory
    •  Use Kingsguard collectors to improve lifetime of upcoming hybrid memories