with Java I worked in a big company. ā This work was similar to assembly line work.. I made a part of a product. I didn't understand whole product. ā ā 13/207
current work Currently, I work at NaCl. ā matz and shyouhei and takaokouji are my co-workers. ā shugo is my boss. They are CRuby committers. ā ā 17/207
Collection for me GC technology is very interesting for me. ā GC is a garbage collecting machine. ā I've been creating it since then. It's very fun!! ā 21/207
It treats long-life objects as a special case. similar to Generational GC. ā ā LonglifeGC was rejected in CRuby 1.9.2 by some reason. :'( ā ā 32/207
Traditional M&S GC executes mark and sweep atomically. Ruby application stops during GC (stop-the-world). ā ā In Lazy sweeping, sweeping is lazy. ā 37/207
is a dead object? A dead object is an object that is never referenced by the program. ā In GC terms, we say a that dead object is unreachable from Roots. ā 52/207
is Parallel Marking? Collector run several marking processes in parallel by using native threads. ā ā We will be happy on multi-core machine. ā 70/207
not perform sweeping in parallel The sweeping is much faster than the marking. You can see ko1's research ā <URL:http://www.atdot.net/~ko1/ diary/201011.html#d4> ā ā 73/207
means.. Tasks are distributed to multiple threads. ā The task of marking the entire heap is divided into several tasks, each marking a single branch. ā 84/207
law is used to find the maximum expected improvement to an overall system when only part of the system is improved. [cited from `Amdahl's law - Wikipedia'] 102/207
law is used in parallel computing If parallel portion of the system is X% ā And number of processors is Y, ā How much speedup can we expect? ā 103/207
conclusion so far We should consider how we can efficiently balance workloads. So, we use Task Stealing. ā ā We should eliminate non-parallel parts by using wait-free algorithm. ā ā 109/207
Deque Deque stands for the Double- Ended Queue. ā In Arora's Deque, the deque contains tasks as elements. ā It's a wait-free data structure. ā 113/207
what ways could shift() cause contention problems? e.g... shift() and pop() could be called at the same time when deque has only one element. ā ā 123/207
for Arora's Deque A simple data structure for Task Stealing. ā Each worker has a single deque. ā Stealing (shift operation) is wait- free! ā 128/207
Marker uses Arora's Deque as a marking stack. ā A "task" means an object. The granularity of the task is very fine. ā ā This is a naive implementation. ā 140/207
point & Bad point Number of calls to Deque's operations was reduced. Marking speed of the worker is improved. ā ā However, Coarse-grained tasks decrease parallelism. ā 155/207
for large Array objects and Hash objects Each marker has a special deque to manage them. ā A marker divides them into fixed size tasks. e.g. 0-9 elements of Array, 10-19 elements of Array... ā ā 162/207
The naive implementation was slow. Grain of the task was too fine. ā ā A "task" means a branch in Roots Grain of the task is coarse. ā ā It's faster!! ā 164/207
benchmark program is make rdoc make rdoc generates the Ruby documentation. ā This benchmark measures execution time and the GC execution time of make rdoc. ā ā 173/207
many core environment I expect we get a large improvement. e.g. 8 core, 16 core... ā ā But, my machine has just 2 cores. I can't see it :( ā ā 178/207
case for Parallel GC If the objects are many. In this case, mark targets is also many. ā ā If the objects are long-lived. Server-side application? ā ā 179/207
characteristics of SUPER NARIO GC GC is running in fixed intervals. ā A lot of objects are generated to increase GC's burden. Burden = Game Level ā ā 187/207
to compare Original GC and Parallel GC Original GC pause time is long. This game will be difficult. ā ā Parallel GC pause time is short. This game will be easy. ā ā 188/207
OS is not supported Mark Worker uses pthread as native thread. ā And, uses some gcc built-in functions. ā But, I'll support for Windows eventually. ā 198/207