How I made a pure-Ruby word2vec program more than 3x faster

Slides for my talk at RubyConf Taiwan 2016
https://2016.rubyconf.tw/#Kei Sawada

remore

December 02, 2016
Transcript

  1. How I made a pure-Ruby word2vec program more than 3x

    faster RubyConf Taiwan 2016 @remore
  2. Who Am I Kei Sawada @remore A rubyist from Tokyo

    A weekend contrabassist Engineering Manager at Recruit Holdings Co., Ltd., VP of Engineering at NIJIBOX Co., Ltd.
  3. Me And Taiwan A Taiwanese coworker who is working at

    NIJIBOX Many #rubyfriends in Taiwan Eddie, Ryudo, Chao, Yu-Cheng, lulalala, Lin Yu Hsiang and many others Super glad to be here today!
  4. This Talk Is Mainly For The Rubyist Who is interested in Ruby's performance: micro-benchmarking results, YARV, ISeq

    and profiling tools ⏱ Who may be interested in RPC (IPC) with Python and Julia from Ruby
  5. > echo "x=2.5; 1.upto(10){|i| x=x+i}; p x" | time ruby

    57.5 0.13 real 0.06 user 0.05 sys (chart: elapsed time by N, Ruby)
  6. > echo "x=2.5; 1.upto(100){|i| x=x+i}; p x" | time ruby

    5052.5 0.11 real 0.06 user 0.04 sys (chart: elapsed time by N, Ruby)
  7. > echo "x=2.5; 1.upto(1000){|i| x=x+i}; p x" | time ruby

    500502.5 0.15 real 0.07 user 0.05 sys (chart: elapsed time by N, Ruby)
  8. > echo "x=2.5; 1.upto(10000){|i| x=x+i}; p x" | time ruby

    50005002.5 0.11 real 0.06 user 0.04 sys (chart: elapsed time by N, Ruby)
  9. > echo "x=2.5; 1.upto(1e5){|i| x=x+i}; p x" | time ruby

    5000050002.5 0.14 real 0.08 user 0.05 sys (chart: elapsed time by N, Ruby)
  10. > echo "x=2.5; 1.upto(1e6){|i| x=x+i}; p x" | time ruby

    500000500002.5 0.25 real 0.20 user 0.04 sys (chart: elapsed time by N, Ruby)
  11. > echo "x=2.5; 1.upto(1e7){|i| x=x+i}; p x" | time ruby

    50000005000002.5 1.58 real 1.52 user 0.05 sys (chart: elapsed time by N, Ruby)
  12. > echo "x=2.5; 1.upto(1e8){|i| x=x+i}; p x" | time ruby

    5.000000050000003e+15 14.56 real 14.37 user 0.09 sys (chart: elapsed time by N, Ruby)
  13. > echo "x=2.5; 1.upto(1e9){|i| x=x+i}; p x" | time ruby

    5.00000000067109e+17 157.27 real 150.16 user 1.30 sys (chart: elapsed time by N, Ruby)
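
The numbers above come from piping one-liners into `time ruby`. A roughly equivalent, hypothetical way to reproduce the same measurements from inside Ruby itself, using the standard Benchmark module (this script is a sketch, not part of the deck):

    require 'benchmark'

    # Sketch: time the same accumulation loop for increasing N using Ruby's
    # standard Benchmark module instead of the external `time` command.
    [10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000].each do |n|
      elapsed = Benchmark.realtime do
        x = 2.5
        1.upto(n) { |i| x = x + i }
      end
      printf("N=%-10d %.3f sec\n", n, elapsed)
    end
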
  14. > PY=$(cat << EOS "n=2.5 for i in range(1,int(\$N)+1): n=i+n;

    print(n)" EOS ) > N=1e3 && eval echo "$PY" | time python 500502.5 0.10 real 0.01 user 0.01 sys How About Python?
  15. > N=1e5 && eval echo "$PY" | time python 5000050002.5

    0.13 real 0.03 user 0.01 sys (chart: elapsed time by N, Ruby vs. Python)
  16. > N=1e6 && eval echo "$PY" | time python 5.00000500002e+11

    0.38 real 0.23 user 0.02 sys (chart: elapsed time by N, Ruby vs. Python)
  17. > N=1e7 && eval echo "$PY" | time python 5.0000005e+13

    2.66 real 2.35 user 0.17 sys (chart: elapsed time by N, Ruby vs. Python)
  18. > N=1e8 && eval echo "$PY" | time python 5.00000005e+15

    48.27 real 25.87 user 10.67 sys (chart: elapsed time by N, Ruby vs. Python)
  19. > N=1e9 && eval echo "$PY" | time python 5.00000005e+15

    48.27 real 25.87 user 10.67 sys (chart: elapsed time by N, Ruby vs. Python)
  20. > N=1e9 && eval echo "$PY" | time python 5.00000005e+15

    48.27 real 25.87 user 10.67 sys (chart: elapsed time by N, Ruby vs. Python) Attention Please BTW, take note that this micro-benchmark was run on my MacBook Pro (2015) with Ruby 2.3.0 and Python 2.7. In my environment Python looks pretty slow, but this is by no means a fair comparison. Please do not take these measurement results seriously; just use them to get a feel for the order of magnitude of each programming environment's speed!
  21. > SRC=$(cat << EOS "#include \"stdio.h\" int main(){ double n=2.5;

    for(int i=1;i<=\$N;i++){ n=i+n; } printf(\"%lf\", n); }" EOS ) What About C?
  22. > N=1e5 && eval echo "$SRC" > main.c; gcc main.c;

    time ./a.out 5000050002.500000 real 0m0.006s user 0m0.001s sys 0m0.002s (chart: elapsed time by N, Ruby vs. Python vs. C)
  23. > N=1e6 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out

    500000500002.500000 real 0m0.009s user 0m0.004s sys 0m0.002s (chart: elapsed time by N, Ruby vs. Python vs. C)
  24. > N=1e7 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out

    50000005000002.500000 real 0m0.033s user 0m0.029s sys 0m0.002s (chart: elapsed time by N, Ruby vs. Python vs. C)
  25. > N=1e8 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out

    5000000050000003.000000 real 0m0.287s user 0m0.281s sys 0m0.003s (chart: elapsed time by N, Ruby vs. Python vs. C)
  26. > N=1e9 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out

    500000000067108992.000000 real 0m2.815s user 0m2.799s sys 0m0.008s (chart: elapsed time by N, Ruby vs. Python vs. C)
  27. Introducing Julia Julia is A dynamic programming language 4 years

    old, open-sourced in 2012 Designed for scientific computing Fast
  28. > JL=$(cat << EOS "function sample_loop(n) for i in 1:\$N

    n = i+n end n end println(sample_loop(2.5))" EOS ) How About Julia?
  29. > N=1e5 && eval echo "$JL" | time julia sample_loop

    (generic function with 1 method) 5.0000500025e9 0.91 real 0.48 user 0.14 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia)
  30. > N=1e6 && eval echo "$JL" | time julia sample_loop

    (generic function with 1 method) 5.000005000025e11 0.45 real 0.44 user 0.08 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia)
  31. > N=1e7 && eval echo "$JL" | time julia sample_loop

    (generic function with 1 method) 5.00000050000025e13 0.50 real 0.47 user 0.09 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia)
  32. > N=1e8 && eval echo "$JL" | time julia sample_loop

    (generic function with 1 method) 5.000000050000003e15 1.82 real 0.76 user 0.09 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia)
  33. > N=1e9 && eval echo "$JL" | time julia sample_loop

    (generic function with 1 method) 5.00000000067109e17 1.71 real 1.70 user 0.08 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia)
  34. Findings Ruby works reasonably fast for a smaller number of loops,

    but for a huge number of loops it is advisable to consider switching languages The primary option would be using C Julia is also a dynamic language, but it can be FAST
  35. Chapter 2 3x Challenge An experiment to work around this performance

    issue using the Julia programming language along with a ruby2julia transpiler
  36. Given That Performance Issue, Which Option Is The Best To

    Work Around It? Give up and use other languages anyway? Make Ruby itself faster? Make a gem to boost my Ruby program?
  37. Idea: Transpiler What if we could run arbitrary Ruby code

    on a Julia process? It may look something like `some_ruby_code.to_other_lang`
  38. # ruby for i in 1..N n = i+n end

    # julia for i in 1:N n = i+n end Sometimes it's promising: the range operator creates a range object in both languages
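
For a simple construct like this loop, the Ruby-to-Julia mapping is almost mechanical. A toy, purely textual sketch of the idea (the helper name toy_ruby2julia is hypothetical; Julializer itself supports far more syntax and does not translate this naively):

    # Toy ruby2julia rewrite for one pattern only: "for i in a..b" -> "for i in a:b".
    # An illustration of the transpiler idea, not Julializer's implementation.
    def toy_ruby2julia(src)
      src.gsub(/for (\w+) in (\S+)\.\.(\S+)/) { "for #{$1} in #{$2}:#{$3}" }
    end

    ruby_loop = <<~RUBY
      for i in 1..N
        n = i+n
      end
    RUBY

    puts toy_ruby2julia(ruby_loop)
    # for i in 1:N
    #   n = i+n
    # end
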
  39. # ruby class Sample def context return self end end

    p Sample.new.instance_eval{context} # julia # too many gaps to fill, such as OOP features, methods for reflection, etc... But Sometimes it's NOT
  40. A ruby2julia Transpiler Implementation: Julializer github.com/remore/julializer Very limited syntax is

    supported as of v0.1.2 TrueClass, FalseClass, Fixnum, Float, Integer, Numeric, Random Array, Range, Hash are also partially supported (only very few methods as of now) TBH it still needs huge improvements, including an error-checking tool and documentation "でもやるんだよ" ("But I'll do it anyway")
  41. $ echo "-1.6.to_i" | julializer trunc(Int64,parse(string((-1.6)))); $ cat sample.rb for

    i in 0..list.size-1 list[i] = (i-list.size/2).abs end $ julializer sample.rb for i::Int64 = 0:size(list)[1]-1;list[i+1]=abs((i- size(list)[1]/2));;end;; Examples
  42. $ ruby -r julializer -e "p Julializer.ruby2julia(File.read('calc.rb'))" "const max_exp=6;;const exp_table_size=1000;;const

    max_sentence_length=1000;;function init_unigram_table(table_size, vocab);train_words_pow=0.0;;power=0.75;;table=fill(0, table_size);;for a::Int64 = 0:size(vocab)[1]-1;train_words_pow+=vocab[a+1] [0+1]^power;;end;;i=0;;d1=(vocab[i+1][0+1]^power)/train_words_pow;;for a::Int64 = 0:table_size-1;table[a+1]=i;if a/float(table_size)>d1;i+=1;;d1+=(vocab[i+1] [0+1]^power)/train_words_pow;;;end;if i>=size(vocab)[1];i=size(vocab)[1]-1;;end;;end;;return table;;;end;;function addop(size, list, base, target);for i::Int64 = 0:size-1;list[i+base+1]+=target[i+1];;end;;list;;end;;function addop2(size, list, base, coefficient, target, base2);for i::Int64 = 0:size-1;list[i+base +1]+=coefficient*target[i+base2+1];;end;;list;;end;;function addop3(size, f, coefficient, target, base);for i::Int64 = 0:size-1;f+=coefficient[i+1]*target[i+base +1];;end;;f;;end;;function addop4(size, list, target, base);for i::Int64 = 0:size-1;list[i+1]+=target[i+base+1];;end;;list;;end;;myrandom=0;;function next_random();global myrandom;myrandom=abs((myrandom*25214903917+11));;return myrandom;;;end;;function exptable(num);num=exp((num/ float(exp_table_size)*2-1)*max_exp);;num/(num+1);;end;;function bsearch_index(list, target);a=0;;z=size(list)[1]-1;;while (true);current_entry=list[a+1:z+1] [floor(Int64,((z-a)/2))+1];if current_entry<target;next_entry=list[a+1:z+1][floor(Int64,((z-a)/2+1))+1];;if (next_entry>=target)||z-a<=1;return round(Int64,(a+(z- a)/2+1));;;else;a=round(Int64,(a+(z-a)/2));;;end;;;;else;if a>=target||z-a<=1;return a;;end;;z=round(Int64,(z-(z-a)/2));;;end;;;end;;;end;;function calc_vec(iter, original_text, sample, train_words, debug_mode, __vocab_index_hash, vocab, syn0, syn1neg, negative, alpha, __cum_table, table_size, layer1_size, window);sentence_position=0;;sentence_length=0;;word_count=0;;word_count_actual=0;;last_word_count=0;;sen=[];;local_iter=iter;;neu1=[];;neu1e=[];;backup=copy( original_text);;__denominator=trunc(Int64,parse(string((exp_table_size/max_exp/ 2))));;__sample_train_words=sample*train_words;;table_size=trunc(Int64,parse(string(1e8)));;table=init_unigram_table(table_size,vocab);;starting_alpha=alpha;; while true;if sentence_position%500==0&&debug_mode>1;print(@sprintf(\"%d %d / \",word_count,last_word_count));;end;if word_count- last_word_count>10000;word_count_actual+=word_count-last_word_count;;last_word_count=word_count;;if debug_mode>1;print(string(\"\\r Alpha: \",@sprintf(\"%f\",alpha),\" Progress: \",@sprintf(\"%.2f\",(word_count_actual/float((iter*train_words+1))*100)),\"%\"));;end;;alpha=starting_alpha*(1- word_count_actual/float((iter*train_words+1)));;if alpha<starting_alpha*0.0001;alpha=starting_alpha*0.0001;;end;;;end;if sentence_length==0;skipped=0;;sen=[];;___state = start(original_text);while !done(original_text, ___state);___i, ___state = next(original_text, ___state);e = ___i;if haskey(__vocab_index_hash, e);word=__vocab_index_hash[string(e)];;;else;skipped+=1;;continue;;;end;;;word_count+=1;;if word==0;break;;end;;if sample>0;ran=(sqrt(vocab[word+1][0+1]/__sample_train_words)+1)*__sample_train_words/vocab[word+1][0+1];;if ran<(next_random()&(0xFFFF+0))/ 65536.0;continue;;end;;;end;;push!(sen, word);sentence_length+=1;;if sentence_length>=max_sentence_length;break;;end;;;end;;if max_sentence_length +skipped<=length(original_text)-1;splice!(original_text, 0+1:0+0+max_sentence_length+skipped+1);;else;original_text=[];;;end;;;sentence_position=0;;;end;if size(original_text)[1]==0||word_count>train_words;word_count_actual+=word_count-last_word_count;;local_iter-=1;;if 
debug_mode>1;print(local_iter);;end;;if local_iter==0;break;;end;;word_count=0;;last_word_count=0;;sentence_length=0;;original_text=copy(backup);;sen=[];;continue;;;end;if sentence_position>=size(sen) [1];continue;;end;word=sen[sentence_position+1];neu1=fill(0.0, layer1_size);neu1e=fill(0.0, layer1_size);b=next_random()%window;cw=0;for j::Int64 = b:window*2- b;if j!=window;k=sentence_position-window+j;;if k<0||k>=sentence_length;continue;;end;;if k>=size(sen)[1];continue;;end;;last_word=sen[k +1];;neu1=addop4(layer1_size,neu1,syn0,last_word*layer1_size);;cw+=1;;;end;;end;if cw!=0;for j::Int64 = 0:layer1_size-1;neu1[j+1]/=cw;;end;;if negative>0;for j::Int64 = 0:negative;if j==0;target=word;;label=1;;;else;nr=next_random();;target=table[(nr>>16)%table_size+1];;if target==0;target=nr%(size(vocab) [1]-1)+1;;end;;if target==word;continue;;end;;label=0;;;end;;l2=target*layer1_size;f=0.0;f=addop3(layer1_size,f,neu1,syn1neg,l2);if f>max_exp;g=(label-1)*alpha;;;elseif f<(-max_exp);g=label*alpha;;;else;g=(label-exptable(trunc(Int64,parse(string(((f +max_exp)*__denominator))))))*alpha;;;end;;;neu1e=addop2(layer1_size,neu1e,0,g,syn1neg,l2);syn1neg=addop2(layer1_size,syn1neg,l2,g,neu1,0);;end;;;end;;for j::Int64 = b:window*2-b;if j!=window;c=sentence_position-window+j;;if c<0||c>=sentence_length;continue;;end;;if c>=size(sen)[1];continue;;end;;last_word=sen[c +1];;syn0=addop(layer1_size,syn0,last_word*layer1_size,neu1e);;;end;;end;;;end;sentence_position+=1;if sentence_position>=sentence_length;sentence_length=0;;end;;end;;[syn0,syn1neg];;end;;" You can convert word2vec.rb
  43. Next Problem: How To Run a Julia Program from Ruby

    For example: run the external program like this? Process.spawn("echo 'p 123' | julializer | julia", :out => STDOUT) Obviously not a good solution (you need to marshal data manually, plus the Julia VM must be booted up at every single function call)
  44. Idea: IPC With Julia What if we can pass arbitrary

    Ruby values to a running Julia background process, through a Module, via IPC?
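
A minimal sketch of that IPC idea, making no assumption about VirtualModule's real protocol: keep one long-running worker process alive and exchange one request and one reply per line over a pipe, so the worker VM boots only once. Here a second Ruby process stands in for Julia just to keep the example self-contained; remote_call is a hypothetical helper:

    # Hypothetical illustration of the persistent-worker IPC pattern.
    # The worker evaluates one expression per line and prints one result per line.
    worker = IO.popen(
      ['ruby', '-e', 'STDOUT.sync = true; while line = $stdin.gets; puts eval(line); end'],
      'r+'
    )

    def remote_call(io, expr)
      io.puts(expr)   # send one request
      io.gets.chomp   # read one reply (as a String)
    end

    p remote_call(worker, '2.5 + (1..100).reduce(:+)')  # => "5052.5"
    p remote_call(worker, 'Math.sqrt(2)')               # => "1.4142135623730951"
    worker.close
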
  45. Sample Usage(1): Calling Julia from Ruby jl = VirtualModule.new(:julia=>["Clustering"]) include

    jl r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
  46. jl = VirtualModule.new(:julia=>["Clustering"]) include jl r = Clustering.kmeans(jl.rand(5, 1000), 20,

    maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8] Sample Usage(1): Calling Julia from Ruby By calling the VirtualModule#new method, a Julia background process is booted up and starts idling
  47. jl = VirtualModule.new(:julia=>["Clustering"]) include jl r = Clustering.kmeans(jl.rand(5, 1000), 20,

    maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8] Sample Usage(1): Calling Julia from Ruby The VirtualModule#new method gives you back an instance of the Module class
  48. jl = VirtualModule.new(:julia=>["Clustering"]) include jl r = Clustering.kmeans(jl.rand(5, 1000), 20,

    maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8] Sample Usage(1): Calling Julia from Ruby Since it's an instance of the Module class, you can #include it
  49. jl = VirtualModule.new(:julia=>["Clustering"]) include jl r = Clustering.kmeans(jl.rand(5, 1000), 20,

    maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8] Sample Usage(1): Calling Julia from Ruby Now it's time to call an arbitrary function in Julia. Every single parameter passed from Ruby's world is converted to a Julia value by msgpack
  50. jl = VirtualModule.new(:julia=>["Clustering"]) include jl r = Clustering.kmeans(jl.rand(5, 1000), 20,

    maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8] Sample Usage(1): Calling Julia from Ruby msgpack converts only basic data types (such as Integer, String, Array, etc.). In this case, since the kmeans function returns a value of the `Clustering.KmeansResult{Float64}` type, r is still an instance of the Module class which keeps a pointer to the `Clustering.KmeansResult{Float64}` value held in the background process.
  51. jl = VirtualModule.new(:julia=>["Clustering"]) include jl r = Clustering.kmeans(jl.rand(5, 1000), 20,

    maxiter:200, display: :iter) p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8] Sample Usage(1): Calling Julia from Ruby Since the Clustering.assignments function returns a basic data type which can be converted to a Ruby Array, we finally get the clustering result!
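
The conversion rule from the last few slides can be tried directly with the msgpack gem (assumed installed): basic types such as numbers, strings, arrays, and hashes round-trip cleanly, while anything else has to stay behind a reference on the Julia side. A small sketch of the round-trip part:

    require 'msgpack'

    # Basic data types survive the Ruby -> MessagePack -> Ruby round trip unchanged.
    payload = { 'n' => 2.5, 'ids' => [1, 2, 3], 'label' => 'iter' }
    packed  = payload.to_msgpack            # binary String sent over the pipe
    p MessagePack.unpack(packed)            # => {"n"=>2.5, "ids"=>[1, 2, 3], "label"=>"iter"}
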
  52. Sample Usage(2): Calling Python(sklearn) from Ruby skl = VirtualModule.new( :lang=>:python,

    :pkgs=>["sklearn"=>["datasets", "svm", "grid_search", "cross_validation"]] ) include skl iris = datasets.load_iris(:_) clf = grid_search.GridSearchCV( svm.LinearSVC(:_), {'C':[1, 3, 5],'loss':['hinge', 'squared_hinge']}, verbose:0 ) clf.fit(iris.data, iris.target) p "Best Params: #{best_params = clf.best_params_}" #"Best Params: {\"loss\"=>\"squared_hinge\", \"C\"=>1}" score = cross_validation.cross_val_score( svm.LinearSVC(loss:'squared_hinge', C:1), iris.data, iris.target, cv:5 ) p "Scores: #{[:mean,:min,:max,:std].map{|e| e.to_s + '=' + score.send(e, :_).to_s }.join(',')}" # "Scores: mean=0.9666666666666668,min=0.9,max=1.0,std=0.04216370213557838"
  53. Sample Usage(3): Defining Custom Methods With Julializer vm = VirtualModule.new(methods:<<EOS,

    :transpiler=>->(s) {Julializer.ruby2julia(s)}) def init_table(list) for i in 0..list.size-1 list[i]+=Random.rand end list end EOS p vm.init_table([1,20]) # [1.3066601775641218, 20.17001189249985]
  54. $ SRC=$(cat << EOS "p VirtualModule.new(methods:<<METHOD).sample_loop(2.5) def sample_loop(n) for i

    in 1..\$N n = i+n end n end METHOD" EOS ) Let’s Run The Simple Huge Loop
  55. > N=1e5 && eval echo "$SRC" | time ruby -r

    virtual_module 5000050002.5 2.68 real 1.58 user 0.36 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia vs. VirtualModule)
  56. > N=1e6 && eval echo "$SRC" | time ruby -r virtual_module

    500000500002.5 2.20 real 1.46 user 0.25 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia vs. VirtualModule)
  57. > N=1e7 && eval echo "$SRC" | time ruby -r virtual_module

    50000005000002.5 1.68 real 1.51 user 0.21 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia vs. VirtualModule)
  58. > N=1e8 && eval echo "$SRC" | time ruby -r virtual_module

    5.000000050000003e+15 1.95 real 1.75 user 0.21 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia vs. VirtualModule)
  59. > N=1e9 && eval echo "$SRC" | time ruby -r virtual_module

    5.00000000067109e+17 4.50 real 4.29 user 0.21 sys (chart: elapsed time by N, Ruby vs. Python vs. C vs. Julia vs. VirtualModule)
  60. $ cd example $ ruby word2vec.rb --output /tmp/vectors.bin --train ../doc/benchmark_word2vec/training_data/

    10mb.txt --size 20 --window 10 --negative 5 --sample 1e-4 --binary 1 --iter 3 --debug 0 > /dev/null 2>&1 $ python Python 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import gensim >>> model = gensim.models.Word2Vec.load_word2vec_format('/tmp/vectors.bin', binary=True) >>> model.most_similar("japan") [(u'netherlands', 0.9741939902305603), (u'china', 0.9712631702423096), (u'county', 0.9686408042907715), (u'spaniards', 0.9669440388679504), (u'vienna', 0.9614173769950867), (u'abu', 0.9587018489837646), (u'korea', 0.9565504789352417), (u'canberra', 0.954473614692688), (u'erupts', 0.9540712833404541), (u'prefecture', 0.9534248113632202)] Benchmarking Using Pure-Ruby word2vec implementation
  61. But Still Problematic If the size of the text gets

    seriously huge, there is still a big gap compared to C….
  62. Chapter 3 Why Slow An ongoing profiling attempt to find out

    what part of the program causes this performance issue
  63. $ cat profile.rb RubyProf.start x=2.5 1.upto(1e4){|i| x=x+i}; p x result

    = RubyProf.stop RubyProf::FlatPrinter.new(result).print(STDOUT) $ ruby -r ruby-prof profile.rb ruby-prof
  64. $ ruby -r ruby-prof profile.rb 50005002.5 Measure Mode: wall_time Thread

    ID: 70204763724260 Fiber ID: 70204768000900 Total: 0.009597 Sort by: self_time %self total self wait child calls name 52.19 0.010 0.005 0.000 0.005 1 Integer#upto 16.92 0.002 0.002 0.000 0.000 10001 Fixnum#> 15.79 0.002 0.002 0.000 0.000 10000 Fixnum#+ 14.60 0.001 0.001 0.000 0.000 10000 Float#+ 0.22 0.010 0.000 0.000 0.010 1 Global#[No method] 0.19 0.000 0.000 0.000 0.000 1 Kernel#p 0.09 0.000 0.000 0.000 0.000 1 Float#inspect #upto is the slowest. #> is also slow.
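
Given that Integer#upto and the per-iteration Float/Fixnum math dominate the profile, a quick follow-up experiment (hypothetical, not from the deck) is to compare the block-based loop with a plain while loop using the standard Benchmark module; on CRuby the while version usually avoids the block-call overhead:

    require 'benchmark'

    n = 10_000_000

    Benchmark.bm(7) do |bm|
      # Block-based loop, as in the profiled snippet
      bm.report('upto')  { x = 2.5; 1.upto(n) { |i| x = x + i } }
      # Same arithmetic without a block call per iteration
      bm.report('while') { x = 2.5; i = 1; while i <= n; x = x + i; i += 1; end }
    end
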
  65. $ ruby -e "printf RubyVM::InstructionSequence.compile('x=2.5; 1.upto(1e4){ |i| x = x+i

    }').disasm" == disasm: #<ISeq:<compiled>@<compiled>>================================ == catch table | catch type: break st: 0006 ed: 0013 sp: 0000 cont: 0013 |------------------------------------------------------------------------ local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 2] x 0000 trace 1 ( 1) 0002 putobject 2.5 0004 setlocal_OP__WC__0 2 0006 putobject_OP_INT2FIX_O_1_C_ 0007 putobject 10000.0 0009 send <callinfo!mid:upto, argc:1>, <callcache>, block in <compiled> 0013 leave == disasm: #<ISeq:block in <compiled>@<compiled>>======================= == catch table | catch type: redo st: 0002 ed: 0014 sp: 0000 cont: 0002 | catch type: next st: 0002 ed: 0014 sp: 0000 cont: 0014 |------------------------------------------------------------------------ local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 2] i<Arg> 0000 trace 256 ( 1) 0002 trace 1 0004 getlocal_OP__WC__1 2 0006 getlocal_OP__WC__0 2 0008 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache> 0011 dup 0012 setlocal_OP__WC__1 2 0014 trace 512 0016 leave will help us understand what’s happening internally
  66. $ ruby -e "printf RubyVM::InstructionSequence.compile('x=2.5; 1.upto(1e4){ |i| x = x+i

    }').disasm" == disasm: #<ISeq:<compiled>@<compiled>>================================ == catch table | catch type: break st: 0006 ed: 0013 sp: 0000 cont: 0013 |------------------------------------------------------------------------ local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 2] x 0000 trace 1 ( 1) 0002 putobject 2.5 0004 setlocal_OP__WC__0 2 0006 putobject_OP_INT2FIX_O_1_C_ 0007 putobject 10000.0 0009 send <callinfo!mid:upto, argc:1>, <callcache>, block in <compiled> 0013 leave == disasm: #<ISeq:block in <compiled>@<compiled>>======================= == catch table | catch type: redo st: 0002 ed: 0014 sp: 0000 cont: 0002 | catch type: next st: 0002 ed: 0014 sp: 0000 cont: 0014 |------------------------------------------------------------------------ local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 2] i<Arg> 0000 trace 256 ( 1) 0002 trace 1 0004 getlocal_OP__WC__1 2 0006 getlocal_OP__WC__0 2 0008 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache> 0011 dup 0012 setlocal_OP__WC__1 2 0014 trace 512 0016 leave But which instruction on earth is really slow?
  67. Introducing yarv-prof github.com/remore/yarv-prof A tiny DTrace-Based YARV profiler Instrumented profiling

    with walltime or cputime Only basic datasets are provided so far. Still under development
  68. Chapter 4 Your Turn! Why not attempt it yourself,

    towards Ruby 3x3? Or even “5xRuby”?