Student at the University of Tsukuba
• I will be a Ph.D. student next year
• Research: unsupervised machine learning (graph data and topic models)
• OSS contributions: Keras and documentation translation
  • Keras is a very popular deep learning framework
1. F-measure: an evaluation metric for classification models
  • My first, easy task, to learn Hive and Hivemall
  • Merged
2. SLIM: a fast recommendation algorithm
  • The hardest work for me…
  • Merged
3. word2vec: an unsupervised word feature learning algorithm
  • A challenging task
  • Under review
Background: Binary Classification Problem
• e.g., user gender, positive/negative book reviews, …
• A prediction model is trained on a labeled dataset
• Which ML model is better? How do we choose better parameters of an ML model?
[Figure: a dataset is used to train an ML model]
Higher F-measure indicates a better model
• When β = 1, it is called the F1-score (F-score)
• Hivemall now supports the F1-score

$F_\beta = (1+\beta^2)\cdot\dfrac{\text{precision}\cdot\text{recall}}{\beta^2\cdot\text{precision}+\text{recall}}$
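As a quick worked example (the numbers are illustrative, not from the talk), take precision = 0.8 and recall = 0.5 with β = 1:

$F_1 = (1+1^2)\cdot\dfrac{0.8\cdot 0.5}{1^2\cdot 0.8 + 0.5} = \dfrac{0.8}{1.3} \approx 0.615$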
SLIM: a fast top-N recommendation algorithm

Xia Ning and George Karypis. SLIM: Sparse Linear Methods for Top-N Recommender Systems. In ICDM, 2011.

$\min_{w_j}\ \frac{1}{2}\lVert a_j - A w_j \rVert_2^2 + \frac{\beta}{2}\lVert w_j \rVert_2^2 + \lambda \lVert w_j \rVert_1 \quad \text{subject to}\ W \ge 0,\ \mathrm{diag}(W) = 0$

[Figure: the items × users rating matrix A times the sparse weight matrix W approximates A′]
1. Training by coordinate descent
  • Approximate the matrix product AW
  • Only use the top-k similar items per item
2. Prediction
  • Approximate the matrix product AW
  • The weight matrix W is sparse (see the sketch below)
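As a rough illustration of why the sparse W makes prediction cheap, here is a minimal Java sketch (class and method names are hypothetical, not Hivemall's actual API), assuming the learned top-k weights are kept as nested maps:

```java
import java.util.Map;

public class SlimPredictor {
    // w.get(j) holds the non-zero weights of the top-k items similar to item j
    private final Map<Integer, Map<Integer, Double>> w;

    public SlimPredictor(Map<Integer, Map<Integer, Double>> topKWeights) {
        this.w = topKWeights;
    }

    // Predicted score for (user, item j): one entry of AW, i.e. the dot
    // product of the user's ratings a_u with the sparse column w_j
    public double predict(Map<Integer, Double> userRatings, int item) {
        Map<Integer, Double> wj = w.get(item);
        if (wj == null) {
            return 0.0; // no similar items recorded for this item
        }
        double score = 0.0;
        for (Map.Entry<Integer, Double> e : wj.entrySet()) {
            Double rating = userRatings.get(e.getKey());
            if (rating != null) {
                score += rating * e.getValue();
            }
        }
        return score;
    }
}
```

Because each column w_j has at most k non-zeros, every prediction touches at most k ratings instead of all items.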
Precompute the top-k similar items of each item i, together with their ratings
• Larger k gives better recommendations, but memory usage and training time increase
[Figure: the i-th book with ratings (×5, ×5, ×4) and its top-k similar items]
• Uses document datasets such as Wikipedia
• A high-impact algorithm
• Very fast
• Simple model
• Applications in other domains
  • e.g., item purchase history, graph data
Each word is represented as a low-dimensional vector
• About 100–1000 dimensions
• Features
  • Similar words have similar vectors
    • Finding synonyms
    • Good features for other ML tasks
  • Word analogy
    • King - Man + Woman ≈ Queen
    • Reading - Watching + Watched ≈ Read
    • France - Paris + Tokyo ≈ Japan
[Figure: word vectors of King, Man, Queen, Woman]
[Build slides: same content as above, plus a table of the top similar words of “java”]
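To make the analogy arithmetic concrete, here is a minimal sketch of the vector offset plus a brute-force nearest-neighbor search by cosine similarity (a toy illustration, not Hivemall code):

```java
import java.util.Map;

public class WordAnalogy {
    // Returns the word whose vector is closest (by cosine similarity)
    // to a - b + c, excluding the three query words themselves.
    static String analogy(Map<String, float[]> vecs, String a, String b, String c) {
        float[] va = vecs.get(a), vb = vecs.get(b), vc = vecs.get(c);
        float[] q = new float[va.length];
        for (int i = 0; i < q.length; i++) {
            q[i] = va[i] - vb[i] + vc[i]; // e.g. King - Man + Woman
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, float[]> e : vecs.entrySet()) {
            String w = e.getKey();
            if (w.equals(a) || w.equals(b) || w.equals(c)) continue;
            double sim = cosine(q, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = w; // for good vectors, analogy(v, "King", "Man", "Woman") gives "Queen"
            }
        }
        return best;
    }

    static double cosine(float[] x, float[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }
}
```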
Skip-gram: train word vectors by predicting nearby words given the current word
• Example context: “Alice was beginning to get very …”
Cited from T. Mikolov, et al., Efficient Estimation of Word Representations in Vector Space. In ICLR, 2013.
CBOW: train word vectors by predicting the current word based on nearby words
• A better model for obtaining vectors of low-frequency words
Cited from T. Mikolov, et al., Efficient Estimation of Word Representations in Vector Space. In ICLR, 2013.
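Both models consume (current word, nearby word) pairs produced by sliding a context window over the text. A small sketch of that step, using the slide's example sentence (a hypothetical helper, not from the actual patch):

```java
import java.util.ArrayList;
import java.util.List;

public class ContextPairs {
    // Emit (center, context) pairs for every context word within +/- window positions
    static List<String[]> pairs(String[] words, int window) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            for (int j = Math.max(0, i - window); j <= Math.min(words.length - 1, i + window); j++) {
                if (j != i) {
                    out.add(new String[] { words[i], words[j] });
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sentence = { "Alice", "was", "beginning", "to", "get", "very" };
        for (String[] p : pairs(sentence, 2)) {
            System.out.println(p[0] + " -> " + p[1]); // e.g. "beginning -> Alice"
        }
    }
}
```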
Computing the softmax costs O(V), where V is the vocabulary size
• Negative sampling approximates the softmax
• Uses “negative words” sampled from a noise distribution as negative examples
• The number of negative samples is 5–25
Original word2vec: sampling from the noise distribution with a pre-built word array
• Example: an array with 100 elements for word probabilities 0.3, 0.2, 0.25, 0.25
  • 0–29: The
  • 30–49: Hive
  • 50–74: word2vec
  • 75–99: is
• Sampling by nextInt(100): O(1)
• Problem: the array becomes far too long for a large vocabulary V (see the sketch below)
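A minimal sketch of this table-based approach with the toy probabilities above (the layout mirrors the slide, not the actual Hivemall code; only java.util.Random is used):

```java
import java.util.Random;

public class UnigramTable {
    public static void main(String[] args) {
        // Array of 100 slots; each word occupies slots proportional to its probability
        String[] table = new String[100];
        fill(table, 0, 30, "The");       // slots 0-29  (p = 0.30)
        fill(table, 30, 50, "Hive");     // slots 30-49 (p = 0.20)
        fill(table, 50, 75, "word2vec"); // slots 50-74 (p = 0.25)
        fill(table, 75, 100, "is");      // slots 75-99 (p = 0.25)

        Random rnd = new Random();
        // O(1) sampling: one uniform index into the table
        String negativeWord = table[rnd.nextInt(100)];
        System.out.println(negativeWord);
    }

    static void fill(String[] t, int from, int to, String word) {
        for (int i = from; i < to; i++) {
            t[i] = word;
        }
    }
}
```

The drawback is visible here: resolving fine-grained probabilities over a real vocabulary needs a table far longer than V itself.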
Alias method: O(1) sampling with two random draws and O(V) memory
1. Sample a bucket index by nextInt(V)
2. Sample a double by nextDouble()
• Example: the index points to bucket 1, which holds Hive with threshold 0.8 and alias word The; nextDouble() = 0.7
  • If the random value < 0.8, return the bucket's word: Hive
  • Else, return the other (alias) word of bucket 1: The
• (see the sketch below)
[Figure: alias-table buckets 0–3 with their thresholds, e.g., 0.8 and 1.0]
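A sketch of the alias-method sampler under the same toy distribution. The table values below are illustrative but do realize the probabilities 0.3/0.2/0.25/0.25; a real implementation would construct them with Vose's algorithm:

```java
import java.util.Random;

public class AliasSampler {
    private final String[] words;  // word stored in each bucket
    private final String[] alias;  // alias word of each bucket
    private final double[] prob;   // threshold of each bucket
    private final Random rnd = new Random();

    public AliasSampler(String[] words, String[] alias, double[] prob) {
        this.words = words;
        this.alias = alias;
        this.prob = prob;
    }

    // O(1) sampling: one uniform bucket index plus one uniform double
    public String sample() {
        int i = rnd.nextInt(words.length); // 1. sample a bucket index
        double u = rnd.nextDouble();       // 2. sample a double in [0, 1)
        return (u < prob[i]) ? words[i] : alias[i];
    }

    public static void main(String[] args) {
        // Bucket 1 holds "Hive" with threshold 0.8 and alias "The":
        // as in the slide, nextDouble() = 0.7 < 0.8 selects "Hive",
        // otherwise "The" is returned.
        AliasSampler s = new AliasSampler(
                new String[] { "The", "Hive", "word2vec", "is" },
                new String[] { "is", "The", "The", "word2vec" },
                new double[] { 1.0, 0.8, 1.0, 1.0 });
        System.out.println(s.sample());
    }
}
```

Memory is now proportional to V (one threshold and one alias per bucket) instead of the much longer unigram table.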
1. Negative sampling
  • Fast sampling from the noise distribution
  • Saves memory (the original implementation uses an array that is too long for the vocabulary size)
2. Data-parallel training
  • No parameter synchronization
  • Guarantees the same initialization of vector weights by using the word index as the seed value (see the sketch below)
  • Unfortunately, the vector quality is not good…
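A minimal sketch of the seeded-initialization idea (names are hypothetical): every worker derives the identical initial vector for a given word index, so the starting weights agree across workers without any synchronization.

```java
import java.util.Random;

public class SeededInit {
    // Initialize the vector for a word from a Random seeded with the
    // word's index: all workers produce the same initial weights.
    static float[] initVector(int wordIndex, int dim) {
        Random rnd = new Random(wordIndex);
        float[] v = new float[dim];
        for (int i = 0; i < dim; i++) {
            // Small-range uniform init, similar to the original word2vec:
            // values in [-0.5/dim, 0.5/dim)
            v[i] = (rnd.nextFloat() - 0.5f) / dim;
        }
        return v;
    }
}
```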
Future work
• Evaluation
  • e.g., recommendation quality and speed of SLIM
• Improving quality during data-parallel training
  • A parameter server?
• If we can do that, I also want to write a paper based on the Hivemall implementation…