Upgrade to Pro — share decks privately, control downloads, hide ads and more …

音楽配信サービスにおける 推薦システムの概要と 数理モデルについて

Hiroki Mizukami
September 16, 2019
200

音楽配信サービスにおける 推薦システムの概要と 数理モデルについて

日本数学会 2019年度秋季総合分科会 ワークショップ
数学ソフトウェアとフリードキュメントでの講演内容です
https://mathsoc.jp/meeting/kanazawa19sept/

Hiroki Mizukami

September 16, 2019
Tweet

Transcript

  1. 1. Իָ഑৴ࣄۀͷਪનγεςϜ ਪનγεςϜΛମݧͯ͠ΈΔ Ϩίʔυ࢈ۀͷਰ੎ ਪનγεςϜʹظ଴͢Δ໾ׂ ਪનΛ༻͍ͨϢʔβମݧͷઃܭ ਪનγεςϜʹؔ͢Δޡղ ਪનγεςϜͷߏ੒ཁૉ 2. ਪનγεςϜΛࢧ͑Δ਺ཧ

    ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ େن໛σʔλͱػցֶश ௿࣍ݩಛ௃ྔͷ֫ಘ ࠷ۙ๣୳ࡧ໰୊ γεςϜߏ੒ 3. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ
  2. Ϩίʔυ࢈ۀͷਰ੎ • Recorded Music Industry • ࿥Ի͞ΕͨԻݯͷྲྀ௨ • 2014೥ΑΓࢢ৔ن໛͕ճ෮܏޲ •

    ෺ཧϝσΟΞͷࢢ৔ن໛ͷॖখ • σδλϧϝσΟΞͷࢢ৔ن໛ͷ֦େ • σδλϧԽʹ൐͏มભ • ྲྀ௨ͷޮ཰Խ • εϚʔτϑΥϯͷීٴ • ਪનɾݕࡧٕज़ͷ৽͍͠Ԡ༻ઌ IFPI Global Music Report 2019 https://www.ifpi.org/news/IFPI-GLOBAL-MUSIC-REPORT-2019 Point!!
  3. Ϩίʔυ࢈ۀͷਰ੎ ෺ཧϝσΟΞ σδλϧϝσΟΞ ݕࡧ ୨͔Βݟ͚ͭͯ͘Δ δϟέοτʹهࡌͷ৘ใΛ໨ࢹ େن໛σʔλϕʔεʹΑΔݕࡧ ๲େͳߏ଄ԽσʔλΛࢀর ʢ࡞ۂऀɾδϟϯϧɾۂௐͳͲʣ ਪન

    ൢചళͷ͓͢͢Ί ࡶࢽɾϚεϝσΟΞ ஌ਓ͔Βͷ͓͢͢Ί ྨࣅָۂ୳ࡧ ๲େͳߦಈཤྺʹجͮ͘ਪન ڠௐϑΟϧλϦϯά • Ϩίʔυ࢈ۀʹ͓͍ͯ΋৘ใॲཧٕज़ɾ਺ཧతϞσϧͷ׆༻͕׆ൃʹͳΔ
  4. ਪનγεςϜʹظ଴͢Δ໾ׂ • ਪન • ސ٬ͷҙࢥܾఆΛޙԡ͢͠Δ࢓૊Έɾίϛϡχέʔγϣϯ • “Right Information, to the

    right people, in the right way.” • ਪનγεςϜ • ਪનʹవΘΔίϛϡχέʔγϣϯΛιϑτ΢ΣΞͷྗͰεέʔϧͤ͞Δ ϋϯόʔά ͍ͩ͘͞ʂ ͚ͭ͋Θͤʹ ͓໺ࡊ͸͍͔͕ Ͱ͔͢ʁ ϋϯόʔά ͸ॏ͍ͳ͊ ͪ͜Βͷڕͷϝχϡʔ Ͱͨ͠ΒαούϦͯ͠ ·͢
  5. ਪનʹΑΔϢʔβମݧͷઃܭ • ϊϯλʔήςΟϯάࢪࡦ • Trends • ਺ཧతͳΞϧΰϦζϜʹΑΔ • ྲྀߦΛࣗಈͰݕ஌ɾ֦ࢄ •

    Focus • ϓϩͷΤσΟλɾDJʹΑΔબۂ • ྲྀߦ΍ελΠϧͷఏҊ • ༷ʑͳίϯςΩετΛߟྀ • ༵೔ɼ࣌ؒɼΠϕϯτɼetc…
  6. ਪનʹΑΔϢʔβମݧͷઃܭ • λʔήςΟϯάࢪࡦ • Discovery • ࠶ੜϩάʹجͮ͘ηάϝϯςʔγϣϯ • ޷ΈʹԠͨ͡ϓϨΠϦετͷఏҊ •

    Related Playlists • ࠶ੜϩάʹجͮ͘ηάϝϯςʔγϣϯ • ΞϧΰϦζϜʹΑΔྨࣅϓϨΠϦετͷఏҊ • ຊ೔ͷ͓࿩
  7. 1. Իָ഑৴ࣄۀͷਪનγεςϜ ਪનγεςϜΛମݧͯ͠ΈΔ Ϩίʔυ࢈ۀͷਰ੎ ਪનγεςϜʹظ଴͢Δ໾ׂ ਪનΛ༻͍ͨϢʔβମݧͷઃܭ ਪનγεςϜʹؔ͢Δޡղ ਪનγεςϜͷߏ੒ཁૉ 2. ਪનγεςϜΛࢧ͑Δ਺ཧ

    ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ େن໛σʔλͱػցֶश ௿࣍ݩಛ௃ྔͷ֫ಘ ࠷ۙ๣୳ࡧ໰୊ γεςϜߏ੒ 3. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ
  8. ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ • https://mf.awa.fm/2NVL2eX • ϓϨΠϦετ • ར༻ऀ͕࡞੒ɾެ։Ͱ͖Δָۂͷྻ • ྦྷܭ1000ສ௒ •

    ࡞੒ɾߋ৽ස౓͕ߴ͍ • ؔ࿈ϓϨΠϦετ • ࠶ੜதͷ΋ͷͱࣅͨงғؾͷϓϨΠϦετ • ࿈ଓతͳԻָମݧͷఏڙ
  9. ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ • ߹ܭͷॲཧ࣌ؒΛখ͘͢͞Δ • ਺ཧతͳཁૉٕज़ • େن໛σʔλͱػցֶशʹ͍ͭͯ • ௿࣍ݩಛ௃ྔͷ֫ಘʹ͍ͭͯ •

    ࠷ۙ๣୳ࡧ໰୊ʹ͍ͭͯ • ݁Ռ • ϦΞϧλΠϜͳۙ๣ϓϨΠϦετ୳ࡧ͕Մೳʹ • ϓϨΠϦετ࡞੒ɾߋ৽Πϕϯτ͔Β਺ඵ
  10. େن໛σʔλͱػցֶशʹ͍ͭͯ • Ұൠతͳ࠷దԽ໰୊ • : ໨తؔ਺ • : ੍໿৚݅ •

    େҬత࠷దੑ f(θ) θ ∈ S f(θ) Minimize subject to θ ∈ S ⊂ ℝd Point!!
  11. • యܕతͳػցֶशʹ͓͚Δ࠷దԽ໰୊ • : ໨తؔ਺ɹˠ ଛࣦʹؔ͢Δ࿨ͷܗʹͳ͍ͬͯΔ͜ͱ͕͓͓͍ • : ੍໿৚݅ɹˠ ແ੍໿Ͱ͋Δ͜ͱ͕͓͓͍

    • େҬత࠷దੑɹˠ ࣮ݧతͳੑೳΛॏࢹʢ෼ྨਫ਼౓ɾֶश࣌ؒͳͲʣ f(θ) θ ∈ S େن໛σʔλͱػցֶशʹ͍ͭͯ f(θ) = n ∑ i=1 loss(xi |θ) Minimize θ ∈ ℝd Point!! subject to
  12. • యܕతͳػցֶशʹ͓͚Δ࠷దԽ໰୊ • : ໨తؔ਺ɹˠ ଛࣦʹؔ͢Δ࿨ͷܗʹͳ͍ͬͯΔ͜ͱ͕͓͓͍ • : ੍໿৚݅ɹˠ ແ੍໿Ͱ͋Δ͜ͱ͕͓͓͍

    • େҬత࠷దੑɹˠ ࣮ݧతͳੑೳΛॏࢹʢ෼ྨਫ਼౓ɾֶश࣌ؒͳͲʣ f(θ) θ ∈ S େن໛σʔλͱػցֶशʹ͍ͭͯ Minimize Point!! େن໛σʔλ͸͕͜͜େن໛ʂ f(θ) = n ∑ i=1 loss(xi |θ) θ ∈ ℝd subject to
  13. • ޯ഑߱Լ๏ • ޯ഑ͱεςοϓ෯ ΛٻΊΔ • Λߋ৽ • ͷܗΛ͍ͯ͠Δ৔߹͸֬཰తޯ഑͕࢖͑Δ •

    ࣗಈඍ෼ιϑτ΢ΣΞͷൃల • ࣾ಺Ή͚ͷγϯϓϧͳࣗಈඍ෼ͷઆ໌ • https://github.com/piroyoung/akb-study-meet-helloween/blob/master/automatic_differentiation.ipynb αk θ f(θ) = ∑ i loss(xi |θ) େن໛σʔλͱػցֶशʹ͍ͭͯ θk+1 ← θk − αk ⋅ ∇f(θk ) Point!!
  14. • ֬཰తޯ഑߱Լ๏ʢStochastic Gradient Descent = SGDʣ • ޯ഑ͷ୅ΘΓʹ֬཰తޯ഑Ͱۙࣅ͢Δํ๏ • ΦϯϥΠϯֶश

    • MiniBatchSGDͳͲͷόϦΤʔγϣϯ • লϝϞϦͳ࠷దԽ࣮૷ େن໛σʔλͱػցֶशʹ͍ͭͯ Point!! ∵ E[loss(xi |θk )] = 1 n ∇f(θk ) : ϥϯμϜʹҰͭબͿ i θk+1 ← θk − αk ⋅ ∇loss(xi |θk )
  15. • ߦಈϩά͔ΒΞΠςϜΛಛ௃෇͚͍ͨ • ಛ௃ྔ͸௒ߴ࣍ݩͷϕΫτϧͱͯ͠ಘΒΕΔ • ͜ͷ৔߹࣍ݩʹར༻ऀ਺ • ๲େͳۭؒܭࢉྔ • Ґ૬Λอͭ௿࣍ݩϕΫτϧΛಘΔํ๏͕ݕ౼͞Ε͍ͯΔ

    • • ͍͍ͩͨ d͸਺ඦສ͔Β਺ઍສɼk͸10͔Β500ఔ౓ • ʮιϑτΫϥελϦϯάʯ • ྫ • ඇෛ஋ߦྻҼࢠ෼ղ • skip-gram f : ℝd → ℝk ௿࣍ݩಛ௃ྔͷ֫ಘ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ ߪೖ ߪೖ ΞΠςϜ ߪೖ ΞΠςϜ ߪೖ ߪೖ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ    ΞΠςϜ    ΞΠςϜ    ←௒ߴ࣍ݩ→
  16. • ඇෛ஋ߦྻҼࢠ෼ղʢNon-negative Matrix Factorization) • ΞΠςϜ1ͱΞΠςϜ2͕ྨࣅͯ͠Δ͜ͱΛ௿࣍ݩͰදݱ • ͜ͷΑ͏ͳϕΫτϧΛ࣍ͷΑ͏ͳ࠷దԽ໰୊ΑΓಘΔ • ͜͜Ͱ

    ͸֤੒෼ͷ2৐࿨ ∥ ⋅ ∥2 ௿࣍ݩಛ௃ྔͷ֫ಘ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ ߪೖ ߪೖ ΞΠςϜ ߪೖ ΞΠςϜ ߪೖ ߪೖ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ    ΞΠςϜ    ΞΠςϜ    ( 1 1 0 0 0 1 1 1 0 ) = ( 1 0 0 1 1 0 ) ⋅ ( 1 1 0 0 0 1) Item vector User vector ࢀߟɿɹhttp://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization ∥R − U ⋅ V∥2 Minimize s. t. U ≥ 0,V ≥ 0
  17. • FeatureEmbedding • ࣗવݴޠॲཧ෼໺ΑΓ • ʮ୯ޠͷੑ࣭͸ͦͷपลޠ͕ಛ௃෇͚Δʯ • ྛޝɼསɹʔʼɹ[ൽɼണ͘ɼ৯΂Δ] • ୯ޠͷܥྻ

    ʹରͯ͠ • ͜ͷ Λ௿࣍ݩͷಛ௃ྔͱֶͯ͠श͢Δ w1 , w2 , ⋯, wt , ⋯, wT vwt ∈ ℝk ௿࣍ݩಛ௃ྔͷ֫ಘ ࢲ͸ྛޝͷൽΛണ͍ͯ৯΂ͨ ࢲ͸སͷൽͷണ͍ͯ৯΂ͨ Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013. Maximize ͨͩ͠ 0 0 1 0 ⋮ 0 0
  18. • FeatureEmbedding • ҙຯ্ͷؔ܎ΛزԿֶతɾ୅਺తʹ࠶ݱ͢Δέʔε͕֬ೝ • ϞεΫϫ - ϩγΞ + ೔ຊ

    = ౦ژ ͳͲ ௿࣍ݩಛ௃ྔͷ֫ಘ ࢲ͸ྛޝͷൽΛണ͍ͯ৯΂ͨ ࢲ͸སͷൽͷണ͍ͯ৯΂ͨ Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
  19. • FeatureEmbedding • ָۂͷ࠶ੜܥྻʹ΋ಉ͡ԾઆΛద༻͢Δ • ָۂΛपลͷָۂ͕ಛ௃෇͚Δ • ϓϨΠϦετ͸ָۂͷܥྻ • ָۂͷಛ௃ྔͷॏ৺ͳͲΛϓϨΠϦετͷಛ௃ྔʹ

    ௿࣍ݩಛ௃ྔͷ֫ಘ ࢲ͸ྛޝͷൽΛണ͍ͯ৯΂ͨ ࢲ͸སͷൽͷണ͍ͯ৯΂ͨ Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
  20. • ࠷ۙ๣୳ࡧ໰୊ ࠷ۙ๣୳ࡧ໰୊ = {yi ∈ ℝk |i = 1,2,⋯,

    n} ༗ݶݸͷΞΠςϜͷಛ௃ྔͷू߹Λ ͱ͢Δɽ͜ͷͱ͖೚ҙͷΫΤϦϕΫτϧ ʹରͯ͠ x ∈ ℝk NN(x) = argmin y∈ d(x, y) ͳΔ ΛΈ͚͍ͭͨɽ NN(x) x NN(x) • ୯७ͳઢܗ୳ࡧʹ͸๲େͳܭࢉྔ͕ඞཁ • ۙࣅతͳ࠷ۙ๣୳ࡧ͕ఏҊ͞Ε͍ͯΔ • Approximately Nearest Neighborʢ=ANNʣ
  21. • TreeܥΞϧΰϦζϜ • ୳ࡧର৅ʹ໦ߏ଄Λߏங • ಛ௃ྔΛϝϞϦʹ֨ೲ͢Δ • ਺ઍສx਺ඦ࣍ݩ • ۭؒܭࢉྔେ

    • ໦ߏ଄ͷ࡞ΓํʹόϦΤʔγϣϯ͕͋Δ • B-tree • kd-tree • Randomized kd-tree ࠷ۙ๣୳ࡧ໰୊ x NN(x)
  22. • GraphܥΞϧΰϦζϜ • ୳ࡧର৅ʹ࿈݁ͳάϥϑߏ଄Λߏங • ಛ௃ྔΛϝϞϦʹ֨ೲ͢Δ • ༷ʑͳڑ཭Ͱ࢖͑Δ • ϥϯμϜͳॳظ఺͔Β͍ۙํͷϊʔυʹҠಈ͍ͯ͘͠

    • άϥϑߏ଄ͷ࡞Γํɾ୳ࡧํ๏ʹόϦΤʔγϣϯ͕͋Δ • ֊૚తͳάϥϑΛ༻͍Δ΋ͷ • ୳ࡧࡁΈͷϊʔυʹભҠ͠ͳ͍ํ๏ ࠷ۙ๣୳ࡧ໰୊ x NN(x) ࢀߟɿhttps://github.com/nmslib/nmslib
  23. • ௚ੵྔࢠԽʢProduct quantization) • ಛ௃ྔ͸۠෼తʹූ߸Խͯ͠ϝϞϦʹ֨ೲ • k-means๏ʹΑΔϋʔυΫϥελϦϯά • Ϋϥελຖͷॏ৺΋ϝϞϦʹ֨ೲ •

    ֤ූ߸ʹରԠ͢ΔΫϥελͷॏ৺Λ༻͍ͨಛ௃ྔ • ϢʔΫϦουڑ཭͕ۙࣅͰ͖Δ • • • ۠෼ຖͷϢʔΫϦουڑ཭ͷ࿨ͱҰக͢Δ͜ͱΛ࢖͏ q(y) : d(x, y) ≃ d (q(x), q(y)) d(x, y) ≃ d (x, q(y)) ࠷ۙ๣୳ࡧ໰୊ Jegou, Herve, Matthijs Douze, and Cordelia Schmid. "Product quantization for nearest neighbor search." IEEE transactions on pattern analysis and machine intelligence 33.1 (2010): 117-128.
  24. ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ Parser Learner Registerer Finder ඼࣭είΞ ϑΟϧλ ࠶ੜϩά Track→Vector Playlist→Vector

    HTTP Server ࠶ੜϩά Stream ϓϨΠϦετ ࡞੒ɾߋ৽Stream Request Response લॲཧ ָۂͷಛ௃ྔͷֶश ϓϨΠϦετͷಛ௃ྔ ਪનର৅ͱ͢Δ͔൱͔ ۙࣅ࠷ۙ๣୳ࡧ ϦΫΤετड෇ɾΩϟογϡ
  25. 1. Իָ഑৴ࣄۀͷਪનγεςϜ ਪનγεςϜΛମݧͯ͠ΈΔ Ϩίʔυ࢈ۀͷਰ੎ ਪનγεςϜʹظ଴͢Δ໾ׂ ਪનΛ༻͍ͨϢʔβମݧͷઃܭ ਪનγεςϜʹؔ͢Δޡղ ਪનγεςϜͷߏ੒ཁૉ 2. ਪનγεςϜΛࢧ͑Δ਺ཧ

    ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ େن໛σʔλͱػցֶश ௿࣍ݩಛ௃ྔͷ֫ಘ ࠷ۙ๣୳ࡧ໰୊ γεςϜߏ੒ 3. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ
  26. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ • ୅දతͳΦʔϓϯιʔειϑτ΢ΣΞ ػցֶश Tensorflow PyTorch Chainer ۙࣅ࠷ۙ๣୳ࡧ annoy: https://github.com/spotify/annoy

    nmslib: https://github.com/nmslib/nmslib • ࣮༻্͸ OSSʹ͸PythonͷΠϯλʔϑΣʔεΛ࣋ͭ΋ͷ͕ଟ͍͕࢖͍উख͕ѱ͍͜ͱ͕͋Δɽ ळ༿ݪϥϘͰ͸ΞϓϦέʔγϣϯͷཁٻʹԠͯ͡JavaͳͲͰ࠶࣮૷ͯ͠ར༻
  27. ࢀߟจݙ • IPFI, Global Music Report 2018 : ANNUAL STATE

    OF THE INDUSTRY, https://www.ifpi.org/downloads/GMR2018.pdf • Ұൠࣾஂ๏ਓ೔ຊϨίʔυڠձɿStatistics Trends – the Recording Industry in Japan, https://www.riaj.or.jp/f/pdf/issue/industry/ RIAJ2018.pdf • Cosley, D. et al. : Is Seeing Believing? : How Recommender System Interfaces Affect Users' Opinions, Proceedings of The SIGCHI Conference on Human Factors in Computing Systems, ACM (2003). • Agarwal, D. k. and Chen, B-C. : Statistical Methods for Recommender Systems, Cambridge University Press (2016). • Mikolov, T. et al. : Distributed Representations of Words and Phrases and Their Compositionality, Advances in Neural Information Processing Systems (2013). • Jégou, H., Douze, M. and Schmid, C. : Product Quantization for Nearest Neighbor Search, IEEE Transactions on Pattern Analysis and Machine Intelligence 33.1, 117-128 (2011). • Baylor, D. et al. : Tfx : A Tensorflow-based Production-scale Machine Learning Platform, Proceedings of The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2017)