Upgrade to Pro — share decks privately, control downloads, hide ads and more …

音楽配信サービスにおける 推薦システムの概要と 数理モデルについて

Sponsored · Your Podcast. Everywhere. Effortlessly. Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
Avatar for Hiroki Mizukami Hiroki Mizukami
September 16, 2019
220

音楽配信サービスにおける 推薦システムの概要と 数理モデルについて

日本数学会 2019年度秋季総合分科会 ワークショップ
数学ソフトウェアとフリードキュメントでの講演内容です
https://mathsoc.jp/meeting/kanazawa19sept/

Avatar for Hiroki Mizukami

Hiroki Mizukami

September 16, 2019
Tweet

Transcript

  1. 1. Իָ഑৴ࣄۀͷਪનγεςϜ ਪનγεςϜΛମݧͯ͠ΈΔ Ϩίʔυ࢈ۀͷਰ੎ ਪનγεςϜʹظ଴͢Δ໾ׂ ਪનΛ༻͍ͨϢʔβମݧͷઃܭ ਪનγεςϜʹؔ͢Δޡղ ਪનγεςϜͷߏ੒ཁૉ 2. ਪનγεςϜΛࢧ͑Δ਺ཧ

    ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ େن໛σʔλͱػցֶश ௿࣍ݩಛ௃ྔͷ֫ಘ ࠷ۙ๣୳ࡧ໰୊ γεςϜߏ੒ 3. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ
  2. Ϩίʔυ࢈ۀͷਰ੎ • Recorded Music Industry • ࿥Ի͞ΕͨԻݯͷྲྀ௨ • 2014೥ΑΓࢢ৔ن໛͕ճ෮܏޲ •

    ෺ཧϝσΟΞͷࢢ৔ن໛ͷॖখ • σδλϧϝσΟΞͷࢢ৔ن໛ͷ֦େ • σδλϧԽʹ൐͏มભ • ྲྀ௨ͷޮ཰Խ • εϚʔτϑΥϯͷීٴ • ਪનɾݕࡧٕज़ͷ৽͍͠Ԡ༻ઌ IFPI Global Music Report 2019 https://www.ifpi.org/news/IFPI-GLOBAL-MUSIC-REPORT-2019 Point!!
  3. Ϩίʔυ࢈ۀͷਰ੎ ෺ཧϝσΟΞ σδλϧϝσΟΞ ݕࡧ ୨͔Βݟ͚ͭͯ͘Δ δϟέοτʹهࡌͷ৘ใΛ໨ࢹ େن໛σʔλϕʔεʹΑΔݕࡧ ๲େͳߏ଄ԽσʔλΛࢀর ʢ࡞ۂऀɾδϟϯϧɾۂௐͳͲʣ ਪન

    ൢചళͷ͓͢͢Ί ࡶࢽɾϚεϝσΟΞ ஌ਓ͔Βͷ͓͢͢Ί ྨࣅָۂ୳ࡧ ๲େͳߦಈཤྺʹجͮ͘ਪન ڠௐϑΟϧλϦϯά • Ϩίʔυ࢈ۀʹ͓͍ͯ΋৘ใॲཧٕज़ɾ਺ཧతϞσϧͷ׆༻͕׆ൃʹͳΔ
  4. ਪનγεςϜʹظ଴͢Δ໾ׂ • ਪન • ސ٬ͷҙࢥܾఆΛޙԡ͢͠Δ࢓૊Έɾίϛϡχέʔγϣϯ • “Right Information, to the

    right people, in the right way.” • ਪનγεςϜ • ਪનʹవΘΔίϛϡχέʔγϣϯΛιϑτ΢ΣΞͷྗͰεέʔϧͤ͞Δ ϋϯόʔά ͍ͩ͘͞ʂ ͚ͭ͋Θͤʹ ͓໺ࡊ͸͍͔͕ Ͱ͔͢ʁ ϋϯόʔά ͸ॏ͍ͳ͊ ͪ͜Βͷڕͷϝχϡʔ Ͱͨ͠ΒαούϦͯ͠ ·͢
  5. ਪનʹΑΔϢʔβମݧͷઃܭ • ϊϯλʔήςΟϯάࢪࡦ • Trends • ਺ཧతͳΞϧΰϦζϜʹΑΔ • ྲྀߦΛࣗಈͰݕ஌ɾ֦ࢄ •

    Focus • ϓϩͷΤσΟλɾDJʹΑΔબۂ • ྲྀߦ΍ελΠϧͷఏҊ • ༷ʑͳίϯςΩετΛߟྀ • ༵೔ɼ࣌ؒɼΠϕϯτɼetc…
  6. ਪનʹΑΔϢʔβମݧͷઃܭ • λʔήςΟϯάࢪࡦ • Discovery • ࠶ੜϩάʹجͮ͘ηάϝϯςʔγϣϯ • ޷ΈʹԠͨ͡ϓϨΠϦετͷఏҊ •

    Related Playlists • ࠶ੜϩάʹجͮ͘ηάϝϯςʔγϣϯ • ΞϧΰϦζϜʹΑΔྨࣅϓϨΠϦετͷఏҊ • ຊ೔ͷ͓࿩
  7. 1. Իָ഑৴ࣄۀͷਪનγεςϜ ਪનγεςϜΛମݧͯ͠ΈΔ Ϩίʔυ࢈ۀͷਰ੎ ਪનγεςϜʹظ଴͢Δ໾ׂ ਪનΛ༻͍ͨϢʔβମݧͷઃܭ ਪનγεςϜʹؔ͢Δޡղ ਪનγεςϜͷߏ੒ཁૉ 2. ਪનγεςϜΛࢧ͑Δ਺ཧ

    ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ େن໛σʔλͱػցֶश ௿࣍ݩಛ௃ྔͷ֫ಘ ࠷ۙ๣୳ࡧ໰୊ γεςϜߏ੒ 3. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ
  8. ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ • https://mf.awa.fm/2NVL2eX • ϓϨΠϦετ • ར༻ऀ͕࡞੒ɾެ։Ͱ͖Δָۂͷྻ • ྦྷܭ1000ສ௒ •

    ࡞੒ɾߋ৽ස౓͕ߴ͍ • ؔ࿈ϓϨΠϦετ • ࠶ੜதͷ΋ͷͱࣅͨงғؾͷϓϨΠϦετ • ࿈ଓతͳԻָମݧͷఏڙ
  9. ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ • ߹ܭͷॲཧ࣌ؒΛখ͘͢͞Δ • ਺ཧతͳཁૉٕज़ • େن໛σʔλͱػցֶशʹ͍ͭͯ • ௿࣍ݩಛ௃ྔͷ֫ಘʹ͍ͭͯ •

    ࠷ۙ๣୳ࡧ໰୊ʹ͍ͭͯ • ݁Ռ • ϦΞϧλΠϜͳۙ๣ϓϨΠϦετ୳ࡧ͕Մೳʹ • ϓϨΠϦετ࡞੒ɾߋ৽Πϕϯτ͔Β਺ඵ
  10. େن໛σʔλͱػցֶशʹ͍ͭͯ • Ұൠతͳ࠷దԽ໰୊ • : ໨తؔ਺ • : ੍໿৚݅ •

    େҬత࠷దੑ f(θ) θ ∈ S f(θ) Minimize subject to θ ∈ S ⊂ ℝd Point!!
  11. • యܕతͳػցֶशʹ͓͚Δ࠷దԽ໰୊ • : ໨తؔ਺ɹˠ ଛࣦʹؔ͢Δ࿨ͷܗʹͳ͍ͬͯΔ͜ͱ͕͓͓͍ • : ੍໿৚݅ɹˠ ແ੍໿Ͱ͋Δ͜ͱ͕͓͓͍

    • େҬత࠷దੑɹˠ ࣮ݧతͳੑೳΛॏࢹʢ෼ྨਫ਼౓ɾֶश࣌ؒͳͲʣ f(θ) θ ∈ S େن໛σʔλͱػցֶशʹ͍ͭͯ f(θ) = n ∑ i=1 loss(xi |θ) Minimize θ ∈ ℝd Point!! subject to
  12. • యܕతͳػցֶशʹ͓͚Δ࠷దԽ໰୊ • : ໨తؔ਺ɹˠ ଛࣦʹؔ͢Δ࿨ͷܗʹͳ͍ͬͯΔ͜ͱ͕͓͓͍ • : ੍໿৚݅ɹˠ ແ੍໿Ͱ͋Δ͜ͱ͕͓͓͍

    • େҬత࠷దੑɹˠ ࣮ݧతͳੑೳΛॏࢹʢ෼ྨਫ਼౓ɾֶश࣌ؒͳͲʣ f(θ) θ ∈ S େن໛σʔλͱػցֶशʹ͍ͭͯ Minimize Point!! େن໛σʔλ͸͕͜͜େن໛ʂ f(θ) = n ∑ i=1 loss(xi |θ) θ ∈ ℝd subject to
  13. • ޯ഑߱Լ๏ • ޯ഑ͱεςοϓ෯ ΛٻΊΔ • Λߋ৽ • ͷܗΛ͍ͯ͠Δ৔߹͸֬཰తޯ഑͕࢖͑Δ •

    ࣗಈඍ෼ιϑτ΢ΣΞͷൃల • ࣾ಺Ή͚ͷγϯϓϧͳࣗಈඍ෼ͷઆ໌ • https://github.com/piroyoung/akb-study-meet-helloween/blob/master/automatic_differentiation.ipynb αk θ f(θ) = ∑ i loss(xi |θ) େن໛σʔλͱػցֶशʹ͍ͭͯ θk+1 ← θk − αk ⋅ ∇f(θk ) Point!!
  14. • ֬཰తޯ഑߱Լ๏ʢStochastic Gradient Descent = SGDʣ • ޯ഑ͷ୅ΘΓʹ֬཰తޯ഑Ͱۙࣅ͢Δํ๏ • ΦϯϥΠϯֶश

    • MiniBatchSGDͳͲͷόϦΤʔγϣϯ • লϝϞϦͳ࠷దԽ࣮૷ େن໛σʔλͱػցֶशʹ͍ͭͯ Point!! ∵ E[loss(xi |θk )] = 1 n ∇f(θk ) : ϥϯμϜʹҰͭબͿ i θk+1 ← θk − αk ⋅ ∇loss(xi |θk )
  15. • ߦಈϩά͔ΒΞΠςϜΛಛ௃෇͚͍ͨ • ಛ௃ྔ͸௒ߴ࣍ݩͷϕΫτϧͱͯ͠ಘΒΕΔ • ͜ͷ৔߹࣍ݩʹར༻ऀ਺ • ๲େͳۭؒܭࢉྔ • Ґ૬Λอͭ௿࣍ݩϕΫτϧΛಘΔํ๏͕ݕ౼͞Ε͍ͯΔ

    • • ͍͍ͩͨ d͸਺ඦສ͔Β਺ઍສɼk͸10͔Β500ఔ౓ • ʮιϑτΫϥελϦϯάʯ • ྫ • ඇෛ஋ߦྻҼࢠ෼ղ • skip-gram f : ℝd → ℝk ௿࣍ݩಛ௃ྔͷ֫ಘ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ ߪೖ ߪೖ ΞΠςϜ ߪೖ ΞΠςϜ ߪೖ ߪೖ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ    ΞΠςϜ    ΞΠςϜ    ←௒ߴ࣍ݩ→
  16. • ඇෛ஋ߦྻҼࢠ෼ղʢNon-negative Matrix Factorization) • ΞΠςϜ1ͱΞΠςϜ2͕ྨࣅͯ͠Δ͜ͱΛ௿࣍ݩͰදݱ • ͜ͷΑ͏ͳϕΫτϧΛ࣍ͷΑ͏ͳ࠷దԽ໰୊ΑΓಘΔ • ͜͜Ͱ

    ͸֤੒෼ͷ2৐࿨ ∥ ⋅ ∥2 ௿࣍ݩಛ௃ྔͷ֫ಘ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ ߪೖ ߪೖ ΞΠςϜ ߪೖ ΞΠςϜ ߪೖ ߪೖ ར༻ऀ ར༻ऀ ར༻ऀ ΞΠςϜ    ΞΠςϜ    ΞΠςϜ    ( 1 1 0 0 0 1 1 1 0 ) = ( 1 0 0 1 1 0 ) ⋅ ( 1 1 0 0 0 1) Item vector User vector ࢀߟɿɹhttp://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization ∥R − U ⋅ V∥2 Minimize s. t. U ≥ 0,V ≥ 0
  17. • FeatureEmbedding • ࣗવݴޠॲཧ෼໺ΑΓ • ʮ୯ޠͷੑ࣭͸ͦͷपลޠ͕ಛ௃෇͚Δʯ • ྛޝɼསɹʔʼɹ[ൽɼണ͘ɼ৯΂Δ] • ୯ޠͷܥྻ

    ʹରͯ͠ • ͜ͷ Λ௿࣍ݩͷಛ௃ྔͱֶͯ͠श͢Δ w1 , w2 , ⋯, wt , ⋯, wT vwt ∈ ℝk ௿࣍ݩಛ௃ྔͷ֫ಘ ࢲ͸ྛޝͷൽΛണ͍ͯ৯΂ͨ ࢲ͸སͷൽͷണ͍ͯ৯΂ͨ Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013. Maximize ͨͩ͠ 0 0 1 0 ⋮ 0 0
  18. • FeatureEmbedding • ҙຯ্ͷؔ܎ΛزԿֶతɾ୅਺తʹ࠶ݱ͢Δέʔε͕֬ೝ • ϞεΫϫ - ϩγΞ + ೔ຊ

    = ౦ژ ͳͲ ௿࣍ݩಛ௃ྔͷ֫ಘ ࢲ͸ྛޝͷൽΛണ͍ͯ৯΂ͨ ࢲ͸སͷൽͷണ͍ͯ৯΂ͨ Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
  19. • FeatureEmbedding • ָۂͷ࠶ੜܥྻʹ΋ಉ͡ԾઆΛద༻͢Δ • ָۂΛपลͷָۂ͕ಛ௃෇͚Δ • ϓϨΠϦετ͸ָۂͷܥྻ • ָۂͷಛ௃ྔͷॏ৺ͳͲΛϓϨΠϦετͷಛ௃ྔʹ

    ௿࣍ݩಛ௃ྔͷ֫ಘ ࢲ͸ྛޝͷൽΛണ͍ͯ৯΂ͨ ࢲ͸སͷൽͷണ͍ͯ৯΂ͨ Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
  20. • ࠷ۙ๣୳ࡧ໰୊ ࠷ۙ๣୳ࡧ໰୊ = {yi ∈ ℝk |i = 1,2,⋯,

    n} ༗ݶݸͷΞΠςϜͷಛ௃ྔͷू߹Λ ͱ͢Δɽ͜ͷͱ͖೚ҙͷΫΤϦϕΫτϧ ʹରͯ͠ x ∈ ℝk NN(x) = argmin y∈ d(x, y) ͳΔ ΛΈ͚͍ͭͨɽ NN(x) x NN(x) • ୯७ͳઢܗ୳ࡧʹ͸๲େͳܭࢉྔ͕ඞཁ • ۙࣅతͳ࠷ۙ๣୳ࡧ͕ఏҊ͞Ε͍ͯΔ • Approximately Nearest Neighborʢ=ANNʣ
  21. • TreeܥΞϧΰϦζϜ • ୳ࡧର৅ʹ໦ߏ଄Λߏங • ಛ௃ྔΛϝϞϦʹ֨ೲ͢Δ • ਺ઍສx਺ඦ࣍ݩ • ۭؒܭࢉྔେ

    • ໦ߏ଄ͷ࡞ΓํʹόϦΤʔγϣϯ͕͋Δ • B-tree • kd-tree • Randomized kd-tree ࠷ۙ๣୳ࡧ໰୊ x NN(x)
  22. • GraphܥΞϧΰϦζϜ • ୳ࡧର৅ʹ࿈݁ͳάϥϑߏ଄Λߏங • ಛ௃ྔΛϝϞϦʹ֨ೲ͢Δ • ༷ʑͳڑ཭Ͱ࢖͑Δ • ϥϯμϜͳॳظ఺͔Β͍ۙํͷϊʔυʹҠಈ͍ͯ͘͠

    • άϥϑߏ଄ͷ࡞Γํɾ୳ࡧํ๏ʹόϦΤʔγϣϯ͕͋Δ • ֊૚తͳάϥϑΛ༻͍Δ΋ͷ • ୳ࡧࡁΈͷϊʔυʹભҠ͠ͳ͍ํ๏ ࠷ۙ๣୳ࡧ໰୊ x NN(x) ࢀߟɿhttps://github.com/nmslib/nmslib
  23. • ௚ੵྔࢠԽʢProduct quantization) • ಛ௃ྔ͸۠෼తʹූ߸Խͯ͠ϝϞϦʹ֨ೲ • k-means๏ʹΑΔϋʔυΫϥελϦϯά • Ϋϥελຖͷॏ৺΋ϝϞϦʹ֨ೲ •

    ֤ූ߸ʹରԠ͢ΔΫϥελͷॏ৺Λ༻͍ͨಛ௃ྔ • ϢʔΫϦουڑ཭͕ۙࣅͰ͖Δ • • • ۠෼ຖͷϢʔΫϦουڑ཭ͷ࿨ͱҰக͢Δ͜ͱΛ࢖͏ q(y) : d(x, y) ≃ d (q(x), q(y)) d(x, y) ≃ d (x, q(y)) ࠷ۙ๣୳ࡧ໰୊ Jegou, Herve, Matthijs Douze, and Cordelia Schmid. "Product quantization for nearest neighbor search." IEEE transactions on pattern analysis and machine intelligence 33.1 (2010): 117-128.
  24. ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ Parser Learner Registerer Finder ඼࣭είΞ ϑΟϧλ ࠶ੜϩά Track→Vector Playlist→Vector

    HTTP Server ࠶ੜϩά Stream ϓϨΠϦετ ࡞੒ɾߋ৽Stream Request Response લॲཧ ָۂͷಛ௃ྔͷֶश ϓϨΠϦετͷಛ௃ྔ ਪનର৅ͱ͢Δ͔൱͔ ۙࣅ࠷ۙ๣୳ࡧ ϦΫΤετड෇ɾΩϟογϡ
  25. 1. Իָ഑৴ࣄۀͷਪનγεςϜ ਪનγεςϜΛମݧͯ͠ΈΔ Ϩίʔυ࢈ۀͷਰ੎ ਪનγεςϜʹظ଴͢Δ໾ׂ ਪનΛ༻͍ͨϢʔβମݧͷઃܭ ਪનγεςϜʹؔ͢Δޡղ ਪનγεςϜͷߏ੒ཁૉ 2. ਪનγεςϜΛࢧ͑Δ਺ཧ

    ؔ࿈ϓϨΠϦετ୳ࡧγεςϜ େن໛σʔλͱػցֶश ௿࣍ݩಛ௃ྔͷ֫ಘ ࠷ۙ๣୳ࡧ໰୊ γεςϜߏ੒ 3. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ
  26. ਪનγεςϜʹ·ͭΘΔιϑτ΢ΣΞ • ୅දతͳΦʔϓϯιʔειϑτ΢ΣΞ ػցֶश Tensorflow PyTorch Chainer ۙࣅ࠷ۙ๣୳ࡧ annoy: https://github.com/spotify/annoy

    nmslib: https://github.com/nmslib/nmslib • ࣮༻্͸ OSSʹ͸PythonͷΠϯλʔϑΣʔεΛ࣋ͭ΋ͷ͕ଟ͍͕࢖͍উख͕ѱ͍͜ͱ͕͋Δɽ ळ༿ݪϥϘͰ͸ΞϓϦέʔγϣϯͷཁٻʹԠͯ͡JavaͳͲͰ࠶࣮૷ͯ͠ར༻
  27. ࢀߟจݙ • IPFI, Global Music Report 2018 : ANNUAL STATE

    OF THE INDUSTRY, https://www.ifpi.org/downloads/GMR2018.pdf • Ұൠࣾஂ๏ਓ೔ຊϨίʔυڠձɿStatistics Trends – the Recording Industry in Japan, https://www.riaj.or.jp/f/pdf/issue/industry/ RIAJ2018.pdf • Cosley, D. et al. : Is Seeing Believing? : How Recommender System Interfaces Affect Users' Opinions, Proceedings of The SIGCHI Conference on Human Factors in Computing Systems, ACM (2003). • Agarwal, D. k. and Chen, B-C. : Statistical Methods for Recommender Systems, Cambridge University Press (2016). • Mikolov, T. et al. : Distributed Representations of Words and Phrases and Their Compositionality, Advances in Neural Information Processing Systems (2013). • Jégou, H., Douze, M. and Schmid, C. : Product Quantization for Nearest Neighbor Search, IEEE Transactions on Pattern Analysis and Machine Intelligence 33.1, 117-128 (2011). • Baylor, D. et al. : Tfx : A Tensorflow-based Production-scale Machine Learning Platform, Proceedings of The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2017)