Takuma OKAMOTO
August 24, 2021
[ASJ_22nd_summer_seminar] Speech recognition, speech synthesis, and +∞ with sequence-to-sequence models: implementation is all we need
Transcript
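Speech recognition, speech synthesis, and +∞ with sequence-to-sequence models: implementation is all we need
Takuma Okamoto, National Institute of Information and Communications Technology (NICT)
22nd Summer Seminar of ASJ, Aug. 2021
* Slide color by topic: speech recognition, speech synthesis, sound recording, sound control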
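Today's talk
Self-introduction
What are speech recognition, speech synthesis, and machine translation?
What makes them difficult?
Realizing them with attention-based sequence-to-sequence models
Applications and possibilities of sequence-to-sequence models: +∞
Open problems of sequence-to-sequence models
"Attention is all you need" → implementation is all we need
Summary
Announcement: presentations at next month's research meeting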
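Self-introduction: Takuma Okamoto
Research theme: acoustic signal processing, especially sound recording and control; microphone and loudspeaker array signal processing
Tohoku University, Advanced Acoustic Information Systems: master's, doctorate, postdoc
NICT: ultra-realistic communication project
To present: own KAKENHI grant projects at NICT
Speech processing: spoken dialogue and language identification at NICT; currently neural-network-based speech synthesis and speech waveform generation
Hobbies: drinking parties (none at all because of COVID-19, so evening drinks at home instead) and jogging
Profile pages: NICT researcher and research engineer recruitment site (www.nict.go.jp); Journal of the Acoustical Society of Japan, career-path mini special issue, "How many hares does one who chases two hares catch?" (https://doi.org/…)
Other: representative of the steering committee of the ASJ Students and Young Researchers Forum
A research life wearing more than one hat: great fun, but demanding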
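What are speech recognition, speech synthesis, and machine translation?
Example: VoiceTra, the speech translation app provided by NICT
Speech recognition: converts anyone's speech into text
Machine translation: converts input text into another language
Text-to-speech synthesis: converts input text into a speech signal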
Neural speech synthesis demo: synthetic speech gave the talk in my place in an international conference video
Noise level limited sub-modeling for diffusion probabilistic vocoders
Takuma Okamoto (1), Tomoki Toda (2, 1), Yoshinori Shiga (1)* and Hisashi Kawai (1)
(1) National Institute of Information and Communications Technology (NICT), Japan
(2) Nagoya University, Japan
* Y. Shiga is currently with Tokyo Denki University, Japan
WaveGrad + DiffWave
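What makes them difficult?
One difficulty: the input and output lengths are completely different
Example from an actual synthesis run:
Text: characters plus punctuation, and the corresponding phoneme sequence
Acoustic features (mel-spectrogram): frames at a fixed frame shift in ms
Speech waveform: samples at the sampling frequency in kHz
Sentence: あらゆる現実を、すべて自分のほうへねじ曲げたのだ ("He bent every reality toward himself.")
Phonemes: arayurugeNjitsuopausubetejibuNnohooenejimagetanoda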
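The length mismatch on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the duration (4 s), sampling rate (24 kHz), and frame shift (10 ms) are assumed values chosen for the example, not the figures from the talk, and `sequence_lengths` is a hypothetical helper name.

```python
def sequence_lengths(text, duration_s, sample_rate_hz, frame_shift_ms):
    """Return (characters, frames, samples) for one utterance.

    All numeric arguments are assumptions supplied by the caller.
    """
    n_chars = len(text)
    n_samples = int(round(duration_s * sample_rate_hz))
    # Number of waveform samples between consecutive feature frames.
    shift_samples = int(round(sample_rate_hz * frame_shift_ms / 1000))
    n_frames = n_samples // shift_samples
    return n_chars, n_frames, n_samples


# Assumed values for illustration: 4 s of audio, 24 kHz, 10 ms frame shift.
sentence = "あらゆる現実を、すべて自分のほうへねじ曲げたのだ"
chars, frames, samples = sequence_lengths(sentence, 4.0, 24000, 10.0)
print(f"text: {chars} chars, features: {frames} frames, waveform: {samples} samples")
```

Under these assumptions the text, the feature sequence, and the waveform differ in length by roughly one to three orders of magnitude, which is the gap a sequence-to-sequence model has to bridge.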
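Realization with attention-based sequence-to-sequence models
Attention-based sequence-to-sequence neural network models originated in machine translation
Input length M and output length N differ, but a matrix operation can convert between them: [A x M] x [M x N] = [A x N]
The M-by-N conversion matrix (the positional relationship, or alignment, between input and output) is acquired automatically through end-to-end training
Figure labels: text / phoneme sequence (length M), encoder, decoder, output (length N)
Attention matrix: which part of the input to attend to
Giving the encoder and the decoder their own attention yields self-attention
↓
Used not only for translation, recognition, and synthesis but in many other fields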
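The matrix view on this slide, [A x M] x [M x N] = [A x N], can be sketched in a few lines. This is a minimal illustration and not the speaker's implementation: it assumes plain dot-product scoring with a softmax over the M input positions, uses plain Python lists so it runs without dependencies, and the names `matmul`, `softmax`, and `attention` are hypothetical.

```python
import math


def matmul(X, Y):
    """Multiply an [r x k] matrix by a [k x c] matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(enc, dec):
    """enc: [A x M] encoder states, dec: [A x N] decoder states.

    Returns (context [A x N], weights [M x N]); each column of weights
    sums to 1 and says where in the input that output step attends.
    """
    enc_t = [list(col) for col in zip(*enc)]            # [M x A]
    scores = matmul(enc_t, dec)                         # [M x N]
    cols = [softmax(list(col)) for col in zip(*scores)]  # softmax over the M inputs
    weights = [list(row) for row in zip(*cols)]         # back to [M x N]
    context = matmul(enc, weights)                      # [A x M] x [M x N] = [A x N]
    return context, weights


# Toy sizes (assumed): A = 2 feature dimensions, M = 3 inputs, N = 4 outputs.
enc = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]
dec = [[1.0, 0.0, 0.5, -1.0], [0.0, 1.0, 0.5, 2.0]]
context, weights = attention(enc, dec)
print(len(context), len(context[0]), len(weights), len(weights[0]))
```

In a trained model the weights come from learned projections rather than raw states; the point here is only that an M-by-N weight matrix turns a length-M sequence into a length-N one.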
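Applications of sequence-to-sequence models
Characteristics of sequence-to-sequence models:
Usable for many tasks as long as paired input-output data exist, e.g. text ↔ speech
The same network serves translation, recognition, and synthesis, so the barrier for beginners is low
Many applications in speech and audio alone (also widely adopted in image recognition):
Speaker diarization: recorded speech → who spoke when
Multi-speaker speech recognition: recorded speech → who said what
End-to-end speech translation: Japanese speech → English translated speech
Speech enhancement: noisy speech → clean speech
Source separation: mixture → separated sources
Sound event detection: recorded waveform → event labels (e.g. the sound of chopping vegetables with a kitchen knife)
Voice conversion: speaker A's speech → speaker B's speech
Searching for a task name plus "attention" (e.g. "speech enhancement attention") turns up many examples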
Multi-spot sound reproduction system using a linear loudspeaker array (ICASSP; J. Acoust. Soc. Am.)
Figure labels: こんにちは / Hello / 你好; Japanese area, English area, Chinese area; bright zone (listening area, where the sound is audible); dark zone (quiet area, where the sound is inaudible); superposition
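Fusing neural speech translation with multi-spot reproduction
Multilingual multi-spot simultaneous interpretation system
Planned for public demonstration at the NICT Koganei open house (science talk) and the ASTREC demo exhibition, but cancelled because of COVID-19
Figure: separate areas in which only Japanese, only English, or only Chinese is audible
Pipeline: こんにちは (speech) → multilingual neural speech recognition → こんにちは (text) → multilingual neural machine translation → こんにちは / 你好 / Hello (text) → multilingual neural speech synthesis → こんにちは / 你好 / Hello (speech) → multi-spot reproduction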
Applications like this are possible too
End-to-end multilingual multi-spot simultaneous interpretation system
The applications are +∞
Figure: the recognition → translation → synthesis → multi-spot reproduction pipeline for こんにちは / 你好 / Hello, labeled as a sequence-to-sequence neural network
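When attention estimation fails…
In speech recognition and speech synthesis, the weights of the attention matrix need to lie along the diagonal
An unfortunate example of failed attention estimation
Unusable in a real service → a research problem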
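One way to quantify "the weights need to lie along the diagonal" is to penalize attention mass by its distance from the diagonal. The sketch below is an assumption for illustration, similar in spirit to guided-attention penalties used in some TTS systems, and is not taken from the talk; `off_diagonal_penalty` and the width parameter `g` are hypothetical names.

```python
import math


def off_diagonal_penalty(weights, g=0.2):
    """Score how far an [M x N] attention matrix strays from the diagonal.

    weights[m][n] is the attention of output step n on input position m,
    with each column summing to 1. Returns a value in [0, 1): close to 0
    when the mass sits on the diagonal, larger when it does not.
    """
    M, N = len(weights), len(weights[0])
    total = 0.0
    for m in range(M):
        for n in range(N):
            # Distance between normalized input and output positions.
            d = m / max(M - 1, 1) - n / max(N - 1, 1)
            total += weights[m][n] * (1.0 - math.exp(-(d * d) / (2.0 * g * g)))
    return total / N


size = 4
diagonal = [[1.0 if m == n else 0.0 for n in range(size)] for m in range(size)]
reversed_alignment = [[1.0 if m == size - 1 - n else 0.0 for n in range(size)] for m in range(size)]
print(off_diagonal_penalty(diagonal), off_diagonal_penalty(reversed_alignment))
```

A score like this could be logged during training to flag utterances whose alignment has collapsed; the threshold for "failed" would have to be chosen empirically.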
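"Attention is all you need" → implementation is all we need
Public implementations, models (mainly Python), and corpora accelerate research
Source code is published on GitHub
Public corpora (speech synthesis): LJSpeech (English), VCTK (English, multi-speaker), LibriTTS (English, multi-speaker), JSUT (Japanese), JVS (Japanese, multi-speaker), …
ESPnet: end-to-end speech processing toolkit
A speech processing toolkit for sequence-to-sequence (end-to-end) models; it is in English, but many Japanese contributors take part, including the organizers
The same functions are used for speech recognition, speech synthesis, speech translation, speech enhancement, voice conversion, and spoken language understanding → the applications are +∞
Don't quite understand it? Try implementing it
Title of Google's sequence-to-sequence (Transformer) paper: "Attention is all you need"
↓
Implementation is all we need
The only way to understand is to implement (learn it hands-on)
Your own implementation runs → it can recognize or synthesize → deeply moving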
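The "sacred treasures" (must-have references): sequence-to-sequence models, chapter …; sequence-to-sequence models, chapter …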
For neural network basics, see this book: it is "Deep", hence the deep-sea fish
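Summary
What sequence-to-sequence models are:
They convert an input into an output of a different sequence length: speech to text, text to speech, …
They learn a matrix that changes the sequence length
They learn which part of the input to attend to in order to obtain the desired output
The applications are +∞
Implementation is all we need:
Learning by doing is the only way to understand: just implement it
The environment for implementing is well in place: GitHub and corpora (and it should keep growing)
Speech synthesis: hearing a model you built yourself actually speak is a moving experience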
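Announcement: presentations at next month's research meeting
High-… reproduction, multi-speaker speech waveform generation neural network models: Matsubara (Kobe Univ. M2, NICT intern), Okamoto, Takashima (Kobe Univ.), Takiguchi (Kobe Univ.), Toda (Nagoya Univ.), Kawai, "Investigation of LPCNet features in the HiFi-GAN vocoder"
Speech waveform generation neural network models for language acquisition agents (special session): Tanaka (graduate), Okamoto, Shinozaki, "Vocalization mechanism and pronunciation adaptation using WaveGrad for a spoken language acquisition system"
Fast, high-quality Japanese neural speech synthesis models running on CPU only: Okamoto, Toda (Nagoya Univ.), Kawai, "Implementation of a CPU-based real-time Japanese neural text-to-speech system using forced-alignment Parallel Tacotron and HiFi-GAN"
Multi-region sound control (own KAKENHI theme): Okamoto, "Multi-zone sound control based on simultaneous interior and exterior control"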
Thank you for your !!
Takuma OKAMOTO @ NICT
e-mail:
HP: https://www.okamotocamera.com
Twitter: @okamotocamerea
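Speech recognition and speech synthesis: then and now
Speech recognition and speech synthesis: what makes them difficult?
Common to both: the input and output lengths are completely different → acoustic features: hundreds of frames; text: tens of characters
Recognition: the target is the speech of all humankind (enormous diversity), and it varies widely
Synthesis: the input and output lengths differ even more → speech waveform (kHz sampling): tens of thousands of samples in just a few seconds
Conventional speech recognition and speech synthesis: a high hurdle
A great deal of expert knowledge is required
Various modules are trained separately and chained together to perform the conversion
Common to both: learn the positional relationship (alignment) between acoustic features and text
Recognition: acoustic model, pronunciation dictionary, language model, decoder
Synthesis: phoneme duration model, acoustic model, waveform generation model (vocoder)
The arrival of sequence-to-sequence models: a low hurdle
The neural network covers the expert knowledge
A single neural network can perform the whole conversion
Recognition: acoustic features → [sequence-to-sequence model] → word sequence (text)
Synthesis: text (phoneme sequence) → [sequence-to-sequence model] → acoustic features → [waveform generation model] → speech waveform
Or: text (phoneme sequence) → [sequence-to-sequence model + waveform generation model] → speech waveform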