Takuma OKAMOTO
August 24, 2021
[ASJ_22nd_summer_seminar] Speech recognition, speech synthesis, and infinitely more with sequence-to-sequence models: Implementation is all we need
Transcript
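Speech recognition, speech synthesis, and infinitely more with sequence-to-sequence models: Implementation is all we need
Takuma Okamoto, National Institute of Information and Communications Technology (NICT)
Slide colors by theme: speech recognition, speech synthesis, sound recording, sound control
24th Aug. 2021, 22nd Summer Seminar of ASJ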
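Today's talk
Self-introduction
What are speech recognition, speech synthesis, and machine translation?
What makes them difficult?
Realizing them with attention-based sequence-to-sequence models
Application examples and the unlimited (∞) potential of sequence-to-sequence models
Open issues of sequence-to-sequence models
"Attention is all you need" → "Implementation is all we need"
Summary
Announcement: presentations at next month's research meeting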
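Self-introduction: Takuma Okamoto
Research themes
Acoustic signal processing, especially sound recording and control (microphone and loudspeaker array signal processing)
Tohoku University: master's course, doctoral course, postdoc
NICT: ultra-realistic communication project
Up to the present: own KAKENHI grant project at NICT
Speech processing
Spoken dialogue and language identification at NICT
Up to the present: speech synthesis and speech waveform generation using neural networks
Hobbies: drinking parties (none at all because of COVID-19), jogging
Profile pages: the NICT researcher and research engineer recruitment site; the Journal of the Acoustical Society of Japan mini special issue on career paths, "If you chase two hares, how many do you catch?"
Other: representative of the executive committee of the ASJ Students and Young Researchers Forum
A research life wearing two hats: very enjoyable (though hard)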
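What are speech recognition, speech synthesis, and machine translation?
Example: VoiceTra, the speech translation app provided by NICT
Speech recognition: converts anyone's speech into text
Machine translation: converts the input text into another language
Text-to-speech synthesis: converts the input text into a speech signal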
Neural speech synthesis demo: it spoke in my place in an international conference video
Noise level limited sub-modeling for diffusion probabilistic vocoders
Takuma Okamoto1, Tomoki Toda2,1, Yoshinori Shiga1* and Hisashi Kawai1
1 National Institute of Information and Communications Technology (NICT), Japan
2 Nagoya University, Japan
* Y. Shiga is currently with the Tokyo Denki University, Japan
WaveGrad + DiffWave
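What makes them difficult?
One of the difficulties: the input and output lengths are completely different
Example from an actual synthesis run
Text: characters plus punctuation marks; phoneme sequence
Acoustic features (mel-spectrogram): frames at a fixed shift in ms
Speech waveform (sampling frequency in kHz): samples
Sentence: あらゆる現実を、すべて自分のほうへねじ曲げたのだ
Phonemes: arayurugeNjitsuo pau subetejibuNnohooenejimagetanoda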
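The length mismatch can be made concrete with a little arithmetic. The sketch below counts characters, mel-spectrogram frames, and waveform samples for the example sentence. The sampling rate, frame shift, and utterance duration are assumed values chosen for illustration, since the slide's own numbers did not survive extraction.

```python
# Illustrative arithmetic only: SAMPLE_RATE_HZ, FRAME_SHIFT_MS, and DURATION_S
# are assumptions, not the values shown on the original slide.
SAMPLE_RATE_HZ = 24_000   # assumed sampling frequency (24 kHz)
FRAME_SHIFT_MS = 10.0     # assumed mel-spectrogram frame shift
DURATION_S = 4.0          # assumed duration of the synthesized utterance

TEXT = "あらゆる現実を、すべて自分のほうへねじ曲げたのだ。"


def sequence_lengths(text, duration_s, sample_rate_hz, frame_shift_ms):
    """Return the length of each representation of one utterance."""
    n_samples = int(round(duration_s * sample_rate_hz))
    hop = int(round(sample_rate_hz * frame_shift_ms / 1000.0))  # samples per frame shift
    n_frames = n_samples // hop
    return {"chars": len(text), "frames": n_frames, "samples": n_samples}


lengths = sequence_lengths(TEXT, DURATION_S, SAMPLE_RATE_HZ, FRAME_SHIFT_MS)
print(lengths)
```

Under these assumptions a sentence of a few dozen characters becomes hundreds of frames and tens of thousands of samples, which is the gap a sequence-to-sequence model has to bridge.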
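Realizing them with attention-based sequence-to-sequence models
Attention-based sequence-to-sequence neural network models grew out of machine translation
The input length M and the output length N differ → the conversion can be done with a matrix operation: [A × M] × [M × N] = [A × N]
The M-row, N-column conversion matrix is the positional relationship (alignment) between input and output
It is acquired automatically by training (end-to-end training)
Figure: text (phoneme sequence) of length M → encoder → attention → decoder → output of length N
Attention matrix: which part of the input to attend to
Giving the encoder and the decoder their own attention as well: self-attention
↓
Used not only for translation, recognition, and synthesis but in many other fields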
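The matrix view [A × M] × [M × N] = [A × N] can be sketched in a few lines of plain Python. This is a minimal illustration, not the model from the talk: the scaled dot-product scoring is borrowed from the Transformer as an assumption, the names `H`, `Q`, `W`, and `C` are hypothetical, and random numbers stand in for states that a real model would learn.

```python
import math
import random


def matmul(X, Y):
    """Multiply an (r x k) matrix by a (k x c) matrix given as nested lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def column_softmax(S):
    """Normalize each column of S so that it is non-negative and sums to 1."""
    cols = []
    for col in zip(*S):
        peak = max(col)
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return transpose(cols)


def attention(H, Q):
    """H: encoder states (A x M). Q: decoder queries (A x N).

    Returns C (A x N), the input re-expressed at the output length,
    and W (M x N), the alignment between input and output positions.
    """
    A = len(H)
    scores = matmul(transpose(H), Q)                      # (M x A)(A x N) = M x N
    scores = [[v / math.sqrt(A) for v in row] for row in scores]
    W = column_softmax(scores)                            # M x N alignment matrix
    C = matmul(H, W)                                      # (A x M)(M x N) = A x N
    return C, W


random.seed(0)
A, M, N = 4, 6, 9   # feature size, input length, output length (arbitrary)
H = [[random.gauss(0, 1) for _ in range(M)] for _ in range(A)]
Q = [[random.gauss(0, 1) for _ in range(N)] for _ in range(A)]
C, W = attention(H, Q)
print(len(C), len(C[0]))   # A rows, N columns
```

Each column of `W` says how strongly one output step attends to every input position, so multiplying by `W` is what changes the sequence length from M to N.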
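Application examples of sequence-to-sequence models
Characteristics of sequence-to-sequence models
Given paired input and output data, they can be used for many things (example: text ←→ speech)
Translation, recognition, and synthesis can use the same network, so the barrier for beginners is low
Even within speech alone the applications are varied (the models are also widely adopted in image recognition)
Speaker diarization: recorded speech → who spoke when
Multi-speaker speech recognition: recorded speech → who said what
End-to-end speech translation: Japanese speech → translated English speech
Speech enhancement: noisy speech → clean speech
Source separation: mixed sound → separated sounds
Acoustic event detection: recorded waveform → event labels (the sound of cutting vegetables with a kitchen knife, and so on)
Voice conversion: speech of speaker A → speech of speaker B
Searching for a field name plus "attention" (example: speech enhancement attention) turns up many results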
Multi-spot sound reproduction system using a linear loudspeaker array (ICASSP; J. Acoust. Soc. Am.)
[Figure: こんにちは / Hello / 你好 reproduced in a Japanese area, an English area, and a Chinese area. Bright zone (listening area): the area where the sound is audible. Dark zone (quiet area): the area where the sound is inaudible. The zones are combined by superposition.]
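Fusion of neural speech translation technology and multi-spot reproduction technology
Multilingual multi-spot-reproduction simultaneous interpretation system
It was to be shown at the NICT Koganei open house (science talk) and the ASTREC demo exhibition, but these were canceled because of COVID-19
[Figure: こんにちは → multilingual neural speech recognition → text → multilingual neural machine translation → text (こんにちは / 你好 / Hello) → multilingual neural speech synthesis → speech (こんにちは / 你好 / Hello) → multi-spot reproduction into an area where only Japanese is audible, an area where only English is audible, and an area where only Chinese is audible]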
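An application like this is possible too
End-to-end multilingual multi-spot-reproduction simultaneous interpretation system
The range of applications is unlimited (∞)
[Figure: the same system as on the previous slide, with the chain of multilingual neural speech recognition, machine translation, and speech synthesis replaced by a single sequence-to-sequence neural network feeding multi-spot reproduction]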
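When attention estimation fails…
In speech recognition and speech synthesis, the weights of the attention matrix need to lie along the diagonal
A disappointing example of what happens when attention estimation fails
Such a model cannot be used in a real service → a research problem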
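One way to notice a failed alignment automatically is to measure how much of the attention weight lies near the diagonal and whether the attended position moves forward over time. The sketch below is a hypothetical heuristic for illustration, not a method taken from the talk: the function names, the band width, and the toy one-hot alignments are all assumptions.

```python
def peak_positions(W):
    """For each output step, the input position with the largest weight.

    W is an M x N alignment matrix (rows: input positions, columns: output steps).
    """
    M, N = len(W), len(W[0])
    return [max(range(M), key=lambda m: W[m][n]) for n in range(N)]


def is_monotonic(W):
    """True if the attended input position never moves backwards."""
    peaks = peak_positions(W)
    return all(later >= earlier for earlier, later in zip(peaks, peaks[1:]))


def diagonal_mass(W, band=1.0):
    """Average attention weight that falls within `band` positions of the diagonal."""
    M, N = len(W), len(W[0])
    total = 0.0
    for n in range(N):
        center = n * (M - 1) / (N - 1) if N > 1 else 0.0
        total += sum(W[m][n] for m in range(M) if abs(m - center) <= band)
    return total / N


def one_hot_alignment(peaks, M):
    """Toy alignment that puts all weight of output step n on input peaks[n]."""
    return [[1.0 if peaks[n] == m else 0.0 for n in range(len(peaks))]
            for m in range(M)]


good = one_hot_alignment([0, 1, 2, 3], M=4)   # clean diagonal
bad = one_hot_alignment([0, 3, 1, 0], M=4)    # jumps ahead, then goes back

print(diagonal_mass(good), is_monotonic(good))
print(diagonal_mass(bad), is_monotonic(bad))
```

A low diagonal mass or a non-monotonic peak path is the kind of signal that would flag skipped or repeated words in synthesis before the output reaches a user.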
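"Attention is all you need" → "Implementation is all we need"
Public implementations (mainly Python) and corpora that accelerate research
Source code is published on GitHub
Public corpora (speech synthesis): LJSpeech (English), VCTK (English, multi-speaker), LibriTTS (English, multi-speaker), JSUT (Japanese), JVS (Japanese, multi-speaker), …
ESPnet: end-to-end speech processing toolkit
A speech processing toolkit for sequence-to-sequence (end-to-end) models: it is in English, but many Japanese contributors take part, including the organizers
Speech recognition, speech synthesis, speech translation, speech enhancement, voice conversion, and spoken language understanding all use the same functions → the range of applications is unlimited (∞)
Not sure you understand it? Try implementing it
The title of Google's sequence-to-sequence (Transformer) paper is "Attention is all you need"
↓
Implementation is all we need
The only way to understand it is to implement it: understand it hands-on
Your own implementation runs → it can recognize or it can synthesize → deeply moving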
The three sacred treasures: the chapters on sequence-to-sequence models
For the basics of neural networks, see this one (a deep-sea fish, because it is "Deep")
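Summary
What a sequence-to-sequence model is
It can convert an input into an output of a different sequence length: speech → text, text → speech, …
It learns a matrix that changes the sequence length
It learns which part of the input to focus on to obtain the desired output
The range of applications is unlimited (∞)
Implementation is all we need
To understand it, learn it hands-on: implementing is the only way
The environment for implementing is already sufficient (GitHub, corpora) and should keep growing
Speech synthesis: it was moving when a model I had actually built spoke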
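Announcement: presentations at next month's research meeting
Fast, multi-speaker neural network models for speech waveform generation: Matsubara (Kobe University, NICT trainee), Okamoto, Takashima (Kobe University), Takiguchi (Kobe University), Toda (Nagoya University), Kawai, "Investigation of LPCNet features in the HiFi-GAN vocoder"
Language acquisition agents and neural network models for speech waveform generation (special session): Tanaka, Okamoto, et al., "Speech production mechanism and pronunciation adaptation using WaveGrad for a spoken language acquisition system"
Fast, high-quality Japanese neural speech synthesis model on CPU only (poster): Okamoto, Toda (Nagoya University), Kawai, "Implementation of a CPU-based real-time Japanese neural text-to-speech system using a forced-alignment version of Parallel Tacotron and HiFi-GAN"
Multi-zone sound field control (own KAKENHI theme): Okamoto, "Multiple sound field control based on simultaneous interior and exterior control"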
Thank you for your !! Takuma OKAMOTO @ NICT e-mail: HP:
https://www.okamotocamera.com Twitter: @okamotocamerea
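Speech recognition and speech synthesis: then and now
What is difficult about speech recognition and speech synthesis?
Common to both: the input and output lengths are completely different → acoustic features: hundreds of frames; text: tens of characters
Recognition: the target is the speech of all humankind (enormous diversity), and it varies widely
Synthesis: the input and output lengths differ even more → speech waveform: tens of thousands of samples in just a few seconds
Speech recognition and speech synthesis until now: a high hurdle
A great deal of expert knowledge is required
Various modules are trained separately and chained together before the conversion is complete
Common to both: learning the positional relationship between acoustic features and text (alignment)
Recognition: acoustic model, pronunciation dictionary, language model, decoder
Synthesis: phoneme duration model, acoustic model, waveform generation model (vocoder)
The arrival of sequence-to-sequence models: a low hurdle
The neural network covers the expert knowledge
A single neural network can perform the whole conversion at once
Recognition: acoustic features → [sequence-to-sequence model] → word sequence (text)
Synthesis: text (phoneme sequence) → [sequence-to-sequence model] → acoustic features → [waveform generation model] → speech waveform
Synthesis: text (phoneme sequence) → [sequence-to-sequence model + waveform generation model] → speech waveform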