TTS Skins: Speaker Conversion via ASR
peisuke
November 20, 2020
Presentation slides for the Interspeech 2020 speech paper reading group
Transcript
TTS Skins: Speaker Conversion via ASR
Authors: A. Polyak, L. Wolf, Y. Taigman
Presenter: @peisuke
2016 ABEJA 2016
Twitter: @peisuke
GitHub: https://github.com/peisuke
Qiita: https://qiita.com/peisuke
SlideShare: https://www.slideshare.net/FujimotoKeisuke
• TTS Skins: Speaker Conversion via ASR
• ASR WaveNet
• ASR
• Text-to-Speech 100
• Text-to-Speech
• TTS
• ASR F0
• Jasper: An End-to-End Convolutional Neural Acoustic Model
• https://github.com/NVIDIA/OpenSeq2Seq
• 1D Conv-BN-ReLU
• Skip connection
• Pre-trained
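The building pattern listed on this slide (1D convolution, batch normalization, ReLU, plus a skip connection) can be illustrated with a minimal PyTorch sketch. This is not the Jasper code from OpenSeq2Seq. The class name `ConvBNReLUBlock`, the channel sizes, the kernel width and the number of repeats are all assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn


class ConvBNReLUBlock(nn.Module):
    """Hypothetical residual block: repeated Conv1d -> BatchNorm1d -> ReLU
    with a skip connection added before the final ReLU."""

    def __init__(self, in_channels, out_channels, kernel_size=11, repeats=3):
        super().__init__()
        layers = []
        channels = in_channels
        for i in range(repeats):
            # An odd kernel with padding = kernel_size // 2 keeps the time length unchanged
            layers.append(nn.Conv1d(channels, out_channels, kernel_size,
                                    padding=kernel_size // 2, bias=False))
            layers.append(nn.BatchNorm1d(out_channels))
            if i < repeats - 1:
                layers.append(nn.ReLU())
            channels = out_channels
        self.body = nn.Sequential(*layers)
        # 1x1 convolution so the skip path matches the output channel count
        self.skip = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm1d(out_channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        # x has shape (batch, channels, time)
        return self.act(self.body(x) + self.skip(x))


block = ConvBNReLUBlock(in_channels=64, out_channels=128)
features = torch.randn(2, 64, 100)  # 2 utterances, 64 channels, 100 frames (made-up sizes)
out = block(features)
```

Stacking several such blocks gives a purely convolutional acoustic model. In the setting of this talk the ASR network is used pre-trained, so a block like this would be loaded with existing weights rather than trained from scratch.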
• WaveNet
• condition
• https://github.com/NVIDIA/nv-WaveNet
• F0
• Look-up table (PyTorch Embedding)
• F0
• Fine-tuning
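The look-up table mentioned on this slide corresponds to `torch.nn.Embedding`, which maps an integer speaker id to a trainable vector. The sketch below shows the mechanism only. The table size, the embedding dimension, the placeholder frame features, and the choice to concatenate the speaker vector to every frame are assumptions, not details taken from the slides.

```python
import torch
import torch.nn as nn

NUM_SPEAKERS = 4    # assumed number of target speakers
EMBEDDING_DIM = 8   # assumed embedding size

# nn.Embedding is a trainable look-up table: row i is the vector for speaker i
speaker_table = nn.Embedding(NUM_SPEAKERS, EMBEDDING_DIM)

speaker_ids = torch.tensor([0, 3])            # one target speaker id per utterance
speaker_vectors = speaker_table(speaker_ids)  # shape: (2, EMBEDDING_DIM)

# Repeat each speaker vector over time so it can be concatenated with
# frame-level features (placeholders here) to form a conditioning signal.
num_frames = 50
frame_features = torch.randn(2, num_frames, 16)
speaker_per_frame = speaker_vectors.unsqueeze(1).expand(-1, num_frames, -1)
conditioning = torch.cat([frame_features, speaker_per_frame], dim=-1)
```

Because the table rows are ordinary parameters, adding a new speaker can in principle be done by adding a row and fine-tuning, which matches the fine-tuning item on this slide.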
• LibriTTS, VCTK
• Many-to-many, seen, unseen
• TTS
• MOS
• Mel cepstral distortion
• Speaker classification
• WaveNet AutoEncoder
• PPG
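Mel cepstral distortion (MCD), one of the metrics listed above, is commonly computed per frame as (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²) and then averaged over frames. The sketch below uses that common definition in plain Python. Whether the paper uses exactly this variant (number of coefficients, alignment method) is not stated in the slides, so the function name, the skipping of coefficient 0 and the toy inputs are assumptions.

```python
import math


def mel_cepstral_distortion(reference, converted):
    """Mean MCD in dB between two equally long sequences of mel-cepstral frames.

    Each frame is a list of coefficients. Index 0 (energy) is skipped, as is
    common practice. Time alignment (for example DTW) is assumed to have been
    done before calling this function.
    """
    if not reference or len(reference) != len(converted):
        raise ValueError("sequences must be non-empty and have the same number of frames")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref_frame, conv_frame in zip(reference, converted):
        squared = sum((r - c) ** 2 for r, c in zip(ref_frame[1:], conv_frame[1:]))
        total += scale * math.sqrt(squared)
    return total / len(reference)


# Toy example with two frames of three coefficients each (made-up values)
ref = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
conv = [[1.2, 0.5, -0.2], [0.8, 0.7, 0.3]]
mcd = mel_cepstral_distortion(ref, conv)
```

Lower values mean the converted speech is spectrally closer to the reference recording.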
Seen
• Seen-to-seen
• A B
• Identification F0

             LibriTTS                          VCTK
             MOS        MCD  Identification   MOS        MCD        Identification
Full method  3.78±0.83  -    96.12            4.08±0.75  8.76±1.72  98.97
w/o F0       3.61±0.83  -    96.96            3.59±0.96  8.99±1.5   96.89
AE baseline  2.89±0.88  -    29.19            3.46±1.07  9.45±1.63  69.26
PPG          2.82±0.91  -    94.01            2.67±0.93  9.19±1.50  98.77
PPG2         2.87±1.00  -    95.77            3.03±1.06  9.18±1.52  96.24
Unseen
• Unseen-to-seen
• A B

             LibriTTS                          VCTK
             MOS        MCD  Identification   MOS        MCD        Identification
Full method  3.70±0.80  -    97.10            4.05±0.74  8.94±1.53  98.33
w/o F0       3.67±0.82  -    97.15            3.62±0.99  9.25±1.62  95.69
AE baseline  3.02±0.89  -    32.55            3.83±0.91  9.65±1.51  66.20
PPG          2.79±0.93  -    94.05            2.89±0.93  9.45±1.45  97.45
PPG2         2.71±0.93  -    95.43            3.19±1.04  9.79±1.86  97.25
TTS
• TTS
• TTS

              LibriTTS                                VCTK
              MOS        MCD         Identification   MOS        MCD         Identification
Original TTS  4.25±0.77  10.12±1.27  -                4.37±0.80  14.52±2.40  -
Full method   3.67±0.81  8.13±0.95   96.06            4.17±0.88  12.68±2.17  99.25
w/o F0        3.47±0.76  8.43±0.97   96.66            3.75±1.07  13.06±2.26  96.36
AE baseline   3.02±0.84  9.38±1.09   60.26            3.85±1.05  13.81±2.29  75.56
PPG           2.91±0.94  8.52±0.93   96.63            3.50±0.83  12.45±1.92  98.36
PPG2          2.85±0.87  8.76±1.06   95.08            3.66±1.03  12.57±2.10  97.62
• The Voice Conversion Challenge 2018
• 1 81 4-5
• Hub, Spoke

      Hub                    Spoke
      MOS        Similarity  MOS        Similarity
Ours  3.84±0.85  2.87±1.14   4.00±0.55  3.14±0.97
N10   3.92±0.75  2.83±1.20   3.98±0.52  3.13±0.97
N17   3.27±0.95  2.77±1.17   3.40±0.88  3.05±0.96
• TTS
• ASR, F0, Conditional WaveNet