Handy (Python) Packages and Software for Speech Information Processing
Akira Tamamori
December 30, 2020
Spun off from the Tokyo BISH Bash slides.
Transcript
Handy (Python) packages and software for speech information processing and voice conversion
Impressions of usability reflect my personal opinions.
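sox / pysox
• Convenient format conversion and more, right from the command line
• Format conversion (e.g., wav to mp3)
• Concatenation, mixing, and trimming are also possible
• Batch processing is easy (e.g., with shell scripts)
• A Python wrapper, pysox, is also available
• Installation
  • brew install sox, etc.
  • pip install sox ← this is how you install pysox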
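librosa (highly recommended)
• A package with a full set of modules handy for speech/music analysis
• The extensive official manual and tutorials are a big help
• Features I use often:
  • Waveform display, spectrogram display
  • Acoustic feature extraction (log-mel spectrogram, etc.)
• Installation: pip install librosa
• Official page: https://librosa.org/librosa/index.html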
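PyWORLD
• A package for the WORLD vocoder, which analyzes and resynthesizes speech
• Decomposes speech into timbre (spectral envelope), pitch (F0), and breathiness (aperiodicity) components, then resynthesizes it
• Python wrapper for the C++ version
• Also handy for acoustic feature extraction ⇒ gives a higher-quality spectral envelope than PySPTK (described later)
• Installation: pip install pyworld
• Official page: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder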
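PyAudio
• A package handy for streaming audio recording/playback
• Usable for real-time audio input and output
• Real-time voice conversion with Python is also possible
• Installation
  • pip install pyaudio ※ requires portaudio (e.g., brew install portaudio)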
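A minimal sketch of real-time input/output with PyAudio in callback mode: microphone blocks are scaled and sent straight back to the output. The gain, sample rate, block size, and the `apply_gain`/`run_passthrough` names are illustrative assumptions; the per-block processing is kept in a separate pure function so it can be swapped for a voice-conversion step.

```python
import numpy as np


def apply_gain(in_data: bytes, gain: float) -> bytes:
    """Scale one block of float32 samples and clip it to [-1, 1]."""
    block = np.frombuffer(in_data, dtype=np.float32)
    return np.clip(block * gain, -1.0, 1.0).astype(np.float32).tobytes()


def run_passthrough(rate=16000, frames_per_buffer=1024, gain=0.8, seconds=5.0):
    """Play microphone input back through the speakers, block by block."""
    import time
    import pyaudio  # imported here so apply_gain() works without PortAudio

    def callback(in_data, frame_count, time_info, status):
        # called on PortAudio's thread for every block of frame_count samples
        return apply_gain(in_data, gain), pyaudio.paContinue

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32, channels=1, rate=rate,
                    input=True, output=True,
                    frames_per_buffer=frames_per_buffer,
                    stream_callback=callback)  # starts immediately (start=True)
    try:
        time.sleep(seconds)
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
```

Calling `run_passthrough()` (with headphones, to avoid feedback) starts the loop; replacing `apply_gain` with another bytes-to-bytes function is all it takes to try a different effect.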
Combining PyAudio and PyWORLD
• A simple voice changer
  • Improved the simple voice changer script: added PyQt5 sliders to adjust pitch and formants in real time (my blog)
• babiniku
  • A project aiming to let you become a 2D character on Zoom streams
  • scripts/voice_converter.py is a nicely done voice changer → it includes bug fixes to the sample script from my blog
https://tam5917.hatenablog.com/entry/2019/04/30/213321
https://github.com/peisuke/babiniku
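PySPTK
• Python wrapper for SPTK, the speech signal processing toolkit
• SPTK itself is a suite of Linux/UNIX command-line tools
• Convenient for acoustic feature extraction
• It can also do speech analysis/synthesis, but WORLD gives better quality
• Installation: pip install pysptk
• Official page: https://pysptk.readthedocs.io/en/latest/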
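nnmnkwii (pronounced "nanamin kawaii")
• A package collecting modules useful for DNN-based speech synthesis
• Aimed more at research use
• Provides a full set of classes for preprocessing and acoustic feature extraction
• A great help when reproducing experiments from papers
• Installation: pip install nnmnkwii
• Official page: https://r9y9.github.io/nnmnkwii/stable/index.html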
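Pydub
• A package collecting modules handy for waveform editing
• Supports many file formats (wav, mp3, mp4, wma, aac, ...)
• Features: cutting out, splitting, mixing, fade in/out, inserting silence, and so on
• Rumor has it that some features are faster in pysox? (unverified)
• Installation: pip install pydub
• Official page: http://pydub.com/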
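sprocket
• A toolkit for statistical voice conversion (not a package)
• Aimed more at research use (MIT License)
• Ideal for building a research "baseline"
• Official page: https://github.com/k2kobayashi/sprocket
• Expository article (in Japanese), "Introduction to statistical voice conversion software": https://www.jstage.jst.go.jp/article/isciesci/62/2/62_69/_article/-char/ja/
• Tutorial (slides & notebook): https://github.com/kan-bayashi/INTERSPEECH19_TUTORIAL

Audacity: good to have installed
• Free, cross-platform waveform editor
• A rich set of sound effects you can apply
• Official page: https://www.audacityteam.org/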
Bonus: recommended packages for anyone who wants to do audio processing on the GPU
torchaudio
• An audio processing library officially supported by PyTorch
• Works well with PyTorch deep learning models (naturally)
• Official page: https://pytorch.org/audio/stable/index.html
tf.signal
• A set of audio processing functions officially supported by TensorFlow
• Works well with TF deep learning models (naturally)
• FFT/iFFT, DCT, MDCT, STFT, etc.
• Official page: https://www.tensorflow.org/api_docs/python/tf/signal
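A minimal sketch of the STFT path in `tf.signal`: a batched STFT, a log-mel spectrogram via `linear_to_mel_weight_matrix`, and an inverse STFT using the matching synthesis window from `inverse_stft_window_fn`. Frame sizes, the mel frequency range, and the random input are illustrative assumptions.

```python
import tensorflow as tf

signals = tf.random.normal([2, 16000])  # batch of two 1 s signals at 16 kHz

stfts = tf.signal.stft(signals, frame_length=1024, frame_step=256, fft_length=1024)
power = tf.abs(stfts) ** 2  # (2, 59, 513): no end padding by default

mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=80, num_spectrogram_bins=513, sample_rate=16000,
    lower_edge_hertz=80.0, upper_edge_hertz=7600.0)  # (513, 80)
log_mel = tf.math.log(tf.tensordot(power, mel_matrix, 1) + 1e-6)  # (2, 59, 80)

# inverse STFT with the synthesis window matching the default Hann analysis window
recon = tf.signal.inverse_stft(
    stfts, frame_length=1024, frame_step=256, fft_length=1024,
    window_fn=tf.signal.inverse_stft_window_fn(256))
print(power.shape, log_mel.shape, recon.shape)
```

Away from the edges, where every sample is covered by the full set of overlapping frames, `recon` matches `signals` up to float32 rounding.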
torchlibrosa
• Reimplements librosa-style functions (STFT, log-mel spectrogram, etc.) in PyTorch so they run on the GPU
• Installation: pip install torchlibrosa
• Official page: https://github.com/qiuqiangkong/torchlibrosa
kapre
• Audio processing with Keras (or rather TF) as the backend
• STFT, iSTFT, mel spectrogram, etc.
• No CQT and the like
• Official page: https://github.com/keunwoochoi/kapre
• Installation: pip install kapre
nnAudio
• Runs STFT and the like on the GPU with PyTorch as the backend
• Provides the commonly used feature extractors: STFT, inverse STFT, CQT, etc.
• Official page: https://github.com/KinWaiCheuk/nnAudio
nnAudio (continued)
• Comparison table