Handy (Python) Packages and Software for Speech Information Processing
Akira Tamamori
December 30, 2020
Spun off from the Tokyo BISH Bash materials into a standalone deck.
Transcript
Handy (Python) packages and software for speech information processing and voice conversion
(Impressions of usability reflect my personal opinion.)
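sox / pysox
• Easy format conversion and more, straight from the command line
  • Format conversion (e.g., wav to mp3)
  • Concatenation, mixing, and trimming are also possible
  • Batch processing is easy too (e.g., with shell scripts)
• A Python wrapper, pysox, is also available
• Installation
  • brew install sox (or equivalent)
  • pip install sox ← this is how you install pysox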
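The command-line operations above map onto ordinary `sox` invocations. The sketch below only builds those argument lists (conversion, concatenation, mixing, trimming) and runs them through `subprocess` when explicitly asked; the helper names are hypothetical, and actually executing them assumes `sox` is on your PATH (MP3 output additionally needs a sox build with MP3 support). pysox wraps the same program behind a `sox.Transformer` class if you prefer a Python API.

```python
import shlex
import subprocess
from typing import List


def convert(src: str, dst: str) -> List[str]:
    # sox infers the formats from the file extensions, e.g. wav -> mp3.
    return ["sox", src, dst]


def concat(inputs: List[str], dst: str) -> List[str]:
    # Several inputs without -m are concatenated in order.
    return ["sox", *inputs, dst]


def mix(inputs: List[str], dst: str) -> List[str]:
    # -m mixes (sums) the inputs instead of concatenating them.
    return ["sox", "-m", *inputs, dst]


def trim(src: str, dst: str, start_sec: float, length_sec: float) -> List[str]:
    # The "trim" effect takes a start position and a duration in seconds.
    return ["sox", src, dst, "trim", str(start_sec), str(length_sec)]


def run(cmd: List[str], dry_run: bool = True) -> str:
    # Return the shell-quoted command line; execute it only if dry_run is False
    # (hypothetical helper, assumes the sox binary is installed when executing).
    line = shlex.join(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    # Batch conversion, the Python analogue of a shell loop.
    for name in ["a.wav", "b.wav"]:
        print(run(convert(name, name.replace(".wav", ".mp3"))))
    print(run(trim("a.wav", "a_head.wav", 0, 10)))
```

Keeping `dry_run=True` as the default lets you inspect the generated commands before touching any files.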
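librosa (I highly recommend this one)
• A package with a full set of modules for analyzing speech and music
• Its thorough official manual and tutorials are a big help
• Features I personally use a lot
  • Waveform display and spectrogram display
  • Acoustic feature extraction (log-mel spectrogram)
• Installation: pip install librosa
• Official page: https://librosa.org/librosa/index.html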
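To show what a log-mel spectrogram actually computes, here is a plain NumPy sketch rather than librosa's own code. In librosa itself this is roughly `librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80))`. The sketch assumes the HTK mel formula, a symmetric Hann window and no frame centering, so its values will not match librosa exactly (librosa defaults to the Slaney mel scale, centers frames, returns shape (n_mels, frames), and clips to 80 dB below the peak).

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (librosa's default is the Slaney variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters evenly spaced on the mel scale: (n_mels, n_fft // 2 + 1).
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr, n_fft=1024, hop_length=256, n_mels=80, eps=1e-10):
    # Slice the signal into overlapping frames (no padding), window them,
    # and take the power spectrum of each frame.
    n_frames = 1 + (len(y) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel + eps)  # power in dB, no top_db clipping


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    y = np.sin(2 * np.pi * 440.0 * t)  # 1 s, 440 Hz test tone
    print(log_mel_spectrogram(y, sr).shape)  # (59, 80): (frames, mel bands)
```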
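PyWorld
• A vocoder package for speech analysis and resynthesis
• Decomposes speech into "timbre", "pitch", and "breathiness" components (spectral envelope, F0, aperiodicity) and resynthesizes it
• A Python wrapper around the C++ version
• Also handy for extracting speech features ⇒ gives a better-quality spectral envelope than PySPTK (covered later)
• Installation: pip install pyworld
• Official page: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder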
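PyAudio
• A package for streaming audio recording and playback
• Usable for real-time audio input and output
• Real-time voice conversion with Python is also possible
• Installation
  • pip install pyaudio ※ requires portaudio (e.g., brew install portaudio)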
Combining PyAudio and PyWorld
• A simple voice changer
  • Improved a simple voice changer script: added PyQt5 sliders to adjust pitch and formants in real time (my blog)
• babiniku
  • A project aiming to let you fully "become" a 2D character when streaming on Zoom
  • scripts/voice_converter.py is a nice voice changer → it includes bug fixes for the sample script from my blog
• https://tam5917.hatenablog.com/entry/2019/04/30/213321
• https://github.com/peisuke/babiniku
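Pitch and formant controls like the ones described above operate on the WORLD parameters between analysis and resynthesis. Below is a minimal NumPy sketch of that step, assuming `f0` and `sp` come from `pyworld.wav2world` and go back through `pyworld.synthesize`. The helper name `shift_pitch_formant` is hypothetical, and stretching the envelope by linear interpolation along frequency is one simple choice, not necessarily what the blog script or babiniku does. In a real-time loop, PyAudio would supply each block of samples and a PyQt5 slider would set the two ratios.

```python
import numpy as np


def shift_pitch_formant(f0, sp, pitch_ratio=1.0, formant_ratio=1.0):
    """Scale F0 and stretch the spectral envelope along the frequency axis.

    f0: (frames,) in Hz, 0 for unvoiced frames.
    sp: (frames, bins) power spectral envelope.
    formant_ratio > 1 moves spectral peaks up; < 1 moves them down.
    """
    new_f0 = f0 * pitch_ratio  # unvoiced frames (0 Hz) stay unvoiced
    src = np.arange(sp.shape[1])
    # Output bin k takes the input envelope value at bin k / formant_ratio.
    query = src / formant_ratio
    new_sp = np.empty_like(sp)
    for i, frame in enumerate(sp):
        # Past the last input bin, np.interp holds the edge value.
        new_sp[i] = np.interp(query, src, frame)
    return new_f0, new_sp
```

Usage, assuming `x` and `fs` as in a PyWorld round trip: `f0, sp, ap = pyworld.wav2world(x, fs)`, then `f0, sp = shift_pitch_formant(f0, sp, 1.5, 1.2)`, then `pyworld.synthesize(f0, sp, ap, fs)`.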
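PySPTK
• A Python wrapper for SPTK, the Speech Signal Processing Toolkit
• SPTK itself is a collection of Linux commands
• Convenient for acoustic feature extraction
• Speech analysis/synthesis is also possible, but WORLD gives better sound quality
• Installation: pip install pysptk
• Official page: https://pysptk.readthedocs.io/en/latest/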
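nnmnkwii <nanamin kawaii>
• A package collecting modules useful for DNN-based speech synthesis
• Aimed more at research use
• Comes with a full set of classes for preprocessing and acoustic feature extraction
• A great help when re-implementing published papers
• Installation: pip install nnmnkwii
• Official page: https://r9y9.github.io/nnmnkwii/stable/index.html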
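One typical preprocessing step in DNN speech synthesis is appending delta and delta-delta (dynamic) features to static acoustic features. nnmnkwii provides this as `nnmnkwii.preprocessing.delta_features(x, windows)`. The NumPy sketch below reimplements the same idea with the (left, right, coefficients) window format used in nnmnkwii's examples, so it illustrates the technique rather than reproducing nnmnkwii's code. Its edge handling (repeating the first and last frames) is an assumption and may differ from the library.

```python
import numpy as np

# (left, right, coefficients) for static, delta and delta-delta windows.
WINDOWS = [
    (0, 0, np.array([1.0])),
    (1, 1, np.array([-0.5, 0.0, 0.5])),
    (1, 1, np.array([1.0, -2.0, 1.0])),
]


def delta_features(x, windows=WINDOWS):
    """Append dynamic features: (T, D) -> (T, D * len(windows)).

    Edge frames are repeated for padding (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    outs = []
    for left, right, coef in windows:
        padded = np.pad(x, ((left, right), (0, 0)), mode="edge")
        out = np.zeros_like(x)
        for j, c in enumerate(coef):
            # coef[j] weights original frame t - left + j.
            out += c * padded[j:j + T]
        outs.append(out)
    return np.concatenate(outs, axis=1)
```

For a sequence of 24-dimensional mel-cepstra, this turns each frame into a 72-dimensional vector (static, delta, delta-delta).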
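Pydub
• A package collecting modules handy for waveform editing
• Supports many file formats (wav, mp3, mp4, wma, aac, ...)
• Features: cutting, splitting, mixing, fade-in/out, inserting silence, and more
• Rumor has it that some features are faster in pysox? (unverified)
• Installation: pip install pydub
• Official page: http://pydub.com/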
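sprocket
• A toolkit (not a package) for statistical voice conversion
• Aimed more at research use (MIT license)
• Ideal for building a research "baseline"
• Official page: https://github.com/k2kobayashi/sprocket
• Explanatory article (in Japanese), an introduction to statistical voice conversion software: https://www.jstage.jst.go.jp/article/isciesci/62/2/62_69/_article/-char/ja/
• Tutorial (slides & notebook): https://github.com/kan-bayashi/INTERSPEECH19_TUTORIAL
Audacity: worth having installed
• Free waveform editing software, multi-platform
• A wide range of sound effects can be applied
• Official page: https://www.audacityteam.org/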
Bonus: packages recommended for anyone who wants to process audio on the GPU
torchaudio
• An audio processing library officially supported by PyTorch
• Works well with PyTorch-based deep learning models (naturally)
• Official page: https://pytorch.org/audio/stable/index.html
tf.signal
• A set of audio/signal processing functions officially supported by TensorFlow
• Works well with TF-based deep learning models (naturally)
• FFT/iFFT, DCT, MDCT, STFT, and more
• Official page: https://www.tensorflow.org/api_docs/python/tf/signal
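The same kind of feature with `tf.signal`: an STFT, then a mel filterbank matrix, then a log. TensorFlow places these ops on a GPU automatically when one is visible. The parameter values are illustrative assumptions; `tf.signal.inverse_stft`, `tf.signal.dct` and `tf.signal.mdct` cover the other transforms listed above.

```python
import math

import tensorflow as tf

sample_rate = 16000
t = tf.range(sample_rate, dtype=tf.float32) / sample_rate
signal = tf.sin(2.0 * math.pi * 440.0 * t)[tf.newaxis, :]  # (batch, samples)

# Complex STFT without end padding: (batch, frames, fft_length // 2 + 1).
stft = tf.signal.stft(signal, frame_length=1024, frame_step=256, fft_length=1024)
power = tf.abs(stft) ** 2

# (num_spectrogram_bins, num_mel_bins) matrix mapping linear bins to mel bands.
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=80,
    num_spectrogram_bins=1024 // 2 + 1,
    sample_rate=sample_rate,
    lower_edge_hertz=80.0,
    upper_edge_hertz=7600.0,
)
mel = tf.tensordot(power, mel_matrix, axes=1)  # (batch, frames, 80)
log_mel = tf.math.log(mel + 1e-6)
print(log_mel.shape)
```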
torchlibrosa
• Reimplements librosa-style processing with PyTorch as the backend so it runs on the GPU
• Installation: pip install torchlibrosa
• Official page: https://github.com/qiuqiangkong/torchlibrosa
kapre
• Audio processing with Keras (or rather TF) as the backend
• STFT, iSTFT, mel spectrograms, and more
• No CQT and the like
• Official page: https://github.com/keunwoochoi/kapre
• Installation: pip install kapre
nnAudio
• Runs STFT and similar transforms on the GPU with PyTorch as the backend
• Provides the commonly used feature extractors: STFT, inverse STFT, CQT, and more
• Official page: https://github.com/KinWaiCheuk/nnAudio
nnAudio (continued)
• [Figure: comparison table]