Handy (Python) packages and software for speech information processing
Akira Tamamori
December 30, 2020
Research
Split out from my Tokyo BISH Bash materials as a standalone deck.
Transcript
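
Handy (Python) packages and software for speech information processing and voice conversion
※ Impressions of usability and the like include my personal opinions.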
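
SoX / pysox
• Quick and easy format conversion and similar tasks from the command line
  • Format conversion (e.g., wav to mp3)
  • Concatenation, mixing, and trimming are also possible
  • Batch processing is easy too (shell scripts, etc.)
• A Python wrapper, pysox, is also available
• Installation
  • brew install sox, etc.
  • pip install sox ← this is how you install pysox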
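
librosa (highly recommended)
• A package with a full set of modules for speech/music analysis
• The extensive official manual and tutorials are a big help
• Features I personally use a lot
  • Waveform display, spectrogram display
  • Speech feature extraction (log-mel spectrogram)
• Installation: pip install librosa
• Official page: https://librosa.org/librosa/index.html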
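
PyWorld
• A vocoder package that analyzes and resynthesizes speech
• Decomposes speech into "timbre, pitch, and breathiness" components and resynthesizes it from them
• A Python wrapper for the C++ version of WORLD
• Handy for speech feature extraction too ⇒ gives a higher-quality spectral envelope than PySPTK (covered later)
• Installation: pip install pyworld
• Official page: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder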
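
PyAudio
• A handy package for streaming recording and playback
• Usable for real-time audio input and output
• Also makes things like real-time voice conversion in Python possible
• Installation
  • pip install pyaudio ※ requires portaudio (e.g., brew install portaudio)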
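
Below is a minimal real-time passthrough sketch using PyAudio's callback mode. It assumes 16-bit mono audio at 16 kHz, 1024-frame buffers, and the default input/output devices. `process_block` is a hypothetical placeholder (a simple gain) marking where real processing such as voice conversion would go. PyAudio is imported inside `run_passthrough` so that `process_block` can be used without PortAudio installed.

```python
import time

import numpy as np

RATE = 16000   # sampling rate in Hz (assumption)
CHUNK = 1024   # frames per buffer (assumption)


def process_block(in_data: bytes, gain: float = 0.5) -> bytes:
    """Hypothetical per-block processing: scale 16-bit PCM samples by `gain`."""
    x = np.frombuffer(in_data, dtype=np.int16).astype(np.float32)
    y = np.clip(x * gain, -32768, 32767).astype(np.int16)
    return y.tobytes()


def run_passthrough(seconds: float = 5.0, gain: float = 0.5) -> None:
    """Mic -> process_block -> speakers for `seconds` seconds."""
    import pyaudio

    pa = pyaudio.PyAudio()

    def callback(in_data, frame_count, time_info, status):
        # Called by PortAudio for every CHUNK of input; returns output bytes.
        return process_block(in_data, gain), pyaudio.paContinue

    # With a callback, the stream starts running as soon as it is opened.
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, output=True, frames_per_buffer=CHUNK,
                     stream_callback=callback)
    try:
        time.sleep(seconds)
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Calling `run_passthrough()` on a machine with a microphone plays the input back at half volume. Using headphones avoids feedback.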
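
Combining PyAudio and PyWorld
• A simple voice changer
  • Improved the simple voice-changer script: added PyQt5 sliders for adjusting pitch and formants in real time (my blog)
• babiniku
  • A project aiming to let you fully become a 2D anime-style character for Zoom streaming
  • scripts/voice_converter.py is a nicely working voice changer → it includes bug fixes to the sample script from my blog
https://tam5917.hatenablog.com/entry/2019/04/30/213321
https://github.com/peisuke/babiniku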
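
PySPTK
• A Python wrapper for SPTK (Speech Signal Processing Toolkit)
• SPTK itself is a collection of Linux commands
• Convenient for acoustic feature extraction
• It can also do speech analysis-synthesis, but WORLD gives better quality
• Installation: pip install pysptk
• Official page: https://pysptk.readthedocs.io/en/latest/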
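
nnmnkwii <nanamin kawaii>
• A package collecting modules useful for DNN-based speech synthesis
• Geared more toward research use
• Provides a full set of classes for preprocessing and acoustic feature extraction
• Very helpful when reimplementing papers to reproduce their results
• Installation: pip install nnmnkwii
• Official page: https://r9y9.github.io/nnmnkwii/stable/index.html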
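
Pydub
• A package collecting modules that are handy for waveform editing
• Supports many file formats (wav, mp3, mp4, wma, aac, ...)
• Features: cutting out, splitting, mixing, fade-in/fade-out, inserting silence, and more
• Rumor has it that some operations are faster in pysox? (unverified)
• Installation: pip install pydub
• Official page: http://pydub.com/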
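
sprocket
• A toolkit for statistical voice conversion (not a package)
• Geared more toward research use (MIT License)
• Ideal for building research "baselines"
• Official page: https://github.com/k2kobayashi/sprocket
• Overview paper: 『統計的声質変換ソフトウェア入門』 ("An Introduction to Statistical Voice Conversion Software") https://www.jstage.jst.go.jp/article/isciesci/62/2/62_69/_article/-char/ja/
• Tutorial (slides & notebook): https://github.com/kan-bayashi/INTERSPEECH19_TUTORIAL

Audacity: good to have installed
• Free, cross-platform waveform editing software
• A rich set of sound effects you can apply
• Official page: https://www.audacityteam.org/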

Bonus: packages recommended for those who want to process audio on the GPU

torchaudio
• An audio processing library officially supported by PyTorch
• Works well with PyTorch deep learning models (naturally)
• Official page: https://pytorch.org/audio/stable/index.html

tf.signal
• A set of signal processing functions officially supported by TensorFlow
• Works well with TF deep learning models (naturally)
• FFT/iFFT, DCT, MDCT, STFT, etc.
• Official page: https://www.tensorflow.org/api_docs/python/tf/signal
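
A sketch of the same log-mel pipeline built from `tf.signal` primitives: an STFT, then a mel weight matrix applied to the power spectrum. The frame sizes, the 80 to 7600 Hz mel range, and the `1e-6` log offset are illustrative assumptions. TensorFlow runs these ops on a GPU automatically when one is visible.

```python
import tensorflow as tf


def log_mel_tf(signal: tf.Tensor, sample_rate: int = 16000, frame_length: int = 1024,
               frame_step: int = 256, n_mels: int = 80) -> tf.Tensor:
    """(..., samples) float32 -> (..., frames, n_mels) log-mel via tf.signal."""
    stft = tf.signal.stft(signal, frame_length=frame_length, frame_step=frame_step)
    power = tf.abs(stft) ** 2                      # (..., frames, fft_length // 2 + 1)
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=n_mels,
        num_spectrogram_bins=stft.shape[-1],
        sample_rate=sample_rate,
        lower_edge_hertz=80.0,
        upper_edge_hertz=7600.0,
    )
    mel = tf.tensordot(power, mel_matrix, 1)       # (..., frames, n_mels)
    return tf.math.log(mel + 1e-6)
```

With the default `pad_end=False`, a 16000-sample input gives 1 + (16000 - 1024) // 256 = 59 frames.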

torchlibrosa
• Runs librosa-style processing on the GPU with PyTorch as the backend
• Installation: pip install torchlibrosa
• Official page: https://github.com/qiuqiangkong/torchlibrosa

kapre
• Audio processing with Keras (or rather TF) as the backend
• STFT, iSTFT, mel spectrograms, etc.
• No CQT or the like
• Official page: https://github.com/keunwoochoi/kapre
• Installation: pip install kapre

nnAudio
• Runs STFT and similar transforms on the GPU with PyTorch as the backend
• Covers commonly used feature extraction: STFT, inverse STFT, CQT, etc.
• Official page: https://github.com/KinWaiCheuk/nnAudio

nnAudio (continued)
• Comparison table