Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GPU on OpenStack日本語版
Search
masafumi_ohta
May 15, 2017
Technology
800
0
Share
GPU on OpenStack日本語版
日本語バージョン作りました。若干訂正も加えました。
masafumi_ohta
May 15, 2017
More Decks by masafumi_ohta
See All by masafumi_ohta
NJ-WS10Aを使ったおいしい使い方
masafumi_ohta
0
44
COSCUP25 Intro at OSC Tokyo Spring 25, at Komazawa Univ.
masafumi_ohta
0
38
Deepdiving to Raspberry Pi 5
masafumi_ohta
0
58
树莓派的历史、相关信息及使用案例
masafumi_ohta
0
440
海外カンファレンスのCFPの正しい書き方
masafumi_ohta
4
640
GPD.pdf
masafumi_ohta
0
84
GPD MicroPCのご紹介
masafumi_ohta
0
190
3大あくじょ考察
masafumi_ohta
0
500
GPU on OpenStack GPUインターナルクラウドのベストプラクティス
masafumi_ohta
0
330
Other Decks in Technology
See All in Technology
RubyでRuby拡張を書いたらRubyより35倍速になったってどういうこと??
kazuho
3
710
大学生が本気でDatabricksを活用してDiscordサークルをデータ駆動させてみた
phantomjuju
0
250
まだ道半ば、AI-DLCを歩み始めている話
news_it_enj
2
210
JEP 522 Deep Dive - G1 GC同期コスト削減によるスループット向上を徹底検証&解説
tabatad
1
250
開発を止めない CI/CD ~CI Visibilityによる継続的最適化~
pensuke628
0
150
freee-mcpを Local→Remote で出してわかった MCP認可実装のリアル
terara
3
910
最低限これだけ押さえれ大丈夫_Claude Enterprise/Team企業展開ガバナンス入門
tkikuchi
1
420
AI時代の私の技術インプットとアウトプット術
tonkotsuboy_com
15
7.5k
自称宇宙最速で不合格となったAIP-C01にリベンジを果たすべくAIで問題集アプリを作ってみた。
yama3133
0
230
AI-DLCを活用した高品質・安全なAI駆動開発実践 / AI Driven Development
yoshidashingo
0
100
NFLコンペ2026 解法
lycorptech_jp
PRO
0
120
エンジニアは生成AIと どのように向き合うべきか? ことばの意味という観点から
verypluming
3
280
Featured
See All Featured
KATA
mclloyd
PRO
35
15k
How to optimise 3,500 product descriptions for ecommerce in one day using ChatGPT
katarinadahlin
PRO
1
3.6k
The browser strikes back
jonoalderson
0
1.1k
Amusing Abliteration
ianozsvald
1
180
Navigating Weather and Climate Data
rabernat
0
200
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
62
54k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
3.2k
Designing for Timeless Needs
cassininazir
1
230
Designing for Performance
lara
611
70k
Un-Boring Meetings
codingconduct
0
300
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.5k
Accessibility Awareness
sabderemane
1
130
Transcript
GPU on OpenStack ͓͓ͨɹ·͞;Έ @masafumiohta
͓લͩΕʁ> SIerۈ ͍͔ͭ͘ͷOpenStack PoCʹࢀՃɺٴͼΞυόΠε OpenStackͷఏҊɾޚ૬ஊঝͬͯ·͢ɻ OpenStackίʔυಡΜͰमਖ਼ͨ͠Γͱ͔… ଓ͖WebͰ https://jp.linkedin.com/in/ohtamasafumi
͡Ίʹ OpenStackಛघར༻ Sahara(HadoopͷOpenStack)ͳͲͳͲ ΄ͱΜͲ͕·ͱΊΒΕͯͳ͙ͬͯ͘ௐΔ͔͠ͳ͍ ͍ΘΏΔʰυΩϡϝϯτϩετʱ docs.openstack.orgɹʹ͋͛ͯ΄͍͠ɻ
GPU on OpenStackͱ
GPUτϨϯυ GPUϝχʔίΞత ֤MPUίΞҰݸ͋ͨΓখ͍͕͘͞ز͔ͭͷܭࢉ (MATLABܥͳͲ)Ͱ༗ޮ GPUͷμΠूੵʹର͢ΔলిྗHPCϢʔβʹ ͱͬͯେ͖͍ɻ ίϯύΫτͳγεςϜলిྗɾলεϖʔε͕ඞਢͳຊ ͷHPCγεςϜʹେ͖͍
GPUͲ͏ͬͯ͏ͷ͔ʁ LinuxͷPCIύεεϧʔٕज़Ͱ͏ AWSଟ͜ͷϕʔε KVMʹґଘ VSphereGPUίΞͷεϓϦοτ͕Մೳ VSphereXenͷΑ͏ͳGPUεϓϦοτՄೳ͔ʁ ݱߦͰ͖ͳ͍͕Nvidia͕αϙʔτ༧ఆ
GPU Docker DockerΛ͔ͭͬͨGPUͷར༻ ίΞׂͰ͖ͳ͍͕λεΫʹԠͯ͡GPUίΞར༻Λ ࣗಈར༻ׂɻ ࠷ۙͷGPU on OpenStackͱ͍͑͜ͷGPU Dockerར ༻
ͨͩɺύεεϧʔ+KVMΑΓύϑΥʔϚϯεͰͳ͍
GPU OpenStack୭ͷͨΊ HPCͷςϯϙϥϦར༻ ͪΐͬͱܭࢉͬͯऴΘͬͨΒVM͝ͱյ͢ ͍͔ͭ͘ͷVMΛ͔ͭͬͯ؆୯ʹHPCάϦουͱ͔.. EC2ͷΑ͏ʹHPCΛͬͨΓ͢Δ ෦ར༻Ͱ͏ɺಛʹۀͰEC2ͳͲΫϥυʹग़ͪ͠Ό ͍͚ͳ͍ਓ͚
GPU on OpenStackΛ͏
PCIύεεϧʔ(1) PCIσόΠεΛμΠϨΫτʹVMʹͭͳ͙ ཧϗετΑΓ͔ΒPCIσόΠεΛΓ͢ඞཁ͕͋Δɻ KVMͷػೳʹࠨӈ͞ΕΔ͜ͱʹͳ͍ͬͯΔɻ OpenStackͱؔͳ͍ ҰͭͷσόΠεʹҰͭͷVM GPUࣗମ֤VMʹର͠ڞ༗͓ΑͼׂෆՄೳ ͋͘·ͰKVMͷ੍ݶɺOpenStackͰͳ͍ɻ
PCIύεεϧʔ(2) RedhatͰެࣜαϙʔτ ͔͠͠ར༻ੵۃਪ͍ͯ͠ͳ͍ʢ͍͋͠ʁ) ࣮ࡍRedhatͰϋϚΔ.. UbuntuͰυΩϡϝϯτԽ͞Ε͍ͯͳ͍ άάͬͯใΛ͕͞͞ͳ͖Ό͍͚ͳ͍ɻ
K Linux OS for KVM hypervisor GPU Driver App Instance
VMM/KVM IOMMU/Vt-d PCI Express x16 Linux/Win OS ComputeNode GPU Card GPU Card Nova Compute Nova Scheduler AMQP Nova API Linux OS ControllerNode ਤ:GPUύεεϧʔͲ͏ͬͯOpenStackͰಈ͔͘ʢҹ͕Πϯελϯεൃߦϓϩηεʣ
KVMϗετͷGPU ϗετͷGPUΛνΣοΫ͢Δ: lspci -nn | grep -i nvidia lspci -nn
| grep -i nvidia 88:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:11b4] (rev a1) 88:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1) GPUͷϢχοτ͝ͱύεεϧʔ͢Δඞཁ͕͋Δ GPUϗετ͚ͩͰͳ͘HDMIͳͲϢχοτ͝ͱΓ͢ඞཁ͋Γ Ͱͳ͍ͱVM্Ͱಈ͍ͯ͘Εͳ͍=ύεεϧʔͰ͖ͳ͍
GPUͷϙʔτΛΈΔ GPUʹHDMIϏσΦͱΦʔσΟΦ͕͋ΔͷͰ ͜Ε͝ͱύεεϧʔ͢Δඞཁ͕༗Δ
IOMMUͷηοτΞοϓ ཧσόΠεΛ͏ͨΊͷԾγεςϜʹ͓͍ͯ IOMMU(Input/Output Memory Management Unit) ηοτΞο ϓඞਢ ͪΖΜ intel
vt-d (I/OԾԽ)Φϯʹ (EFI/BIOSνΣοΫ) grubͰͷηοτΞοϓඞཁ:/etc/default/grubΛมߋهड़ GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1”
pci-stub pci-stub ཧPCIσόΠεΛϗετͰར༻Ͱ͖ͳ͍Α͏ʹ͢Δ ͨΊʹઃఆ͢Δ /etc/moduleʹσϑΥϧτͰઃఆ͞Ε͍ͯͳ͍ͷͰઃఆ͢Δɻؔ ࿈͢Δίϯϙʔωϯτઃఆ͢Δ (vfio,kvm)ɻ pci_stub vfio vfio_iommu_type1
vfio_pci kvm kvm_intel
VFIO(1) ύεεϧʔ͞ΕΔͷΛཧσόΠε͔ΒΓͨ͢ ΊʹVFIO(Virtual Function IO)্ʹՃ͢Δඞཁ͕͋Δɻ ͜ΕΒͷσόΠεΛramfsͰೝࣝ͞ΕΔͷΛ͙ /etc/initramfs-tools/modules to initramfs (ubuntu)
echo ‘pci_stub ids=10de:11b4,10de:0e0a’ >> /etc/initramfs-tools/modules sudo update-initramfs -u && sudo reboot
VFIO(2) ͜ΕΒͷσόΠε·ͨbootͷࡍʹೝࣝ͞Εͳ͍ Α͏ʹ͢Δɻͯ͢ͷboot upγʔέϯεͰཧσό ΠεͰͷೝࣝΛࢭ͢Δɻ /etc/modprobe.d/blacklist.conf ʹՃ: blacklist nvidia blacklist
nvidia-uvm NvidiaޓυϥΠόೝࣝͤ͞ͳ͍Α͏ʹՃ blacklist nouveau
ཧ͔ΒΓ͢ pci-stubΛ͔ͭͬͯཧσόΠεΑΓΓ͠VMͱ͚ͬͭ͘Δɻσ όΠεIDΛnew_idʹՃ͢Δɻ·ͨؔ࿈ࣝผࢠΛΓ͠ɺVMͱ ͚ͬͭ͘Δɻ echo 11de 11b4 > /sys/bus/pci/drivers/pci-stub/new_id echo
11de 0e0a > /sys/bus/pci/drivers/pci-stub/new_id echo 0000:88:00.0 > /sys/bus/pci/devices/0000:88:00.0/driver/unbind echo 0000:88:00.1 > /sys/bus/pci/devices/0000:88:00.1/driver/unbind echo 0000:88:00.0 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:88:00.1 > /sys/bus/pci/drivers/pci-stub/bind ཧϚγϯ͔ΒΓ͞Ε͔ͨɺ’claimed’ʹͳ͍ͬͯΔ͔ௐΔɻ pci-stub 0000:88:00.1: claimed by stub
modprobe /etc/modprobe.d/blacklist.conf pci-stab /sys/bus/pci/drivers/pci-stub/ /sys/bus/pci/devices/$(Identifier)/driver/unbind ramfs /etc/initramfs-tools/modules GRUB /etc/default/grub modules
/etc/modules UEFI/BIOS Vt-d ਤ:GPUύεεϧʔͰComputeNodeͷBOOTϓϩηε্Ͱઃఆ͢Δͷ B O O T ϓ ϩ η ε IOMMU IOMMU BLACK LIST BLACK LIST IOMMU BLACK LIST
ཧσόΠε pci-stub (ԾͰ͏) GPUϢχοτ (σόΠεશମ) GPUΛཧσόΠε͔ΒΓ͠ɺ ԾσόΠεʹ͚ସ͑Δɻ echo 11de 11b4
> /sys/bus/pci/drivers/pci-stub/new_id echo 11de 0e0a > /sys/bus/pci/drivers/pci-stub/new_id echo 0000:88:00.0 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:88:00.1 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:88:00.0 > /sys/bus/pci/devices/0000:88:00.0/driver/unbind echo 0000:88:00.1 > /sys/bus/pci/devices/0000:88:00.1/driver/unbind
͞ΒʹGPUΛՃ͑Δ(1) lspciͷ݁ՌΛ֬ೝɺ2ͭͷσόΠεID͕݁Ռͱͯ͠ݟ͑Δɺ ͜ͷݟ͑ํγεςϜʹґଘɻ lspci -nn | grep -i nvidia 88:00.0
VGA compatible controller [0300]: NVIDIA Corporation Device [10de:11b4] (rev a1) 88:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1) 84:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:11b4] (rev a1) 84:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)
͞ΒʹGPUΛՃ͑Δ(2) pci-stubΛར༻͠͞ΒʹσόΠεΛύεεϧʔ͢Δɻ echo 0000:84:00.0 > /sys/bus/pci/devices/0000:84:00.0/driver/unbind echo 0000:84:00.1 > /sys/bus/pci/devices/0000:84:00.1/driver/unbind
echo 0000:84:00.0 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:84:00.1 > /sys/bus/pci/drivers/pci-stub/bind CUDAΞϓϦΛ͏্ͰಉػछͷGPUͷՃ͕ඞਢ(͕ͪ ͏ͷෆՄ)ɺಉ͡ͷͰ͋Δ͔ฉ͔ΕΔ͜ͱ͕͋Δɻ /nbody -benchmark -numdevices=2 -num bodies=65536
͞ΒʹGPUΛՃ͑Δ(3) ύεεϧʔʹޭ͢ΔͱVM্Ͱಈ͍͍ͯΔͷ͕lspciʹͯ νΣοΫͰ͖Δɻ ubuntu@guestos$ lspci -nn | grep -i nvidia
00:07.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [Quadro K4200] [10de:11b4] (rev a1) 00:08.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [Quadro K4200] [10de:11b4] (rev a1)
Novaͷه(1) ComputeNodeͷwhitelist͕PCIύεεϧʔͱGPUͷ VMͷσϓϩΠϝϯτͰඞཁɻ /etc/nova/nova.conf ʹ pci_passthrough_whitelist Λه pci_passthrough_whitelist={"name":"K4200","vendor_id":"10de","product_id": "11b4"}
Novaͷه(2) ίϯτϩʔϥʔϊʔυͷnova aliasͷઃఆ͕ඞཁɺޙ ड़͢Δflavor-key Ͱར༻͢Δɻ to /etc/nova/nova.confʹ pci_aliasesΛه pci_alias={“name”:”K4200”,"vendor_id":"10de","product_id":"11b4"}
Novaͷه(3) ίϯτϩʔϥϊʔυʹͯPCIύεεϧʔϑΟϧλʔΛ novaʹՃɻ /etc/nova/nova.conf ʹΞϯμʔϥΠϯ෦Ճ scheduler_available_filters=nova.scheduler.filters.all_filters scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciP assthroughFilter scheduler_default_filters=DifferentHostFilter,RetryFilter,AvailabilityZoneFilter, RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,Imag
ePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,Aggre gateInstanceExtraSpecsFilter,PciPassthroughFilter
nova alias GPUΛ͏ͨΊʹflavor-keyΛه͢ΔɻϑϨʔόʹset໊ Ͱطଘflavorʹର͠pciύεεϧʔͷͰܾΊͨGPUͷalias ໊ɺ͍͍ͨGPUϢχοτΛՃ͢Δɻ ্هલड़·ͰͷաఔͰɺQuadro K4200Λ2ϢχοτՃ͍ͯ͠ Δ͜ͱ͕લఏͳͷͰɺҎԼͷΑ͏ͳهड़ʹͳΔɻ pci_aliasΛK4200ͱ͍ͯ͠ΔͨΊҎԼͷهड़ͱͳΔɻ nova
flavor-key $flavor_name set “pci_passthrough:alias”=“K4200:$amount_of_gpu”
طͷ
ΫϥυΠϝʔδͷ Πϝʔδ͕GPUΛ͏ʹͱͯখ͘͞qemu-imgͰΠϝʔδͷϦα ΠζΛͯ͠Δඞཁ͕༗Δɻ CUDA driverΛΠϯετʔϧ͢Δʹperl-packagesΛطଘͷΫϥυ Πϝʔδʹ͍ΕͯΔඞཁ͕͋Δɻ CUDAυϥΠό.deb or .rpMύοέʔδͰόΠφϦϑΝΠϧͰͳ͘ɺΠϯε τʔϧ࣌ʹιʔεΛmakeͰΠϯετʔϧ͢ΔܗΛͱ͍ͬͯΔ=.runϑΝΠϧͱม
ΘΒͣɻ NvidiaকདྷͷϦϦʔεͰspecϑΝΠϧʹperlؔ࿈ϑΝΠϧͷهΛ͢ΔܗͰfix ͢Δ༧ఆɻ 7.6Ҏ߱Ͱగਖ਼ͯ͘͠ΕΔɺͱ͍͍͕ͬͯͨ…(·ͩςετͯ͠ͳ͍ɻʣ
Windows as VDI Windows্ͷCUDAͪΌΜͱΠϯετʔϧͰ͖Εૣ ͘ͳΔ͚Ͳ࣌ͨ·ΧΫΧΫͯ͠͠·͏ɻ vmͷdisk speedͷɺΤϑΣϝϥϧʹ͢ΔͳΓૣ͍σΟεΫΛ ͏ͳΓͰͦͦ͜͜ղܾ͢Δ(SSD,NVMe or..etc) vmcontext
switchͰr/wΛߦ͍ͬͯΔͷͰϔϏʔϫʔΫϩʔυ ͷCUDAͳΓ͜ͷΑ͏ͳΧΫΧΫঢ়ଶΛى͜͢Մೳੑ͕͋Δɻ ͜ͷ͋ͱޙʹ·͔͕ͤͨɺগ͠ΤϑΣϝϥϧͰվળͨ͠Β͍͠ ސ٬·ɺͳΜͱ͔ͳͬͨͱͱΓ͋͑ͣͷຬΒͬͨɻ
ϥΠϒϚΠάϨʔγϣϯ ϥΠϒϚΠΫϨʔγϣϯͰ͖ͳ͍ɺvm͕ݹ͍ϗετͷଓ ใΛΓͤͳ͍ɻݹ͍ϗετͷใΛѲͬͨ··ʹͳΔɻ ϫʔΫΞϥϯυ:nova.pci_devicesͷMySQLͷDBใΛফ͠ ͯɺݹ͍ϗετΛ࠶ىಈ͢Δʼҙຯͳ͠ʂ | 2016-08-11 00:54:45 | 2016-08-19
04:58:01 | NULL | 0 | 45 | 21 | 0000:84:00.0 | 11b4 | 10de | type-PCI | pci_0000_84_00_0 | label_10de_11b4 | available | {} | NULL | NULL | 1 | <<-- old-host | 2016-08-11 00:54:45 | 2016-08-19 04:58:01 | NULL | 0 | 48 | 21 | 0000:88:00.0 | 11b4 | 10de | type-PCI | pci_0000_88_00_0 | label_10de_11b4 | available | {} | NULL | NULL | 1 | <<-- old-host
·ͱΊ OpenStackͷಛघར༻͕Ͱ͖Δ͔Ͳ͏͔ͷOS࣍ୈ Ͱ͢ɻ ͷOS͕μϝͳͷʹOpenStack͕..ͱ͍͏ͷΠέͯͳ͍ਓ͕͍ ͏͓ݴ༿… αʔόܥOSҙ֎ʹ͍Ζ͍ΖΠέͯ·ͤΜ…͋Γ͋ΓͰ͢ɻ GPUͷར༻ࠓ͜ͷύεεϧʔ͕ݱߦͰ͕͢ɺNvidia Intel࣍ୈ͔ͱ(AMD…Ͳ͏͢ΔΜͩΖ͏…)
Thank you Masafumi Ohta @masafumiohta