I Want to Boot OSv on BHyVe: How to Survive from Here On Without a BIOS
Takuya ASADA
December 08, 2013
Transcript
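
I Want to Boot OSv on BHyVe: How to Survive from Here On Without a BIOS
@syuu1228

Recap so far
• I've been hacking on BHyVe
• I've started hacking on OSv
• I want to boot OSv on BHyVe
• It won't boot because there is no BIOS! (the usual story)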
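
What is OSv?
• An OS specialized for efficiently running a specific application (Java) in a virtualized environment
• Developed by Cloudius Systems; the CTO is Avi Kivity, a member of Qumranet, which developed KVM
• Runs on KVM and Xen, so it can be deployed to Amazon EC2
• BSD license
• http://osv.io/
• https://github.com/cloudius-systems/osv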
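
What is BHyVe?
• Something like Linux KVM for FreeBSD
• Both the kernel-side driver and the userland programs are developed in the FreeBSD base tree
• A hypervisor that uses Intel VT
• Unlike KVM, the userland side …
• Bundled with FreeBSD 10.0-RELEASE
• FreeBSD, Linux, and OpenBSD can boot as guest OSes (x86_64 builds only)
• http://bhyve.org
• For details, please read my series in Software Design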

Read OSv's boot code and write an OS loader for BHyVe
• OSv's boot code is around here:
• https://github.com/cloudius-systems/osv/blob/master/arch/x64/boot16.S
• https://github.com/cloudius-systems/osv/blob/master/arch/x64/boot32.S
• The BHyVe OS loader I wrote is here:
• https://github.com/syuu1228/bhyveosvload
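
What is direct boot of a guest OS?
• On KVM: qemu -kernel vmlinuz -initrd initrd.img
• On BHyVe: the bhyveload command
• It runs on the host OS, loads the FreeBSD kernel into the guest machine, and initializes the registers and page tables
• When the bhyve command is run, the guest machine starts executing from the 64-bit entry point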
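
The traditional boot process using a BIOS
• The BIOS loads the boot sector from the MBR
• The boot sector uses BIOS calls to load and run the boot loader
• The boot loader initializes the page tables and registers
• The boot loader finds the kernel on the file system and reads it in (using BIOS calls for I/O)
• The boot loader switches the CPU to 64-bit mode and jumps to the kernel's 64-bit entry point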
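
The boot process with direct boot
• The user runs the guest OS loader on the host OS
• The guest OS loader initializes the guest machine's page tables and registers
• The guest OS loader finds the kernel in the guest OS disk image and reads it in
• The guest OS loader switches the virtual CPU to 64-bit mode and jumps to the kernel's 64-bit entry point

How do we actually implement it? (1)
• Printing strings to the console (INT 10h in a boot loader)
    printf()
• Reading the disk (INT 13h in a boot loader)
    fd = open(disk_image)
    read(fd, buf, len)
• Writing to memory
    ctx = vm_open(vm_name)
    ptr = vm_map_gpa(ctx, offset, len)
    memcpy(ptr, data, len)

How do we actually implement it? (2)
• Writing registers (other than segment registers)
    ctx = vm_open(vm_name)
    vm_set_register(ctx, cpuno, VM_REG_GUEST_RFLAGS, val)
• Writing registers (segment registers)
    ctx = vm_open(vm_name)
    vm_set_desc(ctx, cpuno, VM_REG_GUEST_CS, base, limit, access)
    vm_set_register(ctx, cpuno, VM_REG_GUEST_CS, selector)

Starting the translation of boot16.S
• https://github.com/cloudius-systems/osv/blob/master/arch/x64/boot16.S
• Lives in the boot sector on the MBR
• What it does:
    • Loads the kernel arguments from disk
    • Loads the kernel ELF binary from disk
    • Gets the memory map from the BIOS
    • Enters the kernel's 32-bit entry point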
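
disk image layout
• Boot time is shortened by not using a general-purpose boot loader
• The kernel arguments and the ELF binary are written directly to the disk
• The boot loader jumps to the ELF binary's entry point on its own
• The kernel arguments and memory size information are placed in memory and passed to the kernel in a multiboot-compatible format

Layout (start sector: contents)
    0: MBR (boot16.S)
    1: cmdline
    64: blank?
    128: loader.elf
    262144: bootfs (bootfs.manifest), then ZFS (usr.manifest), up to nsectors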

Translating the boot code (1): cmdline load
    cmdline = 0x7e00

    mb_info = 0x1000
    mb_cmdline = (mb_info + 16)

    int1342_boot_struct:        # for command line; this is the DAP
        .byte 0x10              # size of DAP
        .byte 0                 # unused
        .short 0x3f             # 31.5k; number of sectors to be read
        .short cmdline          # segment:offset pointer to the memory buffer (offset part)
        .short 0                # (segment part)
        .quad 1                 # absolute number of the start of the sectors to be read

    init:
        xor %ax, %ax
        mov %ax, %ds            # DS = 0

        lea int1342_boot_struct, %si    # DS:SI points to the DAP
        mov $0x42, %ah
        mov $0x80, %dl
        int $0x13               # INT 13h AH=42h: Extended Read Sectors From Drive
        movl $cmdline, mb_cmdline       # store 0x7e00 in mb_info->mb_cmdline
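
Translating the boot code (1): cmdline load
    char *cmdline;
    struct multiboot_info_type *mb_info;

    /* Map all 0x3f sectors that are about to be read. */
    cmdline = vm_map_gpa(ctx, 0x7e00, 0x3f * 512);
    /* Read 0x3f sectors starting at sector 1 of the disk image. */
    pread(disk_fd, cmdline, 0x3f * 512, 1 * 512);

    mb_info = vm_map_gpa(ctx, 0x1000, sizeof(*mb_info));
    mb_info->cmdline = 0x7e00;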
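
The C translation above amounts to one positioned read from the disk image into guest memory. The following is a minimal sketch of that idea, assuming guest memory is an ordinary host buffer standing in for the pointer that vm_map_gpa() would return. The function name `load_cmdline` and the `CMDLINE_*` constants are hypothetical and are not taken from bhyveosvload.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE   512
#define CMDLINE_LBA   1       /* the sector right after the MBR */
#define CMDLINE_NSECT 0x3f    /* 63 sectors = 31.5 KiB, as in boot16.S */
#define CMDLINE_GPA   0x7e00  /* guest-physical address used by boot16.S */
#define CMDLINE_LEN   (CMDLINE_NSECT * SECTOR_SIZE)

/* boot16.S reads into segment 0, so the buffer must end below 64 KiB. */
static_assert(CMDLINE_GPA + CMDLINE_LEN <= 0x10000, "cmdline buffer too big");

/*
 * Copy the kernel command line from the disk image into guest memory.
 * guest_mem is a host buffer standing in for guest-physical address 0
 * (an assumption of this sketch). Returns 0 on success, -1 on failure.
 */
int load_cmdline(const char *image_path, void *guest_mem, size_t guest_len)
{
    char *dst = (char *)guest_mem + CMDLINE_GPA;
    ssize_t n;
    int fd;

    if (guest_len < CMDLINE_GPA + CMDLINE_LEN)
        return -1;
    fd = open(image_path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    n = pread(fd, dst, CMDLINE_LEN, (off_t)CMDLINE_LBA * SECTOR_SIZE);
    close(fd);
    if (n < 0) {
        perror("pread");
        return -1;
    }
    /* A short image returns fewer bytes; zero-fill the remainder. */
    memset(dst + n, 0, CMDLINE_LEN - (size_t)n);
    /* Guarantee NUL termination even if the area is completely full. */
    dst[CMDLINE_LEN - 1] = '\0';
    return 0;
}
```

With libvmmapi the destination would be the mapping returned by vm_map_gpa() rather than a plain buffer; the offset arithmetic stays the same.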

Translating the boot code (2): kernel load
    tmp = 0x80000
    count32: .short 4096        # in 32k units, 4096=128MB
    int1342_struct:
        .byte 0x10
        .byte 0
        .short 0x40             # 32k
        .short 0
        .short tmp / 16
    lba:
        .quad 128

    read_disk:
        lea int1342_struct, %si
        mov $0x42, %ah
        mov $0x80, %dl
        int $0x13
        jc done_disk
        cli
        lgdtw gdt
        mov $0x11, %ax
        lmsw %ax
        ljmp $8, $1f
    1:
        .code32
        mov $0x10, %ax
        mov %eax, %ds
        mov %eax, %es
        mov $tmp, %esi
        mov xfer, %edi
        mov $0x8000, %ecx
        rep movsb
        mov %edi, xfer
        mov $0x20, %al
        mov %eax, %ds
        mov %eax, %es
        ljmpw $0x18, $1f
    1:
        .code16
        mov $0x10, %eax
        mov %eax, %cr0
        ljmpw $0, $1f
    1:
        xor %ax, %ax
        mov %ax, %ds
        mov %ax, %es
        sti
        addl $(0x8000 / 0x200), lba
        decw count32
        jnz read_disk           # loop count32 times
    done_disk:
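
Translating the boot code (2): kernel load
    char *target;

    /* Map the full 128 MB that is about to be read. */
    target = vm_map_gpa(ctx, 0x200000, 0x40 * 4096 * 512);
    /* 4096 chunks of 0x40 sectors, starting at sector 128 of the disk image. */
    pread(disk_fd, target, 0x40 * 4096 * 512, 128 * 512);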
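
The single pread() above replaces a loop in boot16.S that reads 32 KiB at a time. The arithmetic that makes the two equivalent can be checked with a short sketch; the macro and function names below are hypothetical, and only the numbers come from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE   512u
#define KERNEL_LBA    128u   /* lba: .quad 128 */
#define CHUNK_SECTORS 0x40u  /* one INT 13h read: 64 sectors */
#define CHUNK_COUNT   4096u  /* count32 */

/* One chunk is 32 KiB, the 0x8000 bytes that "rep movsb" copies. */
static_assert(CHUNK_SECTORS * SECTOR_SIZE == 0x8000, "chunk is 32 KiB");
/* The loop advances lba by 0x8000 / 0x200 = 64 sectors per pass. */
static_assert(0x8000 / 0x200 == CHUNK_SECTORS, "lba step matches chunk");
/* In total the loop covers 262144 sectors. */
static_assert(CHUNK_COUNT * CHUNK_SECTORS == 262144, "total sectors");

/* Byte offset in the disk image of the n-th 32 KiB chunk. */
uint64_t chunk_offset(uint32_t n)
{
    return ((uint64_t)KERNEL_LBA + (uint64_t)n * CHUNK_SECTORS) * SECTOR_SIZE;
}

/* Total number of bytes the loop (or the single pread) transfers. */
uint64_t kernel_read_bytes(void)
{
    return (uint64_t)CHUNK_COUNT * CHUNK_SECTORS * SECTOR_SIZE;
}
```

Here `chunk_offset(0)` equals the `128 * 512` offset passed to pread(), and `kernel_read_bytes()` equals the `0x40 * 4096 * 512` length.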

Translating the boot code (3): e820
    mb_info = 0x1000
    mb_mmap_len = (mb_info + 44)
    mb_mmap_addr = (mb_info + 48)
    e820data = 0x2000

        mov $e820data, %edi         # ES:DI buffer pointer
        mov %edi, mb_mmap_addr      # store 0x2000 in mb_info->mb_mmap_addr
        xor %ebx, %ebx              # continuation
    more_e820:
        mov $100, %ecx              # buffer size
        mov $0x534d4150, %edx       # signature 'SMAP'
        mov $0xe820, %ax
        add $4, %edi
        int $0x15                   # INT 15h, AX=E820h: Query System Address Map
        jc done_e820
        mov %ecx, -4(%edi)
        add %ecx, %edi
        test %ebx, %ebx
        jnz more_e820
    done_e820:
        sub $e820data, %edi
        mov %edi, mb_mmap_len       # store the size of e820data in mb_info->mb_mmap_len

Translating the boot code (3): e820
    struct e820ent *e820data;

    e820data = vm_map_gpa(ctx, 0x1100, sizeof(struct e820ent) * 3);
    /* Conventional memory below 1 MB */
    e820data[0].ent_size = 20;
    e820data[0].addr = 0x0;
    e820data[0].size = 654336;
    e820data[0].type = 1;
    /* Everything from 1 MB up to the guest memory size */
    e820data[1].ent_size = 20;
    e820data[1].addr = 0x100000;
    e820data[1].size = mem_size - 0x100000;
    e820data[1].type = 1;
    /* Empty third entry */
    e820data[2].ent_size = 20;
    e820data[2].addr = 0;
    e820data[2].size = 0;
    e820data[2].type = 0;

    mb_info->mmap_addr = 0x1100;
    mb_info->mmap_length = sizeof(struct e820ent) * 3;
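
Since there is no BIOS to ask, the memory map is simply written by hand. A self-contained sketch of the entry layout follows, assuming the packed 24-byte record implied by `ent_size = 20` (a 4-byte size field followed by base, length, and type). The struct definition and `build_e820` are illustrative, not the loader's actual code, and the sketch writes only the two usable regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One multiboot-style memory map entry: a 4-byte size field followed by
 * the 20-byte E820 record. Packed so that no padding is inserted after
 * ent_size (layout assumed from ent_size = 20 on the slide).
 */
struct e820ent {
    uint32_t ent_size;  /* size of the record that follows (20) */
    uint64_t addr;      /* start of the region */
    uint64_t size;      /* length of the region in bytes */
    uint32_t type;      /* 1 = usable RAM */
} __attribute__((packed));

static_assert(sizeof(struct e820ent) == 24, "entry must be 24 bytes");

#define E820_RAM 1

/*
 * Fill map[] with the two usable regions from the slide: the first
 * 654336 bytes, and everything from 1 MiB up to mem_size.
 * Returns the number of bytes written (the value for mmap_length),
 * or 0 if the arguments are unusable.
 */
size_t build_e820(struct e820ent *map, size_t nent, uint64_t mem_size)
{
    if (nent < 2 || mem_size <= 0x100000)
        return 0;
    map[0] = (struct e820ent){ 20, 0x0, 654336, E820_RAM };
    map[1] = (struct e820ent){ 20, 0x100000, mem_size - 0x100000, E820_RAM };
    return 2 * sizeof(struct e820ent);
}
```

Two such entries occupy 48 bytes, which matches the `sizeof e820data=48` line in the run log later in the deck; whether the real loader arrives at that figure the same way is an assumption.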

Translating the boot code (4): entry to protected mode
    cmdline = 0x7e00
    target = 0x200000
    entry = 24+target
    mb_info = 0x1000

        ljmp $8, $1f
    1:
        .code32
        mov $0x10, %ax
        mov %eax, %ds
        mov %eax, %es
        mov %eax, %gs
        mov %eax, %fs
        mov %eax, %ss
        mov $target, %eax       # set eax to 0x200000
        mov $mb_info, %ebx      # set ebx to 0x1000
        jmp *entry              # ignored: we do not intend to run the 32-bit protected mode code
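
Translating the boot code (4): entry to protected mode
    /* vmm.h names the full 64-bit registers; EAX and EBX are their low halves. */
    vm_set_register(ctx, 0, VM_REG_GUEST_RAX, 0x200000);
    vm_set_register(ctx, 0, VM_REG_GUEST_RBX, 0x1000);

Starting the translation of boot32.S
• https://github.com/cloudius-systems/osv/blob/master/arch/x64/boot32.S
• Lives at the kernel's 32-bit entry point
• What it does:
    • Prepares the GDT, page tables, and so on, then switches to long mode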

Translating the boot code (5): GDT initialization
    gdt_desc:
        .short gdt_end - gdt - 1
        .long gdt

    .align 8
    gdt = . - 8
        .quad 0x00af9b000000ffff # 64-bit code segment
        .quad 0x00cf93000000ffff # 64-bit data segment
        .quad 0x00cf9b000000ffff # 32-bit code segment
    gdt_end = .

        lgdt gdt_desc

Translating the boot code (5): GDT initialization
    /* Place the GDT in some region that looks free */
    uint64_t *gdtr = vm_map_gpa(ctx, 0x5000, sizeof(uint64_t) * 4);
    gdtr[0] = 0x0;
    gdtr[1] = 0x00af9b000000ffff;
    gdtr[2] = 0x00cf93000000ffff;
    gdtr[3] = 0x00cf9b000000ffff;
    /* The descriptor base is the guest-physical address, not the host pointer. */
    vm_set_desc(ctx, 0, VM_REG_GUEST_GDTR, 0x5000, sizeof(uint64_t) * 4 - 1, 0);
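
The three magic numbers in the GDT are ordinary x86 segment descriptors. As a way to see what each one encodes, here is a small decoder sketch; `struct seg_desc`, `decode_desc`, and `selector_for_index` are hypothetical helper names, while the bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* The fields packed into one 8-byte segment descriptor. */
struct seg_desc {
    uint32_t base;         /* 32-bit base address */
    uint32_t limit;        /* raw 20-bit limit */
    uint8_t  access;       /* P, DPL, S and type bits */
    bool     long_mode;    /* L: 64-bit code segment */
    bool     default_32;   /* D/B: 32-bit default operand size */
    bool     granular_4k;  /* G: limit is counted in 4 KiB units */
};

struct seg_desc decode_desc(uint64_t d)
{
    struct seg_desc s;

    /* limit[15:0] is in bits 0-15, limit[19:16] in bits 48-51. */
    s.limit = (uint32_t)(d & 0xffff) | (uint32_t)((d >> 32) & 0xf0000);
    /* base[23:0] is in bits 16-39, base[31:24] in bits 56-63. */
    s.base = (uint32_t)((d >> 16) & 0xffffff) |
             (uint32_t)((d >> 32) & 0xff000000);
    s.access = (uint8_t)(d >> 40);
    s.long_mode = (d >> 53) & 1;
    s.default_32 = (d >> 54) & 1;
    s.granular_4k = (d >> 55) & 1;
    return s;
}

/* Selector for GDT slot `index` with RPL 0 and TI = 0. */
uint16_t selector_for_index(unsigned index)
{
    return (uint16_t)(index << 3);
}
```

Decoding `0x00af9b000000ffff` gives a code segment with L set and D/B clear, which is what the far jump to selector 8 relies on when entering 64-bit code; the third entry, `0x00cf9b000000ffff`, has D/B set instead.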

Translating the boot code (6): page table initialization
    .data
    .align 4096
    ident_pt_l4:
        .quad ident_pt_l3 + 0x67
        .rept 511
        .quad 0
        .endr
    ident_pt_l3:
        .quad ident_pt_l2 + 0x67
        .rept 511
        .quad 0
        .endr
    ident_pt_l2:
        index = 0
        .rept 512
        .quad (index << 21) + 0x1e7
        index = index + 1
        .endr

        lea ident_pt_l4, %eax
        mov %eax, %cr3

Translating the boot code (6): page table initialization
    uint64_t *PT4;
    uint64_t *PT3;
    uint64_t *PT2;
    int i;

    /* Place PT4 through PT2 in some regions that look free */
    PT4 = vm_map_gpa(ctx, 0x4000, sizeof(uint64_t) * 512);
    PT3 = vm_map_gpa(ctx, 0x3000, sizeof(uint64_t) * 512);
    PT2 = vm_map_gpa(ctx, 0x2000, sizeof(uint64_t) * 512);
    for (i = 0; i < 512; i++) {
        /* Unlike boot32.S, every PT4 and PT3 entry points at the next-level table. */
        PT4[i] = (uint64_t) ADDR_PT3;
        PT4[i] |= PG_V | PG_RW | PG_U;
        PT3[i] = (uint64_t) ADDR_PT2;
        PT3[i] |= PG_V | PG_RW | PG_U;
        /* Identity-map 2 MB page number i. */
        PT2[i] = i * (2 * 1024 * 1024);
        PT2[i] |= PG_V | PG_RW | PG_PS | PG_U;
    }
    vm_set_register(ctx, 0, VM_REG_GUEST_CR3, 0x4000);
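
To check that tables like these really identity-map the first 1 GiB, one can build them in a host buffer and walk them by hand. The sketch below assumes the buffer stands in for guest-physical memory starting at address 0, takes the `ADDR_PT*` values from the vm_map_gpa() calls above, and, like boot32.S, fills only entry 0 of the top two levels. `build_ident_pt`, `translate`, `table_at`, and `NOT_MAPPED` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Page-table entry bits: names as on the slide, architectural values. */
#define PG_V   0x001ULL  /* present */
#define PG_RW  0x002ULL  /* writable */
#define PG_U   0x004ULL  /* user accessible */
#define PG_PS  0x080ULL  /* 2 MiB page (in a level-2 entry) */

/* Guest-physical addresses of the tables (assumed from the slide). */
#define ADDR_PT4 0x4000ULL
#define ADDR_PT3 0x3000ULL
#define ADDR_PT2 0x2000ULL

#define NOT_MAPPED UINT64_MAX

/* View the table at guest-physical address gpa inside the host buffer. */
static uint64_t *table_at(void *guest_mem, uint64_t gpa)
{
    return (uint64_t *)((char *)guest_mem + gpa);
}

/* Identity-map the first 1 GiB with 512 pages of 2 MiB each. */
void build_ident_pt(void *guest_mem)
{
    uint64_t *pt4 = table_at(guest_mem, ADDR_PT4);
    uint64_t *pt3 = table_at(guest_mem, ADDR_PT3);
    uint64_t *pt2 = table_at(guest_mem, ADDR_PT2);
    int i;

    memset(pt4, 0, 512 * sizeof(uint64_t));
    memset(pt3, 0, 512 * sizeof(uint64_t));
    pt4[0] = ADDR_PT3 | PG_V | PG_RW | PG_U;
    pt3[0] = ADDR_PT2 | PG_V | PG_RW | PG_U;
    for (i = 0; i < 512; i++)
        pt2[i] = ((uint64_t)i << 21) | PG_V | PG_RW | PG_U | PG_PS;
}

/* Walk the three levels the way the MMU would; 2 MiB mappings only. */
uint64_t translate(void *guest_mem, uint64_t cr3, uint64_t va)
{
    uint64_t e;

    e = table_at(guest_mem, cr3)[(va >> 39) & 0x1ff];
    if (!(e & PG_V))
        return NOT_MAPPED;
    e = table_at(guest_mem, e & 0x000ffffffffff000ULL)[(va >> 30) & 0x1ff];
    if (!(e & PG_V))
        return NOT_MAPPED;
    e = table_at(guest_mem, e & 0x000ffffffffff000ULL)[(va >> 21) & 0x1ff];
    if (!(e & PG_V) || !(e & PG_PS))
        return NOT_MAPPED;
    return (e & 0x000fffffffe00000ULL) | (va & 0x1fffffULL);
}
```

After `build_ident_pt(mem)`, calling `translate(mem, ADDR_PT4, va)` returns `va` unchanged for any address below 1 GiB and `NOT_MAPPED` above it.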

Translating the boot code (7): setting up the registers
    #define BOOT_CR0 ( X86_CR0_PE \
                     | X86_CR0_WP \
                     | X86_CR0_PG )

    #define BOOT_CR4 ( X86_CR4_DE \
                     | X86_CR4_PSE \
                     | X86_CR4_PAE \
                     | X86_CR4_PGE \
                     | X86_CR4_PCE \
                     | X86_CR4_OSFXSR \
                     | X86_CR4_OSXMMEXCPT )

        and $~7, %esp
        mov $BOOT_CR4, %eax
        mov %eax, %cr4          # enable PAE and so on
        mov $0xc0000080, %ecx
        mov $0x00000900, %eax
        xor %edx, %edx
        wrmsr                   # set the LME flag in EFER
        mov $BOOT_CR0, %eax
        mov %eax, %cr0          # enable PE, PG and so on
        ljmpl $8, $start64
    .code64
    .global start64
    start64:

Translating the boot code (7): setting up the registers
    vm_set_register(ctx, 0, VM_REG_GUEST_RSP, ADDR_STACK);
    vm_set_register(ctx, 0, VM_REG_GUEST_EFER, 0x00000d00);
    vm_set_register(ctx, 0, VM_REG_GUEST_CR4, 0x000007b8);
    vm_set_register(ctx, 0, VM_REG_GUEST_CR0, 0x80010001);
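
The constants passed to vm_set_register() can be derived from the flag names in the assembly. The sketch below spells out the architectural bit positions and checks the arithmetic. The `EFER_*` names and `efer_for_direct_boot` are hypothetical. Note that the assembly writes 0x900 to EFER while the C code uses 0xd00: the difference is the LMA bit, which the CPU normally sets itself once paging is enabled with LME set, so a loader that writes the state directly presumably has to set it by hand.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bits */
#define X86_CR0_PE (1u << 0)   /* protected mode enable */
#define X86_CR0_WP (1u << 16)  /* write protect */
#define X86_CR0_PG (1u << 31)  /* paging */

/* CR4 bits */
#define X86_CR4_DE         (1u << 3)
#define X86_CR4_PSE        (1u << 4)
#define X86_CR4_PAE        (1u << 5)
#define X86_CR4_PGE        (1u << 7)
#define X86_CR4_PCE        (1u << 8)
#define X86_CR4_OSFXSR     (1u << 9)
#define X86_CR4_OSXMMEXCPT (1u << 10)

/* EFER bits (MSR 0xc0000080) */
#define EFER_LME (1u << 8)   /* long mode enable */
#define EFER_LMA (1u << 10)  /* long mode active */
#define EFER_NXE (1u << 11)  /* no-execute enable */

#define BOOT_CR0 (X86_CR0_PE | X86_CR0_WP | X86_CR0_PG)
#define BOOT_CR4 (X86_CR4_DE | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | \
                  X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT)

static_assert(BOOT_CR0 == 0x80010001u, "matches the CR0 value on the slide");
static_assert(BOOT_CR4 == 0x000007b8u, "matches the CR4 value on the slide");
static_assert((EFER_LME | EFER_NXE) == 0x900u, "value written by wrmsr");

/*
 * Given the EFER value the boot code would write with wrmsr, return the
 * value to install directly: LMA is added whenever LME is requested.
 */
uint64_t efer_for_direct_boot(uint64_t wrmsr_value)
{
    if (wrmsr_value & EFER_LME)
        return wrmsr_value | EFER_LMA;
    return wrmsr_value;
}
```

`efer_for_direct_boot(0x900)` yields 0xd00, the value used on the slide.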

Translating the boot code (8): 64-bit entry point
    #define BOOT_CR0 ( X86_CR0_PE \
                     | X86_CR0_WP \
                     | X86_CR0_PG )

    #define BOOT_CR4 ( X86_CR4_DE \
                     | X86_CR4_PSE \
                     | X86_CR4_PAE \
                     | X86_CR4_PGE \
                     | X86_CR4_PCE \
                     | X86_CR4_OSFXSR \
                     | X86_CR4_OSXMMEXCPT )

        and $~7, %esp
        mov $BOOT_CR4, %eax
        mov %eax, %cr4
        mov $0xc0000080, %ecx
        mov $0x00000900, %eax
        xor %edx, %edx
        wrmsr
        mov $BOOT_CR0, %eax
        mov %eax, %cr0
        ljmpl $8, $start64
    .code64
    .global start64             # we want RIP to point here
    start64:

Uh-oh…
• This address is not pinned down by the linker or anything … what now …

All right, Papa's going to write an ELF parser!
• Implemented function name → address lookup code using elf(3) and gelf(3)
    int elfparse_open_memory(char *image, size_t size, struct elfparse *ep);
    int elfparse_close(struct elfparse *ep);
    uintmax_t elfparse_resolve_symbol(struct elfparse *ep, char *name);
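
Translating the boot code (8): 64-bit entry point
    struct elfparse ep;
    uint64_t start64;

    /* Bail out if the image cannot be parsed (assumes nonzero means failure). */
    if (elfparse_open_memory(target, 0x40 * 4096 * 512, &ep))
        return -1;
    start64 = elfparse_resolve_symbol(&ep, "start64");
    vm_set_register(ctx, 0, VM_REG_GUEST_RIP, start64);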
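
The slides implement the name-to-address lookup on top of elf(3) and gelf(3). As an illustration of what such a lookup involves, here is a sketch that uses only the structures from `<elf.h>` and walks the symbol table of an ELF64 image held in memory. It is a swapped-in technique, not the elfparse implementation; `elf64_resolve_symbol` and `in_bounds` are hypothetical names.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if [off, off + len) lies inside an image of `size` bytes. */
static int in_bounds(size_t size, uint64_t off, uint64_t len)
{
    return off <= size && len <= size - off;
}

/*
 * Look up `name` in the symbol table of an ELF64 image held in memory.
 * Returns the symbol's st_value, or 0 if the image is malformed or the
 * symbol is not present.
 */
uint64_t elf64_resolve_symbol(const void *image, size_t size, const char *name)
{
    const unsigned char *base = image;
    Elf64_Ehdr eh;
    unsigned i;

    if (size < sizeof(eh))
        return 0;
    memcpy(&eh, base, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        !in_bounds(size, eh.e_shoff, (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr)))
        return 0;

    for (i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh, str;
        uint64_t off;

        memcpy(&sh, base + eh.e_shoff + i * sizeof(sh), sizeof(sh));
        if (sh.sh_type != SHT_SYMTAB || sh.sh_link >= eh.e_shnum)
            continue;
        /* sh_link of a symbol table names its string table section. */
        memcpy(&str, base + eh.e_shoff + sh.sh_link * sizeof(str), sizeof(str));
        if (!in_bounds(size, sh.sh_offset, sh.sh_size) ||
            !in_bounds(size, str.sh_offset, str.sh_size))
            continue;

        for (off = 0; off + sizeof(Elf64_Sym) <= sh.sh_size;
             off += sizeof(Elf64_Sym)) {
            Elf64_Sym sym;
            const char *s;
            size_t max;

            memcpy(&sym, base + sh.sh_offset + off, sizeof(sym));
            if (sym.st_name >= str.sh_size)
                continue;
            s = (const char *)base + str.sh_offset + sym.st_name;
            max = str.sh_size - sym.st_name;
            /* Only compare names that are NUL-terminated in the table. */
            if (memchr(s, '\0', max) != NULL && strcmp(s, name) == 0)
                return sym.st_value;
        }
    }
    return 0;
}
```

Called on the loaded kernel image with the name "start64", a lookup like this would yield the address to put into RIP.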

!
    # /usr/local/sbin/bhyveosvload -m 1024 -d ../loader.img osv0
    sizeof e820data=48
    cmdline=java.so -jar /usr/mgmt/web-1.0.0.jar app prod
    start64:0x208f13
    ident_pt_l4:0x8d5000
    gdt_desc:0x8d8000
    # /usr/sbin/bhyve -c 1 -m 1024 -AI -H -P -g 0 -s 0:0,hostbridge -s 1:0,virtio-net,tap0 -s 2:0,virtio-blk,../loader.img -S 31,uart,stdio osv0
    ACPI: RSDP 0xf0400 00024 (v02 BHYVE )
    ACPI: XSDT 0xf0480 00034 (v01 BHYVE BVXSDT 00000001 INTL 20130823)
    ACPI: APIC 0xf0500 0004A (v01 BHYVE BVMADT 00000001 INTL 20130823)
    ACPI: FACP 0xf0600 0010C (v05 BHYVE BVFACP 00000001 INTL 20130823)
    ACPI: DSDT 0xf0800 000F2 (v02 BHYVE BVDSDT 00000001 INTL 20130823)
    ACPI: FACS 0xf0780 00040
    Assertion failed: st == AE_OK (../../drivers/hpet.cc: hpet_init: 171)
    Aborted
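
Demo

Summary
• If you read a boot loader carefully, it can be translated into a guest OS loader that runs on the host OS and is implemented in simple C code
• We did not need a BIOS after all
• We did not need UEFI after all
• Given bindings for libvmmapi, it could even be implemented in a scripting language
• In fact, a general-purpose OS loader is also being developed: https://github.com/grehan-freebsd/grub2-bhyve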