
Modern Vulkan (いまどきのVulkan)

Fadis
November 20, 2021


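This deck explains the basics of the 3D graphics API Vulkan and the features that have recently become available in it.
These are the slides from a talk given on November 20, 2021 at カーネル/VM探検隊 (Kernel/VM Tankentai) online part4.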

Video: https://youtu.be/CIfezfwbA3g
Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4



Transcript

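  1. The GeForce RTX 3080 case

    Figure labels: float x32, Tensor Core, load/store, dispatch, instruction cache, register bank, ALU, Warp (Subgroup). The processor has none of the complex dependency checking that superscalar execution needs, so the transistor count of a single one of these processors can be kept small.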
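  2. Sending data to GPU memory

    Figure labels: 0x1000, IOMMU, 0x5000, MMU, 0x4000, 0x1000. This copy can be done with memcpy. This region can be allocated with malloc. Allocating this region requires a dedicated API. Allocating this region also requires a dedicated API. Performing this copy requires a dedicated API.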
  3. Query the available memory with vkGetPhysicalDeviceMemoryProperties

    "memory_props": { "basic": { "memoryHeaps": [ { "flags": 1, "size": 8589934592 }, { "flags": 0, "size": 12528737280 }, { "flags": 1, "size": 257949696 } ], "memoryTypes": [ { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 1, "propertyFlags": 0 }, { "heapIndex": 0, "propertyFlags": 1 }, { "heapIndex": 1, "propertyFlags": 6 }, { "heapIndex": 1, "propertyFlags": 14 }, { "heapIndex": 2, "propertyFlags": 7 } ] }} GPU memory has two independent heaps (the heaps with flags 1). CPU memory has one independent heap (the heap with flags 0).
  4. Query the available memory with vkGetPhysicalDeviceMemoryProperties

    (Same memory_props JSON as on slide 3.) Memory types: they describe which kinds of memory behavior can be allocated. This part of the list is special-purpose, so ignore it for now.
  5. Query the available memory with vkGetPhysicalDeviceMemoryProperties

    (Same memory_props JSON as on slide 3.) The last four memory types mean, in order: { heapIndex 0, propertyFlags 1 }: a region in GPU memory that is visible only from the GPU; { heapIndex 1, propertyFlags 6 }: a region in CPU memory that is visible from the CPU and not cached by the CPU; { heapIndex 1, propertyFlags 14 }: a region in CPU memory that is visible from the CPU and cached by the CPU; { heapIndex 2, propertyFlags 7 }: a region in GPU memory that is visible from the CPU and not cached by the CPU.
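The memory type table above is usually consumed by scanning for the first type that has every property bit the application needs. The following is a minimal sketch of that scan, written without the Vulkan headers: the four constants mirror the values of `VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT`, `HOST_VISIBLE_BIT`, `HOST_COHERENT_BIT` and `HOST_CACHED_BIT`, the table is copied from the JSON dump on the slide, and the names `MemoryType`, `slide_memory_types` and `find_memory_type` are hypothetical helpers, not part of Vulkan or of the author's gct library. The `type_bits` parameter plays the role of `VkMemoryRequirements::memoryTypeBits`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins that mirror the VkMemoryPropertyFlagBits values.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;
constexpr std::uint32_t HOST_CACHED   = 0x8;

// Hypothetical plain struct; the real API uses VkMemoryType.
struct MemoryType {
  std::uint32_t heap_index;
  std::uint32_t property_flags;
};

// The eleven memoryTypes entries from the JSON dump on the slide.
inline std::vector<MemoryType> slide_memory_types() {
  return {
    {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
    {0, 1}, {1, 6}, {1, 14}, {2, 7}
  };
}

// Return the index of the first memory type that is permitted by type_bits
// and has all bits of `required` set, or -1 if no type qualifies.
inline int find_memory_type(const std::vector<MemoryType>& types,
                            std::uint32_t type_bits,
                            std::uint32_t required) {
  for (std::size_t i = 0; i < types.size(); ++i) {
    const bool allowed = ((type_bits >> i) & 1u) != 0u;
    const bool matches = (types[i].property_flags & required) == required;
    if (allowed && matches) return static_cast<int>(i);
  }
  return -1;
}
```

With the table from the slide, asking for `DEVICE_LOCAL` selects index 7, and asking for `HOST_VISIBLE | HOST_COHERENT` selects index 8, which would be a natural choice for a staging buffer.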
  6. Special memory is allocated with vkAllocateMemory

    VkResult vkAllocateMemory( VkDevice device, const VkMemoryAllocateInfo* pAllocateInfo, const VkAllocationCallbacks* pAllocator, VkDeviceMemory* pMemory ); typedef struct VkMemoryAllocateInfo { VkStructureType sType; const void* pNext; VkDeviceSize allocationSize; uint32_t memoryTypeIndex; } VkMemoryAllocateInfo; For this GPU (device), give me memory of this size (allocationSize) and of this memory type (memoryTypeIndex).
  7. Declare the intent to use the allocated memory as a buffer that holds data for computation

    VkResult vkCreateBuffer( VkDevice device, const VkBufferCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkBuffer* pBuffer ); typedef struct VkBufferCreateInfo { VkStructureType sType; const void* pNext; VkBufferCreateFlags flags; VkDeviceSize size; VkBufferUsageFlags usage; VkSharingMode sharingMode; uint32_t queueFamilyIndexCount; const uint32_t* pQueueFamilyIndices; } VkBufferCreateInfo; For this GPU (device), create a buffer of this size (size) for this kind of usage (usage). Figure labels: VkDeviceMemory, VkBuffer. The contents of the memory are general-purpose data.
  8. Declare the intent to use the allocated memory as a buffer that holds data for computation

    (vkCreateBuffer and VkBufferCreateInfo as on slide 7: for this GPU, create a buffer of this size for this kind of usage.) VkResult vkBindBufferMemory( VkDevice device, VkBuffer buffer, VkDeviceMemory memory, VkDeviceSize memoryOffset ); This buffer (buffer) uses this memory (memory).
  9. Memory whose type has the CPU-visible property

    (Same memory_props JSON as on slide 3.) The last four memory types, in order: a region in GPU memory that is visible only from the GPU; a region in CPU memory that is visible from the CPU and not cached by the CPU; a region in CPU memory that is visible from the CPU and cached by the CPU; a region in GPU memory that is visible from the CPU and not cached by the CPU.
  10. Mapping CPU-visible memory with vkMapMemory

    (Tail of the memory_props JSON from slide 3, showing the CPU-visible memory types.) VkResult vkMapMemory( VkDevice device, VkDeviceMemory memory, VkDeviceSize offset, VkDeviceSize size, VkMemoryMapFlags flags, void** ppData ); The start address of the range of this memory (memory) that begins at this position (offset) and has this length (size) is returned (in ppData). Between the call to vkMapMemory and the call to vkUnmapMemory, the memory is mapped into the address space of the process.
  11. Query the available queues with vkGetPhysicalDeviceQueueFamilyProperties

    "queue_family": [ { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 16, "queueFlags": 15, "timestampValidBits": 64 } }, { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 2, "queueFlags": 12, "timestampValidBits": 64 } First family (queueFlags 15): accepts graphics commands, GPU compute commands, and data transfer commands; there are 16 such queues. Second family (queueFlags 12, the transfer and sparse binding bits): accepts data transfer commands; there are 2 such queues.
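  12. Query the available queues with vkGetPhysicalDeviceQueueFamilyProperties (continued)

    } }, { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 2, "queueFlags": 12, "timestampValidBits": 64 } }, { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 8, "queueFlags": 14, "timestampValidBits": 64 } }, Family with queueFlags 12 (transfer and sparse binding bits): accepts data transfer commands but not compute commands; there are 2 such queues. This means the GPU has DMA engines that can run independently of its arithmetic units. Family with queueFlags 14 (compute, transfer, and sparse binding bits): accepts GPU compute commands and data transfer commands; there are 8 such queues.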
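The `queueFlags` values in the dump are bit masks, so the capabilities of each family can be read off mechanically. A small sketch follows, assuming only the bit values of `VK_QUEUE_GRAPHICS_BIT` (1), `VK_QUEUE_COMPUTE_BIT` (2), `VK_QUEUE_TRANSFER_BIT` (4) and `VK_QUEUE_SPARSE_BINDING_BIT` (8); the constants are local stand-ins and `describe_queue_flags` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>

// Local stand-ins that mirror the VkQueueFlagBits values.
constexpr std::uint32_t QUEUE_GRAPHICS       = 0x1;
constexpr std::uint32_t QUEUE_COMPUTE        = 0x2;
constexpr std::uint32_t QUEUE_TRANSFER       = 0x4;
constexpr std::uint32_t QUEUE_SPARSE_BINDING = 0x8;

// Turn a queueFlags mask into a readable list such as "COMPUTE|TRANSFER".
inline std::string describe_queue_flags(std::uint32_t flags) {
  static const struct { std::uint32_t bit; const char* name; } table[] = {
    {QUEUE_GRAPHICS, "GRAPHICS"},
    {QUEUE_COMPUTE, "COMPUTE"},
    {QUEUE_TRANSFER, "TRANSFER"},
    {QUEUE_SPARSE_BINDING, "SPARSE_BINDING"},
  };
  std::string out;
  for (const auto& entry : table) {
    if ((flags & entry.bit) != 0u) {
      if (!out.empty()) out += "|";
      out += entry.name;
    }
  }
  return out.empty() ? "0" : out;
}
```

Applied to the three values in the dump, 15 decodes to all four capabilities, 12 to transfer plus sparse binding, and 14 to compute, transfer, and sparse binding.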
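  14. Command pool and command buffers

    Figure labels: device, command pool, command buffer, command buffer, ..., command buffer, command. Commands sometimes have to be placed in dedicated memory, so command buffers are allocated from a dedicated memory pool. vkCreateCommandPool: create the pool. vkAllocateCommandBuffers: obtain command buffers. vkFreeCommandBuffers: return the command buffers once you are done with them.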
  15. Record vkCmdCopyBuffer into a command buffer and submit it to a queue to execute it

    Figure labels: command pool, command buffer, command buffer, ..., command buffer, vkCmdCopyBuffer, vkAllocateCommandBuffers, vkCreateCommandPool. VkResult vkQueueSubmit( VkQueue queue, uint32_t submitCount, const VkSubmitInfo* pSubmits, VkFence fence ); Submit to this queue (queue).
  16. Record vkCmdCopyBuffer into a command buffer and submit it to a queue to execute it

    VkResult vkQueueSubmit( VkQueue queue, uint32_t submitCount, const VkSubmitInfo* pSubmits, VkFence fence ); typedef struct VkSubmitInfo { VkStructureType sType; const void* pNext; uint32_t waitSemaphoreCount; const VkSemaphore* pWaitSemaphores; const VkPipelineStageFlags* pWaitDstStageMask; uint32_t commandBufferCount; const VkCommandBuffer* pCommandBuffers; uint32_t signalSemaphoreCount; const VkSemaphore* pSignalSemaphores; } VkSubmitInfo; Submit these command buffers (pCommandBuffers) to this queue (queue).
  17. Create a fence to receive the completion notification

    VkResult vkQueueSubmit( VkQueue queue, uint32_t submitCount, const VkSubmitInfo* pSubmits, VkFence fence ); VkResult vkWaitForFences( VkDevice device, uint32_t fenceCount, const VkFence* pFences, VkBool32 waitAll, uint64_t timeout ); Wait until the contents of the command buffers submitted here have completed, or until the timeout has elapsed. VkResult vkCreateFence( VkDevice device, const VkFenceCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFence* pFence );
  18. Diff of the vector arithmetic instructions of AMD GCN and AMD RDNA2

    --- gcn.list 2021-11-09 02:04:47.899271324 +0900 +++ rdna2.list 2021-11-09 02:22:47.976688357 +0900 @@ -1,29 +1,41 @@ -V_ADDC_U32 +V_ADD3_U32 +V_ADD_CO_CI_U32 +V_ADD_CO_U32 +V_ADD_F16 V_ADD_F32 V_ADD_F64 -V_ADD_I32 +V_ADD_LSHL_U32 +V_ADD_NC_I16 +V_ADD_NC_I32 +V_ADD_NC_U16 +V_ADD_NC_U32 V_ALIGNBIT_B32 V_ALIGNBYTE_B32 V_AND_B32 -V_ASHRREV_I32 -V_ASHR_I32 -V_ASHR_I64 +V_AND_OR_B32 +V_ASHRREV_B32 +V_ASHRREV_I16 +V_ASHRREV_I64 V_BCNT_U32_B32 V_BFE_I32 V_BFE_U32 V_BFI_B32 V_BFM_B32 V_BFREV_B32 +V_CEIL_F16 V_CEIL_F32 V_CEIL_F64 V_CLREXCP V_CNDMASK_B32 +V_COS_F16 V_COS_F32 V_CUBEID_F32 V_CUBEMA_F32 V_CUBESC_F32 V_CUBETC_F32 V_CVT_F16_F32 +V_CVT_F16_I16 +V_CVT_F16_U16 V_CVT_F32_F16 V_CVT_F32_F64 V_CVT_F32_I32 @@ -36,135 +48,205 @@ V_CVT_F64_I32 V_CVT_F64_U32 V_CVT_FLR_I32_F32 +V_CVT_I16_F16 V_CVT_I32_F32 V_CVT_I32_F64 +V_CVT_NORM_I16_F16 V_MAC_F32 -V_MAC_LEGACY_F32 -V_MADAK_F32 -V_MADI64_I32 -V_MADMK_F32 -V_MADU64_U32 -V_MAD_F32 +V_MAD_I16 +V_MAD_I32_I16 V_MAD_I32_I24 -V_MAD_LEGACY_F32 +V_MAD_I64_I32 +V_MAD_U16 +V_MAD_U32_U16 V_MAD_U32_U24 +V_MAD_U64_U32 +V_MAX3_F16 V_MAX3_F32 +V_MAX3_I16 V_MAX3_I32 +V_MAX3_U16 V_MAX3_U32 +V_MAX_F16 V_MAX_F32 V_MAX_F64 +V_MAX_I16 V_MAX_I32 -V_MAX_LEGACY_F32 +V_MAX_U16 V_MAX_U32 V_MBCNT_HI_U32_B32 V_MBCNT_LO_U32_B32 +V_MED3_F16 V_MED3_F32 V_MED3_I32 V_MED3_U32 +V_MIN3_F16 V_MIN3_F32 +V_MIN3_I16 V_MIN3_I32 +V_MIN3_U16 V_MIN3_U32 +V_MIN_F16 V_MIN_F32 V_MIN_F64 +V_MIN_I16 V_MIN_I32 -V_MIN_LEGACY_F32 +V_MIN_U16 V_MIN_U32 V_MOVRELD_B32 +V_MOVRELSD_2_B32 V_MOVRELSD_B32 V_MOVRELS_B32 V_MOV_B32 +V_MOV_FED_B32 V_MQSAD_PK_U16_U8 A fair number of instructions have been removed in the newer RDNA2. Even for GPUs from the same vendor, instruction set compatibility tends to be lost.
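  19. Shipping a GPU executable binary directly

    Figure labels: executable binary for GPU A, GPU A, GPU B, GPU C, compile time, run time. If you prepare a GPU executable binary directly and run it, it only works on one specific GPU. Home game consoles, where the hardware can be fixed, do exactly this.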
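  20. The OpenGL case

    void main() { vec3 normal = normalize( input_normal.xyz ); vec3 pos = input_position.xyz; vec3 N = normal; (snippet cut off on the slide) Figure labels: GLSL (high-level language), GPU A, GPU B, GPU C, compile time, run time. In OpenGL the shader is compiled at run time, which takes time.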
  21. Compiler processing is broadly divided into four stages

    (GLSL snippet as on slide 20.) Figure: high-level language -> lexical and syntax analysis -> AST -> target-independent optimization -> AST -> target-specific optimization -> AST -> target binary generation -> executable binary. Each AST is drawn as a small tree with the nodes a, b, ×, +, 3.
  22. Which compiler stages can run ahead of time

    (GLSL snippet and compiler stage figure as on slide 21.) This part (the target-specific optimization and target binary generation) has to be done for each GPU, so it can only be done at run time. This part (lexical and syntax analysis and the target-independent optimization) can be finished ahead of time without any problem. So let us serialize the AST at this stage in a binary format and keep it around.
  23. The serialized intermediate form is SPIR-V

    (GLSL snippet and compiler stage figure as on slide 21.) This part can be finished ahead of time without any problem. Let us serialize the AST at this stage in a binary format and keep it around: that format is SPIR-V.
  24. The Vulkan case

    (GLSL snippet as on slide 20.) Figure labels: GLSL (high-level language), glslc, SPIR-V, vkCreateShaderModule, GPU A, GPU B, GPU C, compile time, run time. At compile time glslc turns the GLSL into SPIR-V; at run time the SPIR-V is handed to vkCreateShaderModule.
  25. A simple GLSL example

    #version 450 #extension GL_ARB_separate_shader_objects : enable #extension GL_ARB_shading_language_420pack : enable #extension GL_KHR_shader_subgroup_basic : enable #extension GL_KHR_shader_subgroup_arithmetic : enable layout(local_size_x_id = 1, local_size_y_id = 2 ) in; layout(std430, binding = 1) buffer layout1 { float output_data[]; }; layout(constant_id = 3) const float value = 1; void main() { const uint x = gl_GlobalInvocationID.x; const uint y = gl_GlobalInvocationID.y; const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x; const uint index = x + y * width; output_data[ index ] += value; }
  26. The buffer

    (Same GLSL shader as on slide 25.) Annotation on the output_data block: buffer.
  27. Deciding where to write from the thread ID

    (Same GLSL shader as on slide 25.) The thread ID (gl_GlobalInvocationID) determines where in the buffer to write.
  28. Adding 1 to one element of the buffer

    (Same GLSL shader as on slide 25.) Each thread adds 1 to one element of the buffer; value is 1. Every time the shader runs, the values in the buffer are incremented.
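To make the indexing concrete, the shader can be emulated on the CPU. This is only a sketch of what one dispatch computes, not how the GPU schedules it: `Dim2` and `run_increment_shader` are hypothetical names, the two nested loops stand in for all values of `gl_GlobalInvocationID.xy`, and the bounds check is a CPU-side safety net that the original shader does not have. The local size of 16 x 8 used in the usage note is an assumption, because the slides set it through specialization constants whose values are not shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D extent used for both the workgroup count and the local size.
struct Dim2 {
  unsigned x;
  unsigned y;
};

// CPU emulation of one dispatch of the slide's shader:
//   index = x + y * (gl_WorkGroupSize.x * gl_NumWorkGroups.x)
//   output_data[index] += value
inline void run_increment_shader(std::vector<float>& output_data,
                                 Dim2 num_groups, Dim2 local_size,
                                 float value) {
  const unsigned width = local_size.x * num_groups.x;
  const unsigned height = local_size.y * num_groups.y;
  for (unsigned y = 0; y < height; ++y) {    // gl_GlobalInvocationID.y
    for (unsigned x = 0; x < width; ++x) {   // gl_GlobalInvocationID.x
      const std::size_t index = x + static_cast<std::size_t>(y) * width;
      // Safety check added for the CPU version only.
      if (index < output_data.size()) output_data[index] += value;
    }
  }
}
```

Calling `run_increment_shader(buf, Dim2{4, 2}, Dim2{16, 8}, 1.0f)` on a zero-filled vector of 1024 floats touches every element exactly once, since 4 x 16 = 64 columns and 2 x 8 = 16 rows give 1024 invocations.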
  29. Which buffer is binding = 1?

    (Same GLSL shader as on slide 25.) The layout declaration ties the buffer at binding = 1 to output_data. But which buffer is "the buffer at binding = 1"?
  30. Descriptor set

    Figure labels: descriptor set with one binding entry each for buffer B, buffer A, and buffer C (the binding numbers were lost in extraction), ..., buffer A, buffer B, buffer C, write. (The GLSL shader from slide 25 is shown partially clipped.) A descriptor set associates the bindings of the shader with the buffers created with vkCreateBuffer. The association is registered with vkUpdateDescriptorSets.
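  31. Descriptor pool

    Figure labels: descriptor pool, descriptor set, ..., buffer A, buffer B, buffer C. Descriptor sets may use a limited set of hardware registers. Descriptor sets are therefore allocated from a descriptor pool with vkAllocateDescriptorSets, and returned with vkFreeDescriptorSets when they are no longer needed.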
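  32. Descriptor set layout

    Figure labels: descriptor pool, descriptor set, buffer A, buffer B, buffer C, descriptor set layout. "Please give me a descriptor set that has three descriptors for buffers." A descriptor set layout expresses what kind of descriptor set is wanted: how many descriptors it provides, and for associating what.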
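  33. Descriptor set layout

    Figure labels: descriptor pool, descriptor set, buffer A, buffer B, buffer C, descriptor set layout. "Please give me a descriptor set that has three descriptors for buffers." A descriptor set layout expresses what kind of descriptor set is wanted: how many descriptors it provides, and for associating what. Couldn't that be worked out by reading the SPIR-V?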
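  34. Q. Couldn't that be worked out by reading the SPIR-V? A. Yes, it can.

    So there is a library that digs the bindings out of SPIR-V: SPIRV-Reflect https://github.com/KhronosGroup/SPIRV-Reflect This way the feature does not have to be implemented in every vendor's GPU driver.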
  35. Compute pipeline

    A compute pipeline joins the shader module and the descriptor set layout. If they fit together, no binding is left unattached. VkResult vkCreateComputePipelines( VkDevice device, VkPipelineCache pipelineCache, uint32_t createInfoCount, const VkComputePipelineCreateInfo* pCreateInfos, const VkAllocationCallbacks* pAllocator, VkPipeline* pPipelines ); typedef struct VkComputePipelineCreateInfo { VkStructureType sType; const void* pNext; VkPipelineCreateFlags flags; VkPipelineShaderStageCreateInfo stage; VkPipelineLayout layout; VkPipeline basePipelineHandle; int32_t basePipelineIndex; } VkComputePipelineCreateInfo;
  36. Compute pipeline: the shader module

    (VkComputePipelineCreateInfo as on slide 35.) typedef struct VkPipelineShaderStageCreateInfo { VkStructureType sType; const void* pNext; VkPipelineShaderStageCreateFlags flags; VkShaderStageFlagBits stage; VkShaderModule module; const char* pName; const VkSpecializationInfo* pSpecializationInfo; } VkPipelineShaderStageCreateInfo; The shader module goes in module.
  37. Compute pipeline: the descriptor set layout

    (Tail of VkComputePipelineCreateInfo as on slide 35.) VkResult vkCreatePipelineLayout( VkDevice device, const VkPipelineLayoutCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkPipelineLayout* pPipelineLayout ); typedef struct VkPipelineLayoutCreateInfo { VkStructureType sType; const void* pNext; VkPipelineLayoutCreateFlags flags; uint32_t setLayoutCount; const VkDescriptorSetLayout* pSetLayouts; uint32_t pushConstantRangeCount; const VkPushConstantRange* pPushConstantRanges; } VkPipelineLayoutCreateInfo; The descriptor set layouts go in pSetLayouts.
  38. Pipeline cache

    VkResult vkCreateComputePipelines( VkDevice device, VkPipelineCache pipelineCache, uint32_t createInfoCount, const VkComputePipelineCreateInfo* pCreateInfos, const VkAllocationCallbacks* pAllocator, VkPipeline* pPipelines ); This (pipelineCache) remembers things such as executable binaries that have been built once. When a pipeline is requested with the same contents as before, the contents of the cache are used.
  39. Pipeline cache

    (Tail of vkCreateComputePipelines as on slide 38.) VkResult vkCreatePipelineCache( VkDevice device, const VkPipelineCacheCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkPipelineCache* pPipelineCache ); typedef struct VkPipelineCacheCreateInfo { VkStructureType sType; const void* pNext; VkPipelineCacheCreateFlags flags; size_t initialDataSize; const void* pInitialData; } VkPipelineCacheCreateInfo;
  40. Pipeline cache

    (vkCreatePipelineCache and VkPipelineCacheCreateInfo as on slide 39.) VkResult vkGetPipelineCacheData( VkDevice device, VkPipelineCache pipelineCache, size_t* pDataSize, void* pData ); Figure label: secondary storage. The cache data is written to secondary storage, so that on the next launch shader compilation can be avoided.
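  42. What remains is how many threads to run

    [v1 , v2 , v3 , v4 , v5 , v6 , v7 , v8 , v9 , v10] void vkCmdDispatch( VkCommandBuffer commandBuffer, uint32_t groupCountX, uint32_t groupCountY, uint32_t groupCountZ ); Records into this command buffer a request to start execution with groupCountX × groupCountY × groupCountZ workgroups, each of which runs the number of threads given by the shader's local size. Once this command is submitted to a queue, the shader runs on the GPU.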
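The arguments of `vkCmdDispatch` count workgroups, so the number of shader invocations is the group count multiplied by the local size on each axis. The arithmetic can be sketched as follows; `group_count` and `total_invocations` are hypothetical helper names, and the rounding-up division assumes that the shader itself guards against indices past the end of the data when the element count is not a multiple of the local size.

```cpp
#include <cstdint>

// Number of workgroups needed along one axis so that
// groups * local_size >= elements. Requires local_size > 0.
constexpr std::uint32_t group_count(std::uint32_t elements,
                                    std::uint32_t local_size) {
  return (elements + local_size - 1) / local_size;
}

// Total shader invocations started by vkCmdDispatch(gx, gy, gz)
// for a shader whose local size is lx * ly * lz.
constexpr std::uint64_t total_invocations(std::uint32_t gx, std::uint32_t gy,
                                          std::uint32_t gz, std::uint32_t lx,
                                          std::uint32_t ly, std::uint32_t lz) {
  return static_cast<std::uint64_t>(gx) * gy * gz *
         static_cast<std::uint64_t>(lx) * ly * lz;
}
```

For example, a dispatch of (4, 2, 1) with an assumed local size of (16, 8, 1) starts 1024 invocations, and covering 1000 elements with a local size of 64 needs 16 workgroups.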
  43. Barriers on a buffer

    void vkCmdPipelineBarrier( VkCommandBuffer commandBuffer, VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags dependencyFlags, uint32_t memoryBarrierCount, const VkMemoryBarrier* pMemoryBarriers, uint32_t bufferMemoryBarrierCount, const VkBufferMemoryBarrier* pBufferMemoryBarriers, uint32_t imageMemoryBarrierCount, const VkImageMemoryBarrier* pImageMemoryBarriers ); typedef struct VkBufferMemoryBarrier { VkStructureType sType; const void* pNext; VkAccessFlags srcAccessMask; VkAccessFlags dstAccessMask; uint32_t srcQueueFamilyIndex; uint32_t dstQueueFamilyIndex; VkBuffer buffer; VkDeviceSize offset; VkDeviceSize size; } VkBufferMemoryBarrier; The barrier applies to this buffer (buffer).
  44. What a buffer barrier guarantees

    (vkCmdPipelineBarrier and VkBufferMemoryBarrier as on slide 43.) For this buffer (buffer): until the commands before the barrier that touched this buffer have completed, commands after the barrier that touch this buffer must not start.
  45. Zero-clear the memory, send it to the GPU, and wait for the copy to complete

    { auto mapped = staging_buffer->map< float >(); std::fill( mapped.begin(), mapped.end(), 0.f ); } { auto rec = command_buffer->begin(); rec.copy( staging_buffer, device_local_buffer ); rec.barrier( vk::AccessFlagBits::eTransferWrite, vk::AccessFlagBits::eShaderRead, vk::PipelineStageFlagBits::eTransfer, vk::PipelineStageFlagBits::eComputeShader, vk::DependencyFlagBits( 0 ), { device_local_buffer }, {} ); rec.bind_descriptor_set( vk::PipelineBindPoint::eCompute, pipeline_layout, descriptor_set ); Annotations, in order: zero-cleared memory (std::fill) is sent to the GPU (rec.copy), and after waiting for the copy to complete (rec.barrier) the next step follows.
  46. Specify the descriptor set and the pipeline, execute, and wait for completion

    rec.bind_descriptor_set( vk::PipelineBindPoint::eCompute, pipeline_layout, descriptor_set ); rec.bind_pipeline( vk::PipelineBindPoint::eCompute, pipeline ); rec->dispatch( 4, 2, 1 ); rec.barrier( vk::AccessFlagBits::eShaderWrite, vk::AccessFlagBits::eTransferRead, vk::PipelineStageFlagBits::eComputeShader, vk::PipelineStageFlagBits::eTransfer, vk::DependencyFlagBits( 0 ), { device_local_buffer }, {} ); rec.copy( device_local_buffer, staging_buffer ); } Annotations, in order: specify the descriptor set (bind_descriptor_set), specify the pipeline (bind_pipeline), execute (dispatch), and after waiting for the execution to complete (barrier) copy the result back.
  47. Submit, wait, copy to the CPU side, and dump as JSON

    vk::PipelineStageFlagBits::eComputeShader, vk::PipelineStageFlagBits::eTransfer, vk::DependencyFlagBits( 0 ), { device_local_buffer }, {} ); rec.copy( device_local_buffer, staging_buffer ); } command_buffer->execute( gct::submit_info_t() ); command_buffer->wait_for_executed(); std::vector< float > host; host.reserve( 1024 ); { auto mapped = staging_buffer->map< float >(); std::copy( mapped.begin(), mapped.end(), std::back_inserter( host ) ); } unsigned int count; nlohmann::json json = host; std::cout << json.dump( 2 ) << std::endl; Annotations, in order: submit everything recorded so far to the queue (execute), wait for the commands to complete (wait_for_executed), copy the data that came back from the GPU to the CPU side (std::copy), and dump it as JSON.
  48. Result

    $ ./src/compute [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ... 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ] Every element has been incremented.
  49. Specify the usage when creating a VkImage

    VkResult vkCreateImage( VkDevice device, const VkImageCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkImage* pImage ); typedef struct VkImageCreateInfo { VkStructureType sType; const void* pNext; VkImageCreateFlags flags; VkImageType imageType; VkFormat format; VkExtent3D extent; uint32_t mipLevels; uint32_t arrayLayers; VkSampleCountFlagBits samples; VkImageTiling tiling; VkImageUsageFlags usage; VkSharingMode sharingMode; uint32_t queueFamilyIndexCount; const uint32_t* pQueueFamilyIndices; VkImageLayout initialLayout; } VkImageCreateInfo; The usage field holds the intended usage. It is a set of bit flags, so several usages may be specified at once. Example: VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT means the image is both the destination of vkCmdCopyImage and a target of texture sampling.
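Combining and testing usage bits is plain bitwise arithmetic. The sketch below uses local constants that mirror the values of the first four `VkImageUsageFlagBits` entries so that it builds without the Vulkan headers; `has_usage` and `texture_usage` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Local stand-ins that mirror VkImageUsageFlagBits values.
constexpr std::uint32_t IMAGE_USAGE_TRANSFER_SRC = 0x01;
constexpr std::uint32_t IMAGE_USAGE_TRANSFER_DST = 0x02;
constexpr std::uint32_t IMAGE_USAGE_SAMPLED      = 0x04;
constexpr std::uint32_t IMAGE_USAGE_STORAGE      = 0x08;

// True when every bit of `wanted` is present in `usage`.
constexpr bool has_usage(std::uint32_t usage, std::uint32_t wanted) {
  return (usage & wanted) == wanted;
}

// The combination from the slide: copy destination and sampled texture.
constexpr std::uint32_t texture_usage =
    IMAGE_USAGE_TRANSFER_DST | IMAGE_USAGE_SAMPLED;
```

An image created with `texture_usage` can receive a copy and be sampled, but it could not be bound as a storage image because the storage bit is absent.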
  50. Changing the image layout as part of a barrier

    void vkCmdPipelineBarrier( VkCommandBuffer commandBuffer, VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags dependencyFlags, uint32_t memoryBarrierCount, const VkMemoryBarrier* pMemoryBarriers, uint32_t bufferMemoryBarrierCount, const VkBufferMemoryBarrier* pBufferMemoryBarriers, uint32_t imageMemoryBarrierCount, const VkImageMemoryBarrier* pImageMemoryBarriers ); typedef struct VkImageMemoryBarrier { VkStructureType sType; const void* pNext; VkAccessFlags srcAccessMask; VkAccessFlags dstAccessMask; VkImageLayout oldLayout; VkImageLayout newLayout; uint32_t srcQueueFamilyIndex; uint32_t dstQueueFamilyIndex; VkImage image; VkImageSubresourceRange subresourceRange; } VkImageMemoryBarrier; Move this image (image) from this layout (oldLayout) to this layout (newLayout). The layout of an image can be changed as a side effect of a barrier.
  51. Storage images

    #version 450 #extension GL_ARB_separate_shader_objects : enable #extension GL_ARB_shading_language_420pack : enable #extension GL_KHR_shader_subgroup_basic : enable #extension GL_KHR_shader_subgroup_arithmetic : enable layout(local_size_x_id = 1, local_size_y_id = 2 ) in; layout(std430, binding = 1) buffer layout1 { float output_data[]; }; layout(set = 0, binding = 0, rgba8) uniform writeonly image2D img; void main() { ... imageStore( img, ivec2( pos.xy ), color ); } With a storage image, a compute pipeline can read and write an image. color is written to the place where the pixel at position pos.xy belongs.
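  52. Graphics pipeline: dedicated hardware

    Figure: the graphics pipeline stages Input Assembly, Vertex Shader, Tessellation Control Shader, Tessellation, Tessellation Evaluation Shader, Geometry Shader, Rasterization, Fragment Shader, Color Blend, with several stages labeled "hardware". At various points in the 3D graphics drawing procedure we want to use dedicated hardware.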
  53. Graphics pipeline: shader stages

    Figure: the same graphics pipeline stages as on slide 52, with several stages labeled "hardware" and a SPIR-V module (drawn as the tree a, b, ×, +, 3) next to the others. SPIR-V is attached to each of the remaining steps.
  54. Input Assembly Vertex Shader Tessellation Control Shader Tessellation Tessellation Evaluation

    Shader Geometry Shader Rasterization Fragment Shader Color Blend
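  55. Graphics pipeline: dynamic state

    Figure: the graphics pipeline stages Input Assembly, Vertex Shader, Tessellation Control Shader, Tessellation, Tessellation Evaluation Shader, Geometry Shader, Rasterization, Fragment Shader, Color Blend. Specify which settings need to be changeable dynamically at run time.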
  56. Render pass

    Figure: three graphics pipelines (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) grouped together. A render pass is a bundle of multiple graphics pipelines.
  57. Multi-pass rendering

    Figure: graphics pipeline -> VkImage -> graphics pipeline -> VkImage. The result of the first rendering stage is used as the input of the second rendering stage.
  58. G-buffer

    Figure: one graphics pipeline writes the G-buffer, a set of VkImages holding position, normal, depth, and material. Three further graphics pipelines, one per light, each read the G-buffer and write a VkImage.
  59. Summing the lights

    Figure: the three per-light graphics pipelines each produce a VkImage; a final graphics pipeline sums them (∑) into the VkImage that holds the rendering result.
  60. Why split the lights

    Figure: the G-buffer images feed three per-light graphics pipelines, whose outputs are summed (∑) into the rendering result. This scales better than computing all of the lights one after another here (in a single pipeline).
  61. Hidden pixels drop out

    Figure: the G-buffer (position, normal, depth, material) followed by the per-light graphics pipelines and the rendering result. Pixels that did not survive into the G-buffer (that is, pixels hidden behind something else) never appear in the later computations.
  62. Shadows: rendering from the light's position

    Figure: the G-buffer pipeline and the per-light pipelines, plus an extra graphics pipeline that renders from the position of light 1 into a depth VkImage. If a point does not show up in the rendering from the position of light 1, the light from light 1 does not reach it.
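The visibility test behind this idea is a depth comparison. The following sketch assumes the usual shadow-mapping formulation, which the slide does not spell out: the point being shaded is projected into the light's view, and it is lit if its depth there is not farther than the depth stored by the render from the light. `is_lit` is a hypothetical name, and the small `bias` is a common practical addition to absorb precision error rather than something taken from the slides.

```cpp
// Returns true when the shaded point receives light from this light.
//   depth_from_light : depth of the shaded point as seen from the light
//   shadow_map_depth : depth stored at the same position in the image
//                      rendered from the light's position
//   bias             : tolerance against precision error (an assumption)
constexpr bool is_lit(float depth_from_light, float shadow_map_depth,
                      float bias = 0.005f) {
  return depth_from_light <= shadow_map_depth + bias;
}
```

A point whose depth matches the stored depth is the surface the light actually sees, so it is lit; a point that lies farther away is behind that surface and stays in shadow.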
  63. Post-processing

    Figure: rendering result (VkImage) -> graphics pipeline -> image-processed rendering result (VkImage). Image processing such as a depth-of-field effect or tone mapping is applied to the rendering result.
  64. Couldn't barriers create the dependencies between graphics pipelines?

    Figure: a command buffer containing "execute pipeline", vkCmdPipelineBarrier, "execute pipeline". Couldn't we simply use barriers to make the executions of several graphics pipelines depend on each other? That approach works too. However, it does not perform well on mobile GPUs.
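  65. Multi-pass with barriers

    Figure labels: CPU, GPU, narrow (link to main memory), wide (link to SRAM), SRAM, pass 1, pass 1, pass 2, barrier. The first pass writes a whole screen out to main memory, and only then does the second pass read main memory and start computing.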
  66. Render pass

    Figure: three graphics pipelines A, B, and C (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) inside a render pass. The multiple pipelines inside a render pass can have dependencies between their inputs and outputs. However, when B or C computes the pixel at (x, y), the only value it is guaranteed to be able to read is the value of A at position (x, y).
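  67. Multi-pass with a render pass

    Figure labels: CPU, GPU, narrow (link to main memory), wide (link to SRAM), SRAM, pass 1, pass 2. The processing of multiple pipelines for a single tile is executed in one go. Main memory is written only once, at the very end.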
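A rough traffic estimate shows why this matters on a GPU with a narrow path to main memory. The model below is a deliberately simplified sketch and not a measurement: it counts only the intermediate and final color images, assumes each pass produces one full-screen image, and ignores vertex data, textures, compression, and caches. Both function names are hypothetical.

```cpp
#include <cstdint>

// Barrier-based multipass: every pass writes a full image to main memory,
// and every pass after the first reads the previous image back.
// Requires passes >= 1.
constexpr std::uint64_t barrier_multipass_bytes(std::uint64_t pixels,
                                                std::uint64_t bytes_per_pixel,
                                                std::uint64_t passes) {
  return pixels * bytes_per_pixel * (passes + (passes - 1));
}

// Render pass on a tile-based GPU: intermediate results stay in on-chip
// SRAM, so only the final image is written to main memory.
constexpr std::uint64_t render_pass_bytes(std::uint64_t pixels,
                                          std::uint64_t bytes_per_pixel) {
  return pixels * bytes_per_pixel;
}
```

Under these assumptions, two passes over a 1920 x 1080 RGBA8 target move about 24.9 MB through main memory with barriers, against about 8.3 MB when the passes are merged into one render pass.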
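  68. The compositor

    Figure labels: "writing here shows up on screen", compositor (X Window System, Wayland Compositor, Windows DWM, etc.), Vulkan application. The memory into which the picture sent to the screen is written is, in most cases, owned by the compositor.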
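  69. Surface

    Figure labels: "writing here shows up on screen", compositor (X Window System, Wayland Compositor, Windows DWM, etc.), Vulkan application, surface. The application obtains from the compositor a surface, the place to which it hands over what it has drawn. Application: "Please give me a place to write my drawing." Compositor: "Please hand your drawing over here."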
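  70. The surface is obtained through a platform-specific handle

    The application obtains from the compositor a surface, the place to which it hands over what it has drawn, using a platform-specific handle: Windows: HWND; X11: xcb_window_t*; Wayland: wl_surface*; Android: ANativeWindow*; Fuchsia: zx_handle_t; iOS: CAMetalLayer*; GGP: GgpStreamDescriptor; Nintendo Switch: void*.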
  71. HWND xcb_window_t* wl_surface* ANativeWindow* zx_handle_t CAMetalLayer* GgpStreamDescriptor void* vkCreateWin32SurfaceKHR vkCreateImagePipeSurfaceFUCHSIA

    VkSurfaceKHR vkGetPhysicalDeviceXcbPresentationSupportKHR vkCreateIOSSurfaceMVK vkGetPhysicalDeviceWaylandPresentationSupportKHR vkCreateStreamDescriptorSurfaceGGP vkGetPhysicalDeviceWaylandPresentationSupportKHR vkCreateViSurfaceNN
  72. Creating a swapchain

    VkResult vkCreateSwapchainKHR( VkDevice device, const VkSwapchainCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSwapchainKHR* pSwapchain ); typedef struct VkSwapchainCreateInfoKHR { VkStructureType sType; const void* pNext; VkSwapchainCreateFlagsKHR flags; VkSurfaceKHR surface; uint32_t minImageCount; VkFormat imageFormat; VkColorSpaceKHR imageColorSpace; VkExtent2D imageExtent; uint32_t imageArrayLayers; VkImageUsageFlags imageUsage; VkSharingMode imageSharingMode; uint32_t queueFamilyIndexCount; const uint32_t* pQueueFamilyIndices; VkSurfaceTransformFlagBitsKHR preTransform; VkCompositeAlphaFlagBitsKHR compositeAlpha; VkPresentModeKHR presentMode; VkBool32 clipped; VkSwapchainKHR oldSwapchain; } VkSwapchainCreateInfoKHR; Give me this many (minImageCount) images for handing over to this surface (surface).
  73. Swapchain

    Figure: a swapchain holding several VkImages, each backed by VkDeviceMemory. A swapchain is a bundle of images that already have memory assigned. "This image can only be in this layout." "This memory is shared with the compositor's process." The layout is restricted to suit the compositor.
  74. Rendering into the swapchain

    Figure: a swapchain of VkImages (each with VkDeviceMemory) and a graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend). The graphics pipeline renders into an image of the swapchain.
  75. Framebuffer. Swapchain image (VkDeviceMemory + VkImage)

    Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend. The graphics pipeline emits color, depth, and stencil. Prepare an image (VkDeviceMemory + VkImage) yourself to receive depth and stencil, and join it with the swapchain image to form a framebuffer.
  76. Framebuffer: two images (VkDeviceMemory + VkImage each). Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend

    VkResult vkCreateFramebuffer( VkDevice device, const VkFramebufferCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFramebuffer* pFramebuffer ); typedef struct VkFramebufferCreateInfo { VkStructureType sType; const void* pNext; VkFramebufferCreateFlags flags; VkRenderPass renderPass; uint32_t attachmentCount; const VkImageView* pAttachments; uint32_t width; uint32_t height; uint32_t layers; } VkFramebufferCreateInfo; Annotation (pAttachments): array of views of the images to use.
  77. Swapchain image (VkDeviceMemory + VkImage). Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend

    VkResult vkQueuePresentKHR( VkQueue queue, const VkPresentInfoKHR* pPresentInfo ); typedef struct VkPresentInfoKHR { VkStructureType sType; const void* pNext; uint32_t waitSemaphoreCount; const VkSemaphore* pWaitSemaphores; uint32_t swapchainCount; const VkSwapchainKHR* pSwapchains; const uint32_t* pImageIndices; VkResult* pResults; } VkPresentInfoKHR; Annotations: "Once it has been drawn, send this image of this swapchain to the compositor."
  78. Swapchain: five VkImage objects, each backed by its own VkDeviceMemory.

    VkResult vkAcquireNextImageKHR( VkDevice device, VkSwapchainKHR swapchain, uint64_t timeout, VkSemaphore semaphore, VkFence fence, uint32_t* pImageIndex ); Writing to a swapchain image must wait until the compositor side has finished with it. "Can I write yet?"
  79. VkResult vkAcquireNextImageKHR( VkDevice device, VkSwapchainKHR swapchain, uint64_t timeout, VkSemaphore semaphore,

    VkFence fence, uint32_t* pImageIndex ); VkResult vkCreateSemaphore( VkDevice device, const VkSemaphoreCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSemaphore* pSemaphore ); typedef struct VkSubmitInfo { VkStructureType sType; const void* pNext; uint32_t waitSemaphoreCount; const VkSemaphore* pWaitSemaphores; const VkPipelineStageFlags* pWaitDstStageMask; uint32_t commandBufferCount; const VkCommandBuffer* pCommandBuffers; uint32_t signalSemaphoreCount; const VkSemaphore* pSignalSemaphores; } VkSubmitInfo; Annotations: "When the image is ready, signal this semaphore." "The commands submitted from now on must wait for the semaphore to be signaled before they execute." Synchronization outside a queue or between queues uses semaphores, not barriers.
  81. 16-bit storage. std::vector< std::uint16_t > data; is copied into buffer A (binding = 1).

    #version 450 #extension GL_EXT_shader_16bit_storage : require layout(std430, binding = 1) buffer layout1 { uint16_t output_data[]; }; ... Write 16-bit integers into the buffer and read them from the shader as 16-bit integers. Arithmetic is done in 32-bit integers.
  82. 16-bit storage. typedef struct VkPhysicalDevice16BitStorageFeatures { VkStructureType sType; void* pNext; VkBool32 storageBuffer16BitAccess;

    VkBool32 uniformAndStorageBuffer16BitAccess; VkBool32 storagePushConstant16; VkBool32 storageInputOutput16; } VkPhysicalDevice16BitStorageFeatures; A GPU may not be able to do 16-bit loads and stores. Checking the newly added VkPhysicalDevice16BitStorageFeatures tells you in which situations the GPU can do 16-bit loads and stores.
  83. #version 450 #extension GL_EXT_shader_16bit_storage : require layout(std430, binding = 1)

    buffer layout1 { float16_t output_data[]; }; ... If 16-bit load/store is supported, half-precision floating-point numbers can be loaded and stored as well. #version 450 #extension GL_EXT_shader_16bit_storage : require layout(std430, binding = 1) buffer layout1 { f16vec4 output_data[]; }; ... Vector types are OK too. (16-bit storage)
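To fill a buffer that the shader reads as float16_t, the host has to produce the 16-bit encoding itself, since standard C has no portable half type. The following is a minimal sketch of that conversion, not code from the talk. The function names are hypothetical, and one simplification is made and labeled in the code: results too small for a normal half are flushed to zero.

```c
#include <stdint.h>
#include <string.h>

/* float -> IEEE 754 binary16 bits, round to nearest even.
 * Simplification (assumption of this sketch): values below the smallest
 * normal half are flushed to signed zero instead of becoming subnormals. */
uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu) {                       /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
    }
    int32_t e = (int32_t)exp - 127 + 15;      /* re-bias the exponent */
    if (e >= 31) return (uint16_t)(sign | 0x7C00u);  /* overflow: infinity */
    if (e <= 0)  return (uint16_t)sign;              /* flushed to zero */

    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t dropped = mant & 0x1FFFu;        /* the 13 bits that do not fit */
    if (dropped > 0x1000u || (dropped == 0x1000u && (half & 1u))) {
        half++;                               /* a carry into the exponent is fine */
    }
    return (uint16_t)(sign | half);
}

/* IEEE 754 binary16 bits -> float (exact, including subnormals). */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t x;

    if (exp == 0u) {
        if (mant == 0u) {
            x = sign;                         /* signed zero */
        } else {                              /* subnormal: normalize it */
            uint32_t shift = 0u;
            while (!(mant & 0x400u)) { mant <<= 1; shift++; }
            mant &= 0x3FFu;
            x = sign | ((113u - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        x = sign | 0x7F800000u | (mant << 13);   /* infinity or NaN */
    } else {
        x = sign | ((exp + 112u) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

A host-side std::vector< std::uint16_t > filled with float_to_half results could then be copied into the buffer the same way as the 16-bit integers on slide 81.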
  84. Subgroup Operation: vertical addition

    [Figure: the lanes of a and the lanes of b are added lane by lane.] An ordinary a + b does this.
  85. Subgroup Operation: horizontal addition

    [Figure: the values of a in all lanes are added together.] subgroupAdd(a) = $\sum_n a_n$
  86. Subgroup Operation: horizontal addition

    [Figure: running sum of a across the lanes, each lane including its own value.] subgroupInclusiveAdd(a)
  87. Subgroup Operation: horizontal addition

    [Figure: running sum of a across the lanes, each lane excluding its own value.] subgroupExclusiveAdd(a)
  88. Subgroup Operation: shuffle

    subgroupShuffle(a, b): the values of a are rearranged in the order given by b.
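The four horizontal operations on slides 85 to 88 can be pinned down with a CPU-side simulation in which one array element stands for one invocation of the subgroup. This is a sketch for illustration only: the function names are hypothetical, and SUBGROUP_SIZE is an assumed constant, whereas on a real device the size is reported by VkPhysicalDeviceSubgroupProperties::subgroupSize.

```c
#include <stddef.h>
#include <stdint.h>

#define SUBGROUP_SIZE 8  /* assumed for the simulation */

/* subgroupAdd: every invocation receives the sum over all invocations. */
void subgroup_add(const int32_t a[SUBGROUP_SIZE], int32_t out[SUBGROUP_SIZE]) {
    int32_t total = 0;
    for (size_t i = 0; i < SUBGROUP_SIZE; ++i) total += a[i];
    for (size_t i = 0; i < SUBGROUP_SIZE; ++i) out[i] = total;
}

/* subgroupInclusiveAdd: invocation i receives a[0] + ... + a[i]. */
void subgroup_inclusive_add(const int32_t a[SUBGROUP_SIZE], int32_t out[SUBGROUP_SIZE]) {
    int32_t running = 0;
    for (size_t i = 0; i < SUBGROUP_SIZE; ++i) {
        running += a[i];
        out[i] = running;
    }
}

/* subgroupExclusiveAdd: invocation i receives a[0] + ... + a[i-1];
 * invocation 0 receives 0. */
void subgroup_exclusive_add(const int32_t a[SUBGROUP_SIZE], int32_t out[SUBGROUP_SIZE]) {
    int32_t running = 0;
    for (size_t i = 0; i < SUBGROUP_SIZE; ++i) {
        out[i] = running;
        running += a[i];
    }
}

/* subgroupShuffle(a, b): invocation i receives the value of a held by
 * invocation b[i]. Indices are assumed to be below SUBGROUP_SIZE. */
void subgroup_shuffle(const int32_t a[SUBGROUP_SIZE], const uint32_t b[SUBGROUP_SIZE],
                      int32_t out[SUBGROUP_SIZE]) {
    for (size_t i = 0; i < SUBGROUP_SIZE; ++i) out[i] = a[b[i]];
}
```

In each function out must not alias a. On the GPU there is no loop: each invocation obtains its result directly, which is why support depends on the hardware.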
  89. Subgroup Operation. struct VkPhysicalDeviceSubgroupProperties { VkStructureType sType; void* pNext; uint32_t subgroupSize; VkShaderStageFlags

    supportedStages; VkSubgroupFeatureFlags supportedOperations; VkBool32 quadOperationsInAllStages; }; The size of a subgroup now has to be taken into account, so let's make it possible to query it.
  90. Subgroup Operation. struct VkPhysicalDeviceSubgroupProperties { VkStructureType sType; void* pNext; uint32_t subgroupSize; VkShaderStageFlags

    supportedStages; VkSubgroupFeatureFlags supportedOperations; VkBool32 quadOperationsInAllStages; }; Some GPUs may not be able to support every horizontal operation, so let's make it possible to find out which ones are usable.
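Checking supportedOperations comes down to a bitmask test. The sketch below assumes the value has already been read from VkPhysicalDeviceSubgroupProperties. The SG_FEATURE_* constants are a local mirror of the VkSubgroupFeatureFlagBits values so that the example builds without the Vulkan headers, and the helper name subgroup_supports is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of VkSubgroupFeatureFlagBits (illustration only; real code
 * would use the VK_SUBGROUP_FEATURE_*_BIT constants from vulkan.h). */
enum {
    SG_FEATURE_BASIC            = 0x01,
    SG_FEATURE_VOTE             = 0x02,
    SG_FEATURE_ARITHMETIC       = 0x04,  /* subgroupAdd and the scan variants */
    SG_FEATURE_BALLOT           = 0x08,
    SG_FEATURE_SHUFFLE          = 0x10,  /* subgroupShuffle */
    SG_FEATURE_SHUFFLE_RELATIVE = 0x20,
    SG_FEATURE_CLUSTERED        = 0x40,
    SG_FEATURE_QUAD             = 0x80
};

/* True when every bit in `required` is present in `supported_operations`. */
bool subgroup_supports(uint32_t supported_operations, uint32_t required) {
    return (supported_operations & required) == required;
}
```

A shader that uses both subgroupAdd and subgroupShuffle would require SG_FEATURE_ARITHMETIC | SG_FEATURE_SHUFFLE, and the stage it runs in must also appear in supportedStages.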
  91. This is already possible in Vulkan 1.0. 1st GPU + Vulkan 1.0 with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension (the API of this version) = VkDevice (logical device)

    2nd GPU + Vulkan 1.0 with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension (the API of this version) = VkDevice (logical device)
  92. Device Group. 1st GPU + 2nd GPU = DeviceGroup. DeviceGroup + Vulkan 1.1 (the API of this version) = VkDevice (logical device)

    One logical device is created from multiple GPUs connected by NVLink or similar.
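  93. Device Group (1st GPU + 2nd GPU). There are multiple GPUs, but the queue is the same, so a barrier can be used to synchronize.

    Command buffer: commands executed only on the 1st GPU. Command buffer: commands executed only on the 2nd GPU. Command buffer: barrier, then executed on both.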
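In the Vulkan API the choice of which GPUs of a device group execute the following commands is expressed as a device mask (set with vkCmdSetDeviceMask), where bit i selects physical device i of the group. A small sketch of how such a mask is interpreted; the helper name device_runs is hypothetical and the code does not touch the Vulkan API itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the physical device with index `device_index` inside the
 * device group is selected by `device_mask`. */
bool device_runs(uint32_t device_mask, uint32_t device_index) {
    return device_index < 32u && ((device_mask >> device_index) & 1u) != 0u;
}
```

With two GPUs, a mask of 0x1 corresponds to "only on the 1st GPU", 0x2 to "only on the 2nd GPU", and 0x3 to "on both".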
  94. Multiview. [Figure: one render pass containing nine copies of the graphics pipeline (Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend), followed by a barrier and one more graphics pipeline labeled "transform".]

    Draw requests for the same vertex array are sent to multiple pipelines of the render pass all at once.
  95. 8-bit storage. std::vector< std::uint8_t > data; is copied into buffer A (binding = 1).

    #version 450 #extension GL_EXT_shader_8bit_storage : require layout(std430, binding = 1) buffer layout1 { uint8_t output_data[]; }; ... Write 8-bit integers into the buffer and read them from the shader as 8-bit integers. As with 16-bit, vectors of 8-bit integers (e.g. u8vec4) are OK too.
  96. Buffer device address, use 2: write the address of another buffer into a buffer's data. #version 450 ... #extension GL_EXT_buffer_reference : enable layout(buffer_reference) buffer node_t;

    layout(buffer_reference, std430, buffer_reference_align = 16) buffer node_t { int value; node_t next; }; layout(std430) buffer uniforms_t { node_t root; } uniforms; void main() { node_t node = uniforms.root; node = node.next.next; ... } This makes it possible to build a linked list that can be traversed on the GPU. The addresses are read with the GLSL buffer_reference extension.
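The host has to lay the nodes out in memory exactly as the shader's node_t block expects them. The sketch below simulates this on the CPU under stated assumptions: a byte array stands in for device memory, byte offsets stand in for the 64-bit device addresses (in real code these would be obtained with vkGetBufferDeviceAddress), and address 0 is used as the end-of-list marker. The names host_node_t, write_node, and sum_list are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Host-side image of the shader's node_t block: a 4-byte int, 4 bytes of
 * padding, then the 8-byte reference, 16 bytes in total. */
typedef struct {
    int32_t  value;
    uint32_t pad;
    uint64_t next;   /* "address" of the next node, 0 terminates the list */
} host_node_t;

/* Store one node at the given address of the simulated memory. */
void write_node(uint8_t *memory, uint64_t addr, int32_t value, uint64_t next) {
    host_node_t node = { value, 0u, next };
    memcpy(memory + addr, &node, sizeof node);
}

/* Follow the chain from `root` the way the shader follows node.next. */
int64_t sum_list(const uint8_t *memory, uint64_t root) {
    int64_t sum = 0;
    uint64_t addr = root;
    while (addr != 0u) {
        host_node_t node;
        memcpy(&node, memory + addr, sizeof node);
        sum += node.value;
        addr = node.next;
    }
    return sum;
}
```

The nodes do not have to be contiguous or in order; only the stored addresses define the list, which is the point of the feature.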
  97. #version 450 ... layout(binding = 1) uniform sampler2D tex1; layout(binding

    = 2) uniform sampler2D tex2; layout(binding = 3) uniform sampler2D tex3; layout(binding = 4) uniform sampler2D tex4; layout(binding = 5) uniform sampler2D tex5; layout(binding = 6) uniform sampler2D tex6; layout(binding = 7) uniform sampler2D tex7; layout(binding = 8) uniform sampler2D tex8; layout(binding = 9) uniform sampler2D tex9; layout(binding = 10) uniform sampler2D tex10; layout(binding = 11) uniform sampler2D tex11; layout(binding = 12) uniform sampler2D tex12; layout(binding = 13) uniform sampler2D tex13; layout(binding = 14) uniform sampler2D tex14; layout(binding = 15) uniform sampler2D tex15; layout(binding = 16) uniform sampler2D tex16; ... void main() { vec4 value = texture( tex5, tex_coord ); } As the number of resources passed to the shader grows, the code becomes painful.
  98. Descriptor Indexing. #version 450 ... layout(binding = 1) uniform sampler2D tex[]; ...

    void main() { vec4 value = texture( tex[ 4 ], tex_coord ); } Make it possible to create arrays of descriptors.
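  99. Descriptor Indexing. #version 450 ... layout(binding = 1) uniform sampler2D tex[]; ...

    void main() { vec4 value = texture( tex[ 4 ], tex_coord ); } Make it possible to create arrays of descriptors. Relaxed requirements for descriptor sets: descriptors that the shader does not touch need not be bound to an actual resource, and descriptors that are not currently in use may be updated even while a command buffer is being recorded.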
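  100. Descriptor Indexing. void main() { vec4 value = texture( tex[ 4 ],

    tex_coord ); } Make it possible to create arrays of descriptors. Relaxed requirements for descriptor sets: descriptors that the shader does not touch need not be bound to an actual resource, and descriptors that are not currently in use may be updated even while a command buffer is being recorded. This allows a workflow in which you create one huge descriptor set up front and set resources into the elements you need as you need them.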
  101. Framebuffer: two images (VkDeviceMemory + VkImage each). Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend

    VkResult vkCreateFramebuffer( VkDevice device, const VkFramebufferCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFramebuffer* pFramebuffer ); typedef struct VkFramebufferCreateInfo { VkStructureType sType; const void* pNext; VkFramebufferCreateFlags flags; VkRenderPass renderPass; uint32_t attachmentCount; const VkImageView* pAttachments; uint32_t width; uint32_t height; uint32_t layers; } VkFramebufferCreateInfo; Annotation (pAttachments): array of views of the images to use. The images are needed before the framebuffer can be created.
  102. Imageless framebuffer. In VkFramebufferCreateInfo, flags is VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT_KHR and pAttachments is NULL (meaning: "later").

    typedef struct VkFramebufferAttachmentsCreateInfo { VkStructureType sType; const void* pNext; uint32_t attachmentImageInfoCount; const VkFramebufferAttachmentImageInfo* pAttachmentImageInfos; } VkFramebufferAttachmentsCreateInfo; typedef struct VkFramebufferAttachmentImageInfo { VkStructureType sType; const void* pNext; VkImageCreateFlags flags; VkImageUsageFlags usage; uint32_t width; uint32_t height; uint32_t layerCount; uint32_t viewFormatCount; const VkFormat* pViewFormats; } VkFramebufferAttachmentImageInfo; (meaning: "image views like this are going to be attached")
  103. Imageless framebuffer. pAttachments is NULL (meaning: "later"). typedef struct VkFramebufferAttachmentImageInfo { VkStructureType sType; const void*

    pNext; VkImageCreateFlags flags; VkImageUsageFlags usage; uint32_t width; uint32_t height; uint32_t layerCount; uint32_t viewFormatCount; const VkFormat* pViewFormats; } VkFramebufferAttachmentImageInfo; (meaning: "image views like this are going to be attached") typedef struct VkRenderPassAttachmentBeginInfo { VkStructureType sType; const void* pNext; uint32_t attachmentCount; const VkImageView* pAttachments; } VkRenderPassAttachmentBeginInfo; Annotation (pAttachments): array of views of the images to use. Attach this when the render pass is submitted to the queue to decide which image views are used.
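  104. VkDeviceMemory + VkImage: contains both depth and stencil. This creates a dependency on data that is not actually depended on.

    Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend, with a barrier. "I only need depth, but because they are stuck together I have no choice but to depend on both."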
  105. Separate Depth Stencil Layouts. VkDeviceMemory + VkImage: this creates a dependency on data that is not actually depended on. FS → Color Blend. typedef struct VkAttachmentDescriptionStencilLayout {

    VkStructureType sType; void* pNext; VkImageLayout stencilInitialLayout; VkImageLayout stencilFinalLayout; } VkAttachmentDescriptionStencilLayout; Make it possible to state explicitly that the dependency is on only one of the two aspects of a depth-stencil image.
  106. Atomic 64bit. #version 450 #extension GL_ARB_gpu_shader_int64 : enable #extension GL_EXT_shader_atomic_int64 : enable

    ... void main() { uint64_t result = atomicCompSwap( data, 0, 1 ); ... } This performs "if the value stored in data is 0, set it to 1" indivisibly. If the GPU supports it, 64-bit integer atomic operations like this become usable in shaders.
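The semantics of atomicCompSwap can be reproduced on the CPU with C11 atomics, which may help when reasoning about what result holds afterwards. This is a sketch of the semantics only, not of the GPU implementation; the function name atomic_comp_swap_u64 is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Mimics GLSL atomicCompSwap(mem, compare, data): stores `data` only when
 * the current value equals `compare`, and in either case returns the value
 * that was in `mem` before the operation. */
uint64_t atomic_comp_swap_u64(_Atomic uint64_t *mem, uint64_t compare, uint64_t data) {
    uint64_t expected = compare;
    /* On success `expected` is left unchanged (it equals the old value);
     * on failure it is overwritten with the value actually found. */
    atomic_compare_exchange_strong(mem, &expected, data);
    return expected;
}
```

When many threads race on the same location, exactly one of them gets 0 back from the call on the slide, so the return value can serve as a "did I win" test.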
  107. #version 450 ... #extension GL_EXT_shader_16bit_storage : require layout(std430, binding =

    1) buffer layout1 { f16vec4 input_buffer[]; }; layout(std430, binding = 2) buffer layout22 { f16vec4 output_buffer[]; }; ... void main() { vec4 value = input_buffer[ gl_GlobalInvocationID.x ]; output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0; } Annotations: input_buffer is half precision, output_buffer is half precision, value is single precision. With the 16-bit storage of Vulkan 1.1, data was kept in memory as 16 bits but computed as 32 bits.
  108. Float16 Int8. #version 450 ... #extension GL_EXT_shader_16bit_storage : require layout(std430, binding =

    1) buffer layout1 { f16vec4 input_buffer[]; }; layout(std430, binding = 2) buffer layout22 { f16vec4 output_buffer[]; }; ... void main() { f16vec4 value = input_buffer[ gl_GlobalInvocationID.x ]; output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0; } Annotations: input_buffer is half precision, output_buffer is half precision, value is half precision. In Vulkan 1.2, if the device supports it, computation can stay in half precision.
  109. Float16 Int8. #version 450 ... #extension GL_EXT_shader_8bit_storage : require layout(std430, binding =

    1) buffer layout1 { uint8_t input_buffer[]; }; layout(std430, binding = 2) buffer layout22 { uint8_t output_buffer[]; }; ... void main() { uint8_t value = input_buffer[ gl_GlobalInvocationID.x ]; output_buffer[ gl_GlobalInvocationID.x ] = value * 2; } Annotations: input_buffer is 8-bit integer, output_buffer is 8-bit integer, value is 8-bit integer. In Vulkan 1.2, if the device supports it, computation can stay in 8-bit integers.
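  110. Timeline Semaphore: a single semaphore is counted upward across five command buffers.

    First command buffer: semaphore +1. Second: starts when the semaphore reaches 1, then semaphore +1. Third: starts when the semaphore reaches 2, then semaphore +1. Fourth: starts when the semaphore reaches 3, then semaphore +1. Fifth: starts when the semaphore reaches 4, then semaphore +1. Management is easier when there are many synchronization points.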
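The counting scheme on this slide can be modelled on the CPU with a single monotonically increasing 64-bit value: a wait is satisfied once the counter has reached the wait value, and a signal must move the counter strictly forward. The sketch below is a simulation under those two rules, not Vulkan code; the type timeline_t and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STAGES 5  /* the five command buffers of the slide */

typedef struct { uint64_t value; } timeline_t;

/* A wait on `wait_value` is satisfied once the counter has reached it. */
bool timeline_reached(const timeline_t *t, uint64_t wait_value) {
    return t->value >= wait_value;
}

/* A signal must strictly increase the counter; anything else is rejected. */
bool timeline_signal(timeline_t *t, uint64_t new_value) {
    if (new_value <= t->value) return false;
    t->value = new_value;
    return true;
}

/* Stage s waits for the counter to reach s and signals s + 1 when done.
 * `submitted` lists the stages in arbitrary submission order; the order in
 * which they actually run is written to `order`. Returns the final counter. */
uint64_t run_stages(const unsigned submitted[STAGES], unsigned order[STAGES]) {
    timeline_t t = { 0u };
    bool done[STAGES] = { false };
    size_t executed = 0;
    bool progressed = true;
    while (executed < STAGES && progressed) {
        progressed = false;
        for (size_t i = 0; i < STAGES; ++i) {
            if (!done[i] && timeline_reached(&t, submitted[i])) {
                order[executed++] = submitted[i];
                done[i] = true;
                timeline_signal(&t, (uint64_t)submitted[i] + 1u);
                progressed = true;
            }
        }
    }
    return t.value;
}
```

Whatever the submission order, the stages run as 0, 1, 2, 3, 4, and one counter replaces the four separate binary semaphores that would otherwise be needed between them.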
  111. VK_KHR_video_queue. A command buffer on a video-capable queue: "decode the video stream stored in this buffer (VkDeviceMemory + VkBuffer)

    and emit it into this sequence of images (five VkImage objects, each with its VkDeviceMemory)." This uses the hardware video encoder and decoder that the GPU provides.
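  113. "Does the vertex array intersect line segment v?"

    "Decide it in real time": possible, with the vertex array converted to a tree structure in advance. "Do it in real time while following deformation": not possible! Converting the vertex array to a tree structure makes the test possible, but ...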
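  114. Shared memory, L1 cache, RT Core. The RT Core found on recent NVIDIA GPUs:

    dedicated hardware that builds a BVH (tree structure) from a vertex array blazingly fast and performs intersection tests against line segments blazingly fast.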
  115. VK_KHR_acceleration_structure void vkCmdBuildAccelerationStructuresKHR( VkCommandBuffer commandBuffer, uint32_t infoCount, const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,

    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos ); typedef struct VkAccelerationStructureBuildGeometryInfoKHR { VkStructureType sType; const void* pNext; VkAccelerationStructureTypeKHR type; VkBuildAccelerationStructureFlagsKHR flags; VkBuildAccelerationStructureModeKHR mode; VkAccelerationStructureKHR srcAccelerationStructure; VkAccelerationStructureKHR dstAccelerationStructure; uint32_t geometryCount; const VkAccelerationStructureGeometryKHR* pGeometries; const VkAccelerationStructureGeometryKHR* const* ppGeometries; VkDeviceOrHostAddressKHR scratchData; } VkAccelerationStructureBuildGeometryInfoKHR; Annotation: "into this" (the destination acceleration structure).
  116. VK_KHR_acceleration_structure (continuing from pGeometries / ppGeometries of VkAccelerationStructureBuildGeometryInfoKHR)

    typedef struct VkAccelerationStructureGeometryKHR { VkStructureType sType; const void* pNext; VkGeometryTypeKHR geometryType; VkAccelerationStructureGeometryDataKHR geometry; VkGeometryFlagsKHR flags; } VkAccelerationStructureGeometryKHR; typedef union VkAccelerationStructureGeometryDataKHR { VkAccelerationStructureGeometryTrianglesDataKHR triangles; VkAccelerationStructureGeometryAabbsDataKHR aabbs; VkAccelerationStructureGeometryInstancesDataKHR instances; } VkAccelerationStructureGeometryDataKHR;
  117. VK_KHR_acceleration_structure (continuing from the triangles member of VkAccelerationStructureGeometryDataKHR)

    typedef struct VkAccelerationStructureGeometryTrianglesDataKHR { VkStructureType sType; const void* pNext; VkFormat vertexFormat; VkDeviceOrHostAddressConstKHR vertexData; VkDeviceSize vertexStride; uint32_t maxVertex; VkIndexType indexType; VkDeviceOrHostAddressConstKHR indexData; VkDeviceOrHostAddressConstKHR transformData; } VkAccelerationStructureGeometryTrianglesDataKHR; Annotation: "from the vertex array placed at this address". A command that generates the tree structure is pushed onto the queue.
  118. VK_KHR_acceleration_structure (continuing from the aabbs member of VkAccelerationStructureGeometryDataKHR)

    typedef struct VkAccelerationStructureGeometryAabbsDataKHR { VkStructureType sType; const void* pNext; VkDeviceOrHostAddressConstKHR data; VkDeviceSize stride; } VkAccelerationStructureGeometryAabbsDataKHR; Annotation: "from the array of AABBs placed at this address". It is also possible to build a tree structure that tests for intersection with AABBs rather than with triangle faces.
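What "intersection with an AABB" means for a line segment can be written down as the classic slab test. The sketch below is a CPU reference under stated assumptions, not what the driver or the hardware actually runs. The type aabb_t and the function ray_hits_aabb are hypothetical; the six floats of aabb_t are laid out as minimum corner followed by maximum corner.

```c
#include <stdbool.h>

typedef struct {
    float min[3];
    float max[3];
} aabb_t;

/* Slab test: does the segment origin + t * dir, with t in [t_near, t_far],
 * pass through the box? */
bool ray_hits_aabb(const float origin[3], const float dir[3],
                   float t_near, float t_far, const aabb_t *box) {
    for (int axis = 0; axis < 3; ++axis) {
        if (dir[axis] == 0.0f) {
            /* Parallel to this slab: the origin must already lie inside it. */
            if (origin[axis] < box->min[axis] || origin[axis] > box->max[axis]) {
                return false;
            }
            continue;
        }
        float inv = 1.0f / dir[axis];
        float t0 = (box->min[axis] - origin[axis]) * inv;
        float t1 = (box->max[axis] - origin[axis]) * inv;
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > t_near) t_near = t0;   /* latest entry so far */
        if (t1 < t_far)  t_far  = t1;   /* earliest exit so far */
        if (t_near > t_far) return false;
    }
    return true;
}
```

A tree structure such as a BVH applies this kind of box test at every inner node so that most of the geometry never has to be tested individually.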
  119. VK_KHR_ray_query. #version 450 #extension GL_EXT_ray_query : enable ... void main() {

    rayQueryEXT ray_query; rayQueryInitializeEXT( ray_query, acceleration_structure, gl_RayFlagsTerminateOnFirstHitEXT, cull_mask, pos, near, direction, far ); while( rayQueryProceedEXT( ray_query ) ) { if( rayQueryGetIntersectionTypeEXT( ray_query, false ) == gl_RayQueryCandidateIntersectionTriangleEXT ) { rayQueryConfirmIntersectionEXT( ray_query ); } } if( rayQueryGetIntersectionTypeEXT( ray_query, true) == gl_RayQueryCommittedIntersectionNoneEXT ) { ... } } Annotations: with this Acceleration Structure, check whether the line segment that starts at pos, points along direction, and spans the distances near to far intersects anything; when an intersecting triangle is found (confirm it); handling for the case where there is an occluder.
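As a CPU reference for what the query above answers, the following sketch tests a segment against a plain triangle list and stops at the first hit, which is the behaviour gl_RayFlagsTerminateOnFirstHitEXT asks for. It uses the Möller–Trumbore ray-triangle test and a linear loop instead of an acceleration structure, so it illustrates the result, not the performance. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

static vec3 v_sub(vec3 a, vec3 b) { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
static float v_dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3 v_cross(vec3 a, vec3 b) {
    return (vec3){ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

/* Moller-Trumbore: does origin + t * dir, t in [t_min, t_max], hit the triangle? */
bool ray_hits_triangle(vec3 origin, vec3 dir, float t_min, float t_max,
                       vec3 v0, vec3 v1, vec3 v2) {
    const float eps = 1e-7f;
    vec3 e1 = v_sub(v1, v0);
    vec3 e2 = v_sub(v2, v0);
    vec3 p = v_cross(dir, e2);
    float det = v_dot(e1, p);
    if (det > -eps && det < eps) return false;     /* parallel to the plane */
    float inv = 1.0f / det;
    vec3 s = v_sub(origin, v0);
    float u = v_dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    vec3 q = v_cross(s, e1);
    float v = v_dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    float t = v_dot(e2, q) * inv;
    return t >= t_min && t <= t_max;
}

/* True as soon as any triangle blocks the segment. `verts` holds
 * 3 * tri_count vertices, three consecutive entries per triangle. */
bool is_occluded(vec3 origin, vec3 dir, float t_min, float t_max,
                 const vec3 *verts, size_t tri_count) {
    for (size_t i = 0; i < tri_count; ++i) {
        if (ray_hits_triangle(origin, dir, t_min, t_max,
                              verts[3 * i], verts[3 * i + 1], verts[3 * i + 2])) {
            return true;
        }
    }
    return false;
}
```

For a shadow test, origin would be the shaded point, dir the direction toward the light, and t_max the distance to the light.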
  120. VK_KHR_ray_tracing_pipeline. Graphics pipeline: Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend. Compute pipeline: CS.

    Doing this in the existing pipelines looked infeasible, so a new pipeline sprouted. Ray tracing pipeline: RayGen Shader, Closest Hit Shader, Miss Shader. (Ray Query)
  121. Writing here puts it on screen. Compositor: X Window System, Wayland Compositor, Windows DWM, etc.

    Vulkan application. "The overhead of going through the compositor is unbearable."
  122. VK_KHR_display_swapchain. Writing here puts it on screen. Vulkan application. A thin wrapper over Linux Kernel Mode Setting is added to Vulkan. "Set the output to display 1 to 1920x1080@60Hz 24bit and create a swapchain for writing to it."

    Swapchain (four VkImage objects, each with its VkDeviceMemory) → display 1
  123. VK_EXT_transform_feedback. Input Assembly → Vertex Shader → Tessellation Control Shader → Tessellation → Tessellation Evaluation

    Shader → Geometry Shader → Rasterization → Fragment Shader → Color Blend. VkDeviceMemory + VkBuffer. Stop the graphics pipeline after the geometry shader and emit the geometry shader's output into a buffer. This is the thing OpenGL had as a standard feature.
  124. Render passes A, B, C, and D, each containing a single graphics pipeline (Input Assembly → VS → TCS → Tessellation → TES → GS → Rasterization → FS → Color Blend).

    On GPUs that are not mobile GPUs there is little to gain from exploiting render passes, so you tend to end up with a large number of render passes that each contain only one pipeline. Creating render passes is a chore.
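  125. VK_KHR_dynamic_rendering void vkCmdBeginRenderingKHR( VkCommandBuffer commandBuffer, const VkRenderingInfoKHR* pRenderingInfo ); void vkCmdEndRenderingKHR(

    VkCommandBuffer commandBuffer ); Annotations: "from here on, use a render pass created on the spot" (vkCmdBeginRenderingKHR); "up to here, use the render pass created on the spot" (vkCmdEndRenderingKHR). If a render pass contains only a single pipeline, allow it to be created on the spot when it is recorded into the command buffer.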
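  126. At 技術書典12 (Gijutsu Shoten 12), Version 3.0 of the book 「3DグラフィクスAPI Vulkanを出来るだけやさしく解説する本」 (roughly, "A book that explains the 3D graphics API Vulkan as gently as possible"), updated with material on recent Vulkan, is scheduled for release.

    Note: the image on the left shows Version 2.0. If you own the electronic edition of 1.0 or 2.0, you can get the update for free.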