Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Compute Kernel with Metal

Avatar for codelynx codelynx
February 11, 2017

Compute Kernel with Metal

This is a presentation materials at Yokohama iOS developer meeting in Feb. 2017. It gives you a very basic idea of Metal computing Kernel, and show the example codes.

Avatar for codelynx

codelynx

February 11, 2017
Tweet

More Decks by codelynx

Other Decks in Programming

Transcript

  1. Executive Summery • Computing method using GPU • Super Parallel

    Computing • May not suitable for complex algorithm
  2. Device, CommandQueue and
 CommandBuffer • MTLDevice • MTLCreateSystemDefaultDevice() • MTLCommandQueue

    • device.makeCommandQueue() • MTLCommandBuffer • commandQueue.makeCommandBuffer() MTLCommandQueue MTLCommandBuffer MTLComputeComman dEncoder
  3. Library, Function and MTLComputePipelineState • MTLLibrary • try! device.makeLibrary(source: shaderSource,

    options: nil) • device.newDefaultLibrary() • MTLFunction • library.makeFunction(name: "bezier_kernel") • MTLComputePipelineState • .library.makeComputePipelineState(function: function) MTLLibrary MTLFunction MTLComputePipelineState
  4. Computing Kernel • C++ 14 subset shading language • Restrictions

    • lambda expressions, dynamic_cast operator, type identification, recursive function calls, new and delete operators, noexcept operator, goto statement, register, thread_local storage qualifiers, virtual function qualifier, derived classes and exception handling
  5. Scalar Types • bool, char, int8_t, unsigned char, uchar •

    short, unsigned short, ushort • int, unsigned int, uint – 32bit • half – 16bit half precision, float – 32bit single precision • size_t, ptrdiff_t, void • no double
  6. Vector and Matrix Types • booln • charn, shortn, ucharn,

    ushortn, uintn • halfn, floatn • halfnxm, floatnxm * n is a number
  7. Glance a code TUSVDU.Z7FSUFY*O\  GMPBUQPTJUJPO  GMPBUDPMPS ^
 


    TUSVDU.Z7FSUFY0VU\  GMPBUQPTJUJPO  GMPBUDPMPS ^
 LFSOFMWPJENZ@DPNQVUF@LFSOFM   DPOTUBOU.Z7FSUFY*O WFSUJDFT<<CVGGFS  >>   EFWJDF.Z7FSUFY0VU PVU7FSUFYFT<<CVGGFS  >>   VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \  ǘ ^ * Just for getting an idea, not working code ←Defining a structure ↓ Defining a kernel code ↓ Specifying Buffer Index
  8. Qualifiers LFSOFMWPJENZ@DPNQVUF@LFSOFM   DPOTUBOU.Z7FSUFY*O WFSUJDFT<<CVGGFS  >>  

    EFWJDF.Z7FSUFY0VU PVU7FSUFYFT<<CVGGFS  >>   VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \  ǘ ^
  9. Address Space • device Address Space • buffer memory objects

    allocated from the device memory pool that are both readable and writeable • threadgroup Address Space • Variables allocated in the threadgroup address space in a kernel function are allocated for each threadgroup executing the kernel, are shared by all threads in a threadgroup and exist only for the lifetime of the threadgroup that is executing the kernel • constant Address Space • The constant address space name refers to buffer memory objects allocated from the device memory pool but are read-only • thread Address Space • The thread address space refers to the per-thread memory address space
  10. Compute Command Encoder • MTLComputeCommandEncoder  MFUFODPEFSDPNNBOE#VGGFSNBLF$PNQVUF$PNNBOE&ODPEFS   FODPEFSTFU$PNQVUF1JQFMJOF4UBUF

    DPNQVUF1JQFMJOF4UBUF   FODPEFSTFU#VGGFS FMFNFOUT#VGGFS PGGTFU BU   FODPEFSTFU#VGGFS WFSUFY#VGGFS PGGTFU BU 
  11. Thread Group • Kernel requires a task broken into small

    pieces  MFUUISFBEHSPVQT1FS(SJE.5-4J[F.BLF FMFNFOUTDPVOU     MFUUISFBET1FS5ISFBEHSPVQ.5-4J[F.BLF      FODPEFSEJTQBUDI5ISFBEHSPVQT UISFBEHSPVQT1FS(SJE 
 UISFBET1FS5ISFBEHSPVQUISFBET1FS5ISFBEHSPVQ * I am still not fully understood
  12. Commit • Finally ready to commit  FODPEFSFOE&ODPEJOH  

    DPNNBOE#VGGFSDPNNJU  • Wait or Add Completion Handler… DPNNBOE#VGGFSXBJU6OUJM$PNQMFUFE 
 DPNNBOE#VGGFSBEE$PNQMFUFE)BOEMFS\ CVGGFS JO EPTPNFXPSLIFSF ^ • Check the buffer • there must be something good in there!
  13. Goal • Give Shader an array of Path Elements or

    equivalent • Produce many consequent positions using Kernel • Using Bezier Calculation Method of my Qiita atricle
  14. Strategies • Path elements buffer • Vertex buffer • CPU

    estimates the length of 
 path elements • A Kernel produces vertices for
 a path element • There may be a better way… #0 p0 … p3 0 #1 p0 … p3 m1 #n p0 … p3 m2 #0 pt … pt #m1 pt … #m2 … pt Element Buffer Vertex Buffer
  15. bezier_kernel shader DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY \ GMPBUUGMPBU JOEFY GMPBU OVNCFS0G7FSUFYFT

     GMPBURQ  QQ  U GMPBURQ  QQ  U GMPBUSR  RR  U GMPBUXX  XX  U 7FSUFYW7FSUFY IBMG SY SZ IBMG X  PVU7FSUFYFT<FMFNFOUWFSUFY*OEFY JOEFY>W ^ CSFBL DBTF1BUI&MFNFOU5ZQF$VSWF5P ǘTOJQǘ CSFBL ^ ^ LFSOFMWPJECF[JFS@LFSOFM  DPOTUBOU1BUI&MFNFOU FMFNFOUT<<CVGGFS  >>  EFWJDF7FSUFY PVU7FSUFYFT<<CVGGFS  >>  VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \ 1BUI&MFNFOUFMFNFOUFMFNFOUT<JE> JOUOVNCFS0G7FSUFYFTFMFNFOUOVNCFS0G7FSUFYFT GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ  TXJUDI FMFNFOUUZQF \ DBTF1BUI&MFNFOU5ZQF-JOF5P ǘTOJQǘ CSFBL DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY \ GMPBUUGMPBU JOEFY GMPBU OVNCFS0G7FSUFYFT  GMPBURQ  QQ  U GMPBURQ  QQ  U GMPBUSR  RR  U GMPBUXX  XX  U 7FSUFYW7FSUFY IBMG SY SZ IBMG X  PVU7FSUFYFT<FMFNFOUWFSUFY*OEFY JOEFY>W ^ CSFBL * not a whole code
  16. Playground! • Yellow – Core Graphics • Red – Compute

    Kernel • Look Good https://github.com/codelynx/BezierKernelPlayground
  17. Double or Triple Buffering • Avoid access collision between CPU

    and GPU Buffer#1 Buffer#2 Buffer#1 Buffer#2 Buffer#1 Buffer#1 Buffer#1 →time Buffer#1 ⚡CPU ⚡GPU ⚡CPU ⚡GPU Buffer#1 Buffer#2 Buffer#1 Buffer#2 ⚡CPU ⚡CPU ⚡CPU ⚡CPU ⚡GPU ⚡GPU ⚡GPU ⚡GPU →time
  18. Buffer Management • Memory resource is finite, recycle them where

    possible • System crashes at device.makeBuffer() rather returning nil • It is hard to find out the reason (as of iOS10) • Save memory resources and be a good citizen
  19. Wrap Up • Computing Shaders are much easier than Rendering

    Shader • Memory Management could be pain if you wants one more step toward high performance shader • Be aware memory alignment • Shader is hard to debug – no break point nor printf()