t’s 2019, and Moore’s Law is dead. CPU performance is plateauing, but GPUs provide a chance for continued hardware performance gains, if you can structure your programs to make good use of them.
CUDA is a platform developed by Nvidia for GPGPU--general purpose computing with GPUs. It backs some of the most popular deep learning libraries, like Tensorflow and Pytorch, but has broader uses in data analysis, data science, and machine learning.
There are several ways that you can start taking advantage of CUDA in your Python programs.
For some common Python libraries, there are drop-in replacements that let you start running computations on the GPU while still using familiar APIs. For example, CuPy provides a NumPy-like API for interacting with multi-dimensional arrays. Similarly, cuDF is a recent project that mimics the pandas interface for dataframes.
If you want more control over your use of CUDA APIs, you can use the PyCUDA library, which provides bindings for the CUDA API that you can call from your Python code. Compared with drop-in libraries, it gives you the ability to manually allocate memory on the GPU, and write custom CUDA functions (called kernels). However, its drawbacks include writing your CUDA code as large strings in Python, and compiling your CUDA code at runtime.
Finally, for the best performance you can use the Python C/C++ extension interface, the approach taken by deep learning libraries like Pytorch. One of the strengths of Python is the ability to drop down into C/C++, and libraries like NumPy take advantage of this for increased speed. If you use Nvidia’s nvcc compiler for CUDA, you can use the same extension interface to write custom CUDA kernels, and then call them from your Python code.
This talk will explore each of these methods, provide examples to get started, and discuss in more detail the pros and cons of each approach.