The Possibilities of FPGA for Deep Learning

Kohei Matsumoto (@kmats_)

Challenges of Hardware for Deep Learning
• Performance
• Power efficiency
• Hardware cost
• Memory bandwidth
  • Data must be passed from layer to layer
• Processing bandwidth
  • How much data can be processed simultaneously?
• etc. (re-programmability, ease of use, …)
https://www.altera.com/en_US/pdfs/literature/solution-sheets/efficient_neural_networks.pdf

FPGA?
• Field Programmable Gate Array
• “Reconfigurable” hardware, configured with a Hardware Description Language
• Pros
  • Can be re-programmed to implement any kind of logic
• Cons
  • Limited resources (processing elements, memory, etc.)
  • Hardware cost (compared to mass-produced devices)

Use-cases of GPU/FPGA
• GPU: Massively parallel operations
  • Graphics processing, certain kinds of scientific simulation, etc.
• FPGA: Prototyping of ASICs, cases where hardware-level speed is needed and yet the logic must remain changeable, etc.
  • Search engine accelerators, financial simulation, high-frequency trading, etc.

http://dea.unsj.edu.ar/sda/FPGA_On_Mars.pdf

GPU: De facto standard of Deep Learning… why?
• Deep Learning ~= a variation of Convolutional Neural Network (CNN)
• CNN ~= massively parallel multiply-accumulate operations, a good fit for the GPU! Yay!
• The learning phase needs enormous computing resources (an FPGA cannot provide enough)
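To make the "multiply-accumulate" point concrete, here is a minimal sketch of a single-channel 2D convolution written as explicit multiply-accumulate (MAC) steps. The function name `conv2d_mac` and the MAC counter are illustrative assumptions, not taken from any of the referenced papers; real frameworks use heavily optimized kernels instead of nested Python loops.

```python
def conv2d_mac(image, kernel):
    """Valid 2D convolution (no padding, stride 1, no kernel flip),
    written as explicit multiply-accumulate steps.

    Returns the output feature map and the number of MACs performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):                # every output pixel is independent,
        for x in range(ow):            # so these two loops can run in parallel
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]  # one MAC
                    macs += 1
            out[y][x] = acc
    return out, macs


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result, mac_count = conv2d_mac(image, kernel)
```

The MAC count is (output height × output width × kernel height × kernel width), and it grows further with the number of input and output channels. Because no output pixel depends on another, the work spreads naturally over many parallel processing elements.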

FPGA over GPU in terms of Deep Learning
• Pros
  • Power efficiency (performance per watt)
• Cons
  • Difficult implementation
  • Lack of memory bandwidth
  • Lack of processing elements for training
    • Most papers discuss only the inference phase?
https://www.tractica.com/automation-robotics/fpgas-challenge-gpus-as-a-platform-for-deep-learning/

Example: CNN Accelerator by Microsoft
• “Single-node deep CNN accelerator on a mid-range FPGA” (only the inference phase)
• “Respectable performance relative to prior FPGA designs and high-end GPGPUs at a fraction of the power”
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN20Whitepaper.pdf

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN20Whitepaper.pdf

Binarized Neural Network: Highly optimized on FPGA?
• Binarizes inputs, outputs and weights deterministically
• Stored/updated weights retain their real-valued precision
• “At test phase, BDNNs are fully binarized and can be implemented in hardware with low circuit complexity”
  • which means the learning phase is not yet fully binarized
https://arxiv.org/abs/1602.02505v2
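As a rough illustration of why a fully binarized test phase maps onto simple hardware, the sketch below binarizes values with a deterministic sign function and then computes a dot product using only XNOR and popcount on packed bits. The helper names (`binarize`, `pack`, `xnor_popcount_dot`), the choice to map 0 to +1, and the sample values are assumptions made for this example; it is not the paper's implementation.

```python
def binarize(x):
    """Deterministic binarization: sign(x), with 0 mapped to +1 (an assumption)."""
    return 1 if x >= 0 else -1


def pack(values):
    """Pack a list of +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agreements = bin(~(a_word ^ b_word) & mask).count("1")
    return 2 * agreements - n


# Real-valued weights are what gets stored and updated during learning;
# the binarized copy is what the forward pass uses.
real_weights = [0.7, -0.2, 0.0, -1.5]
real_inputs = [-0.3, -0.9, 0.4, 2.0]

bin_weights = [binarize(w) for w in real_weights]
bin_inputs = [binarize(x) for x in real_inputs]

reference = sum(w * x for w, x in zip(bin_weights, bin_inputs))
fast = xnor_popcount_dot(pack(bin_weights), pack(bin_inputs), len(bin_weights))
```

Both `reference` and `fast` give the same value, but the second path needs no multipliers, only bitwise logic and a bit count, which is the kind of circuit an FPGA implements cheaply.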

https://arxiv.org/abs/1602.02505v2

Wrap-up
• FPGA: re-programmable hardware
  • Power-efficient with optimal logic
  • Lack of computing resources
• A CNN is too big to implement as-is and needs to be simplified
  • One approach: Binarized Neural Network
  • It is still hard to fully binarize the learning phase

References
• Efficient Implementation of Neural Network Systems Built on FPGAs, and Programmed with OpenCL
  • https://www.altera.com/en_US/pdfs/literature/solution-sheets/efficient_neural_networks.pdf
• FPGAs on Mars
  • http://dea.unsj.edu.ar/sda/FPGA_On_Mars.pdf
• FPGAs Challenge GPUs as a Platform for Deep Learning
  • https://www.tractica.com/automation-robotics/fpgas-challenge-gpus-as-a-platform-for-deep-learning/
• Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
  • https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN20Whitepaper.pdf
• Binarized Neural Networks
  • https://arxiv.org/abs/1602.02505v2
