DNN/GPU with Ruby #rubykaigi
ainame
September 19, 2017
Programming
Transcript
DNN/GPU with Ruby @ainame / Satoshi Namai 19th Sep, 2017
RubyKaigi 2017 LT
ruby-dlib/ruby-dlib
• Ruby binding for dlib (original author is mrkn-san)
  ◦ dlib is a C++-based toolkit for machine learning
  ◦ Built as a C extension
  ◦ $ gem install dlib
• Face detector based on a DNN (Deep Neural Network)
  ◦ High accuracy, better than OpenCV
  ◦ Works on GPU with the CUDA SDK
DNN/GPU/FaceDetector
[Figure: neural network with an input layer, a hidden layer, and an output layer]
Powered by GPU...
image = Dlib::Image.load('./face.jpg')
detector = Dlib::DNNFaceDetector.new('model.dat')
rects = detector.detect(image)
#=> [<Dlib::Rectangle>, <Dlib::Rectangle>]

rects.each do |rect|
  image.draw_rectangle!(rect, [255, 0, 0, 3])
end
image.save_jpeg('output.jpg')
Using only CPU
[Diagram: Ruby, ruby-dlib (gem), dlib (C++); build tools: mkmf, Makefile, g++]
Using GPU and CPU
[Diagram: Ruby, ruby-dlib (gem), dlib (C++), CUDA; build tools: mkmf, Makefile, g++, nvcc]
Using GPU and CPU
[Diagram: Ruby, ruby-dlib (gem), dlib (C++), CUDA; build tools: mkmf, Makefile, g++, nvcc, with "????" marking the missing link to nvcc]
Problem
mkmf.rb has no API for handling the CUDA compiler
Hack for the “depend” file
• The “depend” file is where we are supposed to describe the dependencies of each C file
• The “depend” file is appended to the end of the Makefile
So we can write anything we like in it….
Generate Makefile by mkmf.rb

$ ruby ext/dlib/extconf.rb

Makefile:

SHELL = /bin/sh

# V=0 quiet, V=1 verbose. other values don't work.
V = 0
Q1 = $(V:1=)
Q = $(Q1:0=@)
ECHO1 = $(V:1=@:)
ECHO = $(ECHO1:0=@echo)
NULLCMD = :

#### Start of system configuration section. ####

srcdir = ext/dlib
topdir = /usr/include/ruby-2.3.0
hdrdir = $(topdir)
arch_hdrdir = /usr/include/x86_64-linux-gnu/ruby-2.3.0
Set compilers for C / C++

datadir = $(datarootdir)
datarootdir = $(prefix)/share
libexecdir = $(prefix)/lib/ruby2.3
sbindir = $(exec_prefix)/sbin
bindir = $(exec_prefix)/bin
archdir = $(rubyarchdir)

CC = gcc
CXX = g++
LIBRUBY = $(LIBRUBY_SO)
LIBRUBY_A = lib$(RUBY_SO_NAME)-static.a
LIBRUBYARG_SHARED = -l$(RUBY_SO_NAME)
LIBRUBYARG_STATIC = -l$(RUBY_SO_NAME)-static
empty =
OUTFLAG = -o $(empty)
COUTFLAG = -o $(empty)

RUBY_EXTCONF_H =
cflags = $(optflags) $(debugflags) $(warnflags)
cxxflags = $(optflags) $(debugflags) $(warnflags)
Generated Makefile
mkmf appends the “depend” file to the end of the Makefile

$(TARGET_SO): $(OBJS) Makefile
	$(ECHO) linking shared-object $(DLLIB)
	-$(Q)$(RM) $(@)
	$(Q) $(LDSHAREDXX) -o $@ $(OBJS) $(LIBPATH) $(DLDFLAGS) $(LOCAL_LIBS) $(LIBS)
	$(Q) $(POSTLINK)

###
.SUFFIXES: .cu .o
DLIB_SRCDIR = $(srcdir)/../dlib-19.4
DLIB_FUNCTIONS = \
  geometry.inc \
  rectangle.inc \
  image.inc \
  detector.inc \
  find_candidate_object_locations.inc \
  dnn_detector.inc \
  cuda.inc
OBJS += $(DLIB_OJBS)
An absolute path is safer: some environments don't have a correct PATH.

CUDA_NVCC = /usr/local/cuda/bin/nvcc
CUDA_FLAGS = $(CPPFLAGS) -I /usr/local/cuda/include -arch=sm_30 -D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -std=c++11 -Xcompiler -fPIC -Xcompiler -funwind-tables
………………
SRCS += $(DLIB_CUDA_SRCS)
OBJS += $(DLIB_CUDA_OBJS)

Add a new suffix rule for CUDA

.SUFFIXES: .cu
.cu.o:
	$(ECHO) compiling $@
	$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
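Because the “depend” file is just text appended to the Makefile, it can also be built programmatically. The sketch below is not taken from ruby-dlib: the helper name `cuda_depend_fragment` and the shortened flag list are hypothetical, chosen only to show the shape of a `.cu` suffix rule with an absolute nvcc path.

```ruby
# Hypothetical helper (not part of ruby-dlib or mkmf): builds a Makefile
# fragment that teaches make how to compile .cu files with nvcc.
def cuda_depend_fragment(nvcc:, flags:)
  [
    "CUDA_NVCC = #{nvcc}",
    "CUDA_FLAGS = #{flags.join(' ')}",
    '',
    '.SUFFIXES: .cu',
    '',
    '.cu.o:',
    # Recipe lines in a Makefile must start with a literal tab.
    "\t$(ECHO) compiling $@",
    "\t$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<",
    ''
  ].join("\n")
end

# Example values only; the flag list is shortened for illustration.
puts cuda_depend_fragment(
  nvcc: '/usr/local/cuda/bin/nvcc',
  flags: ['$(CPPFLAGS)', '-std=c++11', '-Xcompiler', '-fPIC']
)
```

Writing the returned string to a file named `depend` next to extconf.rb would be one way to use it, assuming mkmf picks that file up as described in the slides.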
Let’s scale out
Empower DNN/Face Detector
• Finally, the face detector gets the power of Ruby
• Sidekiq is an awesome gem for job queue systems
• It is easy to scale out the face detector with Sidekiq
Sidekiq: http://sidekiq.org/about
class FaceDetectionWorker
  include Sidekiq::Worker

  MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

  def perform(image_id)
    # ActiveRecord's find takes the id directly, not a hash.
    image = Image.find(image_id)
    frames = image.download { |file| detect(file) }
    frames.each do |f|
      Face.create!(image_id: image.id, x: f.left, y: f.top, width: f.width, height: f.height)
    end
  end

  def detect(file)
    detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
    detector.detect(Dlib::Image.load(file.path))
  ensure
    # Force a GC run so the detector and image objects are collected.
    GC.start
  end
end
With great power comes great responsibility
[The FaceDetectionWorker code from the earlier slide, shown again.]
Load data on GPU memory
[Diagram sequence: the CPU with main memory on one side, the GPU with GPU memory on the other]
• Instantiate: Dlib::DNNFaceDetector is created in main memory.
• Load: the model tensor is placed in GPU memory.
• Instantiate: Dlib::Image is created in main memory.
• Load: the image tensor is placed in GPU memory.
• Detection: Dlib::DNNFaceDetector and Dlib::Image are in main memory; the model tensor and image tensor are in GPU memory.
• Out of scope: Dlib::Image and Dlib::DNNFaceDetector go out of scope, but the model tensor and image tensor are still in GPU memory.
• GC.start: Dlib::DNNFaceDetector and Dlib::Image are collected.
• Main memory and GPU memory are both empty.
[The same worker code, shown again.]
Make sure GPU memory is cleared!
An image object holds on to a region of GPU memory.
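The `ensure` / `GC.start` pattern from the worker can be tried without a GPU. In this sketch, `FakeDetector` and `detect_and_release` are hypothetical stand-ins, not ruby-dlib APIs. It only shows that the `ensure` clause runs on both the normal and the error path and does not change the method's return value. Whether a GC run actually frees GPU memory depends on how the binding releases its native objects.

```ruby
# Hypothetical stand-in for Dlib::DNNFaceDetector so the sketch runs anywhere.
class FakeDetector
  def initialize(model_path)
    @model_path = model_path
  end

  # Returns made-up rectangles; raises if no image is given.
  def detect(image_path)
    raise ArgumentError, 'no image given' if image_path.nil?

    [{ left: 10, top: 20, width: 30, height: 40 }]
  end
end

# Runs detection, then drops the detector reference and forces a GC run,
# whether or not detection raised.
def detect_and_release(model_path, image_path, detector_class: FakeDetector)
  detector = detector_class.new(model_path)
  detector.detect(image_path)
ensure
  detector = nil
  GC.start
end

p detect_and_release('model.dat', 'face.jpg')
```

The value of the `ensure` clause is discarded, so callers still receive the detection result.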
DNNs consume a lot of memory!!!
How much depends on the resolution of the image
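To get a feel for how resolution drives memory use, here is a back-of-the-envelope sketch. It assumes the input image is held as one dense float32 tensor with three channels, which is an assumption made for illustration, not a statement about dlib's internals. The real footprint is larger, since the network's intermediate layers need memory too. The helper names are hypothetical.

```ruby
BYTES_PER_FLOAT32 = 4

# Rough size in bytes of a single dense float32 image tensor (assumed layout).
def image_tensor_bytes(width, height, channels: 3)
  width * height * channels * BYTES_PER_FLOAT32
end

# Converts bytes to MiB, rounded to two decimals.
def to_mib(bytes)
  (bytes / (1024.0 * 1024.0)).round(2)
end

[[640, 480], [1920, 1080], [3840, 2160]].each do |w, h|
  puts format('%dx%d: %.2f MiB', w, h, to_mib(image_tensor_bytes(w, h)))
end
```

Under this assumption, doubling both width and height quadruples the size of the input tensor.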
Be careful Manage your GPU memory
Demo
Summary
• Making a binding gem is a good option for starting small
• mkmf.rb can support compiling with CUDA
• Empower DNN to scale out with Ruby