ABEJA
March 04, 2019

Object Detector: WHY Do You Need and HOW Can You Own

SIX 2019 dev-e-2
Yaping Sun @ABEJA, Inc.


An object detector discovers the presence and location of objects within an image or video frame. Its applications range from autonomous driving to video surveillance. Thanks to the rapid evolution of Deep Learning, recent techniques make it easier to train a customized object detector while achieving strong performance.


Transcript

  1. DAY 1 “技” (Technique) Developer Day Object Detector: WHY Do You

    Need and HOW Can You Own Yaping Sun, ABEJA, Inc.
  2. Self-Introduction Yaping Sun http://muchuanyun.github.io/ • Majored in Computer Engineering and

    Microelectronics • Data Engineer @ABEJA, Inc. • Interested in real-world applications of Deep Learning
  3. Object Detection and Applications Object Detection in Machine Learning Experience

    with ABEJA Platform Datasets Evaluation Criteria Representative Architectures
  4. Object Detection in Machine Learning Experience with ABEJA Platform Datasets

    Evaluation Criteria Representative Architectures Object Detection and Applications
  5. Computer Vision Tasks Classification (‘CAT’) → Object Detection (‘DOG,

    DOG, CAT’) → Instance Segmentation (‘DOG, DOG, CAT’) [Figure from http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture11.pdf]
  6. Object Detection • A basic concept from Human Intelligence •

    Cornerstone of true AI • Initial step for tracking, identification, human-computer interaction etc. https://github.com/tensorflow/models/tree/master/research/object_detection
  7. • Face Detection • People Counting • Self-Driving Cars •

    Pedestrian/Vehicle detection • Video Surveillance • Anomaly Detection • … Why Do You Need?
  8. Case 1: Visual Search • Users upload photo to discover

    similar-looking products • Usually multiple objects exist in one image • Object Detection reduces computational cost and improves accuracy in visual search system. • Wide application in Fashion business https://labs.pinterest.com/assets/paper/visual_search_at_pinterest.pdf
  9. Case 2: Analysis of Drone Imagery • Remote monitoring of

    a housing construction project via drone • Routine inspection of solar farms • Early plant disease detection in agriculture https://medium.com/nanonets/how-we-flew-a-drone-to-monitor-construction-projects-in-africa-using-deep-learning-b792f5c9c471
  10. Case 3: Behavior Observation • Analysis of users’ behavior helps

    improve the product • Use object detection to track the movement of items in a kitchen • About 1/100 of the time cost of manual work https://six2018.abejainc.com/docs/b3_six2018.pdf
  11. Practice: Deconstruct a Problem Example: Unmanned Store • Required Functions

    • e.g. track what customer picks from a shelf • e.g. checkout within shopping carts • Possible Approach • e.g. track hands • e.g. detect products • Feasibility Evaluation • cameras (resolution, position, …) • accuracy expectation • cost vs. RFID?
  12. Object Detection in Machine Learning Experience with ABEJA Platform Datasets

    Evaluation Criteria Representative Architectures Object Detection and Applications
  13. Object Detection: Problem Definition Input: • Image (RGB) Output: a

    list of detections (class cj, bounding box bj = (x, y, w, h), confidence pj):
    • class 0, (x1, y1, w1, h1), p1
    • class 0, (x2, y2, w2, h2), p2
    • class 1, (x3, y3, w3, h3), p3
    • …
    [Figure: a ‘cat’ with bounding box at top-left (x, y), width w, height h. This image is CC0 public domain.]
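A detection like those above is naturally modeled as a small record; a minimal Python sketch (the `Detection` type and the example values are illustrative, not from any particular framework):

```python
from typing import NamedTuple, Tuple

class Detection(NamedTuple):
    """One detector output: class id cj, box bj = (x, y, w, h), confidence pj."""
    class_id: int
    box: Tuple[int, int, int, int]  # top-left x, y, width, height
    score: float

# A detector returns a variable-length list of such records per image:
detections = [
    Detection(class_id=0, box=(48, 240, 195, 371), score=0.92),  # e.g. class 0 = 'cat'
    Detection(class_id=1, box=(8, 12, 352, 498), score=0.81),
]
```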
  14. Famous Challenges

    |                   | PASCAL VOC (2007) | ImageNet ILSVRC (2013)          | MS COCO (2015)          | Open Images (2018)       |
    |-------------------|-------------------|---------------------------------|-------------------------|--------------------------|
    | # Classes         | 20                | 200                             | 80                      | 500                      |
    | # Training Images | 11K               | 476K                            | 200K                    | 1.7M                     |
    | # Objects         | 27K               | 534K                            | 1.5M                    | 12M                      |
    | Note              | standard          | scaled-up version of PASCAL VOC | more difficult than VOC | broader range of classes |

    http://host.robots.ox.ac.uk/pascal/VOC/ http://www.image-net.org/challenges/LSVRC/ http://cocodataset.org/#home https://www.kaggle.com/c/google-ai-open-images-object-detection-track
  15. Object Detection: Evaluation (1)

    • AP (Average Precision): average of the maximum precisions at different recall values
    • mAP (mean Average Precision): mean of AP over all categories
    • AP@IoU: average precision over a range of IoU thresholds [0.5:0.05:0.95]
    • AP@Scales: average precision for different object sizes [small, medium, large]
    • AR (Average Recall): averaged maximum recall given a fixed number of detections per image
  16. Object Detection: Evaluation (2) Intersection over Union:

    IoU = area of overlap / area of union
    • TP (True Positive): correct class and IoU > 0.5
    • FP (False Positive): wrong class or IoU < 0.5
    • FN (False Negative): missed object
    Precision = TP / (TP + FP)
    Recall = TP / (TP + FN)
    AP = (1/|R|) · Σ_{r ∈ R} p_interp(r), where R is a set of recall levels in [0, 1] and p_interp(r) is the maximum precision achieved at any recall ≥ r; that is, AP is the average of the maximum precision at each recall level.
    [Figure: ground-truth box vs. prediction box]
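The IoU, precision, and recall definitions above translate directly to code; a minimal sketch, assuming boxes are given as (x, y, w, h) with a top-left origin (the function names are mine):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # width/height of the overlap rectangle (0 if the boxes are disjoint)
    ov_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ov_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    overlap = ov_w * ov_h
    union = aw * ah + bw * bh - overlap
    return overlap / union if union > 0 else 0.0

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)
```

For two 2×2 boxes offset by (1, 1), the overlap area is 1 and the union is 7, so IoU = 1/7 ≈ 0.14.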
  17. Object Detection: Evaluation (3) Example for category ‘cat’: # Ground

    truth = 5, # Predictions = 10 (ranked by confidence)

    | Rank | Correct? | Precision | Recall |
    |------|----------|-----------|--------|
    | 1    | TRUE     | 1.0       | 0.2    |

    TP=1, FP=0, FN=4 → Precision = 1/1 = 1.0, Recall = 1/5 = 0.2
  18. Object Detection: Evaluation (3) Example for category ‘cat’: # Ground

    truth = 5, # Predictions = 10

    | Rank | Correct? | Precision | Recall |
    |------|----------|-----------|--------|
    | 1    | TRUE     | 1.0       | 0.2    |
    | 2    | TRUE     | 1.0       | 0.4    |

    TP=2, FP=0, FN=3 → Precision = 2/2 = 1.0, Recall = 2/5 = 0.4
  19. Object Detection: Evaluation (3) Example for category ‘cat’: # Ground

    truth = 5, # Predictions = 10

    | Rank | Correct? | Precision | Recall |
    |------|----------|-----------|--------|
    | 1    | TRUE     | 1.0       | 0.2    |
    | 2    | TRUE     | 1.0       | 0.4    |
    | 3    | FALSE    | 0.67      | 0.4    |

    TP=2, FP=1, FN=3 → Precision = 2/3 ≈ 0.67, Recall = 2/5 = 0.4
  20. Object Detection: Evaluation (3) Example for category ‘cat’: # Ground

    truth = 5, # Predictions = 10

    | Rank | Correct? | Precision | Recall |
    |------|----------|-----------|--------|
    | 1    | TRUE     | 1.0       | 0.2    |
    | 2    | TRUE     | 1.0       | 0.4    |
    | 3    | FALSE    | 0.67      | 0.4    |
    | 4    | FALSE    | 0.5       | 0.4    |
    | 5    | FALSE    | 0.4       | 0.4    |
    | 6    | TRUE     | 0.5       | 0.6    |
    | 7    | TRUE     | 0.57      | 0.8    |
    | 8    | FALSE    | 0.5       | 0.8    |
    | 9    | FALSE    | 0.44      | 0.8    |
    | 10   | TRUE     | 0.5       | 1.0    |

    Precision = TP / (TP + FP), Recall = TP / (TP + FN). AP: average of maximum precision at all recall levels.
  21. Object Detection: Evaluation (3) Example for category ‘cat’: # Ground

    truth = 5, # Predictions = 10
    [Plot: precision vs. recall for the ten ranked predictions, with the interpolated precision marked at recall levels 0.0, 0.1, …, 1.0]
  22. Object Detection: Evaluation (3) Example for category ‘cat’: # Ground

    truth = 5, # Predictions = 10. Interpolated precision at the 11 recall levels:

    | Recall*    | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5  | 0.6  | 0.7  | 0.8  | 0.9 | 1.0 |
    |------------|-----|-----|-----|-----|-----|------|------|------|------|-----|-----|
    | Precision* | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.57 | 0.57 | 0.57 | 0.57 | 0.5 | 0.5 |

    AP = (5×1.0 + 4×0.57 + 2×0.5) / 11 ≈ 0.75
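The computation above is the PASCAL VOC style 11-point interpolated AP; a small sketch that reproduces the ‘cat’ example (the precision/recall lists are copied from the ranked-prediction table):

```python
def eleven_point_ap(precisions, recalls):
    """11-point interpolated AP: average, over recall levels 0.0, 0.1, ..., 1.0,
    of the maximum precision achieved at any recall >= that level."""
    ap = 0.0
    for level in [i / 10 for i in range(11)]:
        ap += max((p for p, r in zip(precisions, recalls) if r >= level), default=0.0)
    return ap / 11

# Precision/recall at each ranked prediction, from the 'cat' example:
precisions = [1.0, 1.0, 0.67, 0.5, 0.4, 0.5, 0.57, 0.5, 0.44, 0.5]
recalls    = [0.2, 0.4, 0.4, 0.4, 0.4, 0.6, 0.8, 0.8, 0.8, 1.0]
ap = eleven_point_ap(precisions, recalls)  # (5*1.0 + 4*0.57 + 2*0.5) / 11, about 0.75
```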
  23. Object Detection in Machine Learning Experience with ABEJA Platform Datasets

    Evaluation Criteria Representative Architectures Object Detection and Applications
  24. Think Intuitively… • Use a sliding window to go over

    the full image • Crop each window and run a classifier • Repeat for different window sizes But… • Returns multiple overlapping detections per object • Too slow
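To see why the naive approach is too slow, it helps to count the crops it generates; a quick sketch (the window sizes and stride are arbitrary choices for illustration):

```python
def sliding_windows(img_w, img_h, win_sizes, stride):
    """Yield every (x, y, w, h) crop of each window size, stepping by `stride`."""
    for w, h in win_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)

# Even a 640x480 image with just two window sizes yields ~1800 crops,
# each of which would need a full classifier forward pass:
n_crops = sum(1 for _ in sliding_windows(640, 480, [(64, 64), (128, 128)], stride=16))
```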
  25. Non-Maximum-Suppression (NMS) • Start with detection with highest confidence score

    • Measure its IoUs with other detections • Remove detections with IoU > threshold (e.g. 0.5) • Repeat the steps with the remaining detections
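The steps above can be sketched as a greedy reference implementation, with detections as (score, box) pairs and boxes as (x, y, w, h) (the helper `iou` and all names are mine, not from a specific library):

```python
def iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ov_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ov_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    overlap = ov_w * ov_h
    union = aw * ah + bw * bh - overlap
    return overlap / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy Non-Maximum Suppression over (score, box) pairs."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # detection with the highest confidence
        kept.append(best)
        # drop every remaining detection that overlaps `best` too much
        remaining = [d for d in remaining if iou(best[1], d[1]) <= iou_threshold]
    return kept
```

For example, if the two highest-scoring boxes overlap with IoU ≈ 0.68, only the higher-scoring one survives, while a disjoint third box is kept.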
  26. Milestones of Object Detection • Before 2012: Handcrafted features •

    After 2012: benefit from DCNNs https://arxiv.org/abs/1809.02165
  27. Representative Object Detection Architectures • Two-Stage Detector • RCNN series

    • R-FCN • One-Stage Detector • YOLO series • SSD
  28. RCNN / Fast RCNN / Faster RCNN Highlights • Region

    proposal (‘blob-like’) • CNN based classifier • SOTA of 2014 Problems • Multi-stage pipeline • Training is too heavy • Detection is slow (47s/image on GPU) https://arxiv.org/abs/1809.02165 https://arxiv.org/abs/1311.2524
  29. RCNN / Fast RCNN / Faster RCNN Highlights • Feature

    is calculated only once • Multi-task loss of classification and regression • Faster than RCNN Problems • Region proposal is still the bottleneck. https://arxiv.org/abs/1809.02165 https://arxiv.org/abs/1504.08083
  30. RCNN / Fast RCNN / Faster RCNN Highlights • Use

    CNN to do region proposal (RPN), other parts are just like Fast RCNN • Introduce Anchors • Joint training Problems: • Still slow https://arxiv.org/abs/1809.02165 https://arxiv.org/abs/1506.01497
  31. R-FCN (Region-based Fully Convolutional Network) Highlights • Shared RoI subnet

    • Position sensitive RoI pooling • Faster than Faster RCNN Problems: • More computational cost than single stage detector https://arxiv.org/abs/1809.02165 https://arxiv.org/abs/1605.06409
  32. YOLO (You-Only-Look-Once) Highlights • Super fast • Uses features from

    the entire image Problems: • Weak on small objects • Many localization errors https://arxiv.org/abs/1809.02165 https://arxiv.org/abs/1506.02640
  33. SSD (Single-Shot-Detector) Highlights • Use multiple CONV feature maps •

    Competitive accuracy with Faster RCNN • Faster than YOLO-v1 Problems • Poor performance on small objects https://arxiv.org/abs/1809.02165 https://arxiv.org/abs/1512.02325
  34. Which is the best? Given the application and platform: a tradeoff of speed, memory,

    and accuracy Examples: • Mobile device: small memory footprint • Realtime applications: test-time inference speed • Server-side system: accuracy (subject to throughput constraints)
  35. Before getting hands dirty… • Prepare a proper dataset •

    Collect good quality images • Annotation work is necessary • Understand the data • Clarify the deployment environment • Edge device / Local machine / Cloud • Real-time? • Pick a model
  36. Object Detection in Machine Learning Experience with ABEJA Platform Datasets

    Evaluation Criteria Representative Architectures Object Detection and Applications
  37. • Data • Accumulation • Management • Annotation • ML/DL

    Model • Training • Deployment • Serving and Inference • Version Management A Glimpse into ABEJA Platform
  38. Technical Tutorials • Sample code for classification, object detection, semantic

    segmentation https://github.com/abeja-inc/abeja-platform-samples • Tech Blogs on ABEJA Platform https://qiita.com/advent-calendar/2018/abejaplatform • ABEJA’s General Tech Blog: https://tech-blog.abeja.asia/
  39. Object Detection in Machine Learning Experience with ABEJA Platform Datasets

    Evaluation Criteria Representative Architectures Object Detection and Applications
  40. After the lecture is over, we will be waiting at the

    ‘Ask the Speaker’ corner of the 3F Hall exhibition area. If you have any questions, please come by after the session ends. See you there!
    [Floor map: 3F Hall booth layout, with the ABEJA and ‘Ask the Speaker’ booths marked]
  41. The contents introduced today, and the products and services that

    support them behind the scenes, are presented at our booth in the 3F exhibition hall. Please drop by between sessions.
    [Floor maps: 1F–3F, Rooms A–E and Hall, with reception desks, restrooms, and elevators, and the booth location marked ‘Here’]
  42. Tomorrow, many sessions will show how the technology

    introduced today is actually used by clients. Please come to Day 2! GO Day2 !! - for ABEJA Platform
  43. Please give us feedback on this session.

    ID of this session: dev-e-2, “Object Detector: WHY Do You Need and HOW Can You Own”. Feedback will be used to improve our products and deliver more information. https://goo.gl/forms/erEBAsrQK4XKEv352