ICPRAI 2022 - STM

Space-Time Memory networks for multi-person skeleton body part detection
Rémi DUFOUR (PhD Student, FCS Railenium), Cyril Meurie, Olivier Lézoray, Ankur Mahtani
ICPRAI 2022

Plan
• Context
• Related works
• Experiments
   • Skeleton confidence maps
   • Skeleton edges tracking
• Conclusion

Context
• Autonomous train prototype project directed by Railenium
• Camera surveillance is needed to enable services and provide security without onboard staff
• Pose tracking would be useful as a base for action recognition

Related works
• Most existing pose trackers are two-stage:
   • Pose estimation in each frame
   • Linking the poses over time
• The first stage is usually performed with a top-down approach:
   • A person detector first detects human bounding boxes
   • A pose estimation method is then applied inside each bounding box
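The second stage (linking poses over time) is often done by matching each frame's poses to the tracks of the previous frame. Below is a minimal sketch of one common variant, greedy matching on the IoU of keypoint bounding boxes. It illustrates the generic two-stage scheme only, not the method of this work; the names `pose_box`, `iou`, `link_poses` and the 0.3 threshold are hypothetical, and each pose is assumed to have at least one visible keypoint.

```python
import numpy as np

def pose_box(kpts):
    """Tight (x0, y0, x1, y1) box around the visible keypoints of a (K, 3) array."""
    xy = kpts[kpts[:, 2] > 0, :2]
    return np.array([xy[:, 0].min(), xy[:, 1].min(), xy[:, 0].max(), xy[:, 1].max()])

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def link_poses(prev_tracks, poses, next_id, thresh=0.3):
    """Greedily give each pose of the current frame the id of the best-overlapping previous track.

    prev_tracks: dict track_id -> box from the previous frame (hypothetical interface).
    Returns (ids for the current poses, updated tracks dict, next free id).
    """
    boxes = [pose_box(p) for p in poses]
    # All (IoU, track, pose) candidates, best overlap first.
    pairs = sorted(((iou(prev_tracks[tid], b), tid, j)
                    for tid in prev_tracks for j, b in enumerate(boxes)), reverse=True)
    ids, used = [None] * len(poses), set()
    for score, tid, j in pairs:
        if score < thresh:
            break
        if tid in used or ids[j] is not None:
            continue
        ids[j] = tid
        used.add(tid)
    # Unmatched poses start new tracks.
    for j in range(len(ids)):
        if ids[j] is None:
            ids[j], next_id = next_id, next_id + 1
    tracks = {tid: boxes[j] for j, tid in enumerate(ids)}
    return ids, tracks, next_id
```

Running `link_poses` frame after frame, passing back the returned `tracks` and `next_id`, yields identity-consistent pose tracks as long as people move little between frames.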

Related works
• Existing top-down pose estimation methods are not designed for video: they have no memory of past frames
• Advances in Video Object Segmentation (VOS) have led to a tracker that uses a long-term memory, called STM (Space-Time Memory)
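For reference, the core of STM (Oh et al., ICCV 2019) is a memory read: every location of the query frame attends, through the dot-product similarity of keys with a softmax over all memory locations, to the values stored for past frames, and the retrieved value is concatenated with the query frame's own value. The NumPy sketch below shows only this read step; the encoders producing keys and values are omitted, and the function names and array layout are assumptions.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(k_mem, v_mem, k_query, v_query):
    """Space-time memory read (sketch).

    k_mem:   (T, H, W, Ck) keys of T memory frames
    v_mem:   (T, H, W, Cv) values of the memory frames
    k_query: (H, W, Ck)    key of the current (query) frame
    v_query: (H, W, Cv)    value of the query frame
    Returns  (H, W, 2*Cv): query value concatenated with the value read from memory.
    """
    T, H, W, Ck = k_mem.shape
    km = k_mem.reshape(T * H * W, Ck)        # every memory location, across space and time
    vm = v_mem.reshape(T * H * W, -1)
    kq = k_query.reshape(H * W, Ck)          # every query location
    sim = kq @ km.T                          # (HW, THW) dot-product similarity
    attn = softmax(sim, axis=1)              # normalise over memory locations
    read = (attn @ vm).reshape(H, W, -1)     # weighted sum of memory values
    return np.concatenate([v_query, read], axis=-1)
```

Because the read is an attention over all stored frames, adding frames to memory gives the model access to the long-term past without any recurrent state.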

Related works
• Our work seeks to answer the question: "Can the STM architecture, conceived for Video Object Segmentation, be adapted for pose estimation in videos?"

Experiments: skeleton confidence map
• First, we tried training STM to perform skeleton/background binary segmentation
• The original STM weights are finetuned on this new task:
   • First on the MS-COCO [1] image dataset (64,114 images annotated with keypoints)
   • We generate short videos (2 to 4 frames) by translating/rotating individual images
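The synthetic-video step can be sketched as follows, assuming keypoints given as (x, y) pixel coordinates with a visibility flag and skeleton edges given as pairs of keypoint indices (COCO-style). Each frame applies an accumulated small random rotation and translation to both the image and its keypoints, and the skeleton/background target is obtained by drawing the visible edges. The rotation and shift ranges, the line width and all function names are hypothetical; the slides do not specify them.

```python
import numpy as np

def draw_segment(mask, p, q, width=1):
    """Rasterise the segment p -> q ((x, y) coordinates) into `mask` by dense sampling."""
    H, W = mask.shape
    n = int(np.ceil(np.hypot(*(q - p)))) + 1
    for s in np.linspace(0.0, 1.0, n):
        x, y = p + s * (q - p)
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < W and 0 <= yi < H:
            mask[max(yi - width, 0):yi + width + 1, max(xi - width, 0):xi + width + 1] = 1

def skeleton_mask(kpts, visible, edges, shape):
    """Binary target: 1 on visible skeleton edges, 0 on background."""
    mask = np.zeros(shape, dtype=np.uint8)
    for a, b in edges:
        if visible[a] and visible[b]:
            draw_segment(mask, kpts[a], kpts[b])
    return mask

def warp_image(img, R, t):
    """Nearest-neighbour warp with p' = R (p - c) + c + t, where c is the image centre."""
    H, W = img.shape[:2]
    c = np.array([W / 2.0, H / 2.0])
    ys, xs = np.mgrid[0:H, 0:W]
    out_pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    src = (out_pts - c - t) @ R + c          # inverse map: R^T (p' - c - t) + c
    sx, sy = np.rint(src[:, 0]).astype(int), np.rint(src[:, 1]).astype(int)
    valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
    flat = np.zeros((H * W,) + img.shape[2:], dtype=img.dtype)
    flat[valid] = img[sy[valid], sx[valid]]
    return flat.reshape(img.shape)

def make_clip(img, kpts, visible, edges, n_frames, rng, max_deg=5.0, max_shift=5.0):
    """Turn one annotated image into a short clip of (frames, skeleton masks)."""
    H, W = img.shape[:2]
    c = np.array([W / 2.0, H / 2.0])
    R_acc, t_acc = np.eye(2), np.zeros(2)    # accumulated motion; frame 0 is the original
    frames, masks = [], []
    for _ in range(n_frames):
        frames.append(warp_image(img, R_acc, t_acc))
        moved = (kpts - c) @ R_acc.T + c + t_acc   # keypoints follow the same motion
        masks.append(skeleton_mask(moved, visible, edges, (H, W)))
        th = np.deg2rad(rng.uniform(-max_deg, max_deg))
        R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
        R_acc, t_acc = R @ R_acc, R @ t_acc + rng.uniform(-max_shift, max_shift, size=2)
    return np.stack(frames), np.stack(masks)
```

Accumulating the per-step transforms, rather than drawing an independent transform per frame, keeps the motion between consecutive frames small, which is closer to real video.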

Experiments: skeleton confidence map
• Quantitative results (CC-loss is Pearson's Correlation Coefficient)
• Results obtained by initialising with the ground truth for the first frame and then tracking for the rest of the video sequence
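The metric and the evaluation protocol can be written down as below. `pearson_cc` is the standard Pearson correlation between a predicted and a ground-truth map. Turning it into a loss as `1 - CC` is one common choice and an assumption here, as is the `track_fn(memory, frame)` interface, a hypothetical stand-in for the STM model that keeps every past frame and its map in memory.

```python
import numpy as np

def pearson_cc(pred, gt, eps=1e-8):
    """Pearson correlation coefficient between two maps of the same shape."""
    p = pred.astype(np.float64).ravel()
    g = gt.astype(np.float64).ravel()
    p -= p.mean()
    g -= g.mean()
    return float((p @ g) / (np.sqrt((p @ p) * (g @ g)) + eps))

def cc_loss(pred, gt):
    """Assumed loss form: 0 for a perfect correlation, 2 for a perfect anti-correlation."""
    return 1.0 - pearson_cc(pred, gt)

def evaluate_sequence(track_fn, frames, gt_maps):
    """Ground truth is given for frame 0 only; later frames are tracked from predictions.

    Returns the mean CC over frames 1..T-1.
    """
    memory = [(frames[0], gt_maps[0])]       # initialisation with the ground truth
    scores = []
    for t in range(1, len(frames)):
        pred = track_fn(memory, frames[t])
        scores.append(pearson_cc(pred, gt_maps[t]))
        memory.append((frames[t], pred))     # the model's own prediction enters memory
    return float(np.mean(scores))
```

Since the ground truth is only used at frame 0, errors can accumulate over the sequence, which is what makes this protocol a test of tracking rather than per-frame detection.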

Experiments: skeleton confidence map
• Qualitative results on PoseTrack18
• The results are encouraging because the model was not trained on PoseTrack18_Train

Experiments: video skeleton edges tracking
• The STM architecture was modified so that it has multiple input and output channels, one for each skeleton edge type
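A sketch of such multi-channel targets, one channel per skeleton edge type, assuming each person is a (K, 3) array of (x, y, visibility) and each channel is a soft map that decays with the distance to the corresponding limb segment, merged across people with a max. The Gaussian fall-off, `sigma` and the function name are hypothetical.

```python
import numpy as np

def edge_confidence_maps(people, edges, shape, sigma=2.0):
    """Return an (E, H, W) stack: channel e is high near edge type e of any person."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)     # (H, W, 2) pixel coordinates
    maps = np.zeros((len(edges), H, W))
    for kp in people:
        for e, (i, j) in enumerate(edges):
            if kp[i, 2] <= 0 or kp[j, 2] <= 0:                # skip edges with a hidden end
                continue
            a, b = kp[i, :2], kp[j, :2]
            ab = b - a
            denom = max(float(ab @ ab), 1e-12)
            # Projection of each pixel onto the segment, clamped to its end points.
            t = np.clip(((pix - a) @ ab) / denom, 0.0, 1.0)
            closest = a + t[..., None] * ab
            d2 = np.sum((pix - closest) ** 2, axis=-1)
            maps[e] = np.maximum(maps[e], np.exp(-d2 / (2 * sigma ** 2)))
    return maps
```

Keeping one channel per edge type means that, for instance, a left forearm and a right forearm crossing each other stay separable, which a single skeleton/background channel cannot express.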

Experiments: video skeleton edges tracking
• Qualitative results (figure: prediction vs. ground truth)

Experiments: video pose estimation
• Finally, we modify the STM architecture so that it handles each keypoint and each edge in its own confidence map.
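One possible way to lay out such targets and read keypoints back out is sketched below, assuming Gaussian keypoint heatmaps (max over people) stacked before the edge maps into K + E channels, and keypoint candidates taken as local maxima above a threshold. The channel ordering, `sigma`, the 0.5 threshold and all names are hypothetical; grouping the candidates into individual skeletons with the help of the edge maps is not shown.

```python
import numpy as np

def keypoint_heatmaps(people, num_kpts, shape, sigma=2.0):
    """(K, H, W) Gaussian heatmaps, one channel per keypoint type, max over people."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    maps = np.zeros((num_kpts, H, W))
    for kp in people:
        for k in range(num_kpts):
            x, y, v = kp[k]
            if v <= 0:
                continue
            g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            maps[k] = np.maximum(maps[k], g)
    return maps

def stack_targets(kpt_maps, edge_maps):
    """Assumed channel layout: [K keypoint maps | E edge maps]."""
    return np.concatenate([kpt_maps, edge_maps], axis=0)

def find_peaks(heatmap, threshold=0.5):
    """(x, y) of local maxima (3x3 neighbourhood) above `threshold`; ties all count."""
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones((H, W), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
            is_max &= heatmap >= neighbour
    ys, xs = np.nonzero(is_max & (heatmap > threshold))
    return list(zip(xs.tolist(), ys.tolist()))
```

With several people in view, each keypoint channel can hold several peaks, one per person, which is why a local-maximum search is used here instead of a single argmax.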

Experiments: video pose estimation
• Finally, we use an STM architecture that handles each keypoint and each edge in its own confidence map.
• We try out multiple data augmentation methods aimed at improving long-term tracking performance.
(figure panels: without augmentations, "rand" augmentation, "baits" augmentation, "jitter" augmentation, "dull_clouds" augmentation)

Experiments: skeleton confidence map
• Then, we finetune on the PoseTrack [1] dataset for 5 epochs

[1] Andriluka M. et al., PoseTrack: A Benchmark for Human Pose Estimation and Tracking, CVPR 2018

Experiments: video pose estimation
• Cyclic training: during training, the model is repeatedly fed its own prediction for the previous frame

[1] Lin T.-Y. et al., Microsoft COCO: Common Objects in Context, ECCV 2014
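A minimal sketch of the difference between cyclic training and feeding the ground truth back (teacher forcing), assuming a hypothetical `model(prev_frame, prev_map, frame)` callable that conditions only on the previous frame. The actual STM memory holds several frames, and whether gradients flow through the fed-back predictions is not stated in the slides; the mean-squared loss is also an illustrative assumption.

```python
import numpy as np

def unroll_clip(model, frames, gt_maps, cyclic=True):
    """Run the model over a training clip and return its predictions for frames 1..T-1.

    cyclic=True : the prediction for frame t-1 is fed back as input for frame t.
    cyclic=False: the ground truth for frame t-1 is fed instead (teacher forcing).
    """
    prev_map = gt_maps[0]                 # frame 0 is always initialised with ground truth
    preds = []
    for t in range(1, len(frames)):
        pred = model(frames[t - 1], prev_map, frames[t])
        preds.append(pred)
        prev_map = pred if cyclic else gt_maps[t]
    return preds

def clip_loss(preds, gt_maps):
    """Mean squared error over the tracked frames (hypothetical choice of loss)."""
    return float(np.mean([np.mean((p - g) ** 2) for p, g in zip(preds, gt_maps[1:])]))
```

With cyclic training the model sees its own errors during training, as it will at test time, instead of always receiving a perfect previous map.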

Experiments: video pose estimation
• Verification that the gains from finetuning on PoseTrack18 and from the data augmentations concern long-term tracking performance
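One way to run such a check is to average the per-frame score separately for short, medium and long distances from the ground-truth-initialised frame and compare models bin by bin. The bin boundaries and the function name below are hypothetical.

```python
import numpy as np

def mean_score_by_horizon(per_frame_scores, bins=((1, 10), (10, 30), (30, None))):
    """Mean score per range of frame offsets from the initialised frame.

    per_frame_scores: list (one entry per sequence) of 1-D arrays, where element i
    is the score of the frame at offset i + 1 (offset 0 is the ground-truth frame).
    bins: (low, high) offset ranges, high exclusive; None means unbounded.
    """
    out = {}
    for lo, hi in bins:
        vals = []
        for scores in per_frame_scores:
            for i, s in enumerate(scores):
                offset = i + 1
                if offset >= lo and (hi is None or offset < hi):
                    vals.append(s)
        out[(lo, hi)] = float(np.mean(vals)) if vals else float("nan")
    return out
```

Subtracting the dictionaries of two models bin by bin shows whether an improvement is concentrated in the late-frame bins.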

Conclusion
• We first showed experimentally that STM can handle skeletons rather than segment the contour of the tracked object
• We then showed that the STM architecture can be modified to track each skeleton edge individually
• Finally, we tested a new architecture that tracks skeleton keypoints and edges, and experimented with different training procedures and data augmentation methods:
   • Finetuning on the PoseTrack18 train set
   • Cyclic training
   • The "baits" and "jitter" data augmentations were shown to help
• The next step should be to compare with other pose tracking methods

Thank you for your attention