TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA'21)

Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri, “TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly”, ICRA 2021

blog:
https://medium.com/sinicx/trans-am-transfer-learning-by-aggregating-dynamics-models-for-soft-robotic-assembly-53ef3451d066

OMRON SINIC X

June 02, 2021

Transcript

  1. TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA’21)

    Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri (OMRON SINIC X)
    International Conference on Robotics and Automation (ICRA 2021), June 1, 2021
    © 2021 OMRON SINIC X Corporation. All Rights Reserved.
  2. TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly

    Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri, OMRON SINIC X Corporation
  3. Quick adaptation to a new setup

    [Figure: adaptation to a new workpiece with different physical characteristics. Panels: 0. One peg-hole; 1. Another hole; 2. Small peg/hole]
  4. Learning for quick adaptation

    [Figure: a new workpiece with different physical characteristics (0. One peg-hole; 1. Another hole; 2. Small peg/hole), with learning in the source env. and adaptation in the target env. (env. = environment)]
  5. State transition dynamics

    [Figure: the source/target setup of the previous slide, annotated with the action, the state, and the transition of the robot's state]
  6. Problem / Example / Related works

    1. Different dynamics in the source and target envs.
       • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the envs.
       • Link mass, joint damping, friction, inertia
    3. No communication with the source envs.
       • Distributed factory lines
    [Figure, one panel per point: Source env. ≠ Target env.; Source env. and Target env. with unknown (?) dynamics; Source env. and Target env. without communication]
  7. Problem / Example / Related works

    The three points of slide 6 (different dynamics, unknown dynamics, no communication with the source envs.), with the example:
    • Peg-in-hole tasks with variable hole orientations
  8. Problem / Example / Related works

    The three points and the peg-in-hole example of slide 7, plus related works that require frequent access to the source environments:
    • Inter-dynamics transfer [Chen+ NeurIPS2018]
    • Meta-learning [Vanschoren 2018]
    [Figure: a policy learned across source envs. Env 1 ... Env K]
  9. Using policies learned at source environments [Barekatain+ IJCAI2020]

    • No access to the source envs. required
    • Model-free RL
      • Many interactions at the target envs.
      • Long delays until deployment
      • Damage to the robots and workpieces
    • Model-based RL (ours, TRANS-AM)
    [Figure: policies from source envs. Env 1 ... Env K]
  10. Model-based RL (MBRL)

    A model $g_\theta \approx T$ of the state-transition dynamics $T$ ($\theta$: parameters)
    • Learned using collected samples
    • Used to
      • predict the states from the actions
      • select the optimal actions
  11. Transfer MBRL

    Target dynamics model
    • Unknown
    • Learned quickly using the source dynamics models
    Source dynamics models
    • Not necessarily parameterized or trainable
    • Learned neural networks
    • Hand-engineered simulators
    • ...
  12. TRANS-AM learns a model of the target dynamics

    1. Aggregates the outputs of the source dynamics models
    2. Adds an auxiliary model to compensate for the residual
  13. Peg-in-hole simulations with different hole angles

    • State: the pose of the wrist and the peg
    • Action: the velocity of the wrist
    • Return: the distance to the goal position
    [Figure: simulated gripper, wrist, spring, peg, and hole orientation]
  14. Evaluation of learning

    1. Returns in earlier episodes
       • Learning curve
    2. First success episode
       • Average ± standard deviation
    3. Success ratio
       • Sessions with a success before {5, 10, 15, 20} episodes / all sessions
    • The number of source dynamics models (K) is varied
    • Baseline: learning from scratch
  15. First success / success ratio

    [Plots: first success episode (average ± standard deviation) and success ratio. Annotations: "Faster", "High success rates". e: episode; K: number of source models]
  16. Setup with a robot arm

    • State: pose, force
    • Action: gripper velocity
    • Reward:
    Hardware:
    • UR5 robot arm (Universal Robots)
    • Compliant wrist
    • Gripper (2F-85, ROBOTIQ)
    • Force-torque sensor (FT300, ROBOTIQ)
    • Motion capture cameras × 6 (FLEX13, OptiTrack)
  17. High success rates

    [Plots: success ratios (average ± standard deviation). Annotation: "30% faster". e: episode; K: number of source models]
  18. Conclusion: TRANS-AM

    • Proposed a new method to leverage known models when learning new tasks
    • Efficient model-based reinforcement learning confirmed in simulation and real-robot experiments
    • 30% faster learning than from scratch
  19. Weight

    [Plots: weights of Source 1 and Source 2 versus hole orientation, in three panels with different targets]
  20. Success ratios

    [Plots: success ratios (average ± standard deviation). Annotation: "Faster". e: episode; K: number of source models]
  21. Model predictive control

    1. Input an action sequence
    2. Predict the state sequence
    3. Calculate the rewards over the sequence
    4. Update the action sequence and go back to step 2
  22. Limitations of TRANS-AM

    • Does not consider the characteristics of the environments
    • How many source envs.?
    • How to select the envs.?
    • Cost functions
    • Interactions with the env.
  23. TRANS-AM uses black-box source models

    1. Aggregates the outputs of the source dynamics models
    2. Adds an auxiliary model to compensate for the residual
    Source dynamics models
    • Not necessarily parameterized or trainable
    • Learned neural networks
    • Hand-engineered simulators
  24. TRANS-AM learns the characteristics of a new part

    Geometric
    • Size
    • Shape
    • Pose (position, orientation)
    Dynamic
    • Mass
    • Inertia
    • Friction
    • Deformation
    • Elasticity
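Slide 10 describes learning a model $g_\theta$ of the state-transition dynamics from collected samples and using it to predict states from actions. The slides do not give the model class, so this sketch assumes a scalar state, a scalar action, and a linear model $s_{t+1} \approx \theta_s s_t + \theta_a a_t$ fitted by least squares. `fit_linear_model`, `predict_states`, and the example system are hypothetical illustrations, not TRANS-AM's implementation.

```python
from typing import List, Tuple

# One transition collected from the environment: (s_t, a_t, s_{t+1}).
Sample = Tuple[float, float, float]
Theta = Tuple[float, float]  # hypothetical linear model parameters (theta_s, theta_a)


def fit_linear_model(samples: List[Sample]) -> Theta:
    """Fit s_{t+1} ~ theta_s * s_t + theta_a * a_t by least squares.

    The 2x2 normal equations are solved in closed form (Cramer's rule).
    """
    s_s = sum(s * s for s, _, _ in samples)
    a_a = sum(a * a for _, a, _ in samples)
    s_a = sum(s * a for s, a, _ in samples)
    s_y = sum(s * y for s, _, y in samples)
    a_y = sum(a * y for _, a, y in samples)
    det = s_s * a_a - s_a * s_a
    if abs(det) < 1e-12:
        raise ValueError("samples must vary both the state and the action")
    theta_s = (s_y * a_a - a_y * s_a) / det
    theta_a = (a_y * s_s - s_y * s_a) / det
    return theta_s, theta_a


def predict_states(theta: Theta, s0: float, actions: List[float]) -> List[float]:
    """Roll the learned model forward to predict the states caused by `actions`."""
    theta_s, theta_a = theta
    states = [s0]
    for a in actions:
        states.append(theta_s * states[-1] + theta_a * a)
    return states


# Usage with a hypothetical true system s_{t+1} = 0.9 s_t + 0.5 a_t.
samples = [(s, a, 0.9 * s + 0.5 * a) for s in (-1.0, 0.0, 1.0) for a in (-1.0, 1.0)]
theta = fit_linear_model(samples)
print(theta)
print(predict_states(theta, 0.0, [1.0, 1.0]))
```

A richer model class, such as a neural network, would replace the closed-form fit with gradient-based training, but the two uses named on the slide (fitting from samples, predicting states from actions) stay the same.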
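Slides 12 and 23 state that TRANS-AM (1) aggregates the outputs of black-box source dynamics models and (2) adds an auxiliary model that compensates for the residual. The sketch below is one way to realize that structure under stated assumptions: scalar states and actions, one constant weight per source model, a linear auxiliary model, and a single least-squares fit of the weights and auxiliary parameters together. These choices and every name in the code are hypothetical; the paper's actual weighting and training procedure may differ.

```python
from typing import Callable, List, Tuple

# A black-box source model: (state, action) -> predicted next state.
Model = Callable[[float, float], float]
Sample = Tuple[float, float, float]  # (s_t, a_t, s_{t+1})


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def features(sources: List[Model], s: float, a: float) -> List[float]:
    # Source models are only queried for their outputs (black boxes); the last
    # three entries feed a hypothetical linear auxiliary model b0 + b1*s + b2*a.
    return [g(s, a) for g in sources] + [1.0, s, a]


def fit_aggregated_model(
    sources: List[Model], samples: List[Sample]
) -> Tuple[List[float], List[float], Callable[[float, float], float]]:
    """Least-squares fit of s_{t+1} ~ sum_k w_k g_k(s_t, a_t) + (b0 + b1 s_t + b2 a_t)."""
    rows = [features(sources, s, a) for s, a, _ in samples]
    targets = [y for _, _, y in samples]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    params = solve(ata, atb)
    k = len(sources)

    def predict(s: float, a: float) -> float:
        return sum(p * f for p, f in zip(params, features(sources, s, a)))

    return params[:k], params[k:], predict


def source_1(s: float, a: float) -> float:  # e.g. a hand-engineered simulator
    return s + 0.5 * a * a


def source_2(s: float, a: float) -> float:  # e.g. a network learned elsewhere
    return s * a


grid = (-1.0, -0.5, 0.0, 0.5, 1.0)
# Hypothetical target: a mix of both sources plus a small residual.
samples = [(s, a, 0.3 * source_1(s, a) + 0.7 * source_2(s, a) + 0.1 + 0.05 * s)
           for s in grid for a in grid]
weights, aux, predict = fit_aggregated_model([source_1, source_2], samples)
print(weights, aux)
```

Because the prediction is linear in both the weights and the auxiliary parameters, one least-squares solve fits them together. The source models are only called, never differentiated or retrained, which is compatible with the slides' point that they may be hand-engineered simulators.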
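Slide 13 defines the simulated task by its state, action, and return, where the return is based on the distance to the goal position. A minimal sketch, assuming the per-step reward is the negative Euclidean distance between the peg and the goal, and the return is the undiscounted sum over an episode. Both the sign convention and the summation are assumptions, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, float, float]  # peg position (x, y, z); hypothetical units


def step_reward(peg: Position, goal: Position) -> float:
    # Negative distance: the reward increases as the peg approaches the goal.
    return -math.dist(peg, goal)


def episode_return(peg_trajectory: Sequence[Position], goal: Position) -> float:
    # Undiscounted sum of the per-step rewards over one episode.
    return sum(step_reward(p, goal) for p in peg_trajectory)


goal = (0.0, 0.0, 0.0)
print(episode_return([(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)], goal))
```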
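Slide 14 lists the evaluation metrics. The sketch below computes the first success episode (average ± standard deviation) and the success ratio for the horizons {5, 10, 15, 20}. It assumes that "a success before N episodes" means a first success within the first N episodes, that sessions without any success are left out of the first-success statistics, and that the sample standard deviation is used. The session data are made up for illustration and are not results from the paper.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def first_success(session: Sequence[bool]) -> Optional[int]:
    """1-based index of the first successful episode in a session, or None."""
    for episode, succeeded in enumerate(session, start=1):
        if succeeded:
            return episode
    return None


def first_success_stats(sessions: List[Sequence[bool]]) -> Tuple[float, float]:
    """Average and sample standard deviation of the first-success episode.

    Sessions that never succeed are skipped (an assumption); at least two
    successful sessions are needed for the standard deviation.
    """
    firsts = [e for e in map(first_success, sessions) if e is not None]
    return statistics.mean(firsts), statistics.stdev(firsts)


def success_ratios(
    sessions: List[Sequence[bool]], horizons: Sequence[int] = (5, 10, 15, 20)
) -> Dict[int, float]:
    """Fraction of all sessions whose first success falls within the first h episodes."""
    firsts = [first_success(s) for s in sessions]
    return {h: sum(1 for e in firsts if e is not None and e <= h) / len(sessions)
            for h in horizons}


# Hypothetical per-episode success flags for four learning sessions.
sessions = [
    [False, True],
    [False, False, False, True],
    [False] * 6 + [True],
    [False] * 25,
]
print(first_success_stats(sessions))
print(success_ratios(sessions))
```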
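Slide 21 outlines model predictive control as an iterative loop over action sequences. The slides do not name the optimizer used in step 4, so this sketch assumes the cross-entropy method (CEM): sample action sequences, score them with the dynamics model, and refit the sequence to the best-scoring ones. The toy 1-D dynamics stand in for a learned target model, and all names and hyperparameters are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence

Dynamics = Callable[[float, float], float]  # (state, action) -> next state
Reward = Callable[[float], float]           # state -> reward


def rollout_return(model: Dynamics, reward: Reward, s0: float,
                   actions: Sequence[float]) -> float:
    """Steps 2-3: predict the state sequence with the model and sum its rewards."""
    s, total = s0, 0.0
    for a in actions:
        s = model(s, a)
        total += reward(s)
    return total


def cem_plan(model: Dynamics, reward: Reward, s0: float, horizon: int = 5,
             iterations: int = 10, population: int = 64, elites: int = 8,
             a_max: float = 1.0, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    mean = [0.0] * horizon  # step 1: initial action sequence
    std = [a_max] * horizon
    for _ in range(iterations):
        # Sample candidate sequences around the current one, clipped to the action bounds.
        candidates = [
            [max(-a_max, min(a_max, rng.gauss(m, sd))) for m, sd in zip(mean, std)]
            for _ in range(population)
        ]
        # Steps 2-3: score every candidate with the dynamics model.
        candidates.sort(key=lambda acts: rollout_return(model, reward, s0, acts),
                        reverse=True)
        best = candidates[:elites]
        # Step 4: move the sequence toward the best candidates, then repeat from step 2.
        mean = [statistics.mean(col) for col in zip(*best)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*best)]
    return mean


# Hypothetical 1-D example: the state moves by the action, and the goal is s = 1.
def toy_model(s: float, a: float) -> float:
    return s + a


def toy_reward(s: float) -> float:
    return -abs(s - 1.0)


state = 0.0
for _ in range(3):
    # Receding horizon: execute only the first planned action, then re-plan.
    action = cem_plan(toy_model, toy_reward, state)[0]
    state = toy_model(state, action)
print(state)
```

Only the first planned action is executed before re-planning from the new state, so prediction errors of the model can be corrected at every control step.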