[Journal club] Flow as the Cross-Domain Manipulation Interface
Semantic Machine Intelligence Lab., Keio Univ.
April 09, 2026
Transcript
Flow as the Cross-Domain Manipulation Interface
Mengda Xu1,2,3, Zhenjia Xu1,2, Yinghao Xu1, Cheng Chi1,2, Gordon Wetzstein1, Manuela Veloso3,4, Shuran Song1,2
1Stanford University, 2Columbia University, 3J.P. Morgan AI Research, 4Carnegie Mellon University
2026, Komei Sugiura Laboratory. Presenter: [illegible]
Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song. "Flow as the Cross-Domain Manipulation Interface". In 8th Conference on Robot Learning (CoRL), 2024.
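
Overview
▪ Background
  ▪ Collecting data on real robots is costly, so we want to use easily collected data for robot learning
  ▪ Human videos, simulation data
▪ Proposed method: Im2Flow2Act
  ▪ A trajectory generation framework that uses object flow as the intermediate representation
  ▪ A motion representation that does not depend on embodiment or environment
▪ Results
  ▪ Object manipulation is possible without any real-robot data
  ▪ Outperforms the baselines in both simulation and real-robot experiments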
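
Background: the need for a motion representation independent of embodiment and environment
▪ Collecting real-robot data is costly
▪ Building a simulator environment that matches the real-robot environment is costly
We want to use data that is cheap to collect
  ▼ Human videos
    ▪ Embodiment gap between human and robot
  ▼ Trajectory data from a single simulator environment
    ▪ Sim-to-real domain gap (background, object texture, etc.)
This motivates a motion representation that does not depend on embodiment or environment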

Related work: robot learning from cross-domain data
Method: characteristics
  VRB [Bahl+, CVPR23]: learns object grasp points and trajectories from human videos
  PEAC [Ying+, NeurIPS24]: learns latent actions from cross-embodiment data
  ATM [Wen+, RSS24]: learns hand-centric flow from human videos
▪ Limitation: deploying on a real robot requires collecting data with the target embodiment
Figures: VRB [Bahl+, CVPR23], ATM [Wen+, RSS24]
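
Proposed method: Im2Flow2Act
A trajectory generation framework that exploits cross-embodiment data through object flow
Object flow: the object's trajectory over the entire task
  ► Can represent object pose and deformation
  ► Embodiment-agnostic
  ► Robust to background and texture
① Im2Flow: initial image, language instruction → object flow
  Trained on human videos collected for each task
② Flow2Act: object flow, current image → trajectory
  Trained on trajectory data collected in a simulator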

Preliminaries: LDM (Latent Diffusion Model) and AnimateDiff
LDM [Rombach+, CVPR22]
Generates high-quality images quickly by running generation in a low-dimensional latent space
  ▼ Cross-attention ► flexible conditioning inputs (text, bbox, etc.)
  ▼ Reduced computation ► efficient training, fast inference
  ▼ Stable Diffusion is used
AnimateDiff [Guo+, ICLR24]
Generates videos by adding a motion module to a pretrained T2I diffusion model (SD)
  ▼ Temporal Transformer ► improves consistency along the time axis
  ▼ The T2I model is frozen during training ► low-cost training
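
The Temporal Transformer idea can be illustrated with a toy, dependency-free sketch: self-attention applied across the T frames at one spatial location, so that each frame's feature becomes a weighted mix of all frames. This is not AnimateDiff's actual motion module. The identity query/key/value projections and the names `softmax` and `temporal_self_attention` are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Single-head self-attention along the time axis.

    frames: list of T feature vectors taken at one spatial location.
    Assumption: query, key and value projections are the identity.
    """
    dim = len(frames[0])
    mixed = []
    for query in frames:
        # Scaled dot-product score between this frame and every frame
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        # Weighted sum of all frame features
        mixed.append(
            [sum(w * f[c] for w, f in zip(weights, frames)) for c in range(dim)]
        )
    return mixed


# Three frames with two feature channels each (made-up values)
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
smoothed = temporal_self_attention(features)
```

In a real motion module the projections are learned and the operation runs at every spatial location, while the image layers stay frozen as the slide notes.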
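
Im2Flow: Flow Generation Network
Initial image + language instruction → object flow
(a) Initial flow
  ① Obtain a bbox with Grounding DINO
  ② Sample points uniformly inside the bbox
  $\mathcal{F}_0 \in \mathbb{R}^{3 \times H \times W}$, with channels $(u, v, \mathrm{visibility})$
  $u, v$: coordinates in the image; $\mathrm{visibility}$: visibility of the object
  $\mathcal{F}_{1:T} \in \mathbb{R}^{3 \times T \times H \times W}$
(b) Flow generation with AnimateDiff (SD-based)
  ① Encode the flow into the SD latent space
  $x^0_{0:T} = \{ E(\mathcal{F}_i) \mid i \in [0, T] \}$
  ② Train the motion module
  $x^t_{1:T} = \sqrt{\bar{\alpha}_t}\, x^0_{1:T} + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_{1:T}$ (diffusion process)
  $\mathcal{L} = \mathbb{E}_{E(\mathcal{F}_{1:T}),\, x^0_0,\, I,\, y,\, \epsilon_{1:T} \sim \mathcal{N}(0, \mathbf{I}),\, t} \left[ \left\| \epsilon_{1:T} - \epsilon_\theta\left(x^t_{1:T}, t, x^0_0, f_{img}(I), f_{txt}(y)\right) \right\|_2^2 \right]$
  ③ Fine-tune $D$ to output the flow
Notation
  $\mathcal{F}_{0:T}$: ground-truth flow
  $I$: initial image
  $y$: language instruction
  $t$: time step
  $E$: SD encoder
  $D$: SD decoder
  $\bar{\alpha}_t$: noise scheduler (pre-defined)
  $f_{img}$ / $f_{txt}$: CLIP encoder (image / language)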
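
The diffusion-process step and the noise-prediction loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' training code. The linear beta schedule, the flattened 8-element stand-in for the latent, and the names `make_alpha_bar`, `add_noise` and `eps_loss` are all assumptions.

```python
import math
import random


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for k in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * k / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def eps_loss(eps, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)


rng = random.Random(0)
alpha_bar = make_alpha_bar(1000)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for a flattened flow latent
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]    # Gaussian noise sample
t = 500
x_t = add_noise(x0, eps, alpha_bar[t])

# A trained network would predict eps from (x_t, t, conditions);
# here an all-zero guess stands in to show how the loss is evaluated.
loss = eps_loss(eps, [0.0] * len(eps))
```

In the slide's setting only the motion module is trained with this loss, while the conditioning inputs (first-frame latent and CLIP features) are passed to the noise predictor.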

Flow2Act: Flow-Conditioned Policy
Object flow, current image → trajectory
(b) Online Point Tracking
  Track the key points with TAPIR [Doersch+, ICCV23]
(c) Policy
  1) State Encoder $\phi$: $s_t = \phi(f_t, x_0)$
    ・Produces a state representation (of the target's position and pose)
    ・Encodes the coordinates of each point and summarizes them with a CLS token
  2) Temporal Alignment $\psi$: $z_t = \psi(\mathcal{F}_{0:T}, s_t, \rho_t)$
    ・Predicts a latent representation of the flow from time $t$ onward
    ・$L_2$ loss: $\| \hat{z}_t - z_t \|_2$, where $\hat{z}_t = \xi(f_{t:T})$
  3) Diffusion Action Head: $p(\mathbf{a}_t \mid z_t, s_t, \rho_t)$
    ・Generates the trajectory sequence with a diffusion model
Notation
  $f_t$: image coordinates $\{(u_t^i, v_t^i)\}_{i=1}^{N}$ of the $N$ key points at time $t$
  $x_0$: 3D coordinates of the $N$ key points in the initial frame
  $\mathcal{F}_{0:T}$: object flow of the entire task
  $\rho_t$: robot proprioception
  $\xi$: encodes the flow into a latent representation
  $\mathbf{a}_t$: trajectory sequence starting at time $t$, $\{a_t, \ldots, a_{t+L}\}$
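
The temporal-alignment objective can be made concrete with a toy example. The sketch below replaces the learned flow encoder with a hypothetical stand-in that averages the remaining key-point coordinates, then measures the L2 distance between a predicted latent and that target. The names `encode_remaining_flow` and `alignment_loss`, and the mean-pooling encoder, are assumptions and not the networks used in the paper.

```python
import math


def encode_remaining_flow(flow, t):
    """Hypothetical stand-in for the flow encoder applied to f_{t:T}.

    flow: list over frames, each a list of (u, v) key-point coordinates.
    Returns the mean (u, v) over all key points in frames t..T.
    """
    points = [p for frame in flow[t:] for p in frame]
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def alignment_loss(z_pred, z_target):
    """L2 distance between the predicted and the target latent."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_pred, z_target)))


# Two key points tracked over three frames (made-up coordinates)
flow = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(1.0, 1.0), (3.0, 1.0)],
    [(2.0, 2.0), (4.0, 2.0)],
]
z_target = encode_remaining_flow(flow, t=1)  # summarizes frames 1 and 2
z_pred = [5.5, 5.5]                          # what an alignment network might output
loss = alignment_loss(z_pred, z_target)
```

The point of the objective is that the prediction is conditioned on the current state, so the policy can tell which part of the task flow is still ahead of it.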
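
Experimental setup
▼ Evaluated on four tasks
  ▼ Pick-and-place
  ▼ Pouring
  ▼ Open drawer
  ▼ Folding cloth
▼ Training settings
  ▼ Object flow: $\mathcal{F}_{1:T} \in \mathbb{R}^{3 \times T \times H \times W}$
    ▼ H = W = 32
    ▼ T = 32
  ▼ Training time: not reported
  ▼ Robot: UR5e
▼ Training data
  ▼ Human videos
    ▼ Human demonstrations of each task
    ▼ Number of samples: not reported
  ▼ Simulator: MuJoCo
    ▼ Robot: UR5e
    ▼ Number of samples: 4800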
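
To make the flow tensor shape concrete, the sketch below builds an initial flow of shape 3 × H × W with H = W = 32 by placing key points on a regular grid inside a bounding box. A regular grid is only one reading of "uniform sampling inside the bbox". The function name `initial_flow_from_bbox`, the bbox values, and the all-visible flag are assumptions for illustration.

```python
def initial_flow_from_bbox(bbox, height=32, width=32):
    """Return [u, v, visibility] channels, each a height x width grid.

    bbox: (x_min, y_min, x_max, y_max) in pixel coordinates.
    Assumes height >= 2 and width >= 2.
    """
    x_min, y_min, x_max, y_max = bbox
    # u varies along columns, v along rows
    u = [
        [x_min + (x_max - x_min) * j / (width - 1) for j in range(width)]
        for _ in range(height)
    ]
    v = [
        [y_min + (y_max - y_min) * i / (height - 1) for _ in range(width)]
        for i in range(height)
    ]
    # Every key point is marked visible in the initial frame
    visibility = [[1.0] * width for _ in range(height)]
    return [u, v, visibility]


# Hypothetical bbox, e.g. as returned by an object detector
flow0 = initial_flow_from_bbox((10, 20, 41, 51))
num_keypoints = len(flow0[0]) * len(flow0[0][0])  # 32 * 32 key points
```

Stacking T such frames along a new axis gives the 3 × T × H × W layout listed in the training settings.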
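
Quantitative results @ Simulation
► Outperforms the other methods on all tasks
Demonstration-conditioned: object flow is extracted from a single human demonstration
Language-conditioned: evaluates the full system, with the initial frame and a language instruction as input
Ablation
  1) Heuristic: trajectory generation via pose estimation
  2) Grid Flow: points are sampled uniformly over the entire image
  3) No alignment: Temporal Alignment is not used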
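
Quantitative results @ Real-World
► Outperforms the other methods on all tasks
Demonstration-conditioned: object flow is extracted from a single human demonstration
Language-conditioned: evaluates the full system, with the initial frame and a language instruction as input
Ablation
  1) Heuristic: trajectory generation via pose estimation
  2) Grid Flow: points are sampled uniformly over the entire image
  3) No alignment: Temporal Alignment is not used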

Qualitative results
► The object flow appropriately represents the object's trajectory
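
Qualitative results: failure cases in the ablation study
  ▪ The drawer is pushed back slightly
  ▪ The trajectory is correct, but the cup is rotated
  ▪ The flow and the actual trajectory are misaligned in time
  ▪ Inaccurate motion
  ▪ Motion in the wrong direction
  ▪ Jitter occurs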
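
Summary
▪ Background
  ▪ Collecting data on real robots is costly, so we want to use easily collected data for robot learning
  ▪ Human videos, simulation data
▪ Proposed method: Im2Flow2Act
  ▪ A trajectory generation framework that uses object flow as the intermediate representation
  ▪ A motion representation that does not depend on embodiment or environment
▪ Results
  ▪ Object manipulation is possible without any real-robot data
  ▪ Outperforms the baselines in both simulation and real-robot experiments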