[Figure: graph structure for a cooking video, running from "Cooking starts." to "Ready to eat!". Node types: "Ingredient" (shredded dried squid, kimchi, brown sugar, white sesame) and "Action". Edge labels: Before, After, Dest. Action steps, with video timestamps ranging from 02:16 to 12:03: "Add shredded dried squid to a bowl and tear by hand into small pieces." "Cut kimchi with scissors, add to the squid, and mix." "Add brown sugar, mix, and top with white sesame."]
[ours] K. Maeda & T. Hirasawa, “COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark”, ECCV2024
VAG splits video into Task/Motion segments at the HOI level → could data collection via automatic extraction be the key?