on the road. second car in line waiting at traffic light. corner of rounded building. Are there any buses on the street? (…) darkness on night sky.',
box 0: {'area': 13125.0, 'iscrowd': 0, 'image_id': 486491, 'category_id': 1, 'id': 2497354, 'bbox': [182, 132, 175, 75], 'tokens_positive': [[0, 1], [2, 8], [9, 12]]},
box 1: {'area': 37975.0, 'iscrowd': 0, 'image_id': 486491, 'category_id': 1, 'id': 2497355, 'bbox': [344, 7, 155, 245], 'tokens_positive': [[26, 32], [33, 36]]},
…

Used by GLIP, Grounding DINO, YOLO-World, and similar models.

A-3: Using a Visual Grounding Dataset

Reference: GoldG dataset preparation
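The box records above tie each region to words in the caption. The following is a minimal sketch of how such a record might be read, under two assumptions not stated in the source: that `tokens_positive` holds character offsets into the caption with an exclusive end, and that `bbox` is in `[x, y, width, height]` order. The second assumption is consistent with the `area` values shown (175 × 75 = 13125 and 155 × 245 = 37975). The caption string and the helper names `phrases_for_box` and `xywh_to_xyxy` are hypothetical and used only for illustration, since the real caption is truncated here.

```python
from typing import Dict, List, Tuple


def phrases_for_box(caption: str, box: Dict) -> List[str]:
    """Return the caption substrings that a box is grounded to.

    Assumes each tokens_positive entry is a [start, end) character span.
    """
    return [caption[start:end] for start, end in box["tokens_positive"]]


def xywh_to_xyxy(bbox: List[float]) -> Tuple[float, float, float, float]:
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Hypothetical caption: the real one is truncated in the text above.
caption = "a yellow car on the road."

# Field values copied from "box 0" above.
box0 = {
    "area": 13125.0,
    "bbox": [182, 132, 175, 75],
    "tokens_positive": [[0, 1], [2, 8], [9, 12]],
}

phrases = phrases_for_box(caption, box0)
corners = xywh_to_xyxy(box0["bbox"])
# Width times height, to compare against the stored "area" field.
computed_area = box0["bbox"][2] * box0["bbox"][3]

print(phrases)        # ['a', 'yellow', 'car']
print(corners)        # (182, 132, 357, 207)
print(computed_area)  # 13125
```

One box can carry several spans, so a single region may be grounded to a multi-word phrase or to several mentions of the same object in one caption.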