Action100M → Benchmark Candidates

Source Video

LLM Judgement

LLM Prompt (sent to GPT)

System prompt

User message (compact video summary)

Raw LLM output JSON

Metadata (from YouTube)

Tree-of-Captions (ground truth from Action100M)