Action100M → Benchmark Candidates


Source Video

LLM Judgement

LLM Prompt (sent to GPT)

System prompt
User message (compact video summary)
Raw LLM output JSON

Metadata (from YouTube)

Tree-of-Captions (ground truth from Action100M)