
EgoSchema × Action100M Viewer

Hierarchical dense action annotation pipeline from Action100M (Chen et al., 2026), ported to 50 EgoSchema 3-minute egocentric clips. Each clip is segmented with V-JEPA 2 embeddings and Ward agglomerative clustering, captioned by Llama-3.2-Vision and Perception-LM, and aggregated by GPT-4o with three rounds of Self-Refine. Click any card to open an interactive viewer with the source video, hierarchical timeline, and per-node annotations.

50 clips
8,890 s total video
33,290 tree nodes
3,887 nodes annotated by GPT-4o
$63.06 API cost
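
The headline figures can be cross-checked against the per-card numbers on this page. The snippet below is only a sanity check, not part of the pipeline: it uses the totals above and the node counts on the cards (49 cards list 671 nodes, one lists 411), and derives a few ratios. The variable names are illustrative.

```python
# Totals as stated in the page header.
CLIPS = 50
TOTAL_NODES = 33_290
ANNOTATED_NODES = 3_887
API_COST_USD = 63.06

# Per-card node counts listed on this page: 49 cards show 671 nodes
# and one card (6709...) shows 411 nodes.
nodes_from_cards = 49 * 671 + 1 * 411
assert nodes_from_cards == TOTAL_NODES

# Derived ratios (simple arithmetic on the stated totals).
annotated_fraction = ANNOTATED_NODES / TOTAL_NODES
cost_per_annotated_node = API_COST_USD / ANNOTATED_NODES
cost_per_clip = API_COST_USD / CLIPS

print(f"annotated fraction:      {annotated_fraction:.1%}")
print(f"cost per annotated node: ${cost_per_annotated_node:.4f}")
print(f"cost per clip:           ${cost_per_clip:.2f}")
```

Roughly one node in nine receives a GPT-4o annotation, which fits Stage 3 only aggregating nodes of at least 4 s.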
End-to-end pipeline (V-JEPA 2 segmentation → Tree-of-Captions → GPT-4o aggregation with Self-Refine → annotated event tree).
Aggregate verb-composition sunburst: a verb→object→modifier word-frequency hierarchy built from the action.brief annotations of all 50 clips.
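
As a rough illustration of what a verb→object→modifier frequency hierarchy looks like as data, the sketch below nests counts three levels deep. It assumes the annotations have already been split into (verb, object, modifier) triples; how the real page parses action.brief is not documented here, so the triples are hand-derived from a few card action lines and `build_hierarchy` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional, Tuple

Triple = Tuple[str, str, Optional[str]]


def build_hierarchy(triples: Iterable[Triple]) -> Dict[str, dict]:
    """Nest counts as verb -> object -> modifier (modifier may be None)."""
    tree: Dict[str, dict] = {}
    for verb, obj, modifier in triples:
        v = tree.setdefault(verb, {"count": 0, "children": {}})
        v["count"] += 1
        o = v["children"].setdefault(obj, {"count": 0, "children": {}})
        o["count"] += 1
        if modifier is not None:
            m = o["children"].setdefault(modifier, {"count": 0, "children": {}})
            m["count"] += 1
    return tree


# Hand-parsed (assumed) triples from card action lines on this page.
triples = [
    ("wash", "dishes", None),
    ("wash", "dishes", None),
    ("wash", "clothes", "in a bathtub"),
    ("cut", "fabric", "with scissors"),
    ("cut", "cardboard", None),
]
tree = build_hierarchy(triples)
print(tree["wash"]["count"], tree["wash"]["children"]["dishes"]["count"])
```

Each nesting level corresponds to one ring of the sunburst, with the count giving the arc width.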

00faf954…

A person shapes mud into bricks using a wooden mold.

179s 671 nodes 85 annotated
action: Shape mud into bricks.

0d01c24b…

A person crafts a paper flower at a small table.

179s 671 nodes 80 annotated
action: Craft a paper flower.

0f0d2135…

A woman packs groceries and interacts with a customer.

179s 671 nodes 75 annotated
action: Pack groceries into a bag.

156683f3…

A person repairs a scooter in a workshop.

179s 671 nodes 78 annotated
action: Repair a scooter

1d37d8e5…

A person cleans the kitchen and explores the house.

179s 671 nodes 72 annotated
action: Clean the kitchen and explore the house.

22a04ca6…

A man prepares scrambled eggs in a messy kitchen.

179s 671 nodes 75 annotated
action: Cook scrambled eggs.

2b1ad004…

A woman plays cards while feeding a lizard.

179s 671 nodes 74 annotated
action: Play cards with a partner.

2b960c7d…

A person exercises and makes coffee.

179s 671 nodes 76 annotated
action: Perform yoga and make coffee.

3581bcf8…

A person is crafting bricks using clay and sand.

179s 671 nodes 84 annotated
action: Make bricks from clay and sand.

3b50beeb…

A person cuts and peels dried fruits at a table.

179s 671 nodes 79 annotated
action: Cut and peel dried fruits.

420fa606…

Two men engage in card playing and note-taking at a table.

179s 671 nodes 75 annotated
action: Deal cards and write notes.

4785bf2e…

A person organizes groceries and cleans the kitchen.

179s 671 nodes 77 annotated
action: Organize and clean the kitchen.

4aa10456…

A person cleans and examines books on the floor.

179s 671 nodes 73 annotated
action: Clean and examine books

4c3b9dd4…

A person prepares and cooks a dish.

179s 671 nodes 139 annotated
action: Prepare and cook a dish.

52e48527…

A woman cuts yellow fabric with scissors.

179s 671 nodes 86 annotated
action: Cut fabric with scissors.

5e43992d…

A person prepares and cooks a meal in a kitchen.

179s 671 nodes 71 annotated
action: Prepare and cook a meal.

604acf21…

A man welds and smooths a metal pipe.

179s 671 nodes 92 annotated
action: Weld and smooth a metal pipe.

670945d6…

A person navigates a house, interacting with items and observing surroundings.

110s 411 nodes 45 annotated
action: Explore and interact with household items.

696392dd…

A person assembles a wooden project using glue and small blocks.

179s 671 nodes 82 annotated
action: Assemble wooden blocks.

7b904a75…

A person controls a robot vacuum while another cleans the kitchen and someone else uses a phone.

179s 671 nodes 74 annotated
action: Operate a robot vacuum.

7d5b057b…

A woman irons various fabrics on an ironing board.

179s 671 nodes 74 annotated
action: Iron various fabrics.

7df39cbd…

A person is knitting at a table with various items.

179s 671 nodes 82 annotated
action: Knit with yarn and crochet hook.

8350d2b3…

A person organizes clothes by taking them out of a wardrobe and placing them on a bed.

179s 671 nodes 77 annotated
action: Organize clothes.

84be1093…

A man paints a wooden door and board yellow.

179s 671 nodes 79 annotated
action: Paint a wooden door and board.

8782618b…

A person washes dishes in a kitchen sink.

179s 671 nodes 75 annotated
action: Wash dishes

920d35a6…

A person cooks and prepares a meal in a cluttered kitchen.

179s 671 nodes 71 annotated
action: Cook and prepare a meal.

974ac5d0…

A lab technician conducts experiments using pipettes and test tubes.

179s 671 nodes 80 annotated
action: Transfer liquids between containers.

9c956866…

A person sews a small pouch using a sewing machine.

179s 671 nodes 73 annotated
action: Sew a pouch.

a1262146…

A person crafts a clay sculpture at a table.

179s 671 nodes 73 annotated
action: Shape and refine a clay sculpture.

a3a71268…

Two people play a game of checkers on a wooden table.

179s 671 nodes 76 annotated
action: Play checkers

a88cabdc…

A person washes clothes in a bathtub while intermittently watching a video on their phone.

179s 671 nodes 75 annotated
action: Wash clothes in a bathtub.

afa330df…

A person washes dishes at a kitchen sink.

179s 671 nodes 78 annotated
action: Wash dishes

b4c5f426…

A woman crafts clay pots on the ground.

179s 671 nodes 76 annotated
action: Craft clay pots.

c94ea4e2…

A person organizes and cleans books on the floor.

179s 671 nodes 78 annotated
action: Organize and clean books

c9ed0ee8…

A person cuts and prepares cardboard for a project.

179s 671 nodes 75 annotated
action: Cut cardboard.

ca9659f7…

A person prepares a meal by adding milk and water to a pot and organizing kitchen items.

179s 671 nodes 82 annotated
action: Prepare a meal and organize kitchen items.

cc4ccc21…

A person cooks and cleans in the kitchen.

179s 671 nodes 78 annotated
action: Cook and clean in the kitchen.

d092fae6…

A person photographs a field and interacts with a group.

179s 671 nodes 72 annotated
action: Photograph and interact with a group.

d5ea4b32…

A person creates a craft project at a table.

179s 671 nodes 75 annotated
action: Create a craft project

dd3d5867…

A person is gardening by weeding and planting in a raised bed and pot.

179s 671 nodes 74 annotated
action: Garden by weeding and planting.

e1e76763…

A person prepares a meal in a modern kitchen.

179s 671 nodes 71 annotated
action: Prepare a meal.

e2c54dce…

A man works on a woodworking project in a workshop.

179s 671 nodes 75 annotated
action: Cut and assemble wood.

e4f08b6f…

A person walks through a house, brushes their teeth, and exits the bathroom.

179s 671 nodes 71 annotated
action: Walk and perform personal hygiene.

ea99b807…

A person repots a plant using a trowel and soil bag.

179s 671 nodes 78 annotated
action: Repot a plant.

edb5c7f4…

A person folds a cloth and transfers items between bags.

179s 671 nodes 97 annotated
action: Fold a cloth and transfer items.

f5066bbf…

A man sands and polishes a metal pipe using power tools.

179s 671 nodes 72 annotated
action: Sand and polish a metal pipe.

f75ea23d…

Two people work together to complete a 1000-piece emoji jigsaw puzzle.

179s 671 nodes 71 annotated
action: Complete a jigsaw puzzle.

fa2363f4…

A person knits a purple item in a living room.

179s 671 nodes 90 annotated
action: Knit a purple item.

fb82f807…

A person prepares potatoes in the kitchen.

179s 671 nodes 70 annotated
action: Prepare potatoes.

fee05c7c…

A person cleans windows in a room.

179s 671 nodes 77 annotated
action: Clean windows

Pipeline overview

Stage 1. V-JEPA 2 ViT-g-384 frame embeddings (window=64, stride=8, resolution 384×384) → temporally contiguous Ward agglomerative clustering → 671 tree nodes for a full 179 s clip (411 for the single 110 s clip).
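
To make "temporally contiguous Ward clustering" concrete, here is a minimal pure-Python sketch. It is not the private stage1 code: it assumes one embedding vector per sliding window, only ever merges segments that are adjacent in time, and picks the pair with the smallest Ward cost (the increase in within-cluster variance). `Node`, `build_tree`, and `count_nodes` are hypothetical names. One observation that is an inference from the card numbers rather than a documented fact: 671 = 2×336 − 1 and 411 = 2×206 − 1, which is what a full binary merge tree over 336 and 206 leaves would produce.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    start: int          # first leaf window covered (inclusive)
    end: int            # last leaf window covered (inclusive)
    mean: List[float]   # centroid of the covered embeddings
    size: int           # number of leaf windows covered
    children: List["Node"] = field(default_factory=list)


def ward_cost(a: Node, b: Node) -> float:
    """Increase in within-cluster sum of squares if a and b are merged."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a.mean, b.mean))
    return a.size * b.size / (a.size + b.size) * sq_dist


def merge(a: Node, b: Node) -> Node:
    n = a.size + b.size
    mean = [(a.size * x + b.size * y) / n for x, y in zip(a.mean, b.mean)]
    return Node(a.start, b.end, mean, n, [a, b])


def build_tree(embeddings: Sequence[Sequence[float]]) -> Node:
    """Bottom-up merging restricted to temporally adjacent segments."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    segments = [Node(i, i, list(e), 1) for i, e in enumerate(embeddings)]
    while len(segments) > 1:
        # Only neighbours (k, k+1) are candidates, so every node stays
        # a contiguous time span.
        k = min(range(len(segments) - 1),
                key=lambda i: ward_cost(segments[i], segments[i + 1]))
        segments[k:k + 2] = [merge(segments[k], segments[k + 1])]
    return segments[0]


def count_nodes(root: Node) -> int:
    """Count all nodes iteratively (leaves plus internal nodes)."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children)
    return total


# Toy 1-D "embeddings": two clearly separated activities.
root = build_tree([[0.0], [0.1], [0.05], [5.0], [5.1], [4.9]])
print(count_nodes(root), [(c.start, c.end) for c in root.children])
```

On the toy input the top-level split falls between windows 2 and 3, where the embedding jumps. A production version would more likely use an optimized library routine with a chain connectivity constraint; the quadratic loop here is for readability only.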

Stage 2. Leaf nodes are captioned by Llama-3.2-11B-Vision from the node's midpoint frame; internal nodes are captioned by Perception-LM-3B from 32 evenly spaced frames at 320×320.
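
The frame selection in Stage 2 can be sketched as follows. The page only says "midpoint" and "evenly-spaced", so the details are assumptions: frame ranges are treated as inclusive indices, the 32 frames are taken at bin centres, and nodes shorter than 32 frames return every frame. Both function names are hypothetical.

```python
from typing import List


def midpoint_frame(start: int, end: int) -> int:
    """Frame index used for a leaf node (inclusive range, rounded down)."""
    return (start + end) // 2


def evenly_spaced_frames(start: int, end: int, k: int = 32) -> List[int]:
    """Pick k frame indices at the centres of k equal bins over [start, end]."""
    n = end - start + 1
    if n <= k:
        # Assumed behaviour for short nodes: use every available frame.
        return list(range(start, end + 1))
    return [start + int((i + 0.5) * n / k) for i in range(k)]


frames = evenly_spaced_frames(0, 319)
print(midpoint_frame(0, 100), len(frames), frames[:3], frames[-1])
```

Bin-centre sampling avoids always including the first and last frame of a node, which tend to straddle segment boundaries; endpoint-inclusive sampling would be an equally plausible reading of "evenly-spaced".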

Stage 3. Nodes lasting ≥4 s are aggregated by gpt-4o-2024-08-06, prompted with the global tree context plus the current subtree rendered as Markdown, using 3 rounds of Self-Refine and strict JSON Schema structured outputs ({summary, action}).
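
The control flow of Stage 3 can be sketched without committing to any particular API client. In the sketch below, `llm` is a hypothetical callable standing in for the model call, the prompts are placeholders rather than the real ones, and "3-round" is read as one initial draft followed by three critique-and-revise rounds, which is an assumption. The `action` value is treated as opaque; only the presence of the two documented keys is checked.

```python
import json
from typing import Callable, Dict, List

MIN_DURATION_S = 4.0                  # Stage 3 only handles nodes >= 4 s
REQUIRED_KEYS = {"summary", "action"}  # keys named in the overview


def select_nodes(nodes: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep nodes whose duration is at least MIN_DURATION_S seconds."""
    return [n for n in nodes if n["end"] - n["start"] >= MIN_DURATION_S]


def self_refine(llm: Callable[[str], str], context: str, rounds: int = 3) -> dict:
    """Draft an annotation, then critique and revise it `rounds` times."""
    draft = json.loads(llm(f"Annotate this subtree as JSON:\n{context}"))
    for _ in range(rounds):
        critique = llm(
            f"Critique this annotation:\n{json.dumps(draft)}\n\nContext:\n{context}"
        )
        draft = json.loads(llm(
            f"Revise the annotation as JSON.\nAnnotation:\n{json.dumps(draft)}\n"
            f"Feedback:\n{critique}\n\nContext:\n{context}"
        ))
    missing = REQUIRED_KEYS - set(draft)
    if missing:
        raise ValueError(f"annotation is missing keys: {sorted(missing)}")
    return draft
```

With strict structured outputs on the model side, the final key check should never fire; it is kept so the sketch fails loudly when run against a stand-in model that returns free-form JSON.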

Full method documentation: README · Source code (private): streaming_benchmark/src/{stage1,stage2,stage3}_*.py