Case-by-case browsers for multimodal and video-understanding benchmarks. Each viewer ships with a representative subset of the data so that it remains hostable on GitHub Pages; a link to the official source appears in the footer of every viewer.
MLLM streaming video understanding benchmark. Four tasks: Real-Time Visual Understanding, Omni-Source Understanding, Sequential QA, and Proactive Output.
Omni-modal interactive benchmark for streaming video dialog. Six tasks, including action prediction, state grounding, proactive alerting, and speaker identification.
Proactive video question-answering with time-aligned streaming replies. Four subsets: WEB (open-domain), EGO (first-person), TV (dialogue), and VAD (anomaly detection).
Online video understanding across three temporal modes (backward tracing, real-time perception, and forward active responding), comprising 12 fine-grained tasks.