Dataset Visualizations

Case-by-case browsers for multimodal and video-understanding benchmarks. Each viewer ships with a representative subset of the data so that it stays small enough to host on GitHub Pages; the footer of every viewer links to the official source.

StreamingBench

A streaming video understanding benchmark for MLLMs. Four tasks: Real-Time Visual Understanding, Omni-Source Understanding, Sequential QA, and Proactive Output.

900 videos total · 4,500 QA pairs · 45 in viewer
video · streaming · MCQ + open-ended

OmniMMI

An omni-modal interactive benchmark for streaming video dialog. Its six tasks include action prediction, state grounding, proactive alerting, and speaker identification.

1,400 samples · 6 tasks · 71 in viewer
video · audio · multi-turn

ProactiveVideoQA

Proactive video question answering with time-aligned streaming replies. Four subsets: WEB (open-domain), EGO (first-person), TV (dialogue), and VAD (anomaly detection).

1,427 samples · 4 subsets · 8 in viewer
video · proactive · time-aligned

OVO-Bench

Online video understanding across three temporal modes (backward tracing, real-time perception, and forward active responding), divided into 12 fine-grained tasks.

1,640 samples · 12 tasks · 24 in viewer
video · streaming · probe-based