Case-by-case browsers for multimodal and video-understanding benchmarks. Each viewer ships with a representative subset of the data so that it remains hostable on GitHub Pages; a link to the official source appears in the footer of every viewer.
MLLM streaming video understanding benchmark. Four tasks: Real-Time Visual Understanding, Omni-Source Understanding, Sequential QA, and Proactive Output.
Omni-modal interactive benchmark for streaming video dialog. Six tasks, including action prediction, state grounding, proactive alerting, and speaker identification.
Proactive video question-answering with time-aligned streaming replies. Four subsets: WEB (open-domain), EGO (first-person), TV (dialogue), and VAD (anomaly detection).
Online video understanding across three temporal modes (backward tracing, real-time perception, and forward active responding), comprising 12 fine-grained tasks.