OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
Can AI agents handle real professional work? OccuBench evaluates agents across 100 tasks in 65 specialized domains using language world models, revealing critical gaps in professional task performance.
Posted: 2026-04-17