The Data Agent Benchmark
A UC Berkeley + Hasura Research Study
The best AI model fails 62% of the time on real enterprise data questions.
We partnered with UC Berkeley's EPIC Labs to build the first benchmark that tests AI data agents on real enterprise patterns: messy data, multiple databases, and domain knowledge that lives in people's heads.
54 queries. 12 datasets. 13,500 trials. Five frontier models.
We analyzed over a thousand failed agent traces to understand where AI data agents actually break down. The answer isn't what you'd expect, and it changes how you should think about building (or buying) one.
What's Inside:
- Full research paper and methodology
- Failure-mode analysis across 1,000+ failed agent traces
- Why shared context, not model capability, is the real bottleneck
- Open-source benchmark code
Get your copy today