The Data Agent Benchmark

A UC Berkeley + Hasura Research Study

The best AI model fails 62% of the time on real enterprise data questions.

We partnered with UC Berkeley's EPIC Labs to build the first benchmark that tests AI data agents on real enterprise patterns: messy data, multiple databases, and domain knowledge that lives in people's heads.

54 queries. 12 datasets. 13,500 trials. Five frontier models.

We analyzed over a thousand failed agent traces to understand where AI data agents actually break down. The answer isn't what you'd expect, and it changes how you should think about building (or buying) one.

What's Inside:

Full research paper and methodology
Failure mode analysis across 1,000+ agent traces
Why shared context, not model capability, is the real bottleneck
Open source benchmark code

Get your copy today