The Data Agent Benchmark

A UC Berkeley + PromptQL Research Study


The best AI model fails 62% of the time on real enterprise data questions.

We partnered with UC Berkeley's EPIC Labs to build the first benchmark that tests AI data agents on real enterprise patterns: messy data, multiple databases, and domain knowledge that lives in people's heads.

54 queries. 12 datasets. 13,500 trials. Five frontier models.

We analyzed over a thousand failed agent traces to understand where AI data agents actually break down. The answer isn't what you'd expect, and it changes how you should think about building (or buying) one.

What's Inside:

  • Full research paper and methodology
  • Failure mode analysis across 1,000+ failed agent traces
  • The gap between finding data and knowing what to compute
  • Open-source benchmark code

Get your copy today