The Data Agent Benchmark

A UC Berkeley + PromptQL Research Study

The best AI model fails 62% of the time on real enterprise data questions.

We partnered with UC Berkeley's EPIC Labs to build the first benchmark that tests AI data agents on real enterprise patterns: messy data, multiple databases, and domain knowledge that lives in people's heads.

54 queries. 12 datasets. 13,500 trials. Five frontier models.

We analyzed over a thousand failed agent traces to understand where AI data agents actually break down. The answer isn't what you'd expect, and it changes how you should think about building (or buying) one.

What's Inside:

Full research paper and methodology
Failure mode analysis across 1,000+ failed agent traces
The gap between finding data and knowing what to compute
Open-source benchmark code

Get your copy today
