Hello, I'm your host Katie Bavoso, and welcome to today's presentation of Do You Trust Your AI? Achieving High Reliability on Enterprise Data, brought to you by PromptQL and Technology Advice. Today we're diving into one of the most pressing challenges in enterprise AI adoption: reliability and trust. As organizations increasingly deploy AI to work with critical business data, we're discovering a fundamental problem: most current AI solutions fail to deliver consistent, accurate results when faced with real-world enterprise scenarios. Our speaker today, Anushrut Gupta, Product Lead at PromptQL, will take us through Hasura's groundbreaking work with PromptQL, a data access agent designed to achieve what many consider impossible: one hundred percent reliability when working with enterprise data. You'll learn why traditional approaches like RAG, function calling, and multi-agent systems often fall short with complex business queries, and discover a fundamentally different architecture that separates query planning from execution to deliver repeatable, accurate results at scale. As AI moves from experimental to mission critical in organizations, understanding these reliability patterns becomes essential for any technology leader. Whether you're evaluating AI solutions or building them, today's session will provide actionable insights on achieving trustworthy AI systems that can be confidently deployed in production environments. And now let me introduce you to our speaker, who will guide us in achieving one hundred percent reliability with AI on enterprise data. Welcome, Anushrut. Take it away.

Hey, Katie. Thanks for having me. Hey, folks. I'm Anushrut, the applied AI lead at PromptQL. I lead the applied research that we do here at PromptQL and take it out to the industry, to see how a very reliable AI can have real-world applicability.
So let's dive right in. As we get into this discussion of reliable AI, my question to you is: are you dreaming big enough with AI? Are you using AI for business-critical tasks and business-critical decision making, the kind that has very high impact on your enterprise? Are you using AI for more than summarizing an email or pulling some insights from a PDF? Are you actually using AI across your enterprise-scale data, different types of data, and making informed decisions based on trusted results from your AI? I'm assuming most of you will say no. So what's holding you back? We have been talking about this with a bunch of our enterprise customers, and they say the biggest issue in deploying enterprise AI at scale is trust: I don't trust my AI, because my AI is not reliable. That's what's holding you back, the reliability of your AI. That is the discussion happening today in the enterprise AI space, where a big fast-food company initially deployed a drive-through ordering AI but eventually had to take it down because it was unreliable. This is also the latest discussion among engineers and technical leaders. Even the AI Engineer World Summit, the conference in San Francisco, has an entire agent reliability track. So reliability has become the critical talking point if we actually want to increase AI adoption in enterprises and move beyond the cute demos that we see on Twitter or at hackathons. And the way to make AI reliable on your enterprise data systems is to have answers to questions like these: How do we make AI reliable with data it has not already been trained on? How do we deal with its inherent non-determinism?
AI, at the end of the day, uses large language models, and large language models are probabilistic, non-deterministic text models: next-token predictors. So how do you solve for these non-deterministic models? How do we deal with the real-life messiness of our enterprise data and systems? Will our data ever be AI-ready? You have data in your structured databases, your unstructured databases, your own APIs and microservices. You are also using a bunch of different SaaS applications internally. All of that data is incoherent. How do you bring it all together and have an AI system that works reliably on top of it? And finally, how do we nudge user behaviour so that people wield this AI well? Let's say I give you the most powerful AI; I still need to build trust in the user and help the user understand how to operate with AI. If there's one thing I want you to take away from all of this, it's this: think of AI the way you think of a human. Think about how you have built trust in a colleague. Think about how you work with a data analyst, or with an intern. That's the analogy you should have in mind when you're working with AI. So let's look at the current state of reliability. My question is: what are they doing to make AI more reliable? And when I say they, I'm talking about the biggest players out there. How are they building more powerful, more reliable AI? They're doing a lot of research and rolling out new models, but at the end of the day the answer is mostly to throw more GPUs at it: the more compute you throw at your AI system, the more reliable it might get. But recently Anthropic released a piece called Tracing the Thoughts of a Large Language Model, and it's a very interesting read: no matter how fancy a model gets, this chain-of-thought process, this reasoning process, always adds more and more non-determinism.
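That non-determinism is baked in at the sampling step. Here is a toy sketch, not any particular vendor's internals: next-token probabilities come from a softmax over scores, and any temperature above zero means different random draws can pick different tokens for the very same input.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale scores by temperature, then normalize into probabilities.
    if temperature == 0:
        # Greedy decoding: all probability mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    # Draw one token index from the softmax distribution.
    probs = softmax(logits, temperature)
    r = rng.random()
    cum = 0.0
    for tok, p in enumerate(probs):
        cum += p
        if r <= cum:
            return tok
    return len(probs) - 1

logits = [2.0, 1.5, 0.5]  # toy "next token" scores

# At temperature 0, every run picks the same token...
greedy = {sample_token(logits, 0, random.Random(s)) for s in range(10)}

# ...but at temperature 1, different random draws pick different tokens.
sampled = {sample_token(logits, 1.0, random.Random(s)) for s in range(100)}
```

Same scores in, different tokens out: that is the inherent non-determinism the talk is pointing at, before any enterprise data even enters the picture.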
They also discussed how AI cannot really handle data very well. Try this experiment: take any large language model you like, give it a list of ten thousand numbers, and ask it for the sum and the average of those ten thousand numbers. I guarantee you it will fail, because AI cannot handle structured data very well. It can't handle data it hasn't seen before. They also mention in the paper that they asked their AI to compute the cosine of a large number. It can't easily calculate that, but it still gives you an answer. The philosopher Harry Frankfurt had a word for this; it's called bullshitting in the industry. So on top of this, they are thinking about how to tell apart faithful reasoning from unfaithful reasoning, faithful results from unfaithful results. We realized there are two fundamental things AI needs to get right to be reliable on your data. Number one, AI needs to be predictable. It needs to be consistent, and I should be able to control it; only a predictable thing can be controlled. Again, the same analogy as a human: you trust a colleague who is consistent in their behaviour, someone you can work with and guide in the right direction even when they don't get the right answer. So the AI should do the same thing every time, and it should not fail in unexpected ways. The second thing is complete transparency and explainability. I should understand exactly what my AI is doing, how, and why. Once I understand that, I have the ability to exert control, and the AI should be able to comply with that control. So let's review the state of the existing techniques out there today.
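The ten-thousand-number experiment above is exactly the kind of task a deterministic runtime never gets wrong. A quick sketch of the ground truth the LLM is being asked to reproduce (the numbers here are arbitrary, seeded so the run is repeatable):

```python
import random
import statistics

# Ten thousand arbitrary numbers the model has never been trained on.
rng = random.Random(42)
numbers = [rng.randint(1, 1_000_000) for _ in range(10_000)]

# A programmatic runtime computes these exactly, and identically, every run.
total = sum(numbers)
average = total / len(numbers)

print(total, round(average, 2))
```

Run it twice and you get the same two numbers. Ask an LLM the same question twice and, as the talk argues, you may not.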
The most popular technique, which I'm sure all of you have heard of, is RAG, Retrieval Augmented Generation, and all of its different flavours: agentic RAG, graph RAG, this RAG, that RAG, but RAG as a family. Second is tool composition, or function calling, which has recently been popularized by Anthropic with their Model Context Protocol but was launched by OpenAI many months ago, and it has become the de facto way of connecting your AI to different kinds of systems. The third, used to connect structured databases to your AI, is natural-language-to-query-language generation. It can be a structured query language like SQL, or NoSQL, or any other query language. So let's look a little deeper and be technical about RAG. What is RAG? In a nutshell, you create a store of searchable data, for example a vector database with vector embeddings for, let's say, your PDF documents. Then you search for the relevant data: you create semantic search functions, lexical search functions, or hybrid search functions, and you allow your AI to send a query to such a function. The function finds the right piece of data in your store, gives it back to the LLM, adds it to the LLM's context, and asks it to generate the answer. That's retrieval augmented generation. Let's look at one of these RAG-based AI assistants out there. It's one of the biggest email providers, and they have deployed a very powerful AI system. I asked it a very simple question: when was my last Uber trip, and how much did I spend? It says your last Uber trip was Tuesday, April fifteenth, and you spent 14 dollars. Makes sense, and it's correct. And that's also the reality, right?
I have a bunch of Uber receipts, I asked this question, and the answer makes sense. OK. So now I ask a follow-up question: what time was this trip? And now it says the trip was on Saturday, June 29, 2024, departing at 4:10 p.m. from San Francisco. What just happened? I asked a follow-up question on your previous answer, but you're giving me a completely different answer. So I asked it: how did you get this answer? You just said my last trip was in April. And it responds with: each product has its own strengths; deciding which one is best for you depends on what you're trying to do and what style you prefer. Now it's completely gone off the rails, and I'm never going to come back to this product and use it again. It has fundamentally lost my trust, because it is not reliable in answering my question. And that's the problem with RAG-based approaches: they are not predictable, and they are not explainable. Even though it technically does the same thing every time, calling the same semantic search function under the hood every single time, for the user the accuracy is completely different. And second, there's no way I can tell what exactly is happening under the hood. How did it calculate a response? How did it come up with an answer? I have no insight into that. So the number one pain point with RAG, based on all of our enterprise conversations, is that it's hard to guarantee the breadth and depth of tasks; it just seems impossible. So let's look at another technique: tool composition. Basically, you call a tool and get its output; a tool can be an API or any other system under the hood. You call that tool with certain parameters and you get its output.
Based on that output, you might make another tool call, or you might just pass the tool's output back to the LLM, and the LLM generates a result based on everything that has happened so far. That's tool composition in a nutshell. So I tried one of these popular systems: an enterprise AI on an enterprise SaaS product, a CRM. And I ask it a question like this: can you calculate our average sales cycle length? It's an enterprise AI system; it should be able to answer that. It says that to calculate the average sales cycle length, we need to create or use an existing report, and X, Y, Z. OK, fair enough. But can you do it for me? It gave me a recommendation: find all closed opportunities and their duration from creation to closure. So I clicked on that, and it gave me an even more detailed way of creating a report. I want you to answer it, and in your marketing material you said you could. So what happened? I refresh the page and try again: what's the average length of our sales cycle? This time it comes back with an answer: 71.6 days. OK, that's good, but how did you calculate this? I'm not sure that number is correct. It says: I took the stage 1 to stage 4 age length field and calculated the sales cycle. But we have 7 stages. Why did you arbitrarily assume there are 4 stages? Can you look at all 7 stages? It says: OK, I'll go look at all 7 stages, please hold on a moment. And that moment lasted forever; it never came back with an answer. Again, refresh the page, try again: what's the average length of our sales cycle this quarter? This time it says, again, I can't do it. I nudge it to try again, and this time it says 2.21 days. You said 71.6 days last time. Because I don't know what exactly you're doing, I can't even nudge you and tell you which direction to go in, or where you might be going wrong. It's OK to go wrong. Even humans go wrong.
But how do I nudge you to get to the right answer? So the predictability is extremely low. The perceived accuracy is high for simple scenarios: if I just ask for a simple number, it might find the number and give me the answer. That's fine. But does it do the same thing every single time? Is it repeatable? No, it's not. Is it explainable? It kind of is, because I know the exact tool architecture under the hood. If I'm technically savvy and I ask it, it can kind of tell me what it's doing. But why it made certain decisions, and exactly how it's orchestrating between multiple tools, those things I can't tell. So the number one complaint with tool composition that we hear is that it's not predictable with complex tools or large numbers of tools. If it's a simple tool that takes the name of a city and gives me the weather in that city, the example we keep using, that's fine; it will work perfectly well. But in an enterprise scenario there might be dozens of tools, and those tools can be pretty complicated. They can take multiple different types of parameters and perform different types of operations. How do you chain all of those tools reliably? How do you manage all of the tool inputs and outputs in the context of the LLM? How do you not overload the LLM with a bunch of data and context? So let's look at the third kind of system, specifically for structured databases. Basically, you take the user's query in natural language, translate it into the specific query language of whatever database you're using, run that query on the database, pull out the data, and give it back to the user. And we tried that. This is a very popular copilot for a SQL-based database, and I asked it a question on a very simple data set that I've connected to it.
I asked: how many albums from the metal genre have a positive and happy-sounding title? That's a simple question. It says: OK, I don't know what positive and happy would be in my SQL query, so I can't do that, but I can help you find some albums associated with the metal genre. It gave me a SQL query that does a substring match for "metal" in the name, and gave me a list. Which is fine, but it does not answer my question. If you nudge it a little more, it'll say: OK, I can do substring searches for different permutations and combinations of positive, happy, rainbow, sparkles, stuff like that, and add those to your SQL query. But that's the best it can do. It does not have the emotional, semantic intelligence you need it to have. So the predictability is pretty high: I know what kind of SQL it will generate for my question, and with a little variation in my question, as long as the task is the same, it will still generate the same SQL. And if I understand SQL, I can understand exactly what it's doing, and I can tell it: hey, don't use this specific operator, can you add this extra condition here or there? I can control it. So it's pretty explainable, but I can't do arbitrary gen-AI kinds of things. I can't do semantic search. I can't integrate different types of data sources; I'm limited to that specific SQL database. What if I have MongoDB, which doesn't take SQL? What if I have different microservices under the hood, or my own custom APIs I want to connect to? What if I want to connect to my ticketing system, like Zendesk? How do I do all of that? You can't. And it can only be understood by the domain expert: if I don't understand SQL, I can't tell what's wrong with the SQL that was generated. So the biggest complaint we hear from enterprise leaders is that only our analysts can use it.
And it's limited to the database itself; I can't build cross-domain products on top of it. So, mix and match: put all these different types of approaches together, and you can increase the quality of the output by mixing and matching, but it becomes more and more complicated the more approaches you use and the more different types of tools you connect, for example tool calling that integrates search-based retrieval and text-to-SQL. You can have multiple tools where the architecture is similar, but one tool is a RAG-based tool and one is a SQL-based tool. Still, you need to do all of this tool composition inside the LLM's context, so all the issues with tool calling still exist. So multi-agents are the answer, right? You can have very task-specific agents that each do their own thing really well, all of these agents talk to each other, and an orchestrator agent manages them all. Which is basically tool composition on steroids. But again, agents all talk in natural language. It's like me talking to my data analyst: hey, can you give me the report on my ten thousand most active customers? And the analyst comes back and starts reading out every single row of my data, one at a time, conveying all of it in natural language. That's not right. Give me a file, give me a CSV file, and I'll use that. But agents are LLMs at the end of the day; they will talk in natural language. They get caught in collaboration loops and forget to stick to a specific plan, because so much context is being loaded, especially onto the orchestrator agent, that if you give it a complicated task it goes into these complicated orchestration loops and loses track of the original goal. So what can we do to make AI work reliably on our enterprise data? Let's see.
This is the current architecture: very simple, very basic. You have different data systems, actual databases or different kinds of services you want to connect to, and you have different kinds of AI applications: customer-facing AI, copilots, agents, all talking to these different data systems. But if something fails, who do I blame? Do I blame the LLM? Do I blame the AI? Do I blame the data system under the hood? So why don't we introduce a data agent, someone to take responsibility for the reliability? Now I know I need to fix this data agent for anything that goes wrong. All of these different tools talk to this one data agent, which can orchestrate all of the data really reliably and really well. And there's a saying that the only thing we can control about generative models is what they generate. You ask it to generate an image, it will generate an image. You ask it to generate a table, it will generate a table. How it generates, where it generates, that's difficult, but what it generates, we can control. So instead of asking an AI system to take your user's input and generate a result, what if we don't do that? Because that's where everything goes wrong: results have hallucinations, results have inconsistencies, results can be incorrect, results will be non-repeatable. Instead of asking an AI to generate a result, and generation implies that it's not accurate, that it's hypothesized, what if we decouple planning and execution? We do not let the LLM generate the result. We only let whatever state-of-the-art LLM you have generate a plan: a plan for how to approach the problem. If I ask you a question, you don't just give me the answer. You first think about it: OK, I need to do these 7 steps.
Step one, step 2, step 3, step 4. And all of these steps I will now execute in a very deterministic environment, as a data analyst would: I will write some Python code, I will write some SQL code, do my analysis, keep saving intermediate results in CSV files or Python data frames, and finally, once all my analysis is done, share the output directly with whoever asked the question. I don't try to remember everything and replay it. Me, as the human, I am just coming up with the plan. The actual execution happens on my computer, in my Python environment, in my SQL environment. That's what actually handles the data; I'm not handling the data in my own head. So let the LLM do the same thing. Let the LLM come up with the plan, and then execute that plan in a programmatic runtime. Do not let the LLM do anything else; just take the plan and execute it in a programmatic runtime. Now, that runtime can do 3 fundamental things. In that runtime, you can access data from any kind of data source. You can perform any kind of data compute, composition, things like that: actual analysis work. And if there is a semantic task, like which metal album sounds happy, you might need to take those album names, give them to an LLM, and ask: hey, can you tell me if this sounds happy to you or not? So you should be able to call another LLM inside this programmatic runtime if required, again in a structured way. If there are ten thousand albums and you give the entire list to another LLM, it might not remember all ten thousand. So maybe call the LLM one album at a time, or batch it, 5 albums at a time. This kind of optimization of LLM calls should all happen programmatically. You want all those control statements there: if conditions, for loops. And that generates the result.
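A minimal sketch of that pattern in Python, using the metal-albums example. Everything here is hypothetical: `classify_mood` stands in for a scoped, batched LLM call and is stubbed with a keyword check so the sketch runs. The point is the shape, deterministic filtering plus batched delegation, not the names or PromptQL's actual API.

```python
# Illustrative plan-vs-execute sketch: the LLM would author code like this
# as its plan; the runtime executes it deterministically.

def classify_mood(titles):
    # Placeholder for a delegated LLM call ("do these titles sound happy?").
    # Stubbed with a keyword check purely so this sketch is runnable.
    HAPPY_WORDS = {"sunshine", "rainbow", "sparkle", "joy"}
    return [any(w in t.lower() for w in HAPPY_WORDS) for t in titles]

def batched(items, size):
    # Keep each delegated LLM call small so no context window is overloaded.
    for i in range(0, len(items), size):
        yield items[i:i + size]

albums = [
    ("Rainbow in the Dark", "Metal"),
    ("Black Sunshine", "Metal"),
    ("Grey Skies", "Metal"),
    ("Joy Ride", "Pop"),
]

# Step 1 (deterministic): filter structurally by genre.
metal_titles = [title for title, genre in albums if genre == "Metal"]

# Step 2 (semantic, delegated in batches): which titles sound happy?
happy_flags = []
for batch in batched(metal_titles, 2):
    happy_flags.extend(classify_mood(batch))

happy_metal = [t for t, ok in zip(metal_titles, happy_flags) if ok]
```

In this framing, the runtime, not the model, touches the ten thousand rows; the LLM only ever sees small, structured batches when a genuinely semantic judgment is needed.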
The LLM does not generate the result. There is no AI-generated result; it's all deterministic, programmatic result generation, and that's what you give back to the user. That's how you get the reliability you're seeking. And now you have complete control over how this programmatic runtime is set up. You can nudge the LLM to generate the code better. That's what allows you to increase the predictability and explainability of an AI system. So let me show you PromptQL in action, and then we can dive a little deeper into what we have built and how we have built it. This is an assistant created using PromptQL for a typical enterprise SaaS company's internal use. It is connected to a bunch of different data that an enterprise SaaS company might have. What are the different users we have? What are the different organizations we serve? Each organization can have multiple users of our product. What are the different projects they have running with us? What billing have we generated? Have they raised any support tickets? It's also connected to their ticketing system, which is Zendesk. Now I want to ask a question like this: give me a detailed report on the users, projects, billing, usage, and support history of our 5 largest customers by total revenue. To find these unique customers, don't use the organization ID data, since it's messed up; let's say my data is not clean. Instead, find unique customers based on the email domains of individual users. If I ask a question like this of a typical AI system, I need specific tools under the hood that can answer this kind of question. But with PromptQL, it comes up with a query plan for how to approach the problem, looks at the underlying data landscape, and says: OK, first I need to find unique customers by grouping users by their email domains.
Then, for each domain, I'll calculate the total revenue from all the invoice items. OK, makes sense. I never told it that the revenue comes from invoice items, but smart. Get the top 5 domains by revenue, and then for those top 5, gather all of these details, store them in artifacts, and share them with me. Now it implements that plan in the DSL we spoke about, executes that DSL, and creates this set of artifacts, which come directly from the data source. There is no LLM involved here. So it says: OK, here is the comprehensive analysis. These are your top 5 domains by revenue, Williams.com, PerrisBrown.com, with the total revenue and the number of users each of these organizations has. Now it's giving me detailed information about each of these organizations. That's great. Now I have a complete, detailed report and analysis of every single customer that I wanted, on the fly. None of the data you see on the right has been AI-generated; the LLM hasn't even touched it. The LLM only came up with the query plan for how to approach the problem, implemented that plan, and executed it, and all of that execution happened deterministically. The LLM has never even seen these artifacts; I haven't given it these eight hundred and twenty-four rows of billing history from Williams.com. This also allows me to nudge my AI in a direction I want. This is great, but if I wanted to change step 3 and say, don't do step 3, do this instead, all I have to do is edit the query plan: don't do this, find the top 5 based on something else. I'll show you a different example of editing in a moment. But notice the AI is also self-aware here. It's flagging something. It's saying: my evaluation system is reporting fair reliability for this response.
The reason: because the revenue calculations include all positive invoice amounts, including pending invoices, this may not perfectly reflect the actual realized revenue if some invoices aren't paid or if there are credits or refunds that should be considered. See, smart of it to figure that out. OK, I don't think my answer is perfect, because maybe the question you asked was vague, or the data under the hood is vague. So I'm going to be completely transparent and tell you these are the assumptions I made, and hence I'm evaluating my reliability as fair. If you clarify those assumptions, I will be much more confident in my own reporting. And the project counts only include current active projects; historic projects that were deleted are not included in these numbers. That's why it's saying fair. This is what you want with an AI: complete transparency, complete control, complete determinism. OK, let's do some follow-up questions. Let's ask: give me the billing, number of projects, and number of support tickets for each of the users of Williams.com. It said that Williams.com has 17 users, so I want the details of those 17 users. It says: OK, get all the projects and their owners from the Williams.com domain, get all the support tickets and their requesters, get all the billing information, aggregate the information, and create this user-level summary artifact. Makes sense. Now that it's done, I can see the user-level details of Williams.com. Mercer Schneider has 6 projects, 5 support tickets, two hundred and twenty thousand in billing, and three hundred and ninety-one active models; that's some internal terminology. OK, so this is good. Now I want to know what is happening with Mercer's support tickets. I want to see how they feel about our product.
So I can say: for the first user, can you fetch the details of their tickets, including the comments, then summarize each ticket, then use those summaries to extract this user's sentiment towards our product, what is going well and what is not going well? Now, this is that structured-plus-unstructured kind of question. I need you to fetch all of the ticket details in a structured way, look at the entire thread, and then use your semantic abilities to tell me the sentiment of this user towards the product. So what PromptQL does is it says: OK, I'll get all of these details, then I'm going to delegate the summarization task to another AI, and then I'm going to analyze those summaries by again calling another AI to understand the sentiment, so that I do not overload my own context with all of these tickets and their comments. I'm going to delegate this task. And that's exactly what it's doing under the hood: it pulls all of that data, creates these ticket threads, all programmatically, and then calls other LLMs under the hood for the semantic tasks. So here I have the summaries of each of Mercer's tickets, and I also have a sentiment analysis: positive aspects, pain points, and an overall sentiment of neutral towards the product. OK, makes sense. One final thing: you can also use AI to take action. You can do stuff with your AI. You can say: for this user's highest-billed project, issue 2 percent of that billing amount as a refund. Let's say you have connected some API under the hood, like the Stripe API, and you want to call that API. Do it: trigger any kind of workflow, take action, right from your AI. So I say: for this user's highest-billed project, issue 2 percent of the billing amount as a refund. It says: OK, I'll find that project.
I'll call this function you've connected under the hood. All of that is once again happening programmatically, because the LLM should not be thinking up the parameters to pass; they should be pulled directly from the data source and passed to the API. And the data access layer, the data agent under the hood, deterministically says: hey, the AI is trying to call a function, these are the parameters it's passing, are you OK with that? This is where you need a human in the loop; you don't want to let your AI just run completely unchecked. You can if you want to, but ideally you shouldn't. So I approve, it calls the Stripe API under the hood, issues the refund, and comes back with a response: the refund has been issued for 2 percent of their highest-billed project, and here is the invoice ID. So that is what a reliable AI looks like. This is what a predictable, completely explainable AI looks like. Now imagine the possibilities this unlocks for your business. So that was PromptQL: the DSL that decouples planning and execution, which finally allows us to build high-accuracy systems on any kind of task. It's a general-purpose AI system with deterministic execution, and it's completely transparent, completely explainable. You understand exactly what it did. You can control it. You can change what it's doing. You can guide it in the right direction. It also tells you when it thinks it's making assumptions. So basically, PromptQL can capture any kind of data task. Data retrieval, be it from a SQL database, a NoSQL database, APIs, SaaS applications, vector databases, any data source you can think of. Any scale of computational task, because it has a programmatic runtime under the hood, so it can do any kind of data composition and data analysis. You can also connect it to much more complicated systems.
You have your traditional machine learning algorithms running under the hood? Connect PromptQL to them. You have Wolfram Alpha for complicated regression? Connect Wolfram Alpha to PromptQL. You can do any kind of task that you want. The generative tasks have a completely decoupled context, right? Generate a summary of the support tickets that we saw. Find out the sentiment of the user towards our product. That is completely decoupled from the actual flow of the conversation. That specific task is delegated to a completely separate LLM so that you don't overload your own head with thousands of support tickets, let's say. And then of course, PromptQL can invoke PromptQL under the hood. So it can create these subtasks under the hood, and that allows us to do much, much deeper analysis and workflows, which is a new capability that we are going to launch, which I'll talk about at the end of the webinar. So what does that enable? First, tangibly: PromptQL easily achieves state of the art on any benchmark out there, out of the box. With no fine-tuning, no data-source-specific cleanup, X, Y, Z, no. Just connect PromptQL to the data systems, click on run the benchmark, and PromptQL beats every single state-of-the-art result out there. But you can still exercise precise control over PromptQL. So now, if I allow you to steer PromptQL in the right direction, you can not just beat these benchmarks, you can smash these state-of-the-art benchmarks, and not just on these synthetic benchmarks but on any kind of evaluation set that you might have for your enterprise data and systems, so that your stakeholders trust your AI. And this is what allows us to also deliver an AI product which has an accuracy SLA. Have you ever seen an AI product with an accuracy SLA? So we say: you give us your eval set, and we promise that we will have at minimum 95 percent accuracy on your eval set, which you are free to change whenever you want.
We usually deliver a hundred percent, but for legal reasons, 95 percent. So now, all of this knowledge that you have of how to nudge AI in the right direction, your tribal knowledge, your domain-specific terminology, all of that is in your heads. So will you keep steering your AI in that right direction? Will you keep telling it, hey, you shouldn't have done that, you should do this, no matter how predictable the AI is under the hood? How do you improve that? Sometimes we don't even have the knowledge today, when we're building the AI system. As you use your AI system more and more, you unlock new learnings, right? And now you want to teach your AI about those things. How do you enable that? You enable that with an agentic semantic layer. The semantic layer that sits between your AI and your data. This semantic layer basically captures the entire domain-specific information about your data systems under the hood: the kind of data they have, the kind of relationships they have, and also any other kind of tribal knowledge that you might have that doesn't directly relate to a specific data source. For example, let's say your quarter starts in February instead of January, right? How does my AI know that? I've had to tell it many times when I was using it that, hey, do your analysis starting February, not January. So an AI should be learning that. Where should I add those learnings? In the semantic layer. So, a semantic layer that continuously captures your business context to improve this planning accuracy. The plan that we create keeps getting better and better, so you require less and less nudging of your AI. So let me show you what this looks like, right? I've asked this question: which employees are working in departments with a budget of more than ten thousand dollars? I purposefully made the data very dirty. I have tables like Zorp, Plug, or Brain, whose names mean nothing to the AI. So it's asking me, hey, I don't see this data.
Now I'm like, can you sample a few rows from each table and figure out which table contains what, and then answer my question? It's like, okay, now that I've sampled the data, I realize Zorp contains employee information. Plug contains information about my departments. Now, based on that understanding, I can start doing the analysis that you were asking me to do. And then it runs the analysis. And then I give it one more nudge, saying that the budget data is in cents, not in dollars, so can you answer my question based on that? So all of this information, I had to keep giving it, I had to nudge it in the right direction, right? And now, once it's done with its analysis, I can go to this Autograph thing. This is manually triggered. You can also completely automate this, but for demo purposes it's manual. So I just say, suggest metadata improvements based on the recent threads. Now what happens here is the AI will be like, okay, whatever conversations we had previously, let me look at the last one hour of conversations and understand what was happening. Okay, I see that there are 3 models with different types of data. So based on these interactions, now I can generate some meaningful descriptions for these models, and then I can store that in my semantic layer. And that's what it does. And all I do is click on apply, and it will implement all of those changes into my semantic layer and create a new build of the semantic layer. And all of these are immutable builds, right? So just like how you version your code, you version your semantic layer. See, generated by Autograph, 0 seconds ago. Now, if I ask the same question, right, which employees are working in departments with a budget of more than ten thousand dollars, this time it does not have to ask, hey, the data is not there. No, I understand the data now, because Autograph updated my semantic layer.
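The versioned, immutable-build idea from the demo can be sketched like this. The structure and names below are illustrative, not PromptQL's actual metadata format: each build is a frozen snapshot, and applying suggested improvements produces a new build rather than mutating the old one, which is what makes the history auditable and rollback trivial.

```python
import copy

class SemanticLayer:
    """Toy semantic layer with immutable, versioned builds (illustrative)."""

    def __init__(self):
        self.builds = [{"version": 0, "models": {}}]  # full history, never mutated

    @property
    def current(self) -> dict:
        return self.builds[-1]

    def apply_improvements(self, descriptions: dict) -> int:
        """Create a new build containing the suggested model descriptions."""
        new_build = copy.deepcopy(self.current)
        new_build["version"] += 1
        new_build["models"].update(descriptions)
        self.builds.append(new_build)
        return new_build["version"]

layer = SemanticLayer()
# e.g. what an automated metadata pass might suggest after sampling the tables
layer.apply_improvements({
    "zorp": "employee records",
    "plug": "department records (budget stored in cents, not dollars)",
})
print(layer.current["version"])  # 1: old build kept, new one appended
```

Just as with code, the append-only list of builds means any earlier version can be inspected or restored.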
So this is what you need to build these kinds of reliable AI systems under the hood: decoupled planning and execution, and an agentic semantic layer. And all of this allows us to build something extremely powerful that we'll launch very, very soon, which is deep research on enterprise data. You give a very vague question, a question like, how do I increase my sales in North America by 5 percent by the end of the quarter? Very vague question. It's a tough question to answer based on the data that we are seeing right now. You need your AI to come up with, like, ten different hypotheses, go down into your data, into your dirty data, do a bunch of analysis, validate some hypotheses, invalidate some hypotheses, come back with a detailed strategy, and report back to you. Right. Imagine if you could do that with your own enterprise data. That's what we're building next. So that was PromptQL. Any questions? Happy to take them here. Anushrut, thank you so much for that presentation. Now, I'm sure you receive a lot of great questions, so let's dive into a few of the most common ones that you tend to get. So, how do you guarantee one hundred percent accuracy? That's a perfectly valid question, and I love the skepticism in this question. Yeah, the way to think about accuracy with AI is also the way you think about accuracy with your colleagues and employees, right? How do you guarantee accuracy with humans? You can't, right? Because at the end of the day, everything that we are dealing with is non-deterministic. We think that is free will. So same with AI, right? When we say a hundred percent accuracy, what we mean by that is the fact that you can work with your AI to reach the right answer. The AI is performing very predictably. It's not failing in unpredictable ways. It's giving you clear insights into how it's taking its approach, how it's achieving the answer it has reached,
what steps it took, and it explains it very, very clearly. And then you as the user are in complete control of how to guide your AI in the right direction, just exactly as you do with an employee or colleague. Very well put. Let's move on to another one. What data sources can PromptQL connect to? So PromptQL can connect to any kind of data source that you might have, right? Enterprises have structured data, which will be in SQL or NoSQL kinds of databases. They'll have unstructured data, documents, vectorized data, so it can connect to unstructured data sources as well. Then enterprises also use a bunch of different SaaS applications, like Salesforce and Zendesk, so it can connect to practically any SaaS application. Or your own APIs and microservices that you might have under the hood; these can be REST APIs, gRPC, GraphQL, whatever you have. So PromptQL is a data-source-agnostic design. It's supposed to be the single layer across all of your data sources. What are the deployment options, Anushrut? An enterprise can have its own preferences, right? Starting from the LLM that we use: PromptQL is an LLM-agnostic design, so you bring your own LLMs, and this can be either a direct API from one of the LLM providers like Anthropic or OpenAI, or your own custom deployments on Bedrock, Azure, or GCP, or you can just bring your own in-house LLMs which you have self-deployed on your own hardware. So that's one part of it. The other part is where your data sits. Since PromptQL is a federated design, your data sits in the sources where you have it. And the way we deploy the data plane and the control plane is in multiple ways. One, you can use Hasura Cloud. The other way is that we can have a dedicated VPC for you and peer your data sources through VPC peering, or you can host the data plane completely on your own infrastructure. So PromptQL gives you complete flexibility of deployment.
How is PromptQL different from other approaches? So all the other approaches that we saw in this webinar were what we call in-context approaches. What in-context means is that you are letting the LLM handle all of this data inside the context of the LLM. And that adds all of the non-determinism to the final output that is generated. In any of these approaches, the LLM is generating a response, and that's where the problem lies, right? A generated response is by definition a hallucinated response, and you're just hoping that hallucination is correct. Whereas with PromptQL, we separate this data handling from the context of the LLM. The LLM is only responsible for creating the plan of how to approach the problem, and that plan is executed deterministically, completely outside of the LLM's context. And that's what allows us to achieve such high reliability and repeatability across any scale of data. Anushrut, something I'm curious about: where are you seeing this technology cause real-world impact? I'd love to hear about some use cases. That's a great question. So with PromptQL, we have seen use cases and deployments across industries and across verticals. One of the Fortune Ten companies, having acquired a bunch of different companies over the last many years, has a fragmented sales department. Now they want to consolidate their entire sales department into one single layer so that they have complete insight into all the different deal flows that are happening, and they can do their sales forecasting much more accurately. And here, even a five percent better sales forecast is multiple hundreds of millions of dollars of impact. Similarly, one of our healthcare companies is deploying PromptQL at the customer onboarding stage. They're generating dynamic apps and UIs so that they can onboard patients much faster, while still being compliant with all of the different regulations in that domain.
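The plan-versus-execution split described in that answer can be sketched as follows. Everything here is illustrative: in a real system an LLM would author the plan (a small program), but execution runs deterministically outside the model's context, so the underlying data never passes through the LLM at all, and the same plan on the same data always yields the same answer.

```python
def fake_llm_plan(question: str) -> list:
    """Stub: a planner LLM would emit these steps from the question."""
    return [
        ("query", "SELECT dept, SUM(amount) FROM sales GROUP BY dept"),
        ("filter", lambda row: row[1] > 100),
    ]

def execute(plan: list, database: dict) -> list:
    """Deterministic runtime: steps run as plain code, no LLM in the loop."""
    rows = []
    for op, arg in plan:
        if op == "query":
            rows = database[arg]                # stubbed query result
        elif op == "filter":
            rows = [r for r in rows if arg(r)]  # ordinary list filtering
    return rows

# Stubbed "database": query text mapped to its result set.
db = {"SELECT dept, SUM(amount) FROM sales GROUP BY dept":
      [("north", 250), ("south", 80)]}
print(execute(fake_llm_plan("which depts sold more than 100?"), db))
```

Because the plan is inspectable data, it can be shown to the user, edited, and re-run, which is what makes the behavior explainable and repeatable.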
Another one of these use cases we have seen is with one of the biggest e-commerce companies. They have a massive inventory and supply chain that they have to manage, and all of these supply chain managers and business operations people want very deep insights into all of the data that they have. Waiting on the data analysts takes a lot of time, and a lot of work gets stuck. So how do we democratize that data access for anyone who needs instant access to data? Asking free-form natural language questions, very deep, across your multiple enterprise data sources, that's helping speed up these decision-making processes, which has massive impact on the business, on supply chain, on operations, and so on. So we are seeing PromptQL's applications in many places. And this ranges from, one, insights and business intelligence, to generating apps, workflows, dashboards, data pipelines, and so on. Any task that you have to do with data, doing all of that with natural language, very reliably, is what PromptQL is going to be used for. Anushrut, as we know with AI, it often gets better over time as we continue to use it. Can we apply that same principle to PromptQL? Does it get better with us and help us get better by default over time? Yeah, you're spot on with that observation. Since we don't have our own large language model, we are not training any models per se as the interactions keep happening. But what we are doing is improving the agentic semantic layer. The way I should say it is, it's improving the semantic layer agentically. What that means is, the more you use PromptQL, the more back and forth you do with PromptQL, the more things you share with PromptQL: these are the KPIs that matter to me, these are my business goals, this is how you should be thinking about a problem, you should do step one, step two, step three.
All of this tribal knowledge that you as the user, or an expert, a business expert, have needs to be captured somewhere by the AI, so that the next time you try to achieve the same goal, the AI is instantly able to understand the context and operate exactly as you desire. That's how a good employee also trains themselves; they are very quick learners. So that's what the semantic layer helps us with. As you use PromptQL more and more, the semantic layer keeps getting enriched with all of this tribal knowledge that you have. And that allows us to power your future PromptQL experiences much, much faster and better as you keep using PromptQL. One more question today. What kinds of models does PromptQL use? As I mentioned before, PromptQL is LLM agnostic, so you can bring your OpenAI GPT models, your Anthropic Claude models, Google Gemini models, or Llama, Mistral, DeepSeek, whatever you have. We recommend a certain set of models which are really good at the planning that PromptQL requires. Of course, the latest and greatest from each of these LLM providers works the best, but as we are LLM agnostic, you bring your own LLM, you bring your own LLM deployment, be it on a cloud provider or on your own hardware, and PromptQL can work with anything. Thank you so much for your time today, Anushrut. Would you like to tell us one more time where we can learn more about PromptQL? Yeah, you can read everything about PromptQL at promptql.io. There are the design specifications of PromptQL, all the documentation of PromptQL, and all the benchmarks and studies that we have published comparing PromptQL to different approaches. So just visit promptql.io. And for any specific product details and technical details, you can go to promptql.io or you can scan the QR code on the screen. Anushrut, thank you so much for your time today, and thank you at home for joining us.
If you have any questions lingering after this session, please don't hesitate to reach out. Once again, thank you to Anushrut, and thank you from Technology Advice. We hope you found today's session informative and valuable. I'm Katie Bavoso. Have a great day.



