How to Get Consistent SQL Queries from GPT-4o for the Same Input?

I’m currently working on a Text2SQL project where I use OpenAI GPT-4o to generate SQL queries based on user input. In the prompt I send to OpenAI, I provide a detailed explanation of the database schema, including tables, columns, and relationships. I also include a set of specific instructions for generating the SQL queries. Overall, I’m quite satisfied with the results I’m getting.

However, one issue I’ve noticed is that sometimes the LLM generates different SQL statements for the same query, despite there being no changes to the prompt and the temperature being set to 0.

Do you have any insights or suggestions on how I can ensure that the LLM consistently returns the same SQL query for the same input?

Thanks in advance!

The output of OpenAI models is not deterministic, no matter what the temperature is set to.

It would be great if we could be offered that property in a future model.

An alternative is to fix the structure of your SQL in a function and allow the LLM only the flexibility of changing some of the parameters.
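
For example, a minimal sketch of that pattern (the table, column, and function names here are hypothetical, not from the original post):

    # Minimal sketch: the SQL skeleton is fixed, and the LLM may only
    # pick from whitelisted parameter values. Table and column names
    # (invoices, invoice_date, amount) are hypothetical.
    import sqlite3

    BUCKET_FORMATS = {
        "monthly": "%Y-%m",
        "yearly": "%Y",
    }

    def get_revenue(conn: sqlite3.Connection, period: str, year: int):
        fmt = BUCKET_FORMATS.get(period)
        if fmt is None:
            raise ValueError(f"unsupported period: {period!r}")
        # Only whitelisted format strings are interpolated; values that
        # originate from the user or the LLM are bound parameters.
        sql = f"""
            SELECT strftime('{fmt}', invoice_date) AS bucket,
                   SUM(amount) AS revenue
            FROM invoices
            WHERE strftime('%Y', invoice_date) = ?
            GROUP BY bucket
            ORDER BY bucket
        """
        return conn.execute(sql, (str(year),)).fetchall()

The query text is identical on every call, so identical inputs always produce identical SQL.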

1 Like

Hi @ramanan.iyer !

Exactly as @merefield suggested - the best approach is to pre-define a bunch of functions, e.g.

get_invoice_ids(...)
get_revenue_aggregate(...)
...

And then the LLM simply translates the query to the right function. As Robert suggested, the flexibility comes from generating the right “arguments” for the function. For example, for revenue aggregates, it should be able to choose whether it’s quarterly or monthly or yearly.
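
As a rough sketch of what that routing could look like with OpenAI tool calling (the schema below is illustrative; adapt the names and parameters to your own domain):

    # Rough sketch: the model routes a user question to one of a few
    # pre-defined query functions via tool calling.
    from openai import OpenAI

    client = OpenAI()

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_revenue_aggregate",
                "description": "Aggregate revenue over a time bucket.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "period": {
                            "type": "string",
                            "enum": ["monthly", "quarterly", "yearly"],
                        },
                        "year": {"type": "integer"},
                    },
                    "required": ["period", "year"],
                },
            },
        },
    ]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Show revenue by quarter for 2023"}],
        tools=tools,
    )

    call = response.choices[0].message.tool_calls[0]
    print(call.function.name, call.function.arguments)
    # e.g. get_revenue_aggregate {"period": "quarterly", "year": 2023}

Since the executable SQL lives in your code, the model’s nondeterminism is confined to argument selection, which the enum constrains.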

1 Like

@ramanan.iyer

I am not an expert at this, but I guess there should be no need to call the OpenAI API if the prompt is exactly the same. You can cache the prompt and the response, save some money, and also get a consistent response.

I usually do this with LangChain: :slight_smile:

    from langchain.globals import set_llm_cache
    from langchain_community.cache import SQLiteCache

    # Store every (prompt, response) pair in a local SQLite file, so
    # repeated identical prompts are served from the cache, not the API.
    set_llm_cache(SQLiteCache(database_path="llm_cache.db"))

Hope it helps :slight_smile:

Hi @merefield
Thanks for your feedback.
I have an SQLite database with key information about customer contracts. My understanding is that I would write functions containing the base SQL statements, and the LLM would only suggest which function and parameter values to use based on the user query.
In that case, wouldn’t I be required to write a lot of functions to cover all the key information a user may query for? Considering this, I decided not to use functions at the start and instead asked the LLM to generate the SQL automatically.

Hi @platypus
Thanks for your feedback. I have posted my doubts in my reply to @merefield’s answer.

1 Like

Hi @hk1255
Thanks for your suggestion, I will definitely check this out. I have used LangChain and LangGraph for other projects; for this project I just decided to use the OpenAI APIs directly.
Your suggestion could be a valid reason to use LangChain here.

Imho that is questionable practice from an engineering perspective. You don’t want arbitrary queries running on your database unless you are trying to implement a highly flexible BI reporting system (are you?).

What if your LLM decides it needs to return all rows, or creates many inefficient joins that cause your database to slow to a crawl?

You also have to work within constraints of what is possible with an LLM.

Both of these aspects lead me to suggest the function-based approach.

Work out what your customer actually needs and design to those constraints rather than offering the entire kitchen sink?

If the client requires complete flexibility I would manage client expectations on system performance and reliability.

Cool. Though @merefield is on point there. Having a fully dynamic text-to-SQL system is a minefield (speaking from personal experience here!). On top of not having control over what and how the result set is returned and contextualized, which will impact how your end user views and interprets the results, you are potentially opening yourself up to a very large attack surface (injections, exposing data that shouldn’t be exposed, etc.).

Regarding caching - yes, you can use LangChain’s prompt/response cache. Although when you peel off the LangChain code, it’s really just a dict of the form {"prompt": "response"}, so you can implement that yourself as well. I am not sure it solves anything here, though. For example, if your tables are constantly being updated, a cached response will not stay “fresh”, and LangChain doesn’t handle invalidation, so that problem remains. Also, if you have multiple users asking the same thing in different ways, a response cache won’t help you either.
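
If you do roll your own, it can be as small as this (a minimal exact-match sketch; note it deliberately shares the same staleness limitation):

    # Minimal sketch of a hand-rolled exact-match prompt/response cache.
    # Like LangChain's cache, it has no invalidation, so it goes stale
    # as soon as the underlying tables change.
    import hashlib

    _cache: dict[str, str] = {}

    def cached_completion(prompt: str, call_llm) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in _cache:
            _cache[key] = call_llm(prompt)  # hit the API only on a miss
        return _cache[key]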

1 Like

Thank you very much for your feedback @merefield and @platypus
Currently, this project is being developed solely for internal use and will be accessed by a select few. I completely agree that incorrect SQL queries could negatively impact performance. For now, from a security perspective, I’ve ensured that only SELECT statements are generated.
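
A minimal sketch of such a guard (illustrative rather than exhaustive, and not our exact code; SQLite’s read-only URI mode adds a second layer of enforcement):

    # Sketch of a read-only guard for generated SQL. The string check
    # is illustrative, not exhaustive; the read-only connection is the
    # real backstop.
    import sqlite3

    def run_readonly(db_path: str, sql: str):
        statement = sql.strip().rstrip(";")
        if ";" in statement or not statement.upper().startswith("SELECT"):
            raise ValueError("only a single SELECT statement is allowed")
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            return conn.execute(statement).fetchall()
        finally:
            conn.close()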

As for caching, yes, we can implement our own caching mechanism, focusing on storing the user queries and the corresponding SQL statements, not the results. However, this is something we plan to explore later, as right now, we are primarily testing the concept’s viability.

I also think we’ll need to move towards function calling, but we’ll do so after analyzing the types of questions our users are asking. This will guide us in deciding which functions to implement.

Thank you again for all your guidance.

2 Likes