Need pointers from experts for building robust text-to-SQL conversion across 20 datasets with 2000+ attributes

Hi experts,
There has been a lot of discussion around text-to-SQL, but none of it has concluded with a robust, reliable, scalable solution. Can you please help define such a design? This would definitely help a lot of folks here.


gpt-4-0125 or gpt-4-1106 should be up to the task if you do it in small chunks :thinking:

what have you tried so far?

I have tried supplying a huge prompt with table and column descriptions plus distinct column values, along with many few-shot examples. There is so much text that it feels like we are trying to write out every possible user query, and it results in a huge token cost. I'm not sure what to do next.

I'm planning to create views on top of all the datasets with meaningful view and column names, then do intent detection and entity-relation extraction before feeding the table definitions, column names, etc., plus the few-shot examples, into RAG. Will that help?

Yeah, identifying tables by intent would be a good start, you can use vectors or keywords for that. I imagine you don’t need to have all tables loaded in context to answer a query.
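To illustrate the keyword variant of that idea, here is a minimal sketch that scores tables by token overlap between the user question and a short table description, so only the top matches go into context. The table names and descriptions are made up for illustration; in a real system an embedding model would usually replace this scoring.

```python
# Minimal keyword-based table selection: score each table by how many
# query tokens appear in its description, keep only the top matches.
# Table names and descriptions here are hypothetical examples.

TABLE_DESCRIPTIONS = {
    "orders": "customer orders with order date status and total amount",
    "customers": "customer names emails and signup dates",
    "products": "product catalog with categories and prices",
}

def select_tables(question: str, top_k: int = 2) -> list[str]:
    tokens = set(question.lower().split())
    scored = [
        (len(tokens & set(desc.lower().split())), name)
        for name, desc in TABLE_DESCRIPTIONS.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

print(select_tables("total amount of orders per customer"))
# -> ['orders', 'customers']
```

Crude as it is, this kind of filter already means you never load all 20 datasets' schemas into the prompt at once.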

And views? Depends. You can have views be an optional feature; if you have a high query match on a view you could return it, but I’m wondering if it’s not better to just scrap views altogether, because they’re just redundant information. Unless you use them for access control, but then you don’t need the fundamental tables.

The trick is to include as little information as possible, while still being sufficient and obvious to answer the query.

My base tables have garbage column names, so I was thinking of creating views with more meaningful names. That would also give me better control later if I want to rename anything: instead of supplying the underlying table names, I'd control the column names through the views.
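As a concrete sketch of that renaming idea (assuming SQLite; the table and column names are hypothetical), a view can expose readable names over a cryptic base table, and only the view is ever shown to the model:

```python
import sqlite3

# Hypothetical base table with cryptic column names, wrapped in a view
# that exposes meaningful names. Only the view is shown to the model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_ord_01 (c1 INTEGER, c2 TEXT, c3 REAL)")
conn.execute("""
    CREATE VIEW orders AS
    SELECT c1 AS order_id, c2 AS customer_name, c3 AS total_amount
    FROM t_ord_01
""")
conn.execute("INSERT INTO t_ord_01 VALUES (1, 'Alice', 99.5)")

# Generated SQL can now target the clean view instead of the base table.
row = conn.execute(
    "SELECT order_id, customer_name, total_amount FROM orders"
).fetchone()
print(row)  # (1, 'Alice', 99.5)
```

Renaming later then only means redefining the view, without touching the base table or retraining anything.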

How would you envision this kind of ask end to end, considering we also need some kind of query checker before running the SQL against the DB?
Please suggest.
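One cheap query-checker sketch, assuming a SQLite mirror of the schema (the table and column names here are made up): compile the generated SQL with `EXPLAIN` against an empty copy of the schema. That catches syntax errors and unknown tables or columns without ever touching production data, and a simple prefix check blocks non-SELECT statements.

```python
import sqlite3

# Validate generated SQL against an empty schema copy: EXPLAIN compiles
# the statement without executing it, so bad SQL fails fast and safely.
schema_conn = sqlite3.connect(":memory:")
schema_conn.execute("CREATE TABLE orders (order_id INTEGER, total REAL)")

def check_sql(sql: str) -> tuple[bool, str]:
    # Reject anything that is not a read-only SELECT.
    if not sql.lstrip().lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    try:
        schema_conn.execute("EXPLAIN " + sql)
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)

print(check_sql("SELECT total FROM orders"))  # valid
print(check_sql("SELECT nope FROM orders"))   # unknown column -> rejected
print(check_sql("DROP TABLE orders"))         # not a SELECT -> rejected
```

If the check fails, you can feed the error message back to the model for a retry instead of surfacing it to the user.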

That’s not a bad idea. I assumed you wanted to throw all your stuff at the AI and hope for the best. Cleaning up your stuff does help.

It really depends on your data. Without looking at it I can’t say anything concrete. I’d just generally go with the standard

embed(table context) → search/retrieval + instruction → sql generation
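That pipeline could be sketched like this. The "embedding" below is a deliberately crude bag-of-words vector over a fixed vocabulary, standing in for a real embedding model, and the generation step is stubbed out; table names and contexts are hypothetical.

```python
import math

# Toy pipeline: embed(table context) -> search/retrieval -> SQL generation.
# Bag-of-words vectors are a stand-in for a real embedding model.
tables = {
    "orders": "order id customer id order date total amount",
    "customers": "customer id name email signup date",
}
vocab = sorted({tok for ctx in tables.values() for tok in ctx.split()})

def embed(text: str) -> list[float]:
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

index = {name: embed(ctx) for name, ctx in tables.items()}

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    def score(name: str) -> float:
        return sum(a * b for a, b in zip(q, index[name]))
    return sorted(index, key=score, reverse=True)[:k]

question = "total amount per order"
context = retrieve(question)
prompt = f"Tables: {context}\nQuestion: {question}\nWrite SQL."
# sql = call_your_llm(prompt)  # generation step left to a real LLM API
print(context)  # ['orders']
```

Swap `embed` for a real embedding model and `call_your_llm` for your generation call, and the shape of the system stays the same.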

Any particular embedding model you would suggest?
Also, any vector DB suggestions that would fit well here? Would OpenSearch or Elasticsearch be a good choice?
With many columns, how would you suggest we proceed?
Is this a good flow, or even possible?
1. View creation with clean names
2. Intent detection based on the user query (how can we do this efficiently?)
3. RAG based on table definitions and many one-shot examples (or should the one-shot example queries be kept separate?)
4. Similarity search on user question + intent + system prompt to find the correct entities, i.e. tables and columns
5. SQL generation
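For the intent-detection step, one cheap baseline before reaching for an LLM is keyword-rule matching; the intents and trigger words below are made up for illustration, and a trained classifier or LLM call would replace this in practice:

```python
# Rule-based intent detection baseline. Intents and trigger keywords are
# hypothetical; an LLM or trained classifier replaces this in production.
INTENT_KEYWORDS = {
    "aggregation": {"total", "sum", "average", "count", "max", "min"},
    "lookup": {"show", "list", "find", "get"},
    "trend": {"over", "trend", "monthly", "yearly", "growth"},
}

def detect_intent(question: str) -> str:
    tokens = set(question.lower().split())
    best, best_hits = "lookup", 0  # default intent when nothing matches
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best

print(detect_intent("total count of orders per customer"))  # aggregation
print(detect_intent("list all customers"))                  # lookup
```

Even a rough intent signal like this can be concatenated with the question before similarity search, as in step 4.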

Can you please let me know if this flow seems OK?

Any pointers here will be highly appreciated. Many people are trying to do the same, and it would be great if everyone knew the options.

Hi, sorry.

I wasn’t sure how to answer your question - particularly because it seems like you’re asking us to do your engineering work for you.

In general, sure, sorta, what you’re proposing can work. You’ll need to evaluate that stuff on your actual data and use-cases to see how well it works.

If it’s too much work, consider hiring a consultant? :thinking:

But I’d suggest just getting started and trying to see what works and familiarizing yourself with the ecosystem, I’m sure you can do it! There’s nothing wrong with getting it slightly wrong the first time, it’s just part of the work!

I am sorry if it sounded like that. We have a strategy; I just wanted to validate it, or get ideas from someone who has done a similar use case in the past.
