I have a forum with over 1M posts. I want a Q&A chatbot that can answer questions based on an understanding of what's in the SQL table ('posts'), which can be organized by thread and grouped by forum category.
My thinking prior to this week's dev changes was to make multiple passes to categorize and then summarize each thread, then add API functions to do keyword searches generated by OpenAI, hoping it would craft really great function calls.
But now I’m not sure if I should transition to using Assistants instead, and upload the data another way?
In my mind, I need a middle interpreter to infer relationships between a keyword-match rule in my DB and a vector DB created by OpenAI, allowing for natural-language search instead of keywords.
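Roughly what I had in mind, sketched with the OpenAI Python SDK. `search_posts`, the schema, and the SQLite connection are just placeholders for however the keyword query would actually hit my `posts` table:

```python
# Sketch: let the model craft the keyword search via function calling.
# search_posts() is a hypothetical helper; adjust SQL/schema to your forum.
import json
import sqlite3
from openai import OpenAI

client = OpenAI()

def search_posts(keywords: str, category: str | None = None) -> list[dict]:
    """Hypothetical keyword search over the posts table (SQLite shown for brevity)."""
    conn = sqlite3.connect("forum.db")
    sql = "SELECT id, thread_id, body FROM posts WHERE body LIKE ?"
    params: list = [f"%{keywords}%"]
    if category:
        sql += " AND category = ?"
        params.append(category)
    rows = conn.execute(sql + " LIMIT 20", params).fetchall()
    return [{"id": r[0], "thread_id": r[1], "body": r[2][:500]} for r in rows]

tools = [{
    "type": "function",
    "function": {
        "name": "search_posts",
        "description": "Keyword search over forum posts, optionally filtered by category.",
        "parameters": {
            "type": "object",
            "properties": {
                "keywords": {"type": "string"},
                "category": {"type": "string"},
            },
            "required": ["keywords"],
        },
    },
}]

question = "How do I fix the login loop people keep mentioning?"
resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": question}],
    tools=tools,
    # Force the model to produce a search_posts call for this sketch.
    tool_choice={"type": "function", "function": {"name": "search_posts"}},
)
call = resp.choices[0].message.tool_calls[0]
results = search_posts(**json.loads(call.function.arguments))
```

The open question for me is whether the model's generated keyword calls would be good enough, or whether I need the vector layer on top.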
This is a big project, so I'm really hopeful I can earn a few minutes of your time and build this to help my community.
I am assuming that the SQL table "posts" consists of free text (like user posts), as opposed to item records (with fields like title, description, cost, location, etc.), and that you want to be able to semantically search the text in your SQL table.
A technique I used in a similar case (my "posts" are in an Apache Solr index) was to "synchronize" the Solr index with a vector DB. That is, I basically embedded the entire Solr index in a vector store and created code to keep that vector store updated with any changes in the local index. Essentially, I maintain an embedded copy of my Solr database. I then run my prompts as cosine-similarity searches against the vector store and use the LLM not only to determine the best answer, but also to return the Solr index IDs so that I can pull up the original objects if I want to. This includes the metadata that I defined during embedding.
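A rough sketch of that loop, with an in-memory index standing in for the vector store and made-up post records; the point is that each embedded row carries its ID and metadata, so the originals can be pulled back up after the cosine-similarity search:

```python
# Sketch of the embed-and-query loop: swap the in-memory arrays for whatever
# vector store you actually use; the metadata rides along with each vector.
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"  # any embedding model works here

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

# 1) "Synchronize": embed every post once, then re-embed rows as they change.
posts = [
    {"id": 101, "author": "alice", "date": "2023-11-02", "body": "..."},
    {"id": 102, "author": "bob",   "date": "2023-11-03", "body": "..."},
]
vectors = embed([p["body"] for p in posts])

# 2) Query: cosine similarity between the question and every stored vector.
def top_k(question: str, k: int = 5) -> list[dict]:
    q = embed([question])[0]
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(-sims)[:k]
    return [posts[i] | {"score": float(sims[i])} for i in best]

# 3) Answer: hand the top matches (with their IDs/metadata) to the LLM, so the
#    reply can cite which records it drew from.
question = "How do I reset my password?"
hits = top_k(question)
context = "\n\n".join(f"[post {h['id']} by {h['author']}]\n{h['body']}" for h in hits)
answer = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": "Answer using only the forum posts provided."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
```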
How you do that using the new Assistants API or even GPTs is still a mystery to me since, in your case, you would need to upload your "posts" table as a series of no more than 20 files, each with a maximum size of 512 megabytes (and 2 million tokens). It is also unclear to me how, in this scenario, you would get either of them to return metadata such as post author, title, date posted, etc.
That keyword strategy would work if you were trying to pull up items, like in a store: records with single-value fields. I don't think it works well when trying to do keyword searches against free-text fields, and that's the very problem vector stores were created to solve in the first place.
I, too, would be really interested to hear any thoughts on this. How you create a "keyword match rule" to use against hundreds if not thousands of records of free-flowing text would be of great interest to me.