Connecting an assistant to a database for retrieval

With the new release of Assistants, I am excited to connect one to a database and have it answer questions. In the docs I see it can be done with files, but how can an assistant be connected to a database?


I think it's going to be easiest with Actions, but we need to wait for the GPT Editor in ChatGPT.

Will Actions allow the GPT to answer nuanced questions about the data in the database? From what I can tell it doesn't transform the data into vectors, so it wouldn't be that good? I'm imagining something that vectorizes the data and embeds it in a custom GPT… if that makes sense. Basically the functionality that's described here for files in the context of the Retrieval tool.


Also worth noting: not sure if anyone's tried to use assistants with files, but I get this when uploading the file: BadRequestError: 400 Failed to index file: Unsupported file file-Jf8okhY5J2Te2SuiBIosmozU type: application/csv

If you have a small DB that doesn't update very frequently, then yes, you can add the DB to the assistant's files in a readable format.

The other option, which is what I do: I give the AI my full DB structure, and in the functions I ask the AI to create the DB query to run on my system, then return the result to the Assistant.

This is better for having up-to-date results.

Just make sure to add some security checks on the query before running it on your site (for example, make sure it only contains SELECT, has proper filters, etc.).
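Something along these lines, as a minimal sketch of the "model writes the query, my code checks and runs it" pattern. It assumes the openai Python SDK (v1.x Assistants beta) and a local SQLite database; the run_sql function, table, model name, and file name are just for illustration.

```python
# Sketch: let the model generate SQL via a function tool, validate it,
# then run it read-only and hand the rows back to the Assistant.
import re
import sqlite3
from openai import OpenAI

client = OpenAI()

SCHEMA_DESCRIPTION = "Table cities(name TEXT, state TEXT, population INTEGER)\n"
# In practice, paste your full DB structure (and any business context) here.

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions=(
        "You answer questions about a database with this schema:\n"
        + SCHEMA_DESCRIPTION
        + "When you need data, call run_sql with a single read-only SELECT query."
    ),
    tools=[{
        "type": "function",
        "function": {
            "name": "run_sql",
            "description": "Run a read-only SQL SELECT query and return the rows.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
)

def run_sql(query: str) -> list:
    """Basic safety checks before executing the model-generated query."""
    q = query.strip().rstrip(";")
    if not re.match(r"(?is)select\b", q) or ";" in q:
        raise ValueError("Only a single SELECT statement is allowed")
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # read-only connection
    try:
        return conn.execute(q).fetchall()
    finally:
        conn.close()

# In the run loop, when a tool call for run_sql arrives, execute it and
# return the rows to the run via submit_tool_outputs.
```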


I mean, it will be able to write SQL queries and then use the SQL response to answer the question, which is much more accurate and efficient than a vector search if your data is tabular and number/label-oriented.

Vector similarity search works best for text and natural language, for tasks like finding relevant snippets in a long document, where it can match the 'vibe' of a query to a chunk of text that may or may not share the exact wording. However, for tabular data (e.g., the name and population of every city in the U.S.), we probably don't care which city names match the general 'vibe' of "Philadelphia"; we just want to go to that row and see what the number is.
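To make that concrete, the lookup for the Philadelphia case is just an exact match, with no embeddings involved. A toy sqlite3 example (table, column, and population figure are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.execute("INSERT INTO cities VALUES ('Philadelphia', 1603797)")  # illustrative figure

# The SQL the model writes for "what is the population of Philadelphia?"
# is a plain exact-match lookup -- no similarity scoring needed.
row = conn.execute(
    "SELECT population FROM cities WHERE name = ?", ("Philadelphia",)
).fetchone()
print(row[0])
```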


Exactly my question, for the same reasons. Subscribing to the thread.

I had the same plan yesterday. I tried to do it via Actions using an https_request JSON call. However, https_requests are not supported (yet?). What would be your best guess to get around this limitation?

Is there any documentation regarding https_requests? I planned on using axios in my function to retrieve data from my database via a REST API.

Got access to Actions today and HTTPS seems to be working for me. What's your API schema like?
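(In case it's useful: one way to get a valid Action schema is to put a small REST endpoint in front of the database and let the framework generate the OpenAPI spec for you. A rough sketch with FastAPI, where the endpoint path, table, and hosting details are all assumptions; the spec it serves at /openapi.json, plus a public server URL, is what goes into the Action configuration.)

```python
# Sketch of a database-backed endpoint for a GPT Action to call.
# FastAPI generates the OpenAPI schema automatically at /openapi.json.
import sqlite3
from fastapi import FastAPI, HTTPException

app = FastAPI(title="City stats API")

@app.get("/cities/{name}")
def get_city(name: str) -> dict:
    """Return the population for a single city, looked up by exact name."""
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)
    try:
        row = conn.execute(
            "SELECT name, population FROM cities WHERE name = ?", (name,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise HTTPException(status_code=404, detail="City not found")
    return {"name": row[0], "population": row[1]}
```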

@bleugreen How are you providing the database structure to the assistant?
I have the same use case, and what I am doing is creating a file where I store the DB schema with business context for each table and feeding it to the assistant. Is that the same?
I have around 30 tables, so it may not fit in the context limit.
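Roughly what I mean, as a sketch: generate a compact schema summary from the database itself and append the business context per table, rather than hand-writing DDL. This assumes SQLite (other databases would query information_schema instead), and the business_notes dict and output file name are made up.

```python
# Dump a compact, human-readable schema summary to feed to the assistant.
import sqlite3

business_notes = {
    "cities": "One row per city; population refreshed monthly.",
    # ... one short note per table
}

conn = sqlite3.connect("app.db")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]

with open("db_schema_for_assistant.txt", "w") as f:
    for table in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_list = ", ".join(f"{c[1]} {c[2]}" for c in cols)  # column name + type
        f.write(f"TABLE {table}({col_list})\n")
        note = business_notes.get(table)
        if note:
            f.write(f"  -- {note}\n")
conn.close()
```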

Hmmm… not sure I'd let the AI do that. You can't predict what horrible queries might run, which would at the very least have the potential to bring your system to its knees, let alone get hold of sensitive data like user account information that it shouldn't have access to! (Unless you want to write a set of specific views that are permissioned separately?!)

Alternatively, create a specific local function on your app server that takes specific arguments described in the function definition you share with the LLM, and tailor the queries for these specific use cases. You can then target indexes, etc. This kind of wrapping is more work, but the system will be faster, more robust, and more secure. You also then have a layer you can refactor if your schema changes, without affecting the LLM code.
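As a rough sketch of what I mean (the get_city_population name and the cities table are hypothetical): the LLM only sees a narrow function definition, and the actual parameterised query lives on the app server.

```python
# The function definition exposed to the LLM is narrow and specific...
tool_definition = {
    "type": "function",
    "function": {
        "name": "get_city_population",
        "description": "Look up the population of a single US city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Exact city name"},
            },
            "required": ["city"],
        },
    },
}

# ...while the query itself lives on the app server, parameterised and
# able to use whatever index you have on cities(name).
import sqlite3

def get_city_population(city: str) -> int | None:
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # read-only
    try:
        row = conn.execute(
            "SELECT population FROM cities WHERE name = ?", (city,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```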

@clubmaple, what is a readable format for the API? I uploaded a text file with a script I generated using SQL Server that has the DDL statements to create my tables. It keeps saying it can't read my file.

I would recommend plain text, docs, PDF, or CSV.
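For example, a rough sketch of dumping a table to plain text and attaching it for Retrieval (openai Python SDK v1.x; the table and file names are made up, and the retrieval tool plus file_ids parameter are as in the Assistants beta):

```python
# Export table rows to plain text (rather than a raw DDL script),
# then upload the file and attach it to an assistant for Retrieval.
import sqlite3
from openai import OpenAI

client = OpenAI()

conn = sqlite3.connect("app.db")
with open("cities.txt", "w") as f:
    for name, population in conn.execute("SELECT name, population FROM cities"):
        f.write(f"{name}: population {population}\n")
conn.close()

uploaded = client.files.create(file=open("cities.txt", "rb"), purpose="assistants")

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="Answer questions using the attached city data.",
    tools=[{"type": "retrieval"}],
    file_ids=[uploaded.id],
)
```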

I also explained it here: