Best file format for Assistants on table data

brandondooley · December 13, 2023, 1:21pm

Hi there,

We are trying to use the Assistants API to query data that is in table form. We have tried multiple options, such as .txt files containing comma-separated values and .md files containing Markdown tables. Some questions asked of the assistant come back fine, but a decent portion of the answers are completely incorrect and pull data from the wrong places.

Has anyone had any success using Assistants to query table data?

Thanks.

rheinze08 · December 13, 2023, 2:28pm

Yes, I am using it through the API. Are you using it through the UI? Have you tried GPT-3.5 vs GPT-4 as the engine for the Assistant? I am using CSV tabular data and am still experimenting with it, but I agree that so far it has been hit or miss whether it actually grabs the file or not.

brandondooley · December 13, 2023, 3:48pm

We’re using it via both the UI and the API (depending on the test case). We’re using GPT-4 as the engine of the Assistant; we have not tried 3.5 but presumed it’d perform worse.

rheinze08 · December 13, 2023, 5:17pm

Can you confirm that your assistant (when retrieved through Python or the CLI) shows the file_id attached? That should be the case if you have already added the file in the UI.

brandondooley · December 13, 2023, 10:43pm

Yeah, I can confirm: our assistant does show the file ID when fetched via the CLI.

rheinze08 · December 14, 2023, 2:11am

Yeah, I am not sure. I also have issues with the assistant recognizing the file. I know there’s the option to add the file to the message itself, but I don’t think we should need to do that. I have tried being explicit with the file name, but that shouldn’t be necessary either, as the front-end user can’t be expected to do that. I’m sure this will be improved soon.

longprep · December 14, 2023, 2:44pm

Hi Brandon,

I went through the same issues a few weeks ago while feeding in a JSON file containing a list of customers. After a number of tests, I came to the conclusion that you can’t just feed the model a mass of data and hope to query it in natural language. It feels like the model has human-like limitations (laziness and unreliable memory). However, the model is great at programming, so what I ended up doing is asking it to create a SQL query that I can then pass to my DB to retrieve the exact information I’m looking for.
So instead of loading my full customer list into the model and querying it for the number of records I have (which generates random results), I ask the model to create a SQL query that retrieves that info, after giving it the column and table names I’m using. When I ask for the number of clients, I now get something like [RequiredActionFunctionToolCall(id='call_rUqU2RzOrC', function=Function(arguments='{"query_string":"SELECT COUNT(*) FROM Clients"}', name='executeQuery'), type='function')]. That’s just an example; I can now query basically any available information on specific records or the whole table.
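
For anyone who wants to try this, here is a minimal sketch of what the local side of that tool call could look like. Only the function name executeQuery, the query_string argument and the Clients table come from the tool call above. Everything else is an assumption made for illustration: the tool description text, the helper names execute_query and make_demo_db, the demo columns and rows, and the use of SQLite as the database. The "SELECT only" check is a very weak guard, not real protection; a read-only database user would be safer.

```python
import json
import sqlite3

# Tool definition in the JSON-schema style used for function tools.
# The description strings are assumptions; adapt them to your own schema.
EXECUTE_QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "executeQuery",
        "description": "Run a read-only SQL query against the Clients table.",
        "parameters": {
            "type": "object",
            "properties": {
                "query_string": {
                    "type": "string",
                    "description": "A single SELECT statement.",
                }
            },
            "required": ["query_string"],
        },
    },
}


def execute_query(conn, arguments_json):
    """Run the query from a tool call's JSON arguments; return rows as JSON text."""
    args = json.loads(arguments_json)
    query = args["query_string"].strip()
    # Weak guard (assumption): reject anything that does not start with SELECT.
    if not query.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    rows = conn.execute(query).fetchall()
    return json.dumps(rows)


def make_demo_db():
    """Build a small stand-in database (hypothetical columns and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Clients (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO Clients (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York"), ("Linus", "Helsinki")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    # Same arguments string as in the tool call shown above.
    print(execute_query(conn, '{"query_string":"SELECT COUNT(*) FROM Clients"}'))
```

The string returned by execute_query is what would be sent back to the run as the tool output, so the model answers from the exact query result instead of from its own reading of the file.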

Hope this helps

Pascal

rheinze08 · December 17, 2023, 12:19am

Clever approach, but I do not think this will work for “in-memory” use cases?
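
One possible way to bridge the two, sketched here under assumptions rather than tested against the Assistants API: if the table only exists as an uploaded CSV, it could be loaded into an in-memory SQLite database so that generated SQL still has something to run against. The helper name load_csv_into_memory and the sample data are hypothetical. The sketch assumes the first CSV row is the header and that every row has the same number of fields; all values are stored as text, so numeric comparisons would need a CAST.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so headers with spaces still work."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_memory(csv_text, table_name):
    """Load CSV text into a new in-memory SQLite table; return (connection, header)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumption: first row holds the column names
    columns = ", ".join(quote_identifier(h) for h in header)
    placeholders = ", ".join("?" for _ in header)
    table = quote_identifier(table_name)

    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({columns})")
    # Remaining rows are inserted as-is; every value is stored as text.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()
    return conn, header


if __name__ == "__main__":
    sample = "name,city\nAda,London\nGrace,New York\n"  # made-up sample data
    conn, header = load_csv_into_memory(sample, "Clients")
    print(header)
    print(conn.execute("SELECT COUNT(*) FROM Clients").fetchall())
```

The returned header is the list of column names that would be given to the model alongside the table name, as described in the post above.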
