Best file format for Assistants on table data

Hi there,

We are trying to use the Assistants API to query data that is in tabular form. We have tried several formats, including .txt files containing comma-separated values and .md files containing Markdown tables. Some questions asked of the assistant come back fine, but a decent portion of the answers are completely incorrect and pull data from the wrong places.

Has anyone had any success using Assistants to query table data?

Thanks.

Yes, I am using it through the API. Are you using it through the UI? Have you tried GPT-3.5 vs GPT-4 as the engine for the Assistant? I am using CSV tabular data and am still experimenting with it, but I agree that so far it has been hit or miss whether it actually grabs the file or not.

We’re using it both via the UI and also the API (depending on test cases). We’re using 4 for the engine of the Assistant, have not tried 3.5 but presumed it’d perform worse.

Can you confirm that your assistant (when retrieved through Python or the CLI) shows the file_id attached? It should if you have already added the file in the UI.

Yeah, I can confirm: our assistant does show the file ID when fetched via the CLI.

Yeah I am not sure, I also have issues with the assistant recognizing the file. I know there’s the option to add the file to the message itself, but I don’t think we should need to do that. I have tried to be explicit with the file name, but that shouldn’t need to be the case as the front-end user can’t be expected to do that. I’m sure this will be improved soon.

Hi Brandon,

I went through the same issues a few weeks ago while feeding it a JSON file containing a list of customers. I ran a number of tests and came to the conclusion that you can’t just feed the model a mass of data and hope to query it in natural language. It feels like the model has human-like limitations (laziness and unreliable memory :) ). However, the model is great at programming, so what I ended up doing is asking it to create a SQL query that I can then pass to my DB to retrieve the exact information I’m looking for.
So instead of loading my full customer list into the model and querying it for the number of records I have (this generates random results), I asked the model to create a SQL query to retrieve that info, after giving it the column and table names I’m using. When I ask for the number of clients, I now get something like [RequiredActionFunctionToolCall(id='call_rUqU2RzOrC', function=Function(arguments='{"query_string":"SELECT COUNT(*) FROM Clients"}', name='executeQuery'), type='function')]. That’s just one example; I can now query basically any available information, on specific records or on the whole table.
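
For anyone who wants to try this, here is a minimal sketch of what the receiving side of that tool call could look like. Only the function name executeQuery, the query_string argument and the Clients table come from the example above; the column names, the sample rows, the SELECT-only guard and the use of an in-memory SQLite database are assumptions made for illustration, not the actual setup.

```python
import json
import sqlite3

# Tool definition in the JSON-schema style used for function tools.
# "executeQuery" and "query_string" come from the post above; the
# description text and the column list are assumptions.
EXECUTE_QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "executeQuery",
        "description": "Run a read-only SQL query against the Clients table "
                       "(columns: id, name, city) and return the rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "query_string": {
                    "type": "string",
                    "description": "A single SQL SELECT statement.",
                }
            },
            "required": ["query_string"],
        },
    },
}


def execute_query(conn, arguments_json):
    """Run the SQL the model asked for and return the rows as a JSON string."""
    query = json.loads(arguments_json)["query_string"].strip()
    # Assumption: the SQL is model-generated, so only SELECT is accepted.
    if not query.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    rows = conn.execute(query).fetchall()
    return json.dumps(rows)


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Clients (name, city) VALUES (?, ?)",
    [("Acme", "Paris"), ("Globex", "Lyon"), ("Initech", "Paris")],
)

# The arguments string exactly as it appears in the tool call above.
result = execute_query(conn, '{"query_string":"SELECT COUNT(*) FROM Clients"}')
print(result)  # prints [[3]]
```

The string returned by execute_query is what would be handed back to the run as the tool output, so the model answers from the exact count instead of trying to count rows in a file.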

Hope this helps

Pascal

Clever approach, but I do not think this will work for “in-memory” use cases?