Assistant function response retrieval error

Hello, I am using an assistant to analyze objects from my dataset. I use a function to fetch the dataset from my DB, but there is a problem with retrieving this data. For example, the function returns a dataset of 1000 objects, and even for a simple question like “how many objects are in the dataset?” the assistant returns wrong answers. The number of objects it reports depends on the model; the closest result was “300”, using gpt-4-turbo-preview. Are there any limitations on retrieval? Code interpreter and retrieval are both turned on.


I have literally just come across this same issue. If I limit the size of the response (e.g. return only the ID of each object), I get more accurate results. So I suspect there is a maximum size for the output I can send in response to a tool call, but I can’t find what that limit might be.

Anyone got any insight?

Your results are limited by the model’s token limit, which has to cover your messages, data, instructions, and the response in each thread.
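One rough way to see how much of that limit a tool output uses is to estimate its token count before returning it. The sketch below assumes OpenAI’s commonly quoted rule of thumb of about 4 characters per token for English text (JSON can tokenize differently, so use a real tokenizer such as tiktoken for exact numbers). The `estimate_tokens` and `fits_budget` helpers and the budget values are hypothetical, for illustration only.

```python
import json

# Rough heuristic (assumption): ~4 characters per token for English text.
# JSON may tokenize differently; use a real tokenizer for exact counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(payload) -> int:
    """Estimate the token count of a tool output once serialized to JSON."""
    text = json.dumps(payload, separators=(",", ":"))
    # Ceiling division so a non-empty payload never estimates to zero tokens.
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_budget(payload, budget_tokens: int) -> bool:
    """Return True if the estimated size is within a (hypothetical) token budget."""
    return estimate_tokens(payload) <= budget_tokens


if __name__ == "__main__":
    # 70 dummy records of roughly 2,000 characters each, similar in size
    # to the orders described later in this thread.
    orders = [{"id": i, "notes": "x" * 1990} for i in range(70)]
    print(estimate_tokens(orders))
    print(fits_budget(orders, budget_tokens=10_000))
```

Seventy records of about 2,000 characters come to roughly 140,000 characters, i.e. on the order of 35,000 tokens by this heuristic, so even a modest dataset takes up a large share of the context.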

Better assistant instructions on how to handle the data are key to getting better results. As you mentioned, the ID of each object, for example, is a more specific thing for the AI to track. I do think there is a maximum number of things it can track at once; I’m fairly sure I hit a cap once, I want to say around 10k with the preview model. GPT-4 will generally understand things better, so the same instructions won’t always give the same results across models. With GPT-3 you have to spend more time crafting your prompts to make what you are doing clear, and even then, because of how these models work, there is still a chance you will get patterns back that don’t fit your logic.

GPT-4 can be pricey to run, depending on what you are trying to do, whereas GPT-3 is great on cost but takes a lot more work to get it behaving the way you want.

I’m still looking for where I saw that tracking cap. I’m sure I saw it somewhere, or maybe it was in an article I read, lol.

I’m not sure how to improve my instructions for dealing with larger amounts of data - albeit not that large!

I have a function called getOrders() which goes to my database and simply returns all orders. The dataset I’m testing with has about 70 orders, and each record is around 2k in size when in JSON format.

When I pass this back to the assistant as a “Tool Output” (so, an array of ~70 order records) and ask the assistant to count the number of orders, it will say there are 22 orders.

If I limit the amount of data returned per order record to just the order ID and status, then it gives me the correct number.

I have modified my function to take parameters that optionally filter the records or trim the fields returned, which does bring some success, but a limit of around 20 fairly small records feels quite restrictive.
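For illustration, filtering records and trimming fields inside the function might look like the sketch below. It assumes orders are plain dicts; the `get_orders` signature, its `status` filter, and its `fields` parameter are hypothetical stand-ins for whatever getOrders() actually queries.

```python
from typing import Iterable, Optional

# Hypothetical in-memory data standing in for the real database query.
SAMPLE_ORDERS = [
    {"id": 1, "status": "shipped", "customer": "Acme", "total": 120.0},
    {"id": 2, "status": "open", "customer": "Globex", "total": 75.5},
    {"id": 3, "status": "shipped", "customer": "Initech", "total": 300.0},
]


def get_orders(
    orders: Iterable[dict],
    status: Optional[str] = None,
    fields: Optional[list[str]] = None,
) -> list[dict]:
    """Return orders, optionally filtered by status and trimmed to `fields`."""
    result = []
    for order in orders:
        # Row filter: skip orders whose status does not match.
        if status is not None and order.get("status") != status:
            continue
        # Field projection: keep only the requested keys, if any were given.
        if fields:
            order = {key: order[key] for key in fields if key in order}
        result.append(order)
    return result


print(get_orders(SAMPLE_ORDERS, status="shipped", fields=["id", "status"]))
# [{'id': 1, 'status': 'shipped'}, {'id': 3, 'status': 'shipped'}]
```

Exposing `status` and `fields` as function parameters lets the model request only the rows and columns a question actually needs, instead of receiving every full record.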

Any insight would be greatly appreciated! I’m using gpt-4-turbo-preview as my model.

If you are using getOrders(), can’t you count the records as you pull them? For example, when I pull data from SQL or Neo4j, the DB gives me back the number of records, and I would just use that number. If that is something you want the AI to know, it is faster and saves you paying for a question you can already answer without AI. Without knowing more about your whole process and what the complete data is for, there may be more logical ways to refine the data before it reaches the AI.
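To illustrate counting on the pull: the function can ask the database for the count and return it alongside (or instead of) the rows, so the model never has to count them itself. This is a sketch using an in-memory SQLite database; the `orders` table, its columns, and the `get_orders_summary` function are hypothetical.

```python
import sqlite3


def get_orders_summary(conn: sqlite3.Connection, include_rows: bool = False) -> dict:
    """Return the order count computed by the database, plus rows if requested."""
    # Let the database do the counting instead of the model.
    (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    summary = {"order_count": count}
    if include_rows:
        rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
        summary["orders"] = [{"id": row[0], "status": row[1]} for row in rows]
    return summary


# Demo with an in-memory database holding 70 hypothetical orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "open" if i % 2 else "shipped") for i in range(1, 71)],
)
print(get_orders_summary(conn))  # {'order_count': 70}
```

Returning the count as its own field means a question like “how many orders are there?” can be answered from one number rather than from the model counting a long array.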

I guess part of the issue is that I don’t have a specific use case in mind. I have an application (let’s call it an order management app) that has data on customers, orders, suppliers, etc.

The app obviously has different functionality built in - e.g. certain reports etc… that are commonly used. This functionality addresses very specific use cases, just like in any app.

What I am trying to do is build an assistant that can “fill in the gaps” in this functionality, e.g. I want to be able to ask a question that doesn’t warrant a dedicated report or any development time, and have the assistant get the data and drop it into a file for me.

This would be useful to address some of the ad-hoc requests I get that require me to query the database directly (or use our API) to get the data and create this one-off report. So perhaps doing a count of all orders is not a good example of what I am trying to do, because that is available within the app, but I am just using it as a way of highlighting the issue I came across. And there will be some queries/reports I would want the assistant to work on that may require pulling 100 records or so - and at the moment that doesn’t seem like it would work.

I think I need to get a bit more creative with filtering, etc.

Just add some aggregation parameters to your function: “timegrouping”, any of “day”, “month”, “year”; “catgrouping”, for grouping by category; and “aggregation”, a list of values such as sum, count, avg.
If you ask for the number of orders, the function will get aggregation=“count”, timegrouping=“” and catgrouping=“”. If you say “Get the number of orders grouped by year and month for the last 12 months for material group yyz”, your function will get timegrouping=“month,year”, aggregation=“count” and catgrouping=“materialgroup”.
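A minimal sketch of this aggregation idea, assuming the orders are already available as dicts with `date`, `materialgroup`, and `amount` keys (all hypothetical). The parameter names follow the post (`aggregation`, `timegrouping`, `catgrouping`); the comma-separated string format, the `aggregate_orders` function, and doing the grouping in Python are assumptions. In practice you would probably push this into a SQL GROUP BY.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records; in practice these come from the database.
ORDERS = [
    {"date": date(2024, 1, 5), "materialgroup": "yyz", "amount": 100.0},
    {"date": date(2024, 1, 20), "materialgroup": "yyz", "amount": 50.0},
    {"date": date(2024, 2, 3), "materialgroup": "abc", "amount": 80.0},
    {"date": date(2024, 2, 9), "materialgroup": "yyz", "amount": 30.0},
]

TIME_PARTS = ("year", "month", "day")  # supported timegrouping values


def _split(value: str) -> list[str]:
    """Split a comma-separated parameter into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(",") if part.strip()]


def aggregate_orders(orders, aggregation="count", timegrouping="",
                     catgrouping="", value_field="amount"):
    """Group orders by time parts and categories, then apply each aggregation."""
    # Keep time parts in year/month/day order regardless of how they were passed.
    requested = _split(timegrouping)
    time_keys = [p for p in TIME_PARTS if p in requested]
    cat_keys = _split(catgrouping)
    aggs = _split(aggregation)

    # Build one group per combination of time parts and category values.
    groups = defaultdict(list)
    for order in orders:
        key = tuple(getattr(order["date"], p) for p in time_keys)
        key += tuple(order[c] for c in cat_keys)
        groups[key].append(order[value_field])

    results = []
    for key, values in sorted(groups.items()):
        row = dict(zip(time_keys + cat_keys, key))
        for agg in aggs:
            if agg == "count":
                row["count"] = len(values)
            elif agg == "sum":
                row["sum"] = sum(values)
            elif agg == "avg":
                row["avg"] = sum(values) / len(values)
            else:
                raise ValueError(f"unsupported aggregation: {agg}")
        results.append(row)
    return results


print(aggregate_orders(ORDERS))  # [{'count': 4}]
print(aggregate_orders(ORDERS, aggregation="count",
                       timegrouping="month,year", catgrouping="materialgroup"))
```

With this shape, the tool output is a handful of summary rows rather than every order, which keeps the response small no matter how many records the query touches.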