We are trying to use file_search to search over a catalog of products. There is a vector store with 3 markdown files containing our products. The format of our files is like this:
## Product Name
**Types:** A, B, C
.... (more props)
### Subsection of product
... (more subsections)
## Another Product
...(and so on)
We are running into a problem we have found no way to overcome.
We are not able to get a list of all the products in our catalog of a certain Type. For instance, we have a type that occurs 48 times in our catalog but the assistant returns between 11 and 15 when asked for all of the products of that type.
We have tried tweaking the instructions and increasing max_num_results to 50 (the max), but had no luck. The only way we have found to get more results is to tell the assistant how many items there should be, but that’s not practical for our use case.
Do you have any recommendations of things to try to maximize the results we get?
This seems like a task better suited to Function Calling.
Have you thought about combining the two? You can use embeddings as an index to find the relevant keys and then run the actual search against the database you probably already have.
Then, you don’t need to completely update your embeddings each time a product is updated, and can deal with constantly adjusting values like stock.
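To make the split concrete, here is a minimal sketch of the lookup half. It assumes the semantic search step has already produced a list of candidate IDs, and it uses an in-memory SQLite table as a stand-in for your real database. The table layout, column names, and the `fetch_products` helper are all hypothetical.

```python
import sqlite3


def demo_connection():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(1, "Widget", 10), (2, "Gadget", 0), (3, "Gizmo", 5)],
    )
    return conn


def fetch_products(conn, candidate_ids):
    """Return the current rows for IDs surfaced by the semantic search.

    The embeddings only need to carry the ID; fast-changing values such as
    stock are read from the database at query time.
    """
    if not candidate_ids:
        return []
    # One "?" placeholder per ID keeps the query parameterized.
    placeholders = ",".join("?" for _ in candidate_ids)
    sql = (
        f"SELECT id, name, stock FROM products "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(candidate_ids)).fetchall()
```

Calling `fetch_products(demo_connection(), [3, 1])` would return the rows for products 1 and 3 with whatever stock value is in the table at that moment, without touching the embeddings.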
This way you are back in the world of deterministic results that you can compute on, for example to perform aggregation tasks such as listing or counting every product of a given type.
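As a rough illustration of what a deterministic "all products of a type" tool could do behind a function call, the sketch below parses the markdown layout described in the question (`## Product Name` followed by a `**Types:**` line) and filters on it directly. The function names are made up for this example, and it assumes type names are comma-separated and matched exactly.

```python
import re
from collections import Counter

# "## Name" starts a product; "### ..." subsections do not match this pattern.
HEADING = re.compile(r"^## (.+)$")
TYPES = re.compile(r"^\*\*Types:\*\*\s*(.*)$")


def parse_catalog(text):
    """Map each product name to its list of types."""
    products = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1).strip()
            products[current] = []
            continue
        types = TYPES.match(line)
        if types and current is not None:
            products[current] = [
                t.strip() for t in types.group(1).split(",") if t.strip()
            ]
    return products


def products_of_type(products, wanted):
    """Return every product name that lists the wanted type."""
    return [name for name, types in products.items() if wanted in types]


def count_by_type(products):
    """Count how many products list each type."""
    return Counter(t for types in products.values() for t in types)
```

Because the filter runs over the whole parsed catalog rather than over a handful of retrieved chunks, the result does not depend on how many chunks the retriever returns.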
Thanks! We are doing something like what you mention: we add database IDs to our catalog and have the assistant return them to us so that we can show the products in our app. However, since we are using file_search for filtering, we still need more IDs to be passed to us in the function call.
Previously, we had an alternative assistant that filtered based on function calling. That one didn’t have this problem but would sometimes be too “narrow” when searching, which is why I was tasked with using file_search instead. I guess I’m trying to get the best of both worlds, flexibility and accuracy, but haven’t quite nailed how to do it.