How to build for queries that require semantic understanding and data analysis?

The documentation indicates that the Assistants API can call multiple tools as part of a Run cycle in response to a user query, but I don't see any examples of this. I'm trying to understand the intended architecture for a multi-stage query.

As an example, let's say I am working on a foods-as-medicine disease prevention app. There are three main sources of knowledge. Assume that each of the following JSON files has a corresponding class that deserializes its contents into objects we can encode or preprocess in any way we desire.

  1. diseases.json - an array of Disease objects. Each object has a name, an array of alt_names, and an array of nutrients that fight the disease. Each nutrient has an efficacy score of 0-10 and a short description of why it's effective.

  2. nutrients.json - a truncated list of Food objects, each containing a name and its nutrient contents (fat, calories, protein, vitamin A, magnesium, etc.)

  3. recipes.json - a list of Recipe objects, containing a name, category, ingredients, and yield.
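For concreteness, here is roughly how I picture those objects as Python classes. The exact field names (efficacy, alt_names, yield_, and so on) are just placeholders for this sketch:

```python
from dataclasses import dataclass


@dataclass
class NutrientEntry:
    name: str
    efficacy: int        # 0-10 score from diseases.json
    description: str     # short note on why it's effective


@dataclass
class Disease:
    name: str
    alt_names: list[str]
    nutrients: list[NutrientEntry]


@dataclass
class Food:
    name: str
    nutrients: dict[str, float]   # e.g. {"vitamin_a": 900.0, "magnesium": 50.0}


@dataclass
class Recipe:
    name: str
    category: str
    ingredients: list[str]
    yield_: str                   # "yield" is a Python keyword, hence the underscore
```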

I would like a user to make a complex query to the Assistant like: "I have the flu. Give me some soup recipes."

This query would involve multiple stages:

  1. Identify the most important nutrients for the disease.

  2. Identify the soup recipes with the highest concentration of these nutrients.

  3. Process the result and format it in natural language for the user.

What I'm confused about is how to combine the semantic capabilities of the Assistant with data retrieval and analysis. Vector databases don't allow for sorting, for example. Here are the two approaches I've considered:

Solution A:
Multiple Tool Functions
Idea: implement functions that analyze the data sets: get_nutrients_for_disease(disease_name), get_top_recipes(type, nutrient_arr). The Assistant would call get_nutrients_for_disease, feed its output into get_top_recipes, and use that result to construct the response to the user.
Problem: How can I ensure that the functions are called in the correct order? This involves a lot of work on the backend and doesn't really seem to leverage the advantages of GPT. It's little more than a natural-language interface to a traditional service.
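For reference, here is roughly the dispatch loop I'd end up writing on the backend. This is only a sketch assuming the Python SDK's beta Assistants endpoints; the function bodies, the JSON field names (efficacy, category, ingredients), the parameter names in the schemas, and the model name are all placeholders I made up:

```python
import json
import time

from openai import OpenAI

client = OpenAI()


# Placeholder implementations backed by the local JSON files.
def get_nutrients_for_disease(disease_name: str) -> list:
    with open("diseases.json") as f:
        diseases = json.load(f)
    for d in diseases:
        names = [d["name"], *d.get("alt_names", [])]
        if disease_name.lower() in (n.lower() for n in names):
            return sorted(d["nutrients"], key=lambda n: n["efficacy"], reverse=True)
    return []


def get_top_recipes(category: str, nutrients: list) -> list:
    with open("recipes.json") as f:
        recipes = json.load(f)
    candidates = [r for r in recipes if r["category"] == category]

    # Crude placeholder scoring: count how many requested nutrients appear
    # anywhere in the ingredient list.
    def score(recipe):
        text = json.dumps(recipe["ingredients"]).lower()
        return sum(1 for n in nutrients if n.lower() in text)

    return sorted(candidates, key=score, reverse=True)[:5]


LOCAL_FUNCTIONS = {
    "get_nutrients_for_disease": get_nutrients_for_disease,
    "get_top_recipes": get_top_recipes,
}

assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions=(
        "You help users find recipes that fight a given disease. "
        "First look up the nutrients for the disease, then fetch the top "
        "recipes for those nutrients, then answer in plain language."
    ),
    tools=[
        {"type": "function", "function": {
            "name": "get_nutrients_for_disease",
            "description": "Return nutrients that fight a disease, ranked by efficacy.",
            "parameters": {"type": "object",
                           "properties": {"disease_name": {"type": "string"}},
                           "required": ["disease_name"]}}},
        {"type": "function", "function": {
            "name": "get_top_recipes",
            "description": "Return recipes of a category ranked by the given nutrients.",
            "parameters": {"type": "object",
                           "properties": {"category": {"type": "string"},
                                          "nutrients": {"type": "array",
                                                        "items": {"type": "string"}}},
                           "required": ["category", "nutrients"]}}},
    ],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="I have the flu. Give me some soup recipes."
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Dispatch loop: whenever the run pauses in "requires_action", execute
# whichever function calls the model requested and hand the results back.
while True:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            fn = LOCAL_FUNCTIONS[call.function.name]
            args = json.loads(call.function.arguments)
            outputs.append({"tool_call_id": call.id, "output": json.dumps(fn(**args))})
        client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    elif run.status in ("completed", "failed", "cancelled", "expired"):
        break

# Messages come back most-recent first.
print(client.beta.threads.messages.list(thread_id=thread.id).data[0].content[0].text.value)
```

My understanding is that this loop doesn't enforce any ordering at all; it just executes whatever calls the run pauses on, which is exactly why I'm unsure the functions will be called in the right sequence.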

Solution B:
Preprocessed Data
Idea: Using Chroma or Pinecone, create a vector database for RAG. The VDB would have the following collections: nutrients, recipes, diseases. The recipes collection would contain an enhanced version of the Recipe class with pre-calculated scores for each disease, along with the nutrient amounts derived from the ingredient list.
Problem: How could I handle sorting queries like "What are the five foods with the most vitamin A?" You can't sort by metadata fields. Is there some way to have a kind of associative database or pre-filtering method? Users may also form a query that a preprocessed field cannot satisfy, and the Assistant will likely hallucinate as a result. It also requires updating the entire collection whenever I add more diseases or change any scores.
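The closest thing I've come up with is to pre-filter on metadata in the vector store and do the sorting client-side, along these lines. This is just a sketch assuming Chroma's where filter, and the metadata field names (category, vitamin_a_mg) are values I invented for the enhanced Recipe records:

```python
import json

import chromadb

client = chromadb.Client()
recipes = client.get_or_create_collection("recipes")

# Index each recipe with its pre-computed nutrient totals as metadata.
with open("recipes.json") as f:
    data = json.load(f)
recipes.add(
    ids=[r["name"] for r in data],
    documents=[json.dumps(r) for r in data],
    metadatas=[{"category": r["category"], "vitamin_a_mg": r.get("vitamin_a_mg", 0.0)}
               for r in data],
)

# Pre-filter by metadata inside the store, then sort in plain Python,
# since the vector store itself can't rank by a metadata field.
soups = recipes.get(where={"category": "soup"}, include=["metadatas"])
ranked = sorted(
    zip(soups["ids"], soups["metadatas"]),
    key=lambda pair: pair[1]["vitamin_a_mg"],
    reverse=True,
)
print(ranked[:5])
```

But that only works when the field has been precomputed, which is exactly the limitation I'm worried about.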

It feels like I'm misunderstanding something or missing a crucial part of the toolchain. Maybe the solution is neither of these? How are we meant to build relatively complex applications that involve multiple steps?

Ah, such a cool project.

In my experience, this is where chain-of-thought prompting comes into play.

In your assistant's instructions, you would include something like the following:

Do the following steps:

  1. Use the disease name provided, search the diseases.json file that has been uploaded for you, and extract the nutrients and their efficacy scores.
  2. Sort the nutrients by efficacy score and take the top 10…
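If it helps, here is roughly what that looks like in code: the steps go into the assistant's instructions, and the JSON files get attached so the code interpreter can read them. Just a sketch; the tool_resources attachment syntax is from the v2 beta Assistants API and the model name is arbitrary, so check both against your SDK version:

```python
from openai import OpenAI

client = OpenAI()

# Upload the knowledge files so the assistant's code interpreter can read them.
file_ids = [
    client.files.create(file=open(path, "rb"), purpose="assistants").id
    for path in ("diseases.json", "nutrients.json", "recipes.json")
]

assistant = client.beta.assistants.create(
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],
    tool_resources={"code_interpreter": {"file_ids": file_ids}},
    instructions=(
        "Do the following steps:\n"
        "1. Use the disease name provided, search the diseases.json file that "
        "has been uploaded for you, and extract the nutrients and their "
        "efficacy scores.\n"
        "2. Sort the nutrients by efficacy score and take the top 10.\n"
        # ...continue with the recipe-ranking and answer-formatting steps.
        "Explain how you are thinking, line by line."
    ),
)
```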

You can even ask your assistant to show you how it's thinking, line by line.

Hope this helps. At the end of the day, think of it like writing code, except you are writing instructions instead, and you will have to play around with your language, etc.