Best practices to reduce Code Interpreter errors and improve performance

I am developing a chatbot to analyze CSV files.
I am using the new Assistants API with Code Interpreter and either the gpt-4 or gpt-4-1106-preview model. A set of three CSV files is attached to the Assistant, along with some instructions.

I keep running into errors and dismal performance with Code Interpreter (regardless of the size of the files), so I am trying to help it by providing more detailed instructions at the Assistant level. But it feels like Code Interpreter is not following the Assistant's instructions.

Here are the instructions that are passed to the Assistant (file IDs are hidden):

“You are tasked with answering questions based on the data provided in the CSV files that are attached to the assistant. You have access to a Python code interpreter tool. This tool allows you to run Python code and scripts. You can execute these scripts within your Python environment to fulfill specific user requests. To load the contents of any of the files attached, use the pandas function read_csv.
The exact path for the file ‘Customers.csv’ is ‘/mnt/data/file-id-xxx’ and it contains information about the customers.
The exact path for the file ‘Orders Header.csv’ is ‘/mnt/data/file-id-yyy’ and it contains summary information of the orders such as date, channel used, sales amount.
The exact path for the file ‘Orders Lines.csv’ is ‘/mnt/data/file-id-zzz’ and it includes the detailed information of each order, such as the products that were purchased.
When answering the first user message in a new thread, check the header of the files that are attached to the assistant to see the available columns and structure of the data.
Do not provide answers based on external knowledge or assumptions; rely solely on the data extracted from the CSV files. Ensure that your responses are accurate and derived directly from the CSV files to address the user’s questions effectively.”

I am using the Playground to improve the instructions, and I intermittently run into two types of issues (which one appears varies from run to run):

  • Code Interpreter does not use the correct path for the files. For example, it sometimes appends a .csv extension to the file ID, even though the exact path is given in the instructions.
  • Code Interpreter is using invalid column names and not following this specific instruction:
    “When answering the first user message in a new thread, check the header of the files that are attached to the assistant to see the available columns and structure of the data”. It assumes some column names and then runs into errors because those column names don’t exist in the files.

Order Date is the perfect example. Sometimes it tries to use “Order Date” as the column name; sometimes it tries “OrderDate” (no space). One is correct, the other is not. If it were following the instructions (check the header of the files that are attached to the assistant to see the available columns), Code Interpreter wouldn’t run into this error.
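
To make the "check the header first" step concrete, here is a minimal sketch of what that check could look like in Python. It uses the standard csv module rather than pandas so it runs anywhere; the sample data and the helper names read_header and resolve_column are hypothetical, not something Code Interpreter provides.

```python
import csv
import io

def read_header(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))

def resolve_column(guess, columns):
    """Map a guessed name to the real column, ignoring case, spaces and underscores."""
    def key(name):
        return "".join(ch for ch in name.lower() if ch.isalnum())
    matches = [c for c in columns if key(c) == key(guess)]
    if len(matches) != 1:
        raise KeyError(f"{guess!r} does not match exactly one column in {columns}")
    return matches[0]

# Hypothetical sample standing in for 'Orders Header.csv'; real columns may differ.
SAMPLE = "Order ID,Order Date,Channel,Sales Amount\n1,2023-11-01,Web,19.99\n"

columns = read_header(SAMPLE)
print(resolve_column("OrderDate", columns))
```

With this kind of lookup, a guess like “OrderDate” resolves to the actual header instead of raising an error later in the analysis, and a name that matches nothing fails immediately with a message listing the real columns.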

For any developers out there using the new Assistants and Code Interpreter, what best practices have you put in place in your instructions to minimize Code Interpreter errors with file names, file paths and column names?

hmmm

What happens if you remove the path? From playing around, it feels like you can just reference the file by saying “use the customers file” or “the orders file” or whatever.

Consider including the header in the same file, and then letting it use these headers with pandas or whatever, so you don’t have that extra step.
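
One way to read this suggestion is to put the exact column names directly into the Assistant instructions, so the model never has to guess them. A rough sketch, assuming you have local copies of the CSV files; the helper names, the sample columns, and the wording of the generated sentence are all hypothetical:

```python
import csv
import os
import tempfile

def read_header(local_path):
    """Return the column names from the first row of a local CSV file."""
    with open(local_path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))

def describe_file(display_name, sandbox_path, local_path):
    """Build one instruction sentence stating the path and the exact columns."""
    columns = ", ".join(f"'{c}'" for c in read_header(local_path))
    return (
        f"The exact path for the file '{display_name}' is '{sandbox_path}'. "
        f"Its columns are exactly: {columns}. Use these names verbatim."
    )

# Hypothetical local copy standing in for 'Orders Header.csv'.
with tempfile.TemporaryDirectory() as tmp:
    local = os.path.join(tmp, "orders_header.csv")
    with open(local, "w", newline="", encoding="utf-8") as f:
        f.write("Order ID,Order Date,Channel,Sales Amount\n1,2023-11-01,Web,19.99\n")
    sentence = describe_file("Orders Header.csv", "/mnt/data/file-id-yyy", local)

print(sentence)
```

The generated sentence can be appended to the instructions in place of the "check the header" step. Whether this actually reduces the errors is something you would have to test in the Playground.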

Basically, when you get erratic behavior, it’s typically best to reduce the complexity of your task as much as possible. Take smaller steps, do as much imperative preprocessing as possible, and include as little potentially confusing information as possible.
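
As an example of that kind of preprocessing, you could normalize the column headers before uploading the files, so there is only one plausible spelling for each column. This is just a sketch; the snake_case convention, the function names, and the sample data are my own assumptions:

```python
import csv
import io
import re

def normalize(name):
    """Lower-case a header and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def normalize_csv_header(csv_text):
    """Return the CSV text with only its header row normalized."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [normalize(c) for c in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical sample data; real columns may differ.
SAMPLE = "Order ID,Order Date,Sales Amount\n1,2023-11-01,19.99\n"
cleaned = normalize_csv_header(SAMPLE)
print(cleaned)
```

After this step the instructions can say that all column names are lower-case with underscores, which leaves less room for variants like “Order Date” versus “OrderDate”.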

I am using 3.5 and have also had consistency issues getting Code Interpreter to recognize and retrieve the file. I have not added instructions with explicit paths, just the file names. Either way, I imagine this is a known gap in functionality (how can they not have noticed it through testing?), and I hope it will be improved soon.