Anyone successfully using Code Interpreter via API for real analysis?

Hello everyone, I’ve been experimenting with OpenAI’s Assistants API using GPT-4o and the Code Interpreter tool for some simple HR data analysis (e.g., value counts for departments, min/max salaries, etc.). My data is stored in a vector database, and I pass it as a JSON file. But I keep running into issues:

  • Sometimes the model says it can’t read the data from the vector store.
  • Other times, it successfully loads the file (pd.read_json(file_path)) and starts running Pandas functions… but then suddenly resets, replaces my data with a hardcoded example list, and gives me a completely irrelevant answer.
  • In some cases, it runs the right code for the specific instruction but then outputs a result different from the expected one.

Has anyone gotten this to work consistently? How do you:

  • Ensure it actually reads the right file and doesn’t hallucinate a dataset?
  • Get it to return exactly the results of its executed code?

Would love to hear your experiences, workarounds, or best practices!


Making it work consistently is up to your system prompting and the quality of your application development. First, it sounds like you are conflating the features available.

Code interpreter and file_search’s vector store are two different tool products.

You must decide in your API “tools” specification which of them you will enable (one, the other, or both).

For Code Interpreter, you must attach files specifically through tool_resources, under the code interpreter type, listing the file IDs that you wish to place in the mount point.
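A minimal sketch of that attachment, assuming the official `openai` Python SDK (v1 style), where files are uploaded with `client.files.create(..., purpose="assistants")` and passed to `client.beta.assistants.create` via `tool_resources`. The helper names, the instructions text, and the file name `hr_data.json` are hypothetical. The client is passed in as a parameter so the payload can be checked without network access.

```python
from typing import Any


def code_interpreter_assistant_kwargs(file_ids: list[str]) -> dict[str, Any]:
    """Build keyword arguments for an assistant that mounts the given
    files into the Code Interpreter sandbox (hypothetical helper)."""
    return {
        "model": "gpt-4o",
        "instructions": (
            "Analyze the attached HR data with Python. "
            "Never substitute example data for the attached file."
        ),
        # Enable the Code Interpreter tool...
        "tools": [{"type": "code_interpreter"}],
        # ...and attach files to *that* tool through tool_resources.
        "tool_resources": {"code_interpreter": {"file_ids": list(file_ids)}},
    }


def create_hr_assistant(client: Any, path: str) -> Any:
    """Upload a local file and create an assistant whose Code Interpreter
    can read it. `client` is assumed to be an `openai.OpenAI()` instance."""
    with open(path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="assistants")
    return client.beta.assistants.create(
        **code_interpreter_assistant_kwargs([uploaded.id])
    )
```

Usage, assuming the package is installed and an API key is configured: `from openai import OpenAI`, then `create_hr_assistant(OpenAI(), "hr_data.json")`.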

Vector stores are separate. File search is simply a knowledge-search feature that returns document chunks retrieved from the vector store. The file search tool description does not even call it a “vector store”, nor does it list what kind of knowledge can be found there (on models older than gpt-4o-2024-08-06, the tool is internally called myfiles_browser in the instructions placed for AI consumption).
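For contrast, a sketch of the file search side under the same SDK assumptions: the vector store is attached under the `file_search` key of `tool_resources` via `vector_store_ids`, and nothing listed there is mounted for Python. The IDs are placeholders and the helper names are hypothetical. The second helper assumes, as a sketch, that the same uploaded file ID can also be listed for Code Interpreter when you want both search and analysis.

```python
from typing import Any


def file_search_assistant_kwargs(vector_store_ids: list[str]) -> dict[str, Any]:
    """Keyword arguments for an assistant that searches a vector store.

    Files reached this way come back as retrieved chunks; they are not
    mounted in the Code Interpreter sandbox, so pd.read_json() cannot open them.
    """
    return {
        "model": "gpt-4o",
        "tools": [{"type": "file_search"}],
        "tool_resources": {
            "file_search": {"vector_store_ids": list(vector_store_ids)}
        },
    }


def both_tools_kwargs(file_ids: list[str], vector_store_ids: list[str]) -> dict[str, Any]:
    """Enable both tools; each one only sees the resources under its own key."""
    return {
        "model": "gpt-4o",
        "tools": [{"type": "code_interpreter"}, {"type": "file_search"}],
        "tool_resources": {
            "code_interpreter": {"file_ids": list(file_ids)},
            "file_search": {"vector_store_ids": list(vector_store_ids)},
        },
    }
```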

Thus, none of the file-related work of developing an API application is done for you. In particular, there is no built-in mechanism that tells the AI beforehand which files it will find in the mount point or what they are for; it would have to write scripts just to list the contents of the directories. Describing the files is something you must build yourself, which you can do with additional_instructions if the files given to Python are specific to the user or the run.
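A sketch of describing the files per run, assuming the SDK's `client.beta.threads.runs.create(..., additional_instructions=...)`. The `/mnt/data/` location is where Code Interpreter is commonly observed to mount files (often named by file ID rather than original name), so treat it as an assumption and have the model confirm it with a directory listing. The helper names and file descriptions are hypothetical.

```python
from typing import Any


def describe_mounted_files(files: dict[str, str]) -> str:
    """Build additional_instructions text listing the files this run can read.

    `files` maps a file ID to a human-readable description (hypothetical).
    """
    lines = [
        "The following files are attached to Code Interpreter.",
        "They are assumed to be under /mnt/data/; run os.listdir('/mnt/data') "
        "to confirm the exact names before loading anything.",
    ]
    for file_id, description in files.items():
        lines.append(f"- {file_id}: {description}")
    lines.append(
        "Load these files with pandas and report only values printed by your "
        "code; never replace them with example data."
    )
    return "\n".join(lines)


def start_run(client: Any, thread_id: str, assistant_id: str, files: dict[str, str]) -> Any:
    """Start a run whose extra instructions describe this run's files.
    `client` is assumed to be an `openai.OpenAI()` instance."""
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        additional_instructions=describe_mounted_files(files),
    )
```

Keeping the description in additional_instructions rather than in the assistant's base instructions lets one assistant serve many users whose uploaded files differ per run.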


Thank you for the detailed clarification! I don’t think I was mixing up File Search and Code Interpreter, but I was relying on Code Interpreter to use files uploaded via File Search. I didn’t realize the difference between attaching files through tool_resources and making them available to File Search; this explains the inconsistency. I’ll make sure to attach files properly with tool_resources. Appreciate the insight!