When I create a custom GPT and upload a file, let’s say “mycode.py”, for the code interpreter, the code interpreter can access it at:
/mnt/data/mycode.py
When I want to do the same with an assistant and upload the same “mycode.py” for use by the code interpreter, the file in /mnt/data is named:
file-p4zUlf6AkqMJBDVm8kA8xxZZF (or another random name, matching the fileId of the upload).
How can I get the correct filenames in the code interpreter with the Assistants API? Am I missing something?
I have tried multiple API methods, and it seems there is no facility provided in Assistants to associate mount point uploaded files with their original file names.
That is despite the file being uploaded correctly and the metadata file name being returned.
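A minimal sketch with the openai Python library; the upload call and the returned filename field are standard, while the file name and ID are just the examples from above:

from openai import OpenAI

client = OpenAI()

# Upload a file for use with the code interpreter
uploaded = client.files.create(
    file=open("mycode.py", "rb"),
    purpose="assistants",
)

print(uploaded.id)        # e.g. file-p4zUlf6AkqMJBDVm8kA8xxZZF
print(uploaded.filename)  # mycode.py -- the original name is in the metadata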
The assistant itself, however, cannot find any way to retrieve the original file names:
Here is the list of files you have uploaded:

- ID: file-Uv9vHHjWAszEFOo9D3qcJwub, Original Name: file-Uv9vHHjWAszEFOo9D3qcJwub

It seems that the original name of the file is not user-friendly and is the same as the system-assigned ID. If you have any specific operations you would like to perform on this file, please let me know!
The augmentation you would have to perform is to provide your own mapping of uploaded file names to mount point file IDs, perhaps by updating the assistant’s instructions with an additional section, or by using additional_instructions if they are user files.
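A minimal sketch of the second approach, assuming the openai Python library; the thread, assistant, and file IDs are hypothetical placeholders:

from openai import OpenAI

client = OpenAI()

# Hypothetical mapping of mount point file IDs to original names
file_names = {
    "file-xxx": "mycode.py",
    "file-yyy": "data.csv",
}
mapping_text = "\n".join(f"/mnt/data/{fid} is '{name}'" for fid, name in file_names.items())

# Pass the mapping for this run only, without rewriting the assistant itself
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",    # hypothetical thread ID
    assistant_id="asst_abc123",   # hypothetical assistant ID
    additional_instructions=(
        "Code interpreter files are named by file ID. Original names:\n" + mapping_text
    ),
)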
The ultimate augmentation would be to provide your own Jupyter notebook execution environment that has the statefulness and file versatility you desire, is deployed within the scope of a chat session, user, or group as may be desired, is free – and is offered as a function for Chat Completions.
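A minimal sketch of how such a tool might be declared for Chat Completions; the function name and description are illustrative, and the notebook backend that actually executes the code is yours to supply:

# Tool definition passed in the "tools" parameter of a chat.completions.create call
tools = [
    {
        "type": "function",
        "function": {
            "name": "python",  # illustrative name for your own notebook backend
            "description": "Execute Python code in a persistent Jupyter kernel "
                           "and return stdout, stderr, and the last expression value.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python source to execute"}
                },
                "required": ["code"],
            },
        },
    }
]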
I came to the same conclusions, but I am wondering how they make it work with custom GPTs. Do you think it’s 100% prompt engineering, with additional_instructions appended every time a new file is attached to the code execution environment?
We can see that it works for OpenAI’s own product but not for you, the Assistants API developer, from the additional information that is placed into context in a GPT, noting the mount point name:
Gizmo uploaded file with ID ‘file-ImnU16MgHOCaqBN5aG57izvZ’ to: /mnt/data/get-thread-messages-and-save.py.
Gizmo uploaded file with ID ‘file-UZNu3dnYX949baiuLitZl1D9’ to: /mnt/data/streaming_helper.py.
Gizmo uploaded file with ID ‘file-yIdBZ2n7nZ1hpVikjbNvfMiU’ to: /mnt/data/list-vector-stores.py.
All the files uploaded by the user have been fully loaded. Searching won’t provide additional information.
Code interpreter will then return the original file names when doing an ls of the mount point.
The text about “fully loaded” means that the full text of these small files is placed into the context window, which the AI can reproduce with no tool call; the files also go to file search if supported.
Besides placing a map of original names to mount point file IDs into context just for reference, the creative person could tell the Assistants AI that it must rename the files as the first thing sent to the Python notebook, before continuing with the user task. Send it the actual script with the actual map:
import os

# Define the mapping of mount point file IDs to original file names
file_name_map = {
    'file-xxx': 'original.1',
    'file-yyy': 'original.2',
    'file-zzz': 'original.3'
}

# Create a symlink for each file under its original name
for original_file, new_name in file_name_map.items():
    original_path = os.path.join('/mnt/data', original_file)
    symlink_path = os.path.join('/mnt/data', new_name)
    os.symlink(original_path, symlink_path)

# List the files again to confirm the symlinks were created
os.listdir('/mnt/data')
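One caveat: os.symlink raises FileExistsError if the link already exists, so if the script might be re-run in the same stateful session, guard each call with os.path.lexists first.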
Ok, I see the idea, thanks.
Actually, we would probably have to maintain a file with the mappings in each thread context. Thinking of it, this would explain why we can attach fewer files to a running code interpreter in a custom GPT than in an assistant; they probably have some slots reserved for the gory “mappings”.