I created an assistant using the web UI, and uploaded two files:
foo.csv
bar.csv
Once uploaded, they are stored under different file names, usually in the format file-<random_str>.csv. This causes the assistant to mix them up when running the code interpreter. For example, it looks for columns in bar.csv that are only present in foo.csv, and vice versa. Is there a solution for this? I noticed that it doesn’t always read the files in the same order, so stating in the instructions that the first file is foo.csv does not help. Thanks!
If you are uploading the files through code, you could record each original file name with its columns as (file_name, [list of columns]). When the file is given its anonymized name on upload, record the pair (anon_file_name, file_name) so you can map the anonymized name back to the original.
You could then append these tuples to the prompt and pass them to the assistant. It should get the information it needs this way.
Another small step that might improve performance is to append a short description of each column in the CSV, if you can. It should make it much easier for the assistant to select the correct column.
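As a rough illustration of the mapping idea, here is a minimal Python sketch. It only builds the text you would append to the prompt or instructions; it does not call the API. The file IDs (file-abc, file-xyz), the sample CSV contents, and the helper names read_header and build_file_note are all made up for the example, so substitute the IDs you get back when you upload your own files.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def build_file_note(files: dict[str, tuple[str, list[str]]]) -> str:
    """Format a file_id -> (original name, columns) mapping as prompt text."""
    lines = ["The uploaded files map to their original names as follows:"]
    for file_id, (name, columns) in sorted(files.items()):
        lines.append(f"- {file_id} is {name}; columns: {', '.join(columns)}")
    return "\n".join(lines)


# Hypothetical file IDs and CSV contents, for illustration only.
foo_columns = read_header("user_id,signup_date\n1,2023-01-01\n")
bar_columns = read_header("order_id,amount\n10,9.99\n")

mapping = {
    "file-abc": ("foo.csv", foo_columns),
    "file-xyz": ("bar.csv", bar_columns),
}

note = build_file_note(mapping)
print(note)
```

The resulting note can be appended to the assistant's instructions or to the user message, so the model sees which file ID corresponds to which original name regardless of the order in which it reads the files. Column descriptions could be added to each line in the same way.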
You’re looking at the file_id parameter; the file name itself is not altered. However, it’s important to note that the API uses only the file_id to uniquely identify each file for all purposes, whether fine-tuning or assistants.
Yes, I’m referring to the file ID. I don’t randomize the names myself, but when the assistant runs the code interpreter it uses the file ID instead of the file name, so it runs load_csv("file-abc.csv") instead of load_csv("foo.csv").