It would be nice if there was the ability to add metadata to the file object like you can with assistants, threads and runs.
The biggest need for this is a reliable “External Id” to use when trying to keep files in sync.
The way I do it is to use the thread metadata. When adding a file, I add ‘file_id’ : ‘my id’ pairs to the THREAD metadata.
But I agree with your point completely. Hell, at this point I would love to be able to simply set the FILENAME!
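For anyone wanting to try the thread-metadata workaround described above, here is a minimal sketch of the bookkeeping side. The helper names and the limit constants are my own assumptions (I believe metadata is capped at 16 pairs, 64-character keys and 512-character values, but check the current docs), so treat it as an illustration rather than the poster's actual code.

```python
# Sketch only: keep an "OpenAI file ID -> external ID" map inside thread metadata.
# The limits below are assumptions; verify them against the current API docs.
MAX_PAIRS = 16
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512


def with_file_mapping(metadata, file_id, external_id):
    """Return a copy of `metadata` with file_id -> external_id added."""
    if len(file_id) > MAX_KEY_LEN or len(external_id) > MAX_VALUE_LEN:
        raise ValueError("key or value too long for metadata")
    updated = dict(metadata or {})
    updated[file_id] = external_id
    if len(updated) > MAX_PAIRS:
        raise ValueError("thread metadata is full")
    return updated


def find_file_id(metadata, external_id):
    """Reverse lookup: which uploaded file carries this external ID?"""
    for file_id, value in (metadata or {}).items():
        if value == external_id:
            return file_id
    return None
```

With the openai Python client, the resulting dict would presumably be sent with something like client.beta.threads.update(thread_id, metadata=with_file_mapping(thread.metadata, file.id, "my id")). The obvious downside is that the mapping lives on one thread, not on the file itself.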
Bumping this, as I’m really feeling the pain of not having metadata for files. We now have metadata for every other object in the Assistants API (assistants, vector stores, threads, runs, messages), but not for files.
I’m going to see about incorporating the file path and some sort of versioning in the name itself, like finance/doc1_v001.pdf, or something else that will allow me to map directories to vector stores / assistants.
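A rough sketch of that naming idea, in case it helps someone. The function names and the three-digit `_vNNN` convention are just assumptions based on the finance/doc1_v001.pdf example, and I have not verified that the API keeps the slash in the filename intact.

```python
import re
from pathlib import PurePosixPath

# Matches a stem that ends in "_v" plus at least three digits, e.g. "doc1_v001".
_VERSIONED = re.compile(r"^(?P<stem>.+)_v(?P<version>\d{3,})$")


def versioned_name(rel_path, version):
    """Turn 'finance/doc1.pdf' and 1 into 'finance/doc1_v001.pdf'."""
    p = PurePosixPath(rel_path)
    return str(p.with_name(f"{p.stem}_v{version:03d}{p.suffix}"))


def parse_versioned_name(name):
    """Split an upload name back into (directory, original path, version)."""
    p = PurePosixPath(name)
    match = _VERSIONED.match(p.stem)
    if match is None:
        raise ValueError(f"no version suffix in {name!r}")
    original = p.with_name(match["stem"] + p.suffix)
    return str(p.parent), str(original), int(match["version"])
```

The directory part returned by parse_versioned_name is what you would use as the key for picking a vector store or assistant.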
Additionally, there are content_type and headers fields available in the FileTypes type that files.create accepts, but I’m not sure how they can be used…
FileTypes = Union[
    # file (or bytes)
    FileContent,
    # (filename, file (or bytes))
    Tuple[Optional[str], FileContent],
    # (filename, file (or bytes), content_type)
    Tuple[Optional[str], FileContent, Optional[str]],
    # (filename, file (or bytes), content_type, headers)
    Tuple[Optional[str], FileContent, Optional[str], Mapping[str, str]],
]
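As far as I can tell, the tuple forms just let you control the filename and the Content-Type of the multipart upload part. Below is a small sketch that builds the three-element form; as_file_tuple is a made-up helper, and I have no evidence that the content type or headers are stored anywhere you could read them back later.

```python
import mimetypes


def as_file_tuple(filename, data, content_type=None):
    """Build the (filename, bytes, content_type) shape of FileTypes.

    If no content type is given, guess one from the filename extension
    and fall back to a generic binary type.
    """
    if content_type is None:
        content_type, _ = mimetypes.guess_type(filename)
    return (filename, data, content_type or "application/octet-stream")
```

The tuple would then be passed as client.files.create(file=as_file_tuple("finance/doc1_v001.pdf", pdf_bytes), purpose="assistants"), where pdf_bytes is whatever you read from disk. This is at least a way to set the filename explicitly.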
This is not ideal, but the API accepts very long filenames (over 10,000 characters), which means that you could easily store the metadata in there. I’m not sure this will be future proof - I’d imagine this is something OpenAI will eventually want to cap - especially since the API to retrieve the file list is currently not paged, and that response can already get pretty huge with normal filenames…
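If you do go the metadata-in-filename route, something along these lines might work. It is only a sketch: the `~` separator and the function names are arbitrary choices of mine, and it relies on the long-filename behaviour described above, which could be capped at any time.

```python
import base64
import json

SEP = "~"  # arbitrary separator; never appears in URL-safe base64 output


def pack_filename(filename, metadata):
    """Prefix the filename with base64url-encoded JSON metadata."""
    raw = json.dumps(metadata, separators=(",", ":"), sort_keys=True).encode("utf-8")
    token = base64.urlsafe_b64encode(raw).decode("ascii")
    return f"{token}{SEP}{filename}"


def unpack_filename(packed):
    """Return (filename, metadata); names without a prefix get empty metadata."""
    token, sep, filename = packed.partition(SEP)
    if not sep:
        return packed, {}
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return filename, json.loads(raw)
```

Putting the encoded part first keeps the real extension at the end of the name, in case the extension is used to detect the file type. Keep the metadata small, since every filename comes back in the unpaged file list.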