This is not a guess. This is a solution.
The behavior and performance of the vector store endpoint have changed: the URL containing the file path is not immediately ready after the attach call. The API SDK polls immediately, gets a 404, and fails.
Mine won’t:
'''error-tolerant file uploader and vector store attachment'''
from time import sleep

from openai import NotFoundError, OpenAI

cl = OpenAI()


def upload_attach_poll(vs: str, file: str, suffix: str = "") -> str:
    # step 1: upload the file to OpenAI storage (close the handle afterwards)
    with open(file, "rb") as fh:
        file_id = cl.files.create(
            file=fh,
            purpose="user_data",
            # expires_after={"anchor": "created_at", "seconds": 3600},
        ).id

    # step 2: attach the uploaded file to the vector store with attributes
    # (the vector store file ID is the same as the file ID, so it is not kept)
    cl.vector_stores.files.create(
        vector_store_id=vs,
        file_id=file_id,
        attributes={"name": file + suffix},
    )

    # step 3: poll the attachment status until it leaves 'in_progress'
    # - tolerate initial 404s before first observed status
    # - give it a minute - or be like openai and loop forever?
    max_retries = 30
    seen_status = False
    sleep(1)  # initial delay before the first retrieve
    for attempt in range(1, max_retries + 1):
        if attempt == max_retries:
            # On the final attempt, do not catch any error so the SDK error surfaces
            status = cl.vector_stores.files.retrieve(
                vector_store_id=vs,
                file_id=file_id,
            ).status
            print(f"Polling gave status {status}...")
            if status != "in_progress":
                return file_id
            break  # fall through to failure after max retries
        try:
            status = cl.vector_stores.files.retrieve(
                vector_store_id=vs,
                file_id=file_id,
            ).status
            seen_status = True
            if status != "in_progress":
                return file_id
        except NotFoundError:
            print("Polling gave NotFoundError - Retrying...")
            # Tolerate early 404s only until the first successful status retrieval
            if seen_status:
                # Do not tolerate a regression back to 404 after seeing a status
                raise
        sleep(2)  # fixed delay between retries

    # If we reach here, status remained 'in_progress' after max retries
    raise TimeoutError(f"file {file_id} remained 'in_progress' after {max_retries} retries")


# demo: make a vector store, use your local file, demo metadata, success uploading
filename = "vs_metadata_demo.py"
vs_id = cl.vector_stores.create(name="vstest").id
file1_id = upload_attach_poll(vs_id, file=filename, suffix="")
file2_id = upload_attach_poll(vs_id, file=filename, suffix="--second")
print("-- retrieving vector store file listing, looking for metadata")
sleep(2)  # this also is acting slow to create a complete listing
vs_files = cl.vector_stores.files.list(vector_store_id=vs_id)
for item in vs_files.data:
    print(item.model_dump())
    cl.files.delete(item.id)  # clean up: delete the uploaded file from storage
cl.vector_stores.delete(vector_store_id=vs_id)  # clean up: delete the demo vector store
The function takes your vector store ID, a local file, and an optional suffix to add onto the end of the filename used as metadata, and then still gets the file uploaded and attached.
This version prints:
Polling gave NotFoundError - Retrying...
Polling gave NotFoundError - Retrying...
-- retrieving vector store file listing, looking for metadata
{'id': 'file-QSfDDBSARoreLFEiiS1ZZP', 'created_at': 1761590207, 'last_error': None, 'object': 'vector_store.file', 'status': 'completed', 'usage_bytes': 1146, 'vector_store_id': 'vs_68ffbbb190fc8191a6f83381afa81f38', 'attributes': {'name': 'vs_metadata_demo.py--second'}, 'chunking_strategy': {'static': {'chunk_overlap_tokens': 400, 'max_chunk_size_tokens': 800}, 'type': 'static'}}
{'id': 'file-1P3Sg5pEzWYGXgiq2ZzqgG', 'created_at': 1761590196, 'last_error': None, 'object': 'vector_store.file', 'status': 'completed', 'usage_bytes': 1146, 'vector_store_id': 'vs_68ffbbb190fc8191a6f83381afa81f38', 'attributes': {'name': 'vs_metadata_demo.py'}, 'chunking_strategy': {'static': {'chunk_overlap_tokens': 400, 'max_chunk_size_tokens': 800}, 'type': 'static'}}
You can adapt it if you just want to pass a file_id to such a function rather than a file to upload and then attach, or if you don’t need it to make metadata from the file name. Crank up the poll time if your files need longer.
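If you only need the polling part for a file that is already uploaded, here is a minimal sketch of how that adaptation might look. The name `poll_until_settled` and its parameters are my own invention for illustration, not anything from the SDK; the status call and the "not found" exception are passed in so the logic can be tried without touching the API.

```python
import time
from typing import Callable, Type


def poll_until_settled(
    get_status: Callable[[], str],
    not_found: Type[Exception],
    max_retries: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the first status that is not 'in_progress'.

    `not_found` errors are tolerated only until a status has been seen once,
    and never on the final attempt (hypothetical helper, not an SDK function).
    """
    seen_status = False
    for attempt in range(1, max_retries + 1):
        try:
            status = get_status()
        except not_found:
            # Early 404s are expected; a 404 after a real status, or on the
            # last attempt, is re-raised so the caller sees the real error.
            if seen_status or attempt == max_retries:
                raise
        else:
            seen_status = True
            if status != "in_progress":
                return status
        if attempt < max_retries:
            sleep(delay)  # fixed delay between retries
    raise TimeoutError(f"still 'in_progress' after {max_retries} retries")
```

With the client from the post above, you would presumably call it with `get_status=lambda: cl.vector_stores.files.retrieve(vector_store_id=vs, file_id=file_id).status` and `not_found=NotFoundError`.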
The vector store listing is also slow to update after a success status, so the demo of this function, which uploads the same file twice and shows the different filenames as search query “attributes”, also gets a sleep().
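Instead of a fixed sleep() before listing, you could retry the listing until the files you expect show up. This is only a sketch under my own assumptions: `wait_for_listing` is a hypothetical helper, and the listing call is passed in as a plain function so it can be tested offline.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_listing(
    list_ids: Callable[[], Iterable[str]],
    expected: Set[str],
    max_retries: int = 10,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Call `list_ids` until every ID in `expected` appears, then return all IDs seen.

    Hypothetical helper for illustration; raises TimeoutError if IDs stay missing.
    """
    found: Set[str] = set()
    for attempt in range(1, max_retries + 1):
        found = set(list_ids())
        if expected <= found:  # all expected IDs are in the listing
            return found
        if attempt < max_retries:
            sleep(delay)
    missing = sorted(expected - found)
    raise TimeoutError(f"listing still missing {missing} after {max_retries} tries")
```

In the demo above, `list_ids` could be `lambda: [f.id for f in cl.vector_stores.files.list(vector_store_id=vs_id).data]` and `expected` could be `{file1_id, file2_id}`.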