Hi, first timer here. I looked at some similar questions, but the only solution I found was to pay $50, and I don't think that will help given the size and number of the files I'm working with. I'm looking for any other solutions.
I created a file search Assistant, and uploaded some files to a vector store. Then I created a Thread and tried to create a run.
The run always fails and I always get this error:
LastError( 32521. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.')
truncation_strategy=TruncationStrategy(type='auto', last_messages=None), usage=Usage(completion_tokens=24, prompt_tokens=845, total_tokens=869), temperature=1.0, top_p=1.0, tool_resources={})
I don’t get this error when the file I upload for file search is tiny.
I have paid $5 already.
I thought the OpenAI docs said file search would auto-chunk the files I upload and run a keyword and semantic search over them to retrieve only the relevant data?
Does OpenAI send the entire doc in the prompt?
Can I modify the chunking strategy to fit my quota limits?
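From the API reference it looks like vector store files accept a chunking_strategy parameter. Here is a sketch of what I am considering, assuming I am reading the reference correctly (the 400/100 token values are my own guesses, and uploaded_file is hypothetical, standing in for a file already uploaded with purpose="assistants"):

# Sketch: attach a file with smaller chunks than the 800-token default.
vector_store_file = client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded_file.id,  # hypothetical: a file uploaded with purpose="assistants"
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 400,  # default is 800
            "chunk_overlap_tokens": 100,   # default is 400; must be at most half the chunk size
        },
    },
)

I am not sure smaller chunks alone would get me under the limit, though, which is why I am asking.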
Details:
from openai import OpenAI

client = OpenAI(api_key=config["openai_api_key"])

assistant = client.beta.assistants.create(
    name="Assistant",
    instructions="You are an expert on a person. Use your knowledge base to answer questions about his works.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
)
# Create a vector store for the documents
vector_store = client.beta.vector_stores.create(name="TheData")

# Ready the files for upload to OpenAI
file_paths = [<some large files>]
file_streams = [open(path, "rb") for path in file_paths]

# Use the upload-and-poll SDK helper to upload the files, add them to the
# vector store, and poll the status of the file batch for completion.
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id, files=file_streams
)

# Print the status and file counts of the batch to verify the upload.
print(file_batch.status)
print(file_batch.file_counts)
# Point the assistant's file_search tool at the vector store
assistant = client.beta.assistants.update(
    assistant_id=assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

# Save the assistant's ID for future use
with open("assistant_id.txt", "w") as f:
    f.write(assistant.id)

print("Assistant setup complete. ID saved to assistant_id.txt")
Now I try to send a message to the Assistant:
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "tell me about the main themes in doc 1",
        }
    ]
)
def load_assistant_id():
    # Read back the ID written to assistant_id.txt during setup
    with open("assistant_id.txt") as f:
        return f.read().strip()

assistant_id_got = load_assistant_id()
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant_id_got,
    instructions="Please address the user as Jane Doe. The user has a premium account.",
)
if run.status == 'failed':
    print(run)
elif run.status == 'completed':
    messages = client.beta.threads.messages.list(
        thread_id=thread.id
    )
    print(messages)
else:
    print(run.status)
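I also noticed that runs accept max_prompt_tokens and a truncation_strategy. Assuming those parameters behave the way the run-creation reference describes, this is what I would try next, though I do not know whether capping the prompt just makes the run fail differently:

# Sketch: cap input tokens at run time (the values here are my guesses).
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant_id_got,
    max_prompt_tokens=20000,  # hard cap on prompt tokens for this run
    truncation_strategy={"type": "last_messages", "last_messages": 5},
)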