I’m experiencing unexpectedly high token usage when using OpenAI’s Assistants API with file search. My input (user + system message) is short (estimated ~200 tokens), but every request consumes around 21,000 tokens, which seems completely disproportionate.
Use Case:
I am building an image analysis script where:
- The system submits an image (referred to below as image_url).
- I provide a short text description of the image (small string, probably ~200 characters long).
- OpenAI’s Assistant API with file search should return relevant tags (industries) from a vector store that contains a single text file with ~1,000 keywords.
- The API should only return industry names that match the provided vector data.
- The API should search the vector store, extract relevant industries, and return a small JSON response (this is working fine btw).
More Context:
- Model: gpt-4o
- Using OpenAI Assistants API
- File Storage: A single text file (~1,000 words; the tokenizer counts about ~4,300 tokens) stored in a vector store
Debugging Steps Taken:
- Average token usage for output: ~50 tokens
- Tried setting truncate_context: true
- Tried to limit the file search result size (a max_chunks parameter is not supported by the API)
- Confirmed the vector store only has a single text file (~1,000 words / ~4,300 tokens)
- Checked the assistant response (very short, ~50 tokens only)
- Made sure no persistent threads or excessive history is being loaded
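If I’ve read the v2 API docs correctly, the file_search tool does accept a `max_num_results` option (unlike the `max_chunks` parameter I tried). A sketch of what I believe the configuration would look like, plus the worst-case math under the default chunking:

```typescript
// Sketch (assuming v2 Assistants API shapes): capping how many chunks
// file_search may inject into the prompt. With defaults, gpt-4o can pull
// up to 20 chunks of ~800 tokens each — on its own roughly 16,000 tokens.
const fileSearchTool = {
  type: "file_search" as const,
  file_search: {
    max_num_results: 5, // cap retrieved chunks (gpt-4o default is 20)
  },
};

// Rough upper bound on retrieval tokens under the default 800-token
// chunking; actual chunk sizes are configurable per vector store.
const maxRetrievalTokens = (maxNumResults: number, chunkTokens = 800): number =>
  maxNumResults * chunkTokens;

console.log(maxRetrievalTokens(20)); // 16000 — default worst case
console.log(maxRetrievalTokens(5)); // 4000
```

That default worst case would explain most of the gap between my ~4,300-token file and the ~21,000 tokens billed.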
Why is the Assistant API consuming so many tokens (~21,000) when my prompts are at most a few hundred tokens and the text file in the vector store should only account for about 4,300 tokens? Does the image analysis really cost more than 15,000 tokens for a 600x400px image?
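Running the numbers against OpenAI’s published image-token formula suggests it shouldn’t. A sketch of the estimate, assuming the documented tiling rules (low detail is a flat 85 tokens; high detail scales the image, splits it into 512px tiles, and charges 85 base + 170 per tile):

```typescript
// Hedged estimate of gpt-4o image token cost, based on OpenAI's published
// rules: scale so the longest side is <= 2048, then so the shortest side
// is <= 768, count 512px tiles, and charge 85 base + 170 per tile.
function estimateImageTokens(
  width: number,
  height: number,
  detail: "low" | "high" = "high"
): number {
  if (detail === "low") return 85; // flat rate, independent of size

  // Scale down so the longest side is at most 2048.
  const longScale = Math.min(1, 2048 / Math.max(width, height));
  let w = width * longScale;
  let h = height * longScale;

  // Scale down so the shortest side is at most 768.
  const shortScale = Math.min(1, 768 / Math.min(w, h));
  w *= shortScale;
  h *= shortScale;

  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

console.log(estimateImageTokens(600, 400)); // 425 — nowhere near 15,000
console.log(estimateImageTokens(600, 400, "low")); // 85
```

If this math is right, the image should cost only a few hundred tokens, which points the finger at file search retrieval rather than vision.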
How can I reduce token consumption while still leveraging the vector store efficiently?
Any insights or suggestions would be highly appreciated.
public async setupAssistant(): Promise<string> {
  const assistant = await this.openai.beta.assistants.create({
    name: "Industry Tagging Assistant",
    instructions: "Analyze images and return relevant industry tags based on the vector store.",
    model: "gpt-4o",
    temperature: 0.2,
    tools: [{ type: "file_search" }],
    tool_resources: {
      file_search: { vector_store_ids: [env.OPENAI_VECTOR_STORE_ID] },
    },
    // response_format expects a named schema nested under json_schema.schema
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "industry_tags", // arbitrary schema name, required by the API
        strict: true,
        schema: {
          type: "object",
          properties: {
            data: { type: "array", items: { type: "string" } },
          },
          required: ["data"],
          additionalProperties: false,
        },
      },
    },
  });
  return assistant.id;
}
public async analyzeMedia(mediaId: string): Promise<void> {
  // `media` and `publicURL` are resolved from mediaId earlier (omitted here)
  const thread = await this.openai.beta.threads.create();
  const description = `Analyze this image. Description: "${media.descriptions?.en.text ?? 'No description available'}"`;
  const userMessage = [
    { type: 'text' as const, text: description },
    { type: 'image_url' as const, image_url: { url: publicURL } },
  ];
  await this.openai.beta.threads.messages.create(thread.id, {
    role: 'user',
    content: userMessage,
  });
  // createAndPoll waits for the run to finish; a bare retrieve() right after
  // create() would usually still see an in-progress run with no message yet
  const run = await this.openai.beta.threads.runs.createAndPoll(thread.id, {
    assistant_id: env.OPENAI_ASSISTANT_ID,
    tool_choice: { type: 'file_search' }, // object form; a bare string is rejected
  });
  const messages = await this.openai.beta.threads.messages.list(thread.id);
  const assistantMessage = messages.data.find(msg => msg.role === 'assistant');
  if (!assistantMessage) throw new Error('No response from assistant.');
  ...
}
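For anyone hitting something similar: the completed run object carries a usage breakdown (prompt vs. completion tokens), which is what shows where the ~21,000 tokens actually go. A minimal sketch with the shape typed locally so it runs standalone:

```typescript
// Sketch (assuming openai-node v4 shapes): a completed run exposes
// run.usage with prompt_tokens / completion_tokens / total_tokens.
interface RunUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Format the breakdown for logging; a huge prompt_tokens value with a tiny
// completion_tokens value points at context injection (file search, images).
function summarizeUsage(usage: RunUsage): string {
  return `prompt=${usage.prompt_tokens} completion=${usage.completion_tokens} total=${usage.total_tokens}`;
}

// In the run handler, after the run completes:
//   if (run.usage) console.log(summarizeUsage(run.usage));
console.log(summarizeUsage({ prompt_tokens: 20950, completion_tokens: 50, total_tokens: 21000 }));
```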