Unexpectedly High Token Consumption in OpenAI Assistant

I’m experiencing unexpectedly high token usage when using OpenAI’s Assistant API with file search. My input message (user+system) is relatively short (estimated ~200 tokens), but every request consumes around 21,000 tokens, which is completely disproportionate.

Use Case:

I am building an image analysis script where:

  • The system submits an image (referred to below as image_url).
  • I provide a short text description of the image (small string, probably ~200 characters long).
  • OpenAI’s Assistant API with file search should return relevant tags (industries) from a vector store that contains a single text file with ~1,000 keywords.
  • The API should only return industry names that match the provided vector data.
  • The API should search the vector store, extract relevant industries, and return a small JSON response (this is working fine btw).

More Context:

  • Model: gpt-4o
  • Using OpenAI Assistants API
  • File Storage: A single text file (~1,000 words; the tokenizer puts it at ~4,300 tokens) stored in a vector store

Debugging Steps Taken:

  • Average token usage for output: ~50 tokens
  • Tried setting truncate_context: true
  • Tried to limit the file search result size (a max_chunks parameter is not supported by the API)
  • Confirmed the vector store only has a single text file (~1,000 words / ~4,300 tokens)
  • Checked the assistant response (very short, ~50 tokens only)
  • Made sure no persistent threads are reused and no excessive history is being loaded

Why is the Assistant API consuming so many tokens (~21,000) when my prompts are at most ~200 tokens and the text file in the vector store should only account for ~4,300 tokens? Does the image analysis really cost more than 15,000 tokens for a 600x400px image?
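
My own back-of-the-envelope estimate, based on my reading of the documented tiling formula (85 base tokens plus 170 per 512px tile at high detail, a flat 85 tokens at low detail), says it shouldn't. This is only my approximation, not an official calculator:

function estimateImageTokens(width: number, height: number, detail: 'low' | 'high'): number {
    if (detail === 'low') return 85; // low detail is a flat 85 tokens
    // High detail: scale down to fit 2048x2048, then so the shortest side is at most 768px
    const fit = Math.min(1, 2048 / Math.max(width, height));
    let w = width * fit;
    let h = height * fit;
    const shrink = Math.min(1, 768 / Math.min(w, h));
    w *= shrink;
    h *= shrink;
    // Count 512x512 tiles at 170 tokens each, plus the 85-token base
    const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
    return 85 + 170 * tiles;
}

console.log(estimateImageTokens(600, 400, 'high')); // 425 — nowhere near 15,000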

How can I reduce token consumption while still leveraging the vector store efficiently?

Any insights or suggestions would be highly appreciated.

public async setupAssistant(): Promise<string> {
    const assistant = await this.openai.beta.assistants.create({
        name: "Industry Tagging Assistant",
        instructions: "Analyze images and return relevant industry tags based on the vector store.",
        model: "gpt-4o",
        temperature: 0.2,
        tools: [{ type: "file_search" }],
        tool_resources: {
            file_search: { vector_store_ids: [env.OPENAI_VECTOR_STORE_ID] },
        },
        response_format: {
            type: "json_schema",
            json_schema: {
                // json_schema requires a name, with the schema itself nested under `schema`
                name: "industry_tags",
                schema: {
                    type: "object",
                    properties: {
                        data: { type: "array", items: { type: "string" } },
                    },
                    required: ["data"],
                    additionalProperties: false,
                },
                strict: true,
            },
        },
    });
    return assistant.id;
}

public async analyzeMedia(mediaId: string): Promise<void> {
    const thread = await this.openai.beta.threads.create();

    // NOTE: `media` and `publicURL` are looked up from `mediaId` elsewhere (omitted here)
    const description = `Analyze this image. Description: "${media.descriptions?.en.text ?? 'No description available'}"`;
    const userMessage = [
        { type: 'text', text: description },
        // adding detail: 'low' here would cap the image at a flat 85 input tokens
        { type: 'image_url', image_url: { url: publicURL } },
    ];

    await this.openai.beta.threads.messages.create(thread.id, {
        role: 'user',
        content: userMessage,
    });

    // Runs are asynchronous: createAndPoll waits for a terminal status, whereas
    // retrieving the run right after creating it would still show it "queued"
    const run = await this.openai.beta.threads.runs.createAndPoll(thread.id, {
        assistant_id: env.OPENAI_ASSISTANT_ID,
        tool_choice: { type: "file_search" }, // tool_choice takes an object, not a bare string
    });

    const messages = await this.openai.beta.threads.messages.list(thread.id);
    const assistantMessage = messages.data.find(msg => msg.role === 'assistant');

    if (!assistantMessage) throw new Error('No response from assistant.');

...
}

File search is a tool that the model can call, multiple times if it wants, especially when the results aren't what it expected. Every tool output feeds another internal model call on a growing conversation, so you pay input tokens for the whole accumulated context on each iteration instead of paying once for a single response.
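
If you stay on the Assistants API, you can at least put hard bounds on each run. A sketch, reusing the openai client, env, and thread from your code, and assuming the v2 API parameters (max_num_results caps how many retrieved chunks file search injects; max_prompt_tokens and truncation_strategy bound the context per run):

// Cap how many chunks the file_search tool may inject into the context
await openai.beta.assistants.update(env.OPENAI_ASSISTANT_ID, {
    tools: [{ type: "file_search", file_search: { max_num_results: 5 } }],
});

// Bound what a single run may consume; exceeding max_prompt_tokens ends the
// run with status "incomplete" instead of letting the context balloon
const run = await openai.beta.threads.runs.create(thread.id, {
    assistant_id: env.OPENAI_ASSISTANT_ID,
    max_prompt_tokens: 8000,
    max_completion_tokens: 500,
    truncation_strategy: { type: "last_messages", last_messages: 3 },
});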

You can examine the run steps list to see the tool invocations, the iterations, and the token consumption of each delegated step.
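
For example, with the same client (each tool call and message creation is its own step, and each step reports its own usage):

const steps = await openai.beta.threads.runs.steps.list(thread.id, run.id);
for (const step of steps.data) {
    // step.type is "tool_calls" or "message_creation"; step.usage shows the
    // prompt/completion tokens billed for that internal model call
    console.log(step.type, step.status, step.usage);
}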

Or you can recognize that nothing you describe needs an AI calling tools at all. With Chat Completions, you can provide your system message, the knowledge file to answer from, and the image prompt, and get your response in a single API call whose input and output tokens are exactly what you sent and generated.
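
A sketch of that single-call approach, reusing description and publicURL from the code above (industries.txt is a hypothetical name for your keyword file; detail: "low" caps the image at a flat 85 tokens):

import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();
const keywords = fs.readFileSync("industries.txt", "utf8"); // the ~1,000-keyword list

const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    temperature: 0.2,
    messages: [
        {
            role: "system",
            content: `Return industry tags for the image, chosen only from this list:\n${keywords}`,
        },
        {
            role: "user",
            content: [
                { type: "text", text: description },
                { type: "image_url", image_url: { url: publicURL, detail: "low" } },
            ],
        },
    ],
    response_format: {
        type: "json_schema",
        json_schema: {
            name: "industry_tags",
            schema: {
                type: "object",
                properties: { data: { type: "array", items: { type: "string" } } },
                required: ["data"],
                additionalProperties: false,
            },
            strict: true,
        },
    },
});

console.log(completion.choices[0].message.content); // e.g. {"data":["Automotive","Retail"]}

By the numbers in your post, that is roughly 4,300 tokens for the keyword list, ~200 for the prompt, and 85 for the image, about 5,000 input tokens per call, and you know exactly what you are paying for.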