Customizing chunk size for file_search tool

From the docs: https://platform.openai.com/docs/assistants/tools/file-search

By default, the file_search tool uses the following settings, but these can be configured to suit your needs:

  • Chunk size: 800 tokens
  • Chunk overlap: 400 tokens
  • Embedding model: text-embedding-3-large at 256 dimensions
  • Maximum number of chunks added to context: 20 (could be fewer)
  • Ranker: auto (OpenAI will choose which ranker to use)
  • Score threshold: 0 minimum ranking score

I’m currently creating a thread like this:

thread = openai.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": final_query,
            "attachments": [
                {"file_id": file_id, "tools": [{"type": "file_search"}]}
                for file_id in file_ids
            ],
        }
    ]
)

Can you help me change parameters like the chunk size, the embedding model, etc.?
I can’t find any docs on how to do that.

You can set these when you create a new vector store:

vector_store = client.vector_stores.create(
    name="crazybase",
    file_ids=file_id_list,
    expires_after={
        "anchor": "last_active_at",
        "days": 60,
    },
    chunking_strategy={  # optional; the default is static with 800 / 400
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 600,
            "chunk_overlap_tokens": 200,
        },
    },
)
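If it helps, the API enforces a couple of constraints on the static strategy (per my reading of the reference): max_chunk_size_tokens must be between 100 and 4096, and chunk_overlap_tokens must be non-negative and at most half of max_chunk_size_tokens. A small pre-flight check saves a round trip to the API:

```python
def validate_static_chunking(max_chunk_size_tokens: int,
                             chunk_overlap_tokens: int) -> None:
    """Raise ValueError if the static chunking values would be rejected."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens < 0:
        raise ValueError("chunk_overlap_tokens must be non-negative")
    if chunk_overlap_tokens > max_chunk_size_tokens // 2:
        raise ValueError(
            "chunk_overlap_tokens must not exceed half of max_chunk_size_tokens"
        )

validate_static_chunking(600, 200)  # the values above pass
```

Note that the defaults (800 / 400) sit exactly at the overlap limit: 400 is half of 800.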

You’d then attach that vector store to the thread:

client.beta.threads.update(
    "thread_abc123",
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

Any new files added to the store will inherit its chunking strategy. As far as I can tell, the embedding model itself isn’t configurable; only the chunking and ranking options are exposed.
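One more detail, going by my reading of the vector store files endpoint: when attaching a file to an existing store you can also pass a per-file chunking_strategy to override the store default. A hedged sketch (ids are placeholders):

```python
# Placeholder ids for illustration.
vector_store_id = "vs_abc123"
file_id = "file_abc123"

# Per-file override passed when attaching the file; files attached
# without it inherit the store's own chunking strategy.
per_file_strategy = {
    "type": "static",
    "static": {
        "max_chunk_size_tokens": 1200,
        "chunk_overlap_tokens": 300,
    },
}

# client.vector_stores.files.create(
#     vector_store_id=vector_store_id,
#     file_id=file_id,
#     chunking_strategy=per_file_strategy,
# )
```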