Customizing chunk size for file_search tool

From the docs: https://platform.openai.com/docs/assistants/tools/file-search

By default, the file_search tool uses the following settings, but these can be configured to suit your needs:

  • Chunk size: 800 tokens
  • Chunk overlap: 400 tokens
  • Embedding model: text-embedding-3-large at 256 dimensions
  • Maximum number of chunks added to context: 20 (could be fewer)
  • Ranker: auto (OpenAI will choose which ranker to use)
  • Score threshold: 0 minimum ranking score

I’m currently creating a thread like this:

thread = openai.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": final_query,
            "attachments": [
                {"file_id": file_id, "tools": [{"type": "file_search"}]}
                for file_id in file_ids
            ],
        }
    ]
)

Can you help me change parameters like the chunk size, the embedding model, etc.?
I can’t find any docs on how to do that.

You can set these when you create a new vector store:

vector_store = client.vector_stores.create(
    name="crazybase",
    file_ids=file_id_list,
    expires_after={
        "anchor": "last_active_at",
        "days": 60,
    },
    chunking_strategy={  # optional; the default is static with 800 / 400
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 600,
            "chunk_overlap_tokens": 200,
        },
    },
)
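If it helps, the API enforces a couple of constraints on the static strategy (per my reading of the reference): max_chunk_size_tokens must be between 100 and 4096, and chunk_overlap_tokens must be non-negative and at most half of max_chunk_size_tokens. A small pre-flight check saves a round trip to the API:

```python
def validate_static_chunking(max_chunk_size_tokens: int,
                             chunk_overlap_tokens: int) -> None:
    """Raise ValueError if the static chunking values would be rejected."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens < 0:
        raise ValueError("chunk_overlap_tokens must be non-negative")
    if chunk_overlap_tokens > max_chunk_size_tokens // 2:
        raise ValueError(
            "chunk_overlap_tokens must not exceed half of max_chunk_size_tokens"
        )

validate_static_chunking(600, 200)  # the values above pass
```

Note that the defaults (800 / 400) sit exactly at the overlap limit: 400 is half of 800.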

You’d then attach that vector store to the thread:

client.beta.threads.update(
    "thread_abc123",
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

Any new files added to the store will inherit its chunking strategy. As far as I can tell, the embedding model itself isn’t configurable; only the chunking and ranking options are exposed.
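One more detail, going by my reading of the vector store files endpoint: when attaching a file to an existing store you can also pass a per-file chunking_strategy to override the store default. A hedged sketch (ids are placeholders):

```python
# Placeholder ids for illustration.
vector_store_id = "vs_abc123"
file_id = "file_abc123"

# Per-file override passed when attaching the file; files attached
# without it inherit the store's own chunking strategy.
per_file_strategy = {
    "type": "static",
    "static": {
        "max_chunk_size_tokens": 1200,
        "chunk_overlap_tokens": 300,
    },
}

# client.vector_stores.files.create(
#     vector_store_id=vector_store_id,
#     file_id=file_id,
#     chunking_strategy=per_file_strategy,
# )
```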