Looking for Tips to Improve Document Search and Thread Management in OpenAI Assistant API

I’m working on implementing a feature in my real-world application where I need to extract answers from 2-3 uploaded documents using around 20-30 predefined questions. I’ve been using the file_search tool in the OpenAI Assistant API, and so far it’s been working well: I’m able to retrieve relevant answers.

Here’s what I’ve done so far:

  1. Created an assistant.
  2. Looped through the list of questions (not using async).
  3. Created a separate thread for each question using createAndRun, attaching the relevant fileId and vector storeId.
  4. When message.event === 'thread.run.completed', I save the answer.
  5. When message.event !== 'thread.run.completed', I handle it as “unable to fetch at this moment.”

My questions are:

  1. Can I use a single thread for all the questions, considering that OpenAI will automatically truncate if the context window is full? Would using a single thread be beneficial in this case?
  2. If a vector store is assigned to a thread, it will search across all the files within that store to find relevant answers. Is it better to maintain different vector stores for different sets of files, or should I explicitly attach specific fileIds to the threads?
  3. Are there any other improvements or best practices I could apply to my current approach?

I’m new to this—does my approach sound reasonable, or is there anything I should consider refining?

    // Process the questions sequentially; each call creates its own thread.
    for (let i = 0; i < questions.length; i++) {
      await this.processQuestion(
        fileId,
        vectorStoreId,
        questions[i].question,
        assistant,
        results,
      );
    }

  private async processQuestion(
    fileId: string,
    vectorStoreId: string,
    question: string,
    assistant: { id: string },
    results: Record<string, string>,
  ) {
    const stream = await this.openai.beta.threads.createAndRun({
      assistant_id: assistant.id,
      stream: true, // `stream` is a top-level option, not part of `thread`
      thread: {
        messages: [
          {
            role: 'user',
            content: question,
            attachments: [
              { file_id: fileId, tools: [{ type: 'file_search' }] },
            ],
          },
        ],
        tool_resources: {
          file_search: {
            vector_store_ids: [vectorStoreId],
          },
        },
      },
    });

    for await (const event of stream) {
      if (event.event === 'thread.run.failed') {
        // Save "unable to fetch at this moment" for this question.
        results[question] = 'unable to fetch at this moment';
        return;
      }

      if (event.event === 'thread.run.completed') {
        const messages = await this.openai.beta.threads.messages.list(
          event.data.thread_id,
          { run_id: event.data.id },
        );
        // Save the first text part of the assistant's reply.
        const first = messages.data[0]?.content?.[0];
        results[question] = first?.type === 'text' ? first.text.value : '';
        return;
      }
    }
  }
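In the completed branch, the answer text still has to be dug out of the `messages.list` response. Here is a minimal, hypothetical helper (the names and types are mine, not from the code above) that assumes the documented Assistants message shape, where each message's `content` is an array of parts and text parts carry `text.value`:

```typescript
// Hypothetical helper: extract the plain-text answer from a list of
// thread messages (newest first, as messages.list returns them).
type TextPart = { type: 'text'; text: { value: string } };
type MessagePart = TextPart | { type: string };
type ThreadMessage = { role: string; content: MessagePart[] };

function extractAnswerText(messages: ThreadMessage[]): string {
  // The first assistant message holds the run's answer.
  const answer = messages.find((m) => m.role === 'assistant');
  if (!answer) return '';
  // Join all text parts; non-text parts (e.g. images) are skipped.
  return answer.content
    .filter((p): p is TextPart => p.type === 'text')
    .map((p) => p.text.value)
    .join('\n');
}
```

You would call it with the `data` array from `openai.beta.threads.messages.list(...)`, e.g. `results[question] = extractAnswerText(messages.data);`.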

Hi! Welcome.

With regard to whether you use one thread, or several additional threads: What is the nature of the questions you are asking?

Do they need to be comprehended individually or taken as a whole?


They can be comprehended individually. They all fall under one topic, but each can definitely get a good answer without the previous context.


Hi @anotheruser ,

I have a full framework (almost public), that does exactly this.

Approach we have:

  1. Prepare the files
  2. Convert to JSON objects with hierarchical structure
  3. Import the chunks and section outlines into the RAG store for better search
  4. Run search query
  5. Select the items containing the answer
  6. Pass selected items to the answering model
  7. Collect the final response

Steps 4-7 are run in parallel to save time.
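As a rough illustration of running per-question work in parallel without firing all 20-30 requests at once, a small concurrency limiter can be sketched like this (a hypothetical helper of my own, not part of the framework described above):

```typescript
// Hypothetical sketch: run an async worker over all items with a cap on
// in-flight calls, so a batch of questions doesn't hit the API all at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each "lane" repeatedly claims the next unprocessed index until done.
  const lanes = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++; // safe: claimed synchronously before any await
        results[i] = await worker(items[i]);
      }
    },
  );
  await Promise.all(lanes);
  return results; // answers stay in the original question order
}
```

Usage would look like `await mapWithConcurrency(questions, 5, (q) => processQuestion(...))`, keeping at most five runs in flight at a time.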

Would you like to give it a free spin to see if that fits your needs?


That sounds like an impressive and efficient framework! I’m definitely interested in giving it a try to see if it aligns with what I’m looking for.


Here is another post of mine with details (approximately) of what I’ll need to run a test: Fine-tuning for better extraction - #2 by sergeliatko

You may send me one or two data files and a list of questions in the format I specified in the other thread, via PM or here, whichever you prefer.