Looking for Tips to Improve Document Search and Thread Management in OpenAI Assistant API

I’m working on implementing a feature in my real-world application where I need to extract answers from 2-3 uploaded documents using around 20-30 predefined questions. I’ve been using the file_search tool in the OpenAI Assistant API, and so far it’s been working well: I’m able to retrieve relevant answers.

Here’s what I’ve done so far:

  1. Created an assistant.
  2. Looped through the list of questions (not using async).
  3. Created a separate thread for each question using createAndRun, attaching the relevant fileId and vector storeId.
  4. When message.event === 'thread.run.completed', I save the answer.
  5. When message.event !== 'thread.run.completed', I handle it as “unable to fetch at this moment.”

My questions are:

  1. Can I use a single thread for all the questions, considering that OpenAI will automatically truncate if the context window is full? Would using a single thread be beneficial in this case?
  2. If a vector store is assigned to a thread, it will search across all the files within that store to find relevant answers. Is it better to maintain different vector stores for different sets of files, or should I explicitly attach specific fileIds to the threads?
  3. Are there any other improvements or best practices I could apply to my current approach?

I’m new to this—does my approach sound reasonable, or is there anything I should consider refining?

    // Process the questions sequentially; each call creates its own thread.
    for (let i = 0; i < questions.length; i++) {
      await this.processQuestion(
        fileId,
        vectorStoreId,
        questions[i].question,
        assistant,
        results,
      );
    }

  private async processQuestion(
    fileId: string,
    vectorStoreId: string,
    question: string,
    assistant: { id: string },
    results: Record<string, string>,
  ) {
    const stream = await this.openai.beta.threads.createAndRun({
      assistant_id: assistant.id,
      stream: true, // `stream` is a top-level option, not part of `thread`
      thread: {
        messages: [
          {
            role: 'user',
            content: question,
            attachments: [
              { file_id: fileId, tools: [{ type: 'file_search' }] },
            ],
          },
        ],
        tool_resources: {
          file_search: {
            vector_store_ids: [vectorStoreId],
          },
        },
      },
    });

    for await (const event of stream) {
      if (event.event === 'thread.run.failed') {
        // Save "unable to fetch at this moment" for this question.
        results[question] = 'unable to fetch at this moment';
        return;
      }

      if (event.event === 'thread.run.completed') {
        const messages = await this.openai.beta.threads.messages.list(
          event.data.thread_id,
          { run_id: event.data.id },
        );
        // Save the first text part of the assistant's reply.
        const first = messages.data[0]?.content?.[0];
        results[question] = first?.type === 'text' ? first.text.value : '';
        return;
      }
    }
  }
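In the completed branch, the answer text still has to be dug out of the `messages.list` response. Here is a minimal, hypothetical helper (the names and types are mine, not from the code above) that assumes the documented Assistants message shape, where each message's `content` is an array of parts and text parts carry `text.value`:

```typescript
// Hypothetical helper: extract the plain-text answer from a list of
// thread messages (newest first, as messages.list returns them).
type TextPart = { type: 'text'; text: { value: string } };
type MessagePart = TextPart | { type: string };
type ThreadMessage = { role: string; content: MessagePart[] };

function extractAnswerText(messages: ThreadMessage[]): string {
  // The first assistant message holds the run's answer.
  const answer = messages.find((m) => m.role === 'assistant');
  if (!answer) return '';
  // Join all text parts; non-text parts (e.g. images) are skipped.
  return answer.content
    .filter((p): p is TextPart => p.type === 'text')
    .map((p) => p.text.value)
    .join('\n');
}
```

You would call it with the `data` array from `openai.beta.threads.messages.list(...)`, e.g. `results[question] = extractAnswerText(messages.data);`.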

Hi! Welcome.

With regard to whether you use one thread, or several additional threads: What is the nature of the questions you are asking?

Do they need to be comprehended individually or taken as a whole?


They can be comprehended individually. They all fall under one topic, but each can definitely get a good answer without the previous context.


Hi @anotheruser ,

I have a full framework (almost public), that does exactly this.

Approach we have:

  1. Prepare the files
  2. Convert to JSON objects with hierarchical structure
  3. Import the chunks and section outlines into the RAG store for better search
  4. Run search query
  5. Select the items containing the answer
  6. Pass selected items to the answering model
  7. Collect the final response

Steps 4-7 are run in parallel to save time.
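As a rough illustration of running per-question work in parallel without firing all 20-30 requests at once, a small concurrency limiter can be sketched like this (a hypothetical helper of my own, not part of the framework described above):

```typescript
// Hypothetical sketch: run an async worker over all items with a cap on
// in-flight calls, so a batch of questions doesn't hit the API all at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each "lane" repeatedly claims the next unprocessed index until done.
  const lanes = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++; // safe: claimed synchronously before any await
        results[i] = await worker(items[i]);
      }
    },
  );
  await Promise.all(lanes);
  return results; // answers stay in the original question order
}
```

Usage would look like `await mapWithConcurrency(questions, 5, (q) => processQuestion(...))`, keeping at most five runs in flight at a time.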

Would you like to give it a free spin to see if that fits your needs?


That sounds like an impressive and efficient framework! I’m definitely interested in giving it a try to see if it aligns with what I’m looking for.


Here is another post of mine with details (approximately) of what I’ll need to run a test: Fine-tuning for better extraction - #2 by sergeliatko

You may send me one or two data files and a list of questions in the format I specified in the other thread, via PM or here, whichever you prefer.