Getting Attachments to work

I am using the code below to build a simple Assistant that can read a PDF file attached as part of a message thread. However, this does not seem to work: even though the message_file object is created (verified via print statements), the file does not appear to be picked up by the assistant. I am unsure of the cause, since this follows the API documentation (https://platform.openai.com/docs/assistants/tools/file-search).

import openai
from openai import OpenAI
from config import *
import time
import sys

client = OpenAI(api_key=OPENAI_API_KEY)
assistant_id = "someid"

def create_thread_with_file(file_path):
    # Open the file and send it to OpenAI
    with open(file_path, 'rb') as file:
        message_file = client.files.create(
            file=file, purpose='assistants'
        )
    # Note: files uploaded with purpose='assistants' cannot be downloaded,
    # so calling client.files.content(message_file.id) here would raise an error.
    print("message_file:", message_file)
    # Create a thread and attach the file to the message
    thread = client.beta.threads.create(
        messages=[
            {
                "role": "user",
                "content": "Please analyze the attached document.",
                "attachments": [
                    {"file_id": message_file.id, "tools": [{"type": "file_search"}]}
                ],
            }
        ]
    )
    print("thread",thread)
    return thread

def stream_generator(prompt, thread_id):
    message = client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=prompt
    )

    print("Wait... Generating response...")
    stream = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        stream=True
    )
    
    full_response = ""

    # Accumulate text deltas as the run streams back
    for event in stream:
        if event.data.object == "thread.message.delta":
            for content in event.data.delta.content:
                if content.type == 'text' and content.text and content.text.value:
                    full_response += content.text.value
    print(full_response)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python script.py <file_path>")
        sys.exit(1)
    
    file_path = sys.argv[1]
    thread = create_thread_with_file(file_path)
    if thread:
        print("File uploaded and new thread created!")

    while True:
        prompt = input("Enter your message (or type 'exit' to quit): ")
        if prompt.lower() == 'exit':
            break
        stream_generator(prompt, thread.id)

For more context, I tried other approaches such as building vector stores, but I only found examples where the store is attached to the assistant as a knowledge base. I do not want that, because the information sent in a thread should be available only within that thread and nowhere else.
Also, when I inspect each thread I can see that the data is being passed as a file attachment, but it is still not available to the assistant.
I understand there is some kind of bug going around related to this, but I would like to get this working, so any and all help with this issue is much appreciated.


This may be an outdated post, but just in case someone is wondering how this works at the moment:

What OpenAI does when you send a file as an attachment is create a temporary Vector Store, equivalent to the ones you’d create with the API, with an expiration of 7 days.

So, under the hood, both approaches accomplish the same thing. Sending attachments is, however, the only way to get two separate vector stores working within one thread.

It also means that if you were expecting the file to show up in the main Vector Store linked to the assistant, it won’t be there; it will be in a new “Untitled” Vector Store created just for the thread.
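If you want to confirm this yourself, something like the sketch below should do it, assuming the client and thread objects from the code above (in newer SDK releases the vector store helpers may live under client.vector_stores instead of client.beta.vector_stores):

# Sketch: inspect the temporary vector store auto-created for the thread.
refreshed = client.beta.threads.retrieve(thread.id)
vs_id = refreshed.tool_resources.file_search.vector_store_ids[0]

vector_store = client.beta.vector_stores.retrieve(vs_id)
print(vector_store.name, vector_store.expires_after)  # unnamed store with a 7-day expiry

# Confirm the attached file actually landed in this store
for vs_file in client.beta.vector_stores.files.list(vector_store_id=vs_id):
    print(vs_file.id, vs_file.status)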

Are you saying this works, as opposed to the “create vector store, send file to vector store, attach store to thread, query thread” model? (Which I have set up but has yet to work.)

Yes!

Right now, if you upload a file to a thread as a message with attachments, that creates a temporary (7-day) VS already linked to the thread.

And separately, linking a VS to an assistant should also work; I am currently working on a project where it works correctly.

The only limit at the moment (other than what’s already mentioned above) is one VS per assistant or thread. So if you need to use multiple knowledge indexes, you need to switch them, use one assistant per knowledge source and switch those, or use a third-party retrieval service and give it to the assistant as a tool.
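For reference, “switching” the single attached store is just an update call. A rough sketch (the assistant/thread/vector store ids below are placeholders, and in newer SDK versions the same calls exist under slightly different namespaces):

# Sketch: swap which knowledge index the assistant searches.
client.beta.assistants.update(
    "asst_abc123",  # placeholder assistant id
    tool_resources={"file_search": {"vector_store_ids": ["vs_knowledge_a"]}},
)

# ...or scope a different store to one thread only:
client.beta.threads.update(
    "thread_abc123",  # placeholder thread id
    tool_resources={"file_search": {"vector_store_ids": ["vs_knowledge_b"]}},
)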

I’m sorry if this is a dumb question, but what is the best way to grab the file_search stores for a given thread? I’m assuming this is the temporary 7-day vector store?

You can call retrieve, passing in a thread_id, to get the thread object, which will surface the thread-level vector store.
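A minimal sketch of that, assuming the Assistants v2 attribute names and a placeholder thread id (tool_resources can be empty if the thread has no attachments):

# Sketch: get the thread-level (temporary) vector store id(s) for a thread.
thread = client.beta.threads.retrieve("thread_abc123")  # placeholder thread id
file_search = thread.tool_resources.file_search if thread.tool_resources else None
if file_search and file_search.vector_store_ids:
    print(file_search.vector_store_ids)  # the 7-day store(s) created from attachments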