New developer here trying out the Assistant API and looking to implement it in a project of mine.
I have some questions about Assistants and threads.
Reading about Threads, I understand that “context” in this case is kept within the thread. Does this also mean that each following call to the API consumes more and more tokens, up to a certain limit where the context is truncated? Or does it save the context in some way that doesn’t include the previous prompts?
If I were to implement the Assistant in my project, I’d need it to scale horizontally to multiple users. My thinking would be to create one assistant with one thread per user. Is this possible, and what are the limitations? I couldn’t find any answers on Google.
Is it possible to “clean” or “empty” threads in code, the way you can in the Playground?
Been working on this same issue all day. I discovered you need 3 different API calls to get the response.
1. Create a Thread: A thread represents a conversation session and is created with an initial set of messages. This is what your createThread function is doing.
2. Run the Thread: Once the thread is created, you perform a ‘run’ on this thread. The ‘run’ involves invoking the Assistant on the Thread, where it processes the messages in the thread and may append new messages (responses from the Assistant). This is handled by your runThread function.
3. Retrieve Messages from the Thread: After the run is completed and the Assistant has potentially added its responses to the Thread, you then fetch the updated list of messages from the Thread. This will include both the original messages and any new messages appended by the Assistant during the run.
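The three steps above can be sketched as REST requests. This is a rough, non-authoritative illustration that only builds request descriptions with the standard library (the `thread_abc123` and `asst_abc123` IDs are placeholders); a real client would send these over HTTPS with your API key and poll the run until its status is `completed`.

```python
# Sketch of the three Assistants API calls behind the flow above.
# IDs are placeholders; endpoint paths follow the beta Assistants API.

def create_thread():
    # Step 1: POST /v1/threads starts a conversation session,
    # optionally seeded with initial messages, and returns a thread ID.
    return {
        "method": "POST",
        "url": "https://api.openai.com/v1/threads",
        "body": {"messages": [{"role": "user", "content": "Hello!"}]},
    }

def run_thread(thread_id, assistant_id):
    # Step 2: POST /v1/threads/{thread_id}/runs invokes the assistant on
    # the thread; the assistant may append response messages to it.
    return {
        "method": "POST",
        "url": f"https://api.openai.com/v1/threads/{thread_id}/runs",
        "body": {"assistant_id": assistant_id},
    }

def list_messages(thread_id):
    # Step 3: GET /v1/threads/{thread_id}/messages returns all messages,
    # including any the assistant appended during the run.
    return {
        "method": "GET",
        "url": f"https://api.openai.com/v1/threads/{thread_id}/messages",
        "body": None,
    }

print(run_thread("thread_abc123", "asst_abc123")["url"])
```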
Welcome to our community! It’s great to see new developers diving into the Assistant API. I’ll address your questions in a more informal and clear manner:
Context Maintenance in Threads:
You’re spot on about the context being kept within the thread. Each API call consumes tokens (see the Pricing page on openai.com and the “Assistants API pricing details per message” thread on the OpenAI Developer Forum), and there’s a limit to the context size. When the conversation exceeds it, only the most recent context that fits within the token budget is retained. This means that older interactions disappear, similar to a sliding window over the conversation history.
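To make the sliding-window behavior concrete, here is a rough, hypothetical illustration. Token counts are crudely stood in for by word counts (the real API uses its own tokenizer), but the mechanism is the same: newest messages are kept, oldest fall out of the window first.

```python
def truncate_to_budget(messages, token_budget):
    # Keep the most recent messages whose combined "token" cost fits the
    # budget; older messages fall out of the window first.
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest-first
        cost = len(msg.split())      # crude stand-in for real tokenization
        if used + cost > token_budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order

history = ["first question here", "first answer", "latest question"]
print(truncate_to_budget(history, 4))  # → ['first answer', 'latest question']
```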
Horizontal Scalability with Assistants and Threads:
You’re on the right track by creating a unique assistant and thread for each user (see the OpenAI Platform docs). Each thread acts as an independent conversation and can handle a unique user (see the API Reference). There are rate limits and potential pricing considerations based on usage volume, but planning ahead can help you manage them effectively.
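A minimal sketch of the one-thread-per-user pattern, assuming you keep your own user-to-thread mapping (the API itself does not associate threads with your users). Here thread IDs are faked locally; in real code the comment marks where you would call the thread-creation endpoint.

```python
class ThreadRegistry:
    """Map each user to their own thread ID so conversations stay isolated."""

    def __init__(self):
        self._threads = {}
        self._counter = 0

    def thread_for(self, user_id):
        # Create a thread lazily on a user's first contact; reuse afterwards.
        if user_id not in self._threads:
            self._counter += 1
            # Real code: POST /v1/threads and store the returned thread ID.
            self._threads[user_id] = f"thread_{self._counter}"
        return self._threads[user_id]

reg = ThreadRegistry()
print(reg.thread_for("alice"), reg.thread_for("bob"), reg.thread_for("alice"))
# → thread_1 thread_2 thread_1
```

The registry would typically live in a database rather than memory, so it survives restarts and works across horizontally scaled instances.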
Cleaning or Emptying Threads:
Based on the official documentation and API references for OpenAI Assistants, you can modify threads, but currently the only parameter available for modification is the metadata (see the Modify Thread endpoint in the API Reference).
Instead, you can manage conversational continuity by creating a new thread whenever you need a fresh start, carrying over any relevant content into future interactions.
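Since emptying a thread isn’t supported, a “reset” amounts to swapping in a brand-new thread for that user. A small sketch of the idea (the mapping and IDs are illustrative; in real code the new ID comes from POST /v1/threads):

```python
def reset_thread(threads, user_id, new_thread_id):
    # "Emptying" a thread isn't supported, so a fresh start means pointing
    # the user at a brand-new thread created via the API.
    old = threads.get(user_id)
    threads[user_id] = new_thread_id
    return old  # caller may DELETE /v1/threads/{old} to clean up

threads = {"alice": "thread_old"}
old = reset_thread(threads, "alice", "thread_new")
print(old, threads["alice"])  # → thread_old thread_new
```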
```mermaid
sequenceDiagram
autonumber
participant User as User
participant API as Assistants API
participant Thread as Thread (Conversation)
participant Message as Message
User->>API: Create Thread (POST /v1/threads)
API->>Thread: Thread Initialized
Thread->>User: Respond with Thread ID
User->>API: Send Message (POST /v1/threads/{thread_id}/messages)
API->>Message: Process Message
Message->>Thread: Add Message to Thread
Thread->>User: Confirm Message Receipt
User->>API: Request Response (GET /v1/threads/{thread_id}/messages)
API->>Thread: Retrieve Last Message
Thread->>API: Provide Last Message
API->>User: Display Assistant's Response
```
I hope this helps shed some light on your questions!
You can’t create a thread with a specific assistant, but you can create a thread run with a specific assistant. Basically, you tell the assistant to answer based on the provided thread.
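For illustration, the run request might look like this (a non-authoritative sketch: the URL path and `assistant_id` field follow the Runs endpoint, the IDs are placeholders, and the per-run `instructions` override is optional):

```python
# Hypothetical sketch: a run ties a specific assistant to an existing thread.
run_request = {
    "method": "POST",
    "url": "https://api.openai.com/v1/threads/thread_abc123/runs",
    "body": {
        "assistant_id": "asst_abc123",
        # Optional per-run override of the assistant's instructions:
        "instructions": "Answer based only on this thread's messages.",
    },
}
print(run_request["body"]["assistant_id"])  # → asst_abc123
```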
I created an open-source repo where I was able to get threads to work. It uses the Chatbot-UI and doesn’t have the vision or file upload features, but the threads work. All you need to do is put your Assistant_ID in the env file along with your API key. The three API calls are in the server-side index.js file if you want to take a look.
This is totally unusable without the ability to remove messages from threads. By the time you guys decide the thread has reached the context size limit (128k), we will be broke.
The uploaded file is saved to the assistant, not the thread. I am not sure if you can check the assistant to see whether the file is attached there. You can always update the assistant, though.
Two questions: 1) Is it possible to include image files at assistant creation (I know you can’t through messages yet), and what file type would that be? 2) How do you set a max token limit for responses from assistants?
The whole assistants thing has the smell of being outsourced. The multi-faceted obliviousness in creating the thing and nothing being addressed about the major issues in weeks.
Did I miss something, or is there no API to get a list of threads?
How do I track assistant usage?
I wonder what happens if somebody steals the API key for an assistant and uses tokens from the account?
There is no way to control expenses in that case.
Or in a corporate environment: there is no way to control how employees use their access to assistants.
They won’t incur cost unless you run them. You need the thread ID to delete them using the delete function. Note that clearing the Run in the Playground does not delete them. However, threads deleted via the delete function will no longer be listed on the thread page.
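The delete call itself is simple: it only needs the thread ID. A sketch of the request it maps to (the ID is a placeholder; a real client sends this with your API key):

```python
def delete_thread_request(thread_id):
    # DELETE /v1/threads/{thread_id} removes the thread entirely;
    # this is the only cleanup the API offers besides waiting for expiry.
    return {
        "method": "DELETE",
        "url": f"https://api.openai.com/v1/threads/{thread_id}",
    }

req = delete_thread_request("thread_abc123")
print(req["method"], req["url"])
```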
By the way, threads will be retained for up to 60 days, after which they are automatically deleted, although OpenAI has noted that this policy is still being evaluated.
It is a pity that OpenAI does not provide a programmatic way to list threads. I had to devise a whole new way of using assistants to list threads of specific types. The demo is from within the betaassi framework, as I need it to do streaming with the Assistant API.
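Since the API has no “list threads” endpoint, a common workaround (this sketch is not from the betaassi framework itself) is to record thread IDs in your own store at creation time, tagged by whatever type you need to filter on later:

```python
# Keep your own index of threads, since the API can't enumerate them for you.
local_index = []

def register_thread(thread_id, kind):
    # Call this right after POST /v1/threads returns a new thread ID.
    local_index.append({"id": thread_id, "kind": kind})

def threads_of_kind(kind):
    # Filter the local index by type; the API itself can't do this.
    return [t["id"] for t in local_index if t["kind"] == kind]

register_thread("thread_1", "support")
register_thread("thread_2", "sales")
print(threads_of_kind("support"))  # → ['thread_1']
```

In production the index would live in a database; losing it means losing the only handle you have on those threads.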
SPOILER ALERT: this way of enabling it is fairly limiting, YMMV, but someone might find it helpful. Also, the streaming is with the Assistant API, but under the hood it uses chat completion. This streaming is NOT part of the framework; only the core components are.