Can someone explain this: "A key change introduced by this API is persistent and infinitely long threads, which allow developers to hand off thread state management to OpenAI and work around context window constraints."

What does this mean? Do threads count towards the context? What happens if this grows large? Are they basically using RAG against older messages? Summaries, maybe?


I may be incorrect here (still learning).

A thread represents a single conversation.

Threads contain the messages, which make up the context (and those messages count as tokens, if that's what you mean).

They are “smartly” truncated.

You can access your threads, and the messages within them, at any time. You can also modify the messages if you want to do your own processing.
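For example, here is a minimal sketch of reading a thread's messages back out, assuming the official openai Node SDK and an existing thread ID (the helper name and thread ID are placeholders):

```javascript
// Sketch: list a thread's messages and pull out the text of each one.
// `client` is assumed to be an initialized OpenAI client from the Node SDK.
async function listThreadMessages(client, threadId) {
  const page = await client.beta.threads.messages.list(threadId);
  // Each message carries a role and an array of content parts
  return page.data.map((m) => ({
    role: m.role,
    text: m.content[0]?.text?.value,
  }));
}
```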

Lastly, the run on a thread reports its current status, so we can continuously poll it until the work is completed or additional actions are required.
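The polling loop could look roughly like this (a sketch, assuming the official openai Node SDK; the terminal status names come from the Assistants API run object):

```javascript
// Sketch: poll a run until it reaches a terminal (or action-required) status.
// `client` is assumed to be an initialized OpenAI client from the Node SDK.
async function waitForRun(client, threadId, runId) {
  const done = ["completed", "failed", "cancelled", "expired", "requires_action"];
  while (true) {
    const run = await client.beta.threads.runs.retrieve(threadId, runId);
    if (done.includes(run.status)) return run;
    // wait a second between polls to avoid hammering the API
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}
```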

Does that mean they discard the older messages? In that case we'd need to store the messages on the client side, since the user might scroll up to see a very old message.

Good question. I don’t know. The documentation doesn’t go into any details. Truncating typically means cutting/discarding the older messages. The “smartly” is a curveball though.

Trying it out now and hopefully will have a better answer

Has anyone gotten a sense of what threads are doing? I am trying to work with them today and my Thread convos seem to be performing much worse in terms of instruction following than my Chat convos.


How did you create a thread convo? I was able to create the thread object, but creating a message just returns the message as an object. How do you start a convo within a thread?

Once you add a message, you then need to create a "run" object by attaching an assistant to the thread.
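For the message step, a minimal sketch (assuming the official openai Node SDK; the helper name and thread ID are placeholders):

```javascript
// Sketch: add a user message to an existing thread.
// `client` is assumed to be an initialized OpenAI client from the Node SDK.
async function addUserMessage(client, threadId, text) {
  return client.beta.threads.messages.create(threadId, {
    role: "user",
    content: text,
  });
}
```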

async function main() {
  // thread.id comes from the thread created earlier; in the Node SDK the
  // runs API takes the thread ID as its first argument
  const run = await openai.beta.threads.runs.create(
    thread.id,
    { assistant_id: "asst_nGl00s4xa9zmVY6Fvuvz9wwQ" }
  );
}