How to Reuse Existing Assistant Context with OpenAI Realtime API?

We have developed an assistant using GPT-4o mini that uses documents such as PDFs and Docs files, plus specific instructions, to provide tailored responses. The assistant relies on documents and instructions attached via an assistantId to maintain its context.

Now we are building new functionality where we want to reuse this existing assistant’s capabilities with OpenAI’s Realtime API. However, we are unable to find a way to:

  1. Attach documents or contextual data during a realtime API session.
  2. Reuse the assistant’s existing context or settings dynamically.

We’ve gone through OpenAI’s official documentation but couldn’t identify a clear way to achieve this.

Looking for community guidance on:

  • How to attach documents or context dynamically in the Realtime API.
  • How to load and reuse assistant configurations (like an assistantId) in the Realtime API.
  • Alternative approaches to replicate this behavior using OpenAI’s API.

If anyone has tackled a similar challenge or has ideas on how to achieve this, we’d greatly appreciate your insights. Thank you in advance!

Hey there and welcome to the community!

After taking a quick peek at the docs, it looks like you can add specific text input and output here:

https://platform.openai.com/docs/guides/realtime-model-capabilities#text-inputs-and-outputs

There also seems to be a method for adding supplemental or alternative context here:

https://platform.openai.com/docs/guides/realtime-model-capabilities#create-a-custom-context-for-responses
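If it helps, here’s roughly what those two sections boil down to over the raw WebSocket connection. This is a minimal, untested sketch; the model name, instructions, and excerpt text are placeholders you’d swap for your own.

```python
# Minimal sketch: inject text context into a Realtime session over the raw
# WebSocket interface. Event names follow the docs linked above; the model
# name, instructions, and excerpt text are placeholders.
import asyncio
import json
import os

import websockets  # pip install websockets

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"


async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Older websockets versions use extra_headers= instead of additional_headers=.
    async with websockets.connect(REALTIME_URL, additional_headers=headers) as ws:
        # 1) Carry over your assistant's instructions via session.update.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "instructions": "Answer using the provided document excerpts...",
            },
        }))

        # 2) Add supplemental context as a plain text conversation item.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Excerpt from doc A: ..."}],
            },
        }))

        # 3) Ask for a text-only response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"]},
        }))

        # Stream the text deltas back until the response finishes.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break


asyncio.run(main())
```

In your case the session.update instructions would carry over whatever you configured on the assistant, and the conversation.item.create payload is where document excerpts would go.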

@Macha

Thanks for the response and for sharing the links! I checked the documentation, but I’m still trying to figure out how we can attach multiple documents effectively.

We have 7–8 PDF and Docs files that we want to reuse as context. In our existing setup these documents are attached via the assistantId, and the assistant references them when generating responses. With the Realtime API, however, I don’t see a direct way to attach these documents.

From what I understand:

  1. The realtime API allows adding custom context dynamically.
  2. It doesn’t seem to support direct document uploads like the assistant-based approach.

Would you suggest extracting text from these documents and dynamically inserting relevant excerpts into the API requests? If so, what’s the best way to structure this so the model can refer to the right sections without hitting token limits?

Any insights or alternative approaches would be really helpful! Thanks again.


Yep yep!

So if it were me, I’d store these PDF documents (or the text extracted from them) in some kind of vector database as part of a RAG setup. Then, depending on the user request, you retrieve whatever information is relevant by comparing embeddings of the text, and hand those excerpts to the model as context. That retrieval layer should do most of the heavy lifting for you. I’ll admit I haven’t tried this specifically with the Realtime API, but the principles should be the same.
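Something along these lines is what I mean. It’s a rough sketch (I haven’t run it against the Realtime API), and the chunk size, embedding model, and helper names are illustrative rather than prescriptive. You’d take the retrieved excerpts and send them to the session as an input_text conversation item (as in the sketch further up) before calling response.create:

```python
# Rough sketch of the retrieval side: embed the document chunks once, then at
# query time pull the most similar chunks and pass them to the Realtime
# session as text context. Chunk size, model name, and helpers are illustrative.
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"


def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts with the OpenAI embeddings endpoint."""
    response = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [item.embedding for item in response.data]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)


def build_index(documents: list[str], chunk_size: int = 1500) -> list[tuple[str, list[float]]]:
    """Naively chunk the extracted PDF/Doc text and embed each chunk.
    A production setup would use a proper vector database instead."""
    chunks = [
        doc[i:i + chunk_size]
        for doc in documents
        for i in range(0, len(doc), chunk_size)
    ]
    return list(zip(chunks, embed(chunks)))


def retrieve(index: list[tuple[str, list[float]]], query: str, top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the user's question."""
    [query_vec] = embed([query])
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Usage: context = "\n\n".join(retrieve(index, user_question))
# then send `context` as an input_text conversation item before response.create.
```

Keeping only the top few chunks per request is also what keeps you clear of the token limits you mentioned.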