Hello, we’re currently looking into migrating from the deprecated Assistants API to the new Responses API.
Old implementation
For our use case, we used the API to create a new Assistant for each of our clients. This also made it possible to keep the instructions given to each client’s Assistant on our platform.
The problem we’re facing
With the new Responses API, Prompts are the new Assistants. The problem with Prompts is that it’s not possible to CRUD them via the API.
Our solution
What we came up with after reading the guide is to use a single Prompt with variables for each client (user-data variables only), and to pass the instructions via the Response object.
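Roughly, the per-client call would look like this (a sketch using the official Python SDK; the prompt ID, variable names, and instruction text are placeholders, and we still need to verify that “prompt” and “instructions” can be combined this way):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One shared Prompt for all clients, parameterized with user-data variables;
# the per-client instruction text travels in the request itself.
response = client.responses.create(
    model="gpt-4.1",
    prompt={
        "id": "pmpt_abc123",  # placeholder ID of the single shared Prompt
        "variables": {"client_name": "Acme", "plan": "enterprise"},
    },
    instructions="You are the support assistant for Acme. ...",  # per client
    input="Hello, what can you help me with?",
)
print(response.output_text)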
Questions:
Is there a better way to solve the problem we are facing (not being able to create Prompts dynamically)? And is our solution a good idea?
Is there a set limit on the Response object’s instructions property?
using “prompts”: completely optional (with issues, because you cannot retrieve their contents to run “include” parameters on tools correctly, or alter them via the API);
using “variables”: quite optional; a cache-breaking workaround for those prompts being unalterable via the API;
using “conversations”: completely optional (with issues: near-unlimited input costs, no way to clean up an expired code interpreter container, issues with tools vs. reasoning, and storage of unwanted response state); not meant for an initial instruction.
using “previous_response_id”: completely optional; an unmanageable state you cannot migrate out of, but from which you can branch.
even using “instructions” as an API parameter: completely optional.
You can set “store”: false and construct every turn yourself, per API call: initial system/developer-role messages as “instructions”, previous chat history messages that you’ve retained yourself and manage in length, then the newest input (which you can shape by placing further temporary messages near it), and even every sequence of function calls.
“instructions” is a per-turn, every-turn field inserted before any “input” messages you pass. It also comes before prior context supplied by “conversation” or “previous_response_id” (and when you employ either of those, instructions become near-mandatory once conversations grow longer than the model input and the first messages are discarded). It is most like the instructions field of an “Assistant”, except that you furnish the text each time instead of the assistant ID each time.
So the way to solve the problems with the offered solutions is not to use these turnkey proprietary generics.
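For illustration, a minimal sketch of that fully self-managed pattern over plain HTTP with httpx (the model choice and the keep-the-last-20-messages trimming policy are placeholder assumptions, not recommendations):

import os
import httpx

history: list[dict] = []  # the chat history you retain and trim yourself

def run_turn(user_text: str, client_instructions: str) -> str:
    history.append({"role": "user", "content": user_text})
    payload = {
        "model": "gpt-4.1-mini",               # placeholder model choice
        "instructions": client_instructions,   # per-client text, sent every turn
        "input": history[-20:],                # your own length management
        "store": False,                        # no server-side response state
    }
    response = httpx.post(
        "https://api.openai.com/v1/responses",
        json=payload,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        timeout=120,
    )
    response.raise_for_status()
    # Collect all output_text parts from the response's output items
    reply = "".join(
        content["text"]
        for output in response.json().get("output", [])
        for content in output.get("content", [])
        if content.get("type") == "output_text"
    )
    history.append({"role": "assistant", "content": reply})
    return reply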
Thanks for the detailed breakdown.
We’re mostly looking for a practical pattern to pass per-client instructions each turn. From your experience, is there any hidden limit or best practice for how long the instructions field can be?
I’ve got 20 MB of dereferenced OpenAPI specification over here - let’s see how much of it can be ingested. Encode half a million tokens for a million-token model, perhaps?
import json
import os
from pathlib import Path

import httpx
import tiktoken

INPUT_FILENAME: str = "openai.documented.yml"
MAX_TOKENS: int = 500_000


def get_api_key_headers() -> dict[str, str]:
    # Read the API key from the environment at call time
    return {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}"}


try:
    base_dir = Path(__file__).parent
except NameError:  # running interactively, where __file__ is undefined
    base_dir = Path.cwd()

text_path = base_dir / INPUT_FILENAME
text: str = text_path.read_text(encoding="utf-8")

# Tokenize with the o200k_base encoding and truncate to the token budget
enc = tiktoken.get_encoding("o200k_base")
tokens: list[int] = enc.encode(text, disallowed_special=())
truncated_tokens: list[int] = tokens[:MAX_TOKENS]
doc: str = enc.decode(truncated_tokens)

payload = {
    "model": "gpt-4.1-nano",
    "instructions": f"Reference documentation retrieval: {doc}",
    "input": "What is the topic of the documentation?",
    "store": False,
    "max_output_tokens": 200,
}

response = httpx.post(
    "https://api.openai.com/v1/responses",
    json=payload,
    headers=get_api_key_headers(),
    timeout=300,
)
try:
    response.raise_for_status()
    # Collect all output_text parts from the response's output items
    assistant_texts = [
        content["text"]
        for output in response.json().get("output", [])
        for content in output.get("content", [])
        if content.get("type") == "output_text" and "text" in content
    ]
    print("\n---\n\nCollected response text:\n" + str(assistant_texts))
    print(response.json().get("usage", {}))
except httpx.HTTPStatusError:
    print(
        response.status_code,
        json.loads(response.content.decode())["error"]["message"],
    )
Nope: OpenAI is counting characters, not tokens.
400 Invalid 'instructions': string too long. Expected a string with maximum length 1048576, but got a string with length 3896434 instead.
So if you’ve got under a megabyte of “instructions” for the AI to follow (under 1 MB of network traffic on each turn), you should be okay. The rest will have to go into more messages.
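If you do hit that 1,048,576-character cap, the spillover can go into ordinary “input” messages instead; a sketch, reusing doc from the script above and assuming a developer-role input message will accept text of this size:

PREFIX = "Reference documentation retrieval: "
LIMIT = 1_048_576  # observed maximum length of the "instructions" string

head = doc[: LIMIT - len(PREFIX)]   # fills "instructions" up to the cap
rest = doc[LIMIT - len(PREFIX):]    # everything past the cap

payload = {
    "model": "gpt-4.1-nano",
    "instructions": PREFIX + head,
    "input": [
        # spillover placed as a developer-role message before the question
        {
            "role": "developer",
            "content": "Reference documentation, continued: " + rest,
        },
        {"role": "user", "content": "What is the topic of the documentation?"},
    ],
    "store": False,
    "max_output_tokens": 200,
}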