Parallel Response calls to the same Conversation

Hi, correct me if I'm wrong, but my understanding of the Responses API and Conversations API is this: the Conversations API is used to store Responses, and the Responses API is used to create Responses when given said Conversation IDs.

My use case is that I want to generate two different responses (two parallel API calls) every time a user message is appended to a particular conversation. But I get "Another process is currently operating on this conversation. Please retry in a few seconds."

Is there any way I can make this happen? Thanks!


Hi, welcome to the community!

Your understanding is correct. Conversations hold the history, and the Responses API generates responses within those conversations. You’re seeing the lock message because each conversation object can only handle one active request at a time.

A good workaround is to retrieve your existing conversation, create a copy (a new conversation object) with the same items, and then run your two API calls simultaneously—each using its own separate conversation.

I’ve put together a small Python script as an example of how this can be done.
Hope this helps!

Sample script to branch a conversation
import os
from getpass import getpass
from openai import OpenAI


def get_client() -> OpenAI:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        api_key = getpass("OPENAI_API_KEY: ").strip()
    return OpenAI(api_key=api_key)


def get_all_conversation_items(client: OpenAI, conversation_id: str):
    all_items = []
    after = None

    while True:
        page = client.conversations.items.list(
            conversation_id,
            limit=100,
            order="asc",
            after=after,
        )
        all_items.extend(page.data)
        if not page.has_more:
            break
        after = page.last_id

    return all_items


def fork_conversation(client: OpenAI, conversation_id: str, metadata=None):
    items = get_all_conversation_items(client, conversation_id)
    cloned_items = [
        {
            "type": "message",
            "role": item.role,
            "content": item.content,
        }
        for item in items
        if item.type == "message"
    ]

    return client.conversations.create(
        metadata=metadata or {},
        items=cloned_items,
    )


def ask_followup(client: OpenAI, conv_id: str, user_message: str, model: str) -> str:
    response = client.responses.create(
        model=model,
        conversation=conv_id,
        input=[{"role": "user", "content": user_message}],
        max_output_tokens=256,
    )
    return response.output_text


def main():
    MODEL = "gpt-4.1-mini"
    client = get_client()

    # Base conversation
    base_conv = client.conversations.create(
        metadata={"topic": "branching-demo"},
        items=[
            {
                "type": "message",
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": (
                            "We are planning a weekend trip to Paris. "
                            "We like museums, good food, and walking around the city."
                        ),
                    }
                ],
            }
        ],
    )

    print(f"Base conversation id: {base_conv.id}")

    first_resp = client.responses.create(
        model=MODEL,
        conversation=base_conv.id,
        input=[
            {
                "role": "user",
                "content": "Give me a quick 3-bullet itinerary.",
            }
        ],
        max_output_tokens=256,
    )

    print("\nFirst answer:")
    print(first_resp.output_text)

    # Fork into two branches
    conv_a = fork_conversation(
        client,
        base_conv.id,
        metadata={"branch": "A", "source_conversation": base_conv.id},
    )
    conv_b = fork_conversation(
        client,
        base_conv.id,
        metadata={"branch": "B", "source_conversation": base_conv.id},
    )

    # Two different follow-ups
    answer_a = ask_followup(
        client,
        conv_a.id,
        "Continue this conversation, but respond like a very formal travel agent.",
        MODEL,
    )

    answer_b = ask_followup(
        client,
        conv_b.id,
        "Continue this conversation, but respond like a funny travel buddy.",
        MODEL,
    )

    print("\nBranch A (formal agent):")
    print(answer_a)

    print("\nBranch B (funny buddy):")
    print(answer_b)


if __name__ == "__main__":
    main()


Conversations should additionally be understood as stateful: each time a response is run against one, the newest user input and the AI output are automatically appended to it. That is why this idea cannot work in conjunction with the conversation object's method of maintaining chat history.

Better would be to use the previous_response_id mechanism for chaining prior responses, each carrying an intrinsic chat history. You can then reuse the same ID multiple times, with each call creating a separate output that has its own response ID to branch from.
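A minimal sketch of that idea, assuming the openai Python SDK. The helper names (branch_request, run_branches) and the model string are illustrative, not part of the library; the point is that both branches reuse the same previous_response_id, so no conversation lock is involved:

```python
# Sketch: branch two parallel responses off one prior response using
# previous_response_id instead of a shared conversation object.
from concurrent.futures import ThreadPoolExecutor

MODEL = "gpt-4.1-mini"  # illustrative; use any Responses-capable model


def branch_request(previous_response_id: str, user_message: str) -> dict:
    # Build the kwargs for client.responses.create(). Both branches pass
    # the same previous_response_id, so both inherit the same history.
    return {
        "model": MODEL,
        "previous_response_id": previous_response_id,
        "input": [{"role": "user", "content": user_message}],
        "max_output_tokens": 256,
    }


def run_branches(client, previous_response_id: str, prompts: list[str]):
    # Unlike a conversation object, a response ID is immutable, so the
    # two calls can genuinely run at the same time without the
    # "Another process is currently operating..." error.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        futures = [
            pool.submit(
                client.responses.create,
                **branch_request(previous_response_id, p),
            )
            for p in prompts
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    seed = client.responses.create(
        model=MODEL,
        input="We are planning a weekend trip to Paris.",
    )
    answer_a, answer_b = run_branches(
        client,
        seed.id,
        [
            "Respond like a very formal travel agent.",
            "Respond like a funny travel buddy.",
        ],
    )
    print(answer_a.output_text)
    print(answer_b.output_text)
```

Each returned response then has its own id, so either branch can be continued independently by passing that id as the next previous_response_id.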

The Chat Completions API natively supports an n= parameter which, for the cost of one input, delivers the specified number of outputs with different sampling variations in the generation. There, you provide your own retention of any input messages that make up the prior chat. This is truly "parallel responses", as they arrive as response.choices[N] in the API response array. The Responses API has not implemented this, among other missing parameters and features.
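A minimal sketch of the n= approach, again assuming the openai Python SDK; variants_request is an illustrative helper, and the caller-managed history list stands in for the state a conversation object would otherwise hold:

```python
# Sketch: n= on Chat Completions samples several completions for a
# single input; the caller keeps the chat history itself.
def variants_request(history: list[dict], user_message: str, n: int = 2) -> dict:
    # Build the kwargs for client.chat.completions.create().
    return {
        "model": "gpt-4.1-mini",  # illustrative model choice
        "messages": history + [{"role": "user", "content": user_message}],
        "n": n,  # number of sampled variations to return
        "max_tokens": 256,
    }


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [
        {"role": "user", "content": "We are planning a weekend trip to Paris."},
    ]
    resp = client.chat.completions.create(
        **variants_request(history, "Give me a quick 3-bullet itinerary.")
    )
    # Each variation arrives as its own entry in resp.choices[N].
    for i, choice in enumerate(resp.choices):
        print(f"--- choice {i} ---")
        print(choice.message.content)
    # To continue the chat, append whichever choice you pick to history.
    history.append(
        {"role": "assistant", "content": resp.choices[0].message.content}
    )
```

Input tokens are billed once while each of the n outputs is billed separately, so this is the cheapest way to get genuine same-prompt variations.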
