[Realtime] Server Error When Forcing Specific Tool

I’m getting this server error, with no details, when I send an out-of-band response.create event using the gpt-realtime model (GA).

{
  "event_id": "event_CR4yVT961RcMPnsGP4Wgf",
  "type": "response.done",
  "response": {
    "id": "resp_CR4yVgPDDZHhw2vvs366E",
    "object": "realtime.response",
    "status": "failed",
    "status_details": {
      "error": {
        "type": "server_error"
      },
      "type": "failed"
    },
    "output": [],
    "usage": {
      "total_tokens": 0,
      "input_tokens": 0,
      "output_tokens": 0,
      "input_token_details": {
        "cached_tokens": 0,
        "text_tokens": 0,
        "audio_tokens": 0,
        "cached_tokens_details": {
          "text_tokens": 0,
          "audio_tokens": 0
        }
      },
      "output_token_details": {
        "text_tokens": 0,
        "audio_tokens": 0
      }
    },
    "max_output_tokens": "inf",
    "metadata": {
      "response_type": "out-of-band"
    }
  }
}

This is the event I’m sending.

{
  "type": "response.create",
  "response": {
    "instructions": "Summarize the conversation so far",
    "tools": [
      {
        "type": "function",
        "name": "return_response",
        "description": "Return the response to the agent",
        "parameters": {
          "type": "object",
          "properties": {
            "response": {
              "type": "string"
            }
          },
          "required": [
            "response"
          ],
          "additionalProperties": false
        }
      }
    ],
    "tool_choice": {
      "type": "function",
      "name": "return_response"
    },
    "conversation": "none",
    "output_modalities": [
      "text"
    ],
    "metadata": {
      "response_type": "out-of-band"
    }
  }
}

I don’t get an error if I change tool_choice to a string. According to the API reference, though, I should be able to specify an object, so is this a bug?
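For reference, here is a minimal Python sketch of the string variant that avoids the error. It assumes the string is "required", which with only one tool in the list should have the same effect as naming the function. The helper name build_oob_event is hypothetical and not part of any SDK; the payload fields are copied from the event above.

```python
import json

# Tool definition copied from the failing event above.
RETURN_RESPONSE_TOOL = {
    "type": "function",
    "name": "return_response",
    "description": "Return the response to the agent",
    "parameters": {
        "type": "object",
        "properties": {"response": {"type": "string"}},
        "required": ["response"],
        "additionalProperties": False,
    },
}


def build_oob_event(instructions, tool, tool_choice="required"):
    """Build an out-of-band response.create event (hypothetical helper).

    tool_choice defaults to the string "required" (an assumption); with a
    single tool supplied, the model has only that tool to call.
    """
    return {
        "type": "response.create",
        "response": {
            "instructions": instructions,
            "tools": [tool],
            "tool_choice": tool_choice,
            "conversation": "none",
            "output_modalities": ["text"],
            "metadata": {"response_type": "out-of-band"},
        },
    }


event = build_oob_event("Summarize the conversation so far", RETURN_RESPONSE_TOOL)
payload = json.dumps(event)  # string to send over the Realtime connection
```

Passing the original object as tool_choice to the same helper reproduces the failing event, which makes it easy to compare the two forms side by side.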

I think your intention is to force the model to call your tool with a summary? There’s a good OpenAI example of best practice for this: Context Summarization with Realtime API

Apologies if that’s not what you’re trying to do and the goal is just to understand in principle why your call is failing. If so, I can’t say much: I’ve never tried forcing a tool call to return the content I want.
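If you do keep the forced-tool approach, a rough Python sketch of reading the result back out of response.done might look like the following. It assumes a successful response carries an output item of type "function_call" with a JSON string in "arguments"; that shape is an assumption here, since the failed event above has an empty output list. The function name extract_tool_arguments is hypothetical.

```python
import json


def extract_tool_arguments(event, tool_name):
    """Return the parsed arguments of a forced tool call, or None.

    Raises RuntimeError when the response failed, as in the
    server_error event shown in the question.
    """
    if event.get("type") != "response.done":
        return None

    response = event.get("response", {})
    if response.get("status") == "failed":
        error = response.get("status_details", {}).get("error", {})
        raise RuntimeError(f"response failed: {error.get('type', 'unknown')}")

    # Assumed item shape: {"type": "function_call", "name": ..., "arguments": "<json>"}
    for item in response.get("output", []):
        if item.get("type") == "function_call" and item.get("name") == tool_name:
            return json.loads(item.get("arguments") or "{}")
    return None
```

Raising on the failed status keeps the server_error case from being silently treated as "no summary produced".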