Actions that return images for inline viewing

I am trying to put together a Custom GPT with several custom actions. For one of them I would like to set up Kroki to enable a wide selection of diagram rendering (I realize ChatGPT has some support built in, but I will have the same problem with some other tools).

I have generated an OpenAPI schema for Kroki (kroki.io) that seems to work fine through Postman and similar tools. I set it up as an action in my custom GPT, but there it fails to work.

I am using this schema:

openapi: 3.1.0
info:
  title: Kroki API
  description: API for generating diagrams from text.
  version: 1.0.0
servers:
  - url: https://kroki.io
paths:
  /:
    post:
      operationId: generateDiagram
      summary: Generate a diagram from plain text
      description: Converts a plain text diagram definition into an image.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                diagram_source:
                  type: string
                  description: The diagram definition in plain text.
                diagram_type:
                  type: string
                  description: The type of diagram (e.g., graphviz, mermaid).
                output_format:
                  type: string
                  description: The desired output format.
                  enum:
                    - png
                    - svg
                    - jpeg
              required:
                - diagram_source
                - diagram_type
                - output_format
      responses:
        "200":
          description: Successful response containing the generated diagram.
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
        "400":
          description: Invalid request body.
        "500":
          description: Internal server error.
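
For reference, the request this schema describes can be sketched in Python roughly as below. This is only a minimal sketch using the standard library. The helper names `build_kroki_request` and `fetch_diagram` are made up for illustration, and the default `output_format` of `svg` is an arbitrary choice. The URL and the three JSON fields come straight from the schema above.

```python
import json
import urllib.request

# Server URL and path taken from the schema above.
KROKI_URL = "https://kroki.io/"


def build_kroki_request(diagram_source, diagram_type, output_format="svg"):
    """Build (but do not send) the POST request described by the schema.

    Hypothetical helper: the name and the default format are illustrative.
    """
    body = {
        "diagram_source": diagram_source,
        "diagram_type": diagram_type,
        "output_format": output_format,
    }
    return urllib.request.Request(
        KROKI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_diagram(request):
    """Send the request and return (content_type, raw_bytes).

    Hypothetical helper: performs a real network call when invoked.
    """
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type"), response.read()
```

Calling `fetch_diagram(build_kroki_request("digraph G { a -> b }", "graphviz"))` should return the raw image bytes together with the `Content-Type` header the server sends, which makes it possible to check whether the response really is `application/octet-stream` as the schema declares.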

What does the ChatGPT UI expect from an action that returns an image?