API Gateway - Stream Buffering and Token Usage Response Headers

We are using OpenAI behind our enterprise API gateway. Some of our consumers want to use SSE (streamed responses). However, we face a challenge capturing token usage information: it is only available in the response body, and when the gateway introspects the body it buffers the response, which breaks the stream to the consumer.

To address this, we would like to propose a solution to the OpenAI product team: return the following values as response headers:

  • model
  • prompt_tokens
  • completion_tokens
  • total_tokens

Adding these headers to the response would provide the necessary data without requiring us to introspect the body. The approach is simple and non-intrusive: the values arrive with the initial response, so proxies and gateways can read them without any interruption to the stream our consumers receive. A sketch of how a gateway could consume such headers follows below.
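To illustrate, here is a minimal sketch (TypeScript, Node 18+) of what this would enable at a gateway. The `x-openai-*` header names, the upstream URL handling, and the simplified auth forwarding are assumptions made for the example; these headers are what this proposal asks for, not an existing feature of the API:

```typescript
// Minimal sketch of a pass-through gateway handler (Node 18+).
// The x-openai-* headers are hypothetical -- they are the proposal.
import { createServer } from "node:http";

const UPSTREAM = "https://api.openai.com/v1/chat/completions";

const server = createServer(async (req, res) => {
  // Collect the consumer's request body and forward it upstream.
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);

  const upstream = await fetch(UPSTREAM, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: req.headers.authorization ?? "",
    },
    body: Buffer.concat(chunks),
  });

  // The proposed headers arrive with the initial response, so usage
  // can be recorded here, before a single SSE event is forwarded and
  // without ever parsing (and therefore buffering) the body.
  console.log("usage", {
    model: upstream.headers.get("x-openai-model"),
    promptTokens: upstream.headers.get("x-openai-prompt-tokens"),
    completionTokens: upstream.headers.get("x-openai-completion-tokens"),
    totalTokens: upstream.headers.get("x-openai-total-tokens"),
  });

  // Pipe the event stream straight through to the consumer, unbuffered.
  res.writeHead(upstream.status, {
    "content-type":
      upstream.headers.get("content-type") ?? "text/event-stream",
  });
  for await (const chunk of upstream.body!) res.write(chunk);
  res.end();
});

server.listen(8080);
```

Because the usage data comes from headers rather than the body, the handler never has to inspect the event stream itself.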

Could someone please review this proposal? We have a pressing need for it within our organisation.