Allow generating restricted client API tokens! GPT-4o's low latency is wasted when forwarding client requests

Please allow generation of restricted client API tokens!

Routing client requests through backend servers defeats the purpose of all the amazing work done on improving GPT-4o latency. It can roughly double the latency in some cases: 320 ms is advertised for GPT-4o, and a backend hop can easily add another 300 ms, for about 620 ms in total.

Minimum features requested:

  • API call to generate a client token.
  • Set expiry.
  • Set rate limit.
  • Set allowed endpoints.
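
To make the request concrete, here is a rough sketch of how such a token might behave. Nothing here is an existing OpenAI API: `ClientToken`, `issue_client_token`, and `authorize` are hypothetical names, and the one-minute sliding window is just one assumed way to express a rate limit. It only illustrates the three restrictions listed above (expiry, rate limit, allowed endpoints).

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class ClientToken:
    """Hypothetical restricted client token (not a real OpenAI object)."""
    value: str
    expires_at: float              # Unix time after which the token is rejected
    max_requests_per_minute: int   # rate limit for this token
    allowed_endpoints: frozenset   # endpoint paths the token may call
    recent: list = field(default_factory=list)  # times of recently accepted calls


def issue_client_token(ttl_seconds, max_requests_per_minute, allowed_endpoints, now=None):
    """What the backend would request once, then hand to the client."""
    now = time.time() if now is None else now
    return ClientToken(
        value=secrets.token_urlsafe(32),
        expires_at=now + ttl_seconds,
        max_requests_per_minute=max_requests_per_minute,
        allowed_endpoints=frozenset(allowed_endpoints),
    )


def authorize(token, endpoint, now=None):
    """What the API side would check on each direct client request."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False  # token has expired
    if endpoint not in token.allowed_endpoints:
        return False  # endpoint is not on the allow list
    # Keep only calls from the last 60 seconds (assumed sliding window).
    token.recent[:] = [t for t in token.recent if now - t < 60]
    if len(token.recent) >= token.max_requests_per_minute:
        return False  # rate limit reached
    token.recent.append(now)
    return True
```

With something like this, the backend would only be involved when issuing the token, and the client could then call the allowed endpoint directly without the extra hop.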

If you agree, please add a like!
