Fantastic GPT-4o, but... where are the docs? There's a link for docs on GPT-4o, but it points to a page that has no mention of GPT-4o.

Is this already available on the API? I guess so, because the Playground shows it when I'm logged in.

Has anyone tried it yet?




Example script to use it:

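from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    # image URL was left blank in the original post; supply your own
                    "image_url": {"url": ""},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)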


The first thing to note is that this model uses a different token encoder, o200k_base. If you count tokens yourself, you'll need to grab the tiktoken release that came out 15 minutes ago.

MODEL_PREFIX_TO_ENCODING: dict[str, str] = {
    # chat
    "gpt-4o-": "o200k_base",  # e.g., gpt-4o-2024-05-13
    "gpt-4-": "cl100k_base",  # e.g., gpt-4-0314, etc., plus gpt-4-32k
    # (remaining entries omitted)
}
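To show how a prefix table like this resolves a dated model name, here is a minimal sketch: check an exact-name table first, then fall back to the first matching prefix. It is modeled on the way tiktoken appears to use these tables, but `encoding_name_for` and the abbreviated tables below are hypothetical stand-ins written for illustration, not tiktoken's API. In practice you would just call `tiktoken.encoding_for_model("gpt-4o")` on a release new enough to know the model.

```python
# Hypothetical sketch: model name -> encoding name, exact match first, then prefix.
# These tables are abbreviated stand-ins, not tiktoken's full lists.
MODEL_TO_ENCODING: dict[str, str] = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

MODEL_PREFIX_TO_ENCODING: dict[str, str] = {
    "gpt-4o-": "o200k_base",  # e.g., gpt-4o-2024-05-13
    "gpt-4-": "cl100k_base",  # e.g., gpt-4-0314, gpt-4-32k
    "gpt-3.5-turbo-": "cl100k_base",  # e.g., gpt-3.5-turbo-0125
}


def encoding_name_for(model: str) -> str:
    """Return the encoding name for a model: exact names first, then prefixes."""
    if model in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model]
    for prefix, encoding in MODEL_PREFIX_TO_ENCODING.items():
        if model.startswith(prefix):
            return encoding
    raise KeyError(f"no known encoding for model {model!r}")


if __name__ == "__main__":
    for name in ["gpt-4o", "gpt-4o-2024-05-13", "gpt-4-0125-preview"]:
        print(name, "->", encoding_name_for(name))
```

Because "gpt-4o-2024-05-13" does not start with "gpt-4-" (the sixth character is "o", not "-"), the two prefixes never collide, so their order in the table doesn't matter here. The bare name "gpt-4o" needs the exact-name table, since it doesn't match the "gpt-4o-" prefix.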

I found that a response capped at max_tokens=256 (with usage also reporting 256) counts as 258-268 tokens when my script re-encodes it with cl100k_base, so the new encoder is a hair more efficient on English.

However, 512 gpt-4o tokens of Japanese output come to 702 tokens under cl100k_base (the GPT-4 encoding).
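Those comparisons boil down to simple ratios. A quick sketch of the arithmetic, using only the counts quoted above (the helper name `extra_tokens_pct` is made up for illustration):

```python
def extra_tokens_pct(o200k_count: int, cl100k_count: int) -> float:
    """Percent more tokens cl100k_base needs for the same text than o200k_base."""
    return (cl100k_count / o200k_count - 1) * 100


# English: a 256-token gpt-4o response re-encodes to 258-268 cl100k_base tokens.
english_low = extra_tokens_pct(256, 258)
english_high = extra_tokens_pct(256, 268)
# Japanese: 512 gpt-4o tokens re-encode to 702 cl100k_base tokens.
japanese = extra_tokens_pct(512, 702)

print(f"English: +{english_low:.1f}% to +{english_high:.1f}%")
print(f"Japanese: +{japanese:.1f}%")
```

That works out to roughly +0.8% to +4.7% for English and about +37% for Japanese. Put the other way, gpt-4o needs about 27% fewer tokens (1 - 512/702) for the same Japanese text.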

Day-0 speed (110 tokens/s is about where gpt-3.5-turbo-instruct maxed out in the hours after its release):

For 3 trials of gpt-4-0125-preview @ 2024-05-13 11:03AM:

Stat                      Minimum   Maximum   Average
stream rate (tokens/s)    24.4      31.8      27.267
latency (s)               0.78      1.103     0.900
total response (s)        8.848     11.2203   10.373
total rate (tokens/s)     22.816    28.933    24.972
response tokens           256       256       256.000

For 3 trials of gpt-4o @ 2024-05-13 11:03AM:

Stat                      Minimum   Maximum   Average
stream rate (tokens/s)    107.6     112.9     110.200
latency (s)               0.3701    0.525     0.434
total response (s)        2.6818    2.8598    2.773
total rate (tokens/s)     90.216    96.204    93.342
response tokens           258       260       258.667

The response arrives in about 5% fewer streaming chunks.
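The benchmark script itself isn't shown, but the averages are roughly consistent with total rate = response tokens / total response time and stream rate = response tokens / (total response time - latency); for gpt-4o, 258.667 / (2.773 - 0.434) is about 110.6 against a reported 110.2. Here is a hypothetical sketch of how per-trial numbers and the Min/Max/Avg rows could be computed. The `Trial` record and both rate formulas are my assumptions inferred from the tables, not the original script.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One streamed request (hypothetical record, not the original script's format)."""
    latency_s: float  # time from sending the request to the first chunk
    total_s: float    # time from sending the request to the last chunk
    tokens: int       # completion tokens reported in usage

    @property
    def stream_rate(self) -> float:
        # Assumed definition: tokens per second once streaming has started.
        return self.tokens / (self.total_s - self.latency_s)

    @property
    def total_rate(self) -> float:
        # Tokens per second over the whole request, latency included.
        return self.tokens / self.total_s


def summarize(values: list[float]) -> tuple[float, float, float]:
    """Return (minimum, maximum, average), matching the table columns."""
    return min(values), max(values), mean(values)


def report(trials: list[Trial]) -> dict[str, tuple[float, float, float]]:
    """Build one Min/Max/Avg row per statistic across all trials."""
    return {
        "stream rate (tokens/s)": summarize([t.stream_rate for t in trials]),
        "latency (s)": summarize([t.latency_s for t in trials]),
        "total response (s)": summarize([t.total_s for t in trials]),
        "total rate (tokens/s)": summarize([t.total_rate for t in trials]),
        "response tokens": summarize([float(t.tokens) for t in trials]),
    }


if __name__ == "__main__":
    # Illustrative inputs only, not measurements.
    demo = [Trial(0.5, 2.5, 250), Trial(0.25, 2.25, 260)]
    for stat, (lo, hi, avg) in report(demo).items():
        print(f"{stat:<24} {lo:>9.3f} {hi:>9.3f} {avg:>9.3f}")
```

Note that averaging per-trial rates (as the tables appear to do) gives a slightly different number than dividing the average tokens by the average time, which is why 256 / 10.373 doesn't exactly equal the reported 24.972.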


Nice chart, _j, thanks!

I just spent the last few weeks fine-tuning GPT-3.5 and got some awesome speeds, but comparing it to gpt-4o now makes me feel like I wasted my time! I hope we can fine-tune gpt-4o ASAP. Very nice, though.


Do you know how to use mp3 or mp4 files with this model?

As per the docs, it doesn’t seem to support audio input at this time.


Oh, so this is in fact a better, faster, and cheaper gpt-4-vision? There is no voice input or output.

So, apart from supplying the image URL or other input as shown in a reply, is the cURL request the same?

I'm waiting for the docs on using audio in the API; that will be a killer feature (I'm not so keen on video yet, for my use cases).

Does anyone have any idea when this will be supported?

There’s an official staff post here:

The curl request looks the same. As usual, you can go to the Playground and click the show code button (</>) at the top right.

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "hello hello"}
      ]
    },
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "what do you see?"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "This image appears to be an illustration containing several eyes grouped together in a circular, cross-like pattern. Each eye has a black pupil with a white background. \n\nAt the top-left corner, a speech bubble contains the following dialogue:\n\"It looks like you are afraid\"\n\"BE NOT AFRAID\" (in larger, less neat text)\n\nBelow this dialogue, there are two additional speech bubbles:\n\"I AM AFRAID\"\n\"I'\''M NOT AFRAID\"\n\nThe entire image is set against a light background, and there are several black splatters around the edges of the illustration. The eyes are connected, giving the impression of an abstract creature or object."
        }
      ]
    }
  ],
  "temperature": 1,
  "max_tokens": 256,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0
}'
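If you'd rather stay in Python without the SDK, here is a sketch of the same kind of request built with only the standard library. It assembles the JSON body and a `urllib.request.Request` for the standard Chat Completions endpoint, and only goes over the network if you call `send()`. The helper names `build_payload`, `build_request`, and `send` are hypothetical, and the image URL is left to you since it was elided in the post.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_payload(image_url: str, question: str = "what do you see?") -> dict:
    """Build a gpt-4o chat request with one text part and one image part."""
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": [{"type": "text", "text": "hello hello"}]},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
        "temperature": 1,
        "max_tokens": 256,
    }


def build_request(payload: dict) -> urllib.request.Request:
    """Wrap the payload in a POST with the same headers as the curl call."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the reply text (needs network and a valid key)."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage would be something like `print(send(build_request(build_payload("https://example.com/your-image.png"))))`, where the example.com URL is only a placeholder.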