What is this new Streaming parameter?

I’m just seeing this new parameter and have no idea what it is.

It basically allows you to receive the tokens back in small batches as they are produced, so you can give the appearance of live generation, like with ChatGPT.

2 Likes

What does “appearance of generation” mean?

Streaming is the sending of words, one at a time, as they are created by the AI language model, so you can show them as they are being generated.

(Technical: a subscription to server-sent events pushed by the API.)
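
For illustration, a rough sketch of what that looks like with the openai Node SDK (v4 style); the client construction, model, and prompt here are placeholders, not anything from the thread:

  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Say hello" }],
    stream: true, // the parameter in question
  });

  for await (const chunk of stream) {
    // Each chunk carries a small slice of the reply; show it as it arrives.
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }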

1 Like

Oh, no shit, huh? Does that cost more, or is it just a fun thing you can add?

It is the same cost per token.

Instead of waiting 30 seconds for the complete answer, you start receiving almost immediately.

It is not “just fun”: imagine if you had to stare at ChatGPT for half a minute wondering what it was going to say.

4 Likes

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

So this works fine, but when I add stream: true…

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    stream: true,
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

  res.json({
    message: response.choices[0].message.content.trim(),
    usage: response.usage,
  });
});

All of a sudden I get this error:

    message: response.choices[0].message.content.trim(),
                             ^

TypeError: Cannot read properties of undefined (reading '0')
    at C:\Users\tvent\OneDrive\Desktop\gpt\server\index.js:204:30  
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)


Yes, you’re going to get chunk objects, a format documented in the API reference and dumped out for you just now:


{
  "id": "chatcmpl-82dsfjjaofldfa3mOIDJFOIJ",
  "object": "chat.completion.chunk",
  "created": 1695524999,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " for"
      },
      "finish_reason": null
    }
  ]
}

You keep receiving them until a chunk arrives with finish_reason set to stop or length.
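
A rough sketch of consuming that (assuming the v4 Node SDK, where the create() call with stream: true returns an async iterable, called stream here):

  let fullText = "";
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    if (choice.finish_reason) break;          // "stop" or "length" ends the stream
    fullText += choice.delta?.content ?? "";  // build up the complete reply piece by piece
  }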

Right, I’m saying that when I adjust my code, I don’t know how to get it to work.

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    stream: true,
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

  for await (const chunk of response) {
    console.log(chunk.choices[0].delta.content); // This correctly streams it in the terminal 
  }

  res.json({
    message: response.choices[0].delta.content.trim(), // does not stream to the front end
    usage: response.usage,
  });
});

Basically, how do I get the res.json to stream? I’m sending the message variable to the front end; that part works and is all set up. So how do I make it work inside my res.json instead of in that for await loop?

const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

So this works fine, but when I add stream: true…

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    stream: true, // adding stream
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

  for await (const chunk of response) {
    console.log(chunk.choices[0].delta.content); // this code from the doc runs
  }

  res.json({
    message: response.choices[0].delta.content.trim(), // this is causing the script to fail
    usage: response.usage, // What happened to the usage object?
  });
});

The console.log will run, but it doesn’t go to the res.json.
All of a sudden I get this error:

    message: response.choices[0].delta.content.trim(),
                             ^

TypeError: Cannot read properties of undefined (reading '0')
    at C:\Users\tvent\OneDrive\Desktop\gpt\server\index.js:208:30  
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

I’m trying to make sure that the streamed content gets attached to the message property in my res.json.

Any help is appreciated.

Really bad backend programmer here :slight_smile:

Also, what happens to the usage object if you choose to do streaming?

1 Like

The “all of a sudden” is likely because you are not handling the finish_reason case.
Or because you get a function_call, which just appears at the object root.

End of stream:


{
  "id": "chatcmpl-jdfj933jaf03kar03raf",
  "object": "chat.completion.chunk",
  "created": 1111111111,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "?"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "chatcmpl-82ojfoaFsD7",
  "object": "chat.completion.chunk",
  "created": 112222222,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
1 Like

So I should add stop to the end of the function?

It works in the console.log; it is streaming and logging it.

  for await (const chunk of response) {
    console.log(chunk.choices[0].delta.content); // this code from the doc runs
  }

It’s not working in the res.json:

  res.json({
    message: response.choices[0].delta.content.trim(), // this is causing the script to fail
    usage: response.usage, // What happened to the usage object?
  });
1 Like

I provided the second-to-last and the last objects received in a stream.

See that part in your code where it says .delta.content?

See any “content” in the last chunk?

You need to short-circuit out by detecting the finish_reason first.
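
Roughly, the loop needs a guard along these lines (a sketch using your variable names, not a drop-in fix):

  for await (const chunk of response) {
    const { delta, finish_reason } = chunk.choices[0];
    if (finish_reason) break;          // the final chunk has an empty delta: no content to read
    console.log(delta.content ?? "");  // guard so a content-less delta does not throw
  }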

You’re truly just not being clear.

It IS WORKING in the log, in my terminal.

It’s not passing to the front end; it’s the same chunk.


app.post("/j", async (req, res) => { ... ....
...

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    stream: true, // adding stream
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

  for await (const chunk of response) {
    console.log(chunk.choices[0].delta.content); // this code from the doc runs
  }

  res.json({
    message: response.choices[0].delta.content.trim(), // this is causing the script to fail
    usage: response.usage, // What happened to the usage object?
  });
});

The difference is this for await loop, but how do I use that within my res.json?

I don’t know the purpose of res.json in this context. The tools/environment you’re using are not my forte.

Typically, what you’ll need to do is have two different mechanisms going.

If it is not a finish_reason or a massive error:

  • display the content of the chunk
  • append the content of the chunk to a variable that captures the whole message

Then do message stuff, like chat history, with the completed message.
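
One hedged sketch of that two-mechanism idea in Express; the route name, req.body.message, and plain chunked text as the transport are assumptions, not your exact setup, and the front end has to read the response body incrementally (for example with fetch and response.body.getReader()) instead of awaiting one JSON object:

  app.post("/j", async (req, res) => {
    const stream = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "system", content: req.body.message }], // assumption: message arrives in the request body
      stream: true,
    });

    res.setHeader("Content-Type", "text/plain; charset=utf-8");

    let fullMessage = "";
    for await (const chunk of stream) {
      const choice = chunk.choices[0];
      if (choice.finish_reason) break;          // end of stream
      const piece = choice.delta?.content ?? "";
      fullMessage += piece;                     // mechanism 1: capture the whole message
      res.write(piece);                         // mechanism 2: push it to the client right away
    }

    res.end(); // streaming responses carry no usage object, so count tokens yourself if needed
  });

Once the loop finishes, fullMessage holds the complete reply for chat history, logging, or token counting.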

Here’s a little python chatbot at least.

What happens to the usage object if you convert to streaming?

You have to count tokens on your own with streaming, which is the trade-off…

How do I count my own tokens without being told what my tokenage is? -_-

If you search the forum, you can find out more on tiktoken.

Also, if you are embedding the inputs and outputs, which you should do anyway to maintain good context within the bot, a nice side effect of Ada-002 is that it reports the token counts.
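
For the counting itself, a minimal sketch assuming the tiktoken package on npm (the WASM port of OpenAI’s tokenizer), ES modules, and an accumulated reply in a fullMessage variable like the one above:

  import { encoding_for_model } from "tiktoken";

  // Streaming responses include no usage object, so measure the text yourself.
  const enc = encoding_for_model("gpt-3.5-turbo");
  const completionTokens = enc.encode(fullMessage).length;
  enc.free(); // the WASM encoder must be freed explicitly

  console.log(`completion tokens: ${completionTokens}`);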

1 Like