What is this new streaming parameter? I’m just seeing it for the first time and have no idea what it is.
It basically allows you to receive tokens back in batches, so that you can give the appearance of generation like with ChatGPT.
What does “appearance of generation” mean?
Streaming is the sending of words as they are created by the AI language model one at a time, so you can show them as they are being generated.
(technical: Subscription to a server-sent push event)
Oh no shit, huh? Does that cost more, or is it just a fun thing you can add?
It is the same cost per token.
Instead of waiting 30 seconds for the complete answer, you start receiving almost immediately.
It is not “just fun”, imagine if you had to stare at ChatGPT for half a minute wondering what it was going to say.
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});
So this works fine, but when I add stream: true…
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  stream: true,
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});

res.json({
  message: response.choices[0].message.content.trim(),
  usage: response.usage,
});
});
All of a sudden I get this error:
message: response.choices[0].message.content.trim(),
^
TypeError: Cannot read properties of undefined (reading '0')
at C:\Users\tvent\OneDrive\Desktop\gpt\server\index.js:204:30
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Yes, you’re going to get chunk objects back. The format is documented in the API reference; here’s one I just dumped out:
{
  "id": "chatcmpl-82dsfjjaofldfa3mOIDJFOIJ",
  "object": "chat.completion.chunk",
  "created": 1695524999,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " for"
      },
      "finish_reason": null
    }
  ]
}
You get them until finish_reason is "stop" or "length".
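A defensive way to read each chunk is to check finish_reason before touching delta.content. This is just a sketch (the helper name extractDelta is mine, not from the SDK), using the sample chunk shapes shown above:

```javascript
// Sketch: safely pull streamed text out of a chat.completion.chunk.
// Returns null when the stream is finished (finish_reason is set) or
// when the chunk carries no content.
function extractDelta(chunk) {
  const choice = chunk.choices?.[0];
  if (!choice || choice.finish_reason) return null; // "stop" / "length" => done
  return choice.delta?.content ?? null;
}

// Two chunk shapes like the ones in this thread:
const midChunk = {
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: { content: " for" }, finish_reason: null }],
};
const lastChunk = {
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
};

console.log(extractDelta(midChunk)); // " for"
console.log(extractDelta(lastChunk)); // null
```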
Right, I’m saying that when I adjust my code, I don’t know how to get it to work.
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  stream: true,
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});

for await (const chunk of response) {
  console.log(chunk.choices[0].delta.content); // This correctly streams it in the terminal
}

res.json({
  message: response.choices[0].delta.content.trim(), // does not stream to the front end
  usage: response.usage,
});
});
Basically, how do I get the res.json to stream? I’m sending the message variable to the front end; that part works and is all set up. So how do I make that for await loop work within my res.json?
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});
So this works fine, but when I add stream: true…
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  stream: true, // adding stream
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});

for await (const chunk of response) {
  console.log(chunk.choices[0].delta.content); // this code from the doc runs
}

res.json({
  message: response.choices[0].delta.content.trim(), // this is causing the script to fail
  usage: response.usage, // What happened to the usage object?
});
});
The console.log will run, but it doesn’t go to the res.json.
All of a sudden I get this error:
message: response.choices[0].delta.content.trim(),
^
TypeError: Cannot read properties of undefined (reading '0')
at C:\Users\tvent\OneDrive\Desktop\gpt\server\index.js:208:30
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
I’m trying to make sure that the streamed content gets attached to the message object in my res.json.
Any help is appreciated; really bad backend programmer here.
Also, what happened to the usage object if you choose to do streaming?
The “all of a sudden” is likely because you are not handling the finish_reason case.
Or because you got a function_call, which just appears at the object root.
End of stream:
{
  "id": "chatcmpl-jdfj933jaf03kar03raf",
  "object": "chat.completion.chunk",
  "created": 1111111111,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "?"
      },
      "finish_reason": null
    }
  ]
}
{
  "id": "chatcmpl-82ojfoaFsD7",
  "object": "chat.completion.chunk",
  "created": 112222222,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
So I should add stop to the end of the function?
It works in the console.log; it is streaming and logging it:
for await (const chunk of response) {
  console.log(chunk.choices[0].delta.content); // this code from the doc runs
}
It’s not working in the res.json:
res.json({
  message: response.choices[0].delta.content.trim(), // this is causing the script to fail
  usage: response.usage, // What happened to the usage object?
});
I provided the second-to-last and the last object received in a stream.
See that part in your code where it says .delta.content?
See any “content” in the last chunk?
You need to short-circuit out by detecting the finish_reason first.
You’re truly just not being clear.
It IS WORKING in the log, in my terminal.
It’s not passing to the front end; it’s the same chunk.
app.post("/j", async (req, res) => { ... ....
...
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: message,
      },
    ],
    stream: true, // adding stream
    temperature: 1.1,
    max_tokens: 600,
    top_p: 1,
    frequency_penalty: 0.3,
    presence_penalty: 0.5,
  });

  for await (const chunk of response) {
    console.log(chunk.choices[0].delta.content); // this code from the doc runs
  }

  res.json({
    message: response.choices[0].delta.content.trim(), // this is causing the script to fail
    usage: response.usage, // What happened to the usage object?
  });
});
The difference is this for await.
But how do I use that within my res.json?
I don’t know the purpose of res.json in this context. The tools/environment you’re using are not my forte.
Typically what you’ll need to do is have two different mechanisms going:
if not a finish_reason or massive_error:
- display the content of the chunk
- append the content of the chunk to your variable for capturing the message
then do message stuff like chat history with the completed message.
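The two mechanisms above can be sketched in JavaScript. This is not a definitive implementation: it assumes an async-iterable stream like the one in your code, and `res` here only needs write()/end(), so an Express response would satisfy it; the stream is mocked with a plain async generator so the shape is visible without an API key. The key point is that res.json sends one complete body and ends the response, so it can never stream; res.write can.

```javascript
// Sketch: stream chunks to the client AND build up the full message.
// `stream` is any async iterable of chat.completion.chunk objects;
// `res` only needs write()/end(), so an Express response works.
async function streamCompletion(stream, res) {
  let fullMessage = "";
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    if (choice.finish_reason) break;  // "stop" / "length" => done
    const token = choice.delta.content ?? "";
    res.write(token);                 // mechanism 1: push to client as it arrives
    fullMessage += token;             // mechanism 2: capture the whole message
  }
  res.end();
  return fullMessage;                 // e.g. for chat history afterwards
}

// Mock OpenAI stream so the sketch is runnable without an API key:
async function* mockStream() {
  for (const content of ["Hel", "lo", "!"]) {
    yield { choices: [{ delta: { content }, finish_reason: null }] };
  }
  yield { choices: [{ delta: {}, finish_reason: "stop" }] };
}

// Fake res that just collects what was written:
const sent = [];
const fakeRes = { write: (t) => sent.push(t), end: () => {} };

streamCompletion(mockStream(), fakeRes).then((msg) => {
  console.log(msg);           // "Hello!"
  console.log(sent.join("")); // "Hello!"
});
```

On a real route you’d likely also set a header before writing (for example `res.setHeader("Content-Type", "text/plain; charset=utf-8")`, or a full server-sent-events setup), and the front end has to read the response body incrementally instead of awaiting one JSON blob.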
Here’s a little Python chatbot at least.
What happens to the usage object if you convert to streaming?
You have to count tokens on your own with streaming, which is the trade-off…
How do I count my own tokens without being told what my tokenage is? -_-
Also, if you are embedding the inputs and outputs, which you should do anyway to maintain good context within the bot, a nice side effect of ada-002 is that it reports the token counts.
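Counting on your own can be sketched like this. Big caveat: the whitespace-splitting tokenizer below is a PLACEHOLDER of my own so the sketch runs standalone; for real counts you would plug in an actual BPE tokenizer (OpenAI publishes tiktoken, and there are JavaScript ports of it on npm), and real chat accounting also adds a few overhead tokens per message.

```javascript
// Sketch: usage is absent from streamed responses, so tally tokens yourself.
// PLACEHOLDER tokenizer -- whitespace splitting is NOT a real BPE count;
// swap in a real tokenizer (e.g. a tiktoken port) for accurate numbers.
const placeholderTokenizer = (text) => text.split(/\s+/).filter(Boolean).length;

function makeUsageCounter(countTokens = placeholderTokenizer) {
  let completionText = "";
  return {
    // Feed every streamed chunk through here as it arrives.
    addChunk(chunk) {
      completionText += chunk.choices[0]?.delta?.content ?? "";
    },
    // Build a usage-shaped object once the stream is done.
    usage(promptText) {
      const prompt_tokens = countTokens(promptText);
      const completion_tokens = countTokens(completionText);
      return {
        prompt_tokens,
        completion_tokens,
        total_tokens: prompt_tokens + completion_tokens,
      };
    },
  };
}

const counter = makeUsageCounter();
counter.addChunk({ choices: [{ delta: { content: "Hello there" } }] });
counter.addChunk({ choices: [{ delta: { content: " friend" } }] });
console.log(counter.usage("Say hi"));
// { prompt_tokens: 2, completion_tokens: 3, total_tokens: 5 }
```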