It basically allows you to receive tokens back in batches so that you can give the appearance of generation, like with ChatGPT.
What does that mean, "appearance of generation"?
Streaming is the sending of words as they are created by the AI language model one at a time, so you can show them as they are being generated.
(technical: Subscription to a server-sent push event)
Oh no shit, huh? Does that cost more, or is it just a fun thing you can add?
It is the same cost per token.
Instead of waiting 30 seconds for the complete answer, you start receiving almost immediately.
It is not “just fun”; imagine if you had to stare at ChatGPT for half a minute wondering what it was going to say.
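Under the hood, stream: true means the API responds with server-sent events rather than one JSON body: a sequence of data: lines, each carrying a small JSON chunk, terminated by a literal data: [DONE] line. Roughly (fields trimmed here for illustration):

data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "lo"}, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "choices": [{"delta": {}, "finish_reason": "stop"}]}
data: [DONE]

The SDK parses those lines for you and hands you the chunk objects shown further down the thread.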
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});
So this works fine, but when I add stream: true…
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  stream: true,
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});

res.json({
  message: response.choices[0].message.content.trim(),
  usage: response.usage,
});
});
All of a sudden I get this error:
message: response.choices[0].message.content.trim(),
^
TypeError: Cannot read properties of undefined (reading '0')
at C:\Users\tvent\OneDrive\Desktop\gpt\server\index.js:204:30
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Yes, you’re going to get chunk objects, a format documented in the API reference, or as dumped out for you just now:
{
  "id": "chatcmpl-82dsfjjaofldfa3mOIDJFOIJ",
  "object": "chat.completion.chunk",
  "created": 1695524999,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " for"
      },
      "finish_reason": null
    }
  ]
}
You get them until finish_reason is "stop" or "length".
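A minimal sketch of consuming those chunks with the Node SDK (the variable names here are just illustrative):

let fullMessage = "";

for await (const chunk of response) {
  const choice = chunk.choices[0];

  // The last chunk carries a finish_reason and an empty delta, so guard the content read.
  if (choice.finish_reason) break;

  fullMessage += choice.delta?.content ?? ""; // accumulate the reply as it arrives
}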
Right, I'm saying that when I adjust my code, though, I don't know how to get it to work.
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "system",
      content: message,
    },
  ],
  stream: true,
  temperature: 1.1,
  max_tokens: 600,
  top_p: 1,
  frequency_penalty: 0.3,
  presence_penalty: 0.5,
});

for await (const chunk of response) {
  console.log(chunk.choices[0].delta.content); // This correctly streams it in the terminal
}

res.json({
  message: response.choices[0].delta.content.trim(), // does not stream to the front end
  usage: response.usage,
});
});
Basically, how do I get the res.json to stream? I'm sending the message variable to the front end; that works and is all set up. So how do I make that for await loop work inside my res.json instead?
The “all of a sudden” is likely because you are not handling the finish_reason case.
Or that you get a function_call, which just appears at the object root.
End of stream:
{
  "id": "chatcmpl-jdfj933jaf03kar03raf",
  "object": "chat.completion.chunk",
  "created": 1111111111,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "?"
      },
      "finish_reason": null
    }
  ]
}

{
  "id": "chatcmpl-82ojfoaFsD7",
  "object": "chat.completion.chunk",
  "created": 112222222,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
So I should add stop to the end of the function?
It works on the console.log; it is streaming and logging it:
for await (const chunk of response) {
  console.log(chunk.choices[0].delta.content); // this code from the doc runs
}
It's not working in the res.json:
res.json({
  message: response.choices[0].delta.content.trim(), // this is causing the script to fail
  usage: response.usage, // What happened to the usage object?
});
I provided the second-to-last and the last object received in a stream.
See that part in your code where it says .delta.content?
See any “content” in the last chunk?
You need to short-circuit out by detecting the finish_reason first.
I don’t know the purpose of res.json in this context. The tools/environment you’re using are not my forte.
Typically what you’ll need to do is have two different mechanisms going:
if not a finish_reason or massive_error:
- display the content of the chunk
- append the content of the chunk to your variable for capturing the message
then do message stuff like chat history with the completed message (a sketch of that pattern is just below).
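In Express terms (hedged, since as said this environment is not my forte), that means dropping res.json for this route and writing each piece to the response as it arrives, something like:

// assumed to sit inside the same async Express route handler as your snippets
res.setHeader("Content-Type", "text/plain; charset=utf-8");

let fullMessage = "";

for await (const chunk of response) {
  const choice = chunk.choices[0];

  if (choice.finish_reason) break; // "stop", "length", etc. -- no content in this chunk

  const text = choice.delta?.content ?? "";
  fullMessage += text; // keep the whole reply for chat history, token counting, etc.
  res.write(text);     // push this piece to the front end immediately
}

res.end(); // close the response once the stream is finished

The front end then has to read the body incrementally (for example with fetch and response.body.getReader()) instead of awaiting one JSON payload, which is why the existing res.json setup stops working once you turn streaming on.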
Here's a little Python chatbot, at least.
What happens to the usage object if you convert to streaming?
You have to count tokens on your own with streaming, which is the trade-off…
How do I count my own tokens without being told what my tokenage is? -_-
If you search the forum, you can find out more on tiktoken.
Also, if you are embedding the inputs and outputs, which you should do anyway to maintain good context within the bot, a nice side effect of Ada-002 is that it reports the token counts.
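For the counting itself, here is a rough sketch, assuming the js-tiktoken port of tiktoken (check that package's docs for the exact import names) and reusing the messages array and accumulated fullMessage from the sketches above:

import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");

// Prompt side: encode each message's content. This slightly undercounts, because the
// chat format adds a few overhead tokens per message.
const promptTokens = messages.reduce(
  (sum, m) => sum + enc.encode(m.content).length,
  0
);

// Completion side: encode the text you accumulated while streaming.
const completionTokens = enc.encode(fullMessage).length;

console.log({ promptTokens, completionTokens });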
Here are the new docs covering streaming chunks: https://platform.openai.com/docs/api-reference/streaming