Print message token by token

Hello,

at the moment my tool sends an AJAX request to a PHP file, which forwards it to https://api.openai.com/v1/chat/completions; the JSON response is then rendered via JavaScript.

The problem is that the output is rendered all at once instead of token by token.

How could I implement that?

Best regards
Andreas

Meanwhile I got it to work. Here’s a small working example:

index.html:

<html>

    <head>
        <title>Stream Demo</title>
    </head>

    <body>

        <div id="content"></div>
        <script>
           var eventSource = new EventSource("api.php");

           eventSource.onmessage = function (e) {
              // the stream ends with a literal "[DONE]" sentinel event
              if (e.data == "[DONE]") {
                  document.getElementById('content').innerHTML += "<br><br>Finished.";
                  eventSource.close();
              } else {
                  // each event is one chat.completion.chunk; the text sits in choices[0].delta.content
                  var delta = JSON.parse(e.data).choices[0].delta.content;
                  if (delta) {
                      document.getElementById('content').innerHTML += delta;
                  }
              }
           };
           eventSource.onerror = function (e) {
               console.log(e);
           };
        </script>

    </body>
</html>

api.php:

<?php

    /* Andreas Koch, abkoch (at) posteo.de */

    $api_key = '<put-your-key-here>';
    $api_url = 'https://api.openai.com/v1/chat/completions';

    $ch = curl_init();

    $messages[] = array("role" => "user", "content" => "Tell me a story about you.");

    $post_fields = array(
        "model" => "gpt-3.5-turbo",
        "stream" => true,
        "temperature" => 0.7,
        "max_tokens" => 512,
        "top_p" => 1,
        "frequency_penalty" => 0,
        "presence_penalty" => 0,
        "stop" => ["\\n"],
        "messages" => $messages
    );

    $header  = [
        'Content-Type: application/json',
        'Authorization: Bearer ' . $api_key
    ];

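    // Response headers: serve the client a Server-Sent-Events stream, unbuffered
    // (X-Accel-Buffering: no disables buffering in nginx-style proxies)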
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    header('Connection: keep-alive');
    header('X-Accel-Buffering: no'); 

    curl_setopt($ch, CURLOPT_URL, $api_url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($post_fields));
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) {
       // relay each upstream SSE chunk straight to the browser and flush immediately
       echo $data;
       echo PHP_EOL;
       if (ob_get_level() > 0) {
           ob_flush();
       }
       flush();
       return strlen($data); // tell curl the whole chunk was handled
    });

    curl_exec($ch);
    curl_close($ch);

?>
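In case you want to try it: put both files in the same directory; the EventSource URL "api.php" is relative, so they must be served from the same origin. For a quick local test, PHP's built-in web server is enough (the port is arbitrary):

    php -S localhost:8000

Then open http://localhost:8000/index.html in a browser.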

stream (defaults to false…)

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI Cookbook for example code.

https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream
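For reference, what goes over the wire is a series of data-only SSE lines, one JSON chunk per event (abbreviated here; values are illustrative and some fields are omitted):

    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

    data: [DONE]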

Here you go. Hope your weekend is going well…


Thanks. I already found a solution and posted it yesterday, but Akismet blocked it.

Did you paste in the entire response and then send it?

As a Discourse admin on another site, I can tell you that Akismet will flag posts by users with low Discourse trust levels when the text was not entered by hand. The reason for flagging such posts is that this is a key indicator of spam bots, since bots are much faster than humans at creating posts. If it has been more than 24 hours since posting, and if the admins and moderators here were doing what they need to, then your post should have been reviewed and a response sent back already.

I was once on another well-known site where all the admins and moderators were inactive for months, so I turned my use of the site into a social experiment. I hope the same does not happen here.

Yes, I copied and pasted my code out of the two files I created.

My trust level was increased a few minutes ago, so I gave it another try and re-posted the same two files as above.


How do we get the token count in this example?

Any ideas, please?

Thanks

If you want an estimate, you can count the number of SSE delta chunks you receive over the network. That only works reliably when the output is plain ASCII English; for anything Unicode, emoji included, a single chunk may carry multiple tokens so that you get a complete character, so chunk counting underestimates the cost.
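As a rough sketch of that estimate, building on the index.html above (it also reassembles the full reply, which you will want for an exact count later):

    var chunkCount = 0; // rough token estimate: one content delta per chunk
    var fullText = "";  // reassembled reply, for exact counting afterwards

    eventSource.onmessage = function (e) {
        if (e.data == "[DONE]") {
            console.log("Approximate token count (chunks): " + chunkCount);
            eventSource.close();
            return;
        }
        var delta = JSON.parse(e.data).choices[0].delta.content;
        if (delta) { // the very first chunk carries only the role, no content
            chunkCount++;
            fullText += delta;
            document.getElementById('content').innerHTML += delta;
        }
    };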

The correct way is to reassemble the AI's output into a single string and count its tokens with the matching token encoder, tiktoken.
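tiktoken is OpenAI's Python tokenizer, so a minimal sketch of the exact count is in Python; full_text stands for the reply reassembled from the deltas:

    import tiktoken

    # pick the encoder that matches the model (cl100k_base for gpt-3.5-turbo)
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    full_text = "..."  # the streamed reply, joined into one string
    print(len(enc.encode(full_text)))  # token count of the completion text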

The better correct way: if OpenAI can send you a finish-reason chunk, they could just as well send you usage, but they just don't.


Ah, OK.

So it is by design :man_facepalming:

Because when I don’t stream it, I get the token usage.
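For comparison, a non-streaming response carries the count in its usage block (values illustrative):

    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 245,
        "total_tokens": 258
    }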