Tokenizer and logit_bias with gpt-4o and streaming, API ver 1.47.1

A developer I'm working with has the tokenizer and logit_bias working with API ver 1.47.1 and gpt-4o in non-streaming mode.

It is not working in streaming mode. The developer is looking for resources to understand this issue.

Is there a path to a streaming implementation of the tokenizer and logit_bias? Or
is this known to be an OpenAI bug or an OpenAI future work item?

Logprobs are also returned in a stream response, unless they have been disabled by OpenAI to obscure the production of function calls or structured responses. They also have to be enabled by a chat completions parameter (logprobs, optionally with top_logprobs).

Logprobs contain the bytes returned in a chunk, and a single character's bytes can extend across multiple tokens, for example uncompressed Unicode beyond 7Fh. Token numbers are not reported.
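If you want the exact text back out of those logprob entries, a minimal sketch (assuming a parsed chat completions choice with logprobs enabled, using the SDK's logprobs.content[].bytes field) is to concatenate the byte lists and decode them as UTF-8:

def text_from_logprobs(choice) -> str:
    raw = bytearray()
    for entry in choice.logprobs.content:          # one entry per produced token
        if entry.bytes:                            # a list of ints; may be None
            raw.extend(entry.bytes)
    return raw.decode("utf-8", errors="replace")   # multi-byte characters reassemble here

The same works per streamed chunk; characters split across tokens only decode cleanly once their bytes have been joined.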

The AI produces tokens, so we know token boundaries are coming out. However, you cannot encode large strings to tokens accurately without the full text that BPE operates on, at least up to a non-joinable token such as a number, or without the unreceived special tokens that enclose chat content messages.
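To see why partial text is not enough, here is a small illustration with the tiktoken package (an assumption on my part that you have it installed; gpt-4o uses the o200k_base encoding):

import tiktoken

enc = tiktoken.get_encoding("o200k_base")     # the encoding behind gpt-4o

full = "strawberries"
parts = ["straw", "berries"]                  # an arbitrary split of the same text

whole_ids = enc.encode(full)                            # BPE merges over the whole string
piece_ids = [t for p in parts for t in enc.encode(p)]   # encode fragments independently

print(whole_ids)
print(piece_ids)    # may differ from whole_ids: fragments can tokenize differently

The same caveat applies mid-stream: re-encoding a partial reply does not necessarily reproduce the token IDs the model actually emitted.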

You don't discuss what the idea behind this "tokenizer" is, or how it would be "broken", since the API, for a developer, is basically language strings in and out. You can first turn on logprobs, receive them as deltas, and reassemble them into a final response to see if that suits your need.


Here is a snippet to get and display the top_logprobs section of a response.

from openai import OpenAI
import numpy as np
client = OpenAI(timeout=30)

params = {
  "max_tokens": 4, "top_p":0.01, "stream": True,
  "max_tokens": 4,"logprobs": True, "top_logprobs": 3, "logit_bias": {},
  "messages": [
      {"role":"system","content":"""
You are a backend AI classifier. Response is for API: no markdown.
""".strip()},
      {"role":"user","content":"Produce flower color list, no chat."},
  ]
}

for model in ["gpt-4o", "gpt-4o-mini"]:
    params['model'] = model; print(f" -- for {model}")
    response = client.chat.completions.with_raw_response.create(**params)
    reply=""
    for chunk_no, chunk in enumerate(response.parse()):    # with_raw_response.create parsing
        print(f"\nchunk_no: {chunk_no}")
        if chunk.choices[0].delta.content:                 # if chunks with assistant
            reply += chunk.choices[0].delta.content        # gather for chat history
            for index, prob in enumerate(chunk.choices[0].logprobs.content):
                #print(index, end=': ')
                for top in prob.top_logprobs:
                    print(f"{repr(top.token)},  bytes:{top.bytes}, prob: {np.exp(top.logprob):05f}")
    print("\nresponse content:\n" + reply)

Producing output like

 -- for gpt-4o-mini

chunk_no: 0

chunk_no: 1
'Red', bytes:[82, 101, 100], prob: 0.696413
'-', bytes:[45], prob: 0.256196
'1', bytes:[49], prob: 0.030598

chunk_no: 2
',', bytes:[44], prob: 0.988889
'  \n', bytes:[32, 32, 10], prob: 0.010986
'\n', bytes:[10], prob: 0.000122

chunk_no: 3
' Blue', bytes:[32, 66, 108, 117, 101], prob: 0.632229
' Pink', bytes:[32, 80, 105, 110, 107], prob: 0.181137
' Yellow', bytes:[32, 89, 101, 108, 108, 111, 119], prob: 0.181137

chunk_no: 4
',', bytes:[44], prob: 1.000000
' ,', bytes:[32, 44], prob: 0.000000
'،', bytes:[216, 140], prob: 0.000000

chunk_no: 5

response content:
Red, Blue,

There's a list of models in there that the snippet iterates over.

No, there is no “bug”.
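If the point of the "tokenizer" is only to build the logit_bias map, that part is independent of streaming, since logit_bias is set on the request before any tokens come back. A minimal sketch, assuming tiktoken and the o200k_base encoding that gpt-4o uses (the strings and bias values here are just placeholders):

import tiktoken

enc = tiktoken.get_encoding("o200k_base")    # encoding used by gpt-4o / gpt-4o-mini

discouraged = ["Yellow", " Yellow"]          # example strings; each may span several tokens
logit_bias = {}
for s in discouraged:
    for token_id in enc.encode(s):
        logit_bias[str(token_id)] = -100     # JSON keys are strings; values range -100 to 100

params["logit_bias"] = logit_bias            # plug into the request params from the snippet

Note that this only affects those exact token forms; other capitalizations or different BPE splits are separate token IDs and would need their own entries.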