There is mostly one element per chunk, but not always. You cannot measure tokens by counting chunks, except to estimate a minimum.
However, looking at logprobs does seem feasible for this, because we get a list of logprobs per chunk, one entry per token.
Logprobs are blocked, however, when a tool_call is emitted, so it is not a complete solution:
tool chunk_no: 0
Traceback (most recent call last):
for index, prob in enumerate(chunk.choices[0].logprobs.content):
TypeError: 'NoneType' object is not iterable
gpt-4-vision returns no logprobs either.
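The TypeError above can be avoided by checking for missing logprobs before iterating. Here is a minimal sketch, assuming each streamed choice exposes `logprobs.content` as either a list or None. The helper name `token_bytes` and the stand-in objects are made up for illustration and are not part of the API:

```python
from types import SimpleNamespace

def token_bytes(choice):
    """Return one byte list per token, or [] when logprobs are absent."""
    logprobs = getattr(choice, "logprobs", None)
    content = getattr(logprobs, "content", None)
    if not content:  # None on tool-call chunks, or logprobs missing entirely
        return []
    return [entry.bytes for entry in content]

# Stand-ins for streamed choices (hypothetical, not real API responses)
text_choice = SimpleNamespace(
    logprobs=SimpleNamespace(content=[SimpleNamespace(bytes=[72, 105])])
)
tool_choice = SimpleNamespace(logprobs=SimpleNamespace(content=None))
bare_choice = SimpleNamespace(logprobs=None)

print(token_bytes(text_choice))
print(token_bytes(tool_choice))
print(token_bytes(bare_choice))
```

With a guard like this the loop keeps running through tool-call chunks, but it still cannot count their tokens, because there is nothing to count.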
Specifying tools to the gpt-4-turbo models steals extra tokens from the content output, because of shady tool tricks to prevent control.
For an example where "chunks" does not equal tokens, let's just get emojis, and make a clear presentation from the logprobs within:
chunk_no: 0
chunk_no: 1
0: [240, 159, 152]
1: [128]
chunk_no: 2
0: [240, 159, 152]
1: [131]
chunk_no: 3
0: [240, 159, 152]
1: [132]
chunk_no: 4
0: [240, 159, 152]
1: [129]
chunk_no: 5
0: [240, 159, 152]
1: [134]
chunk_no: 6
response content:
😀😃😄😁😆
{'tool_calls': []}
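To see why each emoji chunk holds two tokens, the byte lists printed above can be joined and decoded. This is a small sketch using only the values from that output; the helper name `decode_chunk` is made up for illustration:

```python
# Per-token byte lists copied from the output above, one inner list per content chunk
chunks = [
    [[240, 159, 152], [128]],
    [[240, 159, 152], [131]],
    [[240, 159, 152], [132]],
    [[240, 159, 152], [129]],
    [[240, 159, 152], [134]],
]

def decode_chunk(token_byte_lists):
    """Join the bytes of every token in one chunk, then decode them as UTF-8."""
    return bytes(b for token in token_byte_lists for b in token).decode("utf-8")

text = "".join(decode_chunk(c) for c in chunks)
token_count = sum(len(c) for c in chunks)  # one logprobs entry per token

print(text)
print(f"content chunks: {len(chunks)}, tokens: {token_count}")
```

Neither token is valid UTF-8 on its own, which is presumably why they arrive together in one chunk. Counting chunks gives 5 here, while counting logprobs entries gives 10.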
The AI writes a tool call, though? No token count for you! Showing here:
chunk_no: 0
0: ('content', None)
chunk_no: 1
tools content:
{'tool_calls': [{'id': 'call_idnumber', 'function': {'arguments': '', 'name': 'get_random_int'}, 'type': 'function'}]}
(unseen overhead of tool_calls and max_tokens cuts off arguments)
Parsing code snippet, Python
# response = your streaming API call, made with
# .chat.completions.with_raw_response.create(..., stream=True, logprobs=True)
reply = ""
tools = []
for chunk_no, chunk in enumerate(response.parse()):  # parse() yields the stream chunks
    print(f"\nchunk_no: {chunk_no}")
    choice = chunk.choices[0]
    if choice.delta.content:  # chunk carries assistant text
        reply += choice.delta.content  # gather for chat history
        for index, prob in enumerate(choice.logprobs.content):
            print(f"{index}: {prob.bytes}")
    if choice.delta.tool_calls:  # chunk carries a tool call
        # logprobs.content is None here; iterating the object yields ('content', None)
        for index, prob in enumerate(choice.logprobs):
            print(f"{index}: {prob}")
        tools += choice.delta.tool_calls  # gather ChoiceDeltaToolCall list
tools_obj = tool_list_to_tool_obj(tools)  # forum search: "messy tool deltas"
print("\nresponse content:\n" + reply)
print(tools_obj)
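The `tool_list_to_tool_obj` helper is not shown in this post. As a rough idea of what such a merge could look like, here is a hypothetical sketch (not the original function), assuming each delta carries `index`, `id`, `type`, and `function.name` / `function.arguments`, with the arguments arriving in fragments. The stand-in deltas and their argument text are invented for the demo:

```python
from types import SimpleNamespace

def merge_tool_call_deltas(deltas):
    """Combine streamed tool-call fragments into one dict per call index."""
    calls = {}
    for d in deltas:
        call = calls.setdefault(
            d.index,
            {"id": None, "function": {"arguments": "", "name": None}, "type": None},
        )
        if d.id:
            call["id"] = d.id
        if d.type:
            call["type"] = d.type
        if d.function:
            if d.function.name:
                call["function"]["name"] = d.function.name
            if d.function.arguments:
                # argument text is streamed piece by piece, so append it
                call["function"]["arguments"] += d.function.arguments
    return {"tool_calls": [calls[i] for i in sorted(calls)]}

# Stand-in deltas (hypothetical values, not captured from a real stream)
deltas = [
    SimpleNamespace(index=0, id="call_idnumber", type="function",
                    function=SimpleNamespace(name="get_random_int", arguments="")),
    SimpleNamespace(index=0, id=None, type=None,
                    function=SimpleNamespace(name=None, arguments='{"max"')),
    SimpleNamespace(index=0, id=None, type=None,
                    function=SimpleNamespace(name=None, arguments=": 10}")),
]

result = merge_tool_call_deltas(deltas)
print(result)
```

If max_tokens cuts the stream short, the merged arguments string will simply be incomplete, as in the empty 'arguments' shown earlier.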