Feature request: get generated tokens back with request

Team: I’m using the OpenAI API to generate responses to an arbitrary set of user inputs, then creating a relation graph to illustrate similarities between the questions. The end result will be a graph that displays n user inputs and their calculated relation to each other.

Since I anticipate this data set eventually becoming quite large, indexing and search are going to be an issue. I’d like to use the tokenized input text as part of my addressing scheme, as I think that would provide a useful search parameter. Is there a way to get those tokens already?

If not, would it be possible to add an optional field “returnTokens” to openai.Completion.create, such that when returnTokens = true, the response object includes the tokens generated from the input text?

I realize I could use another library to tokenize the input and index that way, but if the project continues to use openai, having direct access to the generated tokens could prove useful for later development.
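
To illustrate, here is a sketch of what I have in mind. Everything specific here is hypothetical: returnTokens and the two *_token_ids response fields are the proposed additions, not part of the current API.

import openai

# Hypothetical: returnTokens is the proposed parameter; the current API
# would reject it. Shown only to illustrate the requested behavior.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="What is a relation graph?",
    max_tokens=64,
    returnTokens=True,
)

# Proposed response fields (hypothetical): the integer token IDs the model
# actually saw and produced, usable as an indexing/addressing key.
prompt_ids = response["prompt_token_ids"]
completion_ids = response["completion_token_ids"]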

Do you mean like this (see prompt_tokens)?

{
  "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
  "object": "text_completion",
  "created": 1589478378,
  "model": "text-davinci-003",
  ...
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}

That information, provided with a standard API .create() call, is probably the answer here, but it is not included when the response is streamed.

tokens = response['usage']['prompt_tokens']
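
For reference, a minimal non-streaming sketch (assuming the pre-1.0 openai Python library) where that usage block is populated; note it carries token counts, not the token IDs themselves:

import openai

openai.api_key = "sk-..."

# Non-streaming: the usage block arrives with the complete response.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say hello.",
    max_tokens=5,
)
print(response["usage"]["prompt_tokens"])  # a count, not a list of token IDs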

The feature request really needed is another API inquiry endpoint that can return the message again for a given id, or at least its final metadata.

A non-breaking API parameter, which would require updated client libraries, would be “stream_usage=True”, sending a packet of usage data after the finish reason in the stream.
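
A sketch of how a client might consume that. This is purely hypothetical: stream_usage and the trailing usage chunk are the proposal, not current behavior.

import openai

# Hypothetical: stream_usage is the proposed parameter; the current API
# does not accept it and never sends a usage chunk when streaming.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say hello.",
    max_tokens=5,
    stream=True,
    stream_usage=True,
)

for chunk in response:
    # Proposed: one final packet carrying usage, sent after finish_reason.
    if chunk.get("usage") is not None:
        print("\nprompt_tokens:", chunk["usage"]["prompt_tokens"])
        continue
    print(chunk["choices"][0].get("text", ""), end="")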

This sounds like you need the OpenAI tiktoken library; correctly configured, it can tokenise your text locally and give you a list of the token values.
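
A minimal sketch, assuming the model in question is text-davinci-003 (pick the encoding that matches your model):

import tiktoken

# Map the model name to its tokenizer encoding.
enc = tiktoken.encoding_for_model("text-davinci-003")

token_ids = enc.encode("What is a relation graph?")
print(token_ids)              # list of integer token IDs
print(enc.decode(token_ids))  # round-trips back to the original text

With the right encoding this should give you the same tokenisation the model uses, so the IDs can serve as your addressing scheme without waiting on a new API field.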

Do you mean the EMBEDDING of the reply?