Feature request: get generated tokens back with request

Team: I’m using the OpenAI API to generate responses to an arbitrary set of user inputs, then creating a relation graph to illustrate similarities between the questions. The end result will be a graph that displays n user inputs and their calculated relation to each other.

Since I anticipate this data set eventually becoming quite large, indexing and search are going to be an issue. I’d like to use the tokenized input text as part of my addressing scheme, as I think that would provide a useful search parameter. Is there a way to get those tokens already?

If not, would it be possible to add an optional field “returnTokens” to openai.Completion.create, such that when returnTokens = true, the response object includes the tokens generated from the input text?

I realize I could use another library to tokenize the input and index that way, but if the project continues to use openai, having direct access to the generated tokens could prove useful for later development.
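
To illustrate, here is a sketch of what I have in mind. Everything specific here is hypothetical: returnTokens and the two *_token_ids response fields are the proposed additions, not part of the current API.

import openai

# Hypothetical: returnTokens is the proposed parameter; the current API
# would reject it. Shown only to illustrate the requested behavior.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="What is a relation graph?",
    max_tokens=64,
    returnTokens=True,
)

# Proposed response fields (hypothetical): the integer token IDs the model
# actually saw and produced, usable as an indexing/addressing key.
prompt_ids = response["prompt_token_ids"]
completion_ids = response["completion_token_ids"]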

Do you mean like this (see prompt_tokens)?

{
  "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
  "object": "text_completion",
  "created": 1589478378,
  "model": "text-davinci-003",
  ...
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}

That information, provided with a standard API .create() call, is probably the answer here, but it is not included when the response is streamed.

tokens = response['usage']['prompt_tokens']
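
For reference, a minimal non-streaming sketch (assuming the pre-1.0 openai Python library) where that usage block is populated; note it carries token counts, not the token IDs themselves:

import openai

openai.api_key = "sk-..."

# Non-streaming: the usage block arrives with the complete response.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say hello.",
    max_tokens=5,
)
print(response["usage"]["prompt_tokens"])  # a count, not a list of token IDs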

The feature request really needed is another API inquiry endpoint that can return the message again for a given id, or at least its final metadata.

A non-breaking API parameter, which would require updated client libraries, would be “stream_usage=True”, sending a packet of usage data after the finish reason in the stream.
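
A sketch of how a client might consume that. This is purely hypothetical: stream_usage and the trailing usage chunk are the proposal, not current behavior.

import openai

# Hypothetical: stream_usage is the proposed parameter; the current API
# does not accept it and never sends a usage chunk when streaming.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say hello.",
    max_tokens=5,
    stream=True,
    stream_usage=True,
)

for chunk in response:
    # Proposed: one final packet carrying usage, sent after finish_reason.
    if chunk.get("usage") is not None:
        print("\nprompt_tokens:", chunk["usage"]["prompt_tokens"])
        continue
    print(chunk["choices"][0].get("text", ""), end="")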

This sounds like you need the OpenAI tiktoken library; correctly configured, it can tokenise your text locally and give you a list of the token values.
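
A minimal sketch, assuming the model in question is text-davinci-003 (pick the encoding that matches your model):

import tiktoken

# Map the model name to its tokenizer encoding.
enc = tiktoken.encoding_for_model("text-davinci-003")

token_ids = enc.encode("What is a relation graph?")
print(token_ids)              # list of integer token IDs
print(enc.decode(token_ids))  # round-trips back to the original text

With the right encoding this should give you the same tokenisation the model uses, so the IDs can serve as your addressing scheme without waiting on a new API field.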

Do you mean the EMBEDDING of the reply?