Batching with ChatCompletion Endpoint

Explanation

  • The promptsArray contains all the prompts that will be processed in a batch, as individual elements of a string array.

  • The promptsArray is then converted to a string using JSON.stringify() and stored in stringifiedPromptsArray, which will be used as the content of the user’s message.

  • batchInstruction is a system message that directs the chat completion model to complete every prompt in stringifiedPromptsArray and return an array of completions.

  • The chat completion is obtained from the response and converted back into an array of strings using json.loads(). The individual completions can then be easily accessed from batchCompletion

Output:

['Hello world, from', 'How are you B', 'I am fine. W', 'The  fifth planet from the Sun is ']
ChatGPT: 
['Hello world, from Earth', 'How are you Bob', 'I am fine. What about you?', 'The fifth planet from the Sun is Jupiter']

Limitations

  • The max_tokens doesn’t control the max_tokens for individual prompts; instead max_tokens limits the total amount of tokens per request.
  • Length of one completion can influence other completions in the batch. If one completion is longer than expected, other completions may get truncated, even the array might not turn out to be valid.
6 Likes