Batch API errors: max_tokens is too large

I fixed the first error by reducing max_tokens from 100K to 86K, and now it says 4096?
I'm confused. This is the 128K model gpt-4-turbo-2024-04-09.
I'm processing 7 papers, each 20K to 70K tokens.
I have one system prompt and two user prompts; the second user prompt is the specific article text.
Looking for a working example with papers. Thanks!

{"error": {"message": "This model's maximum context length is 128000 tokens. However, you requested 141226 tokens (41226 in the messages, 100000 in the completion). Please reduce the length of the messages or completion.", "type": "invalid_request_error", "param": "messages", "code":

max_tokens is too large: 86000. This model supports at most 4096 completion tokens, whereas you provided 86000

Welcome to the Forum!

max_tokens refers to the number of output tokens, which is capped at 4096 for all of the recent models. So the value you set for max_tokens must be 4096 or lower.

128,000, on the other hand, refers to the total context window, which is the sum of input and output tokens.

In order to avoid the error you are experiencing, you have to ensure that your input tokens do not exceed roughly 124k (that is, 128k minus the number of output tokens you are looking to produce).
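To make the arithmetic concrete, here is a minimal sketch in Python (the constants match gpt-4-turbo-2024-04-09; the helper name is just for illustration):

```python
# The 128k context window of gpt-4-turbo-2024-04-09 is shared between
# input (messages) and output (completion); the completion itself is
# capped at 4096 tokens regardless of how much input space is left.
CONTEXT_WINDOW = 128_000
MAX_COMPLETION = 4_096

def max_input_tokens(desired_output: int) -> int:
    """Largest prompt size that still leaves room for the requested completion."""
    desired_output = min(desired_output, MAX_COMPLETION)
    return CONTEXT_WINDOW - desired_output

print(max_input_tokens(4_096))   # reserving the full completion leaves 123904
print(max_input_tokens(2_000))   # a smaller reservation allows 126000
```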

How do I verify that my input tokens do not exceed 124k?
And if max_tokens is capped at 4096 anyway, why do I even have to specify it?

You technically don't have to specify max_tokens; it's only needed if you want to set a hard limit on the output tokens (this comes at the risk of the output being cut off mid-response).
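If you do leave max_tokens set and the model runs out of room, the response records it: finish_reason is "length" when the completion was cut off at the token limit, versus "stop" for a natural ending. A small sketch, using hand-built response dicts for illustration:

```python
def was_truncated(response: dict) -> bool:
    """True if the completion stopped because it hit the token limit."""
    return response["choices"][0]["finish_reason"] == "length"

# Hand-built fragments mimicking the Chat Completions response shape:
cut_off = {"choices": [{"finish_reason": "length"}]}
complete = {"choices": [{"finish_reason": "stop"}]}

print(was_truncated(cut_off))   # True
print(was_truncated(complete))  # False
```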

Check the OpenAI cookbook for the Python script for counting tokens, then just add that step to your script:

Error processing 2404.18616v1.txt: Unknown encoding cl100k_base. Plugins found:
Could not find encoding for model gpt-4. Using 'cl100k_base' as a fallback. Error: Unknown encoding cl100k_base. Plugins found:

import tiktoken

def num_tokens_from_string(string, model="gpt-4"):
    """Returns the number of tokens in a text string according to the model encoding."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except Exception as e:  # catch all exceptions, as encoding_for_model may raise various types
        print(f"Could not find encoding for model {model}. Using 'cl100k_base' as a fallback. Error: {e}")
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = len(encoding.encode(string))
    return num_tokens

This post has a quick script I wrote to count per-example tokens on a JSONL file, in the process validating the file format. It does not deal with functions.
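A per-example counter along those lines can be sketched as follows; it assumes the Batch API JSONL layout (each line wraps its request in a "body" object holding the "messages"), and takes the tokenizer as a callable so it works with any counter, e.g. the num_tokens_from_string function above:

```python
import json

def per_example_tokens(path, count_tokens):
    """Yield (line_number, total message tokens) for each example in a JSONL file.

    json.loads doubles as format validation: a malformed line raises here.
    """
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            example = json.loads(line)
            # Batch files nest the request under "body"; plain chat-format
            # JSONL keeps "messages" at the top level, so fall back to that.
            messages = example.get("body", example).get("messages", [])
            total = sum(count_tokens(m.get("content", "")) for m in messages)
            yield line_no, total
```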

You can omit the max_tokens parameter from API calls, but then the AI may produce output up to the maximum, or the call may accept an input so large that little context length remains to form an answer.

A typical max_tokens setting for a response is 2000: the API returns an error if the input plus the output reservation exceeds the model's context length, which ensures enough space is available to form a complete response.


So my small 7-file batch just finished.

The first task did not execute properly, returning only 14 bytes.

How do I make sure the first one executes? Or do I just add a dummy one?
Batch 2 did not complete Task 3: I asked for 20 and it gave me only 10.

Thanks, it worked. Apparently for Pycompiler you need to specify the tokenizer include too.