openai.error.InvalidRequestError: Token limit exceeded HOWEVER the input, prompt, and output are far below the token limit

I am giving a task to GPT 3.5 turbo to take a bunch of chapters from books I’ve written and break them up into smaller pieces.

The maximum length of a chapter, together with the prompt and the output, will not surpass 4,000 tokens, with most being in the 2-3k range combined.

However, when I run the code, I run into a max token limit seemingly regardless of what model I use.

This is the error message from the EXACT same input and prompt from the two respective models.


OpenAI API Error: This model’s maximum context length is 4097 tokens. However, you requested 5136 tokens (1136 in the messages, 4000 in the completion). Please reduce the length of the messages or completion.


OpenAI API Error: This model’s maximum context length is 16385 tokens. However, you requested 17521 tokens (1136 in the messages, 16385 in the completion). Please reduce the length of the messages or completion.
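For reference, the numbers reported in the two errors add up like this (1136 is the token count of my messages in both runs):

```python
# The numbers from the two error messages, spelled out:
messages = 1136  # tokens in my prompt + chapter text

# gpt-3.5-turbo (4,097-token context), max_tokens=4000:
assert messages + 4000 == 5136    # requested, vs. a 4,097-token limit

# gpt-3.5-turbo-16k (16,385-token context), max_tokens=16385:
assert messages + 16385 == 17521  # requested, vs. a 16,385-token limit
```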

I’ve spent hours on this trying to solve the issue to no avail. I am indeed a Python noob, there may be something I’m missing.

Basically, I am pulling the text from a CSV file, feeding that to GPT with a prompt, receiving the output, then using Python to parse that data and write it to a new CSV file.

One relevant detail: I created an HTML display of the results to show how many rows of data were processed from the source file and how many were written to the new file. I added some breaks in the code to try to debug (removed below), and when I did, it returned this:
Rows processed: 95 (the number of rows of data in the source sheet)
Rows written: 0

I’ve cleaned out and truncated the data, but here is the core code. Might anyone have insights into this?

import os
import pandas as pd
import openai
from flask import Flask, render_template, redirect, url_for
from dotenv import load_dotenv
from itertools import cycle

app = Flask(__name__)

# Load the API key from the .env file
load_dotenv()
openai.api_key = os.getenv("GPT_API")


def process_text(text):
    engine = "gpt-3.5-turbo-16k"
    message = [
        {"role": "user", "content": "PROMPT OF APPROX 750 TOKENS"},
        {"role": "user", "content": text}
    ]
    response = openai.ChatCompletion.create(model=engine, messages=message, temperature=0.5, max_tokens=16385)
    return response['choices'][0]['message']['content']

def process_csv():
    # Open or create the output CSV file
    output_file_path = 'path_to_output_file.csv'

    # Read source CSV
    for chunk in pd.read_csv('path_to_input.csv', chunksize=1):
        # Process each row
        for idx, row in chunk.iterrows():
            text = row['Text']
            processed_text = process_text(text)
            parts = processed_text.split('(SEGMENT_END)')  # Split output text at a marker in the text
            parts = [part.strip() for part in parts if part.strip()]  # Remove extra line breaks and white spaces
            letters = cycle('abcdefghijklmnopqrstuvwxyz')
            for i, part in enumerate(parts):
                # Write data to output
                data = pd.DataFrame({
                    'Level': [row['Level']],
                    'Book': [row['Book']],
                    'Chapter': [f"{row['Chapter']}{next(letters)}"],
                    'Len': [len(part)],
                    'Text': [part]
                })
                # Append this row's processed data to the output CSV
                data.to_csv(output_file_path, mode='a', index=False, header=False, encoding='utf_8_sig')

if __name__ == '__main__':
    process_csv()

It seems you don’t know how to use max_tokens. It would be best to remove it entirely from your API calls to chat models.

Please educate me!

I must admit that I did remove it in a previous iteration and it ran. I also once set it to 10000, and that ran too.

Currently it’s set to the 3.5turbo-16k model with max_tokens=16385.

What am I missing or not understanding here?

The max_tokens value you set refers only to the response you get back from the AI. The model has a fixed context length that must hold both your input and the response it forms. max_tokens sets a reservation within that context that the API designates for the response only, and keeps input tokens from encroaching on that token space, instead returning an error to you.
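In other words, a workable max_tokens is whatever the context window leaves over after your input. A minimal sketch, assuming you've already counted your input tokens (the function name and the margin value are illustrative, not from any API):

```python
def safe_max_tokens(prompt_tokens, context_window=16385, margin=50):
    """Reserve only what the context window leaves after the input.

    `margin` is a small buffer for the extra tokens the chat
    format adds per message; the exact overhead varies by model.
    """
    return max(context_window - prompt_tokens - margin, 1)

# With the 1136-token input from the error message above:
print(safe_max_tokens(1136))  # 15199
```

Passing this value (or simply omitting max_tokens) lets the input and the completion share the window instead of fighting over it.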


You sir are indeed correct. I fundamentally misunderstood max_tokens! That makes perfect sense now. Thank you for explaining it to me!

Also, you can find how many tokens an input will consume using this:

import tiktoken

enc = tiktoken.encoding_for_model("text-embedding-ada-002")
token_len = len(enc.encode(str("INPUT DATA")))
print("Total tokens used:", token_len)

Note this counts only the input data, not the response.