openai.error.InvalidRequestError: Token limit exceeded HOWEVER the input, prompt, and output are far below the token limit

RufusTank · November 2, 2023, 1:08am

I am giving a task to GPT 3.5 turbo to take a bunch of chapters from books I’ve written and break them up into smaller pieces.

The maximum length of a chapter, with the prompt, and the output will not surpass 4,000 tokens, with most being in the 2-3k range combined.

However, when I run the code, I run into a max token limit seemingly regardless of what model I use.

This is the error message from the EXACT same input and prompt from the two respective models.

gpt-3.5-turbo

OpenAI API Error: This model’s maximum context length is 4097 tokens. However, you requested 5136 tokens (1136 in the messages, 4000 in the completion). Please reduce the length of the messages or completion.

gpt-3.5-turbo-16k

OpenAI API Error: This model’s maximum context length is 16385 tokens. However, you requested 17521 tokens (1136 in the messages, 16385 in the completion). Please reduce the length of the messages or completion.

I’ve spent hours on this trying to solve the issue to no avail. I am indeed a Python noob, there may be something I’m missing.

Basically, I am pulling the text from a CSV file, feeding that to GPT with a prompt, receiving the output, then using Python to parse that data and write it to a new CSV file.

One relevant detail: I created an HTML display of the results to show how many rows of data were processed from the source file and how many were written to the new file. I added some breakpoints to the code to try to debug it (removed below), and when I did, it returned this:
Rows processed: 95 (the number of rows of data in the source sheet)
Rows written: 0

I’ve cleaned out and truncated the data, but here is the core code. Might anyone have insights into this?

import os
import pandas as pd
import openai
from flask import Flask, render_template, redirect, url_for
from dotenv import load_dotenv
from itertools import cycle

app = Flask(__name__)

# Load API key
load_dotenv('path_to_setup.env')
openai.api_key = os.getenv("GPT_API")

@app.route('/')

def process_text(text):
    engine = "gpt-3.5-turbo-16k"
    message = [
        {"role": "user", "content": "PROMPT OF APPROX 750 TOKENS"},
        {"role": "user", "content": text}
    ]
    response = openai.ChatCompletion.create(model=engine, messages=message, temperature=0.5, max_tokens=16385)
    return response['choices'][0]['message']['content']
    
@app.route('/process')
def process_csv():
    # Open or create the output CSV file
    output_file_path = 'path_to_output_file.csv'

    # Read source CSV
    for chunk in pd.read_csv('path_to_input.csv', chunksize=1):
        # Process each row
        for idx, row in chunk.iterrows():
            text = row['Text']
            processed_text = process_text(text)
            parts = processed_text.split('(SEGMENT_END)')  # Split the output text at a marker in the text.
            parts = [part.strip() for part in parts if part.strip()]  # Remove extra line breaks and white spaces
            letters = cycle('abcdefghijklmnopqrstuvwxyz')
            for i, part in enumerate(parts):
                # Write data to output
                data = pd.DataFrame({
                    'Level': [row['Level']],
                    'Book': [row['Book']],
                    'Chapter': [f"{row['Chapter']}{next(letters)}"],
                    'Len': [len(part)],
                    'Text': [part]
                })
                # Append this row's processed data to the output CSV
                data.to_csv(output_file_path, mode='a', index=False, header=False, encoding='utf_8_sig')

if __name__ == '__main__':
    app.run(debug=True)

_j · November 2, 2023, 1:14am

It seems you don’t know how to use max_tokens. Best for you if you remove it entirely from the API call to chat models.

RufusTank · November 2, 2023, 1:35am

Please educate me!

I must admit that I did remove it in a previous iteration and it ran. I also once set it to 10000 and it also ran.

Currently it’s set to the 3.5turbo-16k model with max_tokens=16385.

What am I missing or not understanding here?

_j · November 2, 2023, 1:38am

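The max_tokens value you set applies only to the response you get back from the AI. The model has one fixed context length, shared by the input to the model and the response it forms. max_tokens reserves part of that context exclusively for the response. The API keeps input tokens from encroaching on that reserved space and returns an error to you instead. In other words, the input tokens plus max_tokens must fit within the context length. With max_tokens=16385 on the 16k model, you reserved the entire context for the response, which leaves no room for your 1136 tokens of input.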
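To make the arithmetic concrete, here is a minimal sketch that reproduces the numbers from the two error messages above. The function names `check_request` and `largest_max_tokens` are made up for illustration and are not part of the openai library; the context lengths and token counts are taken from the error messages in the original post.

```python
def check_request(context_length, prompt_tokens, max_tokens):
    """Return (fits, requested), where requested = prompt_tokens + max_tokens.

    The request fits only if the input plus the space reserved
    for the response stays within the model's context length.
    """
    requested = prompt_tokens + max_tokens
    return requested <= context_length, requested


def largest_max_tokens(context_length, prompt_tokens, margin=0):
    """Largest max_tokens that still leaves room for the input.

    `margin` is an optional safety buffer (hypothetical parameter)
    in case the token estimate for the input is slightly off.
    """
    return max(0, context_length - prompt_tokens - margin)


# gpt-3.5-turbo: 1136 input tokens + max_tokens=4000 against a 4097 context
print(check_request(4097, 1136, 4000))

# gpt-3.5-turbo-16k: 1136 input tokens + max_tokens=16385 against a 16385 context
print(check_request(16385, 1136, 16385))

# How much response space is actually available for this input on the 16k model
print(largest_max_tokens(16385, 1136))
```

If you do want to cap the response length, a value at or below what `largest_max_tokens` returns for your longest input should avoid this error. Otherwise, leaving max_tokens out of the call lets the response use whatever context remains.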

RufusTank · November 2, 2023, 2:50am

You, sir, are indeed correct. I fundamentally misunderstood max_tokens! That makes perfect sense now. Thank you for explaining it to me!

dineshbabu2977 · February 9, 2024, 9:32am

Also, you can find out how many tokens an input will consume using this:

import tiktoken

# text-embedding-ada-002 and gpt-3.5-turbo both use the cl100k_base encoding
enc = tiktoken.encoding_for_model("text-embedding-ada-002")
token_len = len(enc.encode("INPUT DATA"))
print("Total tokens used:", token_len)

Note that this counts only the input text. It does not include the response, and chat models add a few extra tokens of formatting overhead per message.
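For a chat request, a rough estimate can add that per-message overhead on top of the encoded text. The sketch below is an approximation only: `estimate_chat_tokens` is a hypothetical helper, and the defaults of 3 tokens per message plus 3 tokens to prime the reply are assumptions based on OpenAI's published counting example for gpt-3.5-turbo, which may differ between model versions. The `encode` argument is any function that turns a string into a list of tokens, such as the `encode` method of a tiktoken encoding.

```python
from typing import Callable, Dict, List, Sequence


def estimate_chat_tokens(
    messages: List[Dict[str, str]],
    encode: Callable[[str], Sequence],
    tokens_per_message: int = 3,  # assumed overhead per message
    reply_priming: int = 3,       # assumed tokens that prime the reply
) -> int:
    """Approximate the number of input tokens for a list of chat messages."""
    total = reply_priming
    for message in messages:
        total += tokens_per_message
        # Both the role and the content are encoded and counted
        for value in message.values():
            total += len(encode(value))
    return total


# Usage with tiktoken (requires `pip install tiktoken`):
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
#   messages = [
#       {"role": "user", "content": "PROMPT OF APPROX 750 TOKENS"},
#       {"role": "user", "content": "chapter text here"},
#   ]
#   print(estimate_chat_tokens(messages, enc.encode))
```

Subtracting this estimate from the model's context length gives the room left for the response.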
