4096 Character Limit in GPT-4 API

Hello everyone,
I am working on extracting text from a news article, and here is where I’m at: I fetch the article, extract the text with JSDOM, and send it to gpt-4-1106-preview. At that point I get this error:

status: 400,
headers: {
connection: 'keep-alive',
'content-length': '262',
'content-type': 'application/json',
date: 'Fri, 12 Apr 2024 21:40:36 GMT',

error: {
message: '1 validation error for Request\n' +
'body -> input\n' +
' ensure this value has at most 4096 characters (type=value_error.any_str.max_length; limit_value=4096)',
type: 'invalid_request_error',
param: null,
code: null
},
code: null,
param: null,
type: 'invalid_request_error'
}

The text I am submitting is 4,949 characters, but I don’t understand why it’s being rejected now, because I have sent much longer text to the API before.

How can I fix this?

Thanks.


:thinking:

what’s your max_length parameter?

max_length pertains to output, so unless you want to guard against accidentally long outputs you might as well leave it blank.


Hi, and thanks for your response.

I’m not currently specifying a max_length in the API call.

True, looks like I misread. Can you post your whole API call?


Yes of course.

const { url } = await request.json();
try {
    // Template literals need backticks, otherwise ${...} is a syntax error
    const bdCreds = `${process.env.BRIGHT_DATA_USERNAME}:${process.env.BRIGHT_DATA_PASSWORD}`;
    browser = await puppeteer.connect({
        browserWSEndpoint: `wss://${bdCreds}@zproxy.lum-superproxy.io:9222`,
        ignoreHTTPSErrors: true,
    });

    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(2 * 60 * 1000);

    await page.goto(url, { waitUntil: 'networkidle2' }); 
    const html = await page.evaluate(() => document.documentElement.innerHTML); 

    const processedHtml = getText(html);

    console.log(processedHtml)

    const chatResponse = await openai.chat.completions.create({
        model:  "gpt-4-1106-preview",
        messages: [
            {
                role: 'system',
                content: `You are an AI assistant whose job is to extract the article's text/body from its raw HTML and 
                output the script for a podcast. Output only the text from the article's 'body'. Do not output comments,
                HTML tags, or the title. Everything you output will be sent to a TTS, so numbers must be converted to words.
                Also, emojis must be removed. All the text you output will be spoken. You need to convert everything to words that
                can be spoken by a TTS without sounding weird to the listener. The text you output must be the article; your only job
                is to extract the body from the HTML of the article, and output the podcast for it without illegal characters, phrases, etc.
                You should not change anything in the article other than illegal phrases for the TTS. So for example '1,000,000' should be
                outputted as 'one million'. Everything you output will be spoken!`
            },
            { role: 'user', content: `Extract the body text of this article from this html, only return text: ${processedHtml}` }
        ],
        temperature: 1,
    });

It doesn’t look like this is actually an OpenAI error or issue. You’d typically be confronted with a token limitation, not a character limitation.

Are you absolutely sure the OpenAI API is emitting that error?

I unfortunately can’t test your request atm, but if you’re certain that it isn’t related to other parts of the code, I would try sending the same request as a direct HTTP request against the OpenAI API to rule out any misconfiguration.
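
A minimal sketch of what such a direct request could look like, assuming Node 18+ (for the global fetch) and an API key passed in by the caller. buildChatRequest and sendChatRequest are hypothetical helper names, not part of any SDK; the point is only to take the client library out of the picture and see which endpoint actually rejects the request.

```javascript
// Build a raw HTTP request for the Chat Completions endpoint, bypassing the SDK.
// buildChatRequest and sendChatRequest are hypothetical helper names.
function buildChatRequest(apiKey, model, messages) {
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

async function sendChatRequest(apiKey, model, messages) {
  const { url, options } = buildChatRequest(apiKey, model, messages);
  const res = await fetch(url, options); // global fetch, assumes Node 18+
  const data = await res.json();
  if (!res.ok) {
    // Include the status and error body so it is clear this endpoint failed
    throw new Error(`Chat request failed (${res.status}): ${JSON.stringify(data.error)}`);
  }
  return data.choices[0].message.content;
}
```

If this call succeeds with the same text, the 400 is most likely coming from a different request further down in the code.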

I’m gonna look through the entire code letter by letter one last time… I’m almost certain.

I’ll maybe try a different OpenAI API version, or use the REST API directly.

Thanks for your help @Diet

I will leave this thread open if any other people have ideas…


So it turns out the error was me.

I take the text returned by the GPT API and then pass it to OpenAI TTS, whose docs clearly say “Max length is 4096 characters”. That’s where the error was coming from, not the chat completion call.

So, the error was just me misreading the TTS API docs. Gonna mark this as the solution. Sorry for wasting anyone’s time.
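
For anyone who hits the same limit, here is a rough sketch of one way around it: split the text into chunks of at most 4096 characters, preferring sentence boundaries, and send each chunk to TTS separately. splitForTts and synthesize are hypothetical names, the 'tts-1' model and 'alloy' voice are assumptions, and simply concatenating the returned audio buffers is an assumption that may not suit every audio format.

```javascript
const TTS_MAX_CHARS = 4096; // limit quoted in the TTS docs and in the error message

// Split text into pieces of at most maxChars, preferring sentence boundaries.
// Joining the returned chunks gives back the original text unchanged.
function splitForTts(text, maxChars = TTS_MAX_CHARS) {
  // Each match is one sentence with its trailing whitespace, or the unterminated tail
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    let rest = sentence;
    // Hard-split any single sentence that is longer than the limit on its own
    while (rest.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    // Start a new chunk when adding this sentence would exceed the limit
    if (current.length + rest.length > maxChars) {
      chunks.push(current);
      current = '';
    }
    current += rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Hypothetical usage: `openai` is an already configured client from the openai Node SDK
async function synthesize(openai, text) {
  const buffers = [];
  for (const chunk of splitForTts(text)) {
    const response = await openai.audio.speech.create({
      model: 'tts-1', // assumed model
      voice: 'alloy', // assumed voice
      input: chunk,
    });
    buffers.push(Buffer.from(await response.arrayBuffer()));
  }
  return Buffer.concat(buffers);
}
```

Splitting on sentence boundaries keeps each audio segment from cutting off mid-sentence; the hard split only kicks in for a single sentence longer than the limit.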
