Hello everyone,
I’m working on extracting text from a news article, and here is where I’m at: I fetch the article, extract the text with JSDOM, and then send it to gpt-4-1106-preview. I’m getting this error:
status: 400,
headers: {
connection: 'keep-alive',
'content-length': '262',
'content-type': 'application/json',
date: 'Fri, 12 Apr 2024 21:40:36 GMT',
…
error: {
message: '1 validation error for Request\n' +
'body -> input\n' +
'  ensure this value has at most 4096 characters (type=value_error.any_str.max_length; limit_value=4096)',
type: 'invalid_request_error',
param: null,
code: null
},
code: null,
param: null,
type: 'invalid_request_error'
}
The text I’m submitting is 4,949 characters, which is over that limit, but I don’t understand why it’s being rejected now, since I’ve previously sent much longer text to the API.
const page = await browser.newPage();
// Allow up to two minutes for slow pages to finish loading.
page.setDefaultNavigationTimeout(2 * 60 * 1000);
await page.goto(url, { waitUntil: 'networkidle2' });
// Grab the rendered HTML; getText (not shown) extracts the text from it.
const html = await page.evaluate(() => document.documentElement.innerHTML);
const processedHtml = getText(html);
console.log(processedHtml);
const chatResponse = await openai.chat.completions.create({
model: "gpt-4-1106-preview",
messages: [
{
role: 'system',
content: `You are an AI assistant whose job is to extract the article's text/body from its raw HTML and
output the script for a podcast. Output only the text from the article's 'body'. Do not output comments,
HTML tags, or the title. Everything you output will be sent to a TTS, so numbers must be converted to words.
Also, emojis must be removed. All the text you output will be spoken. You need to convert everything to words that
can be spoken by a TTS without sounding weird to the listener. The text you output must be the article; your only job
is to extract the body from the HTML of the article, and output the podcast for it without illegal characters, phrases, etc.
You should not change anything in the article other than illegal phrases for the TTS. So for example '1,000,000' should be
output as 'one million'. Everything you output will be spoken!`
},
{ role: 'user', content: `Extract the body text of this article from this html, only return text: ${processedHtml}` }
],
temperature: 1,
});
It doesn’t look like this is actually an OpenAI error or issue. You’d typically run into a token limit, not a character limit.
Are you absolutely sure the OpenAI API is emitting that error?
Unfortunately I can’t test your request at the moment, but if you’re certain the error isn’t coming from other parts of your code, I would check whether the request works as a direct HTTP request against the OpenAI API, to rule out any misconfiguration.
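For reference, a direct request could look roughly like the sketch below. It uses Node's built-in `fetch` (Node 18+) against the chat completions endpoint and bypasses the SDK entirely. The helper names `buildChatRequest` and `sendChatRequest` are made up for illustration, and the API key is assumed to live in the `OPENAI_API_KEY` environment variable.

```javascript
// Hypothetical helper: builds the raw HTTP request for the chat completions endpoint.
function buildChatRequest(apiKey, articleText) {
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: 'gpt-4-1106-preview',
        messages: [{ role: 'user', content: articleText }],
      }),
    },
  };
}

// Hypothetical helper: sends the request and returns the HTTP status plus the parsed JSON body.
async function sendChatRequest(apiKey, articleText) {
  const { url, options } = buildChatRequest(apiKey, articleText);
  const response = await fetch(url, options);
  return { status: response.status, body: await response.json() };
}
```

Calling `await sendChatRequest(process.env.OPENAI_API_KEY, processedHtml)` with the same text and getting a 200 back would suggest the 400 is raised by a different call in the pipeline.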
I’m taking the text from the GPT API and then passing it into OpenAI TTS, whose docs clearly say “Max length is 4096 characters”, so that’s where the error is coming from.
So, the error was just me misreading the TTS API. Gonna mark this as the solution. Sorry for wasting anyone’s time.
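For anyone who hits the same limit, one possible workaround is to split the text into chunks of at most 4096 characters before sending it to TTS. This is only a minimal sketch: `splitForTts` is a hypothetical helper, and the preference for breaking at sentence ends is an assumption about what sounds best when spoken.

```javascript
// Character limit quoted from the TTS docs in the post above.
const TTS_MAX_CHARS = 4096;

// Hypothetical helper: splits text into chunks of at most maxLen characters,
// preferring to break after sentence-ending punctuation, then at a space.
function splitForTts(text, maxLen = TTS_MAX_CHARS) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Find the last sentence end inside the first maxLen characters.
    let cut = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? ')
    );
    if (cut > 0) {
      cut += 1; // keep the punctuation mark in this chunk
    } else {
      cut = head.lastIndexOf(' '); // fall back to the last space
    }
    if (cut <= 0) cut = maxLen; // no natural break: hard split
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk could then go out as its own TTS request, for example via `openai.audio.speech.create({ model: 'tts-1', voice: 'alloy', input: chunk })` in the `openai` Node SDK, with the resulting audio segments joined afterwards. The model and voice here are just placeholder choices.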