So I am using the chat completions API with gpt-4o. My code, in abbreviated form, looks something like this:
```python
import json

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

SUMMARY_LENGTH = 1000

class Interval(BaseModel):
    index: int
    start: str
    end: str
    summaries: str

class Intervals(BaseModel):
    intervals: list[Interval]

generate_summary_message = [
    {
        "content": f"The user will send you a list of time intervals, where each interval consists of an index number, a start field, an end field, and a list of associated news articles. For each interval, create a summary of the news articles with at least {SUMMARY_LENGTH - 100} and at most {SUMMARY_LENGTH + 100} characters. Format your response as JSON as per the specified response format.",
        "role": "system"
    },
    {
        "content": json.dumps(news_infos, ensure_ascii=False).replace("\\n", "\n").replace('\\"', '"'),
        "role": "user"
    }
]

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=generate_summary_message,
    response_format=Intervals,
    max_tokens=16000,
)
ai_response = Intervals.model_validate_json(response.choices[0].message.content).intervals
```
This sort of works, BUT the API absolutely will not adhere to the character limit I specify in the system prompt. Likewise, if I make further stipulations in the system prompt like "only generate a summary if the news articles for the given interval are meaningfully different from the news articles for previous intervals", that, too, will be ignored. I guess the second requirement might be a bit too complicated for a non-CoT model, but surely adhering to a specified length for the summaries shouldn't be a problem?
I'm afraid this might be more complicated than you think. The models are absolutely terrible at counting, and don't really have any way of counting characters as they generate text.
One analogy: could you tell me how many characters are in your post at a glance, without resorting to an iterative approach? The same problem applies to word counts, though those might get you in the general ballpark.
I would suggest a tool-assisted, iterative approach. Tell it to generate a summary with a target that can be encoded as a single token (50 words, 200 words, 1 paragraph, 2 paragraphs; see https://platform.openai.com/tokenizer), then count the characters in code, and hand the summary back to the model telling it to trim by x% or some similar proportion.
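A minimal sketch of that loop. The `summarize` callable here is a stand-in for whatever API call you actually make; the names, tolerances, and retry count are all illustrative:

```python
def fit_to_length(summarize, articles, target=1000, tolerance=100, max_rounds=3):
    """Regenerate a summary until its character count lands in the window."""
    # Ask in words first: small round numbers tokenize cleanly.
    prompt = f"Summarize these articles in roughly {target // 5} words:\n{articles}"
    summary = summarize(prompt)
    for _ in range(max_rounds):
        n = len(summary)  # exact character count, done in code, not by the model
        if target - tolerance <= n <= target + tolerance:
            break
        verb = "expand" if n < target else "trim"
        pct = abs(n - target) * 100 // target
        prompt = (f"The summary below is {n} characters; the target is {target}. "
                  f"Please {verb} it by about {pct}%.\n\n{summary}")
        summary = summarize(prompt)
    return summary
```

The point is that the counting happens in your code, where it is exact, and the model only ever has to do a qualitative "make it shorter/longer by roughly this much", which it is much better at.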
This one's actually more achievable, I think. Split your summaries str into an array:

```
Summary: {
    cot: str
    summary: str | null
}[]
```

and tell the model to reflect in the cot field about what article it's about to process, and whether a similar article has been processed already. If a similar article has been processed already, summary should be set to null.
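In pydantic terms (which the original snippet already uses), the suggested shape might look like the sketch below; the `SummaryItem` name and field comments are my own:

```python
from typing import Optional

from pydantic import BaseModel

class SummaryItem(BaseModel):
    cot: str                # model reasons here before committing to a summary
    summary: Optional[str]  # null when a similar article was already covered

class Interval(BaseModel):
    index: int
    start: str
    end: str
    summaries: list[SummaryItem]  # was a single str in the original schema

class Intervals(BaseModel):
    intervals: list[Interval]
```

Note that `summary` is required but nullable (no default value); OpenAI's strict structured-output mode expects every field to be present in the schema, so a nullable required field is the safer way to express "may be null".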
Hope this gives you some ideas you can get started with!
That's fascinating. I had assumed I had to be doing something wrong, because adhering to a specified length sounds trivially easy compared to, say, producing a good summary, or any of the endless other things the API can do pretty well.
I haven't tried your proposed solution yet, but like you said, the word count might be off in much the same way as the character count, and if it is, I would have to resubmit both the articles which were the basis for the summary, and the summary itself, causing additional time and token usage for what is already a rather long pipeline. Maybe I'll just settle for the current summaries, which are a bit unpredictable in length, though generally err on the side of being too short.
I'd say you don't necessarily have to re-send the whole prompt; asking the model to re-write the existing summary (assuming it overshot) might work decently well.
Yeah, I guess there are a lot of misconceptions around this. I'd say a good litmus test of whether a model can do something is whether you could do it at a single glance, without stepping through it in your head, keeping a count, etc. CoT models (R1, o1, o3) are more capable here right out of the box, because they have been trained to iteratively aggregate things, which makes them more "glanceable".
The one superpower LLMs have is that they have more "eyes" than you, so they can look at multiple things at the same time; but they can't look at 100 things, and they often don't look at what you think they could be looking at.