Restarting partially completed chat completion API calls

brianinoc · February 22, 2024, 9:10pm

If I use the streaming API and a given query stops either because I run out of tokens or a timeout (or I stop it because is running slowly), is it possible to restart the query using the partial response I already have? In the case of running out of tokens in a 8K GPT4 model, I could potentially just resume on the 32K GPT4 model. This seems like it should be conceptually possible, but I can’t see an obvious way to do this with the existing API.

_j · February 22, 2024, 9:19pm

Yes, if you place the assistant response as the last message with no user input, the AI will usually continue upon that if it looks unfinished in the context of the question posed.

Diet · February 22, 2024, 9:21pm

sometimes, but sometimes it will just ignore it and start from scratch, depending on the model.

sometimes you need to add an extra instruction to tell the model to continue where it left off.

_j · February 22, 2024, 9:31pm

Here for example, away it goes again, even though I truncated a ChatGPT response at the clean end of a section 2/3 of the way in:

Or force the AI to keep writing until it has no more text to predict.

The only side-effect is the continuation is trained on briefness by the previous outputs.

The side effect of prompting is the possibility of “Sure, let me continue where we left off before the interruption you report.”

(btw, this is having gpt-3.5-turbo (latest) complete on GPT-4 writing. Being informed by previous verbosity. Prompt gpt-3.5-turbo and you have vastly reduced quality.

gpt-3.5-turbo complete

Feeding:

Provide a diet of whole fish, such as herring or capelin, as well as squid and krill.
Feed them twice a day to ensure they get enough nutrients.

Habitat:

Maintain a cold environment with a temperature between 32-45°F (0-7°C).
Keep their habitat clean by regularly changing the water and providing a clean area for them to rest.

Enrichment:

Provide opportunities for swimming and diving to mimic their natural behaviors.
Offer toys and objects to play with to keep them mentally stimulated.

Healthcare:

Regularly check for signs of illness such as changes in behavior or appetite.
Consult with a veterinarian who specializes in exotic animals for routine check-ups and medical care.

Socialization:

Consider getting a companion for your penguin to prevent loneliness.
Interact with your penguin regularly to build trust and bond with them.

Legal Considerations:

Check local regulations to ensure you are allowed to keep a penguin as a pet.
Obtain any necessary permits for owning an exotic animal.

Overall, caring for a pet penguin requires dedication, proper nutrition, a suitable habitat, enrichment, healthcare, socialization, and adherence to legal guidelines.

Diet · February 22, 2024, 9:36pm

that’s cheating*

if it’s cut off mid-sentence it sometimes tends to start over

*but it could be an interesting strategy, deleting everything until the last period or something

brianinoc · February 22, 2024, 9:38pm

Conceptually it should be possible to stop the generation of the AI response at any point and restart such that the final results is no different than having generated the entire response in one go (modulo differences with random number generators). It appears that issuing a query that ends with an assistant response isn’t really the same thing though.

Diet · February 22, 2024, 9:39pm

it should be mentioned (but you probably know this) that it IS possible (and trivial - the default behavior) with the legacy completion endpoints

brianinoc · February 22, 2024, 9:41pm

I assume legacy completions will eventually go away though?

_j · February 22, 2024, 9:47pm

Interesting perspective. I found a point where gpt-3.5-turbo-0125 or gpt-4-turbo (0125) would start over in the penguin prose.

GPT-4-0613, no hiccup. gpt-3.5-turbo-0613, no problem.

Latest models broke completion (among other deoptimizations) where the deprecations guide specifically recommends chat as an edit replacement, and previously had gpt-4 pointed at also to replace completions.

I found the most performative against the new behavior is a user message “[continue AI completion]”

However, the messages being wrapped in a container for “ChatML”, and an unseen “assistant” prompt, means the flow is broken up and the AI is re-prompted.

brianinoc · February 22, 2024, 10:07pm

Yeah. But what I mean is that under the hood the ChatML query gets translated into a sequence of tokens with special tokens at the beginning/end of each section. If you had the model running locally, you could have design the infrastructure to not have the end token for the last assistant and thus restart with identical behavior.

Diet · February 22, 2024, 10:07pm

I hope they make a come back!

I want gpt-4-instruct!

~~all~~ most of the open source models are instruct based, and do support that behavior.

brianinoc · February 22, 2024, 10:10pm

My main motivation for this is I sometimes see queries that are outliers and take much longer than they should to complete. Ideally I would have the responses streaming and stop the slow queries and restart it from where it left off. I’d like to do this programmatically in a library. Right now, the best I can do is restart the entire query, but that takes longer and cost more.

_j · February 23, 2024, 1:04am

Why stop the slow query? The new one adds latency before you get its next token; you can send it in the background and see if it is on a generation path to catch up and surpass. Then replace output when its going to win.

Just need a good determiner for the edge case of “too slow, faster will be obtained”, so you aren’t always paying for two inputs.

Topic		Replies	Views
How to complete Long API responses? API gpt-35-turbo , chatgpt	6	4661	December 19, 2023
How to force to continue a truncated completion? API	2	5226	December 24, 2023
ChatGPT Stops Mid-Sentence: Work Around? API	6	3222	November 17, 2023
ChatGPT's "Stop Generating" function - how to implement? API	12	12983	December 14, 2023
Continuing content after output token limit? API	3	1603	May 23, 2024

Restarting partially completed chat completion API calls

Related topics