Interrupting completion stream in Python

Is it possible to interrupt a completion stream and not waste tokens? E.g. when I see that it’s looping or going in the wrong direction.

I know I can use the stream option and then iterate over the response object like a generator.

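import openai
response = openai.Completion.create(model=MODEL, prompt=PROMPT, stream=True)  # MODEL, PROMPT: placeholders
for line in response:
    print(line["choices"][0]["text"], end="")  # each streamed chunk carries a piece of the text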

But is it enough to just jump out of the loop when I decide enough is enough? Will the server then stop generating the rest of the tokens?
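Breaking out of the loop is the client-side half of the answer. Below is a minimal sketch of that pattern, assuming the stream is any iterable of text chunks (`consume_until` and its `should_stop` callback are hypothetical names, not part of the openai library). It accumulates text, stops as soon as the callback says so, and closes the iterator in a `finally` block so nothing is left half-open on the client. Whether the server then stops generating is a separate question that the loop itself cannot answer.

```python
from typing import Callable, Iterable, List


def consume_until(chunks: Iterable[str], should_stop: Callable[[str], bool]) -> str:
    """Accumulate streamed text chunks until should_stop(text_so_far) returns True.

    `chunks` stands in for the streamed response (assumption: any iterable of str).
    """
    received: List[str] = []
    iterator = iter(chunks)
    try:
        for chunk in iterator:
            received.append(chunk)
            if should_stop("".join(received)):
                break  # leave the loop early; remaining chunks are never read
    finally:
        close = getattr(iterator, "close", None)
        if close is not None:
            close()  # generators (and many stream wrappers) expose close()
    return "".join(received)


if __name__ == "__main__":
    fake_stream = (word + " " for word in "one two three four five".split())
    print(repr(consume_until(fake_stream, lambda text: "three" in text)))
```

Re-joining the text on every chunk is quadratic, which is fine for a sketch; a real app might only run the check every few chunks.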

I think what you are looking for is stop sequences. Here is a short guide on how to use them: How do I use Stop Sequences? | OpenAI Help Center
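For comparison, a stop sequence is applied by the server while it generates: the output ends just before the first occurrence of any listed sequence, and the sequence itself is not returned. The sketch below only emulates that truncation locally to illustrate the semantics (`truncate_at_stop` is a hypothetical helper). In the legacy Python library the real mechanism is the `stop` parameter of `openai.Completion.create`, which accepts up to four sequences.

```python
from typing import Sequence


def truncate_at_stop(text: str, stops: Sequence[str]) -> str:
    """Cut `text` before the earliest occurrence of any stop sequence.

    Local illustration only: the API applies stop sequences during generation,
    so tokens after the stop are never produced in the first place.
    """
    cut = len(text)
    for stop in stops:
        if not stop:
            continue  # an empty stop string would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    sample = "Answer: 42\n\nQuestion: what next?"
    print(repr(truncate_at_stop(sample, ["\n\n", "Question:"])))  # 'Answer: 42'
```

Note that this only helps when the unwanted output can be predicted in advance, which is why it does not cover the "user presses stop" case discussed below.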

I don’t think there’s a way to stop it once the stream starts, but I may be wrong.

I’m looking for something like the option to interrupt in the playground. You can just cancel the stream if you see that it’s going in the wrong direction, though I’m not sure if this is really telling the server to stop computing.

I’m not looking for Stop Sequences. I just want the user of my app to have the ability to quickly stop and try something else (and possibly not waste all tokens).
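One way to give the user a "stop" button, assuming the UI runs on a different thread from the code reading the stream: share a `threading.Event`, have the button set it, and have the reader check it before handling each chunk. `read_stream` and the fake stream are hypothetical stand-ins; with the legacy library the iterable would be the generator returned by `openai.Completion.create(..., stream=True)`.

```python
import threading
from typing import Iterable, Iterator, List


def read_stream(chunks: Iterable[str], cancel: threading.Event) -> List[str]:
    """Collect chunks until the stream ends or `cancel` is set (e.g. by a UI button)."""
    received: List[str] = []
    iterator: Iterator[str] = iter(chunks)
    try:
        for chunk in iterator:
            if cancel.is_set():
                break  # user pressed stop: discard this chunk and stop reading
            received.append(chunk)
    finally:
        close = getattr(iterator, "close", None)
        if close is not None:
            close()  # release the stream on the client side
    return received


if __name__ == "__main__":
    stop_button = threading.Event()

    def fake_stream() -> Iterator[str]:
        # Hypothetical stand-in for the streamed response.
        for i in range(10):
            if i == 3:
                stop_button.set()  # simulate the user pressing "stop" mid-stream
            yield f"tok{i} "

    print(read_stream(fake_stream(), stop_button))  # ['tok0 ', 'tok1 ', 'tok2 ']
```

The event is only checked when a chunk arrives, so a stalled stream is not interrupted until the next chunk shows up; handling that case would need something extra, such as a read timeout.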

@lemi did you figure this out? I have the same question: the ability to stop streaming (not via stop sequences) to save token costs when the output is clearly going in an unproductive direction (e.g. repetitions).
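To catch the repetition case automatically rather than by eye, one simple heuristic (an assumption on my part, not something the API provides) is to check whether the tail of the accumulated text is the same snippet repeated several times in a row. `is_repeating` below is hypothetical and could serve as the stop condition for the streaming loop.

```python
def is_repeating(text: str, min_unit: int = 5, max_unit: int = 200, repeats: int = 3) -> bool:
    """Return True if `text` ends with some snippet repeated `repeats` times back to back.

    Heuristic sketch: tries every snippet length from min_unit to max_unit characters.
    """
    for unit in range(min_unit, max_unit + 1):
        span = unit * repeats
        if span > len(text):
            break  # longer snippets cannot fit either
        tail = text[-span:]
        snippet = tail[-unit:]
        if tail == snippet * repeats:
            return True
    return False


if __name__ == "__main__":
    print(is_repeating("The answer is: I am not sure. I am not sure. I am not sure. "))  # True
    print(is_repeating("A perfectly ordinary sentence without loops."))  # False
```

The thresholds are arbitrary; a lower `repeats` stops sooner but risks cutting off legitimate lists or tables.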


Hey guys,
did you figure this out, or find any alternative solutions?

According to my research, this should do the trick, since openai.Completion.create uses requests under the hood:

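import openai
response = openai.Completion.create(stream=True)  # plus model, prompt and your other parameters
try:
    for stream_resp in response:
        # Do stuff...
        if thing_happens:  # placeholder for your own stop condition
            break
except Exception as e:
    print(e)
finally:
    # Close the generator; the intent is that the underlying HTTP stream gets dropped too
    response.close()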
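Why calling `close()` matters: when a generator is closed, Python raises `GeneratorExit` at the paused `yield`, so any `finally` or `with` cleanup inside it runs right away instead of whenever the object happens to be garbage-collected. The sketch uses a hypothetical fake stream whose `finally` block stands in for closing an HTTP connection. Whether the real library's generator actually closes its socket at that point, and whether the server then stops generating, is exactly what the rest of this thread tries to measure.

```python
from typing import Dict, Iterator


def fake_stream(state: Dict[str, bool]) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response backed by an open connection."""
    state["open"] = True
    try:
        for i in range(1000):
            yield f"token{i}"
    finally:
        state["open"] = False  # where a real client would close the socket


if __name__ == "__main__":
    state: Dict[str, bool] = {}
    stream = fake_stream(state)
    for n, _token in enumerate(stream):
        if n == 9:
            break  # stop after 10 tokens
    print("open after break:", state["open"])  # open after break: True (suspended at yield)
    stream.close()  # raises GeneratorExit inside fake_stream, running its finally block
    print("open after close:", state["open"])  # open after close: False
```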

I came up with the same solution, and it also works on my end. I’m sure the server side will necessarily generate at least a couple more tokens than the client receives. What I was hoping for, though, is an assertion from OpenAI (or from someone who has tested this method meticulously to determine whether they are charged for all the tokens that WOULD have been received) that the server stops generating tokens more or less as soon as the connection is closed on the client side.

Did anyone figure out whether it actually stops generating tokens in the backend?

I ran a simple test of @thehunmonkgroup’s solution.

I made a call to the gpt-3.5-turbo model with the input:

Please introduce GPT model structure as detail as possible

I let the API print all the tokens. The statistics from the OpenAI usage page were as follows (I am a new user and not allowed to post media, so I only copied the numbers):
17 prompt + 441 completion = 568 tokens

Next, I stopped the generation after receiving 9 tokens. The result was:
17 prompt + 27 completion = 44 tokens

It seems that about 18 extra tokens were generated after I stopped the generation.

Then I stopped the generation after 100 tokens. The result was:
17 prompt + 111 completion = 128 tokens

So I think the solution works well, but with an extra 10 to 20 tokens every time.
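The overshoot in these measurements is simply the completion tokens reported on the usage page minus the tokens received before breaking out of the loop. A small sketch using the two interrupted runs reported above (the numbers are copied from the post; nothing new is measured, and `overshoot` is a hypothetical helper):

```python
from typing import List, Tuple


def overshoot(runs: List[Tuple[int, int]]) -> List[int]:
    """Extra completion tokens counted beyond what the client received, per run."""
    return [counted - received for received, counted in runs]


# (tokens received before breaking out of the loop, completion tokens on the usage page)
reported_runs = [(9, 27), (100, 111)]
print(overshoot(reported_runs))  # [18, 11]
```

Both values fall in the 10 to 20 range described above; two runs are too few to say whether the overshoot depends on how far into the generation the stop happens.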


Excellent deductive and data-driven results. Thank you for posting them!