I’m looking for something like the interrupt option in the playground: you can cancel the stream if you see it heading in the wrong direction, though I’m not sure whether that actually tells the server to stop computing.
I’m not looking for Stop Sequences. I just want the user of my app to be able to quickly stop and try something else (and ideally not waste tokens).
@lemi did you figure this out? I have the same question: the ability to stop streaming, not via stop sequences, to save token costs when the output is clearly going in an unproductive direction (e.g. repetitions).
According to my research, this should do the trick, since openai.Completion.create uses requests under the hood:
import openai

response = openai.Completion.create(
    # Model, prompt, and other parameters here...
    stream=True,
)
try:
    for stream_resp in response:
        # Handle each streamed chunk here...
        if thing_happens:  # your own stop condition
            break
except Exception as e:
    print(e)
finally:
    response.close()  # closes the underlying HTTP connection
I came up with the same solution, and it works on my end too. The server side will inevitably generate at least a few more tokens than the client receives. What I was hoping for, though, was some assertion from OpenAI (or from someone who has done meticulous testing to determine whether you are charged for the total number of tokens that WOULD have been generated) that once the client closes the connection, the server actually stops generating tokens.
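For what it's worth, the reason `break` plus `response.close()` releases the connection comes down to ordinary Python generator semantics: `openai.Completion.create(stream=True)` returns a generator, and closing it raises `GeneratorExit` at the paused `yield`, which runs the generator's `finally` block, where the HTTP connection gets released. A minimal stand-in with no network (`fake_stream` is a made-up placeholder, not part of the library):

```python
released = []

def fake_stream():
    # Stand-in for the generator returned by
    # openai.Completion.create(stream=True).
    try:
        for i in range(1000):
            yield i
    finally:
        # In the real client, this is roughly where the underlying
        # requests connection would be released.
        released.append(True)

stream = fake_stream()
for chunk in stream:
    if chunk == 9:  # some stop condition
        break
stream.close()  # raises GeneratorExit inside fake_stream, running its finally

print(released)  # prints [True]
```

Whether the server then stops generating promptly is still an open question; this only shows that the client-side cleanup does run.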
I tested with the prompt “Please introduce GPT model structure as detail as possible” and had the API print every token it streamed. The statistics from the OpenAI usage page were (I am a new user and am not allowed to post media, so I can only copy the result):
17 prompt + 441 completion = 458 tokens
After that, I stopped the generation when the number of tokens received was 9; the result was:
17 prompt + 27 completion = 44 tokens
So roughly 18 extra tokens were generated after I stopped the stream.
Then I stopped the generation when the count reached 100; the result was:
17 prompt + 111 completion = 128 tokens
So I think the solution works well, but with an extra 10~20 tokens each time.
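The measurement above can be reproduced with a small helper that counts streamed chunks and closes the stream after a limit. Each chunk of a streamed Completion carries roughly one token, so the chunk count approximates tokens received client-side. This is a sketch: `consume_until` is my own name, and the dummy generator stands in for the real API response.

```python
def consume_until(stream, limit):
    """Consume a streamed response, closing it after `limit` chunks.

    The returned count approximates tokens received client-side;
    compare it against the usage page to estimate the overhead.
    """
    received = 0
    try:
        for chunk in stream:
            received += 1
            if received >= limit:
                break
    finally:
        if hasattr(stream, "close"):
            stream.close()  # drop the connection so the server can stop
    return received

# Dummy stream standing in for openai.Completion.create(stream=True):
dummy = (f"token-{i}" for i in range(441))
print(consume_until(dummy, 9))  # prints 9
```

Running this against the real API and then checking the usage page a few minutes later is what produced the numbers above.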