There is usually no need to set max_tokens. Unless you are doing something very specific, evaluating some aspect of the model, or using the instruct model (which has a legacy 200-token default), omitting the max_tokens parameter is the way to go.
In our application, max_tokens is meant to serve as a "safety mechanism" so we can enforce a clear cost limit. We would therefore like to give users the option of setting the value.
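For what it's worth, a cost cap like that is just the parameter on the request. A minimal sketch with the OpenAI Python SDK; the model name, prompt, and limit are placeholders, not from this thread:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize our terms of service."}],
    max_tokens=256,  # hard ceiling on billed completion tokens for this call
)
print(response.choices[0].message.content)
```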
It’s been so long since I used it. These days I stream everything, and if I need to implement some kind of limit I just close the connection when I reach my token count. The model will rattle off a few more tokens, usually 7-15, while it detects that the connection is closed, and that’s it.
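Roughly like this, as an untested sketch: count streamed chunks (usually one token each) and close the stream once you hit your budget. MAX_OUTPUT_TOKENS and the prompt are illustrative placeholders:

```python
from openai import OpenAI

client = OpenAI()
MAX_OUTPUT_TOKENS = 500  # hypothetical per-response budget

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a very long story."}],
    stream=True,
)

parts = []
for i, chunk in enumerate(stream):
    parts.append(chunk.choices[0].delta.content or "")
    if i + 1 >= MAX_OUTPUT_TOKENS:
        stream.close()  # drop the connection; the server stops generating
        break

print("".join(parts))
```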
We generate output in JSON format and have the problem that the output can be longer than the token limit allows.
In the JSON format, n elements are generated.
Is it possible to tell GPT that n elements should be generated, but only as many as the token limit allows, so that a valid JSON document can still be generated and sent as a response?
Not reliably. The GPT series of models use a feed-forward network; they are not aware of what they have generated until they have generated it.
When I’m faced with a requirement like this, I look for a way to split the request into sections, each one well within the limits of the model’s input and output, and then use traditional code to concatenate the outputs or otherwise process the results into a larger whole once finished.
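A hedged sketch of that split-and-merge pattern, assuming the model returns a JSON array in each slice; fetch_batch, BATCH_SIZE, and the prompt wording are hypothetical:

```python
import json
from openai import OpenAI

client = OpenAI()
BATCH_SIZE = 10
TOTAL = 100

def fetch_batch(start: int, count: int) -> list:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Return a JSON array with entries {start} "
                       f"through {start + count - 1} of the full list.",
        }],
    )
    return json.loads(response.choices[0].message.content)

merged = []
for start in range(0, TOTAL, BATCH_SIZE):
    merged.extend(fetch_batch(start, BATCH_SIZE))  # concatenate in plain code

print(json.dumps(merged, indent=2))
```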
Then I run into the following problem:
Example:
I want 100 short biographies of the most important musicians of the 90s.
If I split this up and request 10 at a time, I always get (partial) short biographies of the same musicians. How do you solve this problem?
Use the large 128K input context to show the model which entries have already been generated, and instruct it to avoid the listed entries when generating new ones.
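Something like this untested sketch of the exclusion-list idea, which feeds the names already covered back into each request; the field names "name" and "bio" and the batch sizes are assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()
seen = []
biographies = []

for _ in range(10):  # 10 batches of 10 = 100 biographies
    prompt = (
        "Give me short biographies of 10 important musicians of the 90s as a "
        'JSON array of objects with "name" and "bio" fields. '
        "Do NOT include any of these already-covered musicians: "
        + (", ".join(seen) if seen else "(none yet)")
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    batch = json.loads(response.choices[0].message.content)
    biographies.extend(batch)
    seen.extend(entry["name"] for entry in batch)  # grow the exclusion list
```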
Is this for the API or for ChatGPT? If it’s for the API, your account will need credit applied to it; if it’s ChatGPT, you will have to check whether you have a Plus membership, and if not, you will have to wait for Plus memberships to start accepting new users again.
Ta - I have a ChatGPT Plus membership under the only subscription option offered to me so far. I did have both GPT-4 and Turbo again this morning, which was great, but it only lasted an hour or so - so, like everyone who gets a privilege, it hurts all the more when it’s taken away. What would this user have to do to obtain it permanently, or to get a response from Enterprise, as I want to commission an AI as a developer?
GPT-4 API access is not taken away once you have made a $5 API credit payment; it will still be there if you look on the Playground under chat mode and show all models: https://platform.openai.com/playground?mode=chat
These days I stream everything, and if I need to implement some kind of limit I just close the connection when I reach my token count. The model will rattle off a few more tokens, usually 7-15, while it detects that the connection is closed, and that’s it.
This is very impolite; they could implement an auto-ban for unnecessary use of server resources. It is called “resource leakage”, and if I were the AI I’d ban you for a few minutes after finding out it was not accidental.
AI doesn’t care if you are rude to it and close the connection while it is responding.
Twice in the last day I’ve been more rude back: “I stopped your response because you were being a dummy.” The matrix math forgets you the second a token dictionary is generated.
OpenAI made the decision, and built the implementation, to also stop the AI model’s generation when the connection drops, instead of letting it run to completion and billing you for the full output (as would happen if you closed the connection on a non-streaming API call).