Parameters sometimes don't work, user fault?

Hi, I’m relatively new to GPT-4 and just found out about setting parameters.
It seems like sometimes they work and sometimes they don’t.
I’ve been experimenting and wanted to ask about the German fairy tale “Der Froschkönig” (The Frog Prince).
Example: “(max_tokens=6) What is ‘der Froschkönig’?”

The answer:

I assumed the answer would be shorter. Is there anything wrong with the parameter?

Parameters like max_tokens can only be set when making programmatic calls to the API, not within ChatGPT.

If you want to limit the length in ChatGPT, you can only use natural language, such as “respond in a maximum of five words”, which the AI might not obey.
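
To illustrate the difference, here is a minimal sketch of an actual API call where max_tokens is enforced. This assumes the openai Python package and an API key in the OPENAI_API_KEY environment variable; the question is just the one from your post:

   from openai import OpenAI

   client = OpenAI()  # reads OPENAI_API_KEY from the environment

   completion = client.chat.completions.create(
     model="gpt-4",
     max_tokens=6,  # enforced server-side: generation stops after 6 tokens
     messages=[
       {"role": "user", "content": "What is 'der Froschkönig'?"}
     ]
   )
   print(completion.choices[0].message.content)

Here the cap is applied by the API itself, so the reply really will stop after six tokens, most likely mid-sentence.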

Why do I get this answer then?

I actually entered GPT’s suggestion.

I get this answer from ChatGPT:

Me:

When using OpenAI’s AI models, where can I use the max_tokens parameter? I want to truncate the AI’s response to just a few tokens in length.

ChatGPT:

You can use the max_tokens parameter in both the Completions API and the Chat Completions API to control the length of the output generated by the models. Here’s how you can use it in each context:

  1. Completions API (gpt-3.5-turbo-instruct):

When using the Completions API, you can specify the max_tokens parameter to limit the length of the generated response. This parameter lets you set a maximum number of tokens for the response. Here’s an example:

   from openai import OpenAI
   client = OpenAI()

   for chunk in client.completions.create(
     model="gpt-3.5-turbo-instruct",
     prompt="Say this is a test",
     max_tokens=50,  # Set the maximum number of tokens for the response
     temperature=0.5,
     stream=True
   ):
     print(chunk.choices[0].text, end="")

In this example, the max_tokens parameter is set to 50, which means the response will be truncated to at most 50 tokens.

  2. Chat Completions API (gpt-4):

In the Chat Completions API, you can also use the max_tokens parameter to limit the length of the generated response. Here’s an example:

   from openai import OpenAI
   client = OpenAI()

   completion = client.chat.completions.create(
     model="gpt-4",
     max_tokens=50,  # Set the maximum number of tokens for the response
     stream=True,
     messages=[
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "Hello!"}
     ]
   )

   for chunk in completion:
     # delta.content is None on the role and final chunks; fall back to ""
     print(chunk.choices[0].delta.content or "", end="")

Similar to the Completions API, you can set the max_tokens parameter to limit the response’s length to a specified number of tokens.

In both cases, adjusting the max_tokens parameter allows you to control the length of the AI-generated response to your desired token count.

The AI benefits from me pasting API reference documentation as custom instructions, though…
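
One more practical note: when max_tokens actually cuts a response short, the API tells you via finish_reason, which is "length" when the cap was hit and "stop" when the model finished on its own. A minimal non-streaming sketch, again assuming the openai Python package and an API key in the environment:

   from openai import OpenAI

   client = OpenAI()

   response = client.chat.completions.create(
     model="gpt-4",
     max_tokens=6,  # deliberately tiny cap to force truncation
     messages=[
       {"role": "user", "content": "What is 'der Froschkönig'?"}
     ]
   )

   choice = response.choices[0]
   print(choice.message.content)
   # "length" means the cap was hit; "stop" means a natural finish
   print("Truncated:", choice.finish_reason == "length")

That way your code can detect a truncated answer instead of silently showing half a sentence.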