Struggling with max_tokens and getting responses within a given limit, please help!

Hey people!

I’m trying to use the GPT API to get responses from a prompt that includes custom data. The problem is that the prompt will vary in length and token count, and I’m not sure how to let the input part of the prompt be effectively unbounded, or at least much larger than the response I want back, which should be at most 200 words.

I tried using:

// Parse the incoming request body (contains the user's data).
const parsedUserInformation = await request.json();

const inputPrompt = `Return a CV given the following data blah blah: '${parsedUserInformation.data}'.`;

const response = await openAIInstance.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      role: "user",
      content: inputPrompt,
    },
  ],
  temperature: 1,
  max_tokens: 200, // caps the length of the generated response
  top_p: 1,
  frequency_penalty: 0,
  presence_penalty: 0,
});

But the response I get is abruptly cut off before the sentence is even finished, which signals that something’s wrong, and I suspect the parsedUserInformation data is eating into the token count.

Hope all this made sense to you!

Many thanks for reading and hopefully giving me a helping hand, cheers!


The max_tokens setting only tells the API where you want your answer cut off and generation stopped. It is the maximum response you will get; it does not limit the amount of input you can provide (except that the whole amount is reserved for a response within the model’s context length).
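A quick way to confirm this (a minimal sketch, reusing the client and prompt names from your post): the API reports why generation stopped in finish_reason, so you can tell a natural stop apart from a max_tokens cutoff.

const response = await openAIInstance.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: inputPrompt }],
  max_tokens: 200,
});

const choice = response.choices[0];
if (choice.finish_reason === "length") {
  // "length" means the answer hit the max_tokens cap and was truncated.
  console.warn("Response was cut off by max_tokens.");
} else {
  // "stop" means the model finished its answer on its own.
  console.log(choice.message.content);
}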

You will find that if you paste the truncated response into a token encoder, it is indeed exactly the 200 tokens you set.

If you were to send too much input, which for gpt-3.5-turbo means more than its 4096-token context length minus this output reservation, you would get an API error. That’s a lot of text and conversation that you can send.
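If you want to check that budget before sending, here is a rough sketch. It assumes the third-party gpt-3-encoder npm package; that encoder approximates the tokenizer (gpt-3.5-turbo actually uses a newer encoding), so treat the count as an estimate and leave some headroom.

import { encode } from "gpt-3-encoder";

const CONTEXT_LENGTH = 4096; // total context window for gpt-3.5-turbo
const MAX_RESPONSE = 200;    // whatever you pass as max_tokens

// Estimate how many tokens the prompt will consume.
const inputTokens = encode(inputPrompt).length;
if (inputTokens > CONTEXT_LENGTH - MAX_RESPONSE) {
  throw new Error(`Prompt too long: ~${inputTokens} tokens`);
}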

If you don’t want to worry about a response ever getting limited except when you absolutely run out of AI model context length, you can omit the max_tokens specification. Then there is no reservation, and the AI can also produce the maximum response it is capable of.

The AI model writes at the length you instruct it to, so you need to use prompting language if you want to alter the type of response you receive. This kind of instruction, telling the AI how to operate in general, belongs in an initial system role message that is sent with every request.
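For example (a sketch; the instruction text is just illustrative):

const response = await openAIInstance.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      role: "system",
      // Sent with every request, so it shapes every answer.
      content:
        "You write concise, compelling CVs. Keep every answer under 200 words and always finish your final sentence.",
    },
    { role: "user", content: inputPrompt },
  ],
  // No max_tokens, so nothing is reserved and nothing gets cut off early.
});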


Interesting, didn’t know about the token encoder, thanks.

The problem I’ve been experiencing is that the answers I receive cut sentences short at the end; GPT doesn’t obey my “… generate within 200 words” instruction, which is crucial in my case. To be more precise, this is what my code looks like:

const inputPrompt = `Generate a CV (200 words max) based on the information mentioned in the CV document. Make it compelling, selling and reader-friendly to generate more leads. Use emojis for sections. Don't use headings / titles on sections. Avoid "fluff", mentioning pricing, duration or people's names.: '${parsedUserInformation.data}'.`;

Note that I’ve altered the prompt a bit to make it generic.

Would love any help here on how I can ensure that my responses stay within 200 words. I’ve tried everything, even asking GPT multiple times to give me a prompt that works, but without success :frowning_face:

Cheers, have a great day.

Your answer will be cut off if you don’t give the AI enough tokens to answer. Tokens are not words, and the AI doesn’t know what max_tokens value you set. Set it to 1500.
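In other words (a sketch), keep your length instruction in the prompt but raise the ceiling so it never bites:

const response = await openAIInstance.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: inputPrompt }],
  // ~200 English words is only around 270 tokens, so 1500 leaves plenty
  // of headroom; the model stops on its own well before the cap.
  max_tokens: 1500,
});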


I used to have that problem, too. Write the instruction in tokens, like “use up to 200 tokens”.

Don’t say “200 words”, because GPT will never understand “words”.

And I omit the max_tokens parameter.
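So something like this (the instruction wording is just my own example):

const inputPrompt = `Use up to 200 tokens. Generate a CV based on the following data: '${parsedUserInformation.data}'.`;

const response = await openAIInstance.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: inputPrompt }],
  // max_tokens omitted: the length target lives in the prompt, and the
  // model is free to finish its sentences.
});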


You have to limit the input.

gpt-3.5 is 4096 tokens,
so the max input is [4096 - whatever you set as max_tokens].

I’d use gpt-3.5-turbo-16k though.
If you’re not feeding it chat history, it’s… really cheap to use.
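Rough sketch of that budget (trimming by characters as a crude stand-in for a real tokenizer):

const MAX_TOKENS = 200;
const INPUT_BUDGET = 4096 - MAX_TOKENS; // tokens left for the prompt

// Crude proxy: roughly 4 characters per token for English text.
const trimmedData = parsedUserInformation.data.slice(0, INPUT_BUDGET * 4);

const response = await openAIInstance.chat.completions.create({
  // Or switch to gpt-3.5-turbo-16k and budget against 16384 instead.
  model: "gpt-3.5-turbo",
  messages: [
    { role: "user", content: `Generate a CV from: '${trimmedData}'.` },
  ],
  max_tokens: MAX_TOKENS,
});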
