How "force" the API to really follow all the instructions?

I don’t think “give the results” is how it would be expressed in idiomatic English. Specific words and phrases matter.
Try something like:
“You must output the result in Portuguese” or perhaps “The answer, translated to Portuguese, is:”

I’ve found that Markdown is the least flaky output format for structured output. You may get better results if you ask for Markdown, and then run the output through some Markdown → HTML renderer library separately.
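
Roughly what I mean, as a minimal sketch using the third-party `markdown` package (the input string is made up; any Markdown → HTML library would do):

```python
import markdown  # pip install markdown

# Pretend this is the Markdown the model returned:
md_text = "# Results\n\n* first item\n* second item"

# Render to HTML in your own code instead of asking the model for HTML:
html = markdown.markdown(md_text)
print(html)
```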

These models don’t even know what “characters” are. They deal in tokens, which may represent some number of glyphs. For example, the model might think the country name Kenya doesn’t start with “K” because it starts with the token “Ken.”
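
You can see the token/glyph split yourself with OpenAI’s tiktoken library (a quick sketch; the exact split depends on the vocabulary, so treat the output as illustrative):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5/gpt-4
ids = enc.encode("Kenya")

print(ids)                             # token ids, not characters
print([enc.decode([i]) for i in ids])  # how the word is actually split
```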

This is good advice. Markdown is the model’s “native” tongue.

@jwatte Thanks!
Is there a specific equivalence between tokens and English characters?
Or is it basically “random”? (1 token could be 1, 2, 3, or even more English characters.)
I’m thinking maybe I should ask it to limit by tokens instead of characters (even if it’s not super precise, I guess it could be better than asking for characters if it doesn’t know what those are).

In chat models (GPT-3.5, GPT-4), does it make any difference if I use the system message or the user message? (I’m only sending one of each, not an entire conversation.)
For example, is the model designed to follow the system instructions more closely than the user instructions, or something like that?

You can use this site,

https://tiktokenizer.vercel.app/

to see how the messages are tokenized.

Choose cl100k_base or gpt-4.

I think if you really want Portuguese results you should prompt in Portuguese.
I had the same experience with German.

For uppercase, you should use a programming language afterwards; many languages have a toupper function.
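
In Python it is as trivial as this (the reply string is made up):

```python
reply = "bom dia"     # imagine this came back from the API
print(reply.upper())  # prints "BOM DIA"; uppercasing in code is 100% reliable
```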

Yes, they are now trained to follow the system message. Use it for instructions.

3.5 before the 0613 update was not trained to follow the system message (back then it worked better to put instructions into the user message); now it is. But it’s still much worse than GPT-4.
GPT-4 just solves all these problems.
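
A minimal sketch of what that looks like (openai-python v0.x style; the message contents are just examples):

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # Instructions go in the system message...
        {"role": "system", "content": "You must output the result in Portuguese."},
        # ...the actual task goes in the user message:
        {"role": "user", "content": "Summarize: The quick brown fox jumps over the lazy dog."},
    ],
)
print(response["choices"][0]["message"]["content"])
```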

@elmstedt Thanks, but I’ve tried using tokens instead of characters and the results are much worse. It doesn’t seem to know what tokens are at all. For example, with a 5-token limit, it gives me long phrases (even more than 20 tokens).

At least with characters, I think it understands the concept (it makes shorter phrases and complies with the limit approximately 50% of the time), even if it doesn’t really count them or doesn’t “obey” for some other reason.

That is correct; it also doesn’t know what tokens are.

You’re asking the model to do something it cannot do; you might as well ask a fish to climb a tree.

One thing that somewhat helped me with length (but mostly style) was to pre-craft an assistant-role message.
You can keep it fixed with every call and hidden from the end user. When GPT has some assistant message before, it tries to somewhat follow the same style. It can be used to direct the model, just in a different way than the system message.

Give it an assistant message that is short and has the same length and style you want GPT to respond with.

You can write it manually, or let GPT-4 generate it until you like it (and use it in 3.5 if you want).

System
Assistant
User (the first actual prompt)

It can be some welcome message, or some acknowledgement of the instructions, or pretty much anything.
Just note that if you make it a welcome message, the model might no longer welcome the user, as it sees it has already done that.
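
As a rough sketch (openai-python v0.x style; the message contents are made up for the example):

```python
import openai

messages = [
    {"role": "system", "content": "Respond in one short sentence."},
    # Fixed, hidden assistant message in the style/length we want GPT to imitate:
    {"role": "assistant", "content": "Understood. I will answer in one short sentence."},
    # The first actual prompt from the end user:
    {"role": "user", "content": "What is the capital of Portugal?"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])
```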

But of course, also try to mention in the instructions that it should only respond with short messages, or X sentences, or similar… but that will only work sometimes, as you have found out.

I mainly used it for cases where GPT would respond in a different way/style at the start each time and then mostly continue in that style… so I thought, “I can use that.”

It’s just generally hard to make GPT respond with a specific length (because it does not know how or when a sentence will end).

Again, GPT-4 will be better at this.

Each LLM has its own set of tokens, and its own number of tokens (size of vocabulary), too. OpenAI lets you try the tokenizer they use on their playground:

[screenshot: the playground tokenizer]

Also note: casing matters!

[screenshot: the same text tokenized with different casing]

As does spacing:

[screenshots: the same text tokenized with different spacing]
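
If you prefer code over the web UI, you can reproduce the same checks with tiktoken (a quick sketch; the exact ids and splits depend on the encoding):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ("hello", "Hello", "HELLO", " hello", "hel lo"):
    print(repr(text), enc.encode(text))
# Different casing or spacing generally yields different token ids,
# and sometimes a different number of tokens.
```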

It works with tokens, but it’s about as likely to “understand” what a token is as a random person on the internet. My (untested) intuition would be to try specifying a word count or giving an example or two.

I agree with Jochen that you are more likely to get a Portuguese response if you write the prompt in Portuguese. As for uppercase, that is trivial to do consistently afterwards, so I see no reason whatsoever to try to get the model to do that for you.

I struggled with this both with davinci, trying to use the trained models, and with the chat completions endpoint.

The only way I found that really worked to get good results was to chunk a file of input that changes with the user.

In other words, there must be enough keywords in the system input to match the user input, and you can’t use a whole document.

It has to be short chunks of text whose keywords correspond to the user input.

GPT-4 is better, but it still requires the associated chunk data to stay on task.

3.5 works with this method via what they call embeddings, but with small chunks, not large ones.
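
Roughly what I mean, as a sketch (openai-python v0.x; the chunks and question are made up):

```python
import numpy as np
import openai

def embed(text):
    # openai-python v0.x embeddings call
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return np.array(resp["data"][0]["embedding"])

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical short chunks, each carrying keywords a user might use:
chunks = [
    "The bot's call sign is Echo-7.",
    "Opening hours are 9am to 5pm, Monday to Friday.",
]
chunk_vecs = [embed(c) for c in chunks]

user_input = "What is your call sign?"
q = embed(user_input)

# Keep only the best-matching small chunk and put it in the system input:
best = max(range(len(chunks)), key=lambda i: cosine(chunk_vecs[i], q))

messages = [
    {"role": "system", "content": "Context: " + chunks[best]},
    {"role": "user", "content": user_input},
]
```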

It has also been observed, and I validated, that the position in the input array matters.

How you concatenate to the user input makes a big difference as well. For example, in my Double Trouble City bot, concatenating the bot’s call sign before the user input made a huge difference in it remembering its call sign and using it.

Just my report

From my exploratory interactions with GPT-4(n), I’ve found this issue is a bounce-back thing pertaining to fuzzy logic in query formulation. Same thing with humans, I guess.

I’m having more success with instructions now that I interact using the new functions. So in your case, create a fake function that takes Portuguese text as an input. You don’t ever actually call an external function with that input, but you’ve tricked ChatGPT into sending you the text in the format you want (uppercase, etc.).
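
For example (openai-python v0.x; the function name and schema are made up for the sketch):

```python
import json
import openai

# A "fake" function we never actually execute; its schema just forces
# the model to hand back the text in the shape we want.
functions = [{
    "name": "submit_answer",  # hypothetical name
    "description": "Submit the final answer, translated to Portuguese, in uppercase.",
    "parameters": {
        "type": "object",
        "properties": {
            "portuguese_text": {
                "type": "string",
                "description": "The answer in Portuguese, ALL UPPERCASE.",
            },
        },
        "required": ["portuguese_text"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",  # function calling requires the 0613+ models
    messages=[{"role": "user", "content": "What is the capital of Brazil?"}],
    functions=functions,
    function_call={"name": "submit_answer"},  # force the model to "call" it
)

args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
print(args["portuguese_text"])
```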

GPT-4 is a chat model. The gpt-4 model uses the chat/completions endpoint.

That said, good system and user messages can usually get pretty good results.

By January 4, 2024, there should be a release of gpt-3.5-turbo-instruct, which will replace text-davinci-003.

You cannot “FORCE” GPT to follow instructions.
Not because it’s prevented, but because the entire nature of GPT hinges on understanding.

IF you can properly explain a process to GPT, then it will follow through.
However… if it does not understand* (*I don’t mean “understand” literally in this post),

then it will try to guess what you mean, sorta like nearest neighbor but with comprehension (bad analogy?).

So if you don’t fully comprehend an instruction, the AI will not comprehend it either (a good rule of thumb, though surely there are exceptions, like pasting prompts from the internet).
Then again… if that prompt doesn’t make sense either… then… well, there might be something to that.

I know you mentioned it, but it bears repeating: GPT “understands” nothing.
It “recognizes” and “predicts.”

And it only has so many places in a prompt where it can “recognize” things at the same time; if you load it up with rules, it is almost guaranteed not to “recognize” all of them at once.
And, because it predicts one token at a time, with no ability to loop or go back and redo, it can’t “check itself” once it’s gotten further into the output. Each output token is written in stone.

Even if you are doing a single shot from your app, you can include the prior parts of the conversation in there (both the user messages and the model’s responses).

Make life easier on yourself. Start in ChatGPT and ask a few questions to prime the results before you ask your main question. Experiment until you find a sequence that works to get you good results most of the time. Once you feel you are there, you can do your one-shot prompt in your application, where you send the prior elements of the conversation (every single time) that the user will never see. That should get you more consistent results when your usage is an edge case (relative to this model’s training).

In the API, chat is just a series of one-shot prompt completions with the prior conversation loaded so that the model has context.

Hey, I had a similar problem. The workaround (or maybe it’s the way to do it) I found is rather simple: pass examples as previous interactions. That is, add ‘user’ and ‘assistant’ messages where you ask a similar question and manually write the answer you expect. This works surprisingly well, especially for larger tasks.
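
Something like this (openai-python v0.x; the example Q&A pairs are hand-written and purely illustrative):

```python
import openai

messages = [
    {"role": "system", "content": "Answer in Portuguese, in one short sentence."},
    # Hand-written example interactions the end user never sees; the model
    # tends to copy their format, language, and length:
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "A capital da França é Paris."},
    {"role": "user", "content": "What is the capital of Japan?"},
    {"role": "assistant", "content": "A capital do Japão é Tóquio."},
    # The real question:
    {"role": "user", "content": "What is the capital of Australia?"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])
```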