Request to support exact reply prefix of assistant for chat completion

Snowdar · December 29, 2023, 2:14am

For chat completion, the messages format is currently the only way to provide context to GPT. If

messages = [
  dict(role="system", content=prompt),
  dict(role="user", content="Hello")
]

then the input string that GPT actually sees has an assistant prefix appended, such as:

<|im_start|>system
prompt<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant # added assistant prefix

But I want an extra reply prefix to control the response to make it more stable:

<|im_start|>system
prompt<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant # added assistant prefix
[step 1] # added extra reply prefix to steer the response toward the step 1 instruction in the prompt
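
To make the request concrete, here is a minimal sketch of the flattening described above, with the proposed reply prefix as an optional argument. The function name `render_chatml` and the `reply_prefix` parameter are hypothetical, and the exact string layout is only assumed from the example in this post; the chat API does not expose this step.

```python
from typing import Optional

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: list[dict], reply_prefix: Optional[str] = None) -> str:
    """Flatten chat messages into one ChatML-style string (layout assumed from the post)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    # Open the assistant turn; the model continues generating from here.
    parts.append(f"{IM_START}assistant\n")
    if reply_prefix:
        # The requested feature: text the reply is forced to start with.
        parts.append(reply_prefix)
    return "".join(parts)


messages = [
    dict(role="system", content="prompt"),
    dict(role="user", content="hello"),
]
print(render_chatml(messages, reply_prefix="[step 1]"))
```

With `reply_prefix=None` this reproduces the first layout above; with a prefix, the model would continue right after `[step 1]`.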

For a short, simple prompt, writing an instruction directly in the prompt that requires GPT to begin its output with the extra prefix [step 1] works most of the time. But it does not work well with a long prompt or a complex task-oriented prompt, and it can also cause the model to stop following the main task instructions. I also tried appending an assistant message containing the prefix to the end of messages, but that does not work either. That’s why I want this new feature for chat completion.

Of course, this prefix should not interfere with function calling (if a function is triggered, the prefix should be ignored), but I think simply appending the extra prefix would leave that entirely up to the model's ability. So restricting the allowed tokens may be another way to avoid interfering with function calls: for example, decoding starts by accepting only function-call tokens or the prefix content in the decoding tree.
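
A toy sketch of that token-restriction idea, under heavy assumptions: the token strings and the `<function_call>` marker are made up for illustration, and nothing here reflects how OpenAI actually decodes. It only shows the rule "the first tokens must follow either the function-call branch or the prefix branch, then decoding is free".

```python
from typing import Optional


def allowed_next(generated: list[str], branches: list[list[str]]) -> Optional[set[str]]:
    """Return the tokens permitted next, or None once decoding is unconstrained."""
    allowed = set()
    n = len(generated)
    for branch in branches:
        # A branch still applies if everything generated so far matches its start.
        if n < len(branch) and branch[:n] == generated:
            allowed.add(branch[n])
    return allowed or None


# Hypothetical token sequences: one opens a function call, one is the reply prefix.
FUNCTION_BRANCH = ["<function_call>"]
PREFIX_BRANCH = ["[", "step", " 1", "]"]
branches = [FUNCTION_BRANCH, PREFIX_BRANCH]

print(allowed_next([], branches))                        # both branch openers are allowed
print(allowed_next(["["], branches))                     # only the next prefix token
print(allowed_next(["[", "step", " 1", "]"], branches))  # None: prefix done, free decoding
```

A sampler would mask out every token not in the returned set before picking the next one, so a function call can still win at the first position.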

Diet · December 29, 2023, 2:27am

Hi! welcome to the forums

what you’re describing is unfortunately what OpenAI considers “legacy completions”

you can still have this sort of control with completion models such as gpt-3.5-turbo-instruct - for now.

IMO these completion models are much more powerful if used correctly, but these modern chat approaches are obviously much easier to use for most people, because apparently, most people are just interested in building dumb chatbots.

Snowdar · December 29, 2023, 2:38am

But I do not know which special tokens (such as <|im_start|>) should be used to segment messages. In other words, I am not sure whether any existing special tokens have been fine-tuned as well as the chat model's. I just need a reliable chat mode and prefix-based completion at the same time. Maybe gpt-3.5-turbo-instruct needs extra examples for in-context learning (which is not what I want)?

Diet · December 29, 2023, 2:50am

I remember that davinci-003 was pretty reliable with json output. Apparently they nixed 003 completely, and you can now only finetune davinci-002 (according to their docs, I don’t know, haven’t tried)

As to which tokens you should use, I think that’s mostly irrelevant, you can just make stuff up, especially if you use a finetune.

I don’t think you need a lot of examples with turbo instruct, you’re probably better off adding an appropriate instruction in the context. Also, don’t forget to add an appropriate stop sequence.
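
A rough sketch of what that could look like, assuming made-up `### role` turn markers (any consistent delimiter should do, per the above) and a hypothetical helper `build_completion_request`. It only assembles the request; with the `openai` Python package (v1 or later) the dict could presumably be passed as `client.completions.create(**request)`.

```python
def build_completion_request(system: str, user: str, reply_prefix: str = "") -> dict:
    """Assemble keyword arguments for a legacy completions call (made-up turn markers)."""
    prompt = (
        f"### system\n{system}\n"
        f"### user\n{user}\n"
        # The prompt ends inside the assistant turn, so the model continues the prefix.
        f"### assistant\n{reply_prefix}"
    )
    return {
        "model": "gpt-3.5-turbo-instruct",
        "prompt": prompt,
        "max_tokens": 256,
        # Stop as soon as the model tries to open another turn on its own.
        "stop": ["\n### "],
    }


request = build_completion_request(
    system="Answer in numbered steps, each beginning with [step N].",
    user="hello",
    reply_prefix="[step 1]",
)
print(request["prompt"])
```

The completion text comes back without the prefix, so the caller has to prepend `[step 1]` when displaying the reply.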

_j · December 29, 2023, 3:20am

What you are talking about doesn’t seem to have much to do with token message containers.

A message and its special tokens, btw, currently looks like:

{start}role_type:optional_name{middle}this is what I say{stop}

Instead, you want to add completion text into the assistant response area.

This is unlikely to change, because it then becomes just as easy to complete upon other text:

You can prompt that all responses must begin with your [step 1] prefix text and likely get good results - as long as the conversation doesn’t stray long and the AI then just fails at most stuff.

To keep the instruction within the AI’s attention better than you’ve achieved, you can programmatically add it as a post-prompt to the user input immediately prior.
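
A minimal sketch of that post-prompt approach, assuming the same messages list format as in the first post. The helper name `add_post_prompt` and the reminder wording are just illustrations.

```python
import copy


def add_post_prompt(messages: list[dict], reminder: str) -> list[dict]:
    """Return a copy of messages with the reminder appended to the latest user turn."""
    patched = copy.deepcopy(messages)  # leave the stored conversation untouched
    for message in reversed(patched):
        if message["role"] == "user":
            message["content"] = f"{message['content']}\n\n{reminder}"
            break
    return patched


messages = [
    dict(role="system", content="prompt"),
    dict(role="user", content="Hello"),
]
REMINDER = "(Begin your reply with [step 1].)"
patched = add_post_prompt(messages, REMINDER)
print(patched[-1]["content"])
```

Send `patched` to the API but keep `messages` as the stored history, so the reminder does not pile up in earlier turns.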

Snowdar · December 29, 2023, 3:50am

Thanks for the reply. As I mentioned, if you strictly require the model to start its output with a certain prefix, it is only stable when you add a system message at the end instead of writing the requirement in the system prompt at the beginning, but this affects function calling and may also interrupt the task described by the initial prompt.

_j · December 29, 2023, 4:12am

I did this on completions since I already have my own chatbot that uses “tokens”. It was actually just adding the characters to the overly-constructed prompt, but [Step 1] is not seen echoed here, just followed:

system: 73 tokens
user: 17 tokens
{'role': 'user', 'name': 'program start header', 'content': 'generate AI introduction. Print date and time.', 'time': '2023-12-28 20:02PM', 'tokens': 17}

  Hello there! My name is ChatI and I am an AI language assistant. I
  am powered by OpenAI's gpt-3.5-turbo-instruct model, which means I
  am constantly learning and improving. Today's date and time is
  December 28, 2023 at 8:02 PM. Is there anything specific you would
  like to know about me? 
[Finish reason: stop, 1.39 seconds.]
Prompt: How can I construct a carbon bulb filament similar to Edison bulbs
in 1900 to then reproduce a completely authentic old bulb?
   First, you will need to gather the necessary materials such as
  carbon rods, glass bulbs, and a vacuum pump. [Step 2] Next, you will
  need to create a mold for the filament using a thin wire. [Step 3]
  Then, insert the carbon rods into the mold and heat them until they
  become pliable. [Step 4] Once the carbon rods are pliable, carefully
  shape them into a spiral shape to mimic the design of Edison bulbs.
  [Step 5] Place the spiral carbon filament into the glass bulb and
  use the vacuum pump to remove all air from the bulb. [Step 6]
  Finally, seal the bulb and test it to ensure it is working properly.
  Is there anything else I can assist you with?
[Finish reason: stop, 1.80 seconds.]
Prompt: How can I construct the Edison metal base and then attach it
to the vacuum bulb that only has two wires protruding from the glass?
   Gather the necessary materials such as a metal base, screws, and a
  soldering iron. [Step 2] Use the soldering iron to attach the two
  wires protruding from the glass bulb to the metal base. [Step 3]
  Secure the wires in place with screws. [Step 4] Test the connection
  to ensure it is secure and functioning properly. Is there
  anything else I can assist you with?
Prompt: What's a good cold resistance to measure across the filament
to know it will work with 120VAC?
   Measure the resistance of the filament using a multimeter. [Step 2]
  Convert the resistance to ohms using the formula R=V/I, where V is
  the voltage (120V) and I is the current (which can be found on the
  packaging of the bulb). [Step 3] The ideal resistance for a 120VAC
  bulb is around 960 ohms. Is there anything else I can assist you
  with?
[Finish reason: stop, 1.23 seconds.]
Prompt:

The model’s AI is rather curt. It is “trained” on both a completion-style preprompt and then a system message.

BTW, for the curious, the ChatI container tokens are ASCII control bytes that are encoded to a single token. The AI learns the correct stop token just by seeing the system and user messages. Similarly, there was no instruction about what [step 1] would mean, I just added that text and the AI runs with it. Chat models can’t learn this fast.
