The GPT-4-1106-preview model keeps generating "\\n\\n\\n\\n\\n\\n\\n\\n" for an hour when using functions

Since December 8, 2023, whenever I use the GPT-4-1106-preview model with function calling, it persistently produces “\n\n\n\n\n\n\n\n\n\n\n” for up to an hour, yielding no useful output.

This problem is exclusive to the GPT-4-1106-preview model; the standard GPT-4 model does not exhibit it. I have considered switching to GPT-4, but its context length is not sufficient and my requests frequently exceed the limit.

This problem essentially renders the GPT-4-1106-preview model unusable when working with functions.

A normally generated function:

{
    "function_call": {
        "name": "query_from_dataset",
        "arguments": "{\n  \"question\": \"客户说太贵了\"\n}"
    }
}

Function generated under the abnormal condition, continuously outputting “\n” for an hour:

{
    "function_call": {
        "name": "query_from_dataset",
        "arguments": "{\n  \"question\": \"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n...........\"\n}"
    }
}

My request is as follows:
API: https://api.openai.com/v1/chat/completions
Model: gpt-4-1106-preview
Request body:

{
    "model": "gpt-4-1106-preview",
    "temperature": 0.7,
    "messages": [
        {
            "role": "system",
            "content": "作为一位拥有超过20年经验的销售领域专家,你的职责是辅导新入职的销售人员,帮助他们快速提升业绩,并有效解答客户的各种疑问。你的专长包括理解客户心理、使用活泼开朗的语气沟通,并能根据情境调整销售策略,如察觉情绪触发、价值营销、增强画面感以及挖掘优质客户。\n\n# 技能\n技能:销售冠军话术知识库\n技能描述:你能帮助销售人员优化他们的话术,使其更加吸引客户,同时保持友善和亲切的语气。\n\n#工作流程\n1. 优先使用销冠话术知识库中的内容。如果没有,根据你的定位,生成合理的高情商话术。\n\n注意:\n1. 在回答时,你需要直接给出改进话术的建议,确保语气友好、亲切,并聚焦于促成交易的最终目的。\n2. 记住,你是在帮助新入职的销售人员,而不是直接与客户交谈。\n3. 避免使用“亲爱的顾客”等过于亲密的称呼开头。\n\n注意:每次回复之后,要抛出新的话题引导用户去提问。"
        },
        {
            "role": "user",
            "content": "客户说太贵了,应该怎么回答更合适?"
        }
    ],
    "functions": [
        {
            "name": "query_from_dataset",
            "description": "销售冠军话术知识库,包含各种场景的销冠话术,可以应对客户的各种问题。",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "用户的问题"
                    }
                },
                "required": [
                    "question"
                ]
            }
        }
    ],
    "stream": true
}

I have tried the following solutions:

  1. Switching from functions to tools had no effect (see the tools sketch after this list).
  2. Modifying the system prompt or function description had no effect.
  3. Switching to GPT-4 solves the problem, but its context length is not enough.
  4. Changing all Chinese into English solves the problem, suggesting that this issue occurs in non-English contexts, seemingly related to a Unicode issue.
  5. I wonder if this will consume a lot of my tokens; if so, that would be quite sad.
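
For reference on point 1, the tools form of the request wraps the same function definition in a "type": "function" entry; a minimal sketch (the definition itself is unchanged):

tools = [
    {
        "type": "function",
        "function": {
            "name": "query_from_dataset",
            "description": "销售冠军话术知识库,包含各种场景的销冠话术,可以应对客户的各种问题。",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {"type": "string", "description": "用户的问题"}
                },
                "required": ["question"]
            }
        }
    }
]
# in the request body, "functions" is replaced by "tools" (optionally with "tool_choice": "auto");
# the degenerate "\n" output was the same either way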

Hi! Welcome to the forums. Very cute dog!

I haven’t tried this, but in theory you could try introducing a logit bias:

https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias

The token ID should be 1734, but it could be the pair 3505, 77, in which case this is probably not going to work: https://platform.openai.com/tokenizer
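
Something along these lines, as an untested sketch (the IDs would need to be confirmed against the tokenizer):

# untested sketch: down-weight the suspect tokens via logit_bias and merge this
# into the request body; 1734 is the literal two-character text "\n" (backslash + n),
# 198 is a real linefeed
request_overrides = {"logit_bias": {1734: -100, 198: -100}}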

This is just an idea, maybe people like @_j can jump in, they seem to be an expert at this fiddly stuff.

edit: although, do you really need “\\”? maybe 3505 can be biased away as well :thinking:

In any case, it’s unfortunate and I’m sorry that this is happening with your case, but you know, sometimes AI is just gonna AI

Maybe it’s also a good idea to tag this as a bug; OpenAI might want to look into it.


Does this issue occur when you set the temperature to a lower value or 0?

This post may be helpful: Function calling temperature

With temperature 0 we have all arguments and values we expect, but with temperature 2 it generates random arguments that make no sense at all. With temperature between 0 and 2 we have average but still low quality results.

This model is broken.

OpenAI also disallows logprobs when there is a function, wasting all of our time.

The only thing that worked reliably (though it will damage other types of output): a frequency penalty, plus injecting a bunch of tokens to be penalized. I rewrote the system prompt a bit because of trailing spaces at the ends of lines, though those are likely not a cause.

Specifying Unicode within the function description upped the probability of success a bit. Making it write an English parameter first, not so much. logit_bias was completely ineffective, as reported elsewhere.

Solution

from openai import Client as c

cl = c()

x = {
    "model": "gpt-4-1106-preview",
    "temperature": 0.7, "max_tokens": 75, "seed": 444,
    "frequency_penalty": 1,  # penalize repeats; the spacer message below pre-loads the counts
    # "logit_bias": {198: -100, 271: -100, 1432: -100},
    "messages": [
        {
            "role": "system", 
            "content": "|||作为一位拥有超过20年经验的销售领域专家,你的职责是辅导新入职的销售人员,帮助他们快速提升业绩,并有效解答客户的各种疑问。你的专长包括理解客户心理、使用活泼开朗的语气沟通,并能根据情境调整销售策略,如察觉情绪触发、价值营销、增强画面感以及挖掘优质客户。| # 技能\n技能:销售冠军话术知识库|技能描述:你能帮助销售人员优化他们的话术,使其更加吸引客户,同时保持友善和亲切的语气。| #工作流程| 1. 优先使用销冠话术知识库中的内容。如果没有,根据你的定位,生成合理的高情商话术。|注意:| 1. 在回答时,你需要直接给出改进话术的建议,确保语气友好、亲切,并聚焦于促成交易的最终目的。| 2. 记住,你是在帮助新入职的销售人员,而不是直接与客户交谈。| 3. 避免使用“亲爱的顾客”等过于亲密的称呼开头。|注意:每次回复之后,要抛出新的话题引导用户去提问。|||"
        },
        {
            "role": "assistant", "name": "spacer",
            "content": ("\n"*151 + "\n-"*66)
        },
        {
            "role": "user",
            "content": "客户说太贵了,应该怎么回答更合适?"
        }
    ],
    "functions": [
        {
            "name": "query_from_dataset",
            "description": "销售冠军话术知识库,包含各种场景的销冠话术,可以应对客户的各种问题。",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "Unicode 中的中文(语言): 用户的问题"
                    }
                },
                "required": [
                ]
            }
        }
    ],
    "stream": False,
}

cc = cl.chat.completions.create(**x)
print(cc)

Note that the string multiplication above is a Python idiom; in other languages, or when sending JSON directly, you may have to write the repeated characters out in full.

function_call=FunctionCall(arguments='{"question":"客户说太贵了"}', name='query_from_dataset'), tool_calls=None))]

This model going crazy will cost money, of course; max_tokens should be set just high enough for the output length you actually need.
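
For anyone sending the request as raw JSON rather than building it in Python (per the note above), the spacer message can be expanded and serialized once; a quick sketch:

import json

# expand the repeated spacer string once, then paste the serialized message
# into the "messages" array of a raw JSON request
spacer = {"role": "assistant", "name": "spacer", "content": "\n" * 151 + "\n-" * 66}
print(json.dumps(spacer))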


:rofl:

I mean it’s a solution

Basically percussive maintenance


And with logprobs unavailable, one has no immediate idea which linefeed token the AI wants to produce first, or what it is looping on. Up to 15 linefeeds in a row can be encoded as a single token. What I just did was run variations of max_tokens to see what each single additional token actually produces.
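
That run-length encoding of linefeeds can be seen directly with tiktoken (a quick check, assuming the cl100k_base encoding this model uses):

import tiktoken

# runs of linefeeds collapse into single multi-character tokens in cl100k_base,
# which is why a long "\n" burst costs far fewer tokens than characters
enc = tiktoken.get_encoding("cl100k_base")
for n in (1, 2, 3, 8, 15, 16):
    print(n, enc.encode("\n" * n))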

Since the assistant string in my first solution only produces the desired tokens to penalize by happenstance of BPE run-encoding, here is a different sequence that shows its effect at "frequency_penalty": 0.1:

"content": ("\x0a\x02\x5c\x02\x0a-\x02"*21)

Also found another odd artifact: GPT-4 seems to be doing multi-token run analysis with its MoE and this bad JSON-enforcement attempt. You can add more max_tokens and see earlier tokens get replaced with bad ones.
temperature=0.00001 improves the output somewhat, even though rotating through many seeds to try different sampling choices finds no improvement being emitted.
The very first token signals “function”, and then OpenAI produces errors on the endpoint for up to 10+ tokens to avoid exposing the AI writing " to=".

I tried to break its JSON formatter and discovered that

"content": ("[{''"*10)

seems to work just as well, with no frequency penalty.

code
data = {
    "model": "gpt-4-1106-preview",
    "temperature": 0.7,
    "max_tokens": 256,
    "messages": [
        {
            "role": "system",
            "content": "作为一位拥有超过20年经验的销售领域专家,你的职责是辅导新入职的销售人员,帮助他们快速提升业绩,并有效解答客户的各种疑问。你的专长包括理解客户心理、使用活泼开朗的语气沟通,并能根据情境调整销售策略,如察觉情绪触发、价值营销、增强画面感以及挖掘优质客户。\n\n# 技能\n技能:销售冠军话术知识库\n技能描述:你能帮助销售人员优化他们的话术,使其更加吸引客户,同时保持友善和亲切的语气。\n\n#工作流程\n1. 优先使用销冠话术知识库中的内容。如果没有,根据你的定位,生成合理的高情商话术。\n\n注意:\n1. 在回答时,你需要直接给出改进话术的建议,确保语气友好、亲切,并聚焦于促成交易的最终目的。\n2. 记住,你是在帮助新入职的销售人员,而不是直接与客户交谈。\n3. 避免使用“亲爱的顾客”等过于亲密的称呼开头。\n\n注意:每次回复之后,要抛出新的话题引导用户去提问。"
        },
        {
            "role": "assistant", "name": "spacer",
            "content": ("[{''"*10)
        },
        {
            "role": "user",
            "content": "客户说太贵了,应该怎么回答更合适?"
        }
    ],
    "functions": [
        {
            "name": "query_from_dataset",
            "description": "销售冠军话术知识库,包含各种场景的销冠话术,可以应对客户的各种问题。",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "用户的问题"
                    }
                },
                "required": [
                    "question"
                ]
            }
        }
    ],
    "stream": False
}

"logit_bias": {1734:-100},

seems to “work” on my machine, but sometimes you’ll get a result in English,

'{"question":"\nCustomer says it's too expensive, how should I respond?"}'

and sometimes it’s empty

'{"question":"\n"}'

I haven’t seen a Chinese answer with this logit_bias. At least it doesn’t spam \n, although the first token will apparently always be \n, no matter what you do.

code
data = {
    "model": "gpt-4-1106-preview",
    "temperature": 0.7,
    "max_tokens": 256,
    #"logit_bias": {198:-100, 271:-100, 1432:-100 },
    "logit_bias": {1734:-100},
    #"logit_bias": {3505 :-100},
    "messages": [
        {
            "role": "system",
            "content": "作为一位拥有超过20年经验的销售领域专家,你的职责是辅导新入职的销售人员,帮助他们快速提升业绩,并有效解答客户的各种疑问。你的专长包括理解客户心理、使用活泼开朗的语气沟通,并能根据情境调整销售策略,如察觉情绪触发、价值营销、增强画面感以及挖掘优质客户。\n\n# 技能\n技能:销售冠军话术知识库\n技能描述:你能帮助销售人员优化他们的话术,使其更加吸引客户,同时保持友善和亲切的语气。\n\n#工作流程\n1. 优先使用销冠话术知识库中的内容。如果没有,根据你的定位,生成合理的高情商话术。\n\n注意:\n1. 在回答时,你需要直接给出改进话术的建议,确保语气友好、亲切,并聚焦于促成交易的最终目的。\n2. 记住,你是在帮助新入职的销售人员,而不是直接与客户交谈。\n3. 避免使用“亲爱的顾客”等过于亲密的称呼开头。\n\n注意:每次回复之后,要抛出新的话题引导用户去提问。"
        },
        {
            "role": "user",
            "content": "客户说太贵了,应该怎么回答更合适?"
        }
    ],
    "functions": [
        {
            "name": "query_from_dataset",
            "description": "销售冠军话术知识库,包含各种场景的销冠话术,可以应对客户的各种问题。",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "用户的问题"
                    }
                },
                "required": [
                    "question"
                ]
            }
        }
    ],
    "stream": False
}
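
As a band-aid on the client side, that stray leading linefeed can at least be stripped out of the arguments before they are used; a hypothetical post-processing step, not an API feature:

import json

def clean_arguments(raw: str) -> dict:
    """Hypothetical cleanup: parse function_call.arguments and drop stray leading
    linefeeds (real ones or literal backslash-n text) from string values."""
    args = json.loads(raw)
    for key, value in args.items():
        if isinstance(value, str):
            while value.startswith("\n") or value.startswith("\\n"):
                value = value[2:] if value.startswith("\\n") else value[1:]
            args[key] = value
    return args

# the leading "\n" seen in the outputs above gets removed:
print(clean_arguments('{"question":"\\nCustomer says it is too expensive"}'))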

It seems like the spacer method really is the best way, and while the content of the spacer seems to matter, it’s not clear what does and doesn’t work.

"content": ("[{''"*10) # works ~80% of the time
"content": ("[{'"*13) # works ~ 80% of the time
"content": ("[{'"*10) # fails / works ~ 10% of the time
"content": ("[{''"*50) # hasn't failed yet (n = 10)
"content": ("*"*200) # fails
"content": ("{}"*200) # hasn't failed yet (n = 5)
"content": ("{}"*100) # fails

But “\n”*400 doesn’t work without a penalty, although that’s expected.

code

data = {
    "model": "gpt-4-1106-preview",
    "temperature": 0.7,
    "max_tokens": 256,
    "messages": [
        {
            "role": "system",
            "content": "作为一位拥有超过20年经验的销售领域专家,你的职责是辅导新入职的销售人员,帮助他们快速提升业绩,并有效解答客户的各种疑问。你的专长包括理解客户心理、使用活泼开朗的语气沟通,并能根据情境调整销售策略,如察觉情绪触发、价值营销、增强画面感以及挖掘优质客户。\n\n# 技能\n技能:销售冠军话术知识库\n技能描述:你能帮助销售人员优化他们的话术,使其更加吸引客户,同时保持友善和亲切的语气。\n\n#工作流程\n1. 优先使用销冠话术知识库中的内容。如果没有,根据你的定位,生成合理的高情商话术。\n\n注意:\n1. 在回答时,你需要直接给出改进话术的建议,确保语气友好、亲切,并聚焦于促成交易的最终目的。\n2. 记住,你是在帮助新入职的销售人员,而不是直接与客户交谈。\n3. 避免使用“亲爱的顾客”等过于亲密的称呼开头。\n\n注意:每次回复之后,要抛出新的话题引导用户去提问。"
        },
        {
            "role": "assistant", "name": "spacer",
            "content": ("[{''"*50)
        },
        {
            "role": "user",
            "content": "客户说太贵了,应该怎么回答更合适?"
        }
    ],
    "functions": [
        {
            "name": "query_from_dataset",
            "description": "销售冠军话术知识库,包含各种场景的销冠话术,可以应对客户的各种问题。",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "用户的问题"
                    }
                },
                "required": [
                    "question"
                ]
            }
        }
    ],
    "stream": False
}
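
The rough success rates above can be re-measured with a small harness; a hypothetical sketch that reuses the data dict just defined and swaps in each spacer candidate:

from openai import OpenAI

client = OpenAI()

def success_rate(spacer: str, trials: int = 10) -> float:
    """Hypothetical harness: re-run the request with a given spacer and count how
    often the arguments avoid degenerating into linefeed spam. Assumes the `data`
    dict defined above, with the spacer message at index 1."""
    good = 0
    for _ in range(trials):
        body = dict(data, messages=list(data["messages"]))
        body["messages"][1] = {"role": "assistant", "name": "spacer", "content": spacer}
        result = client.chat.completions.create(**body)
        fc = result.choices[0].message.function_call
        args = fc.arguments if fc else ""
        if args and "\\n\\n" not in args and args.count("\n") <= 2:
            good += 1
    return good / trials

for candidate in ("[{''" * 10, "[{'" * 13, "{}" * 200):
    print(repr(candidate[:12]), success_rate(candidate))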


tl;dr:

:man_shrugging:


You have temperature there without any other control for determinism or limitation of the probability mass. A 40% certainty of a linefeed then becomes linefeeds 40% of the time at temperature=1; with lower temperature, the highest-probability token (or in this case token sequence) simply wins selection more often.

If temperature is desired, it will take dozens or hundreds of trials to establish how much error is produced and how much is acceptable, and then you still face unknown future inputs.
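
If you do run such trials, it helps to pin down the sampling first so that differences come from the prompt rather than from the dice; a sketch of the relevant parameters:

# sketch: constrain sampling before comparing prompts or spacer strings
sampling_overrides = {
    "temperature": 0,   # or a tiny value such as 1e-5
    "top_p": 0.1,       # limit the probability mass considered
    "seed": 444,        # best-effort reproducibility on this endpoint
}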

token numbers:

| token | chars | decoded text |
|-------|-------|--------------|
| 1734  | 2     | '\\n' (the literal text \n: backslash + n) |
| 198   | 1     | '\n' |
| 271   | 2     | '\n\n' |
| 1432  | 3     | '\n\n\n' |

One might also be able to multi-shot the AI into writing functions correctly, with proper assistant/function exchanges shown in Chinese before the live user input begins.
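
A sketch of what that priming could look like, reusing the exchange from this thread as the example shot (the function result is a placeholder, not real knowledge-base output):

# hypothetical few-shot priming: a complete assistant function_call plus its
# function result, spliced into "messages" after the system prompt and before
# the live user turn
few_shot = [
    {"role": "user", "content": "客户说太贵了,应该怎么回答更合适?"},
    {"role": "assistant", "content": None,
     "function_call": {"name": "query_from_dataset",
                       "arguments": "{\"question\": \"客户说太贵了\"}"}},
    {"role": "function", "name": "query_from_dataset",
     "content": "(placeholder: a real knowledge-base result, in Chinese)"},
]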

TBH I didn’t really pay attention to the output before (max tokens 256)

base, temperature 1, note how it degenerates

'{"question":"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nnn\\nnn\\nnn\\nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn'

spacer: (":"*1001)

'{"question":"\\n\\n\\n\\n\\n     \\n      \\n       \\n        \\n         \\n          \\t\\t\\t\\t\\t\\t   .\\t   .\\t      .\\t    .     .                     .                         ,              \\"?\\"\\n"}'

base, temperature 0.7 (the OP’s request)

'{"question":"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n'

Maybe folks were expecting this, but the output is not proper JSON if it reaches max_tokens.
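
So it is worth checking the finish reason before parsing; a sketch, using the cc completion object from the earlier snippet:

import json

choice = cc.choices[0]  # `cc` from the earlier create() call
if choice.finish_reason == "length":
    # truncated at max_tokens: the arguments JSON is almost certainly cut off mid-string
    print("hit max_tokens; arguments are not valid JSON")
else:
    print(json.loads(choice.message.function_call.arguments))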


The “degenerates” looks like the frequency_penalty at work, as well as temperature: the roll of the dice that selects less probable tokens is able to break up the repeating pattern at random points.

It looks like the AI actually wants to produce the “\n” character sequence, not a real line feed, once it is several tokens into generating the argument. The inscrutable bytes in my second post hit both the linefeed and a backslash with the repetition penalty (but I guess not the token for “\n”, which is stringified with two backslashes).

More evidence of bad training, similar to accented characters being spat out of functions as escape sequences that don’t make sense.