New gpt-4-turbo-preview saying it can't help on complex prompt

Hey there, first post so please bear with me.

I have a very long & complex prompt that involves aggregating multiple text sources and generating new writing based on the aggregation. Loosely, the prompt specifies a format, tone instructions, source content, and an outline. It is one step in a multi-step process. For this step, the system and user instructions comprise around 5,000 tokens.
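For context, the call is shaped roughly like this — I'm assuming the Python SDK purely for illustration, and everything below is a placeholder sketch, not the real prompt, since I can't share that:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder structure only; the real system + user content is ~5,000 tokens.
response = client.chat.completions.create(
    model="gpt-4-0125-preview",  # the model that now refuses
    temperature=0.1,
    messages=[
        {"role": "system", "content": "Tone and format instructions go here..."},
        {"role": "user", "content": "FORMAT:\n...\n\nTONE:\n...\n\nSOURCES:\n...\n\nOUTLINE:\n..."},
    ],
)
print(response.choices[0].message.content)
```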

gpt-4-1106-preview handles it just fine, but gpt-4-0125-preview responds with “I’m sorry, I can’t assist with that request.”

It’s a bit ironic since I think 0125 is supposed to combat laziness. 1106 usually gives me just what I need, at around 1500 tokens in response. Has anyone been seeing similar issues?

NOTE: I can’t post the exact prompt here due to privacy issues with my company, but I’m trying to reproduce the problem with a similar prompt and haven’t been able to yet.

Temperature doesn’t seem to help; my usual temperature for this prompt is 0.1, if that’s relevant.


It seems to me that what you’re doing is spinning articles. Are they news?

It could be that the newer model has been trained not to accept whatever type of article content you’re attempting to spin.


I agree, that is ironic re: 0125’s objectives!

I have a pretty strong suspicion about the cause, and I could offer a tweak or two that might fix this.

I just recently posted the following as a tip for another user. Would it be possible for you to do this, so that you can share a rough example:

If you ever need to provide examples, the best route is taking the most conventional/standard examples you have, feeding them to an LLM and requesting it give you an analog. In your prompt, specify the topic and meta details so that the compute time / prediction focuses solely on the most analogous, direct parallel.

Remove all of the long content so that you can fit it in 1500 tokens or so. Ideally, preserve the meta/instruction side and trim the variable content between.

Again, I’m not requesting anything private or anything directly from your prompt, just the methodology/formatting in a similar example that I can test, as I have a few suspicions.

In addition, you mentioned temperature. For testing/reproduction purposes, you’re reducing top_p as well, correct? Temperature at 0.1 and top_p at 0, in this instance?
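As a concrete sketch of what I mean for the reproduction runs (the client and messages here are placeholders for your own):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "your existing system/user prompt here"}]  # placeholder

# Pin the sampling down so a refusal isn't just sampling noise.
response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    temperature=0.1,  # what you're already using
    top_p=0,
    messages=messages,
)
```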

Also, I’m sure you’ve already done this, but it can’t hurt to mention: have you tried trimming the variable content and retrying, adding it back iteratively to identify the point where it begins to fail?
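Sketched out, something along these lines (the chunking, refusal check, and prompt assembly are stand-ins for however you build yours):

```python
from openai import OpenAI

client = OpenAI()
chunks = ["source 1 text", "source 2 text", "source 3 text"]  # stand-ins for your sources

def build_messages(sources):
    # Stand-in for however you assemble format/tone/outline plus the variable content.
    return [
        {"role": "system", "content": "Format and tone instructions..."},
        {"role": "user", "content": "Summarize these sources neutrally:\n\n" + "\n\n".join(sources)},
    ]

# Add the variable content back one chunk at a time; stop where the refusal first appears.
for n in range(1, len(chunks) + 1):
    reply = client.chat.completions.create(
        model="gpt-4-0125-preview",
        temperature=0.1,
        messages=build_messages(chunks[:n]),
    ).choices[0].message.content
    if "sorry" in reply[:40].lower():  # crude refusal check
        print(f"Refusal first appears once chunk {n} is included")
        break
```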

Similarly, have you also provided GPT with a highly simplified, straightforward overview in the first sentence (with markup) to ensure that all the context that follows is transformed more accurately?

I know these are rudimentary, but it never hurts to be certain 🙂


Hey, this is a good theory. The strange thing is that it’s happening regardless of the source content, and the task is purely neutral aggregation of text (some news, some other content like scientific documents, etc.).

Also, if I send “can you explain why not?” or “why?”, it usually apologizes and then does the task (albeit a worse job). I would guess it would say something about the TOS, etc., if that were the issue. The task is generally “Summarize in this format from these sources as neutrally as possible” (and as far as I know it was also cleared back when use cases had to go through approval).


Never too rudimentary! I feel like a lot of things that are assumed in some people’s workflows are different in others. I’ve tried most of these but haven’t dug too deep yet since I just noticed this today. I will try to get an analog as well.

Thanks for the thoughts!


I don’t have much experience with 0125 yet, but typically if it apologizes, you need to get rid of the apology and reframe the prompt (or just retry).

Sometimes a justification of why it’s OK or necessary at the end of the prompt might help the model get started.

My prompts are pretty complex as you can see here: API Prompt for gpt-3.5-turbo-16k - #12 by SomebodySysop

I haven’t tried the new gpt-4-turbo-preview, but I’ve not had problems with these prompts on gpt-4-1106-preview. With gpt-3.5-turbo-16k, though, I had your exact problem. I didn’t totally fix it, but I improved the responses by sending the prompts in XML format. Maybe try that with your prompts to the new turbo preview and see if it makes a difference?
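By XML format I just mean wrapping each section of the prompt in tags, roughly like this (the tag names and sections are made up, not from anyone’s actual prompt):

```python
# Rough idea only; tag names and section contents are placeholders.
def build_prompt(format_spec: str, tone: str, sources: str, outline: str) -> str:
    return (
        "<instructions>Summarize the sources as neutrally as possible, "
        "following the format, tone, and outline below.</instructions>\n"
        f"<format>{format_spec}</format>\n"
        f"<tone>{tone}</tone>\n"
        f"<sources>{sources}</sources>\n"
        f"<outline>{outline}</outline>"
    )
```

The clear boundaries seemed to help the model keep track of which text is instruction and which is source material.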


Next step: prompt-injecting malicious prompts into websites 🤔