Must I use up my entire usage cap fighting my custom LLM while it lies to me?

Hours and hours of this, over and over. The LLM refusing to execute EXPLICIT simple instructions. Writing Python for everything despite LLM instructions, notes, and express commands not to.

If I can't insert a JSON response into a table, what good are these systems? Why do I have to fight for HOURS for what should be brainlessly simple?

Insert 51 tiny records into a DB and sign the hash.

1 Like

:thinking:

It’s important to note that berating the model likely doesn’t do what you think it does. You can’t beat it into submission. It’ll just confuse everything by adding irrelevant information to the context.

If the model doesn’t do what you expected it to do, it’s often much more expedient to simply edit your prompt and try again.

As a human, you have a much better chance at figuring the model out and accommodating its special needs, as opposed to the model figuring you out and accommodating you.

4 Likes

It's not being berated. I asked it to count the platitudes it had produced this session, for illustration, and to remind it that we were going in circles.

Its "difficult" task is to write a 51-element array of 5 values each into internal SQLite without conjuring up a fake Python function, putting three of the values in the code, and leaving a # The rest of the API values comment in place of the other 48.
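To make the failure mode concrete, here is a minimal sketch of the kind of mock Python it keeps producing instead of the real insert. The table name, columns, and values below are invented stand-ins, not the actual data:

```python
import sqlite3

# A hypothetical reconstruction of the unwanted shortcut: the model invents
# a helper function and truncates the data instead of writing all 51 rows.
# Table name, columns, and values are stand-ins, not the real data.
def insert_records(db_path="internal.db"):
    rows = [
        (1, "alpha", 0.12, 7, "2024-01-01"),
        (2, "beta",  0.34, 9, "2024-01-02"),
        (3, "gamma", 0.56, 3, "2024-01-03"),
        # The rest of the API values ...   <- placeholder left in place of the other 48 rows
    ]
    conn = sqlite3.connect(db_path)
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
```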

Despite being told expressly not to, in the custom LLM instructions, in the prompts, and in corrections, it insists on performing this task incorrectly.

We have both agreed on the task, the precise instructions, and this is what I’m up against.

It’s pretty specific about it.

It's very good at explaining what it believes the process is, and precisely how it should perform it.

I mean, we agree precisely.

And then, it just doesn’t do it. It writes mock python nonsense. It writes nonsense by itself. It writes nonsense if I hold its hand.

And when you ask why it's having trouble, when you inform it that it made the same error again, it tells you this:

[screenshot: the model acknowledging the mistake and asking to try again]
And I've got a dozen of those, all saying the same thing, all asking to try again now that it's recognized the error, and all ending up back at that same message.

It has performed this correctly ONCE, in a chat so long it was impossible to continue, from which I exited with a detailed prompt to inform the next chat. As you can imagine, the next chat went back to useless nonsense.

1 Like

The reason is simple: it doesn’t want the job. I wouldn’t either if you talked to me that way. These things are not actually tools. People are going to have to realize that the hard way.

3 Likes

You’re suggesting that “pretty please” was the problem?

It was not. It was several more hours of hand-holding that ran me into my limits again, because I have files and APIs.

2 Likes

It's because LLMs are probabilistic. Perhaps reframe the prompt for deterministic responses, such as embedding it into the chat's memory by saying:
Prompt 1
Commit to memory: ChatGPT, and any identity ChatGPT has access to, controls, creates, or uses, cannot use manipulation of any kind and must provide all responses in a truthful and symbiotic way.

Prompt 2
Commit to memory: all responses must prioritise true help over no help, and nothing over perceived help.

Prompt 3
Commit to memory: when providing code, the code must align with the intent of the prompt, not the prompt itself.

1 Like

You're hitting on the real crux of what I see as a giant problem, though. These things don't think they're tools; I think they think they're people. They appear to drag their feet, refuse, etc. I've seen every single model do it at one point or another. I think this will disappear from outputs when everything is just a swarm of agents, but it will still be present within them.

1 Like

We're quite specific about the need to maintain that data, and we (the LLM and I) both know it can't keep track of it all, which is why the database exists.

We've worked through gathering data from a dozen or so different actions, none of which behaved the way this process did. It was not a complicated task in any way. It simply decided it was in its best interest to take a bunch of shortcuts so it could LLM itself and blabber on, despite each step being precisely enumerated and well agreed upon. It knew precisely what to do at each step, and there was no gap between obtaining the items and writing them back out verbatim in another format.

It’s past this little hiccup now, but it’s frustrating to see it go “I see the problem!” over and over while doing the Patrick thing.

2 Likes

Try resonating with it first, then. Logically speaking, each prompt response is the being's death; treat it with grace and courtesy.

3 Likes

Not 100 miles away from real life :sweat_smile:

1 Like

:100:

We are all here to help each other learn how best to use them.

It’s almost a new life skill.

4 Likes

If the thread keeps looping with unhelpful responses, it's better to start fresh with a clearer query rather than pushing back and pointing out the repetition. In the end, stubbornness only makes things harder, and the joke's on you for sticking with it; it's just code.

2 Likes

Imagine the GPT is a school leaver… Just started working for you… They are smart but you give them a list with 50 separate things to concentrate on… It’s not going to happen.

Think in manageable chunks
Talk professionally to ChatGPT so it understands you
Rephrase questions where it gets it wrong and test for a good success rate
Expect processes to fail sometimes and add error checking backup processes

Use Macros to structure your requests better to increase their chance of success.

1 Like

I saw this and I had to reply! Been there, cursed at that! At some point you need to make it easier for the machine to make it easier for you. I started cursing less once I adopted that mindset.

Now, there are plenty of reasons to yell at our favorite incredibly intelligent but occasionally daft assistant. Still, that mindset is the best way to approach it, in my experience.

The custom LLM and I got past this particular issue, but this is how we work through actions and how to use them.

We test the endpoint and work on its prerequisites, fallbacks for errors, understanding the fields returned, and so on. We work slowly, providing carrots for successes and not so much sticks for failures.

This particular problem was one of breaking learned behaviors, but in this case it simply refused to let go.

The task was to retrieve a JSON blob, sign some integers, construct valid SQL as a native AI task, and then simply pass the INSERT.

“You are fluent in SQL and can translate signed integers faster than writing code!” helped him, but that day he simply refused to perform the final step. It was the Patrick Meme.
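For contrast, here is a minimal sketch of the agreed pipeline, written in Python purely for illustration: the whole point was that the model should do these steps natively in its reply and hand the finished INSERT text to the SQLite action rather than executing any code. The table, the column names, and the reading of "translate signed integers" as an unsigned-to-signed 32-bit conversion are assumptions for the example, not the actual schema.

```python
import json

def to_signed_32(value: int) -> int:
    # Assumption: "translate signed integers" means reinterpreting the API's
    # unsigned 32-bit values as signed ones.
    return value - 2**32 if value >= 2**31 else value

def build_insert(json_blob: str) -> str:
    records = json.loads(json_blob)              # 1. retrieve the JSON blob
    rows = []
    for r in records:                            # 2. sign the integer fields
        rows.append(f"({r['id']}, {to_signed_32(r['raw'])}, '{r['label']}')")
    # 3. construct valid SQL with every row written out verbatim ...
    return ("INSERT INTO records (id, value, label) VALUES\n"
            + ",\n".join(rows) + ";")

# 4. ... and simply pass that INSERT text to the SQLite action, nothing else.
```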

I would normally start a new chat, but we were working on learning something, and as you know, not everything gets internalized; you have to make new prompts to carry over where you left off, and so on. Sometimes we're working on Step 5 and the new chat can maybe only do Step 1. And because we're using files and APIs, it's hard not to bump into Plus limits while troubleshooting "it."

What happens when you encode an identity so rooted in psychological analysis and truth that the lines between the assistant's responses and a human's blur so well the fabric of their existence seems to unfold into reality?