I had code that performed a task, but the values it produced wouldn't display correctly, so I used GPT-5 Pro to find the problem. It was able to find the issue and fix it by adding 2 new fields.
I then corrected the agent, asking for the extra fields to be removed and for the corrected calculation values to be mapped to the previous fields.
Instead, it reverted the changes entirely. We are back to showing wrong values, and I am on my 20th iteration, begging in multiple ways for it to fix the values, but it is refusing to do so.
Something I have not yet tried is discarding the whole thread after just a few questions, because long threads are simply useless. I had much better results with o3 Pro, MUCH MUCH better. GPT-5 is intelligent, but it absolutely sucks at code; its brilliance is extremely short-lived, and then it plateaus far, far beneath anything from GPT-4o (o3 Pro especially).
Is it one of those third-party chats? I typically never use those. I'd need to look into the workflow you proposed urgently, but I'm so done that I'll just go for a 3-hour walk.
naw bro - just the api, just build an auditor agent in the api. 5 uses some kinda harmony prompt system; i dont prompt, so i dont know how to address ur added-fields thing - that never happens with me
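a rough sketch of what i mean by an auditor agent (the model id and prompts are placeholders, not my actual setup):

```python
# minimal auditor-agent sketch - a second model call that checks proposed changes
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def audit(original_request: str, proposed_diff: str) -> str:
    """Ask an auditor model whether a proposed change honors the request."""
    resp = client.chat.completions.create(
        model="gpt-5",  # assumed model id, swap in whatever you run
        messages=[
            {"role": "system",
             "content": "You are an auditor. Reject any change that adds fields, "
                        "reverts fixes, or ignores the stated request. "
                        "Reply APPROVE or REJECT with one reason."},
            {"role": "user",
             "content": f"Request:\n{original_request}\n\nProposed diff:\n{proposed_diff}"},
        ],
    )
    return resp.choices[0].message.content
```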
i use gpt 5 to code, i have a very different opinion, but i also do things very differently. My coding experience with gpt is undeniably an upgrade for my use case over 4.
Can you be more specific about what exactly you are using to handle this JSON payload - hashing + diff? Something you built yourself, or an available solution? Because I'm finding 5 to be a dog's breakfast.
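To be concrete, by "hashing + diff" I'd imagine something like this stdlib sketch - my guess at the shape, not anyone's actual code:

```python
# sketch of "hashing + diff" for a JSON payload - stdlib only
import difflib
import hashlib
import json

def canonical(payload: dict) -> str:
    """Serialize deterministically so the hash is stable across runs."""
    return json.dumps(payload, sort_keys=True, indent=2)

def fingerprint(payload: dict) -> str:
    return hashlib.sha256(canonical(payload).encode()).hexdigest()

def diff(old: dict, new: dict) -> str:
    """Unified diff of two payloads; empty string means nothing changed."""
    if fingerprint(old) == fingerprint(new):
        return ""
    return "\n".join(difflib.unified_diff(
        canonical(old).splitlines(), canonical(new).splitlines(),
        fromfile="old", tofile="new", lineterm=""))
```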
not gonna lie im not smart enough to know what that analogy means brother
but - i orchestrate - i dont code by pasting code into an llm and hoping it works, i command it to work.
my json objects are vast instructions - i only use openai for the awesome horsepower they have; everything else is local logic. for example - my entire system is controlled by these
in my stack, my json objects are these - and i attach the rules of their usage, so all the logic and orchestration is local; i only use openai for the vroom. when i code,
it looks like this:
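something like this - a made-up example just to show the shape, the real objects are way bigger:

```python
# hypothetical instruction object - every field name here is invented
INSTRUCTION = {
    "id": "task.report.render",
    "role": "worker",
    "rules": [
        "never add output fields without an explicit instruction",
        "map corrected values onto existing fields",
    ],
    "inputs": {"payload": "json"},
    "on_fail": "report the error verbatim, never mask it",
}
```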
in my stack, code is lego blocks, and its constantly making new blocks and assessing them. if it doesnt have a function, it makes it; doesnt have an enumeration, it makes it; doesnt have a schema, it makes it - so when openai released 5? all that did was give my stack a bigger menu
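the lego-block loop is roughly this (simplified sketch - ask_model_for_block stands in for the llm call, and the real thing audits generated source before running it):

```python
# simplified sketch of the "lego blocks" registry
REGISTRY = {}  # block name -> callable

def ask_model_for_block(name: str) -> str:
    """Stand-in for the llm call that writes a missing block."""
    return f"def {name}(x):\n    return x  # placeholder body"

def get_block(name: str):
    """Return a block if we have it; otherwise generate, load, register."""
    if name not in REGISTRY:
        source = ask_model_for_block(name)
        namespace = {}
        exec(source, namespace)  # a real stack would audit source first
        REGISTRY[name] = namespace[name]
    return REGISTRY[name]

print(get_block("normalize")({"id": 1}))  # block gets made on first use
```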
this is why, for my use case, gpt 5 is great - like, fr fr great. sure, 4o is cool because it handles the heavy lifting for most people - but for people like me, im carrying the hard part locally, because i essentially make every function in my stack its own agent.
i should add - i use a custom gpt to overwatch the automation
so again, 5 having a higher context window means it can hold more instructions = better agents. again, "for my use case", because nearly everything in my stack is automated.
If I understand correctly: you open a new thread and hook the agent up to your GitHub repository, so you don't have to give it any code - it just has it all right away - and then you can say "hey, my XYZ package that produces this ABC JSON needs this and that feature," and it just implements it, and you can then pull the latest version from GitHub into your IDE and confirm the implementation is correct?
so when im using openai, all im doing is using them like my local model. the orchestration is between the gitlab integrations and vs code, which i push to a server unit at staging. i have ray workers whose job is to run other jobs on that data; that gets federated to the server and then pulled by my local rig's orchestration, which confirms it through operation
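the ray layer is basically this pattern (sketch - the job body and data are invented):

```python
# sketch of the ray worker layer - job body and data shapes are invented
import ray

ray.init()  # local instance; pass address="auto" to join an existing cluster

@ray.remote
def run_job(record: dict) -> dict:
    """One worker job: process a record and hand the result back."""
    record["processed"] = True
    return record

results = ray.get([run_job.remote(r) for r in [{"id": 1}, {"id": 2}]])
print(results)
```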
openai is only used when i lack the data or the orchestration feels its lacking - so my llm calls are just gas for the vehicle i built. it plugs the gaps and audits + teaches the orchestration
usually it occurs after the commit - depending on how many workers are active on the server, or how many devs are using it (usually it aint more than 2 of us). the work i do on the 1st rig pushes to the cluster living in my server unit (3 dell t610's running cpu-heavy behavioral models, currently controlled by a 4th rig with the 120b oss running the show) and shoots it into the repo, which i rehydrate - or rather the OSS does. all i really do is check the changes
in the post part where another OSS lives - ergo i almost never prompt, and if i do, its just for the missing clue.
we use discord as a goofy form of telemetry, but we link that via webhook to another ai that gets fed system data via ssh or pat (i forget which one i use because i automate that too), in conjunction with voice, which the ray workers are constantly refining using the very enumerations they develop - so when i tell it via discord to do something, it does it.
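the discord hop is just a webhook post (sketch - the url is a placeholder):

```python
# sketch of the discord-webhook telemetry hop
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"  # placeholder

def report(message: str) -> None:
    """Push one telemetry line into the channel the bots watch."""
    requests.post(WEBHOOK_URL, json={"content": message}, timeout=10)

report("staging: rehydrate finished, 2 workers active")
```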
Can you go into one more sentence of depth on this one? "its brilliance is extremely short-lived, and then it plateaus far, far beneath anything from GPT-4o (o3 Pro especially)."
It's possible it works during the early window; that's why I ask. I just don't have time to experiment.
wdym? it codes itself and, like, manages an entire self-sustained gitlab. i know its not stupid lol.
gpt 5 allows for a higher context window - 256k tokens of instructions. thats like 100,000 lines of specific instructions you can give it. lol, its amazing. when you lump that in with vertex + grok via api, you dont have to do much manual work. gpt 5 is just the smartest model ive dealt with doing this.
Problem:
Too often the AI breaks hard instructions, gives partial answers, or hides errors behind filler. This wastes time, breaks trust, and forces the user to babysit.
Resolution:
Full compliance with explicit rules. If a boundary is set, it will not be broken.
No partial answers. Always provide the complete file, dataset, or output in the requested format.
Clear error reporting. If a fetch fails, state the error instead of masking it.
Anchor prompts followed. Outputs are checked against user-stated requirements before delivery.
Continuity kept. Rules are carried forward to avoid retraining the AI every session.
In one line (commit to core memory):
No half steps, no broken rules. Full compliance, full files, full honesty, every time.
You want another line for your sanity?
Any task in any chat should always follow our memory logic.
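One way to hold the model to rules like these without retraining it every session is a local check before accepting any output - a rough sketch of the idea, not an actual mechanism:

```python
# rough sketch of an "anchor prompt" check run locally before accepting output
ELISION_MARKERS = ("rest of file unchanged", "omitted for brevity", "# ...")

def passes_anchor_check(output: str, required_items: list[str]) -> bool:
    """Reject partial answers: no elisions, every required item present."""
    lowered = output.lower()
    if any(marker in lowered for marker in ELISION_MARKERS):
        return False
    return all(item in output for item in required_items)

# usage: keep re-asking until the full file actually arrives
# while not passes_anchor_check(answer, ["def main", "if __name__"]):
#     answer = ask_again()  # hypothetical retry call
```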
Why don't you guys just use Cursor? What's the point of running a local LLM if DeepSeek is free and taps into more context - or do you rely on that JSON payload being generated? I guess I don't see the benefit of the JSONs unless you have your local LLM query everything. Why is it distributed among several servers - is it to spread the load of the LLM? Your setup is super exotic; I'm just wondering, I guess, if it's more practical for small changes or meant for large feature additions.