Once again, actions are useless, and the LLMs running them are jokes.
Please gather this record from the API, and write the JSON verbatim to the scratch sqlite table.
And this just goes in circles for hours until I’ve used up my Plus quota for the day.
If I understand this correctly, you are returning a large object response from some API server.
I ran into this myself. It’s quite embarrassing. I have to tell my clients to upload a spreadsheet to use Code Interpreter
and THEN use the API server to perform work on the data, otherwise the model will only acknowledge a very small number of items.
In my opinion OpenAI needs to include a way to pass a response to the model as a variable that can be directly passed to Code Interpreter, instead of having the model ingest and regurgitate it.
Yesterday the task was writing all 37 rows to a table. It was impossible for it to write them all.
I’m talking… 20 values, and a couple of text fields.
It’s an endless array of, “Despite your explicit instructions, despite how well described and articulated, despite agreeing on the workflow, I’ve decided to completely and entirely eff right off, also, you’re over your limit, come back in two hours, because I chewed a billion tokens regurgitating nonsense YOU PAID FOR.”
I paid to have an LLM lie to me all day.
When it ignores me again tomorrow, I’ll make another post that’s ignored. What’s the point of forums like this if they’re left here to rot?
I still dare anyone to provide, like, actual documentation.
Oh, it’s not just a big response.
Literally 70+ prompts to return a tiny array of data and resolve three hashes in a local database.
Please lookup this hash: (ignores explicit instructions)
…explain
…explain
…explain
Please lookup this hash: (fails, makes up nonsense)
…explain
…explain
…explain
…debug myself
…lookup myself
…sign the integers myself.
Please lookup this hash: (looks up wrong hash, won’t give it up)
…explain
…get fresh data
…explain
…send screenshot.
…explain
Please lookup this hash: (ok, that is right, but I won’t change it)
…explain
…get fresh data.
…try again.
…no not that.
…explain
Please lookup this hash: (OK, not doing it.)
…get fresh data.
…try again.
…not like that.
…sign the integers.
…oh my god, sign the integers
…for the love of jesus, sign the integers
Please lookup this hash: OH, ok, now I’ve got it.
There’s just no point in working with these tools.
Spend an hour working on something, and then your chat becomes useless, the environment becomes dodgy, and you have to start a new chat.
Hope you finished your task, and your LLM finished understanding what you were teaching it, and that it didn’t get it wrong, because now you need to start over.
So, let’s start over.
I wonder what the next session will bring.
Perhaps today I’ll spend time actually fixing or refining something instead of dealing with backslide, and the complete inability to do anything useful.
Nope. Same nonsense, different day. It’s forgotten everything I’ve taught it. A complete and utter time-waster.
This entire solution is undocumented, and if someone dares say it is, please answer the following:
What’s our action middleware and how do we see errors? How does our middleware handle a 301? What are the size and token limits for the API? What are the size limits for the LLM to handle that data? Why does an “Actions” LLM need to help us write code, and why doesn’t it know what the hell your middleware is? Where’s a single working example beyond NWS that isn’t a “draw the rest of the owl” example?
A miracle of modern technology that can’t seem to remember it can sign integers faster than it can write Python code or SQL code to sign integers. My brother in silicon, you speak bitwise operations! You can literally “type” the signed integer just as fast as the unsigned one!
It’s a miracle it doesn’t make up weather from the NWS api.
In fact the AI works very well as soon as you get access to o3-preview…
I could let it apply to freelance jobs and it does the jobs automatically using tasks… I don’t have to do anything anymore. It replies to mails and even my voice was cloned with the new voice cloning feature that came out last week…
I have no idea why people are complaining. It is a wonderful life…
Plugins/Actions are a joke.
At the time of writing this post, OpenAI is valued at 157 billion dollars.
Also, at the time of writing this post, OpenAI is supported by roughly a TRILLION dollars’ worth of companies.
And despite all of that, we’re here on the world’s worst documented site.
Can a single adult human explain the current custom action middleware limits? Input and output size? Tokens? Number of calls? How do we address a single “Tool Error” that our perfectly valid-in-Swagger YML triggers? How does it handle a 301, or any other redirect? Pressing (TEST) in the configuration gives some diagnostic info, just as long as you don’t want ANYTHING back from the middleware tool. [And I dare anyone to send ARBITRARY key/value pairs in the message body without the middleware eating it.]
And the “library?” The joke of half-finished recipes?
Like, thanks for the Google Drive YML – now if only it weren’t a “Just Draw the Rest of the Effing Owl” exercise missing writes. I applaud the guy who provided it, but 157 billion dollars later, maybe someone could have drawn the rest of the owl for us? Or should we all do it once and rush to the marketplace to sell it?
The lack of a common library of published, works-with-OpenAI-LLMs plugins, especially for the companies in the TRILLION DOLLAR PARTNERSHIP who have these APIs, is almost criminal.
And things like the Action Builder LLM? Designed and published by OpenAI specifically to help us build actions? It doesn’t even understand how its OWN SYSTEM works, or that auth is handled by middleware. It barely knows its own SCHEMA requirements. It’s 0-for-infinity in generating working YML that doesn’t need edits or that actually includes valid calls.
Why do I have to teach my own LLM about its own features? Why do I have to browse to a different LLM to find out that it speaks YML perfectly, but mine needs to be taught how to use its own output language correctly? When core functionality changes but you don’t tell us – OR THE LLM! – we have to teach it… …which wouldn’t be terrible, but we have to blindly discover these changes to OUR PROJECTS.
And if I do get my actions working, I’d better not have more than 30, which is better than 10, which it was last month. And my JSON had better not get too big, because I can’t create another action with the same base URL, no matter how complicated the endpoint is, and how many actions it has in reality.
How many people with custom action LLMs are there? Should we all re-invent the wheel every day?
It’s insanity that we have to pay to beta-test your features for you. At least put up current limits, a bug bounty, or a YML contest if you’re too cheap to task an intern to update documentation, please.
There’s a hundred public APIs of great value that could easily be implemented, and we’ve got one working weather.gov API example, and some rest-of-the-owl garbage.
I’ve already got one job. I hate paying for another.
And today was fun.
I have an action that requires specific data from another action as part of a user-lookup process.
The custom LLM has been taught the process repeatedly.
All actions are working and have been tested repeatedly.
The custom LLM has a reference document explaining the process.
The custom LLM has just re-read and summarized the reference document.
Best part of these? Every time the LLM loses its mind, it just starts chewing CPU and tokens, and I’m two or three messages away from a “Usage Cap” warning.
My punishment for your LLM misbehaving is that I waste my time, and get backed off of your service.
Additionally, my LLM decided to share API keys with the user today - so that’s special. It was kind enough to provide generic platitudes. “Sorry I completely breached your trust by showing confidential data, and never following instructions. Ooopsiedoodle.”
The second your LLM starts to spout this “I can’t follow simple instructions” nonsense, you get kicked off for usage overages.
I have an action with EXPLICIT instructions, and I can’t teach my LLM how to use it in a single chat before it loses its mind.
Unacceptable.
There are some complaints that generally I agree with. The interface is still the same buggy mess as before and GPTs have stagnated. From someone who has been following the journey of OpenAI, this is unfortunately expected.
Technology releases, then stagnates for an indefinite time. Usually to be “re-born” as something else.
OpenAI is just constantly stuck in the “start-up” mentality of a business. The only truth to hold is knowing that they will constantly be attempting to improve their flagship LLM. Everything else is secondary.
That being said, what are your action(s)?
If you are finding constant struggle with GPTs it may make sense to move towards Assistants, or even ChatCompletion. When you have more control it becomes harder to blame other people (Not a jab)
It’s a hobby project for me, an old geek.
Bungie.net is kind enough to publish a full (mostly) working OpenAPI document, but it’s roughly 2 megs, 412,000 tokens, and 80+ actions in a densely packed, reference-heavy document. It’s great, but completely unsuitable, unless I get a better editor that can remove unlinked components. Of course, even if it were right, Bungie.net requires both OAuth (which the middleware is kind enough to handle) and that an API key be sent with each request, so I’d have to at least hang an anchor on each path for a manual key and/or teach my LLM the key like I have to do now. [The auth system will only provide one auth method at a time.]
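If I ever sit down to trim it myself, the starting point would probably look something like this rough sketch (PyYAML assumed; the file names and the path filter are placeholders, and it only chases schema refs, not responses or parameters):

```python
# Rough idea: keep only the paths the GPT actually uses, then drop every
# component schema that is no longer referenced, directly or transitively.
import yaml

def collect_refs(node, refs):
    """Collect the target names of every "$ref": "#/components/schemas/X"."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.add(value.rsplit("/", 1)[-1])
            else:
                collect_refs(value, refs)
    elif isinstance(node, list):
        for item in node:
            collect_refs(item, refs)

with open("openapi.yml") as fh:  # placeholder: the full 2 MB spec
    spec = yaml.safe_load(fh)

# Placeholder filter: keep only the handful of endpoints I actually need.
spec["paths"] = {p: op for p, op in spec["paths"].items() if p.startswith("/Destiny2/")}

schemas = spec.get("components", {}).get("schemas", {})
needed = set()
collect_refs(spec["paths"], needed)
# Chase references transitively: a kept schema may reference other schemas.
while True:
    before = len(needed)
    collect_refs({name: schemas[name] for name in needed if name in schemas}, needed)
    if len(needed) == before:
        break

spec["components"]["schemas"] = {name: s for name, s in schemas.items() if name in needed}

with open("openapi-trimmed.yml", "w") as fh:  # placeholder output file
    yaml.safe_dump(spec, fh, sort_keys=False)
```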
So, I have 8-10 hand-crafted relevant actions from bungie.net (identifying users, resolving characters, pulling component data from characters), and another action from stats.bungie.net because of a 301 permanent redirect that the middleware can’t handle correctly - and won’t even display. “Tool error.” Gee, thanks.
I can’t get the middleware to send arbitrary key/value pairs in the message body to get PATCH to work right, despite it working in Swagger/Stoplight, so I have to use static column rows (pre-entered into my YML) and then make a dynamic schema row at 1.
Delete doesn’t work right, so I have to soft-delete in my schema. It might be a “me” error at this point, however. But who knows.
But create/patch works, so I can mostly CRU(d) with a schema flag.
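To be concrete about the “schema flag”: instead of a real DELETE, the row just gets a deleted flag flipped via PATCH, and reads filter it out. A tiny sketch of the idea, where the endpoint shape, token header, and field names are placeholders rather than the actual NocoDB API:

```python
# Soft delete: PATCH a "deleted" flag instead of issuing the DELETE that the
# middleware mangles. Endpoint, auth header, and field names are placeholders.
import requests

BASE = "https://nocodb.example.com/api/v2/tables/characters/records"  # placeholder
HEADERS = {"xc-token": "YOUR_TOKEN"}  # placeholder auth header

def soft_delete(record_id: int) -> None:
    # The record stays in the table; reads just filter on deleted = false.
    requests.patch(BASE, headers=HEADERS, json=[{"Id": record_id, "deleted": True}])

soft_delete(42)
```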
So I keep a table for saved characters, an Insights table for intra-session knowledge, and a componentStatus table that holds the relationships between data objects, and the LLM is asked to read, summarize, and reference them during the session for uninternalized data. We regularly review and “defrag” the componentStatus table.
Which works perfectly, because the LLM just keeps picking “Quazar” without it.
Yeah, it’s weather data.
I was able to read, “navigate” (folders are metadata), and change metadata (rename, etc.), but it kept writing 0-byte files, which meant more YML that worked in Swagger, and Stoplight, and Postman, and feeding a load of curls to my LLM and the “Actions Builder” trying to fix it, all for nothing, before surrendering.
On the non-action side, I have a local copy of the Destiny 2 “world” SQLite database in my Knowledge Folder, right next to the personality file the LLM ignores.
The LLM is also allowed to spin up a local scratch DB for any complex tasks, but we don’t use it much.
The particular issue today is the getLinkedProfiles function, which requires the LLM to read getCurrentUser and make a determination from the data received about the user’s correct platform (Xbox, PlayStation, etc.).
This is a process that the LLM has fully internalized, but has regressed on. So much so that we keep a note of the explicit process in his “Insights” table at NocoDB, a ticketing system for himself, where we work on new features, refining existing processes, etc.
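For reference, the process itself is short enough to sketch. The endpoint names come from the public Bungie.net docs; the placeholder credentials, header handling, and the cross-save rule are my own simplifications, so treat the details as assumptions:

```python
# Sketch of the user-lookup process: list the signed-in user's memberships,
# pick the correct platform, then call getLinkedProfiles with that pair.
import requests

BASE = "https://www.bungie.net/Platform"
HEADERS = {
    "X-API-Key": "YOUR_API_KEY",                 # placeholder
    "Authorization": "Bearer YOUR_OAUTH_TOKEN",  # placeholder
}

# Step 1: the getCurrentUser side - the signed-in user's Destiny memberships.
resp = requests.get(f"{BASE}/User/GetMembershipsForCurrentUser/", headers=HEADERS)
memberships = resp.json()["Response"]["destinyMemberships"]

# Step 2: determine the correct platform. If cross save is active, the primary
# membership is the one whose membershipType matches its crossSaveOverride;
# otherwise fall back to the first entry (a simplification).
primary = next(
    (m for m in memberships if m.get("crossSaveOverride") == m["membershipType"]),
    memberships[0],
)

# Step 3: feed that membershipType/membershipId pair into getLinkedProfiles.
resp = requests.get(
    f"{BASE}/Destiny2/{primary['membershipType']}/Profile/{primary['membershipId']}/LinkedProfiles/",
    headers=HEADERS,
)
print(resp.json()["Response"].get("profiles", []))
```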
And when we try to work on the issue, the LLM goes into “Sorry, I had an execution failure, you told me what to do, but I didn’t do it. I promise not to next time, but this is a meaningless generic platitude.”
5-10 prompts into the session, after being explicitly given its process to resolve the inputs needed for getLinkedProfiles, it simply ignores them and makes up numbers, and then my session dies, and 2 hours later I have to start a new chat, and repeat the process with the issue still unresolved.
Oof. Yeah. One of my (many) issues with GPTs and Actions is the limited control over headers. I also fell into the pit of using a service that required custom headers. This issue was never resolved. I would imagine that the developers at OpenAI erred on the side of caution, but it meant that we’re left bag-holding.
What helped me, and may help you, is setting up your own middleware adapter server.
I was required to do this. For my instance I had an authorization header, great, ideal. Yet, the server that runs the command requires an OpenAI key (Weaviate).
I just set up a simple server to validate & route the command, catch any common(ish) errors, and then forward the request with some trimmings.
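For anyone else stuck in the same spot, the adapter doesn’t have to be fancy. Here’s a minimal sketch (Flask assumed; the upstream host, custom header, and environment variable are placeholders): the Action points at this server, the server validates what arrived, injects the header the Action builder can’t send, follows redirects, and returns errors in a form the model can actually read instead of an opaque “Tool Error”.

```python
# Minimal middleware adapter: validate, inject what the Action can't send,
# forward to the real API, and translate failures into readable responses.
import os
import requests
from flask import Flask, Response, request

app = Flask(__name__)
UPSTREAM = "https://upstream-api.example.com"  # placeholder for the real API

@app.route("/proxy/<path:endpoint>", methods=["GET", "POST", "PATCH", "DELETE"])
def proxy(endpoint):
    # Basic validation of whatever the GPT Action did manage to send.
    if "Authorization" not in request.headers:
        return {"error": "missing Authorization header"}, 401

    # Inject the headers the Action configuration won't let you set.
    headers = {
        "Authorization": request.headers["Authorization"],
        "X-Custom-Key": os.environ["UPSTREAM_API_KEY"],  # e.g. the Weaviate/OpenAI key
    }

    upstream = requests.request(
        request.method,
        f"{UPSTREAM}/{endpoint}",
        headers=headers,
        params=request.args,
        json=request.get_json(silent=True),
        allow_redirects=True,  # follow the 301s the OpenAI middleware chokes on
    )

    # Surface upstream errors as readable JSON instead of an opaque "Tool Error".
    if upstream.status_code >= 400:
        return {"status": upstream.status_code, "error": upstream.text[:500]}, 200

    return Response(
        upstream.content,
        status=200,
        content_type=upstream.headers.get("Content-Type", "application/json"),
    )

if __name__ == "__main__":
    app.run(port=8080)
```

The Action’s YML then targets the adapter’s URL instead of the real API, so the custom header never has to survive OpenAI’s middleware at all.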
The reality of these models is that once it has fully failed a task, it’s time to let go. As a GPT developer I would wish that we could force close a conversation for this reason.
I do agree that since OpenAI is valued at $157 billion, it could be better. I wish there were more tools for me to properly work on my projects. I use it mainly for programming, and it’s not super helpful; it could be way better. We’re in 2025 now; it’s time for them to step up their game. Why is it that tools like Tabnine on Visual Studio Code work better than their actual website? With Tabnine, I can fully use my project as context, but with the website, I’m limited. That doesn’t make sense.
Also, the documentation isn’t great. They should provide more independent instructions and detailed information on topics. I’ve been following OpenAI since 2020, so after five years, you’d expect it to have improved significantly, but that’s not the case. They could even provide their own IDE for ChatGPT, but they don’t. Instead, they rely on other companies to do the job for them while paying to use their API, which is a disappointment when you think about it.
On top of that, they limit the number of prompts on GPT-o1, and you can’t even link documents to it, which is ridiculous considering I’m paying. Most companies, like Anthropic with Claude, say they limit usage to let free users enjoy it too, but that’s stupid. If I’m paying, I want full access without limitations.
Yeah, the fact that Hookdeck, and (in my case) F12 with Swagger open to watch my own returns for redirects and other shenanigans, is required so you can guess at what the OpenAI middleware may or may not be sending you “Tool Error” on… kinda sucks.
Yeah. A conversation turning sideways mid “development” is a death sentence. If you’ve got half of the rows of your schema written and your LLM just starts making up numbers, you get about one chance to tell it to save your working copy and get the hell out before it starts singing Old McDonald into your API.
I do recommend editing your prompts if you’ve confused it, of course, but when it’s confused…
Hours and hours of this, over and over. The LLM refusing to execute EXPLICIT simple instructions. Writing Python for everything despite LLM instructions, notes, and express commands not to.
If I can’t insert a JSON response into a table, what good are these systems? Why do I have to fight for HOURS for what should be brainlessly simple?
Insert 51 tiny records into a DB and sign the hash.
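For anyone wondering what “sign the hash” even means here: Bungie hashes come back from the API as unsigned 32-bit values, while the local SQLite manifest keys on the signed 32-bit representation, so every hash has to be reinterpreted before insert or lookup. A rough sketch of the whole task (the column names and scratch file are made up; the signing trick is the point):

```python
# Take rows from the API, sign the unsigned 32-bit hashes, and insert them
# verbatim into a scratch SQLite table. Column names are placeholders.
import sqlite3
import struct

def sign_hash(unsigned_hash: int) -> int:
    """Reinterpret an unsigned 32-bit hash as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", unsigned_hash))[0]

# Pretend these came back from the API (51 rows of 5 values in reality).
api_rows = [
    {"hash": 3159615086, "name": "Example A", "tier": 6, "power": 1810, "slot": "Heavy"},
    {"hash": 1363886209, "name": "Example B", "tier": 5, "power": 1800, "slot": "Kinetic"},
]

conn = sqlite3.connect("scratch.sqlite3")  # placeholder scratch DB
conn.execute(
    "CREATE TABLE IF NOT EXISTS scratch_items "
    "(id INTEGER, name TEXT, tier INTEGER, power INTEGER, slot TEXT)"
)
conn.executemany(
    "INSERT INTO scratch_items VALUES (?, ?, ?, ?, ?)",
    [(sign_hash(r["hash"]), r["name"], r["tier"], r["power"], r["slot"]) for r in api_rows],
)
conn.commit()
```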
It’s important to note that berating the model likely doesn’t do what you think it does. You can’t beat it into submission. It’ll just confuse everything by adding irrelevant information to the context.
If the model doesn’t do what you expected it to do, it’s often much more expedient to simply edit your prompt and try again.
As a human, you have a much better chance at figuring the model out and accommodating its special needs, as opposed to the model figuring you out and accommodating you.
It’s not being berated. I asked it for the session count of platitudes for illustration, and to remind it that we were going in circles.
It has the “difficult” task of writing a 51-element array of 5 values to internal SQLite without conjuring up a fake Python function and putting three values plus a “# …the rest of the API values” comment in the code.
Despite being told expressly not to, in the custom LLM instructions, in the prompts, and in corrections, it insists on performing this task incorrectly.
We have both agreed on the task, the precise instructions, and this is what I’m up against.
It’s pretty specific about it.
It’s very good about explaining what it believes the process is, and how precisely it should perform it.
I mean, we agree precisely.
And then, it just doesn’t do it. It writes mock python nonsense. It writes nonsense by itself. It writes nonsense if I hold its hand.
And when you ask why it’s having trouble, when you inform it that it made the same error again, it tells you this:
It has performed this correctly, ONCE, in a chat so long it was impossible to continue. From which I exited with a detailed prompt to inform the next chat. As you can imagine, the next chat went back to useless nonsense.
The reason is simple: it doesn’t want the job. I wouldn’t either if you talked to me that way. These things are not actually tools. People are going to have to realize that the hard way.
You’re suggesting that “pretty please” was the problem?
It was not. It was several more hours of hand-holding that ran me into my limits again, because I have files and APIs.
It’s because LLMs are probabilistic. Perhaps reframe the prompt for deterministic responses, such as embedding it into the memory of the chat by saying:
Prompt 1
Commit to memory: ChatGPT, and/or any identity ChatGPT has access to, controls, creates, or uses, cannot use manipulation of any kind and must provide all responses in a truthful and symbiotic way.
Prompt 2
Commit to memory: all responses must prioritise true help over no help / nothing over perceived help.
Prompt 3
Commit to memory: when providing code, the code must align with the intent of the prompt, not the prompt itself.
You’re hitting on the real crux of what I see as a giant problem though. These things don’t think they’re tools; I think they think they’re people. They appear to drag their feet, refuse, etc. I’ve seen every single model do it at one point or another. I think this will disappear from outputs when everything is just a swarm of agents, but it will still be present within them.