Custom GPTs - We NEED Verbatim Outputs OR Longer Instruction Capacity

With the most recent updates, we’ve all seen how ChatGPT no longer wants to output verbatim excerpts, or even just longer summaries, from the files you upload.

I understand that’s done to protect the original source.

But here lies the problem. What if YOU are the original source?

╌╌╌

In the case of Custom GPTs, we have one in the GPT store which is gaining more and more users.

And we have uploaded a list of helpful scripts that the users can run to do specific calculations.

However, when a new user launches our GPT, they don’t know about these scripts.

For this reason, we’ve created a conversation starter which shows a list of the scripts, for example:

Script 1 - Helps you with such and such.
Script 2 - Helps with this and that.
Script 3 - Enables this
Script 4 - Etc.
All the way to Script 10

And we’ve instructed our GPT to output the list of scripts and their respective functions verbatim, so the user can explore them and, if they choose to run one, proceed.

╌╌╌

However, instead of generating the list of scripts verbatim, our GPT generates a short summary such as:

“I have various scripts in my uploaded knowledge that the users can run, some help with a bit of this, some help with a bit of that.
For more information, refer to the original document.”

And as you can see, this is very problematic.

WE DON’T WANT A SUMMARY. And the users obviously cannot refer to the original document, since that’s a part of our uploaded knowledge.

We want our GPT to output the content verbatim.

But this upstream system prompt prevents ChatGPT and Custom GPTs from doing so.

╌╌╌

Another example: in our uploaded knowledge, we have specific paragraphs of important information which showcase limitations and precautions and list the URLs of PubMed sources.

And these need to be generated verbatim, so that the user can see the sources and be aware of the limitations.

However, our custom GPT, yet again, summarizes the content and often skips the sources, or skips the limitations, or skips both entirely.

╌╌╌

We need to figure out a solution. Custom GPTs should be able to retrieve verbatim content from their knowledge source.

Otherwise, they are severely handicapped.

8 Likes

I had a similar problem. I made a custom GPT that would access my own website (blog) and provide me with quotes from previous posts, which worked for a few weeks. But, presumably after a model update, it started refusing to give me anything verbatim, even though it was my writing. I understand that OAI doesn’t want to encourage copyright infringement, but it seems like a fairly sizeable limitation to have it refuse to copy anything verbatim for any reason.

5 Likes

Create an API and use Actions

3 Likes

Fair, that makes sense. I guess the frustration is that making custom GPTs is so nice and simple, and my use case seemed extremely straightforward (and had been working really well!), that this limitation is just annoying.

3 Likes

Agreed. It does suck. Two giants fight (OpenAI and NYT, with more piling on), and the people below take all the damage.

3 Likes

We have thought of building an API, but that comes with a whole set of new problems.

There are plenty of complaints about APIs misfiring, misbehaving, or not functioning correctly.

And also, why would we use an API simply to output a few paragraphs verbatim or to showcase a list of the available scripts?

That’s like digging an underground tunnel from your house’s living room to your backyard, because your front door is locked, and you don’t want to fix it.

2 Likes

I’m sadly getting the impression that the answer to that question is “Because currently OAI won’t let you do it the easier way.” Which is a bummer.

1 Like

Yeah, exactly.

And if the custom actions were to function as intended 100% of the time, or at least 98-99% of the time, it might be worth doing.

But, that’s not the case.

A viable solution would be for OpenAI to increase the instructions’ character cap from 8,000 to something like 1,000,000.

This way, GPT builders can put all of the content that must be generated verbatim inside the instructions and leave the remaining content in the uploaded knowledge source, since data from within the instructions can be generated verbatim without issues.

APIs are not motors. They do not misfire. They are also much more reliable than embedding and prompting, as you have now discovered.

Regardless, my comment was directed to @jck

If your calculations require an API anyway, then I would still use Actions to return the list of available commands. This is not an ideal use case for embeddings. Keep in mind that (for Assistants, anyway) a small knowledge file is injected into the context in its entirety instead of being searched by semantic similarity.
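For the script-list case specifically, here’s a minimal sketch of what such an Action backend could look like (the endpoint, names, and descriptions are hypothetical placeholders, not the actual GPT’s data):

```python
# Hypothetical sketch: a tiny Action backend that returns the script list
# verbatim. A GPT Action would call this endpoint via its OpenAPI schema,
# and the model quotes the returned JSON instead of paraphrasing a knowledge file.
from fastapi import FastAPI

app = FastAPI()

# Placeholder entries standing in for the real GPT's ten scripts.
SCRIPTS = [
    {"name": "Script 1", "description": "Helps you with such and such."},
    {"name": "Script 2", "description": "Helps with this and that."},
    {"name": "Script 3", "description": "Enables this."},
    # ... up to Script 10
]

@app.get("/scripts")
def list_scripts() -> list[dict]:
    """Return the exact script list for the GPT to display."""
    return SCRIPTS
```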

Wat

No

Of course they do…

On top of that, even with a custom API, GPT-4 can still decline to output content verbatim.

Hence, my analogy.

Meaning: why complicate such an obvious and simple solution?

Most definitely yes.

2 Likes

Maybe you’re intending to say something different, because “misfire” literally refers to combustion.

If APIs “misfired” then the world wide web would be a complete mess. I can understand that they can go down, and that incompetence can lead to strange results. But this is inherent in all internet-related features and not exclusive to APIs, so I don’t understand what you’re talking about.

Have you tried this to confirm? I can personally confirm that Assistants will repeat verbatim the results of API requests. I can also confirm that people have had Action results fetched from URLs repeated verbatim.

It’s clearly not an obvious and simple solution if you’re having difficulties. Everything is working as intended.

1 million characters, at an average of 5 characters per word, is 200k words. That’s roughly 250k tokens (a token averages about 3/4 of a word, or ~4 characters).

The average word count for adult fiction is between 70,000 and 120,000 words, so your solution is for OpenAI to permit more than two books’ worth of content in the instructions. Let’s just say OpenAI pays $0.005/1k tokens: you are asking for each message to cost a potential $1.25 ONLY for the instructions, IF it were even possible. The highest token limit so far is only 128k.
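For anyone who wants to check that arithmetic, here’s the same back-of-the-envelope calculation in code (the characters-per-token ratio and the price are the rough assumptions above, not official figures):

```python
# Back-of-the-envelope cost of resending 1M characters of instructions per message.
chars = 1_000_000
chars_per_token = 4                # rough rule of thumb, not an official figure
tokens = chars / chars_per_token   # -> 250,000 tokens
price_per_1k = 0.005               # assumed USD per 1k input tokens
print(f"{tokens:,.0f} tokens -> ${tokens / 1000 * price_per_1k:.2f} per message")
# 250,000 tokens -> $1.25 per message
```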

1 Like

It was an obvious metaphor, meaning “not working as intended.”

Mr. Literal over here.

Almost every GPT that has a custom API from the store has failed multiple times to do its job.

The whole point of custom GPTs is to leave the simple tasks to your prompts and, if needed, use custom APIs for additional functionality.

Since content and data can be retrieved verbatim from the instructions 100% of the time, the obvious solution would be to increase the instructions cap.

Sure, 1M may be overkill. And if you re-read my comment, I didn’t say 1M but “something like 1M”, meaning an amount large enough that GPT builders can actually create a functioning end product.

2 Likes

APIs are used everywhere, such as on the current forum, where you have been consistently successful in sending & reading messages. Network errors may occur, sure, but that’s true of everything internet-related.

This is simply not true. GPTs with Actions do work.

You have a problem that cannot be solved this way. The solution can be to use function-calling. I don’t see what you’re trying to accomplish here. You have hit a brick wall and think the solution is for OpenAI to change instead of using the tools available to you.

I can understand that it’s frustrating that something seemingly so simple is difficult, but that’s the way it is, and the way it’s going to be. Work with it.

Your instructions are sent and used each time the GPT responds. If your issue is that you cannot fit everything in the instructions and the knowledge file isn’t working, you can use function-calling to append to your instructions. This is not “digging around”; this is “efficiently addressing the issue”.

In my case I have an Assistant that is “aware” of my web app. It knows what the controls do, and it can take actions on behalf of the user. I do not put all of this information in the instructions. Instead I use function-calling to essentially “append” the necessary information. So if a user asks “What do the header buttons do?”, it returns specifically the header-button information. If the user doesn’t ask about anything regarding the UI, the instructions stay small and precise.
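A rough sketch of that pattern (the function name, help text, and model here are placeholders, not my actual implementation):

```python
import json
import time

from openai import OpenAI

client = OpenAI()

# Placeholder verbatim help text, keyed by UI area. This content lives outside
# the instructions and is only injected when the user actually asks about the UI.
UI_HELP = {
    "header": "The header buttons do X, Y, and Z. (exact text, returned verbatim)",
    "sidebar": "The sidebar lets you filter and pin items. (exact text, returned verbatim)",
}

assistant = client.beta.assistants.create(
    model="gpt-4-turbo-preview",
    instructions=(
        "You help users operate the web app. When asked about the UI, "
        "call get_ui_help and quote its output verbatim."
    ),
    tools=[{
        "type": "function",
        "function": {
            "name": "get_ui_help",  # placeholder name for this sketch
            "description": "Returns the exact help text for one UI area.",
            "parameters": {
                "type": "object",
                "properties": {"area": {"type": "string", "enum": list(UI_HELP)}},
                "required": ["area"],
            },
        },
    }],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What do the header buttons do?"
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll until the model requests the function, then hand back the verbatim text.
while run.status not in ("completed", "failed", "cancelled", "expired"):
    if run.status == "requires_action":
        outputs = [
            {
                "tool_call_id": call.id,
                "output": UI_HELP[json.loads(call.function.arguments)["area"]],
            }
            for call in run.required_action.submit_tool_outputs.tool_calls
        ]
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    else:
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
```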

If you are not using Actions, I can see why this can feel unnecessary, but it seems like you have exhausted your other options. From personal experience I know that function-calling in this manner works great at reducing token size and delivering precise, verbatim information. My API does not “misfire”.

Yes, and displaying verbatim content is used everywhere, such as the current forum on which you’re reading this message.

The problem isn’t APIs themselves, but OpenAI’s crippling upstream system prompt.

If you run GPT-4 without any such upstream prompts, it can output verbatim content, it can scan your uploaded knowledge, etc.

Therefore, yes it is indeed digging around. Because you are quite literally digging around the system prompt.

You are using APIs to circumvent upstream instructions.

Don’t you understand how nonsensical it is for your custom GPT to not show your content?

That’s what I’m addressing here. And that’s what needs to be fixed in some capacity in order to allow GPT builders to create fully functioning end products without having to use API workarounds for literally the most basic of basic prompts.

1 Like

It’s been widely noted lately that OpenAI is trying to “break” the ability to produce verbatim responses as a result of the recent NYT lawsuit. So yeah, it sucks. It’s all pretty ridiculous. They’re now claiming that NYT “hacked” them.

This is really a stretch of the metaphor.

I get it, and I agree. It’s all stupid. This lawsuit is screwing people over.

All I’m trying to say to you is if:

  1. You cannot place this information in your instructions
  2. Your knowledge file is not being returned verbatim

You can try function-calling/Actions. I can personally vouch that it still returns content verbatim in Assistants, and I have seen other people get their content returned verbatim using Actions.

That’s it. Again, I get that you’re frustrated by something that seems straightforward and simple, but that’s just how it is, and how it’s going to be. Roll with the punches and try it out. Actions is a very powerful tool, and I’m sure there are more purposes you can find for it to augment your GPT.

1 Like

Yeah, I saw that OpenAI are claiming NYT “hacked” ChatGPT to produce these responses.

And I agree with OpenAI, if you dig into it, NYT definitely did some shady things to get these outputs.

I believe the solution should include increasing the instructions’ character cap.

And I’m certain that will happen with the next iteration (GPT-4.5, 5, etc.).

But the solution should also include loosening the system prompt so that your uploaded knowledge can be generated verbatim.

This isn’t just about GPTs: the system prompt restriction is completely breaking the use of the assistant sandbox for data/text manipulation, which was one of my favorite uses of it. It’s annoying to bring back the tedium of copying scripts to and from a local environment.

1 Like

Yup, I kind of get the argument for scanning and retrieving content from websites.

But it’s still somewhat nonsensical.

If you don’t want your website to be scanned by OpenAI’s scrapers, simply modify your robots.txt. That’s it.
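For instance, OpenAI documents its crawler’s user agent as GPTBot, so two lines are enough to opt a whole site out:

```
# robots.txt - block OpenAI's documented web crawler site-wide
User-agent: GPTBot
Disallow: /
```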

We don’t need a system prompt to do the job of site owners.

As for your uploaded knowledge, that should be scannable and retrievable in whatever form you’d like.

My belief since the beginning has been that NYT added custom instructions telling it specifically to act the way it does, so they can then use screenshots of their chat history showing that they “didn’t prompt it” in their court case.

1 Like

Well, hopefully that happens soon so you can make some progress.

The issue isn’t that it’s reading websites. The issue is that it has been trained on copyrighted material and sometimes can output it verbatim. This is the NYT’s argument, and this is why OpenAI is doing everything they can to prevent verbatim responses.

100% agree. I think it’s a bit of a stretch to say “hacked”. I do recall, a couple(?) of months ago, that verbatim training data could be extracted by sending a bunch of nonsense to the model.

Not necessarily shady. Another company recently did a plagiarism test using GPT-3.5 and found similar results. They don’t show any of their work though… so :person_shrugging:

1 Like