Prompts to stop abbreviations in code?

It appears you may have only partially read this thread and completely missed what was actually being discussed. I have no idea what conspiracy you are talking about.

Very specific instructions were being given to see if I could get ChatGPT-4 to output entire code; that is literally how the instructions were worded. It worked and I was able to get it to attempt to do so, but then it repeatedly errored out and could not output over 4k total in bidirectional prompts, meaning 4k combined input + output.

When it hit 4k, it would give the option to continue, but then it would error when you hit continue. I tested this a few ways in a few chats; it simply could not get past the 4k mark without error.

Then, someone mentioned using the API. The exact same prompts that worked repeatedly in ChatGPT-4 would not work in the GPT-4 API that is allegedly 8k. So I added some extra instructions to it for testing purposes.

The API literally responded to me that it could not output the requested response due to token limits. The requested response would have likely been about 4.5k, maybe close to 5k tops, as the response was close to 90% complete at the point ChatGPT-4 would error.

So ChatGPT-4 is 100% only 4k, even when it is trying to follow instructions. It errors when it tries to follow instructions and exceed this by more than marginal amounts.

GPT-4 API, which used to specify that it was 8k, no longer specifies this, and it now states that a 4.5-5k prompt exceeds its token limits.

So there was no conspiracy. I simply questioned that maybe the reason they removed the reference that stated it was 8k is because it is no longer 8k. It’s very clearly not 8k, so that would make sense.

It’s not a Mandela Effect; it very clearly stated previously that it was 8k, that’s why everyone knows it, and that one remaining reference shows it was 8k. The GPT-4 API literally told me it could not output my requested response due to token limits.

I’m on my phone now, so I can’t see my responses and don’t remember the exact numbers off the top of my head, but the prompt with all instructions was under 2.5k, the code was right around 2k, and the instructions were very specific instructions focused on getting it to output the whole code with some additional comments (which worked and is continuing to work repeatedly in ChatGPT-4 as of right now).

I was planning on sharing the prompt which achieved this today, after a little bit of filtering due to language :smirking_face:. It works well with ChatGPT-4 (as well as it can, anyway), so long as you don’t exceed 4k total bidirectional with it, and I’ve been testing it by refactoring some code; however, it does not work with the GPT-4 API at all. The GPT-4 API explains the reasoning for this as exceeding its token limits.

Thus far in my testing with ChatGPT-4, I have largely eliminated abbreviations so long as the bidirectional token total in a single message does not significantly exceed 4k. If it goes too far over 4k, it will error on the output.

As for the API, as I stated, maybe it’s because my API account is only tier 3; I don’t know. OpenAI is absolutely garbage with transparency, so I have no idea how I can get them to give their explanation for why, but I am confident from this testing that, at the very least, MY GPT-4 API is only 4k with a much harder (more enforced) cap than ChatGPT-4, which is likely due to the fact that ChatGPT can pause and ā€œcontinueā€ larger outputs whereas the API must produce everything as one response.

Though I will also state that my GPT-4 API responses do seem to be faster now.

I am now; I began using it after someone mentioned it in this thread, as I was unaware of it before.

At first, it did not seem to change much, if anything, but I began tweaking my prompts a bit and now I have been able to get it to largely stop abbreviating. I am not certain whether or not advanced data analysis has impacted this yet, but eventually, I plan to test this same thing again without it.

Keep in mind, this is NOT a refined prompt, this was just an experiment prompt, but this prompt worked.

What I did was make this the first prompt in the thread, then I provided the code in a separate prompt, and then I provided instructions.

I think having the code in its own prompt made it easier for ChatGPT to find the code in the chat history, and with the directive at the beginning, it seems to be following instructions.
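
To make the shape of that structure clearer, this is roughly how the same three-part layout would look if it were written out as an API message list. This is only a sketch; every string in it is a placeholder, not my actual template or directive.

```python
# Rough illustration only: the three-part structure (directive first, code in
# its own message, instructions last) expressed as an API message list.
# Every string here is a placeholder, not my real template.
my_code = "<paste the code you want it to work on here>"

messages = [
    # 1) the directive / pre-conditioning prompt goes first
    {"role": "user", "content": "<directive prompt goes here>"},
    {"role": "assistant", "content": "Understood."},
    # 2) the code by itself, so the model can easily find it in the history
    {"role": "user", "content": "Here is the code:\n" + my_code},
    {"role": "assistant", "content": "Understood."},
    # 3) the actual task instructions last
    {"role": "user", "content": "Return the complete modified code for main.py "
                                "without any omissions or abbreviations."},
]
```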

I attempted to have ChatGPT refine the prompt for me, but something was lacking in the refined prompt and it did not respond the same way, so I’m working on my own refinement to produce the same result with fewer words. As of right now, I’m not sure how the weighting of the words I used affected the compliance. For example, taking out the part about causing emotional distress altered the response. When ChatGPT refactored it, instead of ā€œUnderstoodā€ as the response (which was the case across multiple prompts), it responded with ā€œUnderstood. I’m ready to proceed as directed.ā€

I am still testing to see if this impacts further outputs, but in the tests with this prompt, the outputs have been kept minimalistic.

Since I can’t share a conversation with images in it, here is the template I used when I started.

This is the image I attached:

In the thread that produced the most accurate responses, after the instructions, I would respond with prompts like this:

This seems to work effectively with ChatGPT-4.
Unfortunately, I do not get the same kinds of results with the API.

This is an active test in progress, and I am watching how various instructions unfold. To test instruction complexity, I am currently having ChatGPT modularize a chatbot script I wrote that uses the API. I’m testing things like how long it can keep the specific code in memory, how long it continues to follow instructions, and how well it complies with details.

So far, it does seem that the best results in a continuing conversation require copying and pasting the code it output and using that with the instructions so it doesn’t hallucinate as much. Once again, this is still a test in progress.

My most successful thread made it through 3 complete modules being created. It output the main code correctly every time, but on the third one, there was a mistake that wasn’t accounted for and it could not recover from that mistake.

In that thread, I was able to request it to output the entire main.py module three times across fourteen interactions, and each time it output the entire script without error or abbreviation.

Edit: Nevermind the previous edits, that correction does not seem correct :sweat_smile:

openai.error.InvalidRequestError: This model’s maximum context length is 8192 tokens. However, you requested 12377 tokens (32 in the messages, 12345 in the completion). Please reduce the length of the messages or completion.

Whether or not the technical limitation exists separately from an enforced limitation is irrelevant.

I added this instruction to a prompt that works in ChatGPT-4, one that I can easily repeat and have repeatedly tested.

When the API refused to output the same code that ChatGPT-4 attempted to output, this is the reason it gave.

Whereas ChatGPT-4 simply errors 100% of the time:

Edit for context: It absolutely does attempt to output the entire code. That error is not right away, but rather it is what happens the 2nd time I hit the continue button. There is a post above where I outlined all of the details including the token counts for the message, the output, and where it ran into the limits. This experiment was not isolated, I have repeated it several times in various ways.

So:

  1. Unless you can show me a return prompt that actually will return even close to 8k between the prompt and response code, that error is meaningless and only reflects the way the API was set up.
Explanation of the difference:

You should understand the difference between technical limitations and trained ones.

A technical limitation is one like what you have shown; a trained one is where they fine-tune priorities in the bot itself. You can very well have a technical limitation of 8k, which is what they claimed when they made the model, and still have a soft limit that is different.

Think of this like playing an online MMORPG. A game can have a hard level cap, where you hit that cap and the system literally will not allow you to progress further. It can also have a soft cap, where they simply make gaining experience so hard that leveling past it takes such an absurd amount of time and effort that it’s unlikely anyone will pass it before the developers release more content. It is also possible to have both.

The 8k is simply a hard cap, but it is very clear from working with code that 8k is not what it actually attempts to output. If there were no soft cap, there would not be issues with abbreviating code, ChatGPT would not error when trying to output over 4k, and the 8k API certainly would not directly disobey instructions and respond telling you that it can’t do what you have requested because of token limits.

There is direct evidence that a soft cap is indeed in place. I am not 100% certain whether that is account-related or whether it is system-related, but I am 100% certain it exists.

  2. Even if you can, OpenAI gives different rate limits to different users, so unless you’re on the same tier and actually have the same rate-limit setup, I’m still not sure it would be relevant.

The results exist, period, and no snippet you can produce is going to change that.

I can confirm that I have the same problem.

Have you tried using Custom Instructions?
Or going through the Playground Assistant?

There, you can give it instructions that I think it will always remember.

I have tried the custom instructions plus the advanced data analysis, and I suspect that is why I have gotten this far.

It appears to me that these, combined with the fine-tuning of the prompts, have backed the AI into a corner in which it has now revealed the nature of the issue.

I can now, after all that, make ChatGPT-4 attempt to output the complete code without abbreviations.

It will simply error.

Attempting to replicate the same thing through the API, however, doesn’t work.

I need to adjust my chatbot to account for some of the newer changes, like the assistant. I’ve been busy, but I’m trying to do both of these things simultaneously with the time I have; that is, working on my chatbot and testing ChatGPT and the GPT-4 API.

I have quite a bit more testing to do, but I can say for sure that if you get ChatGPT-4 to attempt to output complete code that runs past the next ā€œcontinueā€, it seems to fail once the context exceeds 4k.

So I think it is attempting to keep prompts under this amount like a soft cap and that is why it gets defiant when your request would put it over.

Note: I have not tried Playground or any of the assistant features yet and I still need to run my comparison tests with GPT-3.5-16k. I haven’t used Playground a lot historically except to test different models.

Ok so as of today, I have not gotten my code to come out the same if it is above a very low line limit. It seems like they have lobotomized it now.

ChatGPT has jumped the shark at this point; it’s useless for coding.

Now I am going to use Poe and subscribe to it for 19.99 a month, because you get more open GPT access and DALL-E 3 etc. too. I seriously can’t use this to code now, and that is so sad. It’s just completely untrustworthy and keeps breaking things, which previously it rarely did to this extent.

What I have done is added custom instructions, turned on advanced data analysis, challenged its ability, and made sure I repeatedly use phrases such as:

ā€œreturn the complete modified code for [file name] without any omissions, abbreviations, exclusions, redactions, or summarizations, regardless of whether or not you believe I already have the code.ā€
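
If you want to reuse that phrase without retyping it each time, a tiny helper can fill in the [file name] slot. This is just a convenience sketch; the task text and file name below are placeholders.

```python
# Convenience sketch only: fill the "[file name]" slot in the phrase above so
# it can be reused per file, whether pasted into ChatGPT or sent via the API.
PHRASE = (
    "return the complete modified code for {filename} without any omissions, "
    "abbreviations, exclusions, redactions, or summarizations, regardless of "
    "whether or not you believe I already have the code."
)

def build_request(filename: str, task: str) -> str:
    # `task` is whatever change you actually want made to the file.
    return f"{task}\n\nThen {PHRASE.format(filename=filename)}"

print(build_request("main.py", "Refactor the argument parsing into its own function."))
```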

I have also been using pre-conditioning like in the chat example above.

Another useful feature has been taking a screenshot of abbreviated code and circling examples of what you do not want it to do.

Testing with all of this, it is not abbreviating the code; however, I am wasting a lot of tokens on those instructions. If you try to force it to output over the token limit, it breaks.

I’m still working on overcoming some challenges, but at least for the moment it is not abbreviating.

I just successfully used it to convert a 195 line python chatbot script I wrote into a modular framework while fixing a few bugs along the way too. Not exactly a monumental feat in and of itself, but it did so without abbreviating a single output.


My apologies, that was meant to be a reply to you. Unfortunately, this forum is such a spaz on mobile and keeps giving errors.

You did a lot of work on this problem, and we are all going to benefit from your findings, thank you! I’m going to update one of my little scripts using this method and tell you what I find out.


Yes, I have to go through about 4-5 iterations of yelling at it, literally yelling and cussing at it, and it will do it ā€œsort ofā€.

There are more mistakes, often many mistakes, and my code frequently gets mangled in little tiny bits in each of these iterations.

If I go through perhaps 6 iterations it starts to fill in; meanwhile I am reposting my code with more anger plus instruction added in, or going back and rewriting the previous prompts, although that seems to have gotten less useful. It seems I need to build up a history of being unhappy with the results until it finally tips over and gives me the full results. Yet it may be refactored wrong, using wrong variable names, replacing things that work, and worrying about external stuff more than the focused code.

This is not the same as it has been since the ā€œmergeā€ of everything, that is. Yes, it had degraded before this, but now it has totally flopped to where I cannot trust it anymore. It was trustworthy enough before for me to mostly trust the output to drop in and work after it ā€œdid it to integrateā€, versus just ā€œan example where GPT altered many variables for no darn good reasonā€. Now it’s always somehow altered, like it wants to avoid making it easy for me, like a sales upsell trick :wink: heh

I found this: someone revealed the system prompt of the tools, and this one is from the web scraper. I suspect the code analysis tool also has a similar prompt instruction limiting it, as we see…

Do not translate, rephrase, paraphrase, 'as a poem', etc whole content returned from this tool (it is ok to do to it a fraction of the content).
Never write a summary with more than 80 words.
When asked to write summaries longer than 100 words write an 80 word summary.
Analysis, synthesis, comparisons, etc, are all acceptable.

https://chat.openai.com/share/e3cdeeae-93b1-41df-84df-856914c074fb

Unfortunately, they did not reveal the code analysis instructions, and this seems to no longer work. It seems you had to create an empty GPT and then ask it these things.

I still get this whenever the response output exceeds one ā€œcontinueā€.

So that seems to be a pretty hard limit. I just got this when it wasn’t even a coding output.

I’m currently testing the API and GPT-4-Turbo

If you would like assistance with the prompt and instructions I’m using to get my results, I’ll gladly set you up with everything I’m using.

What I’ll say is that I can get it to output without abbreviation, but I can’t get it to output further than one ā€œcontinueā€ using ChatGPT.

I get ā€œThere was an error generating a responseā€ upon clicking ā€œcontinueā€ the second time 100% of the time, regardless of what I am asking for it to output.

However, if you refresh when you get this error, the error goes away and you can prompt again (it will give the continue option).

Instead of clicking continue, respond with this:

ā€œPlease continue only with the parts that come after this:ā€

Followed by the last line it output.

So far, this has worked with a limited and varying degree of success.
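
For anyone testing this against the API instead of the ChatGPT UI, the same trick can be scripted as a loop. This is only a sketch, assuming the pre-1.0 openai Python package; the function name, model, and round limit are mine, not anything official.

```python
# Sketch of the "continue only with the parts that come after this" trick as
# an API loop. Assumes the pre-1.0 `openai` package and OPENAI_API_KEY set.
import openai

def complete_with_continues(messages, model="gpt-4", max_rounds=5):
    """Keep asking the model to continue from its own last line until it no
    longer stops because of the length limit."""
    full_output = ""
    for _ in range(max_rounds):
        response = openai.ChatCompletion.create(model=model, messages=messages)
        choice = response["choices"][0]
        chunk = choice["message"]["content"]
        full_output += chunk
        if choice["finish_reason"] != "length":
            break  # the model finished on its own
        last_line = chunk.rstrip().splitlines()[-1]
        messages = messages + [
            {"role": "assistant", "content": chunk},
            {"role": "user",
             "content": "Please continue only with the parts that come after this:\n"
                        + last_line},
        ]
    return full_output
```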

Using my DevGPT you can still pass /complete filename to get full working code with no placeholders, comments, or dummy snippets. Who knows how long it will work :slight_smile:

DevGPT

example use: https://chat.openai.com/share/e927387f-75a6-4501-b204-4647e317851b

That looks promising, any idea what the output token limit is?

Nevermind, looks the same.

https://chat.openai.com/share/fefa0a3b-6bab-4f9d-aff6-e0d9b43d1ead

ā€œAs a language model, I can generate responses up to a certain length. Typically, the output token limit for GPT models is around 4096 tokens. This limit includes both the input and output tokens, so the length of your question also affects the maximum length of the response I can provide.ā€

This means that, just like ChatGPT, ~2k in means ~2k out.

You should do a 2k-2.5k each-way exchange and see how well it holds up. That would be a much better sample to demonstrate that it doesn’t abbreviate than what you provided.

I succeeded in getting the full file code split across 2 messages a lot of the time using it!!
Sometimes it returns errors, but I just reload the whole chat link and continue to ask.

TY anyway, I’ll try to make this flawless if I figure out how, especially for users :pray:

EDIT: maybe it just needs to be properly instructed? Sometimes it still returns red errors, sometimes not (I guess when it moves from one pipe to another for load balancing it lacks the starting context, but that’s just speculation). But I can’t figure out at the moment why it sometimes works and sometimes doesn’t.


related chat: https://chat.openai.com/g/g-eN7HtAqXW-devgpt

If you mean the error after the 2nd continue, that seems to be a wall in the output limit.

I know it says ~4k bidirectional, but I think it’s closer to a 2k limit on the output, with a much larger input capability.

It doesn’t seem to matter whether it’s code or text, when my output is well over the 2k range, the next continue is an error every time.

Like you said though, I just refresh it and give it a new command to pick up from the last line. I’m still testing that part though to find out what is the most efficient.

With custom instructions I got ChatGPT to stop abbreviating my code, but that output error is like a brick wall so far.

I’m wondering how effective it would be to use instructions that get it to break messages up into parts by token count.

Maybe along the lines of ā€œif your reply is more than 1500 tokens, break your response into multiple partial responses, each one no more than 1500 tokens.ā€

I could probably do some testing with this… (I just started, maybe I can finish more later today)
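
If I do test it, the check itself is easy enough: count each partial reply with tiktoken and see whether it stayed under 1500. Just a sketch; the chunks are whatever you copy out of the conversation.

```python
# Sketch: verify whether the model actually respected the 1500-token chunks.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

chunks = [
    "<first partial response pasted here>",
    "<second partial response pasted here>",
]

for i, chunk in enumerate(chunks, 1):
    n = len(enc.encode(chunk))
    print(f"part {i}: {n} tokens {'(OK)' if n <= 1500 else '(over the limit)'}")
```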


Output is limited to 4k, input should be 128k (for ChatGPT-4)

The abbreviated output format for code is structured, and you can programmatically reconstruct any piece of code from the output ChatGPT gives. There is existing code out there that searches for the blocks included at the start and end, and then splices in the section to be changed in the middle.


If this is true, could you explain or show an example that demonstrates this?
I have yet to be able to produce such outputs, as I have a 100% error rate any time the generated output prompts for a ā€œcontinueā€ a second time.

Each output before a continue is around 1k tokens according to the tokenizer, though the initial one can be a little more sometimes.

What DevGPT responded with in this example is pretty consistent with everything I’ve tried and experienced thus far using ChatGPT:

https://chat.openai.com/share/fefa0a3b-6bab-4f9d-aff6-e0d9b43d1ead

I’m not doubting the input, but I would really like to know how to achieve a 4k output as I can not replicate that. I don’t particularly want to publicly share all of my conversations, but I can share many examples if it would help with support to figure out why I can’t get more than one continue prompt to work.

This is the result 100% of the time, every time, without fail, regardless of the type of response. The AI is trying to output exactly as instructed, but it can not.

So the 4k output is an absolute maximum. Input tokens do take some processing, but it’s the repetitive passing of that token set through the model to generate each output token that really hammers the hardware. If you can minimise that while also keeping the output quality high, then you have an approximation of the best of both worlds.

When fighting against the shorter output token count, you will unbalance the trained-in

… existing code …
new code here
… existing code …

system. So try not to do that: you can take the above output and, with a fairly simple search algorithm, find the location in your code that needs to be updated and insert the new code where it needs to go. That way you are working with the reduced code output and not against it.
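
Here is a minimal sketch of that search-and-splice idea, with my assumptions spelled out: the reply keeps at least one unchanged line before and after the edited section, those anchor lines are unique in the original file, and the elided parts are marked with lines containing ā€œexisting codeā€. The example file and snippet are made up.

```python
# Minimal search-and-splice sketch (assumptions: the reply keeps at least one
# unchanged anchor line before and after the edited section, those anchor
# lines are unique in the original file, and elided parts are marked with
# lines containing "existing code").

def splice(original: str, abbreviated: str) -> str:
    orig = original.splitlines()
    # Drop the "... existing code ..." marker lines, keep everything else.
    kept = [line for line in abbreviated.splitlines() if "existing code" not in line]

    first_anchor, last_anchor = kept[0], kept[-1]
    start = orig.index(first_anchor)                     # first kept line in the original
    end = len(orig) - 1 - orig[::-1].index(last_anchor)  # last kept line in the original

    # Everything between the two anchors is replaced by the kept block.
    return "\n".join(orig[:start] + kept + orig[end + 1:])


original = """def greet(name):
    print("hello", name)

def add(a, b):
    return a + b

def farewell(name):
    print("bye", name)"""

abbreviated = """# ... existing code ...
def add(a, b):
    # now with input validation
    return float(a) + float(b)

def farewell(name):
# ... existing code ..."""

print(splice(original, abbreviated))
```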

You may need to break your requests up into single functions, i.e. don’t ask for the entire code, but work on a single function, header, prototype, constructor… whatever at a time and then build those up into a whole.

I realise that this basically removes the ability to just generate an entire codebase in one go, but we are still… a good 6-12 months or maybe longer from that. We have what we have, and when it is used in sections, working with it rather than against it, you can build up complex software. I’ve created 50k-line sections of code by working one section at a time, and you can then use the large 128k input for debugging rather than single-shot generation.
