How to generate automatically applicable file diffs with ChatGPT?

Has anyone managed to get ChatGPT to output suggested changes to a file in a form that can be applied to the file automatically?

Background: I experimented with this for a script that takes a file and a prompt describing what to change or extend in that file, and that uses the ChatGPT completion API. That’s something you can nicely use as an external tool in IntelliJ or for trying out automated development. But the only reliable way I have come up with so far is to make ChatGPT output the whole modified file again in a codeblock, which can be extracted and written back to the file (pending manual inspection, of course), using the following prompt fragment:

Please give a high level description in flowing text what changes to the first given file (main file) $mainfile are needed to fulfill your task and why they are necessary - e.g. what classes, identifiers, functions you need to add or modify, but absolutely no blocks.
Then output exactly one codeblock with the new contents of the whole file $mainfile surrounded by triple backticks, changed as described.
Important: your response must only include exactly one codeblock, not more, not less, except if you feel there is an error in your task and it cannot be fulfilled!
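The extraction step on the script side could look roughly like the following. This is a minimal sketch, not the actual script: the function name `extract_single_codeblock` is made up for illustration, and it assumes the file content itself contains no line that starts with a triple backtick.

```python
import re

FENCE = "`" * 3  # a triple backtick, built this way so the example stays readable

# An opening fence (with optional language tag), the body, and a closing fence on its own line.
BLOCK_RE = re.compile(
    r"^" + FENCE + r"[^\n]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_single_codeblock(response: str) -> str:
    """Return the body of the only fenced block; raise if there is not exactly one."""
    blocks = BLOCK_RE.findall(response)
    if len(blocks) != 1:
        raise ValueError(f"expected exactly one codeblock, found {len(blocks)}")
    return blocks[0]
```

Raising on zero or several codeblocks matches the last line of the prompt: a response without a codeblock is treated as "the task could not be fulfilled" and the file is left alone.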

That does work, but that is obviously a waste of tokens that limits the size of the file and also sometimes encourages changes that aren’t really needed.

If you use the chat interface to ask for the necessary changes to a program, ChatGPT outputs the changes in a way that can easily be applied manually and is much briefer (“… add the function XYZ …”), but that would be hard to process automatically. I tried to have it output the changes in patch file format or as unified diffs, but couldn’t get even ChatGPT-4 to produce something usable. Having it output line numbers to change didn’t work reliably either.

Do you have any ideas? Specifically, I’m wondering what kind of format I could get ChatGPT to output so that only the differences are given and a script / program could change the file accordingly, while it’s reasonably sure that the result is what ChatGPT actually meant. (Something like ‘Delete lines 35-39 and insert this instead…’ might work, but would produce unnoticed broken output if the line numbers are off, which they usually are. That’s why I first tried the unified diff format that has some context lines.) If possible, I’d like to work with the 3.5-turbo model.

Thanks a lot for any ideas!

Hans-Peter

It might be helpful to take a look at editing

As far as I know, capable models with a high degree of comprehension can usually handle editing. When I want to evaluate a model with a single prompt, I try an editing task rather than asking a question.

Hi @kevin6 !

Thank you for your reply! Unfortunately, it seems the editing interface is only supported by older models like code-davinci-edit-001 and text-davinci-edit-001, which are marked as deprecated, and not by something more capable like gpt-3.5-turbo. I tried it a little in the playground, and it did some quite strange things to the code for tasks that gpt-3.5-turbo was easily able to do. So I still hope to find something doable with its chat interface.

Best regards,
Hans-Peter

I’ve attempted a few ways to make ChatGPT generate diffs that can be directly applied via the patch shell command.

Theoretically, the diff format would be ideal:

--- /path/to/original	timestamp
+++ /path/to/new	timestamp
@@ -1,3 +1,9 @@
+This is an important
+notice! It should
+therefore be located at
+the beginning of this
+document!
+
 This part of the
 document has stayed the
 same from version to
@@ -8,13 +14,8 @@
 compress the size of the
 changes.

-This paragraph contains
-text that is outdated.
-It will be deleted in the
-near future.
-
 It is important to spell
-check this dokument. On
+check this document. On
 the other hand, a
 misspelled word isn't
 the end of the world.
@@ -22,3 +23,7 @@
 this paragraph needs to
 be changed. Things can
 be added after it.
+
+This paragraph contains
+important new additions
+to this document.

The main issue is that GPT models, because they operate on tokens, cannot accurately determine positions in the document and therefore cannot write a valid diff. I have tried adding line numbers to the source code that I send in the prompt; that improves things (though still not enough), but then the model sometimes decides to write code with line numbers in it.
We would need a widely known diff format that doesn’t rely on line and column numbers (which doesn’t exist, to my knowledge).
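To illustrate what a position-free edit could look like, here is a small sketch of a search/replace applier. It is an ad hoc, hypothetical format rather than an established one: the model would have to emit the exact old text and the new text, and the helper `apply_search_replace` (a made-up name) refuses to do anything unless the old text matches exactly once.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Replace `search` with `replace`, but only if `search` occurs exactly once."""
    if not search:
        raise ValueError("search text must not be empty")
    count = source.count(search)
    if count != 1:
        # Zero matches: the model misquoted the file. Several: the edit is ambiguous.
        raise ValueError(f"search text must match exactly once, matched {count} times")
    return source.replace(search, replace, 1)


original = "def greet():\n    print('helo')\n\ndef bye():\n    print('bye')\n"
patched = apply_search_replace(
    original,
    "    print('helo')\n",
    "    print('hello')\n",
)
```

The point of the exactly-once check is that a misquoted or ambiguous edit fails loudly instead of silently producing a broken file, which is the failure mode of line-number-based edits.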

For code, the only fully working method I could devise so far was to steer it to write Python code that completely overwrites the files with the code changes. I have an example where this is done on TypeScript files. Because I cannot include links in posts (why not?!), you have to look for the “jupyter-notebook-chatcompletion” repo on GitHub and look for the file test/notebooks/more-accurate-token-estimates.ipynb, where I used GPT-4 to implement the changes I wanted.

As you can see in that Jupyter Notebook, Example 1 and Example 2 are referred to in Example 3 to ensure it writes Python code that will apply the desired file changes. It works even better when you do an actual few-shot prompt with 3-4 examples (in which case you don’t instruct anything), but that eats up so many tokens that it doesn’t leave much space for actual code. So, unfortunately, I have to rely on an instruction like “[…] and apply the changes by overwriting the files like you did in Example 1 and 2”.

Note that, at least in the case of structured files like JSON, YAML and XML, it is smart enough to read the document into its respective object model, apply only the changes and then overwrite the file with the result.
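For JSON, the kind of code this amounts to is roughly the following. It is only a sketch of the load, modify, write-back pattern, not code taken from the notebook; the helper name `set_json_value`, the file name and the keys are invented for the example.

```python
import json
import tempfile
from pathlib import Path


def set_json_value(path: Path, keys: list[str], value) -> None:
    """Load a JSON file, set one (possibly nested) key, and write the file back."""
    data = json.loads(path.read_text(encoding="utf-8"))
    node = data
    for key in keys[:-1]:
        # Walk down to the parent object, creating intermediate objects if needed.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


# Usage on a throwaway file so the example does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "package.json"
    cfg.write_text('{"name": "demo", "scripts": {"test": "jest"}}', encoding="utf-8")
    set_json_value(cfg, ["scripts", "build"], "tsc")
    result = json.loads(cfg.read_text(encoding="utf-8"))
```

Only the delta has to be generated by the model, but the file is re-serialized, so the original whitespace and key formatting are not preserved.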

There’s a compromise that might work, but I haven’t had the time to try it yet. Just as ChatGPT understands that it can deserialize a JSON file, manipulate the resulting object and then serialize that object again, one could write a library that deserializes code files into something that can be manipulated and serialized again. So, for example, if ChatGPT wanted to overwrite a specific function, it could do something like:

// change the code within the function
code["function sayHello(message : string)"] = "print('lol')"

So basically, you need a document object model for code. The problem here again is that nothing like this was established before 2021, so one would need to provide extensive examples in the prompt, which defeats the purpose of reducing the number of tokens produced by only emitting the delta (in comparison to completely overwriting the files).
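As a rough illustration of the idea, restricted to Python source (where the standard-library `ast` module happens to report line positions), a "replace this function by name" operation could be sketched as below. The helper `replace_function` is hypothetical, it only handles top-level functions, and it is not a general object model for code.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Swap the top-level function `name` for `new_def`, leaving other text untouched."""
    tree = ast.parse(source)
    matches = [
        node for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
    ]
    if len(matches) != 1:
        raise KeyError(f"expected one top-level function {name!r}, found {len(matches)}")
    node = matches[0]
    lines = source.splitlines(keepends=True)
    # lineno is 1-based; end_lineno is inclusive (available since Python 3.8).
    start = node.lineno - 1
    if node.decorator_list:
        # Decorators sit above the def line and belong to the definition.
        start = min(d.lineno for d in node.decorator_list) - 1
    end = node.end_lineno
    if not new_def.endswith("\n"):
        new_def += "\n"
    patched = "".join(lines[:start]) + new_def + "".join(lines[end:])
    ast.parse(patched)  # fail loudly if the result no longer parses
    return patched


source = "def say_hello(message):\n    print(message)\n\ndef other():\n    return 1\n"
patched = replace_function(
    source, "say_hello", "def say_hello(message):\n    print('lol')\n"
)
```

The model addresses the function by name instead of by line number, and the script derives the positions from the parser, so the surrounding code and comments are carried over unchanged.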