GPT-4 produces bad/invalid patch file for `git apply` command

Hi there,

Background
I have a tool that applies changes to text files using the `git apply` command. The tool takes the following inputs:

  • path to the file
  • patch to be applied on the file

The content of the file to be modified is provided to the model earlier in the conversation, along with a prompt describing how to change it.
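
Roughly, the tool is a thin wrapper around `git apply`, something like this (a simplified sketch; the function name and the `--include` guard are illustrative, not my exact code):

```python
import subprocess
import tempfile

def patch_file(path: str, patch: str) -> str:
    """Write the model-generated patch to a temp file and apply it with git."""
    if not patch.endswith("\n"):
        patch += "\n"  # git apply tends to reject patches without a trailing newline
    with tempfile.NamedTemporaryFile("w", suffix=".patch", delete=False) as f:
        f.write(patch)
        patch_path = f.name
    # --include restricts the patch to the file the tool was called with
    result = subprocess.run(
        ["git", "apply", f"--include={path}", patch_path],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr
```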

Bug
Although the model (gpt-4-0125-preview, gpt-4-1106-preview) produces the patch file and calls the tool on the proper file, most of the time the patch cannot be applied. Checking the patch manually, I can see that it is logically mostly okay, but it contains various syntax and content errors. The most frequent problems in the patches are:

  • missing header elements - I could mostly eliminate these with few-shot prompting
  • bad line numbers in the hunk headers - I could handle these with switches (e.g.: git apply --recount; see the sketch after this list)
  • missing lines in the preceding context of the changes within a hunk
  • preceding context lines that are not copied exactly from the original file
  • additional empty lines inserted in the preceding context of the changes
  • trailing context that is missing entirely
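
One way to absorb several of these on the tool side, before re-prompting the model, is to retry with increasingly lenient `git apply` switches. A minimal sketch (the escalation order is just one possible choice, not necessarily what you want in production):

```python
import subprocess

# Each retry relaxes strictness a bit more:
#   --recount  recomputes the line numbers in the hunk headers
#   -C<n>      requires only <n> lines of surrounding context to match
ATTEMPTS = [
    ["git", "apply", "--recount"],
    ["git", "apply", "--recount", "-C1"],
    ["git", "apply", "--recount", "-C0"],
]

def apply_leniently(patch_path: str) -> subprocess.CompletedProcess:
    result = None
    for cmd in ATTEMPTS:
        result = subprocess.run(cmd + [patch_path], capture_output=True, text=True)
        if result.returncode == 0:
            break
    return result  # on failure, its stderr can be fed back to the model
```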

I’m not sure if I need to provide sample files or prompts, because this bug appears in almost all cases. I face the issue with many file formats: XML, CSV, Python source code, React source code, etc.


A bug or just the expected variety of outputs from a non-deterministic machine?

Out of interest, why are you using an LLM in this way? What problem are you trying to solve that cannot be done with normal deterministic code?

Is there a way you could reduce the risk of badly formed git commands by using the LLM just for parts of the problem and not the creation of the entire command? Or eliminate the LLM completely?

Hi merefield,

Valid point, but let me explain how the model is used in the workflow. I’m working on an agent that picks up Jira issues about various configuration changes of different things (software, infrastructure, etc.). The Jira issue contains natural language requests. There is a set of tools that helps the model get real-time information from git repos (search, read file, patch file, etc.). I want the model to set up a plan to reach the desired state described in the Jira issue, gather information from the environment (git), refine the plan according to that information, and execute it. The configuration changes should be written to the config files, committed, and pushed by the agent. At the end of the workflow a merge request is opened to pass the change to code review.

So it is the model that decides where and what to change. I also tried making the model rewrite the full content of the config file, but that didn’t work because the unchanged parts were replaced with placeholders like /* unchanged part of the file */.

In this context, can you suggest any other way to apply changes to a file?

Thanks,
Adam


Whilst function calling isn’t perfect (and may suffer from some variability too), you could try putting the git command within a local function that is called by the LLM with the relevant arguments.

This might cut down on the command being just downright wrong syntactically.
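
For example, something like this (a rough sketch; the function name and parameters are illustrative):

```python
# The model only fills in structured arguments; your code builds and runs
# the actual git command, so the command syntax itself can't be malformed.
tools = [{
    "type": "function",
    "function": {
        "name": "patch_file",
        "description": "Apply a unified diff to one file in the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative path of the file to change"},
                "patch": {"type": "string", "description": "Unified diff to apply to the file"},
            },
            "required": ["path", "patch"],
        },
    },
}]
```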

But it also won’t guarantee perfection (just might reduce errors and increase success rate).

Before going that far, have you tried this with lowered temperature?

I hope the reliability of the models improves for you.

The `git apply` runs inside the tool that the LLM calls and is executed on the fly. The stdout and stderr of `git apply` are fed back to the model directly.

To be as deterministic as possible, I initially set the temperature to 0. Once I managed to feed the error messages back to the model, I could see that it didn’t really refine the patch, so I increased the temperature to 0.1. So yes, the temperature is lowered.
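
The feedback step is roughly this (a simplified fragment; `messages`, `tools`, `call`, and `result` come from earlier in the loop, and the names are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# After executing the tool call, append git apply's output as the tool
# result so the model sees exactly why the patch failed and can retry.
messages.append({
    "role": "tool",
    "tool_call_id": call.id,                    # id of the tool call being answered
    "content": result.stdout + result.stderr,   # raw git apply output
})
response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=messages,
    tools=tools,
    temperature=0.1,
)
```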
