Merge Request Reviewer bot

While playing with the text-gpt-003 model, I figured I’d throw some code at it, and I noticed that I could ask it lots of questions about code and it is even better at understanding, changing and conversing about code than the codex models.

So I thought, what if it can lend its code wisdom to the QA of a merge/pull request, saving developers time when reviewing changes of their peers, or getting advice on their own submitted code etc.

Created a quick POC within a couple of hours (written in a single file that was 90% generated by chatGPT, btw), just as a little demo to my colleagues, nothing more, but I thought I’d share the use case:


  1. Whenever a new MR gets created in Gitlab, a webhook kicks off my bot.
  2. The bot fetches the code changes of the MR and extracts the git diff strings for all the changes.
  3. Prepared a pre-prompt paragraph that asks it to act as a senior developer who reviews the changes and answers 10 code review questions about those changes.
  4. Add the questions and the code diffs to the prompt and throw them at the GPT-3 model via OpenAI’s REST API, as-is (and I mean really as-is, without any parsing or additional explanation of what a git diff looks like! GPT fully understood the format out of the box)
  5. Also made part of the prompt some instructions about the formatting of the response, so it renders nicely in Gitlab.
  6. Post the response back to the MR in Gitlab as a comment.

I pass the questions in from an environment variable, tweaked it a bit for more useful and pleasant output, and it now gives me these types of answers:

I am also thinking of using the chat API now, to use Gitlab’s comment thread system as a chat interface, so that the developer can ask follow-up questions about the review and the code changes.

Lots of fun ideas in this space, and I’m sure there will be tools for this within months, if not weeks.


That’s very interesting @batjko did you open source it?

Nope, it’s just internal to our team.
Needs more work, as well, but it might become obsolete soon, anyway.
Github, for example, are bringing out their own PR assistant features that do some of the same things.

I also want to do something similar, but I have a question: when there are changes to multiple files in one MR (Merge Request), should all of them be included in one request, or should they be split into multiple requests based on the files?

You definitely hit the context limit very quickly. Turns out code uses up quite a lot of tokens.

Our bot works fine on small code bases, including multiple files (the git diff for an MR/PR gives you the diffs for all files) and GPT-4 is very good at understanding all changes no matter if it’s on multiple files.

However, you hit the token limit very quickly the larger your code changes are in total.
And here is where you will need to be a bit more clever, e.g. you could load each file’s change separately, but even then you run into issues if you have rather large changes per file (remember that code diffs show you both the removed nad the added rows as well).
If you keep your MRs small and atomic, you may be fine with that already.

On our side, we have decided that the bot needs to have awareness of the whole codebase and evaluate the diff against that context.
So we are planning to load the codebase into an in-memory vector store with Langchain, then split the diff into manageable chunks (e.g. function by function or other language-specific delimiters (Langchain supports this for a number of languages out of the box)), and then evaluate each change individually against the code base as a whole.

Finally, it will then keep a summary of each evaluated change and do another more holistic evaluation of the summaries of all the changes, before assembling the final review.

As a major long-term bonus, we were hoping to give the bot awareness of our code conventions (by loading a wiki page with all the rules into vector memory and including it in individual reviews).

This has not been prioritised yet, because it’s a fair amount of work and we don’t have time, but that would be the ultimum of automatic code reviews.


Thank you for answering my questions; I have benefited greatly. :partying_face: