Fine-tuning OpenAI to my codebase

I am relatively new to fine-tuning.

We are a software company with hundreds of files of PHP and JavaScript code for our software. Copilot is great, but I want to go a few steps further. Would it be possible to fine-tune an OpenAI model so that it understands our code better and can fix bugs and add features? Has anyone done this? What's involved?


When you find yourself in that rare air of asking a question on coding forums that seems either to have never been asked or to have no recommended answer, you know you are in a special probability space. I too would like to do exactly what you are describing, with the idea that the OpenAI interface I’m building (for me) will have a good handle on my current code base and where I’m looking to take it. It is a task that is perfectly suited for one or two example tutorials. Hello super-devs, please consider throwing us an example of fine-tuning Codex with a proprietary code base. Please, pretty-please, with a cherry on top!


I don’t think fine-tuning is the answer for this. I have worked on an AI that builds its own source code and expands its features. My take is that to accomplish something as complex as this, you would need the AI to:

  1. read through all the source code
  2. generate an understanding of all the classes, methods and the way they are connected and depend on each other
  3. have a general understanding of what the app is about and how it works

Then you could make requests for new features or changes, but the challenge is how to select and pass on the required information collected in the previous steps. It would have to be a chain of multiple prompts, allowing the AI to expand its research on certain parts (when needed), come to conclusions, and proceed with its main task.
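To make the "select and pass on the required information" step concrete, here is a minimal sketch in Python. It is not how any particular tool does it: the regexes are rough, hypothetical patterns for PHP/JavaScript declarations, the function names (build_index, select_context, build_prompt) are made up for illustration, and a real setup would more likely use a proper parser. It maps each symbol to the file that defines it, records which files call into which, and then picks only the files relevant to a request, within a size budget.

```python
import re
from collections import defaultdict

# Assumption: rough patterns for PHP/JavaScript; a real parser would be more reliable.
DECL_RE = re.compile(r"\b(?:function|class)\s+([A-Za-z_]\w*)")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def build_index(files):
    """files maps path -> source text. Returns (defined_in, depends_on)."""
    defined_in = {}
    for path, src in files.items():
        for name in DECL_RE.findall(src):
            defined_in[name] = path
    depends_on = defaultdict(set)
    for path, src in files.items():
        for name in CALL_RE.findall(src):
            target = defined_in.get(name)
            if target and target != path:
                depends_on[path].add(target)
    return defined_in, depends_on


def select_context(request, files, defined_in, depends_on, max_chars=4000):
    """Pick files defining symbols named in the request, plus their direct dependencies."""
    words = set(re.findall(r"[A-Za-z_]\w*", request))
    seeds = sorted({defined_in[w] for w in words if w in defined_in})
    ordered = list(seeds)
    for path in seeds:
        for dep in sorted(depends_on.get(path, ())):
            if dep not in ordered:
                ordered.append(dep)
    chosen, used = [], 0
    for path in ordered:
        size = len(files[path])
        if used + size > max_chars:
            continue  # skip files that would blow the budget
        chosen.append(path)
        used += size
    return chosen


def build_prompt(request, files, chosen):
    """Assemble the selected files and the request into one prompt string."""
    parts = [f"// File: {p}\n{files[p]}" for p in chosen]
    return "\n\n".join(parts) + f"\n\nTask: {request}"
```

The resulting prompt would be one link in the chain: if the model's answer mentions a symbol that is not in the context, a follow-up prompt could call select_context again with that symbol to widen the research.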


I code with Copilot as a VSC extension daily, and Copilot suggests auto-completions based on my private repo code. Not sure about PHP as I’m a Ruby person; I gave up PHP coding years ago.

However, if you want to check for overall syntax errors in code, it is easier (at this time) to copy-and-paste modules, methods and subroutines into ChatGPT, which is pretty good at finding syntax errors, etc. I have not yet tried this with OpenAI API, as I just copy-and-paste into ChatGPT (modules, methods, not entire large pages of code) and check for syntax errors that way.

I as well have considered this for a personal project of mine, as I assume there are some pieces of my code that could be readily and easily improved.

I have considered ways to do it, most of which currently involve ChatGPT, since it can remember what happens in a conversation without having to use as many tokens as TD-3 does.


Were you able to find a solution for this?
I’m trying to do the same…

Have you implemented this? Or is there an existing tool?

I agree; however, a decent amount of value is possible without such a complex knowledge graph. What you described is RAG on a code knowledge graph, and everyone working on RAG is realizing this is the way to get higher-quality output, for now.
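For the simpler end of that spectrum, here is a minimal sketch of the retrieval step of RAG over code chunks with no knowledge graph at all. It swaps in plain TF-IDF cosine similarity from the standard library where a real system would probably use embeddings, and the names (tokenize, rank_chunks) and the chunk labels are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text and identifiers (getUserName, get_user_name) into lowercase words."""
    return [p.lower() for p in re.findall(r"[A-Za-z][a-z]*|\d+", text)]


def rank_chunks(query, chunks, top_k=2):
    """chunks maps a label -> code text. Returns the labels most similar to the query."""
    docs = {name: Counter(tokenize(text)) for name, text in chunks.items()}
    n = len(docs)
    df = Counter()  # document frequency of each token
    for counts in docs.values():
        df.update(counts.keys())

    def weight(counts):
        # Smoothed TF-IDF: tokens that are rare across chunks count for more.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weight(Counter(tokenize(query)))
    scored = [(cosine(q, weight(c)), name) for name, c in docs.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Drop chunks that share no tokens with the query.
    return [name for score, name in scored[:top_k] if score > 0]
```

The top-ranked chunks would then be pasted into the prompt ahead of the question. This only matches on shared words, so it misses code that is related through calls or inheritance, which is where the knowledge graph would earn its keep.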

Fine-tuning autocomplete on the APIs used in your internal code base, and getting a Copilot-like experience from a small model that autocompletes from a comment such as

// how do we do multithreaded processes in our project:
… code is put inside in-line …

saves you from having to jump to a chat session or go to a different file, or worse, asking an expert teammate and having them scoff, “just read the code”.
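One way to get training data for that kind of comment-to-code completion is to mine the code base for comments that already sit directly above code. The sketch below is only an assumption about how that could look: the function names are made up, the "prompt"/"completion" keys are a placeholder, and the record format a given fine-tuning service expects should be checked against its own documentation.

```python
import json


def comment_code_pairs(source):
    """Collect (comment, code) pairs: a run of // lines followed by code up to a blank line."""
    pairs = []
    lines = source.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].lstrip().startswith("//"):
            comment = []
            while i < len(lines) and lines[i].lstrip().startswith("//"):
                comment.append(lines[i].strip())
                i += 1
            code = []
            while i < len(lines) and lines[i].strip():
                code.append(lines[i])
                i += 1
            if code:  # ignore comments with no code directly below them
                pairs.append(("\n".join(comment), "\n".join(code)))
        else:
            i += 1
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as one JSON object per line (keys are an assumed format)."""
    return "\n".join(
        json.dumps({"prompt": comment + "\n", "completion": code})
        for comment, code in pairs
    )
```

Running comment_code_pairs over every file and concatenating the to_jsonl output would give a first-pass dataset; it would still need manual review, since stale or misleading comments would teach the model the wrong thing.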