Fine-tuning for Codex?

Just curious if anyone here (from OpenAI) knows if there’s going to be a fine-tuning API for Codex as well (and what the ETA might be)?

Have a use case that would really benefit from fine-tuning.


Eventually it will be, but it’s not our first priority at the moment. Were you not able to solve the problem with few-shot and prompt engineering? Do you have a lot of data you can fine-tune on?


The main issue is that each prompt can be quite long and so I often can’t even fit a few shots into the allotted tokens. I think I could figure out a way to get a lot of data for this use case.
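
To make the token constraint concrete, here is a minimal sketch of checking which few-shot examples fit alongside a long prompt. Everything in it is an assumption for illustration: `estimate_tokens` uses a crude "about 4 characters per token" heuristic rather than the real tokenizer, and the `budget` value is hypothetical, not an actual Codex limit.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


def pack_examples(examples: List[str], query: str, budget: int) -> List[str]:
    """Greedily keep the examples that still fit after reserving room for the query."""
    remaining = budget - estimate_tokens(query)
    chosen = []
    for example in examples:
        cost = estimate_tokens(example)
        if cost > remaining:
            continue  # too long for the space left; a later, shorter one may still fit
        chosen.append(example)
        remaining -= cost
    return chosen


# Hypothetical data: two short examples and one very long one.
examples = ["a" * 40, "b" * 400, "c" * 40]
query = "q" * 40
chosen = pack_examples(examples, query, budget=40)
print(len(chosen), "of", len(examples), "examples fit")
```

With long prompts the reserved space for `query` eats most of the budget, which is exactly the situation where few-shot prompting stops being practical.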


I support adding fine-tuning to the Codex API.

I find Codex makes mistakes when you prompt it zero-shot, but if you give it one example, it will adapt that example to your own coding style and variable names and give the correct answer.

But there are too many cases where it makes mistakes, and you can’t fit them all into one prompt.

It would be nice if we could have a dictionary of all the cases where Codex got it wrong, along with the correct code. Members of the community could chip in and help correct Codex.


Well, I was going to say that it’d be nice to fine-tune Codex so I could train it on SPARQL queries (similar to SQL) in order to create Wikidata queries faster, but it freaking knows how to do that already! :open_mouth:

:mechanical_arm: :robot: :mechanical_arm:
Codex Flex

Besides being mindblown by that, I think it would be nice to be able to fine-tune Codex, however you would need quite a large amount of data to fine-tune on as @boris said. We're all still exploring how to talk to Codex in order to get desired output, but maybe there are cases where training Codex on specific modules and libraries would help improve overall performance when asking Codex to produce large amounts of code at once.

However, with proper prompt engineering and guidance on our part, we should be able to get pretty good output from Codex. There will be cases where we don’t get desired output, but we shouldn’t rely completely on Codex to solve every problem we throw at it, and like @abel said, giving it proper guidance with examples definitely helps.


I would love for this to be a higher priority. Codex seems to need a lot of shots in its few-shot examples for anything that isn’t code continuation, and there are so many cool things beyond code continuation that it can almost do, which fine-tuning would make possible :smiley:


Would it be possible to get a very rough ETA for fine-tuning Codex? Could it happen in the next 2 months? 6? 12? From my understanding, Microsoft’s own instance of the Codex model is already being fine-tuned, so I understand if, from OpenAI’s perspective, those interesting possibilities are already being explored. But as someone who was working on something similar to Copilot before Codex was public, I find it hard to compete, and I just hope that Microsoft isn’t manipulating OpenAI’s priorities in order to unfairly eliminate competition.

I really think fine-tuning on Codex can accelerate our progress towards AGI, indirectly and directly.


Can’t you fine-tune BERT and use a hybrid Codex+BERT?


We’ve been experimenting with using Codex to output code in an internal domain-specific language - in other words, in a language it’s never seen before. The language is somewhat SQL-like, which I’m sure helps. But by feeding Codex about 20 examples (English description of a query, followed by the DSL version of that query) it’s already producing remarkable results and seems able to generalize from those examples.

It seems clear, though, that the more examples we can feed it, the wider range of inputs it will be able to correctly handle. So I think this is a use case that would really benefit from fine-tuning.
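
A minimal sketch of the kind of few-shot prompt described above: pairs of an English description and a query, followed by a new description for the model to complete. The `FETCH ... WHERE` syntax and the example pairs are made up for illustration (they are not the internal DSL from this post), and `build_few_shot_prompt` is a hypothetical helper, not part of any API.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format (description, query) pairs, then leave an open slot for the new query."""
    parts = []
    for description, query in examples:
        parts.append(f"# Description: {description}\n{query}\n")
    # The last block has a description but no query; the model is meant to complete it.
    parts.append(f"# Description: {question}\n")
    return "\n".join(parts)


# Hypothetical pairs in a made-up SQL-like DSL.
EXAMPLES = [
    ("all users older than 30", "FETCH users WHERE age > 30"),
    ("names of active customers", "FETCH customers.name WHERE status = 'active'"),
]

prompt = build_few_shot_prompt(EXAMPLES, "orders placed in 2021")
print(prompt)
```

Every extra pair lengthens the prompt, so the number of examples that can be shown this way is capped by the context size, which is the limitation fine-tuning would remove.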


Please, please, please, bump the priority for fine-tuning codex. I think this could be life-changing. Even if it’s a beta. Please.


Any update on this thread?


I absolutely agree, I would love to see Codex fine-tuning get prioritized. I have a great use case that Codex can already do pretty well on and that is quite deterministic, but I cannot fit more than 1 or 2 example prompts into the context. I have HUNDREDS of data points ready that I could fine-tune a model on, and I’m sure I could get amazing results! I would love to at least get an idea of how many months away fine-tuning support is, to make the wait more bearable :laughing:


Have you ever tried vector embeddings? :thinking:

What do you mean? I presume you are referring to generating code embeddings with Codex?

Yes. You can give it a bunch of samples and it will create a vector space, which you store in a database; then you work with the model on top of that to achieve what you want.
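
A rough sketch of that idea, under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, the in-memory list `store` stands in for a database, and the samples are invented. The point is only the shape of the workflow: embed the samples once, then pull the closest ones for each new request and put just those into the prompt.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding model (assumption): word-count vector."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_examples(query: str, store: List[Tuple[Vector, str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k stored (description, code) pairs most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [(description, code) for _, description, code in ranked[:k]]


# Hypothetical samples; in practice these would be your own prompt/code pairs.
SAMPLES = [
    ("read a csv file into a list of rows", "rows = list(csv.reader(open(path)))"),
    ("sort a list of numbers in descending order", "nums.sort(reverse=True)"),
    ("read a json file into a dict", "data = json.load(open(path))"),
]
store = [(toy_embed(description), description, code) for description, code in SAMPLES]

best = top_k_examples("read a csv file", store, k=1)
print(best)
```

This does not replace fine-tuning, but it lets a large pool of samples be used even when only one or two fit in a single prompt.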

I would like to reiterate the need for this. My personal use case is teaching it Hyperlambda, which it is terrible at for the moment. I’ve created training data using Curie, with some 1,100 snippets with prompts so far. But I suspect others might have use cases such as teaching it how to maintain some specific application, etc. For such things it would be incredibly useful, and developers are open source at heart, so I suspect it would be smart of you to prioritise this higher …


Bump. Few-shotting this becomes impractical after a while as I quickly run out of token space in the prompt. Support for fine-tuning widely expands the use cases for Codex.

How are we looking on an ETA in 2023?


Is there any ETA on this?
I have a few use cases for it.


There is currently no “ETA” for fine-tuning a base Codex model.

Sorry to inform.

Hope this helps.


Have a look at: How to get Codex to produce the code you want. It’s a workaround for fine-tuning in the context of code generation.
