Best way to support a new, custom programming language?

We have a very small, proprietary programming language, and we are trying to use Codex to provide completions for it. The language is closer to spreadsheet formulas than to a full-blown programming language, so it doesn’t have the complexities that typically arise when dealing with Python or TypeScript, for example.

We’ve had impressive results just teaching Codex with very long prompts that “explain” the language through completion examples, but we’re wondering what the best way is to do this without requiring such long prompts.
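For anyone wondering what such an "explaining" prompt looks like, here is a minimal sketch of assembling one from description/expression pairs. The formula syntax (`ADD`, `CONCAT`), the header text, and the function name `build_prompt` are all invented for illustration; the actual language in this thread is not shown, and the resulting string would still have to be sent to whatever completion endpoint you use.

```python
from typing import List, Tuple

# Hypothetical (description, expression) pairs in a made-up formula syntax.
EXAMPLES: List[Tuple[str, str]] = [
    ("Sum of price and tax", "ADD(price, tax)"),
    ("Full name from first and last", 'CONCAT(first, " ", last)'),
]

def build_prompt(header: str, examples: List[Tuple[str, str]], task: str) -> str:
    """Build a few-shot prompt: a header, worked examples, then the open task."""
    parts = [header.strip(), ""]
    for description, expression in examples:
        parts.append(f"# {description}")  # natural-language intent as a comment
        parts.append(expression)          # the expression the model should imitate
        parts.append("")
    parts.append(f"# {task}")             # left open for the model to complete
    return "\n".join(parts) + "\n"

prompt = build_prompt(
    "The following are expressions in a small formula language.",
    EXAMPLES,
    "Discount of 10% on price",
)
print(prompt)
```

The prompt grows linearly with the number of examples, which is exactly the cost being asked about here.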

Do the fine-tuning APIs cover Codex use cases? I didn’t find any references to working with code or Codex in the fine-tuning documentation/guides, which leads me to believe that they are geared more towards purely natural-language models.

I suspect that Codex is probably just a fine-tuned model itself, so yeah, it’s entirely possible you could create a fine-tuned model for your language.

For now, long Codex prompts are your best bet. You could combine this with search to dynamically retrieve only the explanations that are relevant to each completion request.
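As a rough sketch of the search idea: keep a library of explained examples, rank them against the incoming request, and put only the top few into the prompt. The ranking below is plain word-overlap (Jaccard) similarity from the standard library, used here as a stand-in for a real search or embeddings service; the snippets, the formula syntax, and the function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def score(query: str, document: str) -> float:
    """Jaccard similarity between the token sets of query and document."""
    q, d = tokenize(query), tokenize(document)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)

def top_k(query: str, snippets: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k snippets whose descriptions best match the query."""
    ranked = sorted(snippets, key=lambda s: score(query, s[0]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, snippets: List[Tuple[str, str]], k: int = 2) -> str:
    """Build a short prompt from only the most relevant examples."""
    lines: List[str] = []
    for description, expression in top_k(query, snippets, k):
        lines += [f"# {description}", expression, ""]
    lines.append(f"# {query}")  # the request the model should complete
    return "\n".join(lines) + "\n"

# Hypothetical example library in a made-up formula syntax.
SNIPPETS = [
    ("add two numbers together sum", "ADD(a, b)"),
    ("join text strings concatenate", "CONCAT(a, b)"),
    ("choose a value based on a condition", "IF(cond, x, y)"),
]

print(build_prompt("sum two numbers", SNIPPETS, k=1))
```

With a setup like this, the prompt length depends on `k` rather than on the size of the example library, so the library can grow without the prompt growing.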

Hopefully soon we’ll release the ability to fine-tune Codex, which should help you in case you have a bunch of code written in this new language.
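In the meantime, existing code can be collected in a form that is easy to reuse later. The sketch below writes description/expression pairs as JSON Lines with `prompt` and `completion` fields, the shape used by the fine-tuning endpoint for the natural-language models. Whether Codex fine-tuning would accept the same format is an assumption, and the separator, stop marker, and example pair are made up.

```python
import json
from typing import Iterable, Tuple

def to_jsonl(pairs: Iterable[Tuple[str, str]],
             separator: str = "\n=>\n",
             stop: str = " END") -> str:
    """Serialize (description, expression) pairs as one JSON object per line."""
    lines = []
    for description, expression in pairs:
        record = {
            # A fixed separator marks where the prompt ends.
            "prompt": description + separator,
            # A leading space and a fixed stop marker delimit the completion.
            "completion": " " + expression + stop,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"

# Hypothetical training pair in a made-up formula syntax.
data = to_jsonl([("Sum of price and tax", "ADD(price, tax)")])
print(data)
```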

Fair enough! I was comparing it to Excel in the sense that it’s mainly a single-expression language without syntactically complicated features (e.g. modules, classes, function definitions). Also in the sense that every expression returns a value and produces no side effects (e.g. no print statements or I/O).

I imagined that would be the case. Thankfully the prompts wouldn’t be absurdly large. Didn’t think of combining it with search! Going to try that out!

You could try fine-tuning GPT-J on your language, although Codex’s code completion accuracy is much greater.

Is there a doc or example that explains how to combine completion and search like this?

Hi guys, I have created a package that is built on top of Python and runs in a Python environment, like the pandas or selenium packages. I would like to know whether there is any possibility of training Codex to understand the code/syntax of my package.