Teaching GPT a new/niche programming language

I work for a software company and we have our own application-specific programming language. I want to train a model to answer coding questions in the same way ChatGPT can respond to Python questions: giving one-line responses or even writing entire functions that accomplish a task.

I have a large set of help documentation. Each function or class in our language has its own page, which describes the arguments and also provides code examples, e.g.:

# Open a table
tbl = CreateObject("Table", file_name)

I see a lot of conflicting opinions on whether to use embeddings or fine-tuning (I also see that Codex fine-tuning is no longer possible).

I have worked through OpenAI's Python notebook on Q&A using embeddings. I've also played with fine-tuning and have a basic understanding of creating prompt-completion pairs and using the API to do the training.
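For context, a prompt-completion pair built from one of our help pages might look like the line below. The question text and the `###`/`END` separator tokens are just illustrative conventions, not anything required by the API; the completion reuses the `CreateObject` example from the documentation:

```json
{"prompt": "How do I open a table from a file?\n\n###\n\n", "completion": " tbl = CreateObject(\"Table\", file_name) END"}
```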

What is the right approach? Should I fine-tune the base davinci model? Should I build an embeddings database and then prepend relevant chunks into my prompt? Will either approach be successful?

Thank you!

Welcome to the community.

Since you can only fine-tune the original Davinci model, I would stick with embeddings… or at least test that method first, then try a small fine-tune to compare results. I really think embeddings should do the trick, though.
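For what it's worth, the embeddings route is mostly a retrieval problem: embed each help page once, embed the incoming question, and prepend the closest pages to the prompt. A minimal sketch of the ranking step, assuming the vectors have already been fetched from an embeddings endpoint (the 3-dimensional toy vectors below are stand-ins for real embeddings, and the page labels in the comments are made up for illustration):

```python
import numpy as np

def rank_pages(question_vec, page_vecs):
    """Return help-page indices sorted by cosine similarity to the question."""
    q = question_vec / np.linalg.norm(question_vec)
    p = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity per page
    return np.argsort(-sims)          # best match first

# Toy stand-ins for real embedding vectors (one row per help page).
pages = np.array([
    [0.9, 0.1, 0.0],   # the CreateObject page
    [0.1, 0.9, 0.0],   # some unrelated page
])
question = np.array([0.8, 0.2, 0.0])  # "How do I open a table?"

best_first = rank_pages(question, pages)
# The CreateObject page (index 0) ranks first for this question.
```

You would then paste the top-ranked page(s) into the prompt ahead of the user's question and let the model answer from that context.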
