I work for a software company and we have our own application-specific programming language. I want to train a model to answer coding questions in the same way ChatGPT can respond to Python questions: giving one-line responses or even writing entire functions that accomplish a task.
I have a large set of help documentation. Each function or class in our language has its own page, which describes the arguments and provides code examples, e.g.:
# Open a table
tbl = CreateObject("Table", file_name)
I see a lot of conflicting opinions on whether to use embeddings or fine-tuning (I also see that Codex fine-tuning is no longer possible).
I have worked through OpenAI's Python notebook on Q&A using embeddings. I've also played with fine-tuning and have a basic understanding of creating prompt-completion pairs and using the API to do training.
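To make the fine-tuning option concrete, here is a minimal sketch of turning one help page into a prompt-completion pair written as a JSONL line. The helper name `make_pair`, the task description, the "###" separator, and the " END" stop marker are illustrative assumptions, not a required format:

```python
import json

def make_pair(task_description, code_example):
    # Assumed conventions: a fixed separator ends the prompt, and the
    # completion starts with a space and ends with a stop marker.
    return {
        "prompt": task_description + "\n\n###\n\n",
        "completion": " " + code_example + " END",
    }

# Example built from the snippet on the help page above.
pair = make_pair("Open a table", 'tbl = CreateObject("Table", file_name)')

# One JSON object per line is the usual shape of a JSONL training file.
jsonl_line = json.dumps(pair)
print(jsonl_line)
```

Each help page would contribute one or more such lines to the training file.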
What is the right approach? Should I fine-tune the base davinci model? Should I build an embeddings database and then prepend the most relevant chunks to my prompt? Will either approach be successful?
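For reference, the embeddings approach amounts to roughly the following. This is only a sketch under loud assumptions: `embed` is a toy word-count stand-in for a real embedding model, the second and third entries in `DOCS` are placeholder pages, and `build_prompt` is a hypothetical helper name.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: word counts as a sparse vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Documentation chunks; only the first comes from the real help pages.
DOCS = [
    'Open a table: tbl = CreateObject("Table", file_name)',
    "Placeholder page about drawing charts.",
    "Placeholder page about exporting reports.",
]

def build_prompt(question, docs, k=1):
    # Rank chunks by similarity to the question and prepend the top k.
    q_vec = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Documentation:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("How do I open a table?", DOCS)
print(prompt)
```

The resulting prompt would then be sent to the completion model, so the model sees the relevant help page without any training.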
Thank you!