Fine-tuning GPT to learn a new coding language

Hi GPT community,

I am a developer on the Internet Computer Protocol blockchain, and I would like to fine-tune GPT to learn our native language, Motoko. We have seen exponential growth in new developers, and we were the No. 1 Layer 1 blockchain by GitHub commits last month, so I am confident our existing example code is enough for fine-tuning.

My initial approach is to fetch GitHub repositories tagged as Motoko and then use a self-instruct technique (like Alpaca did with LLaMA) to generate an instruction prompt for each piece of code. I would like some critique before I dive in. What do you think of this approach for dataset generation?
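To make the idea concrete, here is a minimal sketch of the data-collection side, assuming the GitHub Search API's `language:motoko` qualifier and the legacy prompt/completion JSONL format used for fine-tuning base models. The `generate_instruction` callable is hypothetical: it stands in for whatever self-instruct prompt you use to ask an existing LLM "what instruction would have produced this code?"

```python
import json
import requests


def fetch_motoko_repos(max_pages=3):
    """Collect clone URLs of repositories GitHub classifies as Motoko."""
    repos = []
    for page in range(1, max_pages + 1):
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": "language:motoko", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        repos += [item["clone_url"] for item in resp.json()["items"]]
    return repos


def build_examples(code_snippets, generate_instruction):
    """Pair each .mo snippet with a model-generated instruction, self-instruct style."""
    examples = []
    for code in code_snippets:
        instruction = generate_instruction(code)  # hypothetical LLM call
        # Legacy fine-tune format: one {"prompt", "completion"} object per line.
        examples.append({"prompt": instruction + "\n\n###\n\n", "completion": " " + code})
    return examples


def write_jsonl(examples, path="motoko_finetune.jsonl"):
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```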

Additionally, I am open to suggestions for building a plug-in instead of fine-tuning. If a plug-in is the better fit, what should my backend server return in response to users’ inquiries through GPT?
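My rough understanding is that a plug-in backend just exposes an HTTP API and the model reads whatever JSON it returns, so the response would be structured data (doc or code snippets) rather than finished prose. A sketch of what I am imagining, with a made-up `/search` endpoint, placeholder URL, and a stubbed search function:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class SearchResult(BaseModel):
    title: str
    snippet: str
    url: str


# Placeholder corpus; a real backend would run keyword or embedding search
# over the Motoko documentation and example repositories.
DOC_SNIPPETS = [
    SearchResult(
        title="Hello world actor",
        snippet='actor { public func greet(name : Text) : async Text { return "Hello, " # name # "!"; }; };',
        url="https://example.com/motoko-docs/hello",
    ),
]


@app.get("/search", response_model=list[SearchResult])
def search(q: str, k: int = 3):
    """The plug-in model reads this JSON and weaves the snippets into its answer."""
    return DOC_SNIPPETS[:k]
```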

Thank you for any suggestions.


First off, what model do you want to use?

This is from the OpenAI documentation:

What models can be fine-tuned? (https://platform.openai.com/docs/guides/fine-tuning/what-models-can-be-fine-tuned)

Fine-tuning is currently only available for the following base models: davinci, curie, babbage, and ada. These are the original models that do not have any instruction following training (like text-davinci-003 does for example).
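If you do go that route, the flow with those base models is the legacy prompt/completion one. A minimal sketch using the old v0.x `openai` Python library (file name and key are placeholders):

```python
import openai

openai.api_key = "sk-..."  # your API key

# Upload the training file (legacy prompt/completion JSONL format).
upload = openai.File.create(
    file=open("motoko_finetune.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tune on one of the listed base models (no instruction tuning).
job = openai.FineTune.create(
    training_file=upload["id"],
    model="davinci",
)
print(job["id"], job["status"])
```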

I would try a few different approaches.

  1. Upload the documentation to a vector database and create a langchain tool for the model to query.
  2. Give it the documentation straight in its system message; I’m unsure how many tokens that would take up. (A rough sketch of both ideas is below.)
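Something like the following, without langchain, to show the shape of both ideas: embed documentation chunks, retrieve the closest ones for a question, and paste them into the system message. This uses the legacy v0.x `openai` API; the model names, chunking, and `doc_chunks` contents are assumptions.

```python
import numpy as np
import openai

openai.api_key = "sk-..."


def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])


# Short Motoko documentation sections you prepared earlier (placeholders here).
doc_chunks = ["...", "..."]
doc_vectors = embed(doc_chunks)


def answer(question, k=3):
    q_vec = embed([question])[0]
    # Cosine similarity against every chunk; keep the k best matches.
    sims = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(doc_chunks[i] for i in np.argsort(sims)[-k:])
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using this Motoko documentation:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```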

Hope that helps 🙂

Hello,
I am trying to perform a similar task. Have you found any good methods to do this?
