Training GPT to learn new scripting language

gauge · July 4, 2023, 9:14pm

The company I work for uses a proprietary scripting language to implement policy. I’m interested in training GPT to understand this language for a few purposes:

Write custom code with a natural language prompt.
Interpret existing code and explain its purpose/behavior to users.
Troubleshoot by interpreting code and explaining the cause of unexpected behavior.
Act as a resource for users trying to learn the language.

Currently, I’m working on building a training dataset with some of the following:

Prompt-completion pairs for each function and procedure in the language. I may write each pair in both directions (e.g. “What function does X?” and “What does the X function do?”)
Prompts regarding syntax of common programming structures (e.g. LOOPs)
Prompts regarding how to create functions, calling them, passing arguments, etc.
Prompts regarding data types, supported operators, etc.
Prompts asking for code that solves relatively simple use cases, with completions containing only code and any explanation in comments.
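
As a rough illustration of the first item in this list, the forward and reverse pairs could be generated from a single reference table rather than written by hand. This is only a sketch under assumptions: the function names and descriptions are made-up placeholders (not from the actual language), and the JSONL layout with `prompt` and `completion` keys is an assumption about what the fine-tuning tooling expects.

```python
import json

# Hypothetical reference entries. The names and descriptions are placeholders
# for illustration only, not part of any real scripting language.
FUNCTIONS = [
    {"name": "ROUND_UP", "description": "Rounds a number up to the nearest integer."},
    {"name": "DAYS_BETWEEN", "description": "Returns the number of days between two dates."},
]


def make_pairs(entry):
    """Build one forward and one reverse prompt-completion pair for an entry."""
    name = entry["name"]
    desc = entry["description"]
    return [
        # Forward: ask what a named function does.
        {"prompt": f"What does the {name} function do?", "completion": desc},
        # Reverse: describe the behavior and ask for the function name.
        {"prompt": f"Which function does the following? {desc}", "completion": name},
    ]


def build_dataset(entries):
    """Collect the pairs for every entry into one flat list."""
    pairs = []
    for entry in entries:
        pairs.extend(make_pairs(entry))
    return pairs


def to_jsonl(pairs):
    """Serialize the pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


if __name__ == "__main__":
    print(to_jsonl(build_dataset(FUNCTIONS)))
```

Keeping the reference table as the single source means both directions of each pair stay consistent when a description changes.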

I’m looking for any advice on how to accomplish this task. Some questions I have in mind…

What model should I use?
I assume fine-tuning is the right approach here, but if there’s a use for embeddings, I’d like to understand it better. I’m still struggling to understand the use cases of each.
How should I format the training data? Can I use markdown in the completions?
Is there anything I should be considering that I might be missing?
