Best way to get GPT to output a DSL

alextkaiser · December 8, 2023, 6:39pm

Hello,

I’m trying to get GPT (either 3.5 or 4) to “translate” from a paragraph of english text to a DSL (Domain-Specific Language) that is XML that uses custom tags. What I have is:

20,000 examples of this DSL, so these are 250-500 token long text segments of this DSL
500 pages of documentation describing this DSL. This basically describes what is the correct syntax of the DSL, like what tags are possible, what attributes are possible on those tags, and an english paragraph describing what the tag means
1,000 examples of an english paragraph and the corresponding XML for that paragraph

My first attempt was to just use a multi-shot approach with the 1,000 examples. So using embeddings to find the 5 most relevant examples from the 1,000, and asking GPT to “Answer in a consistent style.”

This works okay (maybe around 60% accuracy), but I feel like I’m leaving a lot on the table because I’m not using the 20,000 examples or the 500 pages of documentation. Is there a way I could use either of those that would help?

I thought about trying to do fine-tuning with the 20,000 examples, but I can’t think of how I would do that since I don’t have the english counterpart to those examples?

Thanks,
Alex

Topic		Replies	Views
Training GPT to generate code in a private DSL API	1	1535	December 17, 2023
What is the best way to teach a GPT model a new scripting language? Community gpt-4 , fine-tuning , chatgpt-plugin , functions	5	3133	December 24, 2023
Specialized Translation Task: Fine-tuning vs examples API	1	2458	April 12, 2023
Fine-Tuning with Non-Prompt/Completion Data: Seeking Advice for Direct Text-Based Training? API gpt-4 , chatgpt , fine-tuning , api	3	458	August 23, 2024
Teaching GPT a new/niche programming language API	1	1838	June 2, 2023

Best way to get GPT to output a DSL

Related topics