Training GPT to generate code in a private DSL

I have a closed-source project where we use a DSL to describe certain data structures. The DSL is similar in flavor to a Dockerfile or nginx.conf. I’d like GPT to generate code in this DSL when prompted. Since the syntax isn’t publicly available, I can’t just ask it to. When given an example, it does match the pattern and produces output with a similar structure, but it doesn’t generalize to the full syntax. I could supply enough examples to cover the entire syntax, and/or include the documentation, but that doesn’t fit in the model’s token limit. It might fit within GPT-4’s limits, but that’s too expensive and impractical for the scaling needs of the application.

So, can I achieve this with fine-tuning? I can provide thousands of examples and the full syntax documentation, if that helps. But I’m not sure how to structure the prompt/completion pairs in that case.
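To make the question concrete, here’s a rough sketch of how I imagine the training file might look, based on OpenAI’s JSONL fine-tuning format (one JSON object per line with `prompt` and `completion` fields). The DSL snippets, instructions, and separator tokens below are invented placeholders, not my real syntax:

```python
import json

# Hypothetical prompt/completion pairs (placeholder DSL, not the real one).
# The trailing "\n\n###\n\n" separator and the " END" stop token are
# conventions suggested in OpenAI's fine-tuning guide, not requirements.
examples = [
    {
        "prompt": "Describe a queue named 'orders' with a max size of 100.\n\n###\n\n",
        "completion": " queue orders {\n  max_size 100\n}\n END",
    },
    {
        "prompt": "Describe a cache named 'sessions' backed by memory.\n\n###\n\n",
        "completion": " cache sessions {\n  backend memory\n}\n END",
    },
]

# Fine-tuning data is JSONL: one JSON object per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Would something along these lines work, where each prompt is a natural-language request and each completion is the corresponding DSL snippet? And is there a sensible place to put the syntax documentation itself, or does fine-tuning only ever learn from the example pairs?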