How to teach a new coding language to GPT?

Our company has developed a coding language to help customers extend our software's functionality. We have a very large manual with all the instructions and coding examples. What I'd like to achieve is the ability to generate code the way GPT already does.
I've tried chat completion using embeddings to filter the context: it works pretty well with other manuals, or with questions that don't require coding examples, but not with code.
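For anyone trying to reproduce this setup, a minimal sketch of embedding-based context filtering looks something like the following. The function names and the tiny 2-D vectors are illustrative only; in practice the embeddings would come from an embeddings API and the chunks from the manual:

```python
# Minimal sketch: rank pre-embedded manual chunks by cosine similarity
# to a query embedding, and keep only the top-k as prompt context.
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_chunks(query_vec, chunks, k=3):
    # chunks: list of (text, embedding) pairs precomputed from the manual.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved chunks are then concatenated into the system prompt before calling chat completion. The weakness described above is that code examples split across chunks lose their coherence.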
So the question is: what's the best practice in this use case? Do we have to perform a fine-tuning on that specific manual?


I would try giving it examples in the system prompt; adding new data to a model can be challenging. Assuming the language is logically consistent and you give it an example that demonstrates all of the features, that should give pretty good results. Maybe use that along with embeddings.
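Concretely, a few-shot setup like this could be sketched as follows. The message structure is the standard chat-completion format; the example pairs and the system wording are placeholders:

```python
# Sketch: build a chat-completion message list where worked examples from
# the manual are presented as prior user/assistant turns (few-shot style).

def build_messages(syntax_summary, examples, user_request):
    """examples: list of (request, code) pairs drawn from the manual."""
    messages = [{
        "role": "system",
        "content": "You write code in our in-house language.\n" + syntax_summary,
    }]
    for request, code in examples:
        messages.append({"role": "user", "content": request})
        messages.append({"role": "assistant", "content": code})
    messages.append({"role": "user", "content": user_request})
    return messages
```

The resulting list is passed directly as the `messages` argument of a chat-completion call.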

The language is basically an '80s BASIC, so it's very easy (some GOSUB, IF statements and FOR loops). The real issue is the amount of data describing our objects. We have archives in which each record can be composed of more than 100 fields, so in each request it could be necessary to send a large amount of data. A user may ask this kind of question: "write code that changes the descriptions of the warehouse items, adding the special character @ at the end". In the prompt I have to pass all the syntax of the language (how to implement an IF, FOR, etc.), which instructions to use to read/write/update the warehouse items, and all the variables of the warehouse items archive. Furthermore, in every request I always have to pass the base syntax of my code. So, maybe giving all that stuff in every request could work, but I'm afraid it could be too expensive. Am I wrong?
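To make the cost concern concrete, here is a rough back-of-the-envelope sketch (not the real setup; the manual excerpts, field names, and the 4-characters-per-token heuristic are all approximations) of what repeating the full context in every request implies:

```python
# Rough sketch: estimate how many tokens ride along with EVERY request
# when the full syntax reference and archive schema must be included.

def estimate_prompt_tokens(*sections):
    # Crude heuristic: roughly 4 characters per token for English text.
    return sum(len(s) for s in sections) // 4

# Placeholder stand-ins for the real manual excerpts and the 100+ fields.
syntax_reference = "IF ... THEN / FOR ... NEXT / GOSUB ... RETURN\n" * 200
warehouse_schema = "\n".join(
    f"ITEM_FIELD_{i}: description of field {i}" for i in range(100)
)
user_question = "write code that adds @ at the end of every item description"

tokens_per_request = estimate_prompt_tokens(
    syntax_reference, warehouse_schema, user_question
)
print(f"~{tokens_per_request} tokens sent with every single request")
```

Multiplying that per-request overhead by the request volume is what drives the cost worry described above.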

You're not wrong. It's super important for projects to make sure their open-source projects are well documented and online for AI web scrapers to retrieve and learn from, so that future models contain your language's syntax and examples. That's also why growing a community is so important.

Getting new information into an AI in large quantities is currently the domain of embeddings, and as you have discovered, that is lacking when it comes to code segments.

I wish I had some other options for you, and if you find any, please update here so I can pass them on to others.


But why couldn't fine-tuning be a solution? If I fine-tune a model specifically on my coding language, then when a user asks a question the model would already have learned all the instructions and everything else related to my coding language. So in that case I could even give no context at all in the PROMPT. Is that right?


It would be worth a try. In my experience, fine-tuning does not add information to the model; it enables the model to act in new ways.


@gianluca.suzzi Were you able to make any progress with this project?

No, we are giving priority to chat completion against our manuals rather than code completion… this is still on standby.

If I understand your question correctly, what you want is for GPT to understand the documentation you provide and write code accordingly, but you're having issues with GPT not following the examples from the documentation correctly.

I believe your problem can be solved using fine-tuning with RAG; here's an example from the OpenAI cookbook:
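As a rough sketch of the fine-tuning side, a single training record in the chat-format JSONL that OpenAI fine-tuning expects could look like this. The system wording, the instruction, and the BASIC-style snippet below are all made up for illustration:

```python
import json

def to_training_record(instruction, code):
    """Serialize one (request, code) pair from the manual as a JSONL line."""
    return json.dumps({
        "messages": [
            {"role": "system",
             "content": "You write code in our in-house BASIC-like language."},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": code},
        ]
    })

# One line per example; many such pairs would go into training.jsonl.
line = to_training_record(
    "Add the character @ at the end of every warehouse item description",
    'FOR I = 1 TO N: DESC$(I) = DESC$(I) + "@": NEXT I',
)
```

RAG then supplies the archive-specific field definitions at request time, so the fine-tuned model covers syntax while retrieval covers the per-customer data.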

I hope that helps! :heart:


I made a video that teaches you how to do this with the new announcements this week: