How to teach a new coding language to GPT?

gianluca.suzzi · June 14, 2023, 9:43am

Our company has developed a coding language to help customers to extends our software functionality. We have a very huge manual with all the instructions and coding examples. What i’d like to achieve is the possibilty to generate code as GPT already do.
I’ve tried with chat completion using embeddings to filter context: it works pretty good with other manuals, or with questions that don’t requires coding examples, but not with code.
So the question is, what’s the best practice in this use case: we have to perform a fine-tuning on that spacific manual?

Foxalabs · June 14, 2023, 10:03am

I would try giving it examples in the system prompt, adding new data to a model can be challenging. Assuming the language is logically consistent and you give it an example that demonstrates all of the features, that should give pretty good results, maybe use that along with embeddings.

gianluca.suzzi · June 14, 2023, 10:23am

The language is basically an 80’s BASIC, so it’s very easy (some GOSUB, IF statments and FOR cycles). The real issue is the amount of data describing our objects. We have archives in which each records can be composed even more then 100 fields. So in each request it could be necessary to send a large amount of data. A user may ask this kind of question: “make a code that changes of the descriptions of warehouse items adding the special character @ at the end”. In prompt i’ve to pass alle the syntax of language (how to implement an IF, FOR etc.), which istruction to use to read/write/update the warehouse items and all the variables of the warehouse items archive. Forthermore, in every request i’ve always have to pass the base syntax of my code. So, maybe giving all that stuff in every request could work but i’m afraid it could be too much expensive. Am i wrong?

Foxalabs · June 14, 2023, 10:29am

You’re not wrong, it’s super important for projects to make sure their open source projects are well documented and online for AI web scrapes to retrieve and learn from so that future models contain your languages syntax and examples, also why growing a community is so important.

Getting new information into an AI in large quantities is currently the domain of embeddings and as you have discovered, that is lacking when it comes to code segments.

I wish I had some other options for you and if you find any please update here so I can pass it on to others.

gianluca.suzzi · June 14, 2023, 10:38am

But why fine-tuning could not be a solution? If i fine tunes a model specifically on my code language, when a user asks a question the model could already have learned all the instructions or other stuff related to my coding language. So in this case in the PROMPT i could even give no context at all. Is it right?

Foxalabs · June 14, 2023, 11:49am

It would be worth a try, In my experience fine-tuning does not add information to the model, it enables the model to act in new ways.

subhajit.banerjee · October 12, 2023, 6:33am

@gianluca.suzzi Were you able to make any progress with this project?

gianluca.suzzi · October 16, 2023, 8:35am

No we are giving priority to chat completion against our manuals and not with coding completion…this is still in stand by

N2U · October 16, 2023, 10:13am

If I understand your question correctly, what you want is for GPT to understand the documentation you provide and write code according to that. But you’re having issues with GPT not following the examples from the documentation correctly.

I believe your problem can be solved using fine tuning with RAG, here’s an example from the OpenAI cookbook:

I hope that helps!

tejask · November 13, 2023, 10:39pm

I made a video that teaches you how to do this with the new announcements this week: youtube.com/watch?v=VVa69Gn3TAo

rm78 · August 19, 2024, 9:54am

@gianluca.suzzi @Foxalabs
I’m too trying to train GPT on a new scripting language. You’ve both mentioned using embeddings for this task. How do you use the embeddings to generate code ? Do you directly provide into ChatGPT and then prompt it to generate the code ? Thanks

Fam · September 5, 2024, 9:33pm

Would you elaborate more on this. I’m new to the field and want to know more about your experience in fine-tuning.

Foxalabs · September 5, 2024, 9:47pm

Ok, so the example I give that best illustrates the essentials is this:

if you train a model on the works of Shakespear, it will know very little about the stories in the books, it may learn some names and places, but not the content. What it will do, is now have the ability to write new stories in the style of Shakespear.

Hope that helps.

Topic		Replies	Views
Teaching GPT a new/niche programming language API	1	1823	June 2, 2023
How to fine tune so GPT knows a new API and then how to prompt to use that API Prompting	4	1476	March 29, 2023
How does the generative aspect of GPT impacts my models? Documentation	12	915	February 14, 2023
What is the best way to teach a GPT model a new scripting language? Community gpt-4 , fine-tuning , chatgpt-plugin , functions	5	3118	December 24, 2023
Fine Tuning ChatGPT with large text from Books Prompting	18	11538	March 26, 2024

How to teach a new coding language to GPT?

Related topics