Fine-tuning a model on a new coding language

I am working on a project where I am attempting to fine-tune an OpenAI model on a language that is not widely known, though it is similar to SQL. The goal is to go from a natural language query to accurate code. I have trained many different models on samples, docs, and code blocks, but with no success. If anyone has any suggestions, please let me know.


Fine-tuning shows the model new ways of working, new ways to “think”. It is not well suited to adding new knowledge. You can try embedding the language documentation and then running an embedding-based retrieval against your prompt to pull in the context needed to solve the prompt's request.
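To make the retrieval idea concrete, here is a minimal sketch of the flow: split the docs into chunks, embed them, rank the chunks against the question, and paste the best matches into the prompt. Everything in it is an assumption for illustration: the `DOC_CHUNKS` snippets describe an imaginary SQL-like language, and `embed` is a bag-of-words stand-in rather than a real embedding model, which you would swap in for actual use.

```python
import math
import re
from collections import Counter

# Hypothetical documentation chunks for an imaginary SQL-like language.
DOC_CHUNKS = [
    "FETCH selects columns from a table: FETCH name, age FROM users",
    "KEEP filters rows by a condition: KEEP age > 30",
    "ARRANGE sorts the result: ARRANGE BY age DESC",
]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that puts the retrieved docs ahead of the request."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Language reference:\n{context}\n\nRequest: {question}\nCode:"


if __name__ == "__main__":
    print(build_prompt("how do I sort the result by age", DOC_CHUNKS))
```

The resulting prompt string is what you would send to the model, so the relevant parts of the language reference travel with every request instead of having to be learned through fine-tuning.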

Adding new code-related information to an LLM is still a technical challenge, and one that is being actively worked on.

I hope this gives you at least a few directions to look into.


Here's a fun script that does the same kind of thing with no AI required. You just have to define the syntax and guidelines for the code. It's basic and still needs work.

# Encode table: capital English letter -> glyph.
# Note: 'C' and 'K' map to the same glyph, so encoding is not fully reversible.
theban_alphabet = {
    'A': 'ᚨ', 'B': 'ᛒ', 'C': 'ᚲ', 'D': 'ᛞ', 'E': 'ᛖ', 'F': 'ᚠ', 'G': 'ᚷ', 'H': 'ᚺ',
    'I': 'ᛇ', 'J': 'ᛃ', 'K': 'ᚲ', 'L': 'ᛚ', 'M': 'ᛗ', 'N': 'ᚾ', 'O': 'ᚩ', 'P': 'ᛈ',
    'Q': 'ᛩ', 'R': 'ᚱ', 'S': 'ᛋ', 'T': 'ᛏ', 'U': 'ᚒ', 'V': 'ᚑ', 'W': 'ᚹ', 'X': 'ᛪ',
    'Y': 'ᛦ', 'Z': 'ᛎ', '.': '.'
}

# Decode table: glyph -> capital English letter.
# The key 'ᚲ' appears twice; the later entry wins, so it decodes to 'K'.
theban_to_english = {
    'ᚨ': 'A', 'ᛒ': 'B', 'ᚲ': 'C', 'ᛞ': 'D', 'ᛖ': 'E', 'ᚠ': 'F', 'ᚷ': 'G', 'ᚺ': 'H',
    'ᛇ': 'I', 'ᛃ': 'J', 'ᚲ': 'K', 'ᛚ': 'L', 'ᛗ': 'M', 'ᚾ': 'N', 'ᚩ': 'O', 'ᛈ': 'P',
    'ᛩ': 'Q', 'ᚱ': 'R', 'ᛋ': 'S', 'ᛏ': 'T', 'ᚒ': 'U', 'ᚑ': 'V', 'ᚹ': 'W', 'ᛪ': 'X',
    'ᛦ': 'Y', 'ᛎ': 'Z', '.': '.'
}

english_alphabet = {
    'A': 'A', 'B': 'B', 'C': 'C', 'D': 'D', 'E': 'E', 'F': 'F', 'G': 'G', 'H': 'H',
    'I': 'I', 'J': 'J', 'K': 'K', 'L': 'L', 'M': 'M', 'N': 'N', 'O': 'O', 'P': 'P',
    'Q': 'Q', 'R': 'R', 'S': 'S', 'T': 'T', 'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X',
    'Y': 'Y', 'Z': 'Z', '.': '.'
}

def theban_translate(text, to_theban=True):
    """Translate text character by character; unmapped characters pass through."""
    translation_dict = theban_alphabet if to_theban else theban_to_english
    # Input is upper-cased because the tables only contain capital letters.
    return ''.join(translation_dict.get(char, char) for char in text.upper())

while True:
    print("Translation Options:")
    print("1. Translate English to Theban")
    print("2. Translate Theban to English")
    print("3. End the program")
    choice = input("Enter your choice (1, 2, or 3): ")

    if choice == "1":
        text_to_translate = input("Enter the text to translate from English to Theban: ")
        translated_text = theban_translate(text_to_translate, to_theban=True)
        print("English Text:", text_to_translate)
        print("Theban Translation:", translated_text)
        print()
    elif choice == "2":
        text_to_translate = input("Enter the text to translate from Theban to English: ")
        translated_text = theban_translate(text_to_translate, to_theban=False)
        print("Theban Text:", text_to_translate)
        print("English Translation:", translated_text)
        print()
    elif choice == "3":
        print("Ending the program...")
        break
    else:
        print("Invalid choice! Please try again.")
        print()

print("Program ended.")
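One thing that "still needs work" in a table-driven translator like this is keeping the two directions in sync. The sketch below is not part of the script above; it shows one possible way to derive the decode table from the encode table and to report glyphs that more than one letter maps to. The names `invert_mapping`, `translate`, and `sample` are hypothetical, and `sample` is only a small excerpt of the mapping that includes the glyph shared by C and K.

```python
def invert_mapping(forward):
    """Build a decode table from an encode table and collect collisions.

    When two letters share a glyph, the first letter seen is kept in the
    decode table and every letter involved is listed under that glyph.
    """
    reverse = {}
    collisions = {}
    for plain, glyph in forward.items():
        if glyph in reverse:
            collisions.setdefault(glyph, [reverse[glyph]]).append(plain)
        else:
            reverse[glyph] = plain
    return reverse, collisions


def translate(text, table):
    """Map each upper-cased character through the table; others pass through."""
    return "".join(table.get(ch, ch) for ch in text.upper())


# Hypothetical excerpt of the mapping, including the glyph shared by C and K.
sample = {"A": "ᚨ", "B": "ᛒ", "C": "ᚲ", "K": "ᚲ"}

reverse, collisions = invert_mapping(sample)
encoded = translate("back", sample)
decoded = translate(encoded, reverse)

print(encoded)
print(decoded)
print(collisions)
```

Running a check like this on the full table points out which letters cannot survive a round trip, so you can decide whether to give them distinct glyphs.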