Extensive documentation about fine tuning

Please, does anyone have some extensive documentation on fine-tuning, in particular for hyperparameters? I've got 2,777 training snippets and 344 validation snippets. I'm trying to teach gpt-4o-mini my programming language, and the result is crap!

I've tried every tweak of the hyperparameters, but nothing changes. Regardless of what I do, the resulting model is "junk" …
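As far as I can tell, the only knobs the fine-tuning job exposes are `n_epochs`, `batch_size`, and `learning_rate_multiplier`. A minimal sketch of where they go with the openai Python SDK; the file IDs and values below are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder file IDs: your uploaded training/validation JSONL files.
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",
    training_file="file-TRAIN_ID",
    validation_file="file-VALID_ID",
    hyperparameters={
        "n_epochs": 3,                    # example value only
        "batch_size": 4,                  # example value only
        "learning_rate_multiplier": 1.8,  # example value only
    },
)
print(job.id, job.status)
```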

Imagine your programming language as just another language the AI has to "speak" and understand. OpenAI trained GPT-2 on 40 GB of text just to produce language completions, and it is still miles behind today's chat models, which are further post-trained on millions of tasks with RLHF.

So we have to think about what can realistically be done, given that fine-tuning is far more effective at shaping behavior than at imparting knowledge.

And for me, that would be fine-tuning the model to use RAG to iteratively research your language documentation: train it to call a knowledge function, then call it again, until it has obtained the exact context needed to perform the task.
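As an illustration (the name, description, and parameters are made up, not prescriptive), that "knowledge function" is just an ordinary tool declaration the model can call repeatedly with refined queries:

```python
# Hypothetical tool declaration for the documentation-lookup function.
search_docs_tool = {
    "type": "function",
    "function": {
        "name": "search_language_docs",
        "description": (
            "Search the language documentation. Call repeatedly, refining "
            "the query, until you have the exact context needed for the task."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Topic, keyword, or symbol to look up.",
                },
                "top_k": {
                    "type": "integer",
                    "description": "Number of documentation chunks to return.",
                },
            },
            "required": ["query"],
        },
    },
}
```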

You already have a pattern of input → desired output

What could enhance this is input → tool call → tool return → tool call → tool return → output.
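In the fine-tuning JSONL, that longer pattern is just extra turns in the `messages` array. A sketch of a single training line, with made-up content ("MyLang" is a stand-in for your language, and the tool-call fields have changed across API versions, so check the current data-format docs):

```python
import json

example = {
    "messages": [
        {"role": "system", "content": "You write MyLang code. Look up the docs before answering."},
        {"role": "user", "content": "Write a MyLang function that reverses a list."},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "search_language_docs",
                    "arguments": json.dumps({"query": "list reverse builtin"}),
                },
            }],
        },
        {
            "role": "tool",
            "tool_call_id": "call_1",
            "content": "lists.reverse(xs) returns a new list in reverse order ...",
        },
        {"role": "assistant", "content": "fn reverse_demo(xs) { return lists.reverse(xs) }"},
    ],
    "tools": [search_docs_tool],  # the declaration sketched above
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```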

The function calls could be a synthetic yet automatic addition to your training set once you have built the knowledge base: run an AI turned into such a programmer by system prompt alone, instructed that it must build a full solution from what the tool teaches it. It goes off, makes the calls, and gets the documentation back. The final answer (which may or may not be good) is discarded, because we only need to capture the tool calling and the retrieved knowledge as an example of how production will be powered. Then insert that sequence of function calling into your training file as fine-tuning turns.
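A rough sketch of that capture loop, assuming the tool declared above, the openai SDK, and a `run_vss_search` helper standing in for your own vector search (both names are hypothetical):

```python
import json
from openai import OpenAI

client = OpenAI()

def capture_tool_trace(user_msg: str, max_rounds: int = 5) -> list[dict]:
    """Let a strong, prompt-only model research the docs via the tool, and
    keep only the tool-call / tool-return turns it produced along the way."""
    messages = [
        {"role": "system", "content": "You are a MyLang programmer. You must "
         "ground every answer in documentation fetched with search_language_docs."},
        {"role": "user", "content": user_msg},
    ]
    for _ in range(max_rounds):
        resp = client.chat.completions.create(
            model="gpt-4o",                    # a strong model for data generation
            messages=messages,
            tools=[search_docs_tool],
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:                 # final answer reached; discard it
            break
        messages.append(msg.model_dump(exclude_none=True))
        for call in msg.tool_calls:
            query = json.loads(call.function.arguments)["query"]
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": run_vss_search(query),  # hypothetical: your VSS retrieval
            })
    # Everything after the system + user turns is the captured tool trace.
    return messages[2:]
```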

You can see that in this case, the fine-tuning just adds another layer of text prediction on top of what the normal AI could already produce from reading the context. That's the best use of fine-tuning I can imagine for such an uphill battle.

I have already tried this, but it produces too many errors. I've created a loop that walks through my documentation, extracts RAG data with VSS, and generates training snippets, but it's not producing good enough results …

```mermaid
flowchart TB
    %% Layout direction: Top to Bottom
    direction TB

    %% Step 1: Existing FT Pattern
    subgraph S1["Step 1: Existing FT Pattern"]
        A1[FT System Msg] --> B1[User Message] --> C1[Desired Output]
    end

    %% Step 2: Tool Use with RAG
    subgraph S2["Step 2: Tool Use with RAG"]
        A2[Tool-Use System Msg] --> B2[User Message] --> C2[Tool Call] --> D2[Tool Return] --> E2[Discarded Output]
    end

    %% Step 3: Constructed Training File
    subgraph S3["Step 3: Constructed Training File"]
        A3[FT System Msg] --> B3[User Message] --> C3[Tool Call] --> D3[Tool Return] --> E3[Desired Output]
    end

    %% Copy arrows from Step 1
    A1 -.-> A3
    B1 -.-> B2
    B2 -.-> B3
    C1 -.-> E3

    %% Copy arrows from Step 2
    C2 -.-> C3
    D2 -.-> D3
```
Or just the source as text, 'cause that mermaid chart renders small here.

I suggest that your high-quality RAG, driven by a high-quality tool-using AI with a good prompt, would generate the tool-call and tool-return data you need to insert into your training file. Real data.
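Concretely, Step 3 of the chart is just splicing that captured trace between your existing input and your existing desired output. A sketch, reusing the hypothetical `capture_tool_trace` and `search_docs_tool` from the earlier blocks:

```python
import json

def build_training_line(ft_system: str, user_msg: str, desired_output: str) -> str:
    """Step 3: FT system msg + user msg + captured tool turns + your own
    desired output (the generator's final answer was already discarded)."""
    tool_turns = capture_tool_trace(user_msg)
    line = {
        "messages": [
            {"role": "system", "content": ft_system},
            {"role": "user", "content": user_msg},
            *tool_turns,
            {"role": "assistant", "content": desired_output},
        ],
        "tools": [search_docs_tool],
    }
    return json.dumps(line)
```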

Two layers of bad results: maybe better?
