Hi!
I’d like to develop a model capable of generating XML for an internal configuration format from instructions of around 300 characters. General-purpose LLMs such as GPT give middling results, even when I include 5 examples in the prompt.
So I thought I’d have to fine-tune a model to get better performance.
I have a database of about 200 examples, plus a configuration document of about 90 pages. The whole thing represents 150,000 tokens.
Do you think fine-tuning is the best option? If so, do you have any advice on choosing hyperparameters (learning rate, number of epochs, how much validation data to set aside, etc.)?
Should I turn to another solution?
Thanks!
Also consider evals. I’m not suggesting they will solve this on their own, but they’re not on your list.
Thank you, I will also use evals to measure the improvement. It sounds like you think fine-tuning is a good choice for my problem, but before paying for fine-tuning I want to know how to choose my parameters. Also, do I need to provide examples in the prompts of the training dataset, since I am using a chat model?
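For what it’s worth, in the chat fine-tuning format each training record is one JSON object per line (JSONL) containing a `messages` list; the completion you want the model to learn goes in the assistant turn, not as few-shot examples in the prompt. A minimal sketch of one record — the system text, instruction, and XML below are placeholders, not your actual configuration:

```python
import json

# One training record in the chat fine-tuning format (JSONL: one JSON
# object per line). The instruction and XML are hypothetical placeholders.
record = {
    "messages": [
        {"role": "system",
         "content": "You convert short instructions into our internal XML configuration."},
        {"role": "user",
         "content": "Enable logging on node A with a 30s rotation interval."},
        {"role": "assistant",
         "content": '<config><node id="A"><logging enabled="true" rotation="30s"/></node></config>'},
    ]
}

# Serialize to a single line of the .jsonl training file.
line = json.dumps(record)
```

Each of your 200 examples would become one such line, with the same system message repeated so the deployed prompt matches the training distribution.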
FYI
I do not do fine-tuning, so I cannot give any feedback on that part.
I haven’t personally used fine-tuning, but I do have extensive experience with prompt engineering, and I believe it could be a great fit for your use case.
Based on your post, it’s not entirely clear whether you’re already using prompt-engineering techniques or just relying on few-shot prompting.
With well-crafted instructions and chain-of-thought prompting, there’s a good chance you could achieve good results without needing fine-tuning.
I am already using the usual prompt-engineering techniques: well-crafted instructions, chain-of-thought prompting, and few-shot prompting. The results are far better than with a simple prompt, but still not good enough. That’s why I want to fine-tune the model, unless there is something else I can do to improve it.