How do I know if my fine-tuned model is actually better than the base model? (For MATH-related use cases)

For my specific use case, I want to create a fine-tuned model from either GPT-3.5 Turbo or GPT-4 with vision and OCR capabilities, so that the model can “understand” whatever math problems it is fed, produce correct answers, and generate presentable graphics to illustrate its points (I’m not expecting much here… well, maybe someone has already done this).

Source: https://openreview.net/pdf?id=E4hK8t7Fts

These are the questions I need answered:

  1. After fine-tuning, how do I evaluate whether the fine-tuned model is actually better IF the results fluctuate? I mean, the bot can be right sometimes and wrong other times, or in the worst case it doesn’t answer explicitly and just apologizes. (See the evaluation sketch after this list.)

  2. Can I take an existing math-focused system (like Wolfram, which as I understand it is a computational engine rather than an LLM) and integrate it with GPT-3.5 or GPT-4? (See the function-calling sketch after this list.)

  3. How can GPT-4 (via the API) produce math-related answers to users’ questions, especially statistical calculations with clear graphical representations? And when users input pictures or handwriting, I’d expect the bot to produce pictures that represent its answer. (See the plotting sketch after this list.)
    So far, I’ve had no luck producing correct and presentable answers, even for primary-grade math/statistics… not to mention my target of PhD-level math.
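
For question 1, here is a minimal sketch of the kind of evaluation harness I have in mind: since outputs fluctuate, run both models over the same fixed test set several times and compare mean accuracy and spread. Everything here is a placeholder I made up for illustration (`TEST_SET`, `grade`, the wrapper callables), not an established benchmark:

```python
import statistics

# Hypothetical test set of (problem, expected_answer) pairs.
TEST_SET = [
    ("What is 17 * 24?", "408"),
    ("What is the mean of 2, 4, 6, 8?", "5"),
]

def grade(model_answer: str, expected: str) -> bool:
    # Naive containment grading; real math grading needs answer
    # normalization (e.g., "408" vs. "408.0") or a symbolic checker.
    return expected in model_answer

def accuracy_over_runs(ask_model, n_runs: int = 5) -> list[float]:
    """Run the whole test set n_runs times; return per-run accuracy.

    ask_model: callable(problem) -> answer string (wraps your API call).
    """
    per_run = []
    for _ in range(n_runs):
        correct = sum(grade(ask_model(q), a) for q, a in TEST_SET)
        per_run.append(correct / len(TEST_SET))
    return per_run

def compare(base_ask, tuned_ask, n_runs: int = 5) -> None:
    base = accuracy_over_runs(base_ask, n_runs)
    tuned = accuracy_over_runs(tuned_ask, n_runs)
    print(f"base:  mean={statistics.mean(base):.3f}  stdev={statistics.stdev(base):.3f}")
    print(f"tuned: mean={statistics.mean(tuned):.3f}  stdev={statistics.stdev(tuned):.3f}")
```

With enough runs you could also apply a significance test (e.g., a paired test per problem) instead of eyeballing the means, and pinning temperature low reduces the fluctuation itself.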
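For question 2, even though Wolfram|Alpha isn’t an LLM, it can be wired in through the chat API’s function calling so GPT delegates the actual computation. A hedged sketch, assuming the `openai` Python SDK (v1+) and Wolfram|Alpha’s Short Answers endpoint; the AppID and `ask` wrapper are placeholders of my own:

```python
import json

import requests
from openai import OpenAI

client = OpenAI()

# Wolfram|Alpha "Short Answers" API (requires your own AppID).
WOLFRAM_URL = "https://api.wolframalpha.com/v1/result"
WOLFRAM_APPID = "YOUR_APPID"  # placeholder

TOOLS = [{
    "type": "function",
    "function": {
        "name": "wolfram_query",
        "description": "Evaluate a math expression or query with Wolfram|Alpha.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def wolfram_query(query: str) -> str:
    r = requests.get(WOLFRAM_URL, params={"appid": WOLFRAM_APPID, "i": query})
    return r.text

def ask(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    resp = client.chat.completions.create(model="gpt-4", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if msg.tool_calls:  # the model chose to delegate the math
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": wolfram_query(args["query"]),
            })
        # Second pass: let GPT phrase the tool result as the final answer.
        resp = client.chat.completions.create(model="gpt-4", messages=messages)
        msg = resp.choices[0].message
    return msg.content
```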
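For question 3, what seems more reliable than expecting the model to output images directly is having it emit plotting code and rendering that yourself. A minimal sketch of that pattern (the prompt and `answer_with_chart` are my own invention, and `exec()` on model output is unsafe outside a sandbox):

```python
import re

from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Answer the user's math/statistics question. Then include a Python "
    "matplotlib snippet inside a ```python fence that saves a chart "
    "illustrating the answer to 'figure.png'."
)

def answer_with_chart(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    text = resp.choices[0].message.content
    match = re.search(r"```python\n(.*?)```", text, re.DOTALL)
    if match:
        # WARNING: executing model-generated code; sandbox this in production.
        exec(match.group(1), {})
    return text
```

This way matplotlib does the precise rendering, and the model only has to get the numbers and the code right.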

Thank you so much in advance.

Best wishes,