How do I provide context for fine-tuning?

I have about 50k labeled entries I want to use to fine-tune GPT for a classification task. The issue is that the entries use terse metric names that need context for GPT to better understand the problem:

PCOUNTM = 0.34
PCOUNTHUM = 0.24
PSURFCOUNT = 1.2

These are domain-specific metrics. For instance, PCOUNTM is pollen count, PCOUNTHUM is humidity-adjusted, etc.
There are 100 or so highly domain-specific metrics that need explanations and units. I feel like that would improve GPT's ability to classify the entries, but I don't know how to supply the explanations as context for the fine-tuning. It would be quite expensive to just prepend the explanations to each row in my data and then fine-tune, and I think that would exceed a token limit as well.
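To be concrete, the naive approach I'd like to avoid would repeat the explanations in every training example, roughly like this (a sketch in the chat fine-tuning JSONL format; the descriptions, units, and the "healthy" label are just placeholders, not my real data):

```python
import json

# Hypothetical glossary -- descriptions and units here are placeholders only.
GLOSSARY = (
    "PCOUNTM: pollen count (grains/m^3). "
    "PCOUNTHUM: humidity-adjusted pollen count (grains/m^3). "
    "PSURFCOUNT: surface particle count (particles/cm^2)."
)

def to_finetune_row(entry: dict, label: str) -> str:
    """Build one chat-format fine-tuning example with the glossary prepended."""
    messages = [
        {"role": "system", "content": "Classify the sample using these metrics. " + GLOSSARY},
        {"role": "user", "content": "\n".join(f"{k} = {v}" for k, v in entry.items())},
        {"role": "assistant", "content": label},
    ]
    return json.dumps({"messages": messages})

# Repeating the glossary in every one of ~50k rows is what makes this expensive.
print(to_finetune_row({"PCOUNTM": 0.34, "PCOUNTHUM": 0.24, "PSURFCOUNT": 1.2}, "healthy"))
```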

How could I achieve this?

edit: clean up incorrect naming

Quite frankly: LLMs can't do math.

And “ChatGPT” is a chatbot on a website, not a fine-tunable base model.

If you have 50 “healthy” samples and 50 “dangerous” samples, could the language model possibly understand why they are ranked that way? Probably not. You get a random word maker, a dice roller.

Sounds like you could probably just do the math yourself and use the values with a vector database, if they are all in the same format:

  • dimension = value / standard deviation × importance

  • classify 100 clear examples yourself into documents that also state the classification.

  • retrieve the top-5 matches for your unknown entry from the vector database and extract the classification (rough sketch below).
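A minimal sketch of that idea, with a plain nearest-neighbor search standing in for the vector database; the standard deviations and importance weights are made-up placeholders you'd replace with your own statistics:

```python
import numpy as np
from collections import Counter

# Placeholder per-metric statistics -- substitute your own.
METRICS = ["PCOUNTM", "PCOUNTHUM", "PSURFCOUNT"]
STD = np.array([0.50, 0.30, 0.80])
IMPORTANCE = np.array([1.0, 1.0, 2.5])

def to_vector(entry: dict) -> np.ndarray:
    """Each dimension = value / standard deviation * importance."""
    values = np.array([entry[m] for m in METRICS])
    return values / STD * IMPORTANCE

# ~100 hand-classified reference entries stand in for the vector database.
reference = [
    ({"PCOUNTM": 0.10, "PCOUNTHUM": 0.08, "PSURFCOUNT": 0.40}, "healthy"),
    ({"PCOUNTM": 0.90, "PCOUNTHUM": 0.75, "PSURFCOUNT": 2.30}, "dangerous"),
    # ... the rest of your hand-classified examples ...
]
ref_vectors = np.stack([to_vector(e) for e, _ in reference])
ref_labels = [label for _, label in reference]

def classify(entry: dict, k: int = 5) -> str:
    """Retrieve the top-k nearest references and take the majority classification."""
    v = to_vector(entry)
    distances = np.linalg.norm(ref_vectors - v, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(ref_labels[i] for i in nearest).most_common(1)[0][0]

print(classify({"PCOUNTM": 0.34, "PCOUNTHUM": 0.24, "PSURFCOUNT": 1.2}))
```

With an actual vector database you'd store the same weighted vectors and use its top-k query instead of the brute-force distance computation.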


Understood. I was hoping an LLM could figure out some connections between the values and classify them. It is a complex system, and if I had the ‘importance’ values the task would be trivial. Our first solution was training a NN, which worked to some degree. From what you are saying, we should probably stick with our initial model, as the OpenAI models are not a good match for this task, at least the LLM ones I am aware of.

I just added an importance scalar for when you know a measurement matters more to the classification, like if you think “2.5 particulate count” is more important to classifying health than “pollen”.
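For example (made-up numbers purely to illustrate the weighting): if particulate count has a standard deviation of 0.4 and you give it an importance of 2.5, a reading of 1.2 contributes a dimension of 1.2 / 0.4 × 2.5 = 7.5, whereas the same reading with importance 1.0 would only contribute 3.0, so the weighted metric pulls the nearest-neighbor distance much harder.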