I want to build an application using the GPT API to identify specific elements (e.g., equity and inclusion language and practice) in documents. I have a list of questions that point toward these elements, which I'd like GPT to answer. Through in-context learning it does a decent job, but it still makes mistakes sometimes. If I were to fine-tune GPT for this task, how should I prepare the training dataset? Should I provide examples of inclusive language and practice and upload them to GPT? What would the "messages" look like? I am a bit confused about how to do this.
For fine-tuning you use input-output pairs as training examples. So if you were to put together a dataset, your input (the user message) could be a text snippet that you wish to evaluate against your DEI language requirements, and your output (the assistant message) could be an evaluation or statement of whether the requirements are met.
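As a rough sketch, each line of your training file would be one JSON object in the chat fine-tuning format. The system prompt, snippets, and evaluations below are made-up placeholders just to show the shape of the data; here written out with a small Python script:

```python
import json

# Hypothetical examples: each pairs a text snippet (user message) with
# the evaluation you want the model to produce (assistant message).
training_examples = [
    {
        "messages": [
            {"role": "system", "content": "You evaluate text for equity and inclusion language and practice."},
            {"role": "user", "content": "We welcome applicants of all backgrounds and provide accommodations on request."},
            {"role": "assistant", "content": "Meets requirements: uses inclusive language and names a concrete accommodation practice."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You evaluate text for equity and inclusion language and practice."},
            {"role": "user", "content": "Candidates must be a strong culture fit."},
            {"role": "assistant", "content": "Does not meet requirements: no inclusive language, and 'culture fit' is vague."},
        ]
    },
]

# Fine-tuning expects JSONL: one JSON object per line.
with open("training_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```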
There are different ways you could approach this problem, but this would be one way to do it with fine-tuning.
You can find general guidance on fine-tuning here: https://platform.openai.com/docs/guides/fine-tuning
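If it helps, uploading the file and starting a job with the openai Python SDK looks roughly like this (the model name is just an example; check which models currently support fine-tuning):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL training file prepared above.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job; the returned job can be polled for status.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # example model; pick one that supports fine-tuning
)
print(job.id)
```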