Fine-tuning an OpenAI GPT-3 Model via API for Company Language

p.bleyleven · May 2, 2023, 2:02pm

Hello everyone,

I have a project where I want to fine-tune an OpenAI GPT-3 model via API. My aim is to create a model that follows the language of my company. To achieve this, I want to fine-tune it with many texts, but I do not have a specific prompt for the texts.

My question is whether I can simply upload the texts to the API for fine-tuning, and whether it’s better to upload them as a txt file or a JSON file. I would appreciate any advice or recommendations from the community on how to proceed with this task.

Thank you for your help!

rex.vanhorn · July 14, 2023, 1:12pm

Hello, @p.bleyleven
I’ve worked on this application myself. As far as I understand, you have to create the JSONL file for fine-tuning, but you leave the prompt field blank and fill the completion field with your text.
Output from this model will be in the style of your training text.
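
To make the file layout concrete, here is a minimal sketch of the empty-prompt approach, assuming the legacy prompt/completion JSONL format. The function names are made up for illustration, and the leading space on each completion is a convention I am assuming rather than something confirmed in this thread.

```python
import json


def build_style_records(texts):
    """Turn raw company texts into empty-prompt training records.

    build_style_records is a hypothetical helper, not part of any library.
    """
    records = []
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            continue  # skip empty documents
        # Assumption: a leading space on the completion, a common convention
        # for prompt/completion fine-tuning data.
        records.append({"prompt": "", "completion": " " + cleaned})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    sample_texts = [
        "Our team is delighted to welcome you on board.",
        "We build tools that keep your data safe.",
    ]
    print(to_jsonl(build_style_records(sample_texts)))
```

Writing the returned string to a file such as company_style.jsonl (an example name) gives one record per line, which is the file you would upload for the fine-tune.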
Good luck!

curt.kennedy · July 14, 2023, 1:58pm

If you want the model to repeat facts and knowledge of your company, you need to use embeddings.

If you want the responses to have the same general tone and language as your company documents, you need to create prompt-completion pairs: take chunks of your data, use an AI model to neutralize the text, and then set the prompt and completion for the fine-tune as:

{"prompt": <neutralized text>, "completion": <original text>}

This technique will extract the “personality” of the company, while embeddings will extract the “knowledge” of the company.

Do both if you want both features.
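
A rough sketch of how the tone pairs could be assembled, under a few assumptions: chunks are split on blank lines, and the neutralizing step is passed in as a plain function. Here neutralize is a hypothetical stand-in for a call to whatever model you use to flatten the tone, and chunk_paragraphs and build_tone_pairs are illustrative names, not an existing API.

```python
import json
from typing import Callable, Dict, List


def chunk_paragraphs(text: str, max_chars: int = 1000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of about max_chars.

    A single paragraph longer than max_chars is kept whole, not split.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = current + "\n\n" + para if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_tone_pairs(
    text: str,
    neutralize: Callable[[str], str],
    max_chars: int = 1000,
) -> List[Dict[str, str]]:
    """Pair each original chunk (completion) with its neutralized form (prompt)."""
    return [
        {"prompt": neutralize(chunk), "completion": chunk}
        for chunk in chunk_paragraphs(text, max_chars)
    ]


if __name__ == "__main__":
    # Toy neutralizer for demonstration only; a real one would call a model.
    def toy_neutralize(chunk: str) -> str:
        return chunk.lower().replace("!", ".")

    doc = "We LOVE our customers!\n\nCall us anytime, day or night!"
    for pair in build_tone_pairs(doc, toy_neutralize, max_chars=30):
        print(json.dumps(pair, ensure_ascii=False))
```

Passing the neutralizer in as an argument keeps the pairing logic testable without any API calls; you can swap in the real model call later.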

rex.vanhorn · July 14, 2023, 2:55pm

Hi, @curt.kennedy
This is awesome!! Do you know if there is any study or paper published on this technique? (I’m happy if a technique just works, but my professors want a little more background research.)

curt.kennedy · July 14, 2023, 2:59pm

No paper for tone, other than what was mentioned in this post.

But try it and see what you get. I have found that these days GPT-4 has the best performance at neutralizing tone. Turbo gets apologetic and adds way too much noise instead of following the directions.
