I fine-tuned a model, but I can’t compare its performance on the OpenAI platform because it says the selected model is unavailable. How can I fix this?
Hi again!
The davinci-002 model alongside other legacy models was deprecated in January this year. Fine-tuned davinci models were also affected by this deprecation and are therefore no longer available.
See also here: https://platform.openai.com/docs/deprecations
davinci-002 … is a current base model.
It just stinks compared to the real GPT-3 175B-parameter davinci that was removed.
It also doesn’t work in “chat” - it needs completions-style prompt/separator/stop training, and inference on the completions endpoint.
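For what it’s worth, inference against that kind of fine-tune looks roughly like this - a minimal sketch assuming the current v1.x openai Python SDK, where the model name, separator, and stop string are placeholders you’d replace with whatever your own training run used:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt ends with the same separator used in training, and the stop
# sequence matches whatever end marker the completions were trained with.
response = client.completions.create(
    model="ft:davinci-002:your-org::example123",  # hypothetical fine-tune name
    prompt="[student essay]\n\n###\n\n",          # "###" = example separator
    max_tokens=200,
    temperature=0,
    stop=["END"],                                 # example stop marker
)
print(response.choices[0].text)
```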
You are right. It’s a bit confusing.
Could the issue the OP has flagged then be related to the legacy fine-tuning endpoint?
2023-08-22: Fine-tunes endpoint
On August 22nd, 2023, we announced the new fine-tuning API (/v1/fine_tuning/jobs) and that the original /v1/fine-tunes API along with legacy models (including those fine-tuned with the /v1/fine-tunes API) will be shut down on January 04, 2024. This means that models fine-tuned using the /v1/fine-tunes API will no longer be accessible and you would have to fine-tune new models with the updated endpoint and associated base models.
Source: https://platform.openai.com/docs/deprecations/2023-08-22-fine-tunes-endpoint
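If you do re-train against the new endpoint, job creation looks roughly like this - a sketch assuming the v1.x openai Python SDK, with a placeholder filename:

```python
from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file, then start a job against a current base model.
# "training_data.jsonl" is a placeholder filename.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # or "davinci-002" for a completions-style fine-tune
)
print(job.id, job.status)
```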
The comparison playground tells you what it is trying to do by the presence of the “system” prompt input box: it is a chat completions interface, and it sends to a different API endpoint than the one required for davinci-002 or other strictly completion models.
The Absence of the Original Poster…
I appreciate the help you have provided so far. I have a new issue that I need assistance with. I am attempting to fine-tune gpt-3.5 using the data I previously used to fine-tune davinci. However, I encountered an error message indicating a problem with the data structure. Does the data structure required for fine-tuning differ between davinci and gpt-3.5? If so, how can I convert it to the appropriate format?
There will be a significant change in adapting your input to gpt-3.5-turbo, because it is in fact a chat model that comes with extensive pretraining in “how to chat” using the chat containers.
Whereas completions simply takes a bare “prompt” and the AI continues writing the language that would appear after it (the actual prompting being some signal, like training on the word “assistant:”, to make it write as an entity), chat completions fine-tuning is done with a JSONL file whose lines contain messages like the chat completions messages you normally send to the model, using the system, user, and assistant roles - where assistant is the output desired from the sequence of system and user messages.
That will mean adapting not just how you format the file, but also how you expect the training to work. The AI model already comes with a self-identity and a chatting skill.
Documentation of the file to create is under “documentation” in the forum’s sidebar.
I think that the fine-tuned models using the /v1/chat/completions endpoint and the fine-tuned models using the /v1/completions (Legacy) endpoint (davinci-002) cannot be compared in the Playground.
It’s confusing though, because both appear in the Compare section of the Playground when fine-tuned.
Yeah and Jay made a good point about the compare functionality only being available for chat models given the interface relies on the system and user message structure. I didn’t even realize it until Jay pointed it out.
Technically, one should not even be able to set a regular completions model from the drop-down.
Thanks again for these useful responses. What I am trying to achieve is to fine-tune the model to score a student essay and give feedback.
This is the training file structure I used to fine-tune davinci:
{"prompt": "Task: [task description] | Instructions: [essay instructions for students] | Performance Indicators: [criteria for holistic essay scoring] | Essay: [student essay].", "completion": "Score: [numeric score assigned by human raters], Feedback: [feedback given by human raters]."}
I get consistent and accurate scoring, but the feedback is still not of the desired quality.
However, I wanted to see the performance of gpt-3.5, and the above file structure doesn’t work with it.
No, it won’t work that way. As pointed out by @_j , you’d have to use the chat completions structure. For training data, it looks as follows:
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
Source: https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
One way to approach it in your specific case is to move the task description, instructions and performance indicators into the system message (assuming these would remain static) and then include the essay as the input for the user message. The assistant response would then be the score along with the feedback.
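If it helps, here is a rough sketch of that conversion, assuming your old file is prompt/completion JSONL like the line you posted - the filenames, the prompt parsing, and the static system text are placeholders you would adapt:

```python
import json

# Hypothetical static instructions that previously lived inside every prompt.
SYSTEM_MESSAGE = (
    "Task: [task description]\n"
    "Instructions: [essay instructions for students]\n"
    "Performance Indicators: [criteria for holistic essay scoring]"
)

def convert_record(old: dict) -> dict:
    """Map one legacy prompt/completion record to a chat-format record."""
    # Assumes the essay is the last "Essay:" segment of the old prompt;
    # adjust the split to match how your prompts were actually built.
    essay = old["prompt"].split("Essay:")[-1].strip()
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": essay},
            {"role": "assistant", "content": old["completion"].strip()},
        ]
    }

with open("davinci_training.jsonl") as src, open("gpt35_training.jsonl", "w") as dst:
    for line in src:
        if line.strip():
            dst.write(json.dumps(convert_record(json.loads(line))) + "\n")
```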
This URL might also be helpful.
So, I will move the task description, instructions and performance indicators to the system message. Then, should I separate them by using “|” within the text? Like:
{"messages": [{"role": "system", "content": "Task: [text] | Instructions: [text] | Performance Indicators: [text]"}, {"role": "user", "content": "[student essay]"}, {"role": "assistant", "content": "Score: [numeric score] | Feedback: [assistant feedback]."}]}
I probably would not use “|” as a delimiter; instead, I’d use a comma or semicolon to separate the main inputs, along with other delimiters such as ### or XML tags. For the assistant message you can just choose whatever output format you desire.
So for example:
{"messages": [{"role": "system", "content": "###Task: [text]###, ###Instructions: [text]###, ###Performance Indicators: [text]###"}, {"role": "user", "content": "[student essay]"}, {"role": "assistant", "content": "Score [numeric score], Feedback [assistant feedback]."}]}
I think it would be better to describe the task in the system message and then structure the user and assistant messages in a more natural chat format, rather than using “|” to separate them. It may take some trial and error to get it right.
I will show you an alternate take.
The cost of inference on fine-tuned models is significantly higher, on top of the cost of training adequately.
Here instead is multi-shot, untrained davinci-002 (using 20 GPT-4 synthetic examples). The context alone has taught the AI how to respond; there are no other instructions.
(I had the AI actually use Python as a random number generator. The poor student answers shown are randomness that you’d probably want to make supervised pseudo-random.)
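For illustration, the prompt construction for that kind of multi-shot completions call might look something like this - a sketch assuming the v1.x openai Python SDK, where the example essays, scores, feedback, and stop sequence are entirely made up:

```python
from openai import OpenAI

client = OpenAI()

# Worked examples are placed directly in the context; the base model just
# continues the pattern. The rubric, essays, and scores here are made up.
few_shot = """Essay: [example essay 1]
Score: 4, Feedback: Clear thesis, but weak supporting evidence.

Essay: [example essay 2]
Score: 2, Feedback: Off-topic in places and lacks paragraph structure.

Essay: {essay}
Score:"""

response = client.completions.create(
    model="davinci-002",                 # untuned base completion model
    prompt=few_shot.format(essay="[new student essay]"),
    max_tokens=150,
    temperature=0,
    stop=["\n\nEssay:"],                 # stop before the pattern repeats
)
print("Score:" + response.choices[0].text)
```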