I fine-tuned a model, but I can’t compare its performance on the OpenAI platform because it says the selected model is unavailable. How can I fix this?
Hi again!
The davinci-002 model alongside other legacy models was deprecated in January this year. Fine-tuned davinci models were also affected by this deprecation and are therefore no longer available.
See also here: https://platform.openai.com/docs/deprecations
davinci-002 … is a current base model.
It just stinks compared to the real GPT-3 175B-parameter davinci that was removed.
It also doesn’t work in “chat” - it needs completions-style prompt/separator/stop training, and inference on the completions endpoint.
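For what it’s worth, inference against that kind of fine-tune looks roughly like this - a minimal sketch assuming the current v1.x openai Python SDK, where the model name, separator, and stop string are placeholders you’d replace with whatever your own training run used:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt ends with the same separator used in training, and the stop
# sequence matches whatever end marker the completions were trained with.
response = client.completions.create(
    model="ft:davinci-002:your-org::example123",  # hypothetical fine-tune name
    prompt="[student essay]\n\n###\n\n",          # "###" = example separator
    max_tokens=200,
    temperature=0,
    stop=["END"],                                 # example stop marker
)
print(response.choices[0].text)
```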
You are right. It’s a bit confusing.
Could the issue the OP has flagged then be related to the legacy fine-tuning endpoint?
2023-08-22: Fine-tunes endpoint
On August 22nd, 2023, we announced the new fine-tuning API (/v1/fine_tuning/jobs) and that the original /v1/fine-tunes API along with legacy models (including those fine-tuned with the /v1/fine-tunes API) will be shut down on January 04, 2024. This means that models fine-tuned using the /v1/fine-tunes API will no longer be accessible and you would have to fine-tune new models with the updated endpoint and associated base models.
Source: https://platform.openai.com/docs/deprecations/2023-08-22-fine-tunes-endpoint
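If you do re-train against the new endpoint, job creation looks roughly like this - a sketch assuming the v1.x openai Python SDK, with a placeholder filename:

```python
from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file, then start a job against a current base model.
# "training_data.jsonl" is a placeholder filename.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # or "davinci-002" for a completions-style fine-tune
)
print(job.id, job.status)
```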
The comparison playground tells you what it is trying to do by the presence of the “system” prompt input box: it is a chat completions interface, and it sends to a different API endpoint than the one required for davinci-002 or other strictly completion models.
The Absence of the Original Poster…
I appreciate the help you have provided so far. I have a new issue that I need assistance with. I am attempting to fine-tune gpt-3.5 using the data I previously used to fine-tune davinci. However, I encountered an error message indicating a problem with the data structure. Does the data structure required for fine-tuning differ between davinci and gpt-3.5? If so, how can I convert it to the appropriate format?
There will be a significant change in adapting your input to gpt-3.5-turbo, because it is in fact a chat model that comes with extensive pretraining in “how to chat” using the chat containers.
Whereas completions simply takes a bare “prompt” and the AI continues writing the language that would appear after it (the actual prompting being some signal, like training on the word “assistant:”, to make it write as an entity), chat completions fine-tuning is done with a JSONL file whose lines contain messages like the chat completions messages you normally send to the model, using the system, user, and assistant roles - where assistant is the output desired from the sequence of system and user messages.
That will mean adapting not just how you format the file, but also how you expect the training to work. The AI model already comes with a self-identity and a chatting skill.
Documentation of the file to create is under “documentation” in the forum’s sidebar.
I think that the fine-tuned models using the /v1/chat/completions endpoint and the fine-tuned models using the /v1/completions (Legacy) endpoint (davinci-002) cannot be compared in the Playground.
It’s confusing though, because both appear in the Compare section of the Playground when fine-tuned.
Yeah and Jay made a good point about the compare functionality only being available for chat models given the interface relies on the system and user message structure. I didn’t even realize it until Jay pointed it out.
Technically, one should not even be able to set a regular completions model from the drop-down.
Thanks again for these useful responses. What I am trying to achieve is to fine-tune the model to score a student essay and give feedback.
This is the training file structure I used to fine-tune davinci:
{"prompt": "Task: [task description] | Instructions: [essay instructions for students] | Performance Indicators: [criteria for holistic essay scoring] | Essay: [student essay].", "completion": "Score: [numeric score assigned by human raters], Feedback: [feedback given by human raters]."}
I get consistent and accurate scoring, but the feedback is still not of the desired quality.
However, I wanted to see the performance of gpt-3.5, and the above file structure doesn’t work with it.
No, it won’t work that way. As pointed out by @_j , you’d have to use the chat completions structure. For training data, it looks as follows:
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
Source: https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
One way to approach it in your specific case is to move the task description, instructions and performance indicators into the system message (assuming these would remain static) and then include the essay as the input for the user message. The assistant response would then be the score along with the feedback.
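If it helps, here is a rough sketch of that conversion, assuming your old file is prompt/completion JSONL like the line you posted - the filenames, the prompt parsing, and the static system text are placeholders you would adapt:

```python
import json

# Hypothetical static instructions that previously lived inside every prompt.
SYSTEM_MESSAGE = (
    "Task: [task description]\n"
    "Instructions: [essay instructions for students]\n"
    "Performance Indicators: [criteria for holistic essay scoring]"
)

def convert_record(old: dict) -> dict:
    """Map one legacy prompt/completion record to a chat-format record."""
    # Assumes the essay is the last "Essay:" segment of the old prompt;
    # adjust the split to match how your prompts were actually built.
    essay = old["prompt"].split("Essay:")[-1].strip()
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": essay},
            {"role": "assistant", "content": old["completion"].strip()},
        ]
    }

with open("davinci_training.jsonl") as src, open("gpt35_training.jsonl", "w") as dst:
    for line in src:
        if line.strip():
            dst.write(json.dumps(convert_record(json.loads(line))) + "\n")
```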
This URL might also be helpful.
So, I will move the task description, instructions and performance indicators to the system message. Then, should I separate them by using “|” within the text? Like:
{"messages": [{"role": "system", "content": "Task: [text] | Instructions: [text] | Performance Indicators: [text]"}, {"role": "user", "content": "[student essay]"}, {"role": "assistant", "content": "Score: [numeric score] | Feedback: [assistant feedback]."}]}
I probably would not use “|” as a delimiter; instead, I’d use a comma or semicolon to separate the main inputs, along with other delimiters such as ### or XML tags. For the assistant message you can just choose whatever output format you desire.
So for example:
{"messages": [{"role": "system", "content": "###Task: [text]###, ###Instructions: [text]###, ###Performance Indicators: [text]###"}, {"role": "user", "content": "[student essay]"}, {"role": "assistant", "content": "Score [numeric score], Feedback [assistant feedback]."}]}
I think it would be better to describe the task in the system message and then structure the user and assistant messages in a more natural chat format, rather than using “|” to separate them. It may take some trial and error to get it right.
I will show you an alternate take.
The cost of inference on fine-tuned models is significantly higher, on top of the cost of training adequately.
Here instead is multi-shot, untrained davinci-002 (using 20 GPT-4 synthetic examples). The context alone has taught the AI how to respond; there are no other instructions.
(I had the AI actually use Python as a random number generator. The poor student answers shown are randomness that you’d probably want to make supervised pseudo-random.)
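For illustration, the prompt construction for that kind of multi-shot completions call might look something like this - a sketch assuming the v1.x openai Python SDK, where the example essays, scores, feedback, and stop sequence are entirely made up:

```python
from openai import OpenAI

client = OpenAI()

# Worked examples are placed directly in the context; the base model just
# continues the pattern. The rubric, essays, and scores here are made up.
few_shot = """Essay: [example essay 1]
Score: 4, Feedback: Clear thesis, but weak supporting evidence.

Essay: [example essay 2]
Score: 2, Feedback: Off-topic in places and lacks paragraph structure.

Essay: {essay}
Score:"""

response = client.completions.create(
    model="davinci-002",                 # untuned base completion model
    prompt=few_shot.format(essay="[new student essay]"),
    max_tokens=150,
    temperature=0,
    stop=["\n\nEssay:"],                 # stop before the pattern repeats
)
print("Score:" + response.choices[0].text)
```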