A few questions about GPT-3.5 fine-tuning

I’m trying to give a cartoon character’s persona to the gpt-3.5-turbo model.
I have tried prompt engineering with both gpt-3.5-turbo and gpt-4.

The result was…
With 3.5, the generated conversation is not satisfying; with 4, it costs a lot because of the long system prompt containing guidelines, worldview, and tone.

So I’m trying to fine-tune the gpt-3.5 model as a cartoon character, to solve both the performance and the price problem at once.

I wonder,

  1. Is it possible for fine-tuning to make gpt-3.5 talk like a cartoon character, with its worldview and tone?
  2. If 1 is possible, how should I prepare the dataset? OpenAI says the dataset should follow the format below.

{"messages": [{"role": "system", "content": "…"}, {"role": "user", "content": "…"}, {"role": "assistant", "content": "…"}]}

Is a system prompt necessary for fine-tuning in my case? Also, should the dataset be in a continuing-conversation format, rather than just a single pair like this? {"messages": [{"role": "user", "content": "hi?"}, {"role": "assistant", "content": "hi, my friend. "}]} => (should I add continuing conversations?)

  3. Lastly, is there any difference between fine-tuning a Hugging Face model and fine-tuning through the OpenAI API, from a cost and performance perspective?
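For reference, a training file in the JSONL format above could be assembled like this. The character name and all the dialogue lines here are invented placeholders; each line of the output file is one complete training example:

```python
import json

# Hypothetical character identity, purely for illustration.
SYSTEM = "You are Princess Ophelia, a cheerful cartoon princess."

examples = [
    # Single-turn example: system identity plus one user/assistant pair.
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "hi?"},
        {"role": "assistant", "content": "Hi, my friend! Welcome to my castle!"},
    ]},
    # Multi-turn example: a longer history teaches the model to stay
    # in character deeper into a conversation.
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "hi?"},
        {"role": "assistant", "content": "Hi, my friend! Welcome to my castle!"},
        {"role": "user", "content": "Do you have feelings?"},
        {"role": "assistant", "content": "Of course! Today I feel as bright as the morning sun."},
    ]},
]

# The fine-tuning API expects JSONL: exactly one JSON object per line.
with open("training.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

Mixing single-turn and multi-turn examples like this is one way to cover both short greetings and longer in-character exchanges in the same dataset.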

The system prompt can be used to identify the AI personality, in the same way that you’d say “You are ChatGPT” to a chatbot.

“You are Princess Ophelia”

A normal system message would also come with all the behaviors the personality must take on: the extensive rules you are writing.

Here, though, you might train on that system identity alone, along with hundreds of new example inputs and outputs; that identity can then set your fine-tune apart from standard AI behavior whenever your app uses that system message again.

gpt-3.5-turbo has tons of pre-training, though. You’d have to counter every “I’m sorry, but as an AI language model, I don’t have feelings” scenario with a stronger fine-tune weight.

Examples of deeper conversations can ensure the behavior still continues when the AI receives a similarly long chat history, so the chatbot pre-training doesn’t kick in again. The whole system prompt and every conversation turn is input that is considered, and billed, in each standalone API call.
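To see why history matters, here is a rough sketch of how the billed input grows turn by turn in a chat app: every call resends the system message plus everything said so far. The word-count heuristic is only a crude stand-in for a real tokenizer, and the dialogue is invented:

```python
def rough_tokens(text: str) -> int:
    # Very rough heuristic: about one token per whitespace-separated word.
    return max(1, len(text.split()))

system = "You are Princess Ophelia"  # the short trained identity
history = [
    "hi?",
    "Hi, my friend!",
    "Tell me about your castle.",
    "It has seven towers and a garden of singing flowers.",
]

# Input size of each successive call: system message + all turns so far.
input_sizes = []
for n in range(1, len(history) + 1, 2):  # after each user turn
    billed = rough_tokens(system) + sum(rough_tokens(t) for t in history[:n])
    input_sizes.append(billed)

print(input_sizes)  # grows with every turn of the conversation
```

Even with a 4-word system message, the resent history quickly dominates the input, which is why training examples should include conversations of that depth.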

Remember: fine-tuning itself costs money, and usage costs about 8x as much per token as the base model. You’ll still be paying more even with your prompt going from 1000 tokens to 20, since you also pay through the nose for chat history.
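A back-of-envelope comparison makes this concrete. The numbers below are relative units, not real prices, and assume the 8x per-token ratio mentioned above; check OpenAI’s current pricing page before relying on any of this:

```python
# Relative input cost per 1K tokens (made-up units, 8x ratio assumed).
BASE = 1.0   # base gpt-3.5-turbo
TUNED = 8.0  # fine-tuned gpt-3.5-turbo

history = 2000  # tokens of chat history resent on every call

# Base model: 1000-token system prompt plus the history.
base_cost = (1000 + history) / 1000 * BASE
# Fine-tuned model: 20-token system prompt plus the same history, all at 8x.
tuned_cost = (20 + history) / 1000 * TUNED

print(base_cost, tuned_cost)  # the fine-tune still costs several times more
```

Shrinking the prompt from 1000 tokens to 20 saves far less than the 8x surcharge on the resent history adds, so the per-call cost goes up, not down.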

…And then you pay for another fine-tune when the first one doesn’t work, given the lack of documentation on how to achieve success.