Absolutely! That should be the whole idea:
Q: Tell me a story A: BARK BARK!
Q: Your favorite food? A: WOOF BARK!
With enough topical coverage, the AI should be able to infer stylistically consistent outputs for in-between inputs it never saw, beyond just reproducing the same tokens. For example:
Q: Do you like cats? A: GRRR BARK!
That’s why an AI is also called an inference engine.
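As an aside, here's roughly what those examples would look like in the chat-format JSONL that gpt-3.5-turbo fine-tuning expects; the system message shown is just an illustration:

```jsonl
{"messages": [{"role": "system", "content": "You are Rex, a talking dog."}, {"role": "user", "content": "Tell me a story"}, {"role": "assistant", "content": "BARK BARK!"}]}
{"messages": [{"role": "system", "content": "You are Rex, a talking dog."}, {"role": "user", "content": "Your favorite food?"}, {"role": "assistant", "content": "WOOF BARK!"}]}
```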
The fine-tuning can be ineffective, just right, or monotonous and unadaptive, depending on the depth of reweighting. By default, OpenAI sets the learning hyperparameters based on the size of your training data, but some of them can be adjusted when you create a fine-tune job by API call.
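For example, a minimal sketch using the v1 openai Python SDK, where the training file ID is a placeholder for one you've already uploaded:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create a fine-tune job, overriding the auto-selected number of passes.
# "file-abc123" is a placeholder for your uploaded training file ID.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 3},  # more passes = deeper reweighting
)
print(job.id, job.status)
```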
The caveat is that when you fine-tune gpt-3.5-turbo, it is not a blank slate. gpt-3.5-turbo, already being a chat-tuned instruct model, comes with pretrained behavior for nearly every imaginable circumstance. How your combination of system message and inputs reweights the outputs then becomes less predictable, and needs experimentation.
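In practice that means testing the tuned model with the same system message you trained on. A sketch, with a placeholder fine-tuned model name:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder: your fine-tune
    messages=[
        # Use the same system message as in your training data.
        {"role": "system", "content": "You are Rex, a talking dog."},
        {"role": "user", "content": "Do you like cats?"},
    ],
)
print(response.choices[0].message.content)
```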
Fine-tuning now allows you to base a new fine-tune model on an existing fine-tune. This lets you run more passes over the same or new training data without paying the full expense again, reinforcing the weights the same way that specifying more passes (and more cost) with the n_epochs parameter would. You can then watch the continued progression of the training loss and validation loss curves to see when your model is well-cooked, or past that point, overfitted.
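A sketch of what that continuation looks like; the model and file IDs are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Start a new job from an existing fine-tuned model instead of the base model.
job = client.fine_tuning.jobs.create(
    training_file="file-def456",  # placeholder: the same or new training data
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder: existing fine-tune
)

# Watch the job's event stream for training (and validation) loss progress.
for event in client.fine_tuning.jobs.list_events(fine_tuning_job_id=job.id, limit=20).data:
    print(event.message)
```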