“Where more or fresher data becomes available” - I think this approach of encouraging fine-tuning as a method of knowledge retrieval really needs to be rolled back. There has been a big influx of “how do I train” and “why doesn’t it work” questions because of the new interest this is generating.
Documentation needed to prevent misapplication:
- Fine-tuning success stories using gpt-3.5-turbo for specific applications.
- Cases where an example-laden prompt by itself does not already produce the desired output.
- Cases where an identity (system message) alone shapes the new unprompted behavior and logic.
- Example conversation samples, along with the number of samples and variations, and the epochs used.
- Demonstration of realized performance on closed-domain and out-of-domain inputs.
- Techniques for managing the model’s existing ChatGPT-style chat weights versus your desired application - integrating with, or overcoming, the chat fine-tuning.
- Failures or misuse cases that cannot be overcome even with the best techniques.
- A reminder that using a fine-tuned model costs 8x the base model, on both input and output tokens, a price that can’t be reduced.
Better than: “experiment with this at $400/50 million tokens/epochs”.