The general release of the gpt-4o-2024-08-06 model for fine-tuning raises the question of whether we can fine-tune this multimodal model using both text and images in the training/validation data.
I couldn’t find any reference to using images for fine-tuning in the official documentation or in the release notes, so I’m hoping someone on the forum might know more about this.
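To make the question concrete: the documented fine-tuning format is JSONL with text-only messages, while the Chat Completions API accepts image content parts. The sketch below shows both side by side; the multimodal variant simply borrows the `image_url` content-part shape from Chat Completions, and whether the fine-tuning endpoint would accept anything like it is exactly what I'm asking (the second example is an assumption, not documented behavior).

```python
import json

# Documented text-only fine-tuning example (one JSONL line per training sample).
text_example = {
    "messages": [
        {"role": "user", "content": "What is in this picture?"},
        {"role": "assistant", "content": "A red bicycle."},
    ]
}

# Hypothetical multimodal example: reuses the image_url content-part shape
# from the Chat Completions API. It is NOT documented whether the
# fine-tuning endpoint accepts this -- that is the open question.
multimodal_example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this picture?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/bike.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "A red bicycle."},
    ]
}

# Each training sample would be serialized as one line of the JSONL file.
print(json.dumps(text_example))
print(json.dumps(multimodal_example))
```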