For fine-tuning for object detection, what's the expected/ideal format for preparing the annotations?
There isn't really one; it just depends on what your image encoder was trained on. In our case it appears to be RGB pixels, and whatever interpolation or resizing you do is up to you. We do perform cropping on our end (the tiles), but it's just image patches at the end of the day. Generally, higher resolution at a high fovea setting will yield better performance.
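For what it's worth, here's a minimal sketch of that kind of preprocessing, assuming Pillow; the 1024-pixel cap, JPEG quality, and the `encode_image_for_training` helper are all illustrative choices, not recommended values:

```python
import base64
import io

from PIL import Image  # pip install pillow


def encode_image_for_training(path: str, max_side: int = 1024) -> str:
    """Downscale an image so its longest side is at most max_side,
    then return a base64 data URL usable in a chat-format example."""
    img = Image.open(path).convert("RGB")  # encoder appears to expect RGB
    img.thumbnail((max_side, max_side))    # interpolation/resizing is up to you
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=90)  # quality is an arbitrary choice
    b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"
```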
We’ve seen a variety of formats work! Some folks use JSON, others make the model describe what’s in the scene. Are you looking for just object detection/counting or bounding box prediction as well?
I'm looking for the expected/ideal format for bounding box predictions in multi-object, multi-class scenarios.
Got it, generally people just use the regular JSON format, like a list of

```json
{
  "x": 123,
  "y": 456,
  "class": "horse"
}
```

or tuples like `(123, 456, horse)`. It's up to you!
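If it helps, here's a minimal sketch of how one training line in the chat-format JSONL could look under that convention. The prompt wording, the `detail: "high"` setting, and the `make_example` helper are all illustrative assumptions, not a prescribed format:

```python
import json


def make_example(image_url: str, boxes: list[dict]) -> str:
    """Build one JSONL line for vision fine-tuning: the user sends an
    image, the assistant answers with the annotation list as JSON."""
    example = {
        "messages": [
            {"role": "system", "content": "Detect objects and reply with JSON."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "List every object in this image."},
                    {"type": "image_url", "image_url": {"url": image_url, "detail": "high"}},
                ],
            },
            {"role": "assistant", "content": json.dumps(boxes)},
        ]
    }
    return json.dumps(example)


# e.g. make_example(url, [{"x": 123, "y": 456, "class": "horse"}])
```

Keeping the assistant reply as strict JSON also makes it easy to parse predictions back out at inference time.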
To be honest, our models aren't great at spatial reasoning, although vision fine-tuning has yielded dramatic improvements. Hopefully yours will be one of those cases!
Hey everyone, I've been fine-tuning 4o with images, but I ran into the issue that a lot of the images I use get filtered, even though, when checking them manually, they don't go against any of the policies. I'm now using the moderation API, but it's strange that the moderation categories don't mention any of the ones in the docs: faces, people, children, or CAPTCHAs. This is all I get from the output:
```
CategoryScores(
    harassment=0.0, harassment_threatening=0.0,
    hate=0.0, hate_threatening=0.0,
    illicit=0.0, illicit_violent=0.0,
    self_harm=1.1235328063870752e-05,
    self_harm_instructions=5.093705003229987e-07,
    self_harm_intent=1.8925148246037342e-06,
    sexual=8.481104172358076e-06, sexual_minors=0.0,
    violence=0.010131805403428543,
    violence_graphic=1.3552078562406772e-05,
    harassment/threatening=0.0, hate/threatening=0.0,
    illicit/violent=0.0,
    self-harm/intent=1.8925148246037342e-06,
    self-harm/instructions=5.093705003229987e-07,
    self-harm=1.1235328063870752e-05,
    sexual/minors=0.0,
    violence/graphic=1.3552078562406772e-05
)
```
I wanted to know whether the moderation API needs to be updated, or whether the answers to those checks are already embedded within some of these fields. Thanks!
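For reference, here's a sketch of how I'm sending an image to the moderation endpoint with the current Python SDK (the example URL is a placeholder); it only returns the text-style categories shown above, with nothing about faces, people, children, or CAPTCHAs:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.moderations.create(
    model="omni-moderation-latest",  # moderation model that accepts image inputs
    input=[
        {"type": "image_url", "image_url": {"url": "https://example.com/img.jpg"}},
    ],
)
print(result.results[0].category_scores)  # same fields as the output above
```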
Hi Everyone,
I attempted vision fine-tuning using some fictional manga illustrations (featuring characters that do not exist in reality), but I encountered the following error:
“Training file *** contains 13 examples with images that were skipped for the following reasons: contains faces, contains people. These examples will not be used for training. Please visit our docs to learn how to resolve these issues.”
How can I address this issue?
- Review the content moderation policy for vision fine-tuning: https://platform.openai.com/docs/guides/fine-tuning#content-moderation-policy
- Comply with the usage policy (if you want to catch flagged images before submitting a job, see the sketch below).
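One workaround is to pre-screen your images locally before building the training file. A rough sketch, assuming opencv-python and its bundled frontal-face Haar cascade; this is only a weak proxy for whatever classifier the training pipeline actually uses, and it may well miss stylized manga faces that the pipeline still flags:

```python
import cv2  # pip install opencv-python

# Haar cascade shipped with OpenCV; a rough stand-in for the training-side check
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def likely_contains_face(path: str) -> bool:
    """Heuristic pre-screen: True if OpenCV detects at least one face."""
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```

At minimum this lets you count how many examples are at risk of being skipped before you pay for the job.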
The "vision" moderation (faces, people, CAPTCHAs) is not provided on the API for testing individual images, so you can't preview whether the classifier's quality is so low that it flags cartoons as containing people.
It is also not correct behavior for a fine-tune job to proceed with unwanted alterations to your training set, at your expense and without a high-quality report of what was dropped.