@willhang Thanks for the response Will. I completely understand where you’re coming from and had to offer similar explanations when I worked at OpenAI on the release of GPT-3, DALL-E, etc. Unfortunately I think this is a situation where the safety rationale is a bit behind the current state of other systems. I was really hoping to get this working instead of doing a fine-tune with OWLv2 or LLaVA.
So Tier 5 can fine-tune on humans?
This is a brutally ineffective way to prevent abuse.
Any systems that use $ paid as a metric of trust eventually get gamed by providers that offer to run the service through their account. The only people affected by this policy are honest people.
A massive number of use-cases are obliterated if any form of facial features causes a block. Catalogues? Marketing materials? I can understand starting with an over-zealous system and then winding down, but come on.
Please tell me this will become a little more intelligent. I can understand not wanting to train specifically on people, but as someone who was looking forward to using vision fine-tuning on company products (a large number of which have people in them), I won’t be able to use this.
“Too many files were skipped due to moderation” repeatedly. Ran all images through the Moderation API and none were flagged for anything. What’s wrong? There is very sparse information in the error messages, and no way to know how to solve this, which files to remove, etc.
I do not believe that is exactly what was meant here.
It is likely not referring to anyone with a Tier 5 account; rather, I suspect it refers to Enterprise-level organizations with a trusted partnership with OpenAI that have demonstrated a legitimate use case.
No one is buying their way into fine-tuning on images of human faces.
I hope this is true. I have noticed that “tier” and “trust” seem to be mixed together lately. Although it makes sense that an active flow of tokens (and $$$) would indicate a more reputable source, any malicious actor can easily circumvent this type of system, leaving only honest people with obstacles.
It would be a huge shame if OpenAI based the moderation severity on how much has been paid previously.
I agree that fine-tuning for human faces and fine-tuning with human faces in the images are different things. Human faces are likely to appear in the images for many applications like self-driving, production monitoring, etc. These tasks do not rely on human faces, but faces are still likely to show up in the images. Blurring them before fine-tuning might cause some information loss and a drop in performance, say if a human is standing in front of a road sign.
Yeah sorry @wguo6358 and you’re right that we have to make a tough tradeoff between blurring images vs skipping them. We’d rather introduce fewer surprises for why a fine-tuned model might perform poorly, and those surprises happen more frequently if we just blur stuff.
@AndrewMayne I don’t know what I’m allowed to say or not say but trust me, I totally feel you and I wish we had better precision around which datasets were harmful vs not. We’re working on it, believe me! We’d like to enable more safe use cases.
@curt.kennedy @RonaldGRuckus Sorry folks, to be extra clear, we’re talking about trust tier, where we review use cases from our partners (big and small). As @anon22939549 said, no one can get past this simply because they pay us more. Thanks for the backup @anon22939549!
We’re working on making this experience smoother, I promise. We knew this was gonna be a challenge for our developers, and we’re continuing to collect feedback to see how we can make this better for you all. Keep it coming!
I’ve fine-tuned a model with images, but during inference I now keep getting the response “I can’t assist with images directly.” This happened to me earlier today as well, but after a while the model managed to provide responses similar to the ground truths used during fine-tuning. A couple of hours later the model is back to refusing to analyze images. I should also note that I keep verifying this with the exact same prompt.
Any idea what might be going wrong here? It seems pretty random.
It’s too hard to figure out which images fail the content moderation policy requirements. I’m using the omni-moderation-latest model, and I filtered out all the images that had a category score greater than 0.
Even after that, I’m still facing an issue with the content moderation.
Is there any other way to know upfront which images cannot be accepted?
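For reference, this is roughly how I’m pre-screening the image URLs with the Moderation API before building the training file (a minimal sketch; the URLs are placeholders for my dataset):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder list of publicly reachable image URLs from the training set
image_urls = [
    "https://example.com/images/sample_001.png",
    "https://example.com/images/sample_002.png",
]

for url in image_urls:
    resp = client.moderations.create(
        model="omni-moderation-latest",
        input=[{"type": "image_url", "image_url": {"url": url}}],
    )
    result = resp.results[0]
    # Drop anything with a non-zero score in any category, not just flagged items
    scores = [v for v in result.category_scores.model_dump().values() if v is not None]
    if result.flagged or max(scores) > 0:
        print(f"skipping {url}: max category score {max(scores):.4f}")
```

Even with this filter applied, the fine-tuning job still skips files, which is why I’m asking.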
Same here. Ran the full dataset through the Moderation API and everything passed. The job still fails with the very uninformative message “Too many files were skipped due to moderation”, repeatedly.
If they are using a different model for moderation than the one behind the Moderation API, and the error messages are this uninformative, there is no way for the user to adjust the dataset.
I also tried calling the gpt-4o model directly on the images that didn’t pass the content moderation, and I don’t get any errors. So it’s really hard to know what the fine-tuning content moderation policy is and how it differs from the completions one.
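The spot check looks roughly like this (a minimal sketch; the URL is a placeholder for one of the skipped images):

```python
from openai import OpenAI

client = OpenAI()

# One of the images the fine-tuning job skipped (placeholder URL)
url = "https://example.com/images/skipped_example.png"

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }
    ],
)
# The model answers normally here, with no moderation error or refusal
print(completion.choices[0].message.content)
```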
Thank you for confirming. I was a bit worried by the use of “trust tier”, as this exact terminology is used here to indicate that “paid tier” = “trust tier”:
“Trust Tiers & Rate Limits: The number of API requests you can make per minute (RPM) and the number of tokens you can use per minute (TPM) depend on your Trust Tier. Stay tuned for more information on Trust Tiers.”
That made it worth clarifying. I’m glad it’s not the case.
Looking forward to the upcoming updates.
Hi @willhang ! Thanks so much for your replies on this. I’m unfortunately running into the same issue, and have been trying to remove all PII from my dataset over the past few days.
Training file file-xxxx contains 3 examples with images that were skipped for the following reasons: contains people. These examples will not be used for training. Please visit our docs to learn how to resolve these issues. Using 28 examples from training file file-xxx
Could I please reach out to you via DM or via email to get your advice on what further PII removal I need to apply to my dataset? I have already censored all faces in my dataset, and I can visually confirm upon examination that the faces have been blocked out. It’s my first time using the vision fine-tuning API as well, so I would really appreciate any advice! Thank you!
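For context, this is roughly how I blocked out the faces before building the training file (a minimal sketch using OpenCV’s bundled Haar cascade; the file paths are placeholders):

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("input.jpg")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces and draw a solid black rectangle over each one
for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), thickness=-1)

cv2.imwrite("censored.jpg", img)
```

Despite this, 3 examples are still being skipped for “contains people”, so I’m not sure what else the moderation step is picking up on.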
Not sure if I should be replying to you or creating my own post, but I seem to be running into a similar issue. I’m getting an “inaccessible URL” error. I’m hosting all the images on our server, and all I see are 200 responses to OpenAI. Each time I fine-tune, a different number of examples fail due to “inaccessible URL”. Our web server is behind a Cloudflare proxy, so could that be the problem?
Looks like Cloudflare wasn’t the problem. I tried without Cloudflare’s proxy and got the same result: a random number of URLs reported as “inaccessible”.
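As a next workaround I’m going to try embedding the images as base64 data URLs in the training JSONL instead of pointing at our server, on the assumption that data URLs are accepted the same way they are for regular chat image inputs. A minimal sketch (file paths and prompt text are placeholders):

```python
import base64
import json

def to_data_url(path: str) -> str:
    # Encode a local PNG as a data URL so the fine-tuning job
    # doesn't have to fetch anything from our server
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{b64}"

example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": to_data_url("images/001.png")}},
            ],
        },
        {"role": "assistant", "content": "Placeholder ground-truth answer."},
    ]
}

with open("train.jsonl", "a") as out:
    out.write(json.dumps(example) + "\n")
```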
Hey @SahPet, sorry that you’re running into this! Do your images contain any of the following?
https://platform.openai.com/docs/guides/fine-tuning/what-to-do-if-your-images-get-skipped
@serwanj that’s super strange. Are you still seeing this? What if you switch up the prompt?
Sorry that you’re seeing this @thiyagu.1405! Do any of your images also contain the following: https://platform.openai.com/docs/guides/fine-tuning/what-to-do-if-your-images-get-skipped?