Multiple lines of my training file are being flagged based on Moderation policies.
However, when I run those same lines through the Moderation API, they do not get flagged.
Is there a workaround for customizing the Moderation scores/levels while fine-tuning?
I work in healthcare and am trying to fine-tune on certain scientific papers and approaches. It flags names, Google search results in some prompts, etc. I’m using the API to fine-tune.
Would love to know how other people are fine-tuning based on highly curated info.
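One thing you can do before uploading is run each line of the training file through the Moderation endpoint yourself, so you can see exactly which examples trip the filter and compare that against what the fine-tune upload flags. A minimal sketch below; the function names are my own, and the commented-out part assumes the `openai` Python client (v1.x) with an API key set:

```python
# Hypothetical pre-screen for a fine-tune JSONL file: pair each line
# with its moderation result and keep only the flagged ones.
import json


def collect_flagged(lines, results):
    """Return (index, line) for every line whose moderation result is flagged."""
    return [
        (i, line)
        for i, (line, res) in enumerate(zip(lines, results))
        if res["flagged"]
    ]


# With the real API you would do something like (requires the openai
# package and OPENAI_API_KEY in the environment):
#
#   from openai import OpenAI
#   client = OpenAI()
#   lines = open("train.jsonl").read().splitlines()
#   results = [
#       client.moderations.create(input=line).results[0].model_dump()
#       for line in lines
#   ]
#
# Offline example with canned results, just to show the shape:
lines = ['{"messages": [...]}', '{"messages": [...]}']
results = [{"flagged": False}, {"flagged": True}]
print(collect_flagged(lines, results))  # -> [(1, '{"messages": [...]}')]
```

Note this won’t fully answer your question, since you’re seeing the opposite: lines that pass the Moderation API still get flagged at fine-tune time, which suggests the fine-tune pipeline applies a stricter (or different) check than the public endpoint.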
I don’t know if this pertains to your specific case, but:
6.1 Personal Data. If you use the Services to process personal data, you must (a) provide legally adequate privacy notices and obtain necessary consents for the processing of personal data by the Services, (b) process personal data in accordance with applicable law, and (c) if processing “personal data” or “Personal Information” as defined under applicable data protection laws, execute our Data Processing Addendum by filling out this form.
6.2 HIPAA. You agree not to use the Services to create, receive, maintain, transmit, or otherwise process any information that includes or constitutes “Protected Health Information”, as defined under the HIPAA Privacy Rule (45 C.F.R. Section 160.103), unless you have signed a Healthcare Addendum and Business Associate Agreement (together, the “Healthcare Addendum”) with us prior to creating, receiving, maintaining, transmitting, or otherwise processing this information.
I don’t know if your data gets flagged for this. I personally wouldn’t touch the stuff with OpenAI’s API; I would get a Microsoft monitoring exemption before working with any PII.
The Azure cost for training also can’t be anticipated except by trial on their platform: it’s billed “per compute hour”…
The training cost makes the break-even point in tokens-per-day less clear.
Microsoft also runs all generative model outputs through a content filter that requires an exemption to get turned off. A close reading of Azure policy might be required to see whether the same moderation pass OpenAI does is also applied to Azure fine-tune inputs.
(ps: define your own stop token on assistant output (maybe repeated several times in a row, like OpenAI trains on with chat completions) and put some garbage after it to confuse the moderation)
If it costs anything to keep weights at the ready, why does OpenAI let a bunch of unused models sit undeleted? Maybe MS doesn’t have the occasional 15-second latency of spinning a fine-tune back up.