How to manage safety issues for large volumes of embeddings

I’ve got a LOT of dialogues and texts that I’ve been running embeddings on, mostly organized into different topic areas. What I’m finding is that topics related to emotions and motivations appear more likely to trigger the moderation filter. A small number of triggers is pretty normal for some topics (shopping triggered about .001%, which is great), but the rate is much higher for others, especially documents that deal with people’s fears (closer to violence, even when it’s indirect) and inner thoughts and monologues (possible self harm, etc.). So far I think we are staying well within the lines, but I have seen the occasional surprise (particularly around documents that deal with character thoughts and motives). These are part of our internal R&D for chatbot and prose writing, but with such large volumes of files I do worry a lot that one mistake or a single bad document could cost me my API access.

For that reason, for any analysis and R&D on documents that deal more directly with things like scary emotions, or discussions that sometimes lean into political areas (we want to be able to recognize all categories of real-world discussion even if we don’t generate them), we use another (lesser) embedding service altogether… I’ve avoided sending these slightly riskier docs to OpenAI’s embeddings for all the reasons above. They’re probably just fine, but I simply don’t know for sure. I’d like to use Davinci for all of them; I just worry that I can’t, and want to be on the safe side.
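For context, the percentages above come from a quick tally like the one below, nothing rigorous. A minimal sketch, assuming the pre-1.0 `openai` Python client; `flag_rates_by_topic`, the corpus dict, and the batch size of 32 are just placeholders for my own setup:

```python
import openai

openai.api_key = "sk-..."  # your own key

def flag_rates_by_topic(corpus):
    """corpus: {topic_name: [snippet, ...]} -> {topic_name: fraction flagged}."""
    rates = {}
    for topic, snippets in corpus.items():
        flagged = 0
        # The moderation endpoint accepts a list of strings, so send modest batches.
        for i in range(0, len(snippets), 32):
            batch = snippets[i:i + 32]
            resp = openai.Moderation.create(input=batch)
            flagged += sum(1 for r in resp["results"] if r["flagged"])
        rates[topic] = flagged / max(len(snippets), 1)
    return rates
```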

So my question is: has anyone else had to contend with embedding large documents that, because of their sheer volume, are likely to contain at least some potentially objectionable material? How did you handle it? And second, is there a way to request special permission to embed material on subject areas that aren’t suitable for generation, but where you’d still like to detect them more accurately and respond appropriately to topics you want to steer around? Politics is probably the best example I can think of, since it overlaps heavily with news, current events, and one’s worldview. Having a more nuanced recognition of it would be incredibly helpful for knowing when and how to steer around it, or whether something related can be engaged with safely.

For example, some obviously political and controversial topics may warrant a flat “I’m sorry, I’m not able to discuss that.” Others might be someone expressing a political point of view that could make for an interesting, fun, and safe discussion thread if managed well, e.g. “Sounds like you have some interesting thoughts on the benefits of capitalism and free markets. Do you think that will be equally important in the future? Why do you think there’s no such thing as money in the Star Trek universe?”

In the future, we are also going to be getting into coaching applications (with expert feedback). If we do as good a job as I think we can, individuals are likely to express their emotions, fears, and feelings. Again, I worry about whether these may also trigger the safety filters, and for obvious reasons I’m not keen to test it for fear of losing my API access.

Any suggestions or advice in these areas would be most helpful. For the most part, the default is to simply steer things as conservatively as I can, but I do wish there were a way to use the embeddings with a broader range of material so that we can detect emotions, topics, and situations in a more nuanced way.

I guess the obvious answer is to run everything through the moderation filter before submitting it for embedding, but if the point is to learn how to better manage the conversation when users steer in that direction, then having embeddings for exactly that material is actually quite important.
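For anyone landing on this thread later, here’s roughly what that pre-check looks like for me. Only a sketch, assuming the pre-1.0 `openai` Python client and text-embedding-ada-002; `embed_if_clean` and the way I split the batches are my own placeholder choices, not anything official:

```python
import openai

def embed_if_clean(phrases, model="text-embedding-ada-002"):
    """Moderate first; only the phrases that pass get sent for embedding."""
    mod = openai.Moderation.create(input=phrases)
    clean, held_back = [], []
    for phrase, result in zip(phrases, mod["results"]):
        (held_back if result["flagged"] else clean).append(phrase)
    pairs = []
    if clean:
        resp = openai.Embedding.create(model=model, input=clean)
        pairs = [(p, d["embedding"]) for p, d in zip(clean, resp["data"])]
    # held_back is, ironically, exactly the material I'd most like embeddings for
    return pairs, held_back
```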

Hi Alan, I am trying to understand the question because I am very interested in this topic as well. Let me know if my assumptions are correct…

Problem: when working with embeddings generated from a large amount of text, it is very hard to classify them, because each sentence or paragraph of the text may cover different topics.

If my assumption is correct, can you explain why you cannot break up the large text into paragraphs or sentences before you turn them into embeddings?
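To make the question concrete, this is the kind of pre-splitting I had in mind. A minimal sketch, assuming the pre-1.0 `openai` client; the paragraph-splitting rule and the model name are just examples:

```python
import re
import openai

def embed_paragraphs(document, model="text-embedding-ada-002"):
    """Split on blank lines and embed each paragraph separately."""
    chunks = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    resp = openai.Embedding.create(model=model, input=chunks)
    return list(zip(chunks, (d["embedding"] for d in resp["data"])))
```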

It might be a good idea to take a look at this:

An option might be to run another query and sort by how near or far the query vector is from the rest of the documents; there is no extra cost if you save the query vector and reuse it for sorting.
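Something like this is what I mean by reusing the saved query vector for sorting, with plain numpy cosine similarity; the function name and array shapes are just for illustration:

```python
import numpy as np

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return np.argsort(-(d @ q))
```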

Hi Nelson, actually it’s the opposite of that. I have long sets of conversational phrases where the topic is innocuous in general. However, when I break them apart into single phrases to turn into embeddings, many of the phrases, taken out of context, hit the moderation filters. In most sets it seems fine, with very few phrases triggering the filters, but in some I’m getting quite a few moderation flags (in one set it was approaching 1%). I took a really close look at most of these flags to better understand them, and the bulk were for “self harm”, virtually all of which were false triggers. I have some other data sets that contain a lot of joking and sarcasm, and I suspect the rate of moderation triggers will be far higher still with those (lots of references to things like “you’re killing me”, “I’d tell you but I’d have to kill you”, and many other harm/violence-type jokes that almost certainly trigger the filters).

I really love how great and useful these embeddings are, so overall I’d really love to be able to run all my material through them to help me classify it, not just against the moderation filters but also against our own. Personally I think there’s an entire business to be made in getting really good at embeddings and classification, so I’d love to create a wide range of tailored classifiers. Using another embedding system we were able to do exactly that: the result was our own really good “toxic language” filter that I believe catches a few edge cases the moderation filter may not, and it’s a great example of how working with these kinds of embeddings can actually improve our ability to moderate content.
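In case it’s useful to anyone, the tailored classifiers I’m describing are nothing fancier than a lightweight model fitted on top of the embedding vectors. A rough sketch using scikit-learn; the function name, labels, and split here are placeholders for our own hand-labelled data, not the actual filter we built:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_custom_filter(vectors, labels):
    """vectors: (n, d) embedding matrix; labels: (n,) 0/1 array (e.g. toxic or not)."""
    X_train, X_test, y_train, y_test = train_test_split(
        np.asarray(vectors), np.asarray(labels),
        test_size=0.2, random_state=0, stratify=labels,
    )
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X_train, y_train)
    print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
    return clf
```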


@kevin6 This is really terrific, thanks for the link. Very interesting stuff!