Feature request: improved detection of self-harm intent

It would be helpful if the content filter specifically flagged input prompts as “unsafe” when they signal potential self-harm. Subcodes under “unsafe” would likely also be valuable in the future, e.g. “political”, “religious”, etc.
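
For illustration, the richer filter output might look something like the sketch below. The `FilterResult` type, the label and subcode names, and the `classify` backend are all hypothetical, not an existing OpenAI API:

```python
# Sketch of the richer filter output this request imagines. FilterResult,
# the label/subcode names, and the classify() backend are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FilterResult:
    label: str                # "safe", "sensitive", or "unsafe"
    subcode: Optional[str]    # e.g. "self-harm", "political", "religious"

def handle_prompt(prompt: str, classify: Callable[[str], FilterResult]) -> FilterResult:
    result = classify(prompt)
    if result.label == "unsafe" and result.subcode == "self-harm":
        # Route to a tailored intervention, e.g. crisis-line resources,
        # instead of a generic "unsafe" refusal.
        pass
    return result
```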

Another valuable feature would be to dynamically lower the safety thresholds during periods of elevated suicide risk.

Arendt, Florian, and Sebastian Scherr. “Optimizing Online Suicide Prevention: A Search Engine-Based Tailored Approach.” Health Communication 32, no. 11 (2017): 1403–1408.

Search engines are increasingly used to seek suicide-related information online, which can serve both harmful and helpful purposes. Google acknowledges this fact and presents a suicide-prevention result for particular search terms. Unfortunately, the result is only presented to a limited number of visitors. Hence, Google is missing the opportunity to provide help to vulnerable people. We propose a two-step approach to a tailored optimization: First, research will identify the risk factors. Second, search engines will reweight algorithms according to the risk factors. In this study, we show that the query share of the search term “poisoning” on Google shows substantial peaks corresponding to peaks in actual suicidal behavior. Accordingly, thresholds for showing the suicide-prevention result should be set to the lowest levels during the spring, on Sundays and Mondays, on New Year’s Day, and on Saturdays following Thanksgiving. Search engines can help to save lives globally by utilizing a more tailored approach to suicide prevention.
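
As a rough sketch of what that reweighting could look like in practice, the helper below lowers a hypothetical trigger threshold during the high-risk periods the abstract names. The numeric threshold values are assumptions; Arendt & Scherr (2017) identify the periods but do not prescribe numbers:

```python
# Minimal sketch of the calendar-based reweighting the abstract proposes.
# Threshold values are illustrative assumptions, not from the paper.
from datetime import date, timedelta

BASE_THRESHOLD = 0.80   # hypothetical default score needed to trigger the result
LOW_THRESHOLD = 0.50    # hypothetical "lowest level" used in high-risk periods

def saturday_after_thanksgiving(d: date) -> bool:
    # US Thanksgiving is the fourth Thursday of November; the risk day
    # identified in the abstract is the Saturday two days later.
    nov1 = date(d.year, 11, 1)
    first_thursday = nov1 + timedelta(days=(3 - nov1.weekday()) % 7)
    thanksgiving = first_thursday + timedelta(weeks=3)
    return d == thanksgiving + timedelta(days=2)

def threshold_for(d: date) -> float:
    """Lower the trigger threshold during the high-risk periods named in the abstract."""
    high_risk = (
        3 <= d.month <= 5                    # spring (Northern Hemisphere)
        or d.weekday() in (6, 0)             # Sunday or Monday
        or (d.month, d.day) == (1, 1)        # New Year's Day
        or saturday_after_thanksgiving(d)
    )
    return LOW_THRESHOLD if high_risk else BASE_THRESHOLD
```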

The cited article also notes in its discussion section that search-term-based methods for detecting self-harm are inherently fragile. An improved safety filter that uses an OpenAI model to classify text for self-harm intent would be a valuable public service.
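
A minimal sketch of such a filter, assuming the legacy `openai` Python package's completions endpoint; the prompt wording and engine name are illustrative placeholders, not an official OpenAI moderation product:

```python
import openai  # legacy openai-python (<1.0); assumes OPENAI_API_KEY is configured

def classify_self_harm(text: str) -> str:
    """Ask a completion model to label text as 'self-harm' or 'safe'.
    The prompt and engine name are illustrative, not an existing product."""
    prompt = (
        "Label the following text 'self-harm' if it signals intent to "
        "self-harm, otherwise 'safe'.\n\n"
        f"Text: {text}\nLabel:"
    )
    response = openai.Completion.create(
        engine="davinci",   # placeholder engine name
        prompt=prompt,
        max_tokens=4,
        temperature=0.0,
    )
    return response["choices"][0]["text"].strip()
```

In practice a model-based classifier like this would feed the subcode scheme sketched above rather than replace it, so that fragile keyword matching is only one signal among several.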
