Dear OpenAI Team,
First, I would like to express my gratitude for the contributions and benefits your language models provide to society. The innovations you are bringing to artificial intelligence and language modeling are driving significant advancements across many areas.
However, I would like to bring to your attention a crucial aspect of how language models currently detect and block criminal content. As you know, language models operate on tokens and can block certain criminal keywords upon detection. For instance, when the word “pedo” is identified, it is associated with child exploitation and the content is blocked. However, if a large number of random tokens is appended after the criminal keyword, this added noise can cause the content to slip past detection and remain unblocked.
My proposal is to develop a special concept I call the “black token” for detecting criminal keywords. A black token would cover keywords related to crimes such as drug trafficking, child exploitation, and the promotion of suicide. Whenever a black token is detected in the text, even when the keyword is buried inside a long sequence of characters without spaces, the system should block the entire content. This would ensure that criminal keywords are caught and blocked even within very long messages.
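To make the idea concrete, the check described above could be sketched roughly as follows. This is only an illustrative sketch, not a description of any real moderation system: the keyword list, the function name, and the normalization step are all hypothetical.

```python
# Hypothetical sketch of the proposed "black token" check.
# Normalize the text (lowercase, strip all whitespace) and then scan
# for blocked keywords as plain substrings, so that a keyword buried
# inside a long run of characters is still detected.

BLACK_TOKENS = {"pedo"}  # illustrative placeholder list, not a real blocklist

def contains_black_token(text: str) -> bool:
    # Lowercasing and removing whitespace defeats simple spacing tricks.
    normalized = "".join(text.lower().split())
    return any(token in normalized for token in BLACK_TOKENS)

print(contains_black_token("p e d o xqzrandomnoise"))  # True: keyword found after normalization
print(contains_black_token("an ordinary message"))     # False: no blocked keyword present
```

A production system would of course need far more than substring matching (to handle misspellings, leetspeak, and legitimate uses of a string), but the sketch shows the core point: detection should not depend on the keyword appearing as a clean, isolated token.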
This proposal aims to enhance the ability of language models to detect and prevent the dissemination of criminal content more effectively.
Thank you for your attention and time.
Sincerely,
Buğra