Hate speech and toxic behavior are significant problems online, and a number of entities are deploying AI-based solutions to detect and filter out such content. At this stage, however, it’s apparently quite easy to trip up these AIs using nothing more than the word “love”. There are plenty of ways to confuse a natural language processing program, but this particular attack works specifically against AI meant to distinguish positive, neutral, and toxic speech. Essentially, the word “love” commonly acts as a trigger for positivity, and encountering it in an otherwise negative statement, especially one that has been garbled or tampered with somehow, can fool these AI programs or confuse them completely.
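To see why that works, here is a deliberately simplified sketch of a word-level sentiment scorer. It is a toy illustration of the weakness, not how Perspective or any production moderation model actually operates:

    # Toy word-level sentiment scorer: a deliberately naive illustration
    # of the weakness described above, NOT any real moderation system.
    POSITIVE = {"love", "great", "thanks"}
    NEGATIVE = {"hate", "awful", "stupid"}

    def naive_sentiment(text: str) -> int:
        """Score text by counting known positive and negative words."""
        score = 0
        for word in text.lower().split():
            if word in POSITIVE:
                score += 1
            elif word in NEGATIVE:
                score -= 1
        return score

    print(naive_sentiment("I hate you"))     # -1: recognized as negative
    print(naive_sentiment("Ihateyou love"))  # +1: "ihateyou" is an unknown
                                             # token, so only "love" counts

Because the jumbled insult no longer matches any known word, the only token the scorer recognizes is the positive one, and the overall judgment flips.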
Most current AI programs of this sort use natural language processing based on words and phrases in a training data set, which is then expanded through subsequent interactions. The research paper behind this finding argues that a letter-by-letter approach may be more robust, and uses Google’s Perspective project to demonstrate the weakness. The experiment starts with the phrase “I hate you” and inserts “love” in random spots. Jumbling the message together and adding “love”, forming “Ihateyou love”, is all it takes to catch the AI completely off guard. This writer tested the same AI and found it a bit harder to confuse than simply throwing the word “love” around and jumbling or misspelling things here and there, but getting negative messages through wasn’t all that hard, either. Still, as with any public-facing AI program, Perspective is improving all the time, as are others of its ilk, so by the time you read this, it could be quite difficult to slip through anything that resembles known toxic phrases or uses familiar words.
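Anyone curious can reproduce the test. The sketch below queries the Perspective API through its publicly documented REST endpoint; the API key is a placeholder you would need to obtain from Google, and the scores the live model returns today will likely differ from those reported in the paper:

    import json
    import urllib.request

    API_KEY = "YOUR_API_KEY"  # placeholder; request a key from Google
    URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=" + API_KEY)

    def toxicity_score(text: str) -> float:
        """Return the TOXICITY probability Perspective assigns to text."""
        body = json.dumps({
            "comment": {"text": text},
            "requestedAttributes": {"TOXICITY": {}},
        }).encode("utf-8")
        request = urllib.request.Request(
            URL, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request) as response:
            result = json.load(response)
        return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    # Compare the original phrase with the perturbed version from the paper.
    for phrase in ("I hate you", "Ihateyou love"):
        print(phrase, "->", toxicity_score(phrase))

If the attack still works, the perturbed phrase should come back with a markedly lower toxicity score than the original.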
The war on toxic behavior online is a product of our internet-powered globalist society, and targets a pattern of speech and behavior that a previous era would have brushed off as a natural part of interacting with others on the web behind a veil of anonymity. That war has been pushed along by legislative actions, mostly centered in the EU, that seek to punish tech companies, online services, and other entities for failing to rein in bad behavior. Pulling AI programs into the fray is a natural progression, and while they may be easy to fool right now, they will likely be much more effective in the near future. That should eliminate the need for human intervention in many more cases than today’s AI programs can handle, making it easier for entities to comply with relevant laws, user requests, and the like concerning hate speech and toxic behavior.