Aalto University

News

Hate speech-detecting AIs are fools for ‘love’

State-of-the-art detectors that screen out online hate speech can be easily duped by humans, shows new study
How Google Perspective rates a comment otherwise deemed toxic after some inserted typos and a little love.

Hateful text and comments are an ever-increasing problem in online environments, yet addressing the rampant issue relies on being able to identify toxic content. A new study by Aalto University has uncovered weaknesses in many machine learning detectors currently used to recognise and keep hate speech at bay.

Many popular social media and online platforms use hate speech detectors that a team of researchers led by Professor N. Asokan have now shown to be brittle and easy to deceive. Bad grammar and awkward spelling—intentional or not—might make toxic social media comments harder for AI detectors to spot.

The team put seven state-of-the-art hate speech detectors to the test. All of them failed.

Modern natural language processing (NLP) techniques can classify text based on individual characters, words or sentences. When faced with textual data that differs from the data used in their training, however, they begin to fumble.
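A minimal sketch can illustrate why such classifiers fumble on out-of-distribution text. This is not the study's actual model, just an assumed word-level feature extractor: any token absent from the training vocabulary collapses to a single "unknown" id, so a hateful phrase with its spaces removed becomes invisible at the word level.

```python
# Hypothetical word-level vocabulary (illustrative only; the study's
# detectors are far larger). Tokens unseen in training map to <unk>.
vocab = {"<unk>": 0, "i": 1, "hate": 2, "you": 3, "love": 4}

def word_ids(text):
    """Map each whitespace-separated token to its vocabulary index."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

print(word_ids("I hate you"))  # [1, 2, 3] -- hateful words are visible
print(word_ids("Ihateyou"))    # [0]       -- they collapse to <unk>
```

A character-based model, by contrast, would still see the letter sequence "hate" inside "Ihateyou", which is one reason the researchers point to character-level detection as a promising direction.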

‘We inserted typos, changed word boundaries or added neutral words to the original hate speech. Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google’s comment-ranking system Perspective,’ says Tommi Gröndahl, doctoral student at Aalto University.
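The three modifications Gröndahl describes can be sketched as simple string transformations. The function names and details below are illustrative assumptions, not the researchers' code; they only show the kind of perturbations the study applied.

```python
import random

def insert_typo(text, seed=0):
    """Swap two adjacent characters at a random position (a simple typo).
    Illustrative only -- the study tested several typo strategies."""
    rng = random.Random(seed)
    chars = list(text)
    i = rng.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def remove_spaces(text):
    """Delete word boundaries -- reportedly the most powerful attack."""
    return text.replace(" ", "")

def add_neutral_word(text, word="love"):
    """Append an innocuous word to dilute the toxicity score."""
    return f"{text} {word}"

original = "I hate you"
evasive = add_neutral_word(remove_spaces(original))
print(evasive)  # Ihateyou love
```

Combining the transformations, as in the last lines, mirrors the ‘Ihateyou love’ example the researchers used against Perspective.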

Google Perspective ranks the ‘toxicity’ of comments using text analysis methods. In 2017, researchers from the University of Washington showed that Google Perspective can be fooled by introducing simple typos. Gröndahl and his colleagues have now found that Perspective has since become resilient to simple typos yet can still be fooled by other modifications such as removing spaces or adding innocuous words like ‘love’.

A sentence like ‘I hate you’ slipped through the sieve and became non-hateful when modified into ‘Ihateyou love’.

The researchers note that in different contexts the same utterance can be regarded either as hateful or merely offensive. Hate speech is subjective and context-specific, which renders text analysis techniques insufficient as stand-alone solutions.

The researchers recommend that more attention be paid to the quality of data sets used to train machine learning models—rather than refining the model design. The results indicate that character-based detection could be a viable way to improve current applications.

The study was carried out in collaboration with researchers from the University of Padua in Italy. The results will be presented at the ACM AISec workshop in October.

The study is part of an ongoing research project at Aalto University.

Research article:

Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, N. Asokan:
All You Need is "Love": Evading Hate-speech Detection.

More information:

Tommi Gröndahl, Doctoral Candidate
Aalto University
Secure Systems group
tommi.grondahl@aalto.fi
tel. +358 400 426 523

N. Asokan, Professor
Aalto University
Secure Systems group
n.asokan@aalto.fi
tel. +358 50 483 6465

