Using AI Police to Recognize Toxic Speech on Social Media

By Edmund L. Andrews (translated by Chai Wansuo and Wu Lihong)

AI speech police are smart and fast, but there’s a gap between strong algorithmic performance and reality.

Facebook says its artificial intelligence models identified and pulled down 27 million pieces of hate speech in the final three months of 2020. In 97 percent of the cases, the systems took action before humans had even flagged the posts.

That’s a huge advance, and all the other major social media platforms are using AI-powered systems in similar ways. Given that people post hundreds of millions of items every day, from comments and memes to articles, there’s no real alternative. No army of human moderators could keep up on its own.

But a team of human-computer interaction and AI researchers at Stanford sheds new light on why automated speech police can score highly on technical tests yet provoke a lot of dissatisfaction from humans with their decisions. The main problem: there is a huge difference between evaluating more traditional AI tasks, like recognizing spoken language, and the much messier task of identifying hate speech, harassment, or misinformation, especially in today’s polarized environment.

“It appears as if the models are getting almost perfect scores, so some people think they can use them as a sort of black box to test for toxicity,” says Mitchell Gordon, a PhD candidate in computer science who worked on the project. “But that’s not the case. They’re evaluating these models with approaches that work well when the answers are fairly clear, like recognizing whether ‘java’ means coffee or the computer language, but these are tasks where the answers are not clear.”

The team hopes their study will illuminate the gulf between what developers think they’re achieving and the reality, and perhaps help them develop systems that grapple more thoughtfully with the inherent disagreements around toxic speech.

Too much disagreement

There are no simple solutions, because there will never be unanimous agreement on highly contested issues. Making matters more complicated, people are often ambivalent and inconsistent about how they react to a particular piece of content.

In one study, for example, human annotators rarely reached agreement when they were asked to label tweets that contained words from a lexicon of hate speech. Only 5 percent of the tweets were acknowledged by a majority as hate speech, while only 1.3 percent received unanimous verdicts. In a study on recognizing misinformation, in which people were given statements about purportedly true events, only 70 percent agreed on whether most of the events had or had not occurred.

Despite this challenge for human moderators, conventional AI models achieve high scores on recognizing toxic speech: 0.95 ROC AUC, a popular metric for evaluating AI models in which 0.5 means pure guessing and 1.0 means perfect performance. But the Stanford team found that the real score is much lower, at most 0.73, if you factor in the disagreement among human annotators.
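
To make the metric concrete, here is a minimal, hypothetical sketch (not the study’s evaluation code) of how ROC AUC is computed with scikit-learn’s roc_auc_score; the toy labels and scores are invented to show why 0.5 corresponds to guessing and 1.0 to a perfect ranking.

```python
# Illustrative sketch only, not the study's evaluation code.
# Assumes scikit-learn is installed; labels and scores below are made up.
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 1, 0, 0, 0]  # hypothetical gold labels: 1 = toxic, 0 = not toxic

# A classifier that ranks every toxic post above every benign one.
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
# A classifier whose scores carry no information at all.
random_scores = [0.5] * 6

print(roc_auc_score(y_true, perfect_scores))  # 1.0 -> perfect performance
print(roc_auc_score(y_true, random_scores))   # 0.5 -> pure guessing
```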

Reassessing the models

In a new study, the Stanford team re-assesses the performance of today’s AI models by getting a more accurate measure of what people truly believe and how much they disagree among themselves.

The study was overseen by Michael Bernstein and Tatsunori Hashimoto, associate and assistant professors of computer science and faculty members of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). In addition to Gordon, Bernstein, and Hashimoto, the paper’s co-authors include Kaitlyn Zhou, a PhD candidate in computer science, and Kayur Patel, a researcher at Apple Inc.

To get a better measure of real-world views, the researchers developed an algorithm to filter out the “noise” (ambivalence, inconsistency, and misunderstanding) from how people label things like toxicity, leaving an estimate of the amount of true disagreement. They focused on how often each annotator labeled the same kind of language in the same way. The most consistent or dominant responses became what the researchers call “primary labels,” which they then used as a more precise dataset, one that captures more of the true range of opinions about potentially toxic content.
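
As a rough illustration of the primary-label idea (the paper’s actual noise-filtering algorithm is more involved and is not reproduced here), the hypothetical sketch below collapses each annotator’s repeated, sometimes inconsistent judgments of the same item into their single most frequent response, so that within-person noise is filtered out while genuine disagreement between people is preserved.

```python
# Simplified, hypothetical sketch of the "primary label" idea described above;
# the study's actual noise-filtering algorithm is not reproduced here.
from collections import Counter

# Repeated judgments: annotations[annotator][item] lists the labels the same
# annotator gave the same item across repeated showings (1 = toxic, 0 = not).
annotations = {
    "annotator_a": {"post_1": [1, 1, 0], "post_2": [0, 0, 0]},
    "annotator_b": {"post_1": [1, 1, 1], "post_2": [1, 0, 1]},
}

def primary_labels(per_item_responses):
    """Collapse one annotator's repeated responses into their most frequent
    (modal) response per item."""
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in per_item_responses.items()
    }

# Each annotator keeps an individual primary label per item, so disagreement
# between people survives while within-person inconsistency is smoothed away.
for annotator, responses in annotations.items():
    print(annotator, primary_labels(responses))
# annotator_a {'post_1': 1, 'post_2': 0}
# annotator_b {'post_1': 1, 'post_2': 1}
```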
