
A summary of ‘Content Moderation in a New Era for AI and Automation’ (2024) by the Oversight Board

A summary of a report by the Oversight Board looking at the increasing use of artificial intelligence (AI) in automated decision-making (ADM) to moderate content on social media platforms.

TL;DR

The Oversight Board’s report on AI and content moderation raises concerns that artificial intelligence (AI) used to moderate content on social media platforms can be a blunt instrument. Over-enforcement can occur where, for example, strict application of nudity rules removes breast cancer awareness content because the system cannot understand context, while under-enforcement can occur where, for example, hate speech is coded to evade detection and harmful content is left online.

Estimated reading time: 5 minutes

This is a summary of Content Moderation in a New Era for AI and Automation, published by the Oversight Board in September 2024.


Summary

Artificial intelligence (AI) is having a significant impact on social media platforms, both in how it is incorporated into companies’ products and in how users create and share content. Platforms are rolling out new AI-powered features that let users generate new content or modify content they have added, while also increasing the amount of moderation performed by AI. More moderation is done by machines than by human beings, and this is set to accelerate. In these cases AI enforces platforms’ content policies and decides what content is “left up, taken down or sent for human review” [p 2].

Outside the platforms, users have access to other high-quality content generation and manipulation tools enhanced by AI. The ease with which deceptively realistic content can be generated – in both quality and quantity – poses risks when that content is used for nefarious purposes. Together, this has profound implications, “both for the decisions that companies make to design, develop and incorporate these technologies into their products, as well as the content policies enforced against higher quality user-generated content” [p 2].

The report looks specifically at three areas of concern: image-based sexual abuse, political deepfakes and the inequitable application of moderation.

Issues with non-human moderation

Because biases are baked into AI – both in the training data and in the system design – automated moderation amplifies human error, reinforces existing societal biases, leans to one side of ideological divides and reduces opportunities for human oversight. Even though moderation occurs on a massive scale, there are opportunities for platforms to minimise the risks associated with AI in their products. The Oversight Board calls for platforms to embed freedom of expression and human rights considerations in their AI tools early and by design.

The Report also identifies that AI moderation can be a blunt enforcement instrument; in particular, it can result in both over- and under-enforcement. One way this happens relates to the context of content: because automated moderation systems cannot understand context, such as cultural or humorous nuances, content can be flagged when it shouldn’t be. The Report cautions that “enforcement that relies solely on automation, when using technologies with a limited ability to understand context, can lead to over-enforcement that disproportionately interferes with freedom of expression” [p 11]. The Oversight Board uses an example of breast cancer awareness content on Instagram to describe an instance of over-enforcement by AI. The Board notes that, “Despite numerous signals indicating the harmless and informative nature of the post [such as the words “Breast Cancer” in Portuguese at the top of the image], it was detected and removed by a machine learning classifier trained to identify nudity in photos” [p 11]. Further, the penalties associated with over-enforcement by automation can see the affected accounts sanctioned or their content demoted; in worst-case scenarios, “violations can pile up and disable accounts” [pp 11–12].
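Purely as an illustrative sketch – not a description of any platform’s actual systems, and with invented scores, thresholds and keywords – the hypothetical Python snippet below shows how a pipeline acting only on an image classifier’s nudity score, ignoring contextual signals such as overlaid text, would remove a breast cancer awareness post, while a context-aware variant could route it to human review instead.

# Hypothetical, simplified sketch of context-blind over-enforcement.
# The scores, threshold and keyword list below are invented for illustration.

NUDITY_THRESHOLD = 0.8  # assumed removal threshold

def moderate(nudity_score: float, overlay_text: str) -> str:
    """Context-blind pipeline: decides on the image classifier score alone."""
    if nudity_score >= NUDITY_THRESHOLD:
        return "remove"  # overlay_text is never consulted
    return "leave up"

def moderate_with_context(nudity_score: float, overlay_text: str) -> str:
    """Context-aware variant: awareness keywords route borderline
    content to human review instead of automatic removal."""
    awareness_terms = {"breast cancer", "câncer de mama"}
    if nudity_score >= NUDITY_THRESHOLD:
        if any(term in overlay_text.lower() for term in awareness_terms):
            return "send for human review"
        return "remove"
    return "leave up"

# A breast cancer awareness image with medical nudity and explanatory overlay text:
print(moderate(0.9, "Câncer de Mama"))               # "remove" – over-enforcement
print(moderate_with_context(0.9, "Câncer de Mama"))  # "send for human review"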

Conversely, some content can evade algorithmic detection and enforcement, such as content that is ‘coded’ with specific phrases, misspelled words or emojis [p 12]. Importantly, “when hate speech is coded to evade detection from automated systems, [this under-enforcement] can contribute to an unsafe online environment” [p 12].
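Again purely as an illustrative sketch (the ‘banned’ term, the matching logic and the example posts are all invented for this illustration; real hate-speech detection is far more sophisticated than keyword matching), the snippet below shows why a naive filter misses coded variants that use misspellings, character substitutions or spacing.

# Hypothetical, simplified sketch of under-enforcement through 'coded' content.
# "badword" is a placeholder standing in for a slur.

BANNED_TERMS = {"badword"}

def naive_filter(post: str) -> bool:
    """Flags a post only if it contains an exact banned term."""
    return any(term in post.lower() for term in BANNED_TERMS)

print(naive_filter("that is a badword"))      # True  – flagged
print(naive_filter("that is a b@dw0rd"))      # False – evades detection via character substitution
print(naive_filter("that is a bad word 🐍"))  # False – evades detection via spacing and emoji coding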

Labelling harmful content

Acknowledging that “more traditional uses of AI, such as ranking algorithms, [also] contribute to political polarization” [p 8], the Report looks at political deepfake content and its potential to influence the outcome of political processes such as elections. In such cases, the Board suggests platforms could reduce the potential of such content to mislead users, and thereby its potential harm, by attaching a label to it. “Labels empower people with context, allowing them to come to their own conclusions”, the Report says. “This is also a less intrusive approach than removals, so more content can be left up, allowing social media companies to protect users’ free expression” [p 9].

When more than labels are needed

Using deepfake intimate imagery as an example, the Oversight Board identifies situations in which more than labelling of AI content may be needed to address potential harms. AI-generated deepfake intimate imagery compounds the problem of image-based sexual abuse, particularly as a form of gender-based harassment, because women and girls are mainly the targets of revenge porn and similar actions. “For little or no cost, any individual with an internet connection and a photo of someone can produce sexualized imagery of that person, which can then be spread without their consent or knowledge” [p 6].

Simply “labeling deepfake intimate imagery is not sufficient because the harms stem from the sharing and viewing of these images, not solely from misleading people about their authenticity” [p 7]. Rather, the Oversight Board recommends platforms “focus their policies on identifying lack of consent among those targeted by such content” and suggests that “AI generation or manipulation should be considered as a signal that such images could be non-consensual” [p 7].

Inequity in moderation actions

Content moderation should be fair and equitable. This is difficult given that “content moderation resources are not always equitably distributed”, less fact-checking is done for languages other than English, and understanding the context of content is harder in non-English languages.

