Content Moderation using SurgeAI's Toxicity Dataset

George Mathew
Sep 2022

What can you do when your online community outgrows your content moderation capacity?

If you’re like most community managers confronting spam, scams and hate speech in user-generated content (UGC) for the first time, you write a comprehensive list of naughty words and use it to automatically snag potentially harmful content in real time. This is much more efficient than tasking human moderators with trawling through every comment manually, but, unlike human review, it’s a blunt instrument. It produces false positives by flagging innocent content that uses one of your banned words in a benign way that you didn’t anticipate. And it produces false negatives by letting through variant spellings of abusive terms, as well as the latest vernacular innovations, which are so numerous and diverse that keeping up with them would be a full-time job.
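To make both failure modes concrete, here is a minimal sketch of a word-list filter. The banned words and the sample comments are invented for illustration; a real list would be far longer.

```python
import re

# Hypothetical banned-word list, invented for this example.
BANNED_WORDS = {"scam", "idiot"}

def flag_comment(text: str) -> bool:
    """Return True if any banned word appears as a whole token in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in BANNED_WORDS for token in tokens)

comments = [
    "You are an idiot",                       # abusive and flagged
    "How do I report a scam to the admins?",  # benign but flagged (false positive)
    "You are an 1d10t",                       # abusive but missed (false negative)
]
for comment in comments:
    print(flag_comment(comment), comment)
```

The second comment is flagged even though it is a perfectly reasonable question, and the third slips through because the variant spelling is not on the list.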

Can automation provide a more efficient way to do content moderation?

Yes! You can use machine learning (ML) / artificial intelligence (AI) to largely automate your content moderation process. This is one of the most popular use cases of AI technology. If you have an ML team, you can train your own model on examples of appropriate and inappropriate content from your UGC. The model then classifies new, previously unseen content based on the examples it was trained on. Next, you inspect the model’s output and correct the cases that it got wrong. These corrections become additional training examples, making the model progressively better at classifying new cases by itself. By using AI moderation as a first step, you substantially reduce the amount of content that requires human review, shifting most of the decision-making burden from your human moderators to your model, lightening workloads, and reducing moderators’ exposure to inappropriate or disturbing content.
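The train, review and correct loop can be sketched with a toy classifier. This is an illustration only, not how Nyckel or any production system works: the `NaiveBayes` class, the seed comments and the label names are all made up for the example, and a real model needs far more data than four comments.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Toy multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen
        self.doc_counts = Counter()  # label -> number of training examples

    def add_example(self, text, label):
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def predict(self, text):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Step 1: train on a handful of labeled examples (invented for this sketch).
model = NaiveBayes()
seed_examples = [
    ("you are a worthless idiot", "toxic"),
    ("get lost you moron", "toxic"),
    ("thanks for the helpful answer", "not_toxic"),
    ("great post, I learned a lot", "not_toxic"),
]
for text, label in seed_examples:
    model.add_example(text, label)

# Step 2: the model classifies unseen comments; here it misses the sarcasm.
before = model.predict("great post, clown")

# Step 3: a human moderator corrects a similar mistake, and the correction
# is fed back in as a new training example.
model.add_example("great answer, clown", "toxic")

# Step 4: the same unseen comment is now classified differently.
after = model.predict("great post, clown")
print(before, "->", after)
```

With these particular seed examples, a single correction is enough to flip the prediction for a related comment. On real data the effect of any one correction is much smaller, which is why the loop is repeated many times.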

Great! But I don’t have a machine learning team…

No problem. You can use an ML platform like Nyckel, whose user-friendly API makes AI moderation accessible to non-experts, allowing you to train an algorithm on your own dataset in a matter of minutes. You just need to collect some examples from the different types of content generated by your online community, import them into Nyckel’s API, and then sort the examples into toxic and non-toxic.

SurgeAI’s Toxicity Dataset

SurgeAI publishes a toxicity dataset gathered from thousands of online social media comments. The dataset contains 500 examples of toxic comments and 500 examples of non-toxic ones. Let’s look at how you can train a Nyckel text classification function using this data. Play with the clickable demo below:

Create a Content Moderation Function
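Separately from the demo, it can help to check the labeled data yourself before importing it into any tool. The sketch below reads a CSV of comments and counts the labels. The column names `text` and `is_toxic` and the label values `Toxic` and `Not Toxic` are assumptions about the file layout, so adjust them to match the file you actually download. An in-memory sample stands in for the real file here.

```python
import csv
import io
from collections import Counter

def load_examples(csv_file, text_column="text", label_column="is_toxic"):
    """Read (text, label) pairs from an open CSV file.

    The default column names are assumptions, not a documented schema.
    """
    reader = csv.DictReader(csv_file)
    return [(row[text_column], row[label_column]) for row in reader]

# Tiny in-memory stand-in for the real file, so the sketch runs as-is.
sample = io.StringIO(
    "text,is_toxic\n"
    '"Democrats are terrorists",Toxic\n'
    '"Toxicity exists in every group.",Not Toxic\n'
)
examples = load_examples(sample)

# A balanced dataset should show roughly equal counts per label.
label_counts = Counter(label for _, label in examples)
print(label_counts)
```

To run this against a downloaded file, replace `sample` with something like `open("your_file.csv", newline="", encoding="utf-8")`, where the file name is whatever you saved the dataset as.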

A Word of Caution

All datasets encode the subjectivity of the humans who created them, and this is especially true for content moderation. The demo above shows you how to train a function using SurgeAI’s dataset, and using this dataset is a quick way to get started with content moderation. However, before you automate your content moderation, think carefully about the values that matter to your community and what it would consider toxic. The best dataset for training a content moderation model is one that’s generated by your own community and is specific to its values.

Some Examples from the Dataset

Here are a few cases of political UGC taken from social media platforms that illustrate the subjective nature of content moderation.

Comment: "Right. All of them did the same thing but a Thousand times worse than any conservatives have done. I mean the democrat party act just like a bunch of thugs that are really 🤡."
Label: Toxic

Comment: "Democrats are terrorists"
Label: Toxic

Comment: "Toxicity goes every way in politics. When I supported yang, this sub gave me crap. When I switched back to bernie, yang gang people gave me crap. Centrists have always given me crap. Toxicity exists in every group. It’s only emphasized when bernie people do it in order to push a certain victim narrative."
Label: Not Toxic

Comment: "You guys think you got it bad? I voted for Trump 😂 I’ll prolly get downvoted right here and now just for saying it again lol, shit I say something positive about Trump and I’ll have dudes commenting on my posts talking shit like 5 days after lmao"
Label: Not Toxic

Do you agree with these decisions? Do you think your online community would agree? What values have human moderators encoded in their assessments of the social media dataset? Is a binary classification sufficiently granular to classify potentially harmful content using machine learning? Or do you need a multidimensional machine learning model to capture nuances that your community regards as important for content moderation?
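One way to picture the difference between a binary and a multidimensional approach is sketched below. Instead of a single toxic / not-toxic label, each comment gets a score per dimension, and the community sets its own threshold for each one. The dimension names, scores and thresholds are all hypothetical, and the scores are assumed to come from some upstream model that is not shown.

```python
from dataclasses import dataclass

@dataclass
class ModerationScores:
    """Hypothetical per-dimension scores in [0, 1] from an upstream model."""
    insult: float
    threat: float
    profanity: float

# Hypothetical policy for a community that tolerates swearing
# but takes threats very seriously.
THRESHOLDS = {"insult": 0.8, "threat": 0.3, "profanity": 0.95}

def violated_dimensions(scores, thresholds=THRESHOLDS):
    """Return the names of the dimensions whose score meets its threshold."""
    return [
        name for name, limit in thresholds.items()
        if getattr(scores, name) >= limit
    ]

# A sweary but harmless comment passes under this policy.
sweary = ModerationScores(insult=0.2, threat=0.1, profanity=0.9)
print(violated_dimensions(sweary))

# An insulting, threatening comment is flagged on two dimensions.
hostile = ModerationScores(insult=0.9, threat=0.5, profanity=0.1)
print(violated_dimensions(hostile))
```

A different community could reuse the same scores with stricter profanity limits, which is exactly the kind of value judgment a single binary label hides.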

Whatever training data you use, it will be created by human beings. And we know what they’re like. They make mistakes. They believe weird things. They have biases and prejudices. This means that if we want to use AI systems to build more automation into our UGC moderation, our human moderators need to remain close to the real-world values they want to see in their online communities. In other words, to fully leverage artificial intelligence in our content moderation tools, we need to optimize our algorithms via mindful human moderation.