Merging Content Moderation with Graph Databases and Analytics to Mitigate Community Toxicity

Creating safe and inclusive online communities is vital for fostering engagement and maximizing user value. The challenge spans many sectors, from gaming to social media platforms, and the ever-increasing volume of user interactions and data makes it harder to moderate these communities effectively.

In this article, we explore how AWS machine learning services can be leveraged alongside graph databases and visualization tools to automate user scoring and assess the impact of toxicity within your community. While we will use a multiplayer gaming context for illustration, the methods discussed can apply to any community featuring user-generated content and interactions among users. Basic AWS knowledge and command-line familiarity are assumed.

Overview of the Approach

Our solution begins with data collection from both the game client and central databases. We need to ingest audio, chat, and game screenshots into the moderation system, and we can enrich the moderation results with player and game metadata.

This integration allows for a deeper understanding of player behavior, enabling inquiries such as “Do players remain in games with numerous toxic interactions?” or “Do users with elevated toxicity scores spend more or less on in-game purchases?” Such insights can guide matchmaking, so that sensitive players are not paired with those more tolerant of abusive content. The same information helps you decide when to remove abusive players from games, and it supports reporting that gauges the overall impact of toxicity on your community.

Technical Architecture Overview

Before diving into the setup steps, let’s examine the high-level architecture.

Data Ingestion

The first step is to collect the necessary data from both the game client and central databases. For in-game interactions, we need a scalable platform capable of handling potentially millions of chat and voice interactions in real time. We recommend Amazon Kinesis for this purpose. Kinesis is a managed service that processes and analyzes streaming data at any scale. Using the AWS SDK, game clients can send chat logs and audio snippets to the Kinesis stream.
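As a rough illustration of the client side, the sketch below builds the arguments for the Kinesis PutRecord call. The payload fields (gameId, playerId, message, audioOrigin, sentAt) and the helper names are assumptions made for this example, not the schema used by the demonstration solution. The client is passed in so the helper stays testable; in practice it would be something like boto3.client("kinesis").

```python
import json
import time


def build_chat_record(stream_name, game_id, player_id, message, audio_origin=False):
    """Build keyword arguments for a Kinesis PutRecord call.

    The payload layout is hypothetical; adapt it to your own schema.
    """
    payload = {
        "gameId": game_id,
        "playerId": player_id,
        "message": message,
        "audioOrigin": audio_origin,  # True when the text came from transcribed audio
        "sentAt": int(time.time()),
    }
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        # Partitioning by game keeps one game's interactions on the same shard.
        "PartitionKey": game_id,
    }


def send_chat(kinesis_client, stream_name, game_id, player_id, message):
    """Send one chat line; kinesis_client is assumed to be a boto3 Kinesis client."""
    record = build_chat_record(stream_name, game_id, player_id, message)
    return kinesis_client.put_record(**record)
```

Using the game identifier as the partition key is one possible choice; a player identifier would spread load differently.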

While Kinesis can also process video streams, our solution opts for periodic screenshots to minimize costs. Kinesis stores audio and screenshot data in an Amazon S3 bucket designated as “scratch space,” while chat interactions are routed to the next stage.

Kinesis is set up to send micro-batches of these interactions to an AWS Lambda function. Lambda is a serverless compute platform that scales seamlessly from a few interactions per minute to thousands. These Lambda functions moderate each interaction using AWS machine learning and natural language processing services, then forward the results to the graph database.
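To make the micro-batch hand-off concrete, here is a minimal sketch of a Lambda handler that unpacks a Kinesis batch. Kinesis delivers each record's payload base64-encoded under Records[i].kinesis.data; everything else here (the JSON payload, the moderate placeholder, the return shape) is an assumption for illustration rather than the solution's actual code.

```python
import base64
import json


def decode_kinesis_batch(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    interactions = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        interactions.append(json.loads(raw))
    return interactions


def moderate(interaction):
    """Hypothetical placeholder; a real function would call the moderation services."""
    return {**interaction, "toxic": False}


def handler(event, context):
    """Lambda entry point: moderate every interaction in the micro-batch."""
    results = [moderate(item) for item in decode_kinesis_batch(event)]
    return {"processed": len(results)}
```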

We also need to process metadata from central databases. We recommend sending at least the following:

Player information (e.g., username, date joined, age, gender)
Player transactions (e.g., amounts spent and transaction timing)
Time spent within specific game sessions (e.g., games played and duration)

This could encompass any metrics relevant to your community.

Because this data is lower in volume and transactional in nature, we suggest using Amazon API Gateway to trigger a separate Lambda function that enters the information into the graph database. API Gateway manages scaling, traffic, and authorization for APIs.
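The following sketch shows roughly what such a Lambda function could look like behind an API Gateway proxy integration, where the request body arrives as a string in event["body"] and the function returns a statusCode and body. The record types and required fields are hypothetical, and the write to the graph database is left as a comment.

```python
import json

# Hypothetical record types and the fields each one must carry.
REQUIRED_FIELDS = {
    "player": ("playerId", "username"),
    "game": ("gameId",),
    "transaction": ("playerId", "amount"),
}


def _response(status, payload):
    """Shape a reply the way an API Gateway proxy integration expects it."""
    return {"statusCode": status, "body": json.dumps(payload)}


def api_handler(event, context):
    """Validate an incoming metadata record before it is stored."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body is not valid JSON"})
    if not isinstance(body, dict):
        return _response(400, {"error": "body must be a JSON object"})

    kind = body.get("type")
    required = REQUIRED_FIELDS.get(kind) if isinstance(kind, str) else None
    if required is None:
        return _response(400, {"error": "unknown record type"})

    missing = [field for field in required if field not in body]
    if missing:
        return _response(400, {"error": "missing fields", "fields": missing})

    # A real function would write the record to the graph database here.
    return _response(201, {"accepted": kind})
```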

Moderation Process

With data from in-game interactions now in the AWS Lambda function, we can analyze whether interactions are toxic.

Starting with audio data, we can utilize Amazon Transcribe, a managed service that converts audio to text. The audio data is passed to the transcription service, and the resulting text is sent back. The transcribed text is then re-routed to the Kinesis stream for processing as a chat interaction (with an audio origin flag) to streamline the architecture.
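As a sketch of the transcription step, the helper below prepares a request for the Transcribe StartTranscriptionJob operation from an audio object in the S3 scratch bucket. Note that this operation is asynchronous, so the transcript has to be collected once the job completes. The job naming, the bucket and key values, and the rule of deriving the media format from the file extension are assumptions for this example; the client is assumed to be boto3.client("transcribe").

```python
# Media formats accepted by Amazon Transcribe batch jobs.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def build_transcription_request(job_name, bucket, key, language_code="en-US"):
    """Build keyword arguments for start_transcription_job."""
    media_format = key.rsplit(".", 1)[-1].lower()
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {media_format}")
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


def start_transcription(transcribe_client, job_name, bucket, key):
    """Start an asynchronous job; the transcript is fetched after it completes."""
    request = build_transcription_request(job_name, bucket, key)
    return transcribe_client.start_transcription_job(**request)
```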

Chat data, along with transcribed audio data, is then processed by Amazon Comprehend. This service can extract entities, topics, and key phrases from text and detect sentiment, and it can be trained with custom classifiers to identify abusive or toxic content.
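A minimal sketch of the text scoring step follows, using the Comprehend DetectSentiment operation. Treating a high negative-sentiment score as a signal of toxicity is a simplification made for this example, and the 0.8 threshold is an arbitrary assumption; a trained custom classifier would be a closer fit for real moderation. The client is assumed to be boto3.client("comprehend").

```python
def score_chat(comprehend_client, text, threshold=0.8, language_code="en"):
    """Flag a chat line when its negative-sentiment score crosses a threshold.

    Negative sentiment is only a rough proxy for toxicity (assumption).
    """
    response = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=language_code
    )
    negative = response["SentimentScore"]["Negative"]
    return {
        "sentiment": response["Sentiment"],
        "negativeScore": negative,
        "flagged": negative >= threshold,
    }
```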

For in-game video (or screenshots), we pass the data to Amazon Rekognition, a machine-learning-based service for image and video analysis that can label objects within images, including any offensive content.
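For screenshots, the check could look roughly like the sketch below, which calls the Rekognition DetectModerationLabels operation on an image stored in S3. The bucket, key, and the 80 percent confidence floor are assumptions for this example; the client is assumed to be boto3.client("rekognition").

```python
def find_offensive_labels(rekognition_client, bucket, key, min_confidence=80.0):
    """Return (label name, confidence) pairs for moderation labels in a screenshot."""
    response = rekognition_client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [
        (label["Name"], label["Confidence"])
        for label in response["ModerationLabels"]
    ]
```

An empty list means no moderation label met the confidence floor for that screenshot.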

Graph Database Implementation

Next, we need to store moderation results and metadata in a repository. Given that this solution focuses on relationships among players, games, transactions, and incidents of abuse, a graph database is ideal. We have chosen Amazon Neptune for this purpose.
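To give a feel for how a moderation result might be written to the graph, the sketch below generates a Gremlin traversal as a string (Neptune accepts Gremlin, among other query languages). The vertex labels (player, game, abuse), edge labels (committed, occurredIn), and property names are all hypothetical and do not reflect the schema used by the demonstration solution. Identifiers are validated against a strict pattern because they are embedded directly in the query text.

```python
import re

# Only allow simple identifiers, since values are embedded in the query text.
_SAFE_ID = re.compile(r"^[A-Za-z0-9_-]+$")
_ABUSE_TYPES = {"chat", "audio", "screenshot"}


def abuse_event_query(player_id, game_id, abuse_type, score):
    """Build a Gremlin traversal linking a player, an abuse event, and a game.

    All labels and property names are hypothetical.
    """
    for value in (player_id, game_id):
        if not _SAFE_ID.match(value):
            raise ValueError(f"unsafe identifier: {value!r}")
    if abuse_type not in _ABUSE_TYPES:
        raise ValueError(f"unknown abuse type: {abuse_type!r}")
    return (
        f"g.V().has('player', 'playerId', '{player_id}').as('p')"
        f".V().has('game', 'gameId', '{game_id}').as('gm')"
        f".addV('abuse').property('abuseType', '{abuse_type}')"
        f".property('score', {float(score):.3f}).as('a')"
        ".addE('committed').from('p').to('a')"
        ".addE('occurredIn').from('a').to('gm')"
    )
```

With this shape, a question such as "which games contain the most abuse events?" becomes a traversal over occurredIn edges rather than a multi-table join.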

Data Visualization

Numerous tools are available for querying and visualizing graph databases; in this case, we use a Jupyter Notebook to query the Amazon Neptune database interactively and visualize the relationships among players, games, transactions, and abusive incidents.

Additionally, Amazon’s business intelligence tool, Amazon QuickSight, can be configured to run reports against the data through Amazon Athena.

Implementation Steps

To set up this demonstration solution, we offer an AWS Cloud Development Kit (CDK) model for your use, accompanied by sample scripts to expedite your setup.

Prerequisites

As this is a demonstration solution, it is advisable to deploy it in a test and development environment on an AWS account. You will need Python installed on your client machine to execute the sample code and deploy the CDK model.

Installation

Using the AWS CDK, you can deploy the solution as detailed at: GitHub Repository. Complete the prerequisites first, including installing the additional Python modules. After deployment, you can begin feeding data into the system.

Resources Created

Before ingesting any data into the system, it’s important to understand what has been deployed.

Backend

An Amazon Neptune database and a Jupyter Notebook have been deployed in the backend, ready for you to generate visualizations against your database.

API Gateway

The “API” Lambda function acts as the service operating behind your API Gateway. This API is responsible for creating games, players, and transactions, and for receiving notifications of abusive content from the three interaction types (audio, chat, and screenshots).
