Explaining Bundesliga Match Facts xGoals with Amazon SageMaker Clarify


One of the most exciting announcements at AWS re:Invent 2020 was the introduction of Amazon SageMaker Clarify, a feature designed to detect bias in machine learning (ML) models and explain how those models arrive at their predictions. As ML algorithms make predictions at ever-greater scale, it has become essential for organizations to communicate transparently to their customers the rationale behind decisions informed by ML predictions. This marks a significant departure from models that function as black boxes, where inputs and outputs are visible but the internal workings remain obscured. With better analysis capabilities, organizations can refine their model configurations and give customers deeper insight into how predictions are made.

A particularly fascinating application of Clarify comes from the Deutsche Fußball Liga (DFL) through its Bundesliga Match Facts, which is powered by AWS. This initiative aims to reveal intriguing insights into xGoals model predictions. The Bundesliga Match Facts enhance the fan experience for soccer enthusiasts globally by providing real-time data on shot difficulty, player performance, and team offensive and defensive trends.

With the aid of Clarify, the DFL can now comprehensively explain which features most influence the ML model’s xGoals predictions. xGoals (Expected Goals) quantifies the probability of a player scoring when shooting from a given position on the field. Understanding feature attributions and being able to explain outcomes makes model debugging more efficient, which leads to improved prediction accuracy. Most importantly, this added transparency fosters trust in ML models, paving the way for future collaboration and innovation: enhanced interpretability translates into greater adoption. Let’s explore further!

Bundesliga Match Facts

Bundesliga Match Facts, powered by AWS, delivers advanced real-time statistics and deep insights derived from official match data, enriching the viewing experience for more than 500 million fans worldwide. This data is broadcast through various national and international channels, as well as the DFL’s own platforms and applications, offering personalized experiences and next-generation statistics.

The xGoals metric allows the DFL to evaluate the probability of scoring for shots taken from any field position. Calculated in real-time, this probability helps viewers assess shot difficulty and goal likelihood, with values ranging from 0 to 1. A higher xGoals value indicates a greater chance of scoring. In this article, we will delve into the xGoals metric, exploring the ML model’s workings to understand its predictions for individual shots and across entire seasons.

Training Data Preparation and Analysis

The Bundesliga xGoals ML model improves on previous iterations by combining shot-event data with high-precision tracking data captured at a 25-Hz frame rate. This allows the model to assess features such as shot angle, distance from goal, player speed, defensive line density, and goalkeeper positioning. Using the area under the ROC curve (AUC) as the training objective, we trained the xGoals model on more than 40,000 historical shots recorded since 2017 with the Amazon SageMaker XGBoost algorithm. For more details on the xGoals training process, see the earlier blog post on the topic.
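
To make the training setup concrete, here is a minimal sketch of how such a job could be configured with the SageMaker built-in XGBoost algorithm. The bucket, output path, and hyperparameter values are illustrative assumptions, not the DFL’s actual configuration:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Resolve the built-in XGBoost container image for the current Region
xgb_image = image_uris.retrieve('xgboost', session.boto_region_name, version='1.2-1')

estimator = Estimator(image_uri=xgb_image,
                      role=role,
                      instance_count=1,
                      instance_type='ml.m5.xlarge',
                      output_path='s3://my-bucket/xgoals/output',  # placeholder path
                      sagemaker_session=session)

# AUC as the evaluation metric, matching the training objective described above
estimator.set_hyperparameters(objective='binary:logistic',
                              eval_metric='auc',
                              num_round=200)

# estimator.fit({'train': train_input})  # train_input: S3 URI of the training data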

When examining the original training dataset, we observe a blend of binary, categorical, and continuous values across a substantial collection of attempted shots. The dataset comprises several features that are critical for both model training and explainability.
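
As an illustration only (the column names below are hypothetical, not the DFL’s actual schema), such a dataset mixes the three value types like this:

import pandas as pd

# Hypothetical rows: a binary label (goal), a categorical feature (body_part),
# and continuous features (angle, distance, speed), as described above
shots = pd.DataFrame({
    'goal':      [0, 1, 0],                 # binary: did the shot score?
    'body_part': ['foot', 'head', 'foot'],  # categorical
    'angle':     [23.5, 8.1, 45.0],         # shot angle in degrees
    'distance':  [18.2, 5.4, 11.7],         # distance from goal in meters
    'speed':     [4.3, 2.1, 6.8],           # player speed in m/s
})
print(shots.dtypes)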

SageMaker Clarify

SageMaker has become an invaluable tool for both novice data scientists and experienced ML professionals, facilitating dataset preparation, model training, and production deployment across diverse industries, including healthcare, media, and finance. However, like many ML tools, it lacked capabilities for deeper analysis and explanation of model results, as well as bias detection in training datasets. Clarify addresses these gaps, enabling users to identify bias and apply model explainability in a scalable and repeatable manner.

The absence of explainability can keep organizations from fully embracing ML. Theoretical approaches to model explainability have advanced significantly, with SHAP (SHapley Additive exPlanations) emerging as a vital framework in the explainable AI domain. While a comprehensive discussion of SHAP is beyond this article’s scope, its foundational question is: “How does a prediction change when a specific feature is excluded from the model?” The resulting SHAP values quantify a feature’s impact on a prediction in both magnitude and direction. Drawing on coalitional game theory, SHAP treats feature values as players in a coalition and determines how to fairly distribute the prediction among them. Notably, the SHAP framework is model-agnostic and highly scalable, applicable to simple linear models as well as complex deep learning architectures.
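
To see these ideas in code, the following sketch uses the open-source shap package with an XGBoost classifier on synthetic, shot-like data; the features and labels are fabricated purely for illustration:

import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.uniform(0, 90, n),   # shot angle (degrees), synthetic
    rng.uniform(1, 35, n),   # distance from goal (meters), synthetic
    rng.uniform(0, 10, n),   # player speed (m/s), synthetic
])
# Toy label: shots from closer in are more likely to score
y = (rng.random(n) < 1.0 / (1.0 + 0.2 * X[:, 1])).astype(int)

model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Each row decomposes one prediction: the entries sum (together with
# explainer.expected_value) to the model's output for that shot, and each
# entry gives one feature's contribution, including its direction
print(shap_values[0])

Clarify computes SHAP values in the same spirit, but as a managed processing job, which is what makes the analysis scalable and repeatable.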

Interpreting Bundesliga xGoals Model Behavior with Clarify

Now that we’ve covered our dataset and ML explainability, we can initiate our Clarify processor to calculate the desired SHAP values. The processor’s arguments are generic; they depend only on your production environment and available AWS resources.

First, we establish the Clarify processing job along with the SageMaker session, AWS Identity and Access Management (IAM) execution role, and Amazon S3 bucket as follows:

import sagemaker
from sagemaker import clarify

# Set up the SageMaker session, execution role, default S3 bucket, and Region
session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()
region = session.boto_region_name

prefix = 'sagemaker/dfl-tracking-data-xgb'

# 1200*30 = 36,000 seconds (10 hours) of maximum runtime,
# 100*10 = 1,000 GB of attached EBS volume
clarify_processor = clarify.SageMakerClarifyProcessor(role=role,
                                                      instance_count=1,
                                                      instance_type='ml.c5.xlarge',
                                                      sagemaker_session=session,
                                                      max_runtime_in_seconds=1200*30,
                                                      volume_size_in_gb=100*10)

Next, we save the CSV training file to Amazon S3 and specify the training data and results path for the Clarify job:

DATA_LAKE_OBSERVED_BUCKET = 'sts-openmatchdatalake-dev'
DATA_PREFIX = 'sagemaker_input'
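
From here, the job needs the uploaded dataset plus data, model, and SHAP configurations before it can run. The following is a minimal sketch of those remaining pieces; the file name, label and header names, model name, and baseline values are assumptions for illustration only:

# Upload the training CSV to S3 (the local file name is a placeholder)
csv_train_path = session.upload_data(path='train.csv',
                                     bucket=DATA_LAKE_OBSERVED_BUCKET,
                                     key_prefix=DATA_PREFIX)

explainability_output_path = 's3://{}/{}/clarify-output'.format(bucket, prefix)

# Where the data lives, where results go, and how the CSV is laid out
data_config = clarify.DataConfig(s3_data_input_path=csv_train_path,
                                 s3_output_path=explainability_output_path,
                                 label='goal',                                    # hypothetical label column
                                 headers=['goal', 'angle', 'distance', 'speed'],  # hypothetical columns
                                 dataset_type='text/csv')

# The deployed xGoals model that Clarify will query via a shadow endpoint
model_config = clarify.ModelConfig(model_name='xgoals-xgboost',  # hypothetical model name
                                   instance_type='ml.c5.xlarge',
                                   instance_count=1,
                                   accept_type='text/csv')

# Baseline row against which feature contributions are measured
shap_config = clarify.SHAPConfig(baseline=[[45.0, 18.0, 5.0]],  # e.g., an average shot
                                 num_samples=100,
                                 agg_method='mean_abs')

clarify_processor.run_explainability(data_config=data_config,
                                     model_config=model_config,
                                     explainability_config=shap_config)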
