How the Intel Olympic Technology Group Developed a Smart Coaching SaaS Application Utilizing Pose Estimation Models – Part 1


In an innovative collaboration, the Intel Olympic Technology Group (OTG), a branch of Intel dedicated to advancing technology for Olympic athletes, teamed up with AWS Machine Learning Professional Services (MLPS) to create a cutting-edge smart coaching software as a service (SaaS) application. This application leverages computer vision (CV) techniques through pose estimation models, which are a type of machine learning (ML) model designed to identify key points on the human body, such as joints. These key points serve as the foundation for calculating essential biomechanical attributes (BMA) relevant to athletes, including speed, acceleration, and posture.

The Intel OTG team aims to bring this technology to the smart coaching experience. The BMA metrics derived from pose estimation can reinforce the guidance coaches provide and help track athletes’ development. Traditionally, capturing this data accurately requires specialized IoT sensors attached to athletes, which are hard to obtain outside of elite performance centers. These sensors can also be cumbersome and costly: setting up a motion sensor lab can exceed $100,000 once cameras, motion capture suits, and software licenses are accounted for. In contrast, the proposed solution delivers pose estimation at a fraction of the cost, analyzing video captured with nothing more than a standard mobile phone — a substantial advantage for teams without access to a motion-capture lab.
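To make the idea concrete, here is a minimal sketch (not Intel OTG's actual pipeline) of how per-frame keypoints from a pose estimation model can be turned into simple biomechanical attributes such as speed and acceleration. The keypoint format, calibration to metres, and frame rate are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Keypoint:
    x: float  # horizontal position in metres (assumes camera calibration)
    y: float  # vertical position in metres

def speeds(hip_positions: list[Keypoint], fps: float) -> list[float]:
    """Approximate horizontal speed (m/s) of the hip keypoint between frames."""
    dt = 1.0 / fps
    return [(b.x - a.x) / dt for a, b in zip(hip_positions, hip_positions[1:])]

def accelerations(speed_series: list[float], fps: float) -> list[float]:
    """Approximate acceleration (m/s^2) from consecutive speed samples."""
    dt = 1.0 / fps
    return [(s2 - s1) / dt for s1, s2 in zip(speed_series, speed_series[1:])]

# A sprinter's hip advancing 0.1 m per frame at 30 fps gives roughly 3 m/s.
hips = [Keypoint(x=0.1 * i, y=1.0) for i in range(5)]
forward_speeds = speeds(hips, fps=30.0)
```

In practice the keypoints would come from the pose model's per-frame output, and a real pipeline would smooth the series before differentiating, since frame-to-frame keypoint jitter is amplified by finite differences.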

In the first part of this two-part series, we explore the design requirements and the process through which Intel OTG developed their solution on AWS with the assistance of MLPS. The second part will delve deeper into each phase of the architecture.

A Versatile Video Processing Platform

The Intel OTG team is concentrating on track and field movements, particularly sprinting, while also experimenting with other athletic movements during the 2021 Tokyo Olympics. They aimed to build a versatile platform that could support a wide range of end users for video ingestion and analysis.

MLPS collaborated with Intel OTG to establish scalable processing pipelines, which execute ML CV model inference on athlete videos, utilizing AWS infrastructure and services to cater to three distinct groups of users. Each user group has unique platform requirements and interacts with the platform in different ways:

  • Developer – Engages with a Python SDK layer that facilitates application development, including job submission and processing as well as adjustments to inference processing compute cluster settings. These capabilities can be integrated into larger applications or user interfaces.
  • Application user – Submits videos and seeks to interact with a user-friendly application interface, which may be a command line interface (CLI) with predefined commands or a frontend coaching dashboard linked to an inference processing compute cluster via an API layer.
  • Independent Software Vendor (ISV) partner – Works with infrastructure as code (IaC) that packages the solution for deployment in their own environments, such as an AWS account. They desire customization and control over the underlying infrastructure.
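For the developer persona, the SDK layer described above might be used along these lines. This is an illustrative sketch only: the class and method names are hypothetical, not the actual Intel OTG SDK, and a real client would call the platform's API layer over HTTP rather than track jobs locally.

```python
import uuid

class SmartCoachClient:
    """Hypothetical SDK client for the video-processing platform.

    Names are illustrative assumptions; the real SDK would submit jobs
    to the platform's API layer rather than store them in memory.
    """

    def __init__(self, api_endpoint: str):
        self.api_endpoint = api_endpoint
        self._jobs: dict[str, dict] = {}  # local stand-in for the job store

    def submit_job(self, video_uri: str, movement: str = "sprint") -> str:
        """Submit a video for pose-estimation processing; returns a job ID."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {
            "video": video_uri,
            "movement": movement,
            "status": "SUBMITTED",
        }
        return job_id

    def get_status(self, job_id: str) -> str:
        """Look up the current status of a submitted job."""
        return self._jobs[job_id]["status"]

    def set_cluster_size(self, nodes: int) -> None:
        """Adjust the inference compute cluster size (stubbed here)."""
        self.cluster_nodes = nodes
```

A CLI or coaching dashboard could then be built on top of exactly these calls, which is the layering the three personas above imply.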

Addressing Technical Design Requirements

After gaining insight into the business needs for delivering various capabilities to different user segments, the MLPS team collaborated with Intel OTG to develop a comprehensive overview of the technical design requirements necessary to achieve their goals. The solution architecture is illustrated in the following diagram.

The high-level design principles that shaped the final architecture include:

  • API, CLI, and SDK layers to accommodate the varying access needs of different user segments.
  • Configurable IaC, such as AWS CloudFormation templates, to allow for independent customization and deployment for different ISV users in various environments.
  • A maintainable architecture that employs microservices, while minimizing infrastructural overhead through serverless and managed AWS services.
  • The capability to optimize latency for pose estimation processing jobs by maximizing parallelization and resource utilization.
  • Detailed control of latency and throughput for different tiers of end users.
  • A portable runtime and inference environment that operates both within and outside of AWS.
  • A flexible and adaptable data model.
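One way to realize the tiered latency-and-throughput principle is a simple mapping from user tier to resource limits, consulted at job-submission time. The tier names and limits below are illustrative assumptions, not Intel OTG's actual configuration.

```python
# Hypothetical service tiers: names and limits are illustrative only.
TIERS = {
    "standard": {"max_concurrent_jobs": 2, "shards": 1},
    "pro": {"max_concurrent_jobs": 8, "shards": 4},
    "elite": {"max_concurrent_jobs": 32, "shards": 16},
}

def limits_for(tier: str) -> dict:
    """Return throughput limits for a tier, falling back to 'standard'."""
    return TIERS.get(tier, TIERS["standard"])

def can_submit(tier: str, active_jobs: int) -> bool:
    """Admission check: allow a new job only below the tier's concurrency cap."""
    return active_jobs < limits_for(tier)["max_concurrent_jobs"]
```

Keeping this mapping in the database alongside user records (as step 4 below does with Aurora Serverless) lets operators retune tiers without redeploying the processing pipeline.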

Ultimately, the MLPS team met these requirements, as depicted in the high-level process flow. An API layer built on Amazon API Gateway connects the interaction layers (CLI, SDK) to the backend processing architecture. The team used the AWS Cloud Development Kit (AWS CDK) for rapid code development and deployment, establishing a straightforward framework for future resource deployments. The workflow involves the following steps:

  1. Upon receiving a video upload, AWS Step Functions initiates the workflow orchestration, managing the submission of processing jobs through a series of AWS Lambda functions that call serverless AWS microservices.
  2. Videos are batched and sent to Amazon Kinesis Data Streams to enable parallel job processing through sharding.
  3. Additional parallelization and throughput control are achieved via individual consumer Lambda functions, which are activated to process batches of video frames.
  4. The compute engine for generating inferences with ML models is powered by an Amazon Elastic Kubernetes Service (Amazon EKS) cluster, offering flexibility for a portable runtime and inference environment, whether on AWS or elsewhere. A series of Kubernetes containers encapsulates the ML inference models and pipelines. An Amazon Aurora Serverless database permits a flexible data model that tracks users and submitted jobs. This separation of user groups in the database allows for the mapping of different tiers of users to their respective access levels of throughput and latency.
  5. Logging is performed using Amazon Kinesis Data Firehose to capture data from services such as Lambda, which is then stored in Amazon Simple Storage Service (Amazon S3) buckets. For instance, each batch of processed frames from the submitter Lambda function is logged with timestamps, action names, and Lambda function response JSON, and saved to Amazon S3.
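Steps 2 and 3 hinge on batching frames and routing them to shards via partition keys. The sketch below mimics that routing in plain Python; Kinesis actually assigns records by the MD5 hash of the partition key over a 128-bit key space, so the modulo here is a simplified stand-in for evenly split shards, and the one-record-per-batch layout is an assumption.

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index.

    Simplified stand-in for Kinesis routing, which hashes the partition
    key with MD5 over a 128-bit key space divided among the shards.
    """
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def batch_frames(frames: list[int], batch_size: int) -> list[list[int]]:
    """Group frame indices into fixed-size batches, one record per batch."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]

# Using one partition key per video keeps all of that video's batches on a
# single shard, so one consumer Lambda sees its frames in order, while
# different videos spread across shards for parallelism.
batches = batch_frames(list(range(10)), batch_size=4)
```

Raising the shard count (and the per-shard consumer concurrency) is then the lever for the throughput tiers described earlier.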

Introducing Innovative Computer Vision Capabilities

The primary objective of this application is to provide athletes and coaches with critical biomechanical insights into their movements, facilitating training and performance enhancement. A crucial consideration for the Intel OTG team is minimizing the obstacles to delivering this feedback. This means requiring only standard inputs, such as 2D video footage captured with a smartphone camera, without the need for specialized equipment. Such accessibility allows for input and feedback to occur conveniently on the field or in a training environment.

Stay tuned for Part 2, where we will explore the intricacies of the architecture in further detail.


