How to Develop a Worldwide, Scalable, Low-Latency, and Secure Machine Learning Platform for Medical Imaging Analysis on AWS

Introduction

Machine learning (ML) is becoming a key driver of innovation in medical imaging. Researchers, developers, startups, and established companies are building, training, and deploying ML solutions intended to transform medical workflows and increase the diagnostic and therapeutic value of imaging.

To achieve groundbreaking scientific advancements, researchers must first address several hurdles when training and implementing machine learning models. Accessing vast amounts of data stored in fragmented registries across the globe is the first challenge. Next, they require standardized tools to globally generate ground truth on reference datasets. Finally, creating a secure and cost-effective environment is essential for enabling collaboration among research teams.

This is why the Diagnostic Image Analysis Group (DIAG) at Radboud University Medical Center in Nijmegen, The Netherlands, opted to transition their grand-challenge.org open-source platform from an on-premises data center to AWS. Launched in 2012, grand-challenge.org organizes machine learning competitions in biomedical image analysis, currently connecting over 45,000 registered researchers and clinicians worldwide to collaborate on innovative ML solutions.

In early March 2020, when it was suggested that CT imaging could be crucial for diagnosing and evaluating COVID-19, the Dutch Radiological Society quickly proposed CO-RADS, a standardized assessment scheme for CT scans. Radiologists used the grand-challenge.org platform to gather imaging data and its browser-based viewing system to evaluate the scheme, which showed high discriminatory power for diagnosing COVID-19 from CT scans alone (area under the ROC curve 0.91, 95% CI 0.85-0.97, for positive RT-PCR results).

On the platform, DIAG has made the COVID-19 dataset, a training course for radiologists on CO-RADS assessment, the accompanying exam, and the ML model available to all registered users. The on-premises data center imposed two limits, however. Latency in the server-side rendered viewing system degraded the experience for radiologists outside Europe, and the number of scans DIAG could process with its AI tools was capped by hardware that had been allocated before the emergence of SARS-CoV-2.

In April 2020, DIAG partnered with AWS to implement globally distributed browser-based viewing systems and elastic scaling, making these tools accessible to machine learning and clinical researchers everywhere. Thanks to a successful collaboration between DIAG’s Research Software Engineering team and AWS, grand-challenge.org was migrated to the cloud in under two weeks. Several technical challenges were resolved, resulting in a more robust, high-performance, and scalable application that will support the medical imaging community throughout the pandemic and beyond.

This article outlines the architecture and services utilized for the global medical imaging analysis platform, detailing the challenges, solutions, and results achieved, including 1) data exchange with the global research community, 2) low-latency and scalable web-based viewer, 3) secure and cost-effective deployment and distribution of ML models, and 4) rapid cloud migration of data and computing resources.

Data Exchange with the Global Research Community

To develop robust machine learning solutions for biomedical imaging problems, researchers require access to substantial amounts of annotated training data. As medical devices like MRI and CT scanners, next-generation sequencers, and digital pathology machines become more precise, the volume of generated data is on the rise. Unfortunately, this massive data is often locked away in siloed databases and proprietary formats, making data exchange and collaboration on research projects across institutions a technical challenge, compounded by compliance and security considerations.

To address this, DIAG has added features to grand-challenge.org that let researchers set up archives to share data, apply algorithms to that data, and run their own reader studies in which invited experts annotate images. Shipping hard drives between sites has traditionally been the norm in medical imaging. On AWS, users instead upload directly to Amazon Simple Storage Service (Amazon S3) with accelerated transfers, which makes it practical to collect data from anywhere in the world. Uploads are accepted in various medical imaging formats, including DICOM and a range of whole-slide image formats. The data is then automatically validated and converted to MetaImage or TIFF, so machine learning researchers work with a small, consistent set of formats.
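The validation and conversion step can be pictured with a small sketch. This is not the platform's actual pipeline; it only assumes two well-known format details: a DICOM Part 10 file carries the bytes `DICM` after a 128-byte preamble, and a MetaImage header is a list of `Key = Value` lines ending with `ElementDataFile`. The function names `looks_like_dicom` and `metaimage_header` are hypothetical.

```python
from typing import Sequence

DICOM_PREAMBLE_LENGTH = 128  # DICOM Part 10 files begin with a 128-byte preamble
DICOM_MAGIC = b"DICM"        # ...followed by this four-byte signature


def looks_like_dicom(data: bytes) -> bool:
    """Return True if the bytes carry the DICOM Part 10 file signature."""
    start = DICOM_PREAMBLE_LENGTH
    return data[start:start + len(DICOM_MAGIC)] == DICOM_MAGIC


def metaimage_header(dim_size: Sequence[int], spacing: Sequence[float],
                     element_type: str, data_file: str) -> str:
    """Build the text of a MetaImage (.mhd) header for a raw voxel file."""
    if len(dim_size) != len(spacing):
        raise ValueError("dim_size and spacing must have the same length")
    lines = [
        "ObjectType = Image",
        f"NDims = {len(dim_size)}",
        "BinaryData = True",
        "BinaryDataByteOrderMSB = False",
        f"DimSize = {' '.join(str(n) for n in dim_size)}",
        f"ElementSpacing = {' '.join(str(s) for s in spacing)}",
        f"ElementType = {element_type}",
        # ElementDataFile is written last: readers treat it as the end of the header.
        f"ElementDataFile = {data_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fake_upload = b"\x00" * 128 + b"DICM" + b"..."  # stand-in for an uploaded file
    print(looks_like_dicom(fake_upload))
    print(metaimage_header((512, 512, 120), (0.7, 0.7, 1.0), "MET_SHORT", "scan.raw"))
```

A real converter would also parse the DICOM data elements and write the voxel data; the sketch covers only the cheap signature check and the header layout.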

All imaging data on the grand-challenge.org platform is stored in Amazon S3, so DIAG no longer needs to plan storage capacity around the growing influx of COVID-19 patient scans. Amazon CloudFront serves the data for fast access, and its URL signing integrates easily with the Django backend, so users can only download images they are authorized to view.
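As a rough illustration of URL signing, the sketch below builds a CloudFront signed URL with a canned policy, and only does so after a permission check. It assumes the canned-policy JSON layout and the `+`, `=`, `/` character substitutions described in the CloudFront documentation. The RSA signing step is passed in as a callable (`rsa_sign`) rather than implemented here, and all function names, the example domain, and the key pair ID are hypothetical. It is not the platform's Django code.

```python
import base64
import json
from typing import Callable, Set


def cloudfront_b64(raw: bytes) -> str:
    """Base64-encode, then swap the characters that are unsafe in a query string."""
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(url: str, expires_epoch: int) -> bytes:
    """Return the whitespace-free canned policy for one URL and expiry time."""
    policy = {
        "Statement": [
            {
                "Resource": url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":")).encode("utf-8")


def signed_url(url: str, expires_epoch: int, key_pair_id: str,
               rsa_sign: Callable[[bytes], bytes]) -> str:
    """Append Expires, Signature and Key-Pair-Id to the URL.

    rsa_sign is assumed to sign the policy bytes with the private key
    registered with CloudFront (RSA over a SHA-1 digest, per the AWS docs).
    """
    signature = cloudfront_b64(rsa_sign(canned_policy(url, expires_epoch)))
    separator = "&" if "?" in url else "?"
    return (f"{url}{separator}Expires={expires_epoch}"
            f"&Signature={signature}&Key-Pair-Id={key_pair_id}")


def url_if_authorized(image_id: str, viewable_image_ids: Set[str], base_url: str,
                      expires_epoch: int, key_pair_id: str,
                      rsa_sign: Callable[[bytes], bytes]) -> str:
    """Sign a download URL only for images the user is allowed to view."""
    if image_id not in viewable_image_ids:
        raise PermissionError(f"user may not view image {image_id}")
    return signed_url(f"{base_url}/{image_id}", expires_epoch, key_pair_id, rsa_sign)
```

In a Django view, the set of viewable image IDs would come from the application's permission system, and the backend would redirect the browser to the returned URL. Libraries such as botocore ship a CloudFront signer that covers the same steps.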

Low-Latency and Scalable Web-Based Viewer

Most medical imaging data viewing and processing in clinical and research settings currently occurs on-premises, relying on dedicated workstations equipped for server-side rendering, which is essential for functions like MIP (maximum intensity projection) viewing or 3D volumetric rendering. With the growing collaboration among radiologists from various global institutions and the increasing secondary use of medical imaging for ML solution development, there is a pressing need for universally available solutions. This challenge was recently faced by Radboud University Medical Center as they received considerable interest in their CO-RADS Academy, which educates physicians on interpreting COVID-19 CT images.

DIAG developed a web-based medical imaging viewer called CIRRUS, built on MeVisLab from MeVis Medical Solutions. CIRRUS provides the tools that radiologists need to work with medical imaging data. Rendering happens on the server, which keeps loading times short and puts powerful, GPU-accelerated hardware behind tasks such as 3D multiplanar reformations. Rendered scenes are streamed over a WebSocket connection to a Vue.js single-page application in the browser, which handles the client-side interaction. The viewer workstations are deployed as Docker containers: each user gets their own container, and Traefik routes that user's traffic to it.
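One way to picture the per-user routing is through the labels that Traefik's Docker provider reads from each container. The sketch below generates such labels for a viewer session. It assumes Traefik v2 label syntax and a one-hostname-per-session scheme; the source does not state which Traefik version or routing rule the platform uses, and `router_name`, `traefik_labels`, and the example domain are hypothetical.

```python
import re
from typing import Dict


def router_name(session_id: str) -> str:
    """Reduce a session id to lowercase letters, digits and hyphens,
    so the result can also serve as a hostname label."""
    return "viewer-" + re.sub(r"[^a-z0-9-]", "-", session_id.lower())


def traefik_labels(session_id: str, domain: str, container_port: int) -> Dict[str, str]:
    """Return Docker labels asking Traefik (v2 syntax, assumed) to route
    one hostname to the container that serves this user's session."""
    name = router_name(session_id)
    host = f"{name}.{domain}"
    return {
        "traefik.enable": "true",
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.services.{name}.loadbalancer.server.port": str(container_port),
    }


if __name__ == "__main__":
    for key, value in traefik_labels("Ab_12", "viewer.example.org", 8080).items():
        print(f"{key}={value}")
```

The labels would be attached when the session container is started, after which Traefik discovers the container and begins routing to it without a restart.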

In this initiative, DIAG successfully established rendering servers on AWS in Europe, Japan, and North America using Amazon Elastic Compute Cloud (Amazon EC2). Initiating a container for a new user takes less than 30 seconds, and the compute pool can be horizontally scaled by adding additional EC2 instances in each region. Medical imaging data is stored in an Amazon S3 bucket in Europe, ensuring rapid loading times.
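The two ideas in this paragraph, sending each user to a nearby rendering region and growing the pool by adding instances, can be sketched in a few lines. The region keys, the latency figures, and the sessions-per-instance value are illustrative assumptions, not measurements or settings from the platform.

```python
import math
from typing import Dict


def nearest_region(latency_ms: Dict[str, float]) -> str:
    """Pick the rendering region with the lowest measured round-trip latency."""
    if not latency_ms:
        raise ValueError("no latency measurements supplied")
    return min(latency_ms, key=lambda region: latency_ms[region])


def instances_needed(active_sessions: int, sessions_per_instance: int) -> int:
    """Number of EC2 instances required if each hosts a fixed number of viewer containers."""
    if sessions_per_instance < 1:
        raise ValueError("sessions_per_instance must be at least 1")
    return math.ceil(active_sessions / sessions_per_instance)


if __name__ == "__main__":
    # Hypothetical measurements for a user located in East Asia.
    measured = {"europe": 240.0, "japan": 35.0, "north-america": 150.0}
    print(nearest_region(measured))
    print(instances_needed(active_sessions=9, sessions_per_instance=4))
```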
