This is the first installment in a four-part series that explores how NatWest Group, a prominent financial services organization, collaborated with AWS to create a scalable, secure, and sustainable machine learning operations (MLOps) platform. This introductory post outlines how the joint AWS and NatWest Group team implemented Amazon SageMaker Studio as the foundation for their data science environment within just nine months. This content is aimed at decision-makers interested in standardizing their machine learning workflows, including CDAOs, CDOs, CTOs, Heads of Innovation, and lead data scientists. Future posts will delve into the technical aspects of this solution.
Read the entire series:
- Part 1: How NatWest Group Developed a Scalable, Secure, and Sustainable MLOps Platform
- Part 2: How NatWest Group Created a Secure, Compliant, Self-Service MLOps Platform Using AWS Service Catalog and Amazon SageMaker
- Part 3: How NatWest Group Built Auditable, Reproducible, and Explainable ML Models with Amazon SageMaker
- Part 4: How NatWest Group Migrated ML Models to Amazon SageMaker Architectures
MLOps
For NatWest Group, MLOps is centered on maximizing the benefits from data science initiatives through the implementation of DevOps and engineering best practices. This approach builds solutions and products that integrate machine learning at their core. It establishes the standards, tools, and frameworks that empower data science teams to transform their ideas from concept to production efficiently, securely, and traceably.
Strategic Partnership Between NatWest Group and AWS
NatWest Group is the largest business and commercial bank in the UK and has a leading retail business. The Group champions potential by helping 19 million individuals, families, and businesses across the UK and Ireland to thrive in a digital landscape.
As the Group sought to expand its use of advanced analytics enterprise-wide, it became evident that developing and deploying ML models and solutions took too long. The Group partnered with AWS to create a modern, secure, scalable, and sustainable self-service platform for developing and operationalizing ML-driven services that support both business objectives and customer needs. AWS Professional Services worked closely with NatWest Group to accelerate the adoption of AWS best practices for Amazon SageMaker services.
The goals of this collaboration included:
- Establishing a federated, self-service, and DevOps-oriented approach for infrastructure and application code, resulting in deployment times that are measured in minutes rather than weeks (the current average is 60 minutes).
- Creating a secure, controlled, and templated environment that accelerates innovation with ML models and insights, utilizing industry best practices and shared artifacts across the bank.
- Enhancing the accessibility and consistency of data sharing across the enterprise.
- Implementing a modern toolset built on a managed architecture that runs on demand, which minimizes compute needs and reduces costs. The toolset also promotes sustainable ML development and operations that can adapt to new AWS products and compliance requirements.
- Providing adoption, engagement, and training support for data science and engineering teams throughout the organization.
To satisfy the bank’s security standards, public internet access is restricted, and all data is protected with custom encryption keys. As detailed in Part 2 of this series, a secure instance of SageMaker Studio can be deployed to the development account within 60 minutes. After the account setup is complete, data scientists can request a new use case template through SageMaker projects in SageMaker Studio. This request deploys the infrastructure that provides MLOps capabilities in the development account, including CI/CD pipelines, unit testing, model testing, and monitoring, with minimal assistance from operational teams.
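To make the two security constraints above concrete, here is a minimal Python sketch of the settings a Studio domain request might carry: VPC-only network access and a customer managed encryption key. This is an illustration under stated assumptions, not the team's actual deployment method, which Part 2 covers. The function name `build_studio_domain_request` and all identifier values are hypothetical placeholders; the dictionary keys follow the parameters of the SageMaker `CreateDomain` API.

```python
from typing import Any


def build_studio_domain_request(
    domain_name: str,
    vpc_id: str,
    subnet_ids: list[str],
    kms_key_id: str,
    execution_role_arn: str,
) -> dict[str, Any]:
    """Assemble keyword arguments for a SageMaker CreateDomain call.

    Hypothetical helper for illustration only; it builds the request
    but does not call AWS.
    """
    if not subnet_ids:
        raise ValueError("at least one private subnet is required")
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Route Studio traffic through the VPC only, with no direct internet access.
        "AppNetworkAccessType": "VpcOnly",
        # Customer managed KMS key used to encrypt the domain's storage volume.
        "KmsKeyId": kms_key_id,
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
    }


# Placeholder identifiers, assumed for the example.
request = build_studio_domain_request(
    domain_name="example-studio-domain",
    vpc_id="vpc-00000000",
    subnet_ids=["subnet-00000000"],
    kms_key_id="example-kms-key-id",
    execution_role_arn="arn:aws:iam::111122223333:role/example-studio-role",
)
print(request["AppNetworkAccessType"])
```

With valid AWS credentials and real identifiers, a dictionary like this could be passed to `boto3.client("sagemaker").create_domain(**request)`. Keeping the request builder separate from the API call lets the security settings be reviewed and unit tested without touching an AWS account.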
The Process
The joint AWS and NatWest Group team employed an agile five-step methodology to discover, design, build, test, and launch the new platform over nine months:
- Discovery – The team held a series of information-gathering sessions to pinpoint existing challenges within the ML lifecycle. These challenges spanned data discovery, infrastructure setup, model building, governance, route-to-live, and the operational model. By working backward, AWS and NatWest Group identified core requirements and priorities that helped establish a unified vision, success criteria, and delivery plan for the MLOps platform.
- Design – Using insights from the Discovery phase, the team iterated towards the final design for the MLOps platform, integrating best practices from AWS and NatWest Group’s existing cloud service experiences. Compliance with security and governance standards specific to the financial services sector was a key focus.
- Build – The team collaboratively created Terraform and AWS CloudFormation templates for the platform infrastructure. Continuous feedback from end-users (data scientists, ML and data engineers, platform support staff, security and governance teams, and senior stakeholders) ensured that deliverables aligned with initial objectives.
- Test – A vital aspect of the project was to validate the platform’s capabilities using real business analytics and ML use cases. NatWest identified three projects that presented a range of business challenges, allowing the team to test the new platform’s scalability, flexibility, and accessibility.
- Launch – Once the platform’s capabilities were validated, it was launched organization-wide, accompanied by tailored training plans and comprehensive support for federated business teams as they onboarded their own use cases and users.
The Scalable ML Framework
In an organization with millions of customers spanning multiple business lines, ML workflows must integrate data owned and managed by different teams, each using different tools, to unlock business value. NatWest Group is dedicated to safeguarding customer data, so the infrastructure used for ML model development adheres to stringent security standards, which adds complexity and lengthens the time-to-value for new ML models.
A robust and scalable ML framework is essential. It requires modernizing and standardizing tools to reduce the effort of integrating different systems and to simplify the deployment process for new ML models. Before engaging with AWS, data science support was managed by a central platform team that gathered requirements and provisioned infrastructure for data teams across the organization. NatWest’s ambition to rapidly expand ML use across federated teams called for a scalable ML framework that enables developers to self-serve the deployment of a modern, pre-approved platform.