Build an End-to-End Machine Learning Pipeline Between SAP S/4HANA and Amazon SageMaker
Machine learning is increasingly pivotal in driving digital transformation. It empowers businesses to identify patterns at scale, discover innovative ways to enhance customer experiences, optimize operations, and succeed in a competitive landscape. However, building a machine learning architecture requires a solid understanding of the data, addressing the challenges of data preparation, and maintaining model accuracy through a continuous feedback loop. In this blog, we outline the steps to establish an end-to-end integration between SAP S/4HANA systems and Amazon SageMaker, using the scalability of AWS to build a rapid feedback loop.
Introduction
Our process starts with extracting data from SAP S/4HANA using a combination of SAP OData, ABAP CDS, and AWS Glue to transfer the data to an Amazon S3 bucket. Following this, we will employ AWS Glue DataBrew for data preparation and Amazon SageMaker for training our model. The predictions generated will then be sent back to the SAP system.
Prerequisites
- For demonstration, we use the Credit Card Fraud Detection dataset, which can be downloaded from Kaggle.
- An instance of SAP S/4HANA is required. The easiest way to deploy one is through AWS Launch Wizard for SAP.
Walkthrough
Step 1: SAP Data Preparations
There are multiple methods to extract data from SAP systems into AWS. Here, we utilize ABAP Core Data Services (CDS) views, REST-based Open Data Protocol (OData) services, and AWS Glue.
- Create a custom database table in SAP S/4HANA using transaction SE11.
- Import Kaggle data into the SAP HANA table using the IMPORT FROM CSV statement.
- Develop an SAP ABAP CDS view in the ABAP Development Tools (ADT), adding the annotation @OData.publish: true to generate an OData service; examples are available in the AWS Samples GitHub repository.
- Activate the OData service in SAP transaction /IWFND/MAINT_SERVICE.
- Optionally, test the OData service in the SAP Gateway Client using transaction /IWFND/GW_CLIENT.
- In the AWS console, create an AWS Glue job for data extraction, following the AWS documentation for a Python shell job; a minimal sketch of such a job follows this list. In our scenario, extracting 284,807 entries from SAP took less than a minute using only 0.0625 Data Processing Units (DPUs).
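The sketch below shows what such a Python shell job could look like: it pages through the OData service and lands the rows in Amazon S3 as a CSV object. The host, service path, entity set name, bucket, and credentials are all placeholders for illustration; substitute the values exposed by your own CDS view, and store credentials in AWS Secrets Manager rather than in the script.

```python
import csv
import io

import boto3
import requests

SAP_HOST = "https://my-s4hana-host:44300"          # placeholder host
SERVICE = "/sap/opu/odata/sap/ZCREDITCARD_CDS"     # placeholder OData service
ENTITY_SET = "Zcreditcard"                         # placeholder entity set
BUCKET = "my-sap-ml-bucket"                        # placeholder target bucket
PAGE_SIZE = 10000

session = requests.Session()
session.auth = ("SAP_USER", "SAP_PASSWORD")        # use Secrets Manager in practice

# Page through the OData entity set with $top/$skip until it is exhausted.
rows, skip = [], 0
while True:
    resp = session.get(
        f"{SAP_HOST}{SERVICE}/{ENTITY_SET}",
        params={"$format": "json", "$top": PAGE_SIZE, "$skip": skip},
    )
    resp.raise_for_status()
    page = resp.json()["d"]["results"]             # OData v2 JSON envelope
    if not page:
        break
    rows.extend(page)
    skip += PAGE_SIZE

# Write the collected rows to S3 as a single CSV object.
fieldnames = [k for k in rows[0] if k != "__metadata"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)
boto3.client("s3").put_object(
    Bucket=BUCKET, Key="raw/creditcard.csv", Body=buf.getvalue()
)
```

A job this small is exactly why a Python shell job (rather than a Spark job) keeps the DPU consumption at the fraction mentioned above.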
Step 2: AWS Glue DataBrew for Data Wrangling
Before training the fraud detection model, we must prepare the datasets. This typically involves data cleansing, normalization, encoding, and sometimes creating new features. AWS Glue DataBrew, launched in November 2020, simplifies this process, enabling data analysts and scientists to clean and normalize data up to 80% faster without writing any code.
Step 2.1 Create a Project
- Log in to the AWS console, select AWS Glue DataBrew, and create a project.
- In the Project details pane, name your project SAP-ML.
- Choose New dataset, name it CreditcardfraudDB, and select the Amazon S3 output location from Step 1 as the data source.
- Create a new IAM role for access permissions, naming it something like fraud-detection-role.
- Select Create Project. AWS Glue DataBrew displays a sample of the first 500 rows, which you can increase to a maximum of 5,000.
Step 2.2 Create Data Profile
You can create a data profile to assess dataset quality, revealing patterns and detecting anomalies. By default, the profile covers only the first 20,000 rows, but you can request a service limit increase; in our case, we raised it to 300,000 rows so the full dataset could be profiled. Profile jobs can also be created programmatically, as shown below.
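If you prefer scripting over the console, a minimal boto3 sketch follows. It assumes the dataset and IAM role names from Step 2.1; the account ID and bucket are placeholders, and the JobSample setting is what lifts the default 20,000-row sampling limit once the corresponding service quota has been raised.

```python
import boto3

databrew = boto3.client("databrew")

# Create a profile job that samples 300,000 rows instead of the
# default 20,000 (requires the matching service limit increase).
databrew.create_profile_job(
    Name="sap-ml-profile",
    DatasetName="CreditcardfraudDB",                       # dataset from Step 2.1
    RoleArn="arn:aws:iam::111122223333:role/fraud-detection-role",
    OutputLocation={"Bucket": "my-sap-ml-bucket", "Key": "profile/"},
    JobSample={"Mode": "CUSTOM_ROWS", "Size": 300000},
)

# Kick off the profile run.
databrew.start_job_run(Name="sap-ml-profile")
```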
Step 2.3 Data Preparation
- Change data types for all columns, converting from string to number.
- Apply Z-score normalization to the amount and time columns to standardize them and handle outliers.
- Remove the now-redundant amount, time, and recorded columns; the normalized columns replace the original amount and time values.
- Reorder the class column to the end of the table.
- Create a job for the data transformation, naming it fraud-detection-transformation and selecting an S3 output folder with CSV as the format.
Optionally, you can import the transformed data back into AWS Glue DataBrew in a new project for validation and correlation analysis.
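For readers who want to sanity-check the recipe locally, the following pandas sketch mirrors the same transformations. The Time, Amount, and Class column names follow the Kaggle dataset; recorded is assumed here to be a bookkeeping column added during the SAP extract.

```python
import pandas as pd

df = pd.read_csv("creditcard.csv")

# Convert every column from string to numeric, coercing bad values to NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Z-score normalization of the amount and time columns to tame outliers.
for col in ["Time", "Amount"]:
    df[f"{col}_zscore"] = (df[col] - df[col].mean()) / df[col].std()

# Drop the now-redundant source columns ("recorded" is assumed to be a
# bookkeeping column from the SAP extract).
df = df.drop(columns=["Time", "Amount", "recorded"], errors="ignore")

# Move the Class label to the end of the table, as in the DataBrew recipe.
df = df[[c for c in df.columns if c != "Class"] + ["Class"]]

df.to_csv("creditcard_transformed.csv", index=False)
```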
Step 3: Amazon SageMaker
When developing an ML workload on AWS, you can choose from various levels of abstraction. In this blog, we use Amazon SageMaker, a fully managed platform for building, training, and deploying machine learning models quickly. We will use Amazon SageMaker Studio for this step.
- Follow the AWS documentation to launch Amazon SageMaker Studio.
- Clone the Jupyter notebook from the AWS Samples GitHub repository.
- Execute the steps outlined in the SAP_Fraud_Detection/SAP Credit Card Fraud Prediction.ipynb notebook, making sure to save the prediction output to the S3 bucket; a sketch of the core training pattern follows this list.
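The notebook encapsulates the full training flow; as an illustration of the core pattern only, the sketch below trains SageMaker's built-in XGBoost container as a binary fraud classifier on the transformed CSV. The bucket, prefix, instance type, and hyperparameters are illustrative assumptions, not the notebook's exact values.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()      # run inside SageMaker Studio
bucket = "my-sap-ml-bucket"                # placeholder bucket

# Built-in XGBoost container image for the current region.
image_uri = sagemaker.image_uris.retrieve(
    "xgboost", session.boto_region_name, version="1.5-1"
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/model/",
    sagemaker_session=session,
)

# Illustrative hyperparameters for a binary fraud classifier.
estimator.set_hyperparameters(
    objective="binary:logistic", num_round=100, eval_metric="auc"
)

estimator.fit(
    {"train": TrainingInput(f"s3://{bucket}/transformed/", content_type="text/csv")}
)
```

Note that the built-in XGBoost container expects the label in the first CSV column, so check how the notebook arranges the training data before reusing this pattern as-is.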
Step 4: SAP Data Import
As with the data export, we again use an ABAP CDS view and an OData service, this time to import the prediction results back into the SAP system.
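The sketch below shows what the import leg could look like, assuming a hypothetical write-enabled OData service (ZFRAUD_RESULT_CDS), entity set, and payload fields. SAP Gateway rejects modifying requests without a CSRF token, so the token is fetched first and reused on the POST.

```python
import requests

SAP_HOST = "https://my-s4hana-host:44300"            # placeholder host
SERVICE = "/sap/opu/odata/sap/ZFRAUD_RESULT_CDS"     # placeholder OData service
ENTITY_SET = "Zfraudresult"                          # placeholder entity set

session = requests.Session()
session.auth = ("SAP_USER", "SAP_PASSWORD")          # use Secrets Manager in practice

# Fetch a CSRF token; the session keeps the cookies the token is bound to.
resp = session.get(
    f"{SAP_HOST}{SERVICE}/{ENTITY_SET}",
    headers={"x-csrf-token": "fetch"},
    params={"$top": 1, "$format": "json"},
)
token = resp.headers["x-csrf-token"]

# Post one prediction row back into the SAP table behind the CDS view.
prediction = {"TransactionId": "284807", "FraudProbability": "0.93"}
resp = session.post(
    f"{SAP_HOST}{SERVICE}/{ENTITY_SET}",
    json=prediction,
    headers={"x-csrf-token": token, "Accept": "application/json"},
)
resp.raise_for_status()
```

In practice you would batch the notebook's output from S3 and post it row by row (or via an OData $batch request) using this same token handshake.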
By leveraging these tools and following this guide, you will be well on your way to implementing a continuously learning machine learning pipeline that closes the loop between your SAP business processes and AWS machine learning capabilities.