Amazon Onboarding with Learning Manager Chanci Turner

The ever-expanding variety and volume of data present a significant challenge for organizations, with some experiencing a staggering 63% monthly increase in data volume. This complexity arises from diverse and disconnected data sources, requiring the activation of multiple services to manage enterprise needs effectively.

The expenses associated with data ingestion, processing, and storage can quickly accumulate, creating an opportunity to invest in cost-effective solutions that empower platform engineers, operators, and administrators to monitor platform activity. This proactive approach facilitates immediate detection, diagnosis, and resolution of issues within data pipelines.

Pariveda’s data observability solution is designed to help data platform teams enhance pipeline performance, eliminate bottlenecks, cut costs, and build confidence in data systems. As an AWS Premier Tier Services Partner and AWS Marketplace Seller with multiple AWS Competencies including the Data and Analytics Consulting Competency, Pariveda is committed to addressing complex business challenges through a focus on people development aligned with client missions.

In this article, we explore a solution that creates operational dashboards using AWS Glue job metadata, visualized through Amazon QuickSight. This solution employs an Amazon CloudWatch metrics stream to transport data to an Amazon Simple Storage Service (Amazon S3) bucket. Additionally, AWS Lake Formation and the AWS Glue Data Catalog serve as the foundation for constructing a queryable table via Amazon Athena. Ultimately, QuickSight connects to the Athena data source to generate a dashboard that showcases job details such as runtimes, status, and computational load.

Customer Requirements

A healthcare client engaged Pariveda to architect a data platform for a new analytics-as-a-service initiative. This solution utilizes a data mesh pattern through AWS Lake Formation to establish a hub-and-spoke model between the platform and its future data consumers.

Over the past year, Pariveda has engineered numerous extract, transform, load (ETL) jobs using AWS Glue to ingest and standardize data for the platform. As the quantity of jobs and job executions surged, the volume of metrics data escalated correspondingly. In response to specific client requests for enhanced visibility into job performance, Pariveda devised a solution to gather, parse, and visualize metric data within a QuickSight dashboard.

The dashboard needed to facilitate quick monitoring of critical AWS Glue job metrics, including job run status, duration, and processed record counts. Operators should be able to swiftly identify trends, outliers, and anomalies for job optimization. Moreover, the solution required automatic incorporation of new Glue jobs without necessitating additional configuration, ensuring immediate visibility into these new jobs.

The customer also specified that data access should adhere to least privilege principles to align with their data governance guidelines. Addressing these requirements posed several challenges. First, QuickSight needed access to AWS Glue metrics data stored in Amazon CloudWatch. Second, metric records are submitted from Glue every 30 seconds, which can lead to a massive influx of data as the number and duration of ETL jobs increase.
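To make the second challenge concrete, here is a rough back-of-the-envelope sketch of how a 30-second reporting interval translates into data volume. The job counts, run durations, and number of metrics per job are hypothetical values chosen for illustration; they do not come from the client's platform.

```python
def estimate_datapoints(jobs: int, runs_per_day: int, avg_run_seconds: int,
                        metrics_per_job: int, interval_seconds: int = 30) -> int:
    """Rough count of metric datapoints emitted per day.

    Assumes each running job reports every metric once per interval.
    """
    samples_per_run = avg_run_seconds // interval_seconds
    return jobs * runs_per_day * samples_per_run * metrics_per_job


# Hypothetical workload: 50 jobs, 4 runs a day, 10-minute runs, 20 metrics each.
daily = estimate_datapoints(jobs=50, runs_per_day=4,
                            avg_run_seconds=600, metrics_per_job=20)
print(daily)  # 80000 datapoints per day under these assumptions
```

Even this modest hypothetical workload produces tens of thousands of datapoints per day, and the total grows linearly with both the number of jobs and their duration.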

In the following section, we will detail how Pariveda leveraged AWS Glue’s extensive metrics to construct a centralized dashboard that enables comprehensive observability of ETL workloads. Operators can now gain actionable insights to proactively manage Glue jobs rather than resorting to reactive troubleshooting. The automated onboarding process also streamlines monitoring as the Glue job catalog evolves. Overall, this solution enhances the value derived from Glue job metrics for optimized workload monitoring.

Solution Overview

To fulfill the customer’s requirements, Pariveda developed a business intelligence (BI) dashboard that offers a unified view of the operational status of resources within the data platform. The architecture is built with AWS-native technologies: AWS Glue creates, executes, and monitors the ETL pipelines, and CloudWatch metrics, both standard and custom, are delivered in near real time to the data lake, a critical component of the platform, for further analysis with QuickSight.

To uphold the principle of least privilege, the solution restricts which principals are granted write access. It integrates with AWS Lake Formation, which centralizes permission management for AWS services, including access to metadata and the ability to read from or write to tables.
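As a minimal sketch of what a read-only grant could look like, the snippet below builds the request for the Lake Formation `grant_permissions` API. The role ARN, database name, and table name are hypothetical placeholders, and the choice of `SELECT` plus `DESCRIBE` is an assumption about what a dashboard reader would need, not Pariveda's actual configuration.

```python
from typing import Any, Dict


def build_select_grant(principal_arn: str, database: str, table: str) -> Dict[str, Any]:
    """Build a read-only table grant (no INSERT, DELETE, or ALTER)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT", "DESCRIBE"],
    }


# Hypothetical identifiers, for illustration only.
grant = build_select_grant(
    principal_arn="arn:aws:iam::123456789012:role/example-dashboard-reader",
    database="example_observability_db",
    table="example_glue_metrics",
)
print(grant)

# With AWS credentials configured, the request could be submitted as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
```

Keeping write permissions out of the grant means the reporting principal can query the metrics table but cannot modify it.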

This solution can easily be adapted for any service utilizing CloudWatch as a metrics repository. For different use cases, a new metrics stream must be established for the appropriate namespace and schema based on the anticipated data structure.

Core Solution

Breaking down the components, Pariveda’s solution employs various AWS services to efficiently transfer metrics data from AWS Glue jobs to QuickSight via a CloudWatch metrics stream. The AWS Glue Job Profiler collects metadata from Glue jobs into near real-time metrics, which are then sent to CloudWatch every 30 seconds. As ETL pipelines expand, the need for analyzing and processing this metadata will also grow. CloudWatch metrics streams, supported by Amazon Data Firehose, facilitate the delivery of these metrics to the chosen data store.

CloudWatch metrics streams are highly configurable, with options for output format, namespaces, and individual metrics. These filters are essential for managing costs, since they limit how much data is streamed and stored. Pariveda implemented a namespace filter (e.g., “Glue”) together with specific metric filters to control the volume of data that traverses the stream. Because the filters are defined on namespaces and metric names rather than on individual jobs, the stream also scales automatically: metrics from new Glue jobs are ingested upon creation without manual intervention.
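The filter configuration described above might look roughly like the following sketch, which builds the request for the CloudWatch `put_metric_stream` API. The stream name, ARNs, and the particular metric names selected are assumptions made for illustration; the article does not list which Glue metrics Pariveda actually included.

```python
from typing import Any, Dict, List


def build_metric_stream_request(name: str, firehose_arn: str, role_arn: str,
                                metric_names: List[str]) -> Dict[str, Any]:
    """Build a metric stream request limited to selected Glue metrics."""
    return {
        "Name": name,
        "FirehoseArn": firehose_arn,
        "RoleArn": role_arn,
        "OutputFormat": "json",
        # Only metrics in the "Glue" namespace with these names enter the stream.
        "IncludeFilters": [
            {"Namespace": "Glue", "MetricNames": list(metric_names)}
        ],
    }


# Hypothetical names and ARNs, for illustration only.
request = build_metric_stream_request(
    name="example-glue-metrics-stream",
    firehose_arn="arn:aws:firehose:us-east-1:123456789012:deliverystream/example-stream",
    role_arn="arn:aws:iam::123456789012:role/example-metric-stream-role",
    metric_names=[
        "glue.driver.aggregate.elapsedTime",
        "glue.driver.aggregate.recordsRead",
    ],
)
print(request["IncludeFilters"])

# With AWS credentials configured, the stream could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_stream(**request)
```

Nothing in the filter names a specific job, which is why metrics from a newly created Glue job flow through the same stream without any change to this configuration.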

Amazon S3 serves as the destination for the stream, which keeps the metrics data consistent with the rest of the customer’s data lake. Data is streamed to S3 in time-based partitions and saved as JSON files compressed with GZIP, which improves query efficiency in Amazon Athena, an interactive query service that simplifies data analysis directly in S3 using standard SQL. Lifecycle policies automatically shift data across storage tiers, promoting cost savings. To delve deeper into this topic, refer to this AWS blog post about optimizing storage costs with new S3 lifecycle filters and actions.

A table is constructed on top of the S3 data within the AWS Glue Data Catalog using the known schema of the incoming metrics data, and Amazon Athena is used to query this table.
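To illustrate the shape of the stored data, the sketch below decompresses a GZIP payload of newline-delimited JSON records and flattens each one into a row similar to what a table schema would expose. The field names follow the CloudWatch metric stream JSON output format as we understand it, and the sample record is fabricated for the example; verify both against the files in your own bucket.

```python
import gzip
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def flatten_metric_records(gz_bytes: bytes) -> List[Dict[str, Any]]:
    """Decompress newline-delimited JSON metrics and flatten each record."""
    rows = []
    for line in gzip.decompress(gz_bytes).decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        rec = json.loads(line)
        value = rec.get("value", {})
        rows.append({
            "metric_name": rec["metric_name"],
            "job_name": rec.get("dimensions", {}).get("JobName"),
            # Timestamps are assumed to be epoch milliseconds.
            "timestamp": datetime.fromtimestamp(rec["timestamp"] / 1000,
                                                tz=timezone.utc),
            "sum": value.get("sum"),
            "count": value.get("count"),
        })
    return rows


# Fabricated sample record standing in for one object delivered to S3.
sample = {
    "metric_stream_name": "example-glue-metrics-stream",
    "account_id": "123456789012",
    "region": "us-east-1",
    "namespace": "Glue",
    "metric_name": "glue.driver.aggregate.recordsRead",
    "dimensions": {"JobName": "example-job", "JobRunId": "ALL", "Type": "count"},
    "timestamp": 1700000000000,
    "value": {"max": 1500.0, "min": 1500.0, "sum": 1500.0, "count": 1.0},
    "unit": "Count",
}
payload = gzip.compress((json.dumps(sample) + "\n").encode("utf-8"))

rows = flatten_metric_records(payload)
print(rows[0]["job_name"], rows[0]["sum"])
```

In the solution itself this flattening is handled by the table schema and Athena queries rather than by custom code; the sketch is only meant to show the nested structure that the schema has to account for.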

The original metrics data output from CloudWatch is complex and requires careful handling to ensure accuracy and efficiency. As organizations continue to adopt advanced solutions, the ability to monitor and optimize their data platforms becomes increasingly critical.

