How JPMorgan Chase Developed a Data Mesh Architecture to Create Value and Enhance Their Enterprise Data Platform

By: Alex Thompson, Brian Lee, Sarah Johnson, and Mark Davis

Published on: 05 MAY 2021

Categories: Analytics, AWS Big Data, AWS Glue, AWS Lake Formation, Customer Solutions, Serverless

In today’s landscape, organizations increasingly acknowledge the value of their data across the enterprise. Data contributes not only to the specific business processes that generate it, but its true potential emerges when shared and integrated with other data assets. Unlike many resources, data does not lose its value with usage; instead, its worth multiplies through various combinations—such as integrating reference data with operational data—leading to enhanced visibility, real-time analytics, and improved AI and machine learning predictions. Organizations that excel at internal data sharing often realize significantly greater value from their data compared to those that do not.

However, managing data risks, particularly in regulated sectors, is crucial. Implementing controls can mitigate these risks, meaning organizations with robust data governance structures face fewer vulnerabilities than those lacking such systems.

This creates a challenging paradox: while data that is easily shareable across an organization can generate substantial value for stakeholders, increased accessibility can also heighten potential risks. To unlock the true value of data, organizations must find a way to facilitate sharing while ensuring appropriate controls are in place.

JPMorgan Chase Bank, N.A. (JPMC) is addressing this dual challenge with a two-fold strategy. First, it is defining data products curated by the people who understand each dataset's management needs, permissible uses, and limitations. Second, it is implementing a data mesh architecture that aligns its data technology with those data products.

This integrated strategy achieves several objectives:

  • Empowers data product owners to make informed management and usage decisions.
  • Enforces these decisions by focusing on data sharing rather than duplication.
  • Provides clear visibility into data sharing activities across the enterprise.

Let’s explore the concept of data mesh architecture and its role in supporting the data product strategy, ultimately enabling JPMC’s operational effectiveness.

Aligning Data Architecture with Data Product Strategy

JPMC consists of multiple lines of business (LOBs) and corporate functions (CFs) that span the organization. To facilitate efficient data access for consumers across these LOBs and CFs while maintaining necessary control, JPMC is embracing a data product strategy.

A data product is a cohesive collection of related data drawn from the systems that support business operations. Each data product is stored in its own dedicated data lake, ensuring physical separation between different product lakes. Each lake has its own cloud-based storage layer, and its data is cataloged and organized using cloud services: storage is provided by Amazon Simple Storage Service (Amazon S3), while AWS Glue catalogs and integrates the data.
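As an illustration of this pattern, the sketch below builds the Glue API request bodies that would register one data product's dedicated S3 lake and a crawler to keep its tables cataloged. All names here (the bucket, product name, and IAM role ARN) are hypothetical placeholders, not JPMC's actual configuration.

```python
def data_product_catalog_requests(product: str, bucket: str) -> tuple:
    """Build Glue API request bodies that register a data product's
    dedicated S3 lake and a crawler that keeps its tables cataloged.
    All names are illustrative placeholders."""
    database = {
        "DatabaseInput": {
            "Name": f"{product}_lake",
            "Description": f"Dedicated lake for the {product} data product",
            "LocationUri": f"s3://{bucket}/{product}/",
        }
    }
    crawler = {
        "Name": f"{product}-crawler",
        # Placeholder role ARN; a real crawler needs an IAM role with S3 read access
        "Role": "arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
        "DatabaseName": f"{product}_lake",
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/{product}/"}]},
    }
    return database, crawler

# Applied with boto3 (requires AWS credentials):
#   glue = boto3.client("glue")
#   db_req, crawler_req = data_product_catalog_requests("trade_reference", "example-product-bucket")
#   glue.create_database(**db_req)
#   glue.create_crawler(**crawler_req)
```

Keeping each product in its own database and S3 prefix is what makes the physical separation between lakes enforceable at the catalog level.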

Consumer applications that utilize data are hosted in distinct domains, separated from both each other and the data lakes. When data consumers require access to data from one or more lakes, cloud services are employed to make this data visible and enable querying directly from the lakes. Tools like AWS Glue Data Catalog and AWS Lake Formation aid in both data visibility and secure data sharing, while Amazon Athena allows users to interactively query the data.
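To make the sharing-and-querying flow concrete, here is a minimal sketch of the two request shapes involved: a Lake Formation grant that lets a consumer principal SELECT from a lake table, and an Athena request that queries the shared data in place. The role ARN, database, table, and output location are assumed example values.

```python
def grant_request(consumer_role_arn: str, database: str, table: str) -> dict:
    """Lake Formation grant: allow a consumer to SELECT from a lake
    table without copying the data out of the product lake."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }

def athena_query_request(database: str, sql: str, output_location: str) -> dict:
    """Athena request that runs SQL directly against the lake and
    writes results to the consumer's own S3 location."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }

# With boto3 (credentials required):
#   boto3.client("lakeformation").grant_permissions(
#       **grant_request("arn:aws:iam::111122223333:role/ConsumerApp", "trade_reference_lake", "counterparties"))
#   boto3.client("athena").start_query_execution(
#       **athena_query_request("trade_reference_lake", "SELECT * FROM counterparties LIMIT 10",
#                              "s3://example-consumer-bucket/athena-results/"))
```

Because the grant is on the catalog table rather than on copied files, revoking it later immediately cuts off the consumer's access.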

The interconnected data lakes and application domains create the data mesh—a network of distributed data nodes designed to ensure data security, high availability, and ease of discoverability.

Empowering the Right Individuals to Make Control Decisions

The data mesh architecture enables each data product lake to be overseen by a team of data product owners who possess a deep understanding of the data within their domain. These teams can make risk-based decisions regarding data management.

When a consumer application requires data from a product lake, the owning team locates the necessary data in the enterprise-wide data catalog. This catalog is consistently updated to reflect current data in the lakes, allowing the consumption teams to discover and request data with minimal delays.
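The discovery step can be sketched with the Glue Data Catalog's search capability, which matches table names, descriptions, and column metadata across the catalog. The search text and the response-summarizing helper below are illustrative, not part of JPMC's implementation.

```python
def discovery_request(search_text: str, max_results: int = 10) -> dict:
    """Request body for glue.search_tables, which searches table names,
    descriptions, and column metadata across the enterprise catalog."""
    return {"SearchText": search_text, "MaxResults": max_results}

def summarize_tables(response: dict) -> list:
    """Reduce a search_tables response to 'database.table' identifiers
    that a consumption team can review and request access to."""
    return [f"{t['DatabaseName']}.{t['Name']}" for t in response.get("TableList", [])]

# With boto3: boto3.client("glue").search_tables(**discovery_request("counterparty"))
```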

Enforcing Control Decisions Through In-Place Consumption

The data mesh facilitates data sharing from product lakes rather than duplicating it for consumer applications. This practice not only reduces storage costs but also minimizes discrepancies between data sources, ensuring that analytics, AI/ML, and reporting utilize the most accurate and up-to-date information.

Furthermore, since data does not physically leave the lake, control decisions made by data product owners are easier to enforce. For example, if tokenization is applied to certain data types, consumers can only access the tokenized versions, thereby closing any potential control gaps.

However, in-place consumption requires sophisticated access control mechanisms, as data visibility must be restricted at a granular level—down to individual columns and records. For instance, a system from one LOB querying firm-wide reference data shared through a lake would only be granted access to the data relevant to its specific line of business.
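One way to express this kind of column- and row-level restriction is a Lake Formation data cells filter. The sketch below builds a `create_data_cells_filter` request that limits a shared reference table to one line of business's rows and a subset of columns; the catalog ID, table names, and the `lob` column are assumed example values.

```python
def lob_filter_request(catalog_id: str, database: str, table: str, lob: str) -> dict:
    """Build a Lake Formation create_data_cells_filter request that
    restricts a shared table to one LOB's rows and selected columns.
    Names and the 'lob' column are illustrative placeholders."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": f"{lob}_only",
            # Row-level restriction: only this LOB's records are visible
            "RowFilter": {"FilterExpression": f"lob = '{lob}'"},
            # Column-level restriction: only these columns are visible
            "ColumnNames": ["instrument_id", "rating", "lob"],
        }
    }

# boto3.client("lakeformation").create_data_cells_filter(
#     **lob_filter_request("111122223333", "firmwide_reference_lake", "instruments", "retail"))
```

Once the filter exists, grants to the consuming principal reference the filter rather than the whole table, so the consumer can never see rows or columns outside its entitlement.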

Providing Cross-Enterprise Visibility of Data Consumption

Historically, data exchanges between systems occurred either directly or through message queues. The absence of a centralized, automated repository hindered data product owners from easily tracking data flows.

The data mesh architecture resolves this visibility issue by utilizing a cloud-based mesh catalog to enhance transparency between lakes and data consumers. This catalog does not store data but offers insights into which lakes are sharing data with which consumers, establishing a single point of visibility across the enterprise. As a result, data product owners can confidently monitor the flow of their data.
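As a rough sketch of what that single point of visibility could look like, the helper below flattens permission records of the shape returned by Lake Formation's `list_permissions` into a simple "who consumes what" report. The sample data in the test is illustrative only.

```python
def consumption_report(permissions_response: dict) -> list:
    """Flatten list_permissions-style records into
    'principal -> database.table (permissions)' lines, giving data
    product owners one view of which consumers can read their data."""
    lines = []
    for rec in permissions_response.get("PrincipalResourcePermissions", []):
        principal = rec["Principal"]["DataLakePrincipalIdentifier"]
        table = rec["Resource"]["Table"]
        perms = ",".join(rec["Permissions"])
        lines.append(f"{principal} -> {table['DatabaseName']}.{table['Name']} ({perms})")
    return lines

# With boto3: consumption_report(boto3.client("lakeformation").list_permissions(
#     Resource={"Table": {"DatabaseName": "trade_reference_lake", "Name": "counterparties"}}))
```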

