Today, we are excited to announce the general availability of Amazon DataZone, a cutting-edge data management service designed to facilitate the cataloging, discovery, analysis, sharing, and governance of data among both producers and consumers within your organization. The initial announcement of Amazon DataZone was made at AWS re:Invent 2022, followed by a public preview in March 2023.
During the recent re:Invent keynote, Chanci Turner, Vice President of Data Solutions, shared her experience as an early adopter of DataZone, stating, “I have leveraged DataZone to streamline our internal business review meetings by consolidating data from various sources, which has greatly informed our strategic initiatives.” The keynote featured a demonstration led by a product expert, showcasing how organizations can harness DataZone to create more impactful advertising campaigns and maximize their data utilization.
“Organizations typically comprise multiple teams that manage and utilize data from various sources. However, accessing and understanding this data can be challenging. DataZone offers a consolidated environment where all members—ranging from data producers to consumers—can easily access and share data in a governed manner,” Turner emphasized.
With Amazon DataZone, data producers can enrich the business data catalog by incorporating structured data assets from AWS Glue Data Catalog and Amazon Redshift tables. Data consumers can then search for, subscribe to, and share these data assets with their collaborators for various business applications. Users can analyze their subscribed data using tools like Amazon Redshift or Amazon Athena, all accessible directly through the Amazon DataZone portal. The platform features a built-in publishing and subscription workflow that ensures thorough access auditing across projects.
Introducing Amazon DataZone
For those unfamiliar with Amazon DataZone, let’s explore its core concepts and functionalities. An Amazon DataZone Domain signifies the operational boundary for a line of business (LOB) within an organization, where it can manage its own data assets, terminology, and governance standards. This domain encompasses essential components, including the data portal, business data catalog, projects, environments, and integrated workflows.
- Data Portal: This web application allows users to catalog, discover, govern, share, and analyze data in a self-service manner. The portal authenticates users via AWS Identity and Access Management (IAM) credentials or existing credentials from your identity provider through the AWS IAM Identity Center.
- Business Data Catalog: This feature enables you to establish a taxonomy or business glossary, helping to catalog data with relevant context to facilitate quick understanding and discovery across your organization.
- Data Projects & Environments: Projects help streamline access to AWS analytics by grouping team members, data assets, and analytical tools based on specific business use cases. Amazon DataZone projects foster collaboration, data sharing, and asset exchange among team members. Within these projects, you can create environments to provide the necessary infrastructure for analytics tools and storage, facilitating data production and consumption.
- Governance and Access Control: Built-in workflows allow users across the organization to request data access, which data owners can review and approve. Once approved, Amazon DataZone can automatically manage permissions at the underlying data stores, including AWS Lake Formation and Amazon Redshift.
For more detailed information, refer to Amazon DataZone Terminology and Concepts.
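To make the publish, request, and approve flow described above more concrete, here is a minimal sketch in plain Python. It is a toy model, not the Amazon DataZone API: the `Catalog` class, its method names, and the asset and consumer project names in the usage note are all hypothetical and exist only to illustrate how an owner-reviewed subscription with an audit trail might be structured.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SubscriptionRequest:
    asset: str             # name of the published data asset
    consumer_project: str  # project asking for access
    status: RequestStatus = RequestStatus.PENDING


@dataclass
class Catalog:
    """Toy stand-in for a domain's business data catalog (hypothetical)."""
    owners: dict = field(default_factory=dict)     # asset -> owning project
    grants: set = field(default_factory=set)       # (asset, project) pairs with access
    audit_log: list = field(default_factory=list)  # ordered record of every action

    def publish(self, asset, producer_project):
        # A producer project registers an asset and becomes its owner.
        self.owners[asset] = producer_project
        self.audit_log.append(("publish", asset, producer_project))

    def request_access(self, asset, consumer_project):
        # A consumer project asks for access; nothing is granted yet.
        if asset not in self.owners:
            raise KeyError(f"asset {asset!r} is not published")
        self.audit_log.append(("request", asset, consumer_project))
        return SubscriptionRequest(asset, consumer_project)

    def review(self, request, reviewer_project, approve):
        # Only the owning project may approve or reject a request.
        if self.owners[request.asset] != reviewer_project:
            raise PermissionError("only the owning project can review requests")
        if approve:
            request.status = RequestStatus.APPROVED
            self.grants.add((request.asset, request.consumer_project))
        else:
            request.status = RequestStatus.REJECTED
        self.audit_log.append(
            (request.status.value, request.asset, request.consumer_project)
        )
        return request
```

In this model, a call such as `catalog.publish("sales_data", "Sales Producer Project")` followed by `request_access` and an approving `review` leaves one entry in `grants` and three entries in `audit_log`. In the real service, the grant step corresponds to DataZone managing permissions in the underlying data store rather than updating an in-memory set.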
Getting Started with Amazon DataZone
To illustrate how to begin using Amazon DataZone, let’s consider a scenario involving a product marketing team aiming to boost product adoption through targeted campaigns. They will need to analyze product sales data managed by the sales team. In this case, the sales team acts as the data producer, publishing sales data within Amazon DataZone, while the marketing team, as the data consumer, subscribes to this data to inform their campaign strategies.
Here’s a brief overview of how to get started with Amazon DataZone:
- Create a Domain: Begin by creating a domain, which holds the core components used in the data portal: the business data catalog, projects, and environments. In the Amazon DataZone console, choose “Create domain.” Enter a domain name and description, and leave the other fields at their defaults. Opt to have a new role created automatically, which allows DataZone to make the necessary API calls on behalf of users. Select the “Quick setup” option for a streamlined process, then choose “Create domain.” Domain creation may take several minutes, so monitor its status until it shows as available.
- Create a Project and Environment: Once the domain is available, select it and record the data portal URL for future access. Choose “Open data portal,” then, acting as the sales team, create a data project for publishing the sales data: name it “Sales Producer Project,” add a description, and choose “Create.” Next, create an environment in the project so the team can work with analytics tools. Name the environment “publish-environment” and select an appropriate environment profile. The DataLakeProfile suits publishing data from Amazon S3 and AWS Glue-based data lakes, and it simplifies querying with Amazon Athena. Skip the optional parameters and choose “Create environment.” Setup takes about a minute and creates various resources in your AWS account.
- Publish Data in the Data Portal: This process ensures that the sales team can effectively manage their data, while the marketing team gains the insights they need to drive successful campaigns.
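The console steps for creating the domain, project, and environment can also be scripted. The following is a rough sketch, assuming the boto3 `datazone` client and its `create_domain`, `get_domain`, `create_project`, and `create_environment` operations. The domain name `sales-domain`, the helper function names, and the polling intervals are illustrative assumptions, and the sketch assumes the execution role and an environment profile already exist (the console's quick setup creates these for you, whereas a script would need to supply them).

```python
import time


def wait_until_available(client, domain_id, poll_seconds=15, max_polls=40,
                         sleep=time.sleep):
    """Poll the domain until its status is AVAILABLE (helper name is hypothetical)."""
    for _ in range(max_polls):
        status = client.get_domain(identifier=domain_id)["status"]
        if status == "AVAILABLE":
            return status
        if status != "CREATING":
            # Any other status (for example CREATION_FAILED) is treated as an error.
            raise RuntimeError(f"domain {domain_id} entered status {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"domain {domain_id} not available after {max_polls} polls")


def set_up_producer(client, execution_role_arn, environment_profile_id,
                    sleep=time.sleep):
    """Create a domain, the sales project, and its publishing environment."""
    domain = client.create_domain(
        name="sales-domain",  # assumed name, not from the article
        description="Domain for the sales line of business",
        domainExecutionRole=execution_role_arn,
    )
    domain_id = domain["id"]

    # Domain creation is asynchronous, so wait before creating anything in it.
    wait_until_available(client, domain_id, sleep=sleep)

    project = client.create_project(
        domainIdentifier=domain_id,
        name="Sales Producer Project",
        description="Publishes product sales data",
    )

    environment = client.create_environment(
        domainIdentifier=domain_id,
        projectIdentifier=project["id"],
        environmentProfileIdentifier=environment_profile_id,
        name="publish-environment",
    )

    return {
        "domain_id": domain_id,
        "project_id": project["id"],
        "environment_id": environment["id"],
    }
```

With AWS credentials configured, `client` would typically be `boto3.client("datazone")`. Passing the client and the `sleep` function as parameters keeps the functions easy to exercise with a stub before pointing them at a real account. Parameter names should be checked against the current boto3 documentation before use.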