Root Cause Analysis Utilizing DoWhy: An Open Source Python Library for Causal Machine Learning


Identifying the underlying causes of changes in complex systems is challenging and often requires deep domain expertise and extensive manual effort. For instance, if the profit of a product sold through an online platform drops unexpectedly, any of several interconnected factors could be responsible.

Imagine having automated tools that can streamline and expedite this investigative process: a library that can uncover the root causes of an observed effect with just a few lines of code. This is the primary objective of the root cause analysis (RCA) capabilities found in the DoWhy open-source Python library, to which AWS has contributed an array of novel causal machine learning (ML) algorithms. These algorithms, developed from years of Amazon research on graphical causal models, were introduced in DoWhy version 0.8 in July of last year. Additionally, AWS has partnered with Microsoft to establish a new organization named PyWhy, which now houses DoWhy. According to its charter, PyWhy’s mission is to “develop an open-source ecosystem for causal machine learning that advances the state of the art and makes it accessible to both practitioners and researchers. We create and maintain interoperable libraries, tools, and other resources that cover various causal tasks and applications, all linked via a common API focusing on foundational causal operations and the complete analysis process.”

In this article, we will closely examine these algorithms, specifically demonstrating their utility in the context of root cause analysis in complex systems. By applying DoWhy’s causal ML algorithms, we can significantly reduce the time taken to identify root causes. To illustrate this, we will explore an example scenario using randomly generated synthetic data where the ground truth is already known.

The Scenario

Consider an online shop selling a smartphone for $999. The product’s overall profitability hinges on several factors, such as the number of units sold, operational costs, and advertising expenditures. In turn, the number of units sold is influenced by factors like the volume of visitors to the product page, the unit price, and any ongoing promotions. Let’s say we observe a consistent profit throughout 2021, but there is a sudden and sharp decline in profit at the start of 2022. What could be the reason behind this?

In the following scenario, we will utilize DoWhy to better understand the causal impacts of various factors influencing profit and identify the reasons for the profit decline. To analyze the situation, we must first outline our beliefs about the causal relationships at play. For this, we will compile daily records of the different factors affecting profit, including:

  • Shopping Event?: A binary indicator of whether a special shopping event occurred, such as Black Friday or Cyber Monday.
  • Ad Spend: The amount spent on advertising campaigns.
  • Page Views: The number of visitors to the product detail page.
  • Unit Price: The price of the smartphone, which may vary due to temporary discounts.
  • Sold Units: The quantity of smartphones sold.
  • Revenue: Daily revenue generated.
  • Operational Cost: Daily operational expenses, encompassing production costs, advertising spending, administrative costs, etc.
  • Profit: Daily profit.

By examining these attributes, we can use our domain knowledge to express the cause-and-effect relationships as a directed acyclic graph (DAG), which serves as our causal graph. An arrow from X to Y (X → Y) indicates a direct causal relationship, where X is the cause of Y. In this scenario, we understand the following:

  • Shopping Event? influences:
    • Ad Spend: Increased spending is needed to promote the product during special events.
    • Page Views: Shopping events typically draw a large number of visitors due to discounts and various offers.
    • Unit Price: Retailers often reduce prices during shopping events.
    • Sold Units: Events coincide with major celebrations when consumers tend to purchase more.
  • Ad Spend affects:
    • Page Views: Higher ad spending increases the likelihood of visits to the product page.
    • Operational Cost: Ad expenses contribute to operational costs.
  • Page Views influences:
    • Sold Units: More visitors generally translate to more sales.
  • Unit Price affects:
    • Sold Units: Pricing directly influences the quantity sold.
    • Revenue: Daily revenue is the product of sold units and unit price.
  • Sold Units affects:
    • Revenue: Revenue scales directly with the number of units sold.
    • Operational Cost: Manufacturing costs rise with increased unit sales.
  • Operational Cost influences:
    • Profit: Profit is calculated as revenue minus operational costs.
  • Revenue impacts:
    • Profit: Profit is calculated as revenue minus operational costs, so revenue enters the same calculation.

Step 1: Define Causal Models

Next, we will model these causal relationships using DoWhy’s graphical causal model (GCM) module. The first step is to define a structural causal model (SCM), which combines the causal graph with generative models that describe how the data is generated.

To represent the graph structure, we will use NetworkX, a widely-used open-source Python graph library. In NetworkX, we can depict our causal graph like this:

import networkx as nx

causal_graph = nx.DiGraph([
    ('Page Views', 'Sold Units'),
    ('Revenue', 'Profit'),
    ('Unit Price', 'Sold Units'),
    ('Unit Price', 'Revenue'),
    ('Shopping Event?', 'Page Views'),
    ('Shopping Event?', 'Sold Units'),
    ('Shopping Event?', 'Unit Price'),
    ('Shopping Event?', 'Ad Spend'),
    ('Ad Spend', 'Page Views'),
    ('Ad Spend', 'Operational Cost'),
    ('Sold Units', 'Revenue'),
    ('Sold Units', 'Operational Cost'),
    ('Operational Cost', 'Profit')
]) 
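Before attaching any models, it can be worth sanity-checking the graph. The following sketch rebuilds the same edge list and uses standard NetworkX calls to confirm that the graph is acyclic, to find the nodes without parents, and to print each node's direct causes. It is only an optional check and assumes nothing beyond NetworkX being installed.

```python
import networkx as nx

# Same edge list as in the article.
causal_graph = nx.DiGraph([
    ('Page Views', 'Sold Units'),
    ('Revenue', 'Profit'),
    ('Unit Price', 'Sold Units'),
    ('Unit Price', 'Revenue'),
    ('Shopping Event?', 'Page Views'),
    ('Shopping Event?', 'Sold Units'),
    ('Shopping Event?', 'Unit Price'),
    ('Shopping Event?', 'Ad Spend'),
    ('Ad Spend', 'Page Views'),
    ('Ad Spend', 'Operational Cost'),
    ('Sold Units', 'Revenue'),
    ('Sold Units', 'Operational Cost'),
    ('Operational Cost', 'Profit')
])

# A causal graph must not contain cycles.
assert nx.is_directed_acyclic_graph(causal_graph)

# Root nodes have no incoming edges, that is, no modeled causes.
root_nodes = [node for node, degree in causal_graph.in_degree() if degree == 0]
print('Root nodes:', root_nodes)

# Walk the nodes in causal order and list the direct causes of each one.
for node in nx.topological_sort(causal_graph):
    parents = sorted(causal_graph.predecessors(node))
    print(f'{node} <- {parents}')
```

In this graph, Shopping Event? is the only root node; every other variable has at least one parent.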

Next, we will examine the data from 2021:

import pandas as pd

pd.options.display.float_format = '${:,.2f}'.format  # Format dollar columns
data_2021 = pd.read_csv('2021 Data.csv', index_col='Date')
data_2021.head()

Here, we can see one data entry for each day in 2021, encompassing all the variables depicted in the causal graph. Keep in mind that in the synthetic data discussed in this blog post, shopping events were also randomly generated.
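The CSV file itself is not reproduced here. As a rough illustration of what synthetic data with a known ground truth can look like for this graph, the sketch below draws each variable from its parents in causal order, using only Python's standard library. The function name simulate_day and every probability and coefficient in it are hypothetical placeholders, not the values behind '2021 Data.csv'.

```python
import random


def simulate_day(rng):
    """Sample one day by drawing each node from its parents, in causal order.

    Every probability and coefficient here is a made-up placeholder.
    """
    shopping_event = rng.random() < 0.03
    # Ad Spend depends on Shopping Event?
    ad_spend = rng.gauss(1000, 50) + (500 if shopping_event else 0)
    # Page Views depends on Shopping Event? and Ad Spend
    page_views = max(0.0, 5 * ad_spend + (4000 if shopping_event else 0)
                     + rng.gauss(0, 300))
    # Unit Price depends on Shopping Event? (temporary discount)
    unit_price = 999 * (0.8 if shopping_event else 1.0)
    # Sold Units depends on Page Views, Unit Price, and Shopping Event?
    demand = (0.02 * page_views * (999 / unit_price)
              + (50 if shopping_event else 0))
    sold_units = max(0, round(demand + rng.gauss(0, 5)))
    # Revenue and Profit follow the definitions given in the article.
    revenue = sold_units * unit_price
    # Operational Cost depends on Ad Spend and Sold Units
    operational_cost = ad_spend + 500 * sold_units + rng.gauss(5000, 200)
    profit = revenue - operational_cost
    return {
        'Shopping Event?': shopping_event,
        'Ad Spend': ad_spend,
        'Page Views': page_views,
        'Unit Price': unit_price,
        'Sold Units': sold_units,
        'Revenue': revenue,
        'Operational Cost': operational_cost,
        'Profit': profit,
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = [simulate_day(rng) for _ in range(365)]
print(rows[0])
```

Passing rows to pd.DataFrame would give a table whose column names match the nodes of the causal graph.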

While we have established the causal graph, we still need to associate generative models with each node. With DoWhy, we can either manually specify these models or allow the library to automatically infer suitable models based on the data. We will take advantage of the latter approach:

from dowhy import gcm

# Create the structural causal model object
scm = gcm.StructuralCausalModel(causal_graph)

# Automatically assign generative models to each node based on the given data
gcm.auto.assign_causal_mechanisms(scm, data_2021)

Whenever feasible, it is advisable to assign models based on prior knowledge, as this can enhance the reliability of the results.
