As of 11/17/2022, D-Wave is no longer available on Amazon Braket and has transitioned to the AWS Marketplace. Therefore, the information on this page may be outdated.
Many organizations confront the challenge of effectively uncovering insights hidden within intricate network structures. For instance, a healthcare insurance company might need to identify fraudulent claims by detecting unusual relationships between patients and providers, while a financial firm may require an anti-money laundering tool to spot suspicious transactions among various entities. Similarly, a marketing agency could be looking to segment its audience for targeted campaigns. All of these scenarios involve uncovering how the entities in a network relate to one another, a task commonly referred to as community detection.
Over the past decade, methods for tackling community detection issues have advanced significantly. Approaches have ranged from greedy algorithms like the Girvan-Newman [1] and Louvain [2] methods to nature-inspired strategies such as extremal optimization [3], and deep learning models [4]. Recently, Morgan et al. [5] investigated and showcased how quantum computers, specifically quantum annealers, can address community detection as an optimization challenge.
This article marks the first installment of a two-part series centered on solving community detection through a hybrid classical-quantum annealing algorithm on Amazon Braket:
In this first part, we present a detailed guide on how to frame community detection as a Quadratic Unconstrained Binary Optimization (QUBO) problem, mirroring the efforts of Morgan et al. [5]. We will also illustrate the use of the open-source QBSolv library, which offers quantum-classical hybrid solvers for QUBO challenges, leveraging a mix of classical computing resources and D-Wave quantum annealers to resolve community detection issues on Amazon Braket.
In Part 2, we will implement this quantum annealing-based community detection technique on real-world networks, providing a systematic analysis of solution performance and scalability.
You can find the comprehensive code implementation and tutorial notebook for using QBSolv for community detection in complex networks in our AWS GitHub repository.
Modularity-based Community Detection
The foundational concept of community structure in complex networks was initially proposed and examined by Girvan and Newman [1]. The premise is to partition a network (or graph) into groups of nodes that belong to distinct communities (or clusters), where nodes within the same community exhibit high connectivity (high intra-connectivity) while nodes from different communities show lower connectivity (low inter-connectivity).
To evaluate the effectiveness of a specific network division into communities, Newman and Girvan introduced a metric called modularity M. This modularity metric compares the connectivity of edges within communities to that of a network (the so-called null model) where edges are randomly distributed, ensuring that the expected degree of each node aligns with its degree in the original graph [6].
Formally, we consider a graph G with an adjacency matrix $A_{ij}$ representing the weight of the edge between nodes i and j. In the null model, the expected number of edges between nodes i and j is approximately $g_i g_j / 2m$, where $g_i = \sum_j A_{ij}$ is the degree of node i and $m = \frac{1}{2} \sum_i g_i$ is the total edge weight in the graph. Using the null model as a baseline [7], modularity M can then be defined as the difference between the actual weight $A_{ij}$ and the expected weight $g_i g_j / 2m$ in the null model, summed over all pairs of vertices i, j that fall in the same group.
Employing a conventional normalization pre-factor leads to the modularity M defined as:

$$M = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{g_i g_j}{2m} \right) \delta(c_i, c_j)$$
where the Kronecker delta $\delta(c_i, c_j)$ equals 1 if node i and node j belong to the same community and 0 otherwise. The aim is to maximize the modularity M by optimizing the community assignment $c_i$ of each node i in the graph. This problem is known to be NP-hard [7].
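The modularity definition above can be evaluated directly for a small graph. The following is a minimal sketch in plain Python, not code from the tutorial notebook; the function name `modularity` and the six-node toy graph (two triangles joined by a single edge) are assumptions chosen for illustration.

```python
def modularity(adjacency, communities):
    """Modularity M of a partition, given a symmetric weighted adjacency matrix."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]   # g_i = sum_j A_ij
    two_m = sum(degrees)                        # 2m = sum_i g_i
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Kronecker delta: only pairs in the same community contribute.
            if communities[i] == communities[j]:
                total += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


# Hypothetical toy graph: triangles (0, 1, 2) and (3, 4, 5) joined by edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    A[u][v] = A[v][u] = 1

natural_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
single_group = modularity(A, [0, 0, 0, 0, 0, 0])    # everything in one community
print(natural_split, single_group)
```

Splitting the graph along its two triangles gives a clearly positive modularity, whereas placing all nodes in a single community gives zero, because the actual and expected weights then cancel exactly.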
Community Detection as a QUBO Problem
Numerous heuristic search algorithms have been developed to tackle the community detection problem [8]. In this article, we concentrate on framing the community detection challenge as a Quadratic Unconstrained Binary Optimization (QUBO) problem, and demonstrate how to use D-Wave’s QBSolv solver on Amazon Braket to identify two or more communities in a given network.
Two communities (k = 2):
Let’s first examine the case where we seek to partition a graph into k = 2 communities. In this instance, we can employ binary spin variables $s_i \in \{-1, 1\}$ to represent which community node i belongs to. The term $(1 + s_i s_j)/2$ yields 1 if nodes i and j are in the same community, and 0 otherwise; thus, the modularity metric can be compactly expressed as [5]:

$$M = \frac{1}{4m} \sum_{i,j} B_{ij} \, (1 + s_i s_j)$$
where, for convenience, the modularity matrix B has been defined as:

$$B_{ij} = A_{ij} - \frac{g_i g_j}{2m}$$
By employing the conversion $s_i = 2x_i - 1$ between spin variables $s_i \in \{-1, 1\}$ and bit variables $x_i \in \{0, 1\}$, and noting that $\sum_{i,j} B_{ij} = 0$, we can reformulate the maximization of the modularity (Eq. (3)) as a QUBO minimization problem with QUBO Hamiltonian $H = -(1/m)\, x^T B x$ and QUBO matrix $Q = -B/m$. Such QUBO matrices can readily be passed to a quantum annealing device such as D-Wave's, which will attempt to find the optimal bit-string x that encodes the solution to our optimization problem.
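To make the two-community formulation concrete, the sketch below builds Q = -B/m for a small graph and finds the minimizing bit-string by exhaustive search, which stands in here for the annealer or hybrid solver. It is an illustration under stated assumptions rather than the tutorial's implementation: the helper names and the toy graph are hypothetical, and brute force is only feasible for a handful of nodes. It also checks numerically that the spin and bit forms agree, which holds because every row of B sums to zero.

```python
from itertools import product


def modularity_matrix(adjacency):
    """Return (B, m) with B_ij = A_ij - g_i * g_j / (2m)."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)
    B = [[adjacency[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
         for i in range(n)]
    return B, two_m / 2


def quadratic_form(matrix, x):
    """Evaluate x^T * matrix * x for a dense matrix given as nested lists."""
    n = len(x)
    return sum(matrix[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


# Hypothetical toy graph: triangles (0, 1, 2) and (3, 4, 5) joined by edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    A[u][v] = A[v][u] = 1

B, m = modularity_matrix(A)
Q = [[-b / m for b in row] for row in B]   # QUBO matrix Q = -B/m

# Exhaustive search over all 2^6 bit-strings (a stand-in for the QUBO solver).
best_x = min(product((0, 1), repeat=len(A)), key=lambda x: quadratic_form(Q, x))
best_energy = quadratic_form(Q, best_x)

# Same partition in spin variables: M = s^T B s / (4m) should equal -H.
spins = [2 * bit - 1 for bit in best_x]
modularity_from_spins = quadratic_form(B, spins) / (4 * m)
print(best_x, best_energy, modularity_from_spins)
```

A bit-string and its complement describe the same partition and have the same energy, so either labeling of the two triangles may be returned.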
Multiple communities (k > 2):
Now, let’s address the more general challenge of community detection with k > 2 communities. To formulate the k-community detection problem in canonical QUBO form (as required for quantum-native and quantum-inspired annealing), we initially one-hot encode the binary variables xi and then construct the QUBO Hamiltonian.
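We utilize a one-hot encoding scheme whereby we set $x_{i,c} = 1$ if node i belongs to community c, and $x_{i,c} = 0$ otherwise, i.e.,

$$x_{i,c} = \begin{cases} 1 & \text{if node } i \text{ belongs to community } c \\ 0 & \text{otherwise} \end{cases}$$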
With this encoding, we require k variables per logical node, which increases the size of the binary decision vector x from a length of N for the two-community case to k × N for the k-community scenario. Specifically, we define $x = (x_{1,1}, x_{2,1}, \ldots, x_{N,1}, \ldots, x_{1,k}, x_{2,k}, \ldots, x_{N,k})$.
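The layout of the decision vector can be sketched as a pair of helper functions. This is an illustrative sketch only: the function names are hypothetical, and nodes and communities are indexed from 0, so the variable for node i and community c sits at position c * N + i.

```python
def one_hot_encode(assignment, k):
    """Flatten community labels into x = (x_{1,1}, ..., x_{N,1}, ..., x_{1,k}, ..., x_{N,k})."""
    n = len(assignment)
    x = [0] * (k * n)
    for node, community in enumerate(assignment):
        x[community * n + node] = 1   # 0-based position of x_{node, community}
    return x


def one_hot_decode(x, k):
    """Recover community labels; reject vectors that are not valid one-hot encodings."""
    n = len(x) // k
    assignment = []
    for node in range(n):
        hot = [c for c in range(k) if x[c * n + node] == 1]
        if len(hot) != 1:
            raise ValueError(f"node {node} is assigned to {len(hot)} communities")
        assignment.append(hot[0])
    return assignment


labels = [0, 2, 1, 0]             # N = 4 nodes, k = 3 communities
x = one_hot_encode(labels, k=3)   # length k * N = 12
print(x)
print(one_hot_decode(x, k=3))
```

The decoder raises an error for vectors in which a node is assigned to zero or to several communities, which is exactly the situation the penalty term in the QUBO is meant to rule out.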
We pose the k > 2 community problem as a binary minimization problem and construct the k-community QUBO Hamiltonian $H_M = -\frac{1}{m} \sum_{c=1}^{k} x_c^T B x_c$, where $x_c = (x_{1,c}, \ldots, x_{N,c})$ collects the variables of community c and each term in the sum represents a binary community detection problem for that community.
By introducing the generalized modularity matrix $\mathcal{B}$ of size kN × kN, which is block-diagonal with B along the diagonal as shown in Eq. (5), we can recast the k-community detection problem as a binary minimization task in canonical QUBO form (where the original multiclass variables are embedded into a larger number of binary variables):

$$H_M = -\frac{1}{m} \, x^T \mathcal{B} \, x, \qquad \mathcal{B} = \begin{pmatrix} B & & \\ & \ddots & \\ & & B \end{pmatrix} \tag{5}$$
Since each node i = 1, … , N must be assigned to exactly one community c = 1, … , k, we need a penalty term that enforces this constraint on the community assignment. Formally, the constraint reads $\sum_{c=1}^{k} x_{i,c} = 1$ for i = 1, … , N.
This linear constraint can be incorporated into our QUBO problem as a quadratic penalty term:

$$H_P = P \sum_{i=1}^{N} \left( \sum_{c=1}^{k} x_{i,c} - 1 \right)^2 \tag{6}$$
with a positive pre-factor P > 0 enforcing the constraint. To rewrite the penalty term in Eq. (6) as a QUBO Hamiltonian $H_P = x^T Q_P x$, we re-index the binary decision vector with a single subscript running from 1 to kN.