This is the first post in a multi-part series on Amazon Neptune that explores graph application datasets and queries from a range of domains. Amazon Neptune is a fully managed graph database built for storing and querying highly connected data. It is well suited to applications that need to navigate relationships and connections among entities. If you’ve ever pondered questions like:
- Who shares common friends or colleagues with us?
- Which services in my network might be impacted by a failure of a specific network element, such as a router or switch? Do we have adequate redundancy for our key clients?
- What is the quickest route between two underground stations?
- What recommendations should we provide to a customer for their next purchase, viewing, or listening?
- What access and modification rights does a user possess regarding products, services, and subscriptions?
- What is the fastest or cheapest method for shipping a parcel from point A to B?
- Which individuals might be collaborating to commit fraud against a financial institution?
then you have already run into the need to manage and interpret highly connected data.
The Air Routes Dataset
This dataset encompasses a significant portion of the world’s airline network, featuring vertices that represent 3,397 airports, 237 countries and provinces, as well as the 7 continents. It contains 52,639 edges, with 45,845 specifically representing airline routes.
Let’s explore the graph with a simple query to verify our connection to Neptune. The queries below count every vertex and every edge in the graph by label, producing two maps that summarize what the graph contains. Because we are using the air routes dataset, the counts are dominated by airports and routes.
vertices = g.V().groupCount().by(T.label).toList()
edges = g.E().groupCount().by(T.label).toList()
print(vertices)
print(edges)
Upon executing these queries in the notebook, you will observe results similar to the following:
[{'continent': 7, 'country': 237, 'version': 1, 'airport': 3397}]
[{'contains': 6794, 'route': 45845}]
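As a quick sanity check, the label counts above can be reconciled with the totals quoted in the dataset description. The sketch below hard-codes the two result maps, copied from the sample output (in the notebook they would come from `vertices[0]` and `edges[0]`), and uses only the standard library.

```python
# Label counts copied from the sample output above.
vertex_counts = {'continent': 7, 'country': 237, 'version': 1, 'airport': 3397}
edge_counts = {'contains': 6794, 'route': 45845}

# Totals across all labels.
total_vertices = sum(vertex_counts.values())
total_edges = sum(edge_counts.values())

print(total_vertices)  # 3642
print(total_edges)     # 52639

# Number of 'contains' edges per airport vertex.
contains_per_airport = edge_counts['contains'] / vertex_counts['airport']
print(contains_per_airport)  # 2.0
```

The edge total matches the 52,639 edges quoted for the dataset. The ratio of 2.0 is consistent with each airport being reached by one contains edge from a country and one from a continent, although the counts alone do not prove that reading.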
Finding Long Routes
Our next query finds routes longer than 8,400 miles by examining the dist property of the route edges. The results are sorted by distance in descending order. Because a route flown in both directions is stored as two edges, the where step keeps only the direction whose destination code sorts before the origin code, so each airport pair appears once. The following query illustrates this:
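# Parentheses let the traversal span several lines.
# Note: newer gremlin_python releases spell Order.decr as Order.desc.
paths = (g.V().hasLabel('airport').as_('a')
    .outE('route').has('dist', gt(8400))
    .order().by('dist', Order.decr)
    .inV()
    .where(P.lt('a')).by('code')
    .path().by('code').by('dist').by('code')
    .toList())
for p in paths:
    print(p)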
Running this code will yield results such as:
['DOH', 9025, 'AKL']
['PER', 9009, 'LHR']
['PTY', 8884, 'PEK']
...
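To see why the `where(P.lt('a')).by('code')` step leaves one row per airport pair, the following plain-Python sketch applies the same three steps (filter by distance, sort descending, keep one direction) to a small hand-made edge list. The edge list is illustrative only: the three long distances are taken from the output above, the reverse directions are assumed to carry the same distance, and `AAA`/`BBB` is a hypothetical short route. The helper name `long_routes` is likewise an assumption, not part of any library.

```python
# Illustrative directed route edges as (origin, miles, destination).
# Assumption: each route is stored once per direction with the same distance.
edges = [
    ('AKL', 9025, 'DOH'), ('DOH', 9025, 'AKL'),
    ('LHR', 9009, 'PER'), ('PER', 9009, 'LHR'),
    ('PEK', 8884, 'PTY'), ('PTY', 8884, 'PEK'),
    ('AAA', 500, 'BBB'), ('BBB', 500, 'AAA'),  # hypothetical short route
]

def long_routes(edges, min_miles):
    """Mimic has('dist', gt(...)), order().by('dist', decr), where(lt('a')).by('code')."""
    # Keep only edges longer than the threshold.
    longer = [e for e in edges if e[1] > min_miles]
    # Sort by distance, longest first.
    ordered = sorted(longer, key=lambda e: e[1], reverse=True)
    # Keep only the direction whose destination code sorts before the origin code.
    return [list(e) for e in ordered if e[2] < e[0]]

for route in long_routes(edges, 8400):
    print(route)
```

Each pair of opposite edges collapses to the single direction in which the destination code is alphabetically smaller, which is why the query output lists DOH to AKL but not AKL to DOH.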
Visualizing the Data
Utilizing Python libraries like matplotlib, numpy, and pandas enables us to further analyze and visually represent our data. After identifying some lengthy airline routes, we can create a bar chart to depict them:
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import pandas as pd
routes = []
dist = []
# Build one label ("ORIGIN-DEST") and one distance per path.
for i in range(len(paths)):
    routes.append(paths[i][0] + '-' + paths[i][2])
    dist.append(paths[i][1])
y_pos = np.arange(len(routes))
plt.figure(figsize=(11,6))
fs = pd.Series(dist).plot(kind='bar')
fs.set_xticks(y_pos)
fs.set_xticklabels(routes)
fs.set_ylabel('Miles')
fs.set_title('Longest routes')
# Write each distance just above its bar.
for i in range(len(paths)):
    fs.annotate(dist[i], xy=(i, dist[i] + 60), xycoords='data', ha='center')
plt.show()
Running this code generates a bar chart that visually summarizes the longest routes.
Airport Distribution by Continent
In the following example, we will query the graph to obtain the number of airports in each continent. This query groups the vertices to create a map with continent descriptions as keys and counts of outgoing edges as values.
# Parentheses let the traversal span several lines.
m = (g.V().hasLabel('continent')
    .group().by('desc').by(__.out('contains').count())
    .order(Scope.local).by(Column.keys)
    .next())
for c, n in m.items():
    print('%4d %s' % (n, c))
Executing this query provides results like:
 295 Africa
   0 Antarctica
 939 Asia
 596 Europe
 980 North America
 285 Oceania
 305 South America
Pie Chart Representation
Instead of presenting the results as text, we can display them as percentages on a pie chart. This visual approach enhances the understanding of the airport distribution across continents.
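The original code for this chart is not shown here, so the following is only a minimal sketch of one way to do it. It assumes the continent counts from the previous query (hard-coded below so the example is self-contained) and that matplotlib is installed. The helper names `percentage_shares` and `plot_airport_share` are hypothetical.

```python
# Continent counts copied from the output of the previous query.
airports_by_continent = {
    'Africa': 295, 'Antarctica': 0, 'Asia': 939, 'Europe': 596,
    'North America': 980, 'Oceania': 285, 'South America': 305,
}

def percentage_shares(counts):
    """Return {label: percent of total}, skipping labels with a zero count."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items() if v > 0}

def plot_airport_share(counts):
    """Draw the shares as a pie chart (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the arithmetic runs without it
    shares = percentage_shares(counts)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.pie(list(shares.values()), labels=list(shares.keys()),
           autopct='%1.1f%%', startangle=90)
    ax.set_title('Airports by continent')
    ax.axis('equal')  # keep the pie circular
    plt.show()

# Print the percentages as text; call plot_airport_share(...) to draw the chart.
shares = percentage_shares(airports_by_continent)
for name, pct in shares.items():
    print('%5.1f%% %s' % (pct, name))
```

In the notebook, calling `plot_airport_share(m)` would draw the chart from the live query result rather than the hard-coded map. Antarctica is dropped because a zero-width wedge would only add an overlapping label.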