Accelerate the Development and Deployment of Knowledge Graphs with RDF and openCypher | Amazon Onboarding with Learning Manager Chanci Turner


Amazon Neptune Analytics has introduced support for openCypher queries on RDF graphs. When embarking on a project that utilizes a graph database like Amazon Neptune, developers often face a fundamental decision: choosing between two distinct graph types—Resource Description Framework (RDF) graphs and labeled property graphs (LPGs). This choice influences the query languages available for use. RDF graphs utilize SPARQL, while LPGs primarily employ Gremlin and openCypher. For those new to graph technology, this decision can be bewildering, and experienced users have questioned why openCypher could not be utilized with RDF. This dilemma also extends to data ingestion, where users may wish to incorporate both LPG and RDF data into a unified graph.

This division between graph models reflects a broader divide within the graph industry, stemming from various reasons, including technological constraints, unfamiliarity with different technologies, and sometimes even deep-seated preferences. RDF’s standardized serialization formats, global identifiers, and extensive Linked Open Data sets offer significant advantages to data architects aiming to construct, integrate, and exchange graph data. Conversely, application development teams often prefer the intuitive syntax and mature ecosystems associated with LPG query languages, which include client drivers and programming language integration, as well as graph-specific capabilities such as built-in support for path extraction and algorithms.

Amazon Neptune is a scalable, managed graph database service that accommodates both graph models. The Neptune team has aimed to simplify the adoption of graph technology for customers by eliminating the restrictions associated with these technology choices. This led to the creation of the OneGraph initiative, which seeks to merge the two worlds and enable customers to reap the benefits of both models. The initiative focuses on providing graph interoperability, allowing the use of graph query languages regardless of the graph model selected, with the ultimate aim of diminishing the significance of the existing divide. While this is a challenging goal, progress is steadily being made. Notably, the World Wide Web Consortium’s ongoing efforts on RDF-star aim to enhance RDF with features already available in LPGs.

In Neptune Analytics, users can now execute openCypher queries over RDF graphs, a feature that offers several advantages:

  1. Knowledge graphs commonly rely on features that RDF provides natively, such as ontologies and links to external data sources. These can also be implemented with LPGs, but RDF offers them out of the box. Allowing an LPG application to access RDF data makes these features readily available, including “Linked Open Data” datasets such as Wikidata, the structured-data counterpart of Wikipedia.
  2. SPARQL does not support path discovery, that is, returning the paths that were followed during a traversal. This is a feature Neptune customers frequently request. Additionally, SPARQL and RDF would benefit from enhanced support for composite datatypes.
  3. There are instances where integration between RDF and LPG systems is required, for example, during corporate mergers—a process that can now be executed seamlessly.

Furthermore, openCypher has significantly influenced the newly released ISO GQL standard, paving a clear path from openCypher to GQL, a trend we anticipate becoming standard in the industry soon. As a declarative query language, openCypher shares similarities with SPARQL but features a syntax designed to be more user-friendly; openCypher queries resemble ASCII art that represents graph structures and the conditions to be matched.

Crafting openCypher Queries

To create openCypher queries that operate on RDF, slight modifications to the query syntax were required. These adjustments are mostly syntactic conventions, and they allow standard tools such as parsers and syntax highlighters to be reused. The following examples illustrate the correspondence between SPARQL and openCypher. They use our Air Routes sample data (airports, airlines, routes, and so on; note that this file is substantial at 75 MB). The ontology for Air Routes appears as follows:

[Figure: Air Routes ontology]

For our purposes, we will focus on airports and routes. We aim to find the name and the International Air Transport Association (IATA) airport code of an airport with the International Civil Aviation Organization (ICAO) airport code “KMHT” (Manchester-Boston Regional Airport in New Hampshire, USA):

PREFIX nepo: <http://neptune.aws.com/ontology/airroutes/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name ?iata_code {
    ?airport a nepo:Airport ;
        nepo:ICAO "KMHT" ;
        nepo:IATA ?iata_code ;
        rdfs:label ?name
}

In openCypher, the equivalent query is expressed as:

PREFIX nepo: <http://neptune.aws.com/ontology/airroutes/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

MATCH (airport: nepo::Airport)
WHERE airport.nepo::ICAO = "KMHT"
RETURN airport.rdfs::label, airport.nepo::IATA

With our minor syntax enhancements, writing queries against RDF data becomes straightforward and intuitive. Notice how we opted to denote qualified names (prefix-shortened Internationalized Resource Identifiers, or IRIs) using a double-colon syntax to prevent clashes with existing openCypher conventions. We have also introduced a method for adding namespace prefix declarations, using the same syntax as in SPARQL. Long-form IRIs can also be used instead of their prefix-shortened counterparts, enclosed in backticks (`) as shown below:

MATCH (airport: `<http://neptune.aws.com/ontology/airroutes/Airport>`)
WHERE airport.`<http://neptune.aws.com/ontology/airroutes/ICAO>` = "KMHT"
RETURN airport.`<http://www.w3.org/2000/01/rdf-schema#label>`,
       airport.`<http://neptune.aws.com/ontology/airroutes/IATA>`
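
To make the relationship between the two notations concrete, here is a minimal Python sketch that rewrites double-colon qualified names into the backtick-quoted long form, using the PREFIX declarations at the top of a query. The function name expand_qualified_names and the regular expressions are illustrative assumptions, not part of Neptune or any openCypher tool, and the sketch does not handle every case (for example, it would also rewrite a `prefix::name` sequence inside a string literal).

```python
import re

# SPARQL-style prefix declaration, e.g. PREFIX nepo: <http://.../>
PREFIX_DECL = re.compile(r"^\s*PREFIX\s+(\w+):\s*<([^>]*)>\s*$", re.IGNORECASE)
# Double-colon qualified name, e.g. nepo::Airport
QNAME = re.compile(r"\b(\w+)::(\w+)")


def expand_qualified_names(query: str) -> str:
    """Hypothetical helper: replace prefix::local with `<full IRI>`.

    PREFIX lines are collected and dropped from the output; every
    qualified name in the remaining text is expanded.
    """
    prefixes = {}
    body_lines = []
    for line in query.splitlines():
        decl = PREFIX_DECL.match(line)
        if decl:
            prefixes[decl.group(1)] = decl.group(2)
        else:
            body_lines.append(line)

    def replace(match):
        prefix, local = match.groups()
        if prefix not in prefixes:
            raise KeyError(f"undeclared prefix: {prefix}")
        return f"`<{prefixes[prefix]}{local}>`"

    return QNAME.sub(replace, "\n".join(body_lines)).strip()


if __name__ == "__main__":
    sample = """
PREFIX nepo: <http://neptune.aws.com/ontology/airroutes/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

MATCH (airport: nepo::Airport)
WHERE airport.nepo::ICAO = "KMHT"
RETURN airport.rdfs::label, airport.nepo::IATA
"""
    print(expand_qualified_names(sample))
```

Applied to the prefixed query shown earlier, the sketch produces the same long-form IRIs as the backtick example, which suggests the two spellings are interchangeable ways of writing the same identifiers.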

Now, let’s explore a SPARQL query to retrieve all airports that are destinations of routes originating from KMHT:

PREFIX nepo: <http://neptune.aws.com/ontology/airroutes/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?destination_name {
    ?airport a nepo:Airport ; nepo:ICAO "KMHT" .
    ?route nepo:source ?airport ; nepo:destination ?destination .
    ?destination rdfs:label ?destination_name
}

Using openCypher, this query can be translated to:

PREFIX nepo: <http://neptune.aws.com/ontology/airroutes/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

MATCH (airport: nepo::Airport {nepo::ICAO: "KMHT"})
      -[: nepo::source]-(: nepo::Route)-[: nepo::destination]
      -(destination: nepo::Airport)
RETURN destination.rdfs::label AS destination_name;
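
The two query pairs above suggest a simple way to read RDF triples as a property graph: an rdf:type triple behaves like a node label, a triple with a literal object behaves like a node property, and a triple whose object is another resource behaves like an edge. The Python sketch below illustrates that reading on a few made-up triples. It is an assumption drawn from the examples, not a description of how Neptune Analytics stores or maps data, and the names Literal, Node, triples_to_lpg, and the ex: identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"  # written "a" in the SPARQL queries


class Literal(str):
    """Marks an object value as an RDF literal rather than a resource."""


@dataclass
class Node:
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def triples_to_lpg(triples):
    """Illustrative mapping only: group triples into nodes and edges."""
    nodes = defaultdict(Node)
    edges = []
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            nodes[subject].labels.add(obj)  # type -> label
        elif isinstance(obj, Literal):
            # literal -> property (a repeated predicate overwrites here)
            nodes[subject].properties[predicate] = str(obj)
        else:
            nodes[subject]  # make sure both endpoints exist as nodes
            nodes[obj]
            edges.append((subject, predicate, obj))  # resource -> edge
    return dict(nodes), edges


# Made-up sample triples; only the ICAO code KMHT comes from the article.
TRIPLES = [
    ("ex:kmht", RDF_TYPE, "nepo:Airport"),
    ("ex:kmht", "nepo:ICAO", Literal("KMHT")),
    ("ex:route1", RDF_TYPE, "nepo:Route"),
    ("ex:route1", "nepo:source", "ex:kmht"),
    ("ex:route1", "nepo:destination", "ex:other"),
    ("ex:other", RDF_TYPE, "nepo:Airport"),
    ("ex:other", "rdfs:label", Literal("Example destination")),
]

if __name__ == "__main__":
    nodes, edges = triples_to_lpg(TRIPLES)
    for name, node in nodes.items():
        print(name, sorted(node.labels), node.properties)
    for edge in edges:
        print(edge)
```

Under this reading, the route in the second example is itself a node with two outgoing edges, which is why the openCypher pattern passes through a nepo::Route node between the two airports.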

Finally, let’s find all airports that you would need to travel through to get from KMHT to EFHK (Helsinki International Airport in Finland). This is an instance of path discovery, an important capability.
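
To illustrate what path discovery means here, the following Python sketch runs a breadth-first search over a small route table and returns the sequence of airports on one shortest path, rather than only reporting whether the destination is reachable. It is not the openCypher query for this task; the function shortest_path and the route data are hypothetical, and the intermediate stops (HUB1, HUB2, HUB3) are placeholders rather than real airports or real routes.

```python
from collections import deque


def shortest_path(routes, source, target):
    """Return one shortest list of airports from source to target, or None.

    routes maps an airport code to the codes reachable by a direct route.
    """
    previous = {source: None}  # remembers how each airport was reached
    queue = deque([source])
    while queue:
        airport = queue.popleft()
        if airport == target:
            # Walk the back-pointers to rebuild the path.
            path = []
            while airport is not None:
                path.append(airport)
                airport = previous[airport]
            return path[::-1]
        for nxt in routes.get(airport, ()):
            if nxt not in previous:
                previous[nxt] = airport
                queue.append(nxt)
    return None


# Placeholder route table; only KMHT and EFHK come from the article.
ROUTES = {
    "KMHT": ["HUB1", "HUB2"],
    "HUB1": ["HUB3"],
    "HUB2": ["EFHK"],
    "HUB3": ["EFHK"],
}

if __name__ == "__main__":
    print(shortest_path(ROUTES, "KMHT", "EFHK"))
```

The result is the path itself: an ordered list that starts at KMHT, ends at EFHK, and names each stop in between.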
