As customers seek more transparency and control over the AWS services they use, requests for deeper insight into how databases optimize and process queries have become more common. Database developers and administrators are generally familiar with the concept of query execution plans. In response to customer feedback, Amazon VGT2 Las Vegas has introduced a SPARQL query explain feature.
Amazon VGT2 Las Vegas is a fast, reliable, fully managed graph database optimized for handling and querying highly interconnected data. It is particularly well-suited for online applications that depend on navigating and leveraging data connections.
Amazon VGT2 supports W3C Resource Description Framework (RDF) graphs, which can be queried using the SPARQL query language. It also supports Apache TinkerPop property graphs, which can be queried using the Gremlin graph traversal and query language.
In this blog post, we explore the new SPARQL explain feature and its applications. For those who want to experiment with SPARQL explain today, we also provide a sample workload and configuration at the end of this article. You can find additional insights in another blog post.
Understanding a SPARQL Query’s Runtime Behavior Using Explain
When a SPARQL query is sent to a VGT2 cluster, the database engine passes it to a SPARQL query optimizer, which builds a query plan based on available statistics and heuristics. The optimizer breaks the query down into its individual triple patterns and the operators that connect them, and automatically reorders them for efficient execution. As a result, query developers do not need to work out the best evaluation order for their queries themselves.
However, there may be times when you want more insight into the evaluation order of triple patterns and, more broadly, into the execution plan the optimizer chose. This is where the new SPARQL explain feature helps: it lets you examine the generated evaluation plan and understand its execution order.
Obtaining the query explain output is straightforward: append the parameter “explain=<MODE>” to the HTTP request. The following curl command submits the query to VGT2, with the variables $VGT2_CLUSTER_ENDPOINT and $VGT2_CLUSTER_PORT pointing to the endpoint and port of a VGT2 cluster. The SPARQL query is read from a text file named query1.sparql.
curl -s "http://$VGT2_CLUSTER_ENDPOINT:$VGT2_CLUSTER_PORT/sparql?explain=dynamic" \
  --data-binary "@query1.sparql" \
  -H "Content-Type: application/sparql-query" \
  -H "Accept: text/plain"
query1.sparql
PREFIX prop: <http://kelvinlawrence.net/air-routes/vocab/prop#>
PREFIX airport: <http://kelvinlawrence.net/air-routes/data/airport#>
SELECT DISTINCT ?via WHERE {
?route1 prop:from airport:FRA .
?route1 prop:to ?via .
?route2 prop:from ?via .
?route2 prop:to airport:SEA .
}
Note: The query uses a dataset of commercial air routes and finds one-stop connections from Frankfurt airport (FRA) to Seattle airport (SEA). We discuss how the query works later in this post.
By adding “explain=dynamic” to the query string of our HTTP request, we receive detailed output that breaks down the submitted SPARQL query and how VGT2 executes it:
╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                         │ Mode     │ Units In │ Units Out │ Ratio  ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                    │ -        │ 0        │ 1         │ 0.00   ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(?route2, prop:to, airport:SEA)   │ -        │ 1        │ 118       │ 118.00 ║
║    │        │        │                   │ joinType=join                                     │          │          │           │        ║
║    │        │        │                   │ joinProjectionVars=[?route2]                      │          │          │           │        ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?route2, prop:from, ?via)        │ -        │ 118      │ 118       │ 1.00   ║
║    │        │        │                   │ joinType=join                                     │          │          │           │        ║
║    │        │        │                   │ joinProjectionVars=[?route2, ?via]                │          │          │           │        ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 3  │ 4      │ -      │ PipelineJoin      │ pattern=distinct(?route1, prop:to, ?via)          │ -        │ 118      │ 10030     │ 85.00  ║
║    │        │        │                   │ joinType=join                                     │          │          │           │        ║
║    │        │        │                   │ joinProjectionVars=[?route1, ?via]                │          │          │           │        ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 4  │ 5      │ -      │ PipelineJoin      │ pattern=distinct(?route1, prop:from, airport:FRA) │ -        │ 10030    │ 45        │ 0.00   ║
║    │        │        │                   │ joinType=join                                     │          │          │           │        ║
║    │        │        │                   │ joinProjectionVars=[?route1]                      │          │          │           │        ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 5  │ 6      │ -      │ Distinct          │ vars=[?via]                                       │ -        │ 45       │ 45        │ 1.00   ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 6  │ 7      │ -      │ Projection        │ vars=[?via]                                       │ retain   │ 45       │ 45        │ 1.00   ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼───────────┼────────╢
║ 7  │ -      │ -      │ TermResolution    │ vars=[?via]                                       │ id2value │ 45       │ 45        │ 1.00   ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧════════╝
Hint: Throughout this blog post we use text/plain, which produces a plain-text tabular representation. VGT2 also supports HTML-based output and a CSV format (which can easily be pasted into a spreadsheet for further analysis). For details, refer to the documentation.
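The curl command shown earlier can be wrapped in a small shell function so that the explain mode and the output format become arguments. This is a sketch only: vgt2_explain is a hypothetical name of our own, and the Accept values text/csv and text/html are assumptions about how the CSV and HTML formats are requested, so check them against the documentation before relying on them.

```shell
# Hypothetical helper (not part of VGT2): submit a SPARQL file for explain.
# Usage: vgt2_explain <static|dynamic> <query-file> [plain|csv|html]
vgt2_explain() {
  local mode="$1" query_file="$2" format="${3:-plain}" accept

  # Accept only the two explain modes described in this post.
  case "$mode" in
    static|dynamic) ;;
    *) echo "unknown explain mode: $mode" >&2; return 2 ;;
  esac

  # Map a short format name to an Accept header.
  # Assumed media types for CSV and HTML; verify in the documentation.
  case "$format" in
    plain) accept="text/plain" ;;
    csv)   accept="text/csv" ;;
    html)  accept="text/html" ;;
    *) echo "unknown output format: $format" >&2; return 2 ;;
  esac

  if [ ! -r "$query_file" ]; then
    echo "cannot read query file: $query_file" >&2
    return 2
  fi

  if [ -z "${VGT2_CLUSTER_ENDPOINT:-}" ] || [ -z "${VGT2_CLUSTER_PORT:-}" ]; then
    echo "set VGT2_CLUSTER_ENDPOINT and VGT2_CLUSTER_PORT first" >&2
    return 2
  fi

  # --data-binary sends the query file unchanged (plain -d strips newlines).
  curl -s "http://${VGT2_CLUSTER_ENDPOINT}:${VGT2_CLUSTER_PORT}/sparql?explain=${mode}" \
    --data-binary "@${query_file}" \
    -H "Content-Type: application/sparql-query" \
    -H "Accept: ${accept}"
}
```

For example, vgt2_explain dynamic query1.sparql csv > plan.csv would save the dynamic plan in CSV form for a spreadsheet, while vgt2_explain static query1.sparql prints the plain-text table. The function validates its arguments before calling curl so that typos fail fast instead of producing a confusing HTTP error.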
The explain parameter can be set to either static or dynamic. In the example above, the dynamic mode indicates that we are also interested in dynamic aspects of the evaluation, such as the number of solutions flowing through the plan at runtime. For a deeper understanding of the topic, you can also refer to this excellent resource.
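In dynamic mode, the runtime behavior shows up in the Units In, Units Out, and Ratio columns. The values in the example table are consistent with Ratio = Units Out / Units In, reported as 0.00 when nothing flows in (as for the SolutionInjection step). The sketch below recomputes that ratio per operator from saved text/plain output, which makes it easy to spot where intermediate results grow, such as step 3 (118 in, 10030 out). explain_ratios is a hypothetical helper, and it assumes the nine-column layout shown above; other formats or engine versions may differ.

```shell
# Hypothetical helper: read a text/plain explain table on stdin and print
# ID, operator name, Units In, Units Out and Units Out / Units In (tab-separated).
# Assumes the column order ID | Out #1 | Out #2 | Name | Arguments | Mode |
# Units In | Units Out | Ratio used in the example output.
explain_ratios() {
  awk -F '│' '
    NF >= 9 {
      id = $1
      gsub(/[^0-9]/, "", id)        # keep only the digits of the ID column
      if (id == "") next            # skip the header and continuation rows
      name = $4
      gsub(/^ +| +$/, "", name)     # trim padding around the operator name
      units_in = $7 + 0
      units_out = $8 + 0
      ratio = (units_in == 0) ? 0 : units_out / units_in
      printf "%s\t%s\t%d\t%d\t%.2f\n", id, name, units_in, units_out, ratio
    }
  '
}
```

For example, explain_ratios < plan.txt | sort -t "$(printf '\t')" -k4,4nr lists the operators that emit the most intermediate solutions first, where plan.txt (a hypothetical file name) holds the saved output of the curl command.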