Queries and Query Features
Generally, HeFQUIN supports queries written in the SPARQL query language. For the time being, however, every query must be given in the form of a so-called source assignment (using SERVICE clauses), and only a subset of the language features is supported natively within the HeFQUIN engine (the rest is supported through integration with Apache Jena). This page provides information about:
- the expected source-assignment representation of queries and
- the SPARQL features supported natively by the HeFQUIN engine.
Source Assignments
Since HeFQUIN does not (yet) have a proper source selection and query decomposition component, you need to indicate explicitly which part of the overall query pattern in the WHERE clause is expected to be matched in the data of which federation member(s). To this end, all triple patterns of the query pattern must be wrapped within SERVICE clauses. Any such SERVICE clause may contain multiple triple patterns, and the same triple pattern may be repeated in different SERVICE clauses (e.g., to consider matching triples from multiple federation members). Any other part of the query pattern (e.g., a FILTER, a BIND clause, UNION, OPTIONAL, etc.) may be specified within or outside of the SERVICE clauses.
As an example of a correct source-assignment representation, consider the following query.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    <http://dbpedia.org/resource/Berlin> dbo:country ?c .
    ?c owl:sameAs ?cc
  }
  FILTER STRSTARTS( STR(?cc), "http://www.wikidata.org/" )
  SERVICE <https://query.wikidata.org/sparql> {
    ?cc rdfs:label ?o
  }
}
Placement and order of sub-patterns. Note that the placement of SPARQL features within or outside of the SERVICE clauses has no relevance for the query planner of the HeFQUIN engine, and neither has the order in which the SERVICE clauses are listed. For instance, the fact that the example query above has a FILTER in between (and outside of) the two SERVICE clauses does not mean that the query has to be executed in this order (with the FILTER applied after the result of the first SERVICE clause has been retrieved). Instead, the query planning component of HeFQUIN may decide to move operators (such as the aforementioned FILTER) into or out of subqueries if it considers the resulting query plan to be more efficient (and, of course, only if such a change does not affect the overall query result).
In this context, it is also important to mention that the SERVICE clause for a particular federation member does not need to reflect the potential limitations of the type of data access interface provided by the federation member. For instance, even if a TPF server with its Triple Pattern Fragment interface can only handle requests consisting of a single triple pattern, a SERVICE clause for such a TPF server may contain multiple triple patterns (and even other query features such as a FILTER or a UNION). The query planning component of HeFQUIN takes care of creating a suitable query execution plan for these cases.
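As a minimal sketch of this point, the following query wraps two triple patterns and a FILTER in a single SERVICE clause that is assumed to refer to a TPF server. The service IRI http://example.org/tpf/dbpedia is a hypothetical placeholder, not a real federation member; in practice it would be whatever address your federation description gives for the TPF server.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  # Hypothetical TPF server; the IRI below is a placeholder for illustration.
  SERVICE <http://example.org/tpf/dbpedia> {
    # Two triple patterns, although a TPF interface answers only
    # single-triple-pattern requests; the planner is expected to break this up.
    ?city dbo:country ?c .
    ?city rdfs:label ?label .
    FILTER ( LANG(?label) = "en" )
  }
}
```

The query author does not need to split the pattern into one SERVICE clause per triple pattern; according to the description above, that decomposition is left to the query planner.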
Service IRIs. Both SERVICE clauses in the example query above provide a service IRI (namely, http://dbpedia.org/sparql and https://query.wikidata.org/sparql, respectively). With one exception (see below), HeFQUIN expects all SERVICE clauses to be of this form (rather than having a variable in place of the service IRI). The IRIs to be used as such service IRIs are the IRIs specified as the value of either the endpointAddress property or the exampleFragmentAddress property of the federation members mentioned in the description of your federation. If your query mentions a service IRI that is not the endpointAddress or the exampleFragmentAddress of any federation member in the federation to be queried, then the HeFQUIN query planner returns an error.
Providing the service IRIs via VALUES clauses. The only case in which the SERVICE clauses may have a variable in place of a service IRI is in queries in which VALUES clauses are used to specify the service IRIs. In particular, such a query must satisfy the following conditions:
- For every SERVICE clause with a variable, the query pattern must contain a VALUES clause with that variable somewhere before the SERVICE clause. Moreover, if there are other VALUES clauses in between, these must all come directly after the VALUES clause with the variable of the SERVICE clause.
- None of the variables bound by the VALUES clauses is mentioned anywhere else in the query pattern except as the variable of a SERVICE clause.
As an example, consider the following query.
PREFIX ex: <http://example.org/>
SELECT * WHERE {
  VALUES (?s1 ?s2) {
    (ex:endpoint1 ex:endpoint2)
    (ex:endpoint1 ex:endpoint3)
  }
  SERVICE ?s1 { .. some pattern (that neither mentions ?s1 nor ?s2) .. }
  SERVICE ?s2 { .. also some pattern (that also doesn't mention ?s1 or ?s2) .. }
}
The VALUES clause may also be split up in order to avoid a combinatorial blow-up of the combinations that would otherwise have to be listed. For instance, the previous example query may also be provided in the following form.
PREFIX ex: <http://example.org/>
SELECT * WHERE {
  VALUES ?s1 { ex:endpoint1 }
  VALUES ?s2 { ex:endpoint2 ex:endpoint3 }
  SERVICE ?s1 { .. some pattern (that neither mentions ?s1 nor ?s2) .. }
  SERVICE ?s2 { .. also some pattern (that also doesn't mention ?s1 or ?s2) .. }
}
When using multiple VALUES clauses, it is not even necessary to place them all at the beginning of the WHERE clause. Instead, some of them may be moved closer to the SERVICE clause(s) in which the variables that they introduce are used as the service variable. For instance, the example query above may also be provided in the following form.
PREFIX ex: <http://example.org/>
SELECT * WHERE {
  VALUES ?s1 { ex:endpoint1 }
  SERVICE ?s1 { .. some pattern (that neither mentions ?s1 nor ?s2) .. }
  VALUES ?s2 { ex:endpoint2 ex:endpoint3 }
  SERVICE ?s2 { .. also some pattern (that also doesn't mention ?s1 or ?s2) .. }
}
However, for queries given in this form (with VALUES clauses in between other patterns), there is one limitation: HeFQUIN does not support SERVICE clauses with a variable that is not bound by the VALUES clause(s) that come closest before the SERVICE clause. For instance, in the previous example query, adding another SERVICE clause with variable ?s1 after the second VALUES clause is not supported; instead, such a SERVICE clause would need to be placed next to the existing SERVICE clause with variable ?s1.
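To make this limitation concrete, the following sketch shows a placement that should be acceptable under the rule above, with a comment marking the position that would not be. The predicates ex:p, ex:q, and ex:r are hypothetical and stand in for the elided patterns of the earlier examples.

```sparql
PREFIX ex: <http://example.org/>

SELECT * WHERE {
  VALUES ?s1 { ex:endpoint1 }
  SERVICE ?s1 { ?a ex:p ?b }
  # The additional SERVICE ?s1 clause sits next to the existing one,
  # before the next VALUES clause.
  SERVICE ?s1 { ?b ex:q ?c }

  VALUES ?s2 { ex:endpoint2 ex:endpoint3 }
  SERVICE ?s2 { ?c ex:r ?d }
  # Placing the second SERVICE ?s1 clause here instead would not be supported,
  # because ?s1 is not bound by the closest preceding VALUES clause.
}
```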
SPARQL Features Supported Natively by the HeFQUIN Engine
While all of the features of version 1.1 of the SPARQL query language can be used in the queries given to HeFQUIN, the query planning and query execution components of the HeFQUIN engine support only a subset of these features. For queries that use other features, HeFQUIN relies on the SPARQL processor of Apache Jena. That is, any such query is compiled into a query execution plan by Jena's query processor such that the subplan(s) for the parts of the query pattern that the HeFQUIN engine supports are handled by the HeFQUIN engine. The solution mappings that the HeFQUIN engine produces for such a subplan are then passed to Jena's query processor. Due to this setup, queries that use features not supported natively by the HeFQUIN engine may experience a performance penalty. Currently, the HeFQUIN engine supports the following features natively (support for further features may be added in the future).
- Basic graph patterns
- Optional patterns
- Union patterns
- FILTER with arbitrary conditions
- Grouping of any of the aforementioned types of patterns (i.e., joining the results of sub-patterns)
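As an illustration, the WHERE clause of the following query uses only the pattern types listed above: basic graph patterns, an optional pattern, a union, and a FILTER, joined by grouping. It reuses the two service IRIs from the earlier example; the use of skos:altLabel is an assumption made purely for illustration, and whether the planner handles this exact query entirely natively has not been verified here.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT * WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    # Basic graph pattern with two triple patterns
    <http://dbpedia.org/resource/Berlin> dbo:country ?c .
    ?c owl:sameAs ?cc
  }
  FILTER STRSTARTS( STR(?cc), "http://www.wikidata.org/" )
  # Optional pattern containing a union of two basic graph patterns
  OPTIONAL {
    SERVICE <https://query.wikidata.org/sparql> {
      { ?cc rdfs:label ?o }
      UNION
      { ?cc skos:altLabel ?o }
    }
  }
}
```

By contrast, a pattern that uses a feature missing from the list (for example, a BIND clause) would, per the description above, be handled through Apache Jena rather than natively.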