Queries and Query Features

HeFQUIN supports queries written in the SPARQL query language. For the time being, however, every query must be given in the form of a so-called source assignment (using SERVICE clauses), and only a subset of the language features is supported natively within the HeFQUIN engine (the remaining features are supported through the integration with Apache Jena). This page provides information about source assignments and about the SPARQL features that the HeFQUIN engine supports natively.

Source Assignments

Since HeFQUIN does not (yet) have a proper source selection and query decomposition component, you need to indicate explicitly which part of the query pattern in the WHERE clause is to be matched against the data of which federation member(s). To this end, all triple patterns of the query pattern must be wrapped within SERVICE clauses. Any such SERVICE clause may contain multiple triple patterns, and the same triple pattern may be repeated in different SERVICE clauses (e.g., to match it against triples from multiple federation members). All other parts of the query pattern (e.g., FILTER, BIND, UNION, OPTIONAL) may be placed either within or outside of the SERVICE clauses.

As an example of a correct source-assignment representation, consider the following query.

PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:    <http://www.w3.org/2002/07/owl#>
PREFIX dbo:    <http://dbpedia.org/ontology/>

SELECT * WHERE {
	SERVICE <http://dbpedia.org/sparql> {
		<http://dbpedia.org/resource/Berlin> dbo:country ?c .
		?c owl:sameAs ?cc
	}
	FILTER STRSTARTS( STR(?cc), "http://www.wikidata.org/" )
	SERVICE <https://query.wikidata.org/sparql> {
		?cc rdfs:label ?o
	}
}

Placement and order of sub-patterns. Notice that the placement of SPARQL features within or outside of the SERVICE clauses is irrelevant for the query planner of the HeFQUIN engine, and so is the order in which the SERVICE clauses are listed. For instance, the fact that the example query above has a FILTER between (and outside of) the two SERVICE clauses does not mean that the query has to be executed in this order (i.e., with the FILTER applied after the result of the first SERVICE clause has been retrieved). Instead, the query planning component of HeFQUIN may move operators (such as this FILTER) into or out of subqueries if it considers the resulting plan more efficient (and, of course, only if such a change does not affect the overall query result).

In this context, it is also important to mention that the SERVICE clause for a particular federation member does not need to reflect the limitations of the data access interface provided by that federation member. For instance, even though a TPF server, with its Triple Pattern Fragments interface, can handle only requests consisting of a single triple pattern, a SERVICE clause for such a TPF server may contain multiple triple patterns (and even other query features such as FILTER or UNION). The query planning component of HeFQUIN takes care of creating a suitable query execution plan for such cases.

Service IRIs. Both SERVICE clauses in the example query above specify a service IRI (namely, http://dbpedia.org/sparql and https://query.wikidata.org/sparql, respectively). With one exception (see below), HeFQUIN expects all SERVICE clauses to be of this form (rather than having a variable in place of the service IRI). The IRIs that can be used as service IRIs are those given as the value of either the endpointAddress property or the exampleFragmentAddress property of the federation members in the description of your federation. If your query mentions a service IRI that is neither the endpointAddress nor the exampleFragmentAddress of any federation member in the federation to be queried, the HeFQUIN query planner returns an error.
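To illustrate this rule, here is a minimal Python sketch of such a check. It is not HeFQUIN's planner code: the helper name `unknown_service_iris` is hypothetical, the federation is reduced to a plain set of endpointAddress/exampleFragmentAddress values, and the regular expression recognizes only service IRIs written in angle brackets (prefixed names are ignored). SERVICE clauses with a variable are skipped, since they fall under the exception described below.

```python
import re

# Matches "SERVICE <iri>" and "SERVICE SILENT <iri>" and captures the IRI.
# SERVICE clauses with a variable (e.g., "SERVICE ?s1") do not match.
# Simplification: service IRIs written as prefixed names are not recognized.
SERVICE_IRI = re.compile(r"SERVICE\s+(?:SILENT\s+)?<([^>]*)>", re.IGNORECASE)


def unknown_service_iris(query: str, member_addresses: set[str]) -> list[str]:
    """Hypothetical helper: return every service IRI in `query` that is not
    the endpointAddress/exampleFragmentAddress of any federation member."""
    return [iri for iri in SERVICE_IRI.findall(query) if iri not in member_addresses]


# The example query from above.
QUERY = """
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:    <http://www.w3.org/2002/07/owl#>
PREFIX dbo:    <http://dbpedia.org/ontology/>

SELECT * WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    <http://dbpedia.org/resource/Berlin> dbo:country ?c .
    ?c owl:sameAs ?cc
  }
  FILTER STRSTARTS( STR(?cc), "http://www.wikidata.org/" )
  SERVICE <https://query.wikidata.org/sparql> {
    ?cc rdfs:label ?o
  }
}
"""

# Addresses of the (assumed) federation members.
members = {"http://dbpedia.org/sparql", "https://query.wikidata.org/sparql"}
print(unknown_service_iris(QUERY, members))  # []

# If Wikidata is not a member of the federation, its IRI is reported.
print(unknown_service_iris(QUERY, {"http://dbpedia.org/sparql"}))
# ['https://query.wikidata.org/sparql']
```

A non-empty result corresponds to the case in which the planner reports an error.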

Providing the service IRIs via VALUES clauses. The only case in which the SERVICE clauses may have a variable in place of a service IRI is in queries in which VALUES clauses are used to specify the service IRIs. In particular, such a query must satisfy the following conditions:

As an example, consider the following query.

PREFIX ex: <http://example.org/>

SELECT * WHERE {
  VALUES (?s1 ?s2) {
    (ex:endpoint1 ex:endpoint2)
    (ex:endpoint1 ex:endpoint3)
  }
  SERVICE ?s1 { .. some pattern (that neither mentions ?s1 nor ?s2) .. }
  SERVICE ?s2 { .. also some pattern (that also doesn't mention ?s1 or ?s2) .. }
}

The VALUES clause may also be split up to avoid a combinatorial blow-up in the number of rows that have to be listed. For instance, the previous example query may also be given in the following form.

PREFIX ex: <http://example.org/>

SELECT * WHERE {
  VALUES ?s1 { ex:endpoint1 }
  VALUES ?s2 { ex:endpoint2 ex:endpoint3 }
  SERVICE ?s1 { .. some pattern (that neither mentions ?s1 nor ?s2) .. }
  SERVICE ?s2 { .. also some pattern (that also doesn't mention ?s1 or ?s2) .. }
}
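The two forms are equivalent because each VALUES clause denotes a table of solution mappings and the two VALUES clauses are joined; since they bind disjoint variables (?s1 and ?s2), this join is simply a cross product. The Python sketch below checks this with solution mappings modeled as plain dicts. The helper names `values_table` and `join` are illustrative only and are not part of HeFQUIN or Jena.

```python
from itertools import product

EX = "http://example.org/"


def values_table(variables, *rows):
    """Model a VALUES clause as a list of solution mappings (dicts)."""
    return [dict(zip(variables, row)) for row in rows]


def join(left, right):
    """SPARQL-style join: merge every pair of compatible solution mappings,
    i.e., pairs that agree on all variables they have in common."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if all(l[v] == r[v] for v in l.keys() & r.keys())
    ]


# First form: a single VALUES clause that lists every combination.
combined = values_table(
    ("?s1", "?s2"),
    (EX + "endpoint1", EX + "endpoint2"),
    (EX + "endpoint1", EX + "endpoint3"),
)

# Second form: one VALUES clause per service variable, joined.
split = join(
    values_table(("?s1",), (EX + "endpoint1",)),
    values_table(("?s2",), (EX + "endpoint2",), (EX + "endpoint3",)),
)
print(combined == split)  # True

# Three service variables with three candidate endpoints each.
candidates = [(EX + f"endpoint{i}",) for i in (1, 2, 3)]
three_vars = join(
    join(values_table(("?s1",), *candidates), values_table(("?s2",), *candidates)),
    values_table(("?s3",), *candidates),
)
print(len(three_vars))  # 27
```

The last part shows where the split form pays off: a single VALUES clause for three service variables with three candidates each would have to list all 3 × 3 × 3 = 27 combinations, whereas three separate VALUES clauses list only 3 + 3 + 3 = 9 IRIs.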

When using multiple VALUES clauses, it is not even necessary to place them all at the beginning of the WHERE clause. Instead, some of them may be moved closer to the SERVICE clause(s) that use the variables they introduce as service variables. For instance, the example query above may also be given in the following form.

PREFIX ex: <http://example.org/>

SELECT * WHERE {
  VALUES ?s1 { ex:endpoint1 }
  SERVICE ?s1 { .. some pattern (that neither mentions ?s1 nor ?s2) .. }
  VALUES ?s2 { ex:endpoint2 ex:endpoint3 }
  SERVICE ?s2 { .. also some pattern (that also doesn't mention ?s1 or ?s2) .. }
}

However, queries given in this form (with VALUES clauses in between other patterns) are subject to one limitation: HeFQUIN does not support a SERVICE clause whose variable is not bound by the VALUES clause(s) that come closest before that SERVICE clause. For instance, in the previous example query, it is not supported to add another SERVICE clause with variable ?s1 after the second VALUES clause; instead, such a SERVICE clause would need to be placed next to the existing SERVICE clause with variable ?s1.
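One way to read this limitation is that each service variable must be bound by the most recent run of consecutive VALUES clauses that precedes the SERVICE clause. The Python sketch below encodes that reading, assuming that only VALUES clauses and SERVICE clauses with a variable matter. The function name `unsupported_service_vars` and the list-of-tuples representation of the WHERE clause are hypothetical and do not reflect how HeFQUIN implements this check.

```python
def unsupported_service_vars(where_clause):
    """Hypothetical check: return the service variables of SERVICE clauses
    that are not bound by the closest preceding run of VALUES clauses.

    `where_clause` lists ("VALUES", {variables}) and ("SERVICE", variable)
    items in the order in which they appear in the WHERE clause."""
    bound = set()          # variables bound by the most recent run of VALUES
    in_values_run = False  # True while consecutive VALUES clauses are read
    problems = []
    for kind, arg in where_clause:
        if kind == "VALUES":
            if not in_values_run:
                bound = set()  # a new run of VALUES clauses starts here
            bound |= arg
            in_values_run = True
        else:  # "SERVICE"
            in_values_run = False
            if arg not in bound:
                problems.append(arg)
    return problems


# Second example query: both VALUES clauses come first.
upfront = [("VALUES", {"?s1"}), ("VALUES", {"?s2"}),
           ("SERVICE", "?s1"), ("SERVICE", "?s2")]
# Third example query: VALUES clauses placed next to their SERVICE clauses.
interleaved = [("VALUES", {"?s1"}), ("SERVICE", "?s1"),
               ("VALUES", {"?s2"}), ("SERVICE", "?s2")]

print(unsupported_service_vars(upfront))      # []
print(unsupported_service_vars(interleaved))  # []
# Adding another SERVICE ?s1 after the second VALUES clause is not supported.
print(unsupported_service_vars(interleaved + [("SERVICE", "?s1")]))  # ['?s1']
```

Placing the additional SERVICE ?s1 clause directly after the existing one (before the second VALUES clause) makes the check pass, matching the workaround described above.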

SPARQL Features Supported Natively by the HeFQUIN Engine

While all features of version 1.1 of the SPARQL query language can be used in queries given to HeFQUIN, the query planning and query execution components of the HeFQUIN engine support only a subset of these features. For queries that use other features, HeFQUIN relies on the SPARQL processor of Apache Jena. That is, any such query is compiled into a query execution plan by Jena's query processor such that the subplan(s) for the parts of the query pattern that the HeFQUIN engine supports are handled by the HeFQUIN engine, and the solution mappings that the HeFQUIN engine produces for such a subplan are then passed to Jena's query processor. Due to this setup, queries that use features not supported natively by the HeFQUIN engine may incur a performance penalty. Currently, the HeFQUIN engine natively supports the following features (support for further features may be added in the future).