LUPOSDATE Demonstration

Table of Contents |
In the DFG project LUPOSDATE, we develop logically and physically optimized Semantic Web Database engines, which we call LUPOSDATE engines, or LUPOSDATE system. The LUPOSDATE system aims to efficiently process the RDF query language SPARQL [16] [11] over large-scale Semantice Web data. In the meanwhile also the rule language RIF BLD and the ontology languages OWL and RDFS are supported.
In order to present the funcationalities of LUPOSDATE engines, we have developed an online demonstration of our system in form of an applet or as application using Java Web Start. Please see Starting the Demonstration. In this article, we introduce the LUPOSDATE system, its functionalities, usage of the demo, and related knowledge.
To read more about Semantic Web databases and LUPOSDATE, please take a look into our Semantic Web book [9].
The LUPOSDATE SPARQL system supports various approaches to manage RDF data and process SPARQL queries: Index, RDF3X, Stream, Jena and Sesame. Jena [21] and Sesame [3] refer to third-party SPARQL engines. Index is our in-memory Engine presented in [6]. Stream is our stream-based implementation (see [10]). RDF3X is a re-implementation of [14], but is further enhanced with additional optimization strategies.
All our approaches and reimplementations except of our Stream engine support full SPARQL 1.0 [16] and SPARQL 1.1 [11] and run the over 200 W3C test cases for SPARQL 1.0 [4] and nearly all of SPARQL 1.1 [15] successfully, which cover a broad range of technical details of the SPARQL language. The stream engine does not support named graphs, and thus it runs the W3C test cases successfully except of those, which require named graphs support.
The Index, RDF3X and Stream engine can be utilized to run also RIF BLD [2] rules. Our engines support the ontology languages OWL [12] and RDFS [13] by expressing their inference in RIF rules [17] and processing them afterwards. The LUPOSDATE system supports to use general RIF inference rules applied to ontology and instance data, or to generate RIF inference rules based on the concrete ontology data and apply them on the instance data. The latter is supported in two versions: The first one follows the W3C description in [17], the second uses a more optimized variant. Furthermore, for SPARQL queries on inferred data LUPOSDATE have the options to materialize first all inferred triples before evaluating the given SPARQL query, or to integrate the execution plans of the inference rules and the SPARQL query and optimize them together with the goal to infer only the data which is asked for during processing the given SPARQL query.

Figure 1: Architecture of LUPOSDATE system
Figure 1 presents the architecture of the LUPOSDATE engines. A given SPARQL query is first transformed into a coreSPARQL query. coreSPARQL is a core fragment of SPARQL, which eliminates redundant language constructs and allows only machine-friendly syntax (see [7]). The coreSPARQL hence simplifies the following processing of SPARQL. An operator graph is generated from the coreSPARQL query in order to further logically and physically optimize queries. Logical optimization aims to reorganize the operator graph into an equivalent operator graph, which generates the same output for any input as the original operator graph, but optimizes the execution time of query evaluation. Physical optimization aims to obtain the operator graph with the best estimated execution times by choosing physical operators for the logical ones. For all implementations except of Stream, the data is indexed. Stream is a stream-based implementation, which indexes only that fragment of data, which is necessary to answer the query during query evaluation. Therefore, Stream does not have an extra indexing phase and the data is directly transmitted to the query evaluation phase.
A similar approach applies to the evaluation of RIF rules, where RIF rules are first parsed into an abstract syntax tree and afterwards transformed into a simplified version of the abstract syntax tree before the evaluation continues with the transformation into an operator graph.
OWL and RDFS ontologies are supported by generating corresponding RIF inference rules, which are afterwards processed as described above. Figure 1 presents the architecture for the approach, where the operator graphs of RIF inference rules and the given SPARQL query are merged and afterwards optimized together to avoid inferring unnecessary triples. The other (not presented) approach would be to first infer all triples by evaluating the RIF inference rules and add all these triples to the index, such that they are considered for all subsequent SPARQL query evaluations.
Note that depending on the individual approach, the final operator graphs consist of different sets of operators. We present an overview of the supported logical and physical operators in the subsection after describing the Demonstration system.
Please use this link for loading the LUPOSDATE Web Client into your browser.
Our LUPOSDATE SPARQL engnies support various operators, which are able to process full SPARQL [16] [11], and most of which are enumerated in the following table. For more details on these logical and physical operators, please have a look into the literature (about databases), if we do not refer to a special publication there.
Logical Operator | Physical Operator(s) | Approaches | Description |
Join | HashJoin | All | HashJoin implements a round based hash join algorithm with a building phase and a probing phase. In the building phase, the smaller input data is divided into several partitions by using hash functions repeatedly until each partition fits into main memory. The larger input data is partitioned in the same way. Afterwards in the probing phase, the corresponding partitions are joined. |
HashMapIndexJoin, DBBPTreeIndexJoin, HybridIndexJoin | All | These physical operators build an index for the input data of its operands. For each received input data, these join algorithms access the index of the other operand in order to compute the join result. The used indices are hash maps, disk based B+-trees or an index, which uses hash maps for data, which fits into main memory, otherwise disk based B+-trees. | |
NestedLoopJoin | All | The Nested-Loop-Join uses a nested loop in order to compute the join result. | |
DBMergeSortedBagMergeJoin, TreeBagMergeJoin | All | These operators first sort their input data and use then the merge join algorithm in order to compute the join result. For sorting their input data, TreeBagMergeJoin uses the TreeBag java class for sorting its input data, DBMergeSortedBagMergeJoin uses a disk-based merge sort for bags algorithm. Note that if all the input data fits into main memory, the latter does not use a hard disk and works fully in main memory. | |
MergeJoinWithoutSorting | RDF3X (- Presorting), Hexastore (- Presorting) | MergeJoinWithoutSorting expects its input data in the correct sorted way and applies the merge join algorithm directly on its input data. | |
SIPFilterOperator(Iterator) | SIPFilterOperator(Iterator) | RDF3X (- Presorting), Hexastore (- Presorting) | This operator constructs a bloom filter of its operands data and informs corresponding triple patterns about this bloom filter. The informed triple patterns use the bloom filter to early discard irrelevant intermediate results. |
Union | Union | All | Union builds the union of the results of its operands. |
MergeUnion | RDF3X (- Presorting), Hexastore (- Presorting) | MergeUnion expects its input data to be sorted and merges the results of its operands, such that the final result is still sorted. | |
Filter | Filter | All | Filter evaluates filter expressions of SPARQL. |
Projection | Projection | All | Projection projects the result to a given list of variables. |
Result | Result | All | This operator is the final operator and provides access to the final result of the query. |
Construct | Construct | All | This operator transforms a bag of environments consisting of variable bindings into a set of triples based on templates of a given Construct clause in the query. |
MakeBooleanResult | MakeBooleanResult | All | This operator transforms a bag of environments consisting of variable bindings into a Boolean value for an Ask-query. |
(Fast)Sort | DBMergeSortedBagSort, HybridSortedBagSort, InsertionSort, QuickSort, TreeMapSort, FastSortLazyLiteral, FastSortPresortingNumbers, SortLimit | All | These operators sort their input data using a disk based merge sort algorithms for bags, the insertion sort algorithm, the quicksort algorithm or the TreeMap java class. The FastSort operators use literal ids or attached presorting numbers for fast sorting. SortLimit optimizes a sort operator with a succeeding Limit operator by early discarding irrelevant (larger) data. |
Distinct | DBSetBlockingDistinct | All | DBSetBlockingDistinct uses the disk based merge sort algorithm for sets for duplicate elimination. |
HashBlockingDistinct | All | HashBlockingDistinct uses a hash set for duplicate elimination. | |
HybridBlockingDistinct | All | HybridBlockingDistinct uses hash sets for input data, which fits into main memory, and otherwise a disk based merge sort algorithm for sets for duplicate elimination. | |
InMemoryDistinct | All | InMemoryDistinct uses a hash set for duplicate elimination, but is not blocking in comparison to HashBlockingDistinct, i.e. returns an item as early as possible if it is different to the previous items. | |
SortedDataDistinct | RDF3X ( - Presorting), Hexastore ( - Presorting) | SortedDataDistinct expects its input data to be sorted, such that the duplicate elimination is efficient by just comparing the current item with its previous item. | |
Limit | Limit | All | Limit implements the support for the Limit-clause. |
Offset | Offset | All | Offset implements the support for the Offset-clause. |
AddBinding | AddBinding | All | This operator might occur because of a logical optimization rule for constant propagation. |
AddBindingFromOtherVar | AddBindingFromOtherVar | All | This operator might occur because of a logical optimization rule for variable propagation. |
IndexCollection | IndexCollection | Index, RDF3X ( - Presorting), Hexastore ( - Presorting) | This is the root node for the indexing approaches and start point for their evaluations. |
Index, RDF3XIndex, HexastoreIndex | Index, RDF3XIndex, HexastoreIndex | Index, RDF3X ( - Presorting), Hexastore ( - Presorting) | These operators implement the index access based on one (or more) triple patterns for the Index, RDF3X (-Presorting) and Hexastore (- Presorting) approaches. |
PatternMatcher | SimplePatternMatcher | Stream | For each input RDF triple, the SimplePatternMatcher operator tests each triple pattern, if it matches the triple and transmits the triple to this triple pattern (see [10]). |
HashPatternMatcher | Stream | The HashPatternMatcher uses a hash function in order to directly determine those triple patterns, which match an input triple (see [10]). | |
TriplePattern | TriplePattern | Stream | The Triple Pattern operator tests, if it matches an input RDF triple and compute the resultant variable bindings in this case (see [10]). |
The Optional operator, which is not contained in this table, corresponds to a left-outer-semi-join. We support similar algorithms for the Optional operator as for the Join operator. The names of the Optional operators contain “Optional” instead of “Join” for the different implementations, i.e. DBBPTreeIndexOptional, DBMergeSortedBagOptional, HashMapIndexOptional, HybridIndexOptional, MergeOptionalSort, MergeWithoutSortingOptional, NaiveOptional (similar to NestedLoopJoin) and TreeBagOptional.
There are more operators for the SPARUL support, which we do not enumerate here.
All the implemented logical optimization rules are presented on this webpage.
We present some queries and RDF data used in the demo to demonstrate, e.g. which example queries can be optimized with which logical optimization rules and where the different engines are especially efficient. We present only queries from the SP2B benchmark here. Nevertherless, an analogous table could be created for the LUBM queries [5], which are also used in our demo.
Query (Queries) | Data | Engine(s) | Description |
sp2b_q1.sparql | sp2b_demo.n3 | Index ↔ RDF3X (- Presorting), Hexastore (- Presorting) ↔ Stream | All engines first transform the query into a CoreSPARQL query, which e.g. eliminates prefixes (see [7]). The operator graphs of this simple query show the different evaluation strategies of the different evaluators. For the Index engine, all three triple patterns are in an Index operator, which correspond to an evaluation of a right-deep join tree. For RDF3X (- Presorting) and Hexastore (- Presorting), collation orders of indices are used to apply merge joins in an optimized join order. The Stream engine applies the operators on RDF triples just when they are read from the input, such that they have to pass a pattern matcher, which distribute them to the specific triple patterns. |
sp2b_q2.sparql | sp2b_demo.n3 | RDF3X/Hexastore ↔ RDF3X/Hexastore - Presorting | As special optimization capability of RDF3X/Hexastore - Presorting, in the physical optimization, a Sort operator is replaced by using a MergeOptionalSort operator instead of a MergeWithoutSortingOptional operator. |
sp2b_q3a.sparql, | sp2b_demo.n3 | Index, RDF3X (- Presorting), Hexastore (- Presorting) | These queries are examples, where the rule Constant Propagation can be used. |
sp2b_q4.sparql, | sp2b_demo.n3 | RDF3X (- Presorting), Hexastore (- Presorting) | These queries are examples, where the rule Push Filter can be used. |
sp2b_q4.sparql, | sp2b_demo.n3 | RDF3X/Hexastore ↔ RDF3X/Hexastore - Presorting | Here, as special optimization capability of RDF3X/Hexastore - Presorting, in the physical optimization, MergeJoinSort operators are used, such that Merge Join operators can be applied instead of HashMapIndexJoin. |
sp2b_q4.sparql, | sp2b_demo.n3 | RDF3X/Hexastore ↔ RDF3X/Hexastore - Presorting | Here, as special optimization capability of RDF3X/Hexastore - Presorting, in the physical optimization, SortedDataDistinct can be used instead of DBSetBlockingDistinct by using a previous MergeJoinSort or MergeOptionalSort. |
sp2b_q5a.sparql | sp2b_demo.n3 | Index, RDF3X (- Presorting), Hexastore (- Presorting) | The rule Variable Propagation cannot be applied, as ?name1 and ?name2 only appear as objects. |
sp2b_q6.sparql | sp2b_demo.n3 | Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream | The rule Bound Variable in Optional is applied for this query. |
sp2b_q1.sparql, | sp2b_demo.n3 | Stream | For the Stream engine, the rule Binary Join replaces Join operators with more than two operands with binary Join operators. |
sp2b_E1.sparql | sp2b_demo.n3 | Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream | The operator graph of this query demonstrates the application of the rule Bound Variable In Union. |
sp2b_E2.sparql | sp2b_demo.n3 | Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream | The rule Constant Propagation of Filter with ORs is applied during optimizing this query. |
sp2b_E3.sparql | sp2b_demo.n3 | Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream | The application of the rule Variable Propagation is demonstrated when optimizing this query. |
sp2b_E4.sparql | sp2b_demo.n3 | RDF3X (- Presorting), Hexastore (- Presorting) | A Sort operator can be eliminated using an already existing order in the intermediate results during the physical optimization. |
sp2b_E4.sparql | sp2b_demo.n3 | RDF3X (- Presorting), Hexastore (- Presorting) | A SortedDataDistinct operator can be applied using an already existing order in the intermediate results during the physical optimization. |
- T. Berners-Lee, Notation 3 - An RDF language for the Semantic Web. W3C, 1998. Available at: http://www.w3.org/DesignIssues/Notation3.html
- Boley, H., Kifer, M., RIF Basic Logic Dialect, W3C Recommendation, 2010.
- Broekstra, J., Kampman, A., and van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. ISWC, Springer, Sardinia, 2002.
- Lee Feigenbaum (editor), DAWG Testcases, http://www.w3.org/2001/sw/DataAccess/tests/r2, 2008.
- Guo, Y., Pan, Z., and Heflin, J. LUBM: A Benchmark for OWL Knowledge Base Systems. Web Semantics 3(2), 2005.
- Jinghua Groppe, Sven Groppe, Sebastian Ebers, Volker Linnemann, Efficient Processing of SPARQL Joins in Memory by Dynamically Restricting Triple Patterns, 24th ACM Symposium on Applied Computing (ACM SAC 2009), Waikiki Beach, Honolulu, Hawaii, USA, 2009.
- Groppe, J., Groppe, S., and Kolbaum, J., Optimization of SPARQL by using coreSPARQL, 11th International Conference on Enterprise Information Systems, Milan, Italy, 2009.
- Jinghua Groppe, Sven Groppe, Andreas Schleifer, Volker Linnemann, LuposDate: A Semantic Web Database System, 18th ACM Conference on Information and Knowledge Management (ACM CIKM 2009), Hong Kong, China, 2009.
- Sven Groppe, Data Management and Query Processing in Semantic Web Databases, Springer, 2011. ISBN 978-3-642-19356-9 Book Webpage
- Sven Groppe, Jinghua Groppe, Dirk Kukulenz, Volker Linnemann, A SPARQL Engine for Streaming RDF Data, 3rd International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2007), Shanghai, China, 2007. This paper received an honorable mention at the SITIS'07 Conference.
- Harris, S., Seaborne, A., SPARQL 1.1 Query Language, W3C Working Draft, 2012.
- Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P. F., Rudolph, S., OWL 2 Web Ontology Language Primer, W3C Recommendation, 2009.
- Manola, F., Miller, E., RDF Primer, W3C Recommendation, 2004.
- Thomas Neumann, Gerhard Weikum, RDF3X: a RISCstyle Engine for RDF, Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008.
- Polleres, A., SPARQL1.1: Test case structure, Working Document, 2012.
- Prud’hommeaux E., Seaborne A. SPARQL Query Language for RDF, W3C Recommendation, 2008.
- Reynolds, D., OWL 2 RL in RIF, W3C Working Group Note, 2010.
- Schmidt, M., Hornung, T., Lausen, G., and Pinkel, C. SP2Bench: A SPARQL Performance Benchmark, 25th International Conference on Data Engineering (ICDE), Shanghai, China, 2009.
- A. Seaborne and G. Manjunath. SPARQL/Update, A language for updating RDF graphs. http://jena.hpl.hp.com/~afs/SPARQL-Update.html, 2008.
- Cathrin Weiss, Panagiotis Karras, Abraham Bernstein, Hexastore: Sextuple Indexing for Semantic Web Data Management, Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008.
- Wilkinson, K., Sayers, C., Kuno, H. A., and Reynolds, D. Efficient RDF Storage and Retrieval in Jena2. In SWDB’03 co-located with VLDB 2003, Berlin, Germany, 2003.