Rule Documentation

All Packages - Package Correct Operatorgraph - Package Logical Optimization - Package Physical OptimizationFRAMES - NO FRAMES

Rule Optimizing Join Order

Short Description:

The join order is optimized by etimating the cardinality of the result of single joins and first joining those, which have the estimated smallest number of intermediate results.

Long Description:

We have two different types of query optimizers.

The first type is a simple query optimizer, which generates right-order trees for the overall join computation. This type of query optimizer just joins the intermediate result of a previous join with the result of that triple pattern, which have (a) the smallest number of results, (b) the smallest number of new bound variables, or (c) a combination of both, i.e. the query optimizers uses (b) as primary and (c) as secondary criterion. This query optimizer is used for the Index-approach (except with the optimization option set to BINARY). The Stream approach can only use (b) as criterion as the query has to be optimized before the data arrives as stream.

The previous type of query optimizer does not consider the number of results after one or more join computations, which considers our second type of query optimizer. This second type of query optimizer is used for the RDF3X (- Presorting), Hexastore (- Presorting) and Index (with optimization option set to BINARY) approaches.

For the second type of query optimizer, we use equi-depth histograms [1] for the join cardinality estimation in order to calculate the cost of each join and thus the overall cost of a concrete plan. We employ the technique of the dynamic programming for generating the execution plan with the optimal join order, i.e. our query optimizer composes a new best solution by considering the best solutions of its subproblems. In more detail, for a set S of triple patterns, our query optimizer builds every possible disjoint subsets S1 and S2, the union of which is S, i.e. S = S1 union S2. For each of these subsets S1 and S2, our query optimizer looks up the best solutions for their join orderings, which has been computed earlier, and calculates a new cost for this solution. Among these solutions, our query optimizer chooses the ones with the minimal costs for the overall solution for S.

A merge join can be computed using different collation orders. For example, a merge join can be applied to compute the triple patterns ?a <type> ?b and <book> ?b ?a, using either ?a as the primary order, or ?b as the primary order. In order to reduce the number of intermediate plans, our query optimizer only checks whether or not a merge join can be applied, but does not store the collation orders used to compute the join at this step. The used collation orders for answering the single triple patterns are fixed (and stored) in the last step, when the whole plan is considered.

Since merge joins without additional sorting phases are very cheap, our query optimizer chooses merge joins without additional sorting phases as much as possible.

[1] Piatetsky-Shapiro G., Connell C.: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conf., 1984.

All Packages - Package Correct Operatorgraph - Package Logical Optimization - Package Physical OptimizationFRAMES - NO FRAMES