Institut für Informationssysteme


KeyX:
Key-Oriented XML Index


Description of the project:

The Extensible Markup Language (XML) is designed to become the standard data format for exchanging information on the Internet. The increasing use of XML data by web applications, for instance in electronic commerce, demands a connection between XML technology and database management systems.
Indexes are commonly used to accelerate specific queries in database management systems. In relational database management systems (RDBMS), indexes have been studied extensively and have long been implemented in commercial products. Indexes in XML database management systems (XDBMS) are still an active field of research. Different approaches supporting different types of queries have been introduced in the past. Approaches that are not selective to specific queries require the whole XML data to be indexed, which may lead to enormous space consumption and to poor performance if the XML data changes often.

In this project we introduce a new index approach, called key-oriented XML index (KeyX), that uses specific XML element or attribute values as keys referencing arbitrary nodes in the XML data. KeyX is selective to specific queries and thus avoids effort spent on elements that are never queried. This concept reduces memory consumption and unproductive index updates.
KeyX supports a wide range of queries: pure path queries, queries with a predicate (key value comparison), range queries, queries with the descendant-or-self axis, and wildcard queries. The indexes for all these queries use the same data structure, so we do not need different techniques to support the different query types. Our index is built upon specific element or attribute values, which we call keys. The return value of a query is a reference to one or more elements or attributes in the XML data and in most cases differs from the key, e.g. selecting a book (return value) by its title (key). Thus KeyX avoids costly navigation in the XML data when the key of a predicate query is not the returned element.
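The idea of keys referencing returned nodes can be illustrated with a minimal sketch. It is not the KeyX implementation: the class name KeyIndexSketch, the sample document, and the choice of a sorted key list are assumptions made for illustration only, and the sketch handles element keys but not attribute keys.

```python
import bisect
import xml.etree.ElementTree as ET


class KeyIndexSketch:
    """Hypothetical illustration: sorted key values mapped to the nodes they reference."""

    def __init__(self, root, return_path, key_path):
        # return_path selects the nodes a query returns (e.g. every book);
        # key_path selects the key relative to each returned node (e.g. its title).
        entries = {}
        for node in root.iterfind(return_path):
            for key_node in node.iterfind(key_path):
                key = (key_node.text or "").strip()
                entries.setdefault(key, []).append(node)
        self._entries = entries
        self._keys = sorted(entries)  # sorted keys make range queries possible

    def lookup(self, key):
        # Predicate query: key value comparison, no navigation in the document.
        return list(self._entries.get(key, []))

    def range(self, low, high):
        # Range query over the same structure: all keys with low <= key <= high.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [node for key in self._keys[lo:hi] for node in self._entries[key]]


# Hypothetical sample data.
DOC = """<library>
  <book id="b1"><title>Algebra</title></book>
  <shelf><book id="b2"><title>Databases</title></book></shelf>
  <book id="b3"><title>XML Indexing</title></book>
</library>"""

root = ET.fromstring(DOC)
index = KeyIndexSketch(root, ".//book", "title")
by_title = [book.get("id") for book in index.lookup("Databases")]
in_range = [book.get("id") for book in index.range("A", "E")]
```

Here the key is the title while the returned node is the book, so both the predicate query and the range query are answered from the same structure without walking from title elements back to their books.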
Selecting indexes is an important task when tuning a database. It is performed by a database administrator or by an index propagation tool that suggests a set of suitable indexes. We apply the Index Selection Problem (ISP) to KeyX: a workload of database operations is analyzed, and a set of selective indexes that minimizes the total execution time for the workload is suggested. Because the workload is analyzed periodically and suitable indexes are created or dropped automatically, our implementation of KeyX guarantees high performance over the total lifetime of a database.
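A strongly simplified sketch of workload-driven index selection follows. It is not the algorithm used in the project: it assumes that every operation is affected by exactly one candidate index and that candidates do not interact, and all names and cost figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadEntry:
    index_name: str      # candidate index this operation would use or update
    frequency: int       # how often the operation occurs in the workload
    cost_without: float  # estimated time per execution without the index
    cost_with: float     # estimated time per execution with the index


def select_indexes(workload):
    """Return the candidate indexes whose net time saving over the workload is positive."""
    net_saving = {}
    for op in workload:
        # Queries save time with the index; updates lose time maintaining it.
        saving = op.frequency * (op.cost_without - op.cost_with)
        net_saving[op.index_name] = net_saving.get(op.index_name, 0.0) + saving
    return {name for name, saving in net_saving.items() if saving > 0}


# Hypothetical workload: each index has one query entry and one update entry.
workload = [
    WorkloadEntry("book_by_title", 100, 5.0, 0.5),  # frequent query
    WorkloadEntry("book_by_title", 20, 1.0, 1.5),   # occasional update
    WorkloadEntry("book_by_price", 2, 5.0, 0.5),    # rare query
    WorkloadEntry("book_by_price", 50, 1.0, 1.5),   # frequent update
]

existing = {"book_by_price"}
selected = select_indexes(workload)
to_create = selected - existing  # indexes worth building now
to_drop = existing - selected    # indexes that no longer pay off
```

Repeating this analysis periodically on a fresh workload yields the sets of indexes to create and to drop. Under the independence assumption each candidate can be judged on its own; with interacting indexes the selection becomes a harder combinatorial problem.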
Our approach works without any schema such as a DTD or XML Schema, which is an important requirement for XDBMS. In schemaless XML data, element types may appear or disappear during the lifetime of the database, which is why it is impossible to define all indexes in advance.

Involved researcher:

Involved students (in alphabetical order):


Publications


webmaster / 25.06.2004