KeyX: Key-Oriented XML Index
Description of the project:
The Extensible Markup Language (XML) is designed to become the standard data format
for exchanging information on the Internet. The increasing use of XML data by
web applications, for instance in electronic commerce, demands a connection between
XML technology and database management systems.
Indexes are commonly used to accelerate specific queries in database
management systems. In relational database management systems (RDBMS), indexes
have been studied extensively and implemented in commercial products for a long
time. Indexes in XML database management systems (XDBMS) are still an active
field of research. Different approaches supporting different types of queries
have been introduced in the past. Approaches that are not selective to specific queries
require the whole XML data to be indexed, which may lead to enormous space consumption
and poor performance if the XML data changes often.
In this project we introduce a new
index approach, called key-oriented XML index (KeyX), that uses specific XML
element or attribute values as keys referencing arbitrary nodes in the XML data.
KeyX is selective to specific queries and thus avoids spending effort on elements
that are never queried. This concept reduces memory consumption and unproductive
index updates.
KeyX supports a wide range of queries: pure path queries, queries
with a predicate (key value comparison), range queries, queries with the
descendant-or-self axis, and wildcard queries. The indexes for all these queries use the
same data structure, so we do not need different techniques to support the different
query types.
Our index is built upon specific element
or attribute values which we call keys. The return value of a query is a
reference to one or more elements or attributes in the XML data and differs from
the key in most cases, e.g. selecting a book (return value) by its title (key). Thus KeyX
avoids costly navigation in the XML data if the key of a predicate query is not the
returned element.
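The key/return-value idea can be illustrated with a minimal Python sketch. It is not the KeyX implementation: the class name KeyIndex, the sample document, and the use of a sorted list with binary search are assumptions made only for this example. The sketch maps key values (title or year) directly to the returned book elements, so a predicate or range query does not have to navigate from the key node to the book.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
XML = """
<library>
  <book id="b1"><title>Databases</title><year>1999</year></book>
  <book id="b2"><title>XML Indexing</title><year>2003</year></book>
  <book id="b3"><title>Query Tuning</title><year>2001</year></book>
</library>
"""


class KeyIndex:
    """Sorted (key, node) entries for one return path / key path pair."""

    def __init__(self, root, return_path, key_path):
        entries = []
        for node in root.iterfind(return_path):
            for key_node in node.iterfind(key_path):
                if key_node.text is not None:
                    # The key is the element value; the entry references
                    # the node to be returned, not the key node itself.
                    entries.append((key_node.text, node))
        entries.sort(key=lambda entry: entry[0])
        self._keys = [key for key, _ in entries]
        self._nodes = [node for _, node in entries]

    def lookup(self, key):
        """Predicate query: all nodes whose key equals the given value."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._nodes[lo:hi]

    def range(self, low, high):
        """Range query: all nodes with low <= key <= high (string order)."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._nodes[lo:hi]


root = ET.fromstring(XML)
by_title = KeyIndex(root, ".//book", "title")
by_year = KeyIndex(root, ".//book", "year")

print([b.get("id") for b in by_title.lookup("XML Indexing")])
print([b.get("id") for b in by_year.range("2000", "2003")])
```

Keys are compared as strings here, which happens to work for four-digit years; a real index would need typed comparison. Only the two chosen key paths are indexed, which mirrors the selectivity described above.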
Selecting indexes is an important task when tuning a
database. It is performed either by a database administrator or by an index propagation
tool that suggests a set of suitable indexes.
We apply the Index Selection Problem (ISP) to our KeyX index: a
workload of database operations is analyzed, and a set of selective indexes that
minimizes the total execution time for the workload is suggested. Because the
workload is analyzed periodically and suitable indexes are created or dropped
automatically, our implementation of KeyX guarantees high performance over
the total lifetime of a database.
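A workload-driven selection step of this kind might look roughly like the following Python sketch. The cost model is an assumption made for illustration and is not taken from the project: each query on a key path is assumed to save a fixed amount of time if an index exists, and each update on that path is assumed to cost a fixed amount of index maintenance. The function names suggest_indexes and reconcile and the constants are hypothetical.

```python
from collections import Counter


def suggest_indexes(workload, query_saving=10.0, update_cost=3.0):
    """Return the key paths whose estimated net benefit is positive.

    workload: iterable of (kind, key_path) pairs, where kind is
    "query" or "update". The two constants are invented placeholders
    for the estimates a real cost model would provide.
    """
    workload = list(workload)
    queries = Counter(path for kind, path in workload if kind == "query")
    updates = Counter(path for kind, path in workload if kind == "update")
    chosen = set()
    for path in queries:
        # Time saved by answering queries from the index, minus the
        # time spent keeping the index up to date.
        net = queries[path] * query_saving - updates[path] * update_cost
        if net > 0:
            chosen.add(path)
    return chosen


def reconcile(existing, suggested):
    """Indexes to create and indexes to drop for the new suggestion."""
    return suggested - existing, existing - suggested


workload = [
    ("query", "/library/book/title"),
    ("query", "/library/book/title"),
    ("update", "/library/book/title"),
    ("query", "/library/book/year"),
    ("update", "/library/book/year"),
    ("update", "/library/book/year"),
    ("update", "/library/book/year"),
    ("update", "/library/book/year"),
]

suggested = suggest_indexes(workload)
to_create, to_drop = reconcile({"/library/book/year"}, suggested)
print(sorted(to_create), sorted(to_drop))
```

Under this simplified model each index is judged independently, so keeping exactly the indexes with positive net benefit minimizes the estimated total time. A space budget or interactions between indexes would require a proper optimization step instead.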
Our approach works without any schema such as a DTD or an XML
Schema. This is an important requirement for XDBMS. In schemaless XML data, element
types may appear or disappear during the lifetime of the database. This is the
reason why it is impossible to define all indexes in advance.
Researchers involved:
- Dipl.-Inf. Beda Christoph Hammerschmidt (former)
- Dipl.-Inform. Martin Kempa (former)
Students involved (in alphabetical order):
- Chu Dangxing (Exchange student)
- Konstantin Ens (Bachelor thesis)
- Timm Gehrmann (Bachelor thesis)
- Khaled Haj-Yahya (Bachelor thesis)
- Florian Massel (Bachelor thesis)
- Alexander Pfalzgraf (Bachelor thesis)
- Philipp Stursberg (Bachelor thesis)
webmaster / 25.06.2004