Data Integration by Describing Sources with Constraint Databases


Xun Cheng, Department of Computing Science, University of California, Santa Barbara

Guozhu Dong, Department of Computer Science and Engineering, Wright State University, gdong@cs.wright.edu

Tzekwan Lau, Department of Computing Science, University of California, Santa Barbara

Jianwen Su, Department of Computing Science, University of California, Santa Barbara, su@cs.ucsb.edu


IEEE International Conference on Data Engineering (ICDE), Sydney, March 1999.


Abstract

We develop a data integration approach for the efficient evaluation of queries over autonomous source databases. The approach is based on novel applications and extensions of constraint database techniques. We assume the existence of a global database schema. The contents of each data source are described by a set of constraint tuples over the global schema; each such tuple indicates possible contributions from the source. The "source description catalog" (SDC) of a global relation consists of its associated constraint tuples. Describing sources this way is advantageous because new sources can be added and existing ones modified flexibly. In our framework, to evaluate a conjunctive query over the global schema, a plan generator first identifies the relevant data sources by "evaluating" the query against the SDCs using constraint query evaluation techniques; it then formulates an evaluation plan consisting of specialized queries over different paths. The query associated with a path is evaluated by a sequence of partial evaluations at the data sources along the path, similar to sideways information passing in Datalog; the partially evaluated queries travel along their associated paths. Our SDC-based query planning is efficient because it avoids the NP-complete query rewriting process. Further optimization can be achieved using techniques such as emptiness tests.
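
As an illustration of the source-identification step described in the abstract, the following minimal Python sketch "evaluates" a selection query against an SDC. It is a simplification made for this example, not the paper's algorithm: constraint tuples are restricted to closed intervals over numeric attributes, and the names ConstraintTuple, specialize, relevant_sources, the sources S1 and S2, and the attributes year and price are all hypothetical. A source is kept only if the conjunction of one of its constraint tuples with the query constraints is satisfiable, which is a simple form of emptiness test.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


@dataclass
class ConstraintTuple:
    """One SDC entry: a source and interval bounds on global attributes.

    Assumption for this sketch: constraints are per-attribute intervals only.
    """
    source: str
    bounds: Dict[str, Interval]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the intersection of two closed intervals, or None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def specialize(ct: ConstraintTuple,
               query: Dict[str, Interval]) -> Optional[Dict[str, Interval]]:
    """Conjoin a constraint tuple with the query constraints.

    Returns the tightened bounds, or None when the conjunction is
    unsatisfiable (the source cannot contribute through this tuple).
    """
    merged = dict(ct.bounds)
    for attr, wanted in query.items():
        current = merged.get(attr, (-inf, inf))  # unconstrained if absent
        tightened = intersect(current, wanted)
        if tightened is None:
            return None
        merged[attr] = tightened
    return merged


def relevant_sources(sdc: List[ConstraintTuple],
                     query: Dict[str, Interval]
                     ) -> Dict[str, List[Dict[str, Interval]]]:
    """Map each relevant source to its specialized (tightened) constraints."""
    result: Dict[str, List[Dict[str, Interval]]] = {}
    for ct in sdc:
        tightened = specialize(ct, query)
        if tightened is not None:
            result.setdefault(ct.source, []).append(tightened)
    return result


# Hypothetical SDC for one global relation with attributes year and price.
sdc = [
    ConstraintTuple("S1", {"year": (1990, 1995)}),
    ConstraintTuple("S2", {"year": (1996, 1999), "price": (0, 500)}),
]
query = {"year": (1997, 1998), "price": (400, 600)}
plan = relevant_sources(sdc, query)
print(plan)
```

In this example S1 is pruned because its year range cannot overlap the query's, while S2 is retained together with a specialized constraint that could be sent to it as a narrower query. Adding a new source amounts to appending constraint tuples to the list, which reflects the flexibility the abstract attributes to SDC-based descriptions.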