Improving Distributed Query Processing by Hash-Semijoins
Judy C. R. Tseng and Arbee L. P. Chen+
Department of Computer Science, Chung-Hua Polytechnic Institute, 30 Tung Shiang, Hsinchu, Taiwan, R.O.C.
+Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, R.O.C.
The semijoin is an effective relational operator that is often applied in distributed query processing to reduce data transmission cost. A semijoin operation is said to be cost-effective if the benefit obtained exceeds the cost of executing it. In this paper, we propose a new relational operator, called a hash-semijoin, which greatly reduces this cost at the price of some of the benefit. This new operator is designed based on the concept of search filters. We formulate a sequence of theorems and devise an algorithm that transforms a semijoin program into a more cost-effective one by backward replacing certain traditional semijoins with hash-semijoins.
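The abstract does not spell out how a hash-semijoin is built, so the following is only a minimal sketch of the cost/benefit trade-off it describes, assuming the search filter is a single-hash m-bit array (one common form of hash filter; the paper's own construction may differ). In the traditional semijoin, S's site ships the distinct join-attribute values to R's site; in the sketched hash-semijoin it ships only m bits, which lowers transmission cost, while hash collisions let some non-matching tuples of R survive, which lowers the benefit. The names `semijoin`, `build_hash_filter`, `hash_semijoin`, `_bucket` and the parameter `m` are hypothetical.

```python
import zlib
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def _bucket(value: Hashable, m: int) -> int:
    # Assumed hash: CRC-32 of the value's repr, mapped to one of m bits.
    # Deterministic across runs, unlike Python's built-in str hash.
    return zlib.crc32(repr(value).encode("utf-8")) % m


def semijoin(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    # Traditional semijoin R semijoin S: S's site ships the set of distinct
    # join values; R's site keeps only tuples whose value is in that set.
    shipped = {row[attr] for row in s}
    return [row for row in r if row[attr] in shipped]


def build_hash_filter(s: List[Row], attr: str, m: int) -> List[bool]:
    # Hypothetical search filter: bit b is set if some join value of S hashes
    # to b. Its size is fixed at m bits, independent of the number of values.
    bits = [False] * m
    for row in s:
        bits[_bucket(row[attr], m)] = True
    return bits


def hash_semijoin(r: List[Row], s: List[Row], attr: str, m: int) -> List[Row]:
    # R's site keeps tuples whose join value hashes to a set bit. Every tuple
    # kept by the exact semijoin is kept here (no false negatives), but tuples
    # colliding with a set bit also survive (false positives), so R may be
    # reduced less than by the traditional semijoin.
    bits = build_hash_filter(s, attr, m)
    return [row for row in r if bits[_bucket(row[attr], m)]]


if __name__ == "__main__":
    r = [{"a": i, "payload": f"r{i}"} for i in range(100)]
    s = [{"a": i} for i in range(0, 100, 7)]
    exact = semijoin(r, s, "a")
    approx = hash_semijoin(r, s, "a", m=32)
    print("exact semijoin keeps", len(exact), "tuples")
    print("hash-semijoin keeps", len(approx), "tuples")
```

Shrinking `m` makes the shipped filter cheaper but raises the collision rate; in the extreme `m = 1` the filter ships a single bit and filters nothing out, which illustrates why a hash-semijoin is worth substituting only where the cost saved outweighs the benefit lost.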