Journal of Information Science and Engineering, Vol. 35 No. 2, pp. 471-484

An Efficient Search Algorithm for Fingerprint Databases

Department of Computer Engineering
Seoul National University of Science and Technology
Seoul, 139-743 Korea

In this paper, we present an efficient search algorithm for fingerprint databases that store songs or data with a similar structure. Each song is represented by a high-dimensional binary vector obtained with the audio fingerprinting technique. Audio fingerprinting extracts from a song a fingerprint, a compact content-based signature that summarizes the audio recording. A song can then be recognized by matching its extracted fingerprint against a database of known audio fingerprints. Given a binary fingerprint database of songs, we focus on the problem of searching it effectively and efficiently. The high dimensionality and binary nature of the data, however, make many modern search algorithms inapplicable. Search over high-dimensional fingerprints also suffers from the curse of dimensionality: as the dimension increases, search performance degrades exponentially. To tackle this problem, we propose a new search algorithm based on inverted indexing, the multiple sub-fingerprint match principle, the offset match principle, and an early termination strategy. We evaluate our technique on a database of 2,000 songs containing approximately 4,000,000 sub-fingerprints, and the experimental results show encouraging performance.

Keywords: fingerprint database, binary database, audio fingerprint, similarity search, audio identification
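The four ingredients named in the abstract can be illustrated together with a small sketch. It is not the paper's algorithm. It is a generic illustration that rests on several assumptions. Each song is taken to be a list of integer sub-fingerprints, and lookups are exact matches. The threshold `min_votes` is a hypothetical stand-in for the multiple sub-fingerprint match principle. The function names `build_inverted_index` and `search` are likewise hypothetical. The inverted index maps each sub-fingerprint to the positions where it occurs. Offset matching counts a hit only when it agrees with the alignment `song_pos - query_pos`. Early termination stops as soon as one alignment collects enough votes.

```python
from collections import Counter, defaultdict


def build_inverted_index(db):
    """Map each sub-fingerprint to the (song_id, position) pairs where it occurs.

    Assumption: db is a dict song_id -> list of integer sub-fingerprints.
    """
    index = defaultdict(list)
    for song_id, subs in db.items():
        for pos, sub in enumerate(subs):
            index[sub].append((song_id, pos))
    return index


def search(index, query, min_votes=3):
    """Return (song_id, offset) for the best-aligned song, or None if nothing matches.

    Hypothetical parameter min_votes: the number of query sub-fingerprints that
    must agree on the same song and offset before the search stops early.
    """
    votes = Counter()
    for qpos, sub in enumerate(query):
        for song_id, pos in index.get(sub, ()):
            # Offset match: hits only reinforce each other if they imply the
            # same alignment of the query inside the stored song.
            key = (song_id, pos - qpos)
            votes[key] += 1
            # Early termination: enough consistent sub-fingerprint matches.
            if votes[key] >= min_votes:
                return key
    if not votes:
        return None
    # No alignment reached the threshold: fall back to the most-voted one.
    return votes.most_common(1)[0][0]


db = {"a": [1, 2, 3, 4, 5, 6], "b": [7, 8, 3, 9, 10, 11]}
index = build_inverted_index(db)
print(search(index, [3, 4, 5]))  # ('a', 2): the query starts at position 2 of song "a"
```

Requiring votes to agree on one offset filters out songs that happen to share isolated sub-fingerprints with the query. In this example, sub-fingerprint `3` appears in both songs, and only song `"a"` keeps accumulating votes at the same alignment. A real system would also have to tolerate bit errors in the binary sub-fingerprints. This sketch ignores that.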

