Journal of Information Science and Engineering, Vol. 33 No. 2, pp. 499-515

The Normalized Distance Preserving Binary Codes and Distance Table

In the Euclidean space, the approximate nearest neighbors (ANN) search measures the similarity degree through computing the Euclidean distances, which owns high time complexity and large memory overhead. To address these problems, this paper maps the data from the Euclidean space into the Hamming space, and the normalized distance similarity restriction and the quantization error are required to satisfy. Firstly, the encoding centers and their binary labels are obtained through a lookup-based mechanism. Then, the candidate hashing functions are learnt under supervision of the binary labels, and the ones which satisfy the entropy criterion are selected to boost the distinctiveness of the learnt binary codes. During the training procedure, multiple groups of the hashing functions are generated based on different kinds of centers, which can weaken the inferior influence of the initial centers. The data with minimal average Hamming distances are returned as the nearest neighbors. In the Hamming space, different Euclidean distances may be substituted by one identical value, thus a distance table is predefined to distinguish the similarity degrees among the data pairs with the same Hamming distance. The final experimental results show that our algorithm is superior to many state-of-the-art methods.

Keywords:
approximate nearest neighbors search, binary codes, hashing algorithm, semi- supervised learning, entropy