JISE


  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16]


Journal of Information Science and Engineering, Vol. 33 No. 2, pp. 499-515


The Normalized Distance Preserving Binary Codes and Distance Table


HONGWEI ZHAO1,2, ZHEN WANG1, PINGPING LIU1,2 AND BIN WU1
1
School of Computer Science and Technology
2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education
Jilin University
ChangChun, 130012 P.R.
China
E-mail: {zhaohw; liupp}@jlu.edu.cn; {wangzhen14; wubin14}@mails.jlu.edu.cn


    In the Euclidean space, the approximate nearest neighbors (ANN) search measures the similarity degree through computing the Euclidean distances, which owns high time complexity and large memory overhead. To address these problems, this paper maps the data from the Euclidean space into the Hamming space, and the normalized distance similarity restriction and the quantization error are required to satisfy. Firstly, the encoding centers and their binary labels are obtained through a lookup-based mechanism. Then, the candidate hashing functions are learnt under supervision of the binary labels, and the ones which satisfy the entropy criterion are selected to boost the distinctiveness of the learnt binary codes. During the training procedure, multiple groups of the hashing functions are generated based on different kinds of centers, which can weaken the inferior influence of the initial centers. The data with minimal average Hamming distances are returned as the nearest neighbors. In the Hamming space, different Euclidean distances may be substituted by one identical value, thus a distance table is predefined to distinguish the similarity degrees among the data pairs with the same Hamming distance. The final experimental results show that our algorithm is superior to many state-of-the-art methods.


Keywords: approximate nearest neighbors search, binary codes, hashing algorithm, semi- supervised learning, entropy

  Retrieve PDF document (JISE_201702_13.pdf)