Anchor graph hashing is a widely used hashing approach for nearest-neighbor search for high-dimensional data. We developed an efficient algorithm for anchor graph hashing that guarantees the same search results as the original algorithm of anchor graph hashing. The original algorithm incurs a high computational cost to compute hashes, but our algorithm is highly efficiency in obtaining hashes for large-scale data.
The original algorithm computes an m×m dense matrix representing the relationships between anchors to compute hashes. It computes the hashes from eigenvectors of the dense matrix. However, since it incurs high computational cost to compute eigendecomposition for the dense matrix, it requires large computational time to obtain the hashes. Our algorithm computes the eigenvalues of a sparse tridiagonal matrix that has the same eigenvalues as the dense matrix, and it computes the corresponding eigenvectors to efficiently obtain the hashes.
Our approach can reduce the computation time to obtain hashes up to 300 times faster than the original algorithm. Our algorithm enables us to compute hashes within one hour for large-scale datasets with tens of millions of data points and enable real-time nearest-neighbor search.
Our algorithm enables real-time nearest-neighbor search without sacrificing the search result. Since anchor graph hashing is used in many applications, such as drug discovery and cancer detection, our algorithm can be a breakthrough in improving the scalability of applications.
Recognition Research Group, Media Information Laboratory, NTT Communication Science Laboratories