藉由條件隨機域與排名保留特性提升雜湊學習之效果

指導教授：徐宏民臺灣大學：資訊工程學研究所吳君哲Wu, Chun-CheChun-CheWu2014-11-262018-07-052014-11-262018-07-052013http://ntur.lib.ntu.edu.tw//handle/246246/261414為了在高維度、高資料量的資料中找到相似的資料，將每一筆資料以一串二元訊號表示是近幾年相當流行的一種作法。如此一來，我們可以大大地壓縮需要表示每筆資料的儲存空間，更重要的是，當我們在搜尋相近的資料時，我們只要計算資料之間的漢名距離(Hamming Distance)，由於漢名距離是相當有效率的運算，因此我們可以大幅地縮短運算時間。為了使二元訊號的表示法更簡潔與更具辨別性，如何有效地利用從資料中所能得到的資訊(例如：資料之間的幾何關係、每對資料之間的類別等資訊) 去學習較好的表示法就變得相當重要。然而，二元表示法的學習是一個NP-Hard 的問題。許多人將二元的條件放鬆，如此一來，問題可簡化成一個eigen-problem(可以有效率的最佳化)。但是，條件的放鬆會引入其他的誤差，會使得我們所學得的雜湊函式偏差。因此，在這篇論文中，我們將此問題看成一個有在前資訊的二元標記問題，我們提出了一個有效的二元表示學習法來抑制先前相關文獻的偏差問題。我們利用條件隨機域作為我們的數學模型，來描述我們的問題。更者，我們也提出了比先前文獻更有效地促進技巧來提升我們演算法的表現，使我們所習得的二元表示法更加地簡直與具有分辨性。我們的實驗結果證實了我們演算法的成功，它的表現超越了先前的文獻Transforming data into binary codes for Approximate Nearest Neighbor (ANN) search has caught lots of attention in recent years. Two major advantages of binary codes are dramatically reducing the search time and storage. To make the codes more discriminative and more compact, it is crucial to leverage the (partial) pair-wise label1/similarity information if provided. However, binary code learning is an integer programming problem, which is NP-hard. So, one promising branch of solutions is to relax the problem into real value domain as an eigen-problem. Nevertheless, the relaxation will introduce additional errors which will bias the learning results due to the quadratic objective function. Hence, we treat the hash generation process in a novel aspect, i.e., (binary) labeling problem with prior knowledge. That is, we propose a binary code enhancement method to suppress the bias effect resulting from eigen-based solutions, and model the problem into Conditional Random Field (CRF). Moreover, we also adopt the boosting scheme to preserve the distance (rank) in Hamming space. Our experimental results show that our framework has significant improvement compared to the prior works.口試委員會審定書i 誌謝ii 摘要iii Abstract iv 1 Introduction 1 2 Related Works and Backgrounds 5 2.0.1 Spectral Hashing (SpH) . . . . . . . . . . . . . . . . . . . . . . 6 2.0.2 Semi-Supervised Hashing (SSH) . . . . . . . . . . . . . . . . . . 6 2.0.3 Sequential Projection Learning for Hashing (SPLH) . . . . . . . 7 3 Motivations 8 3.1 Biased Projection Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . 8 3.2 Boosting Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2.1 Philosophy of Boosting . . . . . . . . . . . . . . . . . . . . . . . 9 3.2.2 Ordinal Loss Boosting Strategy . . . . . . . . . . . . . . . . . . 10 4 Sequential CRF Enhanced Binary Codes with Rank-Preserving Boosting 11 4.1 Initialization of Projection Hyperplanes wk . . . . . . . . . . . . . . . . 12 4.2 CRF Enhanced Binary Codes (CEBC) . . . . . . . . . . . . . . . . . . . 13 4.3 Projection Hyperplanes wk . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.4 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.5 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4.6 Rank-Preserving Boosting (RP-boosting) . . . . . . . . . . . . . . . . . 16 4.7 Extension to Semi-Supervised . . . . . . . . . . . . . . . . . . . . . . . 17 5 Experiments 18 5.1 Datasets and Abbreviation Explanations . . . . . . . . . . . . . . . . . . 18 5.2 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5.2.1 CRF Enhanced Binary Codes . . . . . . . . . . . . . . . . . . . 20 5.2.2 Rank-Preserving Boosting . . . . . . . . . . . . . . . . . . . . . 21 5.2.3 The Effect of Sequential Learning . . . . . . . . . . . . . . . . . 22 5.2.4 Semi-Supervised Learning . . . . . . . . . . . . . . . . . . . . . 22 5.2.5 Optimization Methods . . . . . . . . . . . . . . . . . . . . . . . 23 5.2.6 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . 23 6 Conclusion 25 Bibliography 261322200 bytesapplication/pdf論文公開時間：2014/03/08論文使用權限：同意有償授權(權利金給回饋本人)搜尋與檢索雜湊學習監督學習藉由條件隨機域與排名保留特性提升雜湊學習之效果Hash Learning with Conditional Random Field and Rank Preserving Boostingthesishttp://ntur.lib.ntu.edu.tw/bitstream/246246/261414/1/ntu-102-R00922045-1.pdf