Image Graph Construction and Semantic Annotation for Large-Scale Social Multimedia

Date Issued
2014
Author(s)
Hsieh, Liang-Chi
URI
http://ntur.lib.ntu.edu.tw//handle/246246/263469
Abstract
In recent years, mobile devices equipped with cameras have become prevalent on consumer markets. Together with the emerging trend of multimedia sharing on social networks, these devices have made the scale of multimedia data grow explosively. Such raw multimedia data are usually stored without being well organized, which poses significant challenges to retrieving and using this content. For large-scale multimedia content, we can explore and leverage hidden relations and semantic meanings to build useful multimedia applications. In this dissertation, we focus on two problems in dealing with large-scale multimedia: data volume and semantics. First, for the data volume problem, to improve the navigation and search experience over large-scale image data, we investigate efficient methods to construct image graphs that represent visual and semantic relations between images, and we leverage the constructed graphs to build an efficient and scalable group-based image search system. Binary codes are a very compact representation for storing and searching image data, but efficiently indexing and searching very large-scale image collections encoded as longer binary codes remains a challenging problem. We propose a new search framework for very large-scale binary image codes that leverages GPU devices to achieve better performance and storage efficiency than previous works. For the second problem, multimedia semantics, we propose several methods to extract semantics from multimedia content shared on social networks.

Both visual and semantic relations exist between images, and these relations can be explored to help users better navigate and use image collections. However, current image search systems generally display their results as multi-page image lists. Such a list causes no significant harm when the user's search target is obvious, but for more ambiguous queries it is usually difficult for users to find their targets in a long image list. Paged image lists also cause browsing problems on mobile devices, whose display screens are limited in size. We therefore propose a group-based image search system that summarizes image search results into semantic and visual groups. We leverage the visual and semantic relations of images to construct image graphs at an offline stage, which makes the system efficient at responding to online user queries. To scale up to large image collections, we use the MapReduce parallel computing model to address the scalability issue; compared with constructing graphs on a single machine, our graph construction method is 69 times faster.

To address the data volume problem in processing very large-scale image data, binary codes have recently been recognized as an enabling and promising technique for encoding and searching images. Their compact representation provides better storage efficiency for huge image collections, and compared with other image representations, pairwise similarity computation on binary codes is much faster. For example, comparing a query against millions of binary codes can be done in less than one second with a very simple linear-scan baseline. These advantages make binary codes an important component of applications on very large-scale image data.
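To make the linear-scan baseline mentioned above concrete, below is a minimal sketch in Python/NumPy of exact Hamming-distance search over packed binary codes. It is not code from the dissertation; the 64-bit code length, dataset size, and all variable names are illustrative assumptions.

import numpy as np

# Illustrative assumption: 64-bit codes stored as 8 packed bytes per image.
CODE_BYTES = 8
rng = np.random.default_rng(0)
database = rng.integers(0, 256, size=(1_000_000, CODE_BYTES), dtype=np.uint8)
query = rng.integers(0, 256, size=CODE_BYTES, dtype=np.uint8)

# Popcount lookup table for a single byte (0..255).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def linear_scan(query, database, k=10):
    """Exact k-nearest-neighbor search by brute-force Hamming distance."""
    xor = np.bitwise_xor(database, query)      # differing bits, byte by byte
    dist = POPCOUNT[xor].sum(axis=1)           # Hamming distance per code
    nearest = np.argpartition(dist, k)[:k]     # unordered k smallest
    return nearest[np.argsort(dist[nearest])]  # sorted by distance

print(linear_scan(query, database))

Scanning a million short codes this way finishes well within a second, which is the point of the baseline; the GPU framework proposed in the dissertation targets the much harder billion-scale, longer-code setting.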
However, when very large-scale image data (at least 1 billion images) must be encoded as longer binary codes (more than 32 bits), efficiently storing and searching these codes is still a challenging problem. We propose a new framework for storing and searching very large-scale binary codes that leverages GPU devices. Compared with the multiple-hashing index method proposed in previous work, our random-sampling index approaches are simpler and more storage efficient, and they support both exact and approximate nearest neighbor search on binary codes. By leveraging the parallel computation of GPUs, we also achieve faster search times than previous works. To further improve the storage efficiency of our index, we propose a compression scheme for binary codes called bit compression; with a GPU-based decompression method, the compressed index does not sacrifice much search performance.

Large-scale image data that are not properly annotated hinder image browsing and search applications, which motivates the development of effective automatic image annotation methods. Given an image without textual information, an automatic annotation method selects the best textual annotations for it. Prior works in this area mostly focus on supervised learning approaches, which are impractical due to poor performance, the out-of-vocabulary problem, and the time required to acquire training data and train models. We therefore argue that automatic image annotation by search over user-contributed photo sites (e.g., Flickr) is an alternative solution: the most suitable annotations for an unlabeled image are selected from the tags associated with visually similar user-contributed photos. However, these tags are generally few and noisy. To address this, we propose a tag expansion method that exploits the visual and semantic consistency between tags and images. We show that the proposed method significantly outperforms prior works and provides more diverse annotations.

Microblogging, as a new form of communication on the Internet, has recently attracted attention from researchers. Relying on its real-time and conversational properties, users update their statuses and share experiences within their social networks. These characteristics also make microblogging an important tool for sharing and discussing real-world events such as earthquakes or sports games. We propose a novel and flexible solution to detect and recognize real-time events in sports games by analyzing the messages posted on microblogging services. We take Twitter as the experimental platform and collect a large-scale dataset of Twitter messages (tweets) for 18 prominent sports games covering four types of sports in 2011, along with the corresponding sports videos. The proposed solution applies moving-threshold burst detection to the volume of tweets to detect highlights in sports games, and a tf-idf-based weighting method to the tweets within detected highlights for semantic extraction. In experiments on the tweet and video datasets, the proposed methods achieve competent performance in sports event detection and recognition, and can find non-pre-defined tidbits that are difficult to detect with previous works.
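As an illustration of the highlight-detection step described above, here is a minimal sketch, in Python, of moving-threshold burst detection on a per-minute tweet count series. It is a simplified stand-in, not the dissertation's implementation; the window size, threshold multiplier, and toy data are assumptions.

import numpy as np

def detect_bursts(counts, window=10, factor=2.0):
    """Flag minutes whose tweet volume exceeds a moving threshold.

    counts: per-minute tweet counts (1-D sequence).
    window: number of preceding minutes used to estimate the baseline.
    factor: multiplier on the moving average that defines the threshold.
    """
    counts = np.asarray(counts, dtype=float)
    bursts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t].mean()
        threshold = factor * max(baseline, 1.0)   # avoid a zero threshold
        if counts[t] > threshold:
            bursts.append(t)                      # candidate highlight minute
    return bursts

# Toy stream: quiet chatter with a spike around minute 15.
volume = [5, 6, 4, 5, 7, 6, 5, 4, 6, 5, 5, 6, 5, 7, 6, 40, 35, 8, 6, 5]
print(detect_bursts(volume))   # -> [15, 16]

In the full pipeline summarized in the abstract, the tweets falling inside each detected burst would then be pooled and weighted with tf-idf to extract the terms describing the highlight.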
Not all images are interesting to people: people are drawn to interesting images and ignore tasteless ones. Image interestingness is no less important than other subjective image properties that have received significant research interest, yet it has not been systematically studied before. In this work, we focus on the visual and social aspects of image interestingness. We rely on crowdsourcing tools to survey human perceptions of these subjective properties and verify the data by analyzing consistency and reliability. We show that people agree when deciding whether an image is interesting or not. We also examine the correlations between the social and visual aspects of interestingness and aesthetics, and find that: (1) the weak correlation between social interestingness and both visual interestingness and image aesthetics indicates that images frequently re-shared by people are not necessarily aesthetic or visually interesting; (2) the high correlation between image aesthetics and visual interestingness implies that aesthetic images are more likely to be visually interesting to people. We then ask what features of an image lead to social interestingness, e.g., receiving more likes and shares on social networking sites. We train classifiers to predict visual and social interestingness and investigate the contributions of different image features, finding that social and visual interestingness are best predicted with color and texture features, respectively; this provides a way to influence the social and visual appeal of images through image features. Further, we investigate the correlation between social/visual image interestingness and image color, and find that colors with an arousal effect appear more frequently in images with higher social interestingness. This can be explained by previous studies on the activation-related affect of colors and provides useful advice for advertising on social networking sites.
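As a rough illustration of predicting interestingness from low-level image features, the following sketch trains a logistic-regression classifier on simple per-channel color histograms. It is an assumed setup, not the dissertation's pipeline: the feature extractor, classifier choice, and the random placeholder images and labels are all illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def color_histogram(image, bins=8):
    """Concatenated per-channel color histogram of an RGB image (H, W, 3)."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 255), density=True)[0]
             for c in range(3)]
    return np.concatenate(feats)

# Placeholder data: random "images" and random interestingness labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=200)

X = np.stack([color_histogram(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

With real crowdsourced labels, swapping in texture descriptors for the visual-interestingness classifier would mirror the color-versus-texture finding reported above.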
Subjects
Social multimedia
Image graph
Semantic annotation
Distributed computing
Type
thesis
File(s)
Name: ntu-103-D97944011-1.pdf
Size: 23.32 KB
Format: Adobe PDF
Checksum (MD5): 45a56cc1fd0a1d1defbbc0856af89e7f
