In this era of media-information explosion, demand is growing rapidly for searching and identifying desired media information not only by keywords but on the basis of audio, video, or image content itself. We consider media search a core technology for meeting such needs. The most basic function of media search is detecting and locating media fragments that are "similar" to a fragment of audio, video, or image given as a query, in huge amounts of unlabelled audio, video, or image archives. This is not a simple task; the technical challenges include achieving high robustness, because media signals are often converted to various encoding formats, mixed with other signals such as background music, or even edited and re-edited into different versions. Search speed is also crucial, considering the huge volumes of audio and video being created, distributed, and exchanged by individuals, public institutions, and corporations around the world. From this perspective, we have been developing new principles and methods for fast and robust media search, including searches based on appearance and structural similarities with respect to various properties.
Media search, along with keyword and voice search, will be a vital information-retrieval method in the future. For example, media search on portable devices (see figure) is just becoming practical. Moreover, we expect media search to be used in the media cloud, in which huge amounts of media data are stored, created, distributed, and consumed. Media search will be a core technology for exploring the links between one piece of media data and another, or between media data and various other kinds of information in the media cloud, on the basis of relationships between parts of the media content, such as partial similarity or partial quotation. We anticipate that media search will be essential for fully utilizing all the media information now exploding worldwide.
We developed RMS, a similarity-based search technology for audio and video. It achieves excellent robustness and accuracy together with very high search speed by using coarsely quantized features and verifying their spatiotemporal consistency. We have been improving its performance through a variety of field tests in the real world, including Internet monitoring and broadcast background-music search. Over the last decade, we have achieved much higher accuracy and robustness and increased the processing speed by a factor of more than 1,000.
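The general idea of combining coarsely quantized features with temporal-consistency checking can be sketched as follows. This is an illustrative toy, not the RMS implementation: the feature extraction, quantizer, and voting threshold are all assumptions, but the mechanism is the same in spirit. Archive frames are quantized to a few levels and stored in an inverted index; at query time, candidate positions that line up at a consistent temporal offset accumulate votes.

```python
# Illustrative sketch: similarity search with coarsely quantized features
# and a temporal-consistency (offset-voting) check. Not the RMS algorithm.
from collections import defaultdict

def quantize(frame, levels=4):
    """Coarsely quantize each feature dimension (values in [0, 1)) to a few levels."""
    return tuple(min(int(x * levels), levels - 1) for x in frame)

def build_index(archive):
    """Map each quantized frame to the (clip_id, time) positions where it occurs."""
    index = defaultdict(list)
    for clip_id, frames in archive.items():
        for t, frame in enumerate(frames):
            index[quantize(frame)].append((clip_id, t))
    return index

def search(index, query, min_votes=3):
    """Vote for (clip_id, offset) pairs; only temporally consistent matches
    accumulate enough votes at the same offset to pass the threshold."""
    votes = defaultdict(int)
    for qt, frame in enumerate(query):
        for clip_id, t in index.get(quantize(frame), []):
            votes[(clip_id, t - qt)] += 1
    return [(pos, v) for pos, v in votes.items() if v >= min_votes]
```

Because frames are quantized to only a few levels, the index lookup is cheap and tolerant of small signal distortions, while the offset-voting step rejects coincidental frame-level matches.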
Media Link Analysis is a technique for automatically annotating vast amounts of media data on the basis of partial similarity. For example, for simultaneously recorded multi-channel TV broadcasts, the analysis compares the latest video data with the stored data; if any segments match, a link is created. The number of links to a segment indicates how many times it has been used, so the analysis yields a usage count for each segment, and the counts can serve as an index of popularity or importance.
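The bookkeeping behind such link-based usage counting can be sketched as below. This is a hypothetical data structure for illustration only (the class and method names are our own, not part of Media Link Analysis): each detected segment match is recorded as a link, and a segment's link count serves directly as its usage count.

```python
# Hypothetical sketch of link bookkeeping for media link analysis.
# Each match between a newly recorded stream and a stored segment adds
# a link; link counts double as usage/popularity counts.
from collections import defaultdict

class LinkGraph:
    def __init__(self):
        # stored segment id -> list of (channel, timestamp) usages
        self.links = defaultdict(list)

    def add_match(self, stored_segment, channel, timestamp):
        """Record that `stored_segment` was reused on `channel` at `timestamp`."""
        self.links[stored_segment].append((channel, timestamp))

    def usage_count(self, stored_segment):
        """Number of links to a segment, i.e., how often it has been reused."""
        return len(self.links[stored_segment])

    def most_popular(self, n=1):
        """Segments ranked by usage count, most reused first."""
        return sorted(self.links, key=self.usage_count, reverse=True)[:n]
```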
Image matching techniques detect the region of a larger input image that matches a target image. Our proposed technique analyzes the input image, adaptively partitions it into cells, and determines the order in which the cells are compared so as to reduce the matching computation. It then efficiently prunes unnecessary comparisons while still guaranteeing detection of the optimal position. With these mechanisms, our experiments showed that the technique is up to 600 times faster than an existing standard method.
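The key property that makes such pruning safe can be shown with a minimal sketch, assuming a sum-of-absolute-differences (SAD) score: because each cell adds a non-negative amount, a candidate position can be abandoned as soon as its partial score exceeds the best score found so far, without ever discarding the optimal position. This toy uses single pixels as cells in a fixed scan order; the adaptive partitioning and comparison-ordering of the actual method are not reproduced here.

```python
# Sketch of exhaustive template matching with safe early-termination pruning.
# SAD is accumulated cell by cell; once a partial sum reaches the current
# best score, the position cannot win and is pruned. Since every cell's
# contribution is non-negative, the optimal position is still guaranteed.

def match_template(image, template):
    """Return ((best_y, best_x), best_score) minimizing SAD over all positions."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = float("inf"), None
    cells = [(j, i) for j in range(th) for i in range(tw)]  # comparison order
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            partial = 0
            for j, i in cells:
                partial += abs(image[y + j][x + i] - template[j][i])
                if partial >= best_score:
                    break  # prune: SAD only grows, so this position cannot win
            else:
                best_score, best_pos = partial, (y, x)
    return best_pos, best_score
```

A better comparison order (checking the cells most likely to differ first) makes the pruning fire earlier, which is where the bulk of the speedup comes from.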