To evaluate the performance of our instant mobile video search system, we use the video dataset from TRECVID 2011, which contains more than 19,200 videos totaling 600 hours. In previous work, however, query videos were all generated by programs, and such queries differ from real-world mobile video queries recorded by mobile users. We therefore asked 25 volunteers to record video clips from the source dataset to serve as queries. To better imitate the habits of different mobile users, we chose a diverse group of volunteers: eight females and 17 males. Their age distribution is shown in Figure 1 (a) and their career distribution in Figure 1 (b). Most of the volunteers have at least one year of experience using a smartphone. All of them had watched videos on mobile devices, and eight of them frequently do so. They recorded videos according to their own interests, using different mobile devices (iPhone 4S, HTC z710t, Samsung Galaxy S3, etc.).

In total, we collected 1,400 query videos with an average length of 30 seconds. Figure 1 (c) shows the category distribution of the recorded query videos, and Figure 1 (d) shows some keyframes of the queries. As these examples show, the recording process can introduce severe image distortions, such as blur, color shifts, reflections, and affine transformations. The recorded audio is also noisy or even silent.
Figure 1. A dataset of real-world mobile video queries.
We randomly invited 12 of the volunteers who had helped us record the query videos to use our proposed mobile video search system (LAVES). The subjects were three female and nine male company staff members and college students, aged 22 to 36. We invited only company staff and college students because they are the main users of mobile video search systems. Because the subjects had already taken part in the recording task, all of them learned to use the system well after only three minutes of orientation and demonstration. In the interviews, 10 subjects said they found the instant video search process very cool when they first saw it. Notably, when they learned that our system does not restrict users to recording either audio or video, all subjects found this very convenient and said they would try searching manually with audio only or video only. This suggests that users value being able to switch between single-modality and multi-modality search.
After learning the system well, the subjects were asked to use the application to accomplish the following two tasks.
Task 1. Each subject selected 5 to 10 videos and tried to search for them with LAVES. Because our system has not yet been released to the public, the subjects could not perform search tasks outside the lab. The query videos were therefore chosen from the TRECVID 2011 dataset described above. The subjects could search using audio, video, or both.
Task 2. In this task, we compared LAVES with two popular mobile video search applications, IntoNow and VideoSurf. After learning how to use these applications, each subject used all three freely for 20 minutes. Each subject then filled out a questionnaire evaluating usability, user friendliness, and user experience.
In Task 1, the subjects searched 120 videos in total, and only 16 of these searches failed to return a matching video among the top five results. All users also felt that the progressive search process reduced their query time and improved their search experience. According to our records, the average query time across all users was 8.5 seconds. For Task 2, Table 1 lists the user satisfaction scores for the three mobile video search applications, which show the advantage of LAVES over the other two. All users found our application attractive and easy to use, with a friendly user interface, and especially liked that it places no restrictions on using audio or video as query input. 91.67% of the subjects (11 of 12) were satisfied with the progressive search process; they found it natural and rated its effectiveness 4.3 out of 5.0. Most subjects responded positively when asked whether they would install the application and recommend it to their friends. The subjects also gave us useful comments, such as "add more fashionable videos to the dataset" and "add search results shared function".
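The Task 1 and Task 2 percentages follow from simple counts. The short Python sketch below reproduces them. It assumes that the 91.67% satisfaction figure corresponds to 11 of the 12 subjects (the only whole number of subjects that gives this rate); the variable and function names are illustrative only.

```python
# Counts taken from the user study description.
TOTAL_SEARCHES = 120      # videos searched by all subjects in Task 1
FAILED_SEARCHES = 16      # searches with no matching video in the top five results
NUM_SUBJECTS = 12         # subjects in the user study
SATISFIED_SUBJECTS = 11   # assumption: 91.67% of 12 subjects is 11 subjects


def percentage(part: int, whole: int) -> float:
    """Return part / whole as a percentage rounded to two decimal places."""
    return round(100.0 * part / whole, 2)


# Top-5 success rate: searches that did find a match, over all searches.
top5_success = percentage(TOTAL_SEARCHES - FAILED_SEARCHES, TOTAL_SEARCHES)
# Share of subjects satisfied with the progressive search process.
satisfaction = percentage(SATISFIED_SUBJECTS, NUM_SUBJECTS)

print(f"Top-5 success rate: {top5_success}%")   # Top-5 success rate: 86.67%
print(f"Satisfied subjects: {satisfaction}%")   # Satisfied subjects: 91.67%
```

In other words, 104 of the 120 Task 1 searches (about 86.67%) returned a matching video within the top five results.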
Table 1. A summary of the user study comparing three mobile video search applications: (a) IntoNow, (b) VideoSurf, and (c) LAVES (our work). Scores range from 1 (worst) to 5 (best).
| Criterion | (a) IntoNow | (b) VideoSurf | (c) LAVES |
|---|---|---|---|
| Easy to use | | | |
| Effectiveness of progressive search | | | |
To encourage research on mobile video search, we will provide the following dataset materials on request:
(1) The query video dataset and the ground-truth.
(2) The source videos, taken from IACC.1.tv10.training, IACC.1.A, and IACC.1.B in TRECVID 2011, which can be downloaded from .
To obtain these materials, please email your full name and affiliation to the contact person (firstname.lastname@example.org). We ask for this information only to make sure the dataset is used for non-commercial research purposes. We will not share it with any third party or publish it anywhere.
If you use the dataset, please cite the following paper:
Wu Liu, Tao Mei, Yongdong Zhang, Jintao Li, Shipeng Li. "Listen, look, and gotcha: instant video search with mobile phones by layered audio-video indexing," Proc. of ACM Multimedia 2013: 887-896.
If you have any questions about the MVQ dataset, feel free to contact:
Name: Wu Liu
We are continuously working to improve the dataset and would greatly appreciate any comments and suggestions.