Dr. Juan Cao (曹娟)

enjoy my work, enjoy my life

MultiMedia Computing Group
Institute of Computing Technology
Chinese Academy of Sciences

TEL: +86-010-62600890   Email:caojuan@ict.ac.cn

No.6 South Kexueyuan Road, Haidian District, Beijing 100190, China

       I am an associate professor in the Multimedia Computing Group (MCG), Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). The group is led by Prof. YongDong Zhang. I received my Ph.D. from ICT, supervised by Prof. JinTao Li, and my B.E. and M.S. degrees in computing science from Xiang Tan University, supervised by Prof. JingYe Zhou.
        I was a Senior Research Associate in the Video Retrieval Group (VIREO), City University of Hong Kong, from May 2009 to August 2009, under the supervision of Dr. Chong-Wah Ngo, and a visiting scholar in the Digital Video/Multimedia Lab (DVMM), Columbia University, from 2010 to 2011, under the supervision of Professor Shih-Fu Chang.

Research Interests

        My current research interests include social network analysis, social multimedia mining, and their applications in real systems, such as the Real-time UGC News Verification System developed for XinHua News Agency and a large-scale XinHua KnowledgeBase that integrates professional articles with social network information.

Systems and Databases

        I have participated in many large projects. As the person in charge, I have successfully designed and developed the following systems together with my excellent team members.

  • Real-time UGC News Verification System

        Online social media such as microblogs have become one of the most important channels for news today. However, they are also filled with rumours and fake news. Without verification, such information can spread rapidly through social networks and cause serious consequences. We developed this real-time system to verify the large volume of user-generated content (UGC) on social media. The system evaluates a UGC news item from several aspects: the news source, the content, the propagation pattern, and the key participants. It not only gives a comprehensive judgement across all aspects, but also offers insight into each aspect to justify the conclusion. Because it runs in real time, the system can discover breaking news and evaluate it promptly. Through a user-friendly interface, users can conveniently search events, browse hot or typical cases, and inspect the evaluation details of each news item.
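        The aspect-by-aspect evaluation described above can be illustrated with a minimal sketch. It is not the system's actual model: the four aspect names follow the description, but the equal weights, the 0.5 threshold, the example scores, and the function name `evaluate_news` are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

# Aspect names follow the description; the equal weights are hypothetical.
ASPECT_WEIGHTS: Dict[str, float] = {
    "source": 0.25,
    "content": 0.25,
    "propagation": 0.25,
    "participants": 0.25,
}


def evaluate_news(aspect_scores: Dict[str, float],
                  threshold: float = 0.5) -> Tuple[float, str, Dict[str, str]]:
    """Combine per-aspect credibility scores (0 = fake, 1 = credible).

    Returns the weighted overall score, a verdict, and a per-aspect
    explanation so that the conclusion can be justified aspect by aspect.
    """
    missing = set(ASPECT_WEIGHTS) - set(aspect_scores)
    if missing:
        raise ValueError(f"missing aspect scores: {sorted(missing)}")

    overall = sum(ASPECT_WEIGHTS[a] * aspect_scores[a] for a in ASPECT_WEIGHTS)
    verdict = "credible" if overall >= threshold else "suspicious"
    insights = {
        a: ("supports credibility" if aspect_scores[a] >= threshold
            else "raises doubt")
        for a in ASPECT_WEIGHTS
    }
    return overall, verdict, insights


if __name__ == "__main__":
    # Made-up scores for a single news item.
    print(evaluate_news(
        {"source": 0.25, "content": 0.5, "propagation": 0.25, "participants": 0.5}
    ))
```

        A weighted sum is only one possible way to merge the aspects. It is used here because it keeps each aspect's contribution visible next to the overall verdict.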

                 Figure 1 Front page

            Figure 2 Evaluation result for a news item

  • Interactive video retrieval system: VideoMap

        Here we introduce the highlights of our interactive video retrieval system, VideoMap. To improve efficiency, the system has a map-based display interface, which gives the user a global view of the similarity relationships across the whole video collection and supports active annotation to quickly locate potential positive samples. The map also supports feedback in multiple modalities, including visual shots, high-level concepts, and keywords. The system improves retrieval performance by automatically optimizing these feedback strategies.

                 Figure 1 VideoMap's retrieval interface

            Figure 2 video map for active recommendation

    Demos of the VideoMap system can be found at http://mcg.ict.ac.cn/videomap.htm

  • Large-Scale Web Video database: MCG-WEBV

        The goal of MCG-WEBV is to build a general database for all kinds of web video research. First, it collects the most viewed videos of every month on YouTube, which are the most valuable for mining because of their high quality and popular content. The database is then expanded with related videos and videos uploaded by the same authors, in order to preserve the original social network information on YouTube. Second, the database provides comprehensive features for video analysis and processing, including the raw videos, keyframes, metadata features, web statistic features, and low-level textual, visual, and audio features. Finally, the database includes human-annotated ground truth for 73 hot web topics, and labels for 15 video categories from the YouTube website.
             Version 1.0 of MCG-WEBV, released in 2009, consists of 80,031 videos from Dec. 2008 to Feb. 2009, with 3,283 core videos and 76,748 expanded videos. Version 2.0 contains 248,887 videos from Dec. 2008 to Nov. 2009, with 14,473 core videos and 234,414 expanded videos.
             So far, a good deal of web video research has been carried out on this database, such as topic discovery and tracking and web video categorization. We believe that MCG-WEBV will provide a significant foundation for web video research.
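             The release sizes quoted above can be sanity-checked with a few lines of Python. The counts are taken directly from the text. The dictionary layout and the helper names `is_consistent` and `core_share` are illustrative assumptions and are not part of the database distribution.

```python
# Video counts quoted in the text for each MCG-WEBV release.
VERSIONS = {
    "1.0": {"total": 80_031, "core": 3_283, "expanded": 76_748},
    "2.0": {"total": 248_887, "core": 14_473, "expanded": 234_414},
}


def is_consistent(counts: dict) -> bool:
    """True when core and expanded videos add up to the reported total."""
    return counts["core"] + counts["expanded"] == counts["total"]


def core_share(counts: dict) -> float:
    """Fraction of the release made up of core (most viewed) videos."""
    return counts["core"] / counts["total"]


if __name__ == "__main__":
    for version, counts in VERSIONS.items():
        print(version, is_consistent(counts), f"{core_share(counts):.1%}")
```

             Both releases are internally consistent, and core videos make up roughly 4% of version 1.0 and roughly 6% of version 2.0. The rest of each release comes from the expansion to related and same-author videos.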

    The database can be downloaded from http://mcg.ict.ac.cn/mcg-webv.htm

  • Video Retrieval System in International Evaluation

        Figure 3. Flowchart of the MCG-ICT-CAS automatic search system
            The MCG-ICT-CAS automatic search system for TRECVID 2008 is shown in Fig. 3. In the concept-based module, we propose a novel distribution-based concept selection (DBCS) approach, which achieves stable, good performance across all topics (0.053). In the visual-based module, we focus on low-dimensional semantic features obtained with the Latent Dirichlet Allocation model and get an infAP of 0.033. Finally, a re-ranking technique based on motion and faces and a multi-run, multi-example fusion approach (SSC) were applied to aggregate the basic search results, which produced a significant improvement.
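            To give a feel for how results from several search runs can be aggregated, here is a minimal sketch of generic score-level fusion (CombSUM with min-max normalization). It is not the SSC approach mentioned above, whose details are not given here. The shot identifiers, the scores, and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List


def min_max_normalize(run: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {shot: 1.0 for shot in run}
    return {shot: (score - lo) / (hi - lo) for shot, score in run.items()}


def fuse_runs(runs: List[Dict[str, float]]) -> List[str]:
    """CombSUM fusion: sum normalized scores per shot, rank by the sum."""
    fused: Dict[str, float] = defaultdict(float)
    for run in runs:
        if not run:
            continue  # an empty run contributes nothing
        for shot, score in min_max_normalize(run).items():
            fused[shot] += score
    # Sort by descending fused score; break ties by shot id for determinism.
    return sorted(fused, key=lambda shot: (-fused[shot], shot))


if __name__ == "__main__":
    # Hypothetical outputs of a concept-based and a visual-based module.
    concept_run = {"shot_a": 0.9, "shot_b": 0.4, "shot_c": 0.1}
    visual_run = {"shot_b": 8.0, "shot_c": 6.0, "shot_d": 2.0}
    print(fuse_runs([concept_run, visual_run]))
```

            Normalizing each run before summing keeps a module with a larger score range from dominating the fused ranking. A shot returned by both modules, such as `shot_b` in this made-up example, can therefore rise above a shot returned by only one.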

    My group and I participated in the search task of the international video retrieval evaluation (TRECVID) from 2007 to 2010, with encouraging results. Our automatic video retrieval system won second and first prize in 2007 and 2008, respectively. Our interactive video retrieval system won second prize in 2009.

    Figure 4. MCG-ICT-CAS automatic search results in TRECVID 2008

    Figure 5. MCG-ICT-CAS automatic search results in TRECVID 2007

  • Go2View web video retrieval engine

        This is a prototype system for web video indexing and retrieval. It includes a collector for web video content, high-dimensional indexing technology, and multi-modality retrieval that supports both visual examples and textual queries.
            My main work in this system is text-based indexing and retrieval. In addition, I developed a semantic retrieval module, which automatically analyzes queries with Latent Semantic Indexing (LSI) and finds semantically relevant results.

Publications



    1. Juan Cao, YongDong Zhang, RongRong Ji, Fei Xie, Yu Su, Web Video Topic Discovery and Structuralization with Social Network, Neurocomputing, 2014, in press.
    2. Juan Cao, YongDong Zhang, RongRong Ji, Xin Li, On Application-Unbiased Benchmarking of Web Videos from a Social Network Perspective, Multimedia Tools and Applications, 2014, in press.
    3. Zhiwei Jin, Juan Cao, Yu-Gang Jiang, YongDong Zhang, News Credibility Evaluation on Microblog with a Hierarchical Propagation Model, IEEE International Conference on Data Mining, 2014, ShenZhen, China.


    1. Xingyu Gao, Juan Cao, Qin He, Jintao Li, A novel method for geographical social event detection in social media, ICIMCS, Huangshan, China, 2013.
    2. Xingyu Gao, Juan Cao, Zhiwei Jin, Xin Li, Jintao Li, GeSoDeck: A Geo-Social Event Detection and Tracking System, ACM Multimedia, 2013.
    3. YiCheng Song, YongDong Zhang, Juan Cao, Xingyu Gao, Jintao Li, A Unified Geolocation Framework for Web Videos, ACM Transactions on Intelligent Systems and Technology(TIST), 2013.


    1. Yingcheng Song, YongDong Zhang, Juan Cao, Tian Xia, Wu Liu, JinTao Li, Web Video Geolocation by Geotagged Social Resources, IEEE Transactions on Multimedia (TMM), 14(2):456-470, 2012.
    2. Zhineng Chen, Chong-Wah Ngo, Juan Cao, and Wei Zhang, Community as a Connector: Associating Faces with Celebrity Names in Web Videos, ACM Multimedia(ACM MM), Nara, Japan, October 2012.


    1. Juan Cao, Chong-Wah Ngo, YongDong Zhang, JinTao Li, Tracking Web Video Topics: Discovery, Visualization and Monitoring, IEEE Transactions on Circuits and Systems for Video Technology(CSVT), 2011, in press.
    2. Lin Pang, Juan Cao, YongDong Zhang, Shouxun Lin, Leveraging Collective Wisdom for Web Video Retrieval through Heterogeneous Community Discovery, ACM Multimedia, 2011.
    3. Bailan Feng, Lei Bao, Juan Cao, Yongdong Zhang, Shouxun Lin. An Effective Video Retrieval Method Based on Multi-mode Concept Relation Graph. Chinese Journal of Computer-aided Design & Computer Graphics, 2011.
    4. Zhineng Chen, Juan Cao, Tian Xia, Yicheng Song, Yongdong Zhang, Jintao Li, Web Video Retagging, Multimedia Tools and Applications, 2011.
    5. Lin Pang, Juan Cao, Lei Bao, Yongdong Zhang and Shouxun Lin. Towards hierarchical context: unfolding visual community potential for interactive video retrieval, Multimedia Tools and Applications, 2011.


    1. Juan Cao, Chong-Wah Ngo, YongDong Zhang, Liang Ma, Trajectory-based Visualization of Web Video Topics, ACM International Conference on Multimedia (ACM MM), Florence, Italy, 2010 (accepted).
    2. Juan Cao, Yong-Dong Zhang, Lin Pang, Bai-Lan Feng, Jin-Tao Li, Known-Item Search by MCG-ICT-CAS, TREC Video Retrieval Evaluation Online Proceedings (TRECVID), 2010.
    3. Lei Bao, Juan Cao, YongDong Zhang, MingYu Chen, JinTao Li, Alexander Hauptmann, Explicit and Implicit Concept-based Video Retrieval with Bipartite Graph Propagation Model, ACM International Conference on Multimedia (ACM MM), 2010 (accepted).
    4. Yingcheng Song, Juan Cao, ZhiNeng Chen, YongDong Zhang, JinTao Li, Tag Transformer, ACM International Conference on Multimedia (ACM MM), Florence, Italy, 2010 (accepted).
    5. Zhineng Chen, Juan Cao, Yicheng Song, YongDong Zhang, JinTao Li, Web Video Categorization based on Wikipedia Categories and Content-Duplicated Open Resources, ACM International Conference on Multimedia (ACM MM), Florence, Italy, 2010 (accepted).
    6. Bailan Feng, Juan Cao, Lei Bao, Yongdong Zhang, Shouxun Lin, Xiuguo Bao, Xiaochun Yun. Graph-Based Multi-Space Semantic Correlation Diffusion for Video Retrieval. The Visual Computer, International Journal of Computer Graphics (SCI, IF: 1.061), Accepted.
    7. Zhineng Chen, Juan Cao, Yicheng Song, Junbo Guo, Yongdong Zhang, Jintao Li: Context-oriented web video tag recommendation. WWW 2010: 1079-1080
    8. Bailan Feng, Juan Cao, Zhineng Chen, Yongdong Zhang, Shouxun Lin, Multi-Modal Query Expansion for Web Video Search, ACM SIGIR 2010, Geneva, Switzerland, 2010.
    9. Lin Pang, Juan Cao, Junbo Guo, Shouxun Lin, Yan Song: Bag of Spatio-temporal Synonym Sets for Human Action Recognition. MMM 2010: 422-432

    1. Juan Cao, HongFang Jing, Chong-Wah Ngo, YongDong Zhang, Distribution-based Concept Selection for Concept-based Video Retrieval, ACM International Conference on Multimedia (ACM MM), Beijing, China, Oct. 2009.
    2. Juan Cao, Tian Xia, Jintao Li, YongDong Zhang, Sheng Tang, A density-based method for adaptive LDA model selection, Neurocomputing, 72(7-9): 1775-1781 (2009)
    3. Juan Cao, YongDong Zhang, JunBo Guo, Lei Bao, JinTao Li, VideoMap: An Interactive Video Retrieval System of MCG-ICT-CAS. ACM International Conference on Image and Video Retrieval (CIVR), Santorini, 2009.
    4. Juan Cao, YongDong Zhang, YiCheng Song, ZhiNeng Chen, Xu Zhang, and JinTao Li, MCG-WEBV: A Benchmark Dataset for Web Video Analysis, Technical Report, ICT-MCG-09-001, Institute of Computing Technology, May. 2009
    5. Tian Xia, Juan Cao, YongDong Zhang, and JinTao Li. On defining affinity graph for spectral clustering through ranking on manifolds. In Neurocomputing, 72 (2009) 3203-3211.
    6. Lei Bao, Juan Cao, Tian Xia, YongDong Zhang, JinTao Li, Locally Non-negative Linear Structure Learning for Interactive Image Retrieval, ACM International Conference on Multimedia (ACM MM), Beijing, China, Oct. 2009.
    7. Bailan Feng, Juan Cao, Shouxun Lin, Yongdong Zhang, Kun Tao, Motion region-based trajectory analysis and re-ranking for video retrieval, IEEE International Conference on Multimedia and Expo (ICME), 2009
    8. Xu Zhang, Yicheng Song, Juan Cao, Yongdong Zhang, Jintao Li, "Large Scale Incremental Web Video Categorization", In Proceeding of the 1st ACM MM2009 Workshop on Web-Scale Multimedia Corpus (WSMC), Beijing, 2009.
    9. Yicheng Song, Yongdong Zhang, Xu Zhang, Juan Cao, Jintao Li. Google Challenge: Incremental-Learning for Web Video Categorization on Robust Semantic Feature Space. In Proceedings of the 17th International ACM Conference on Multimedia (MM2009), Beijing, China, November 2009.


    1. Juan Cao, Jintao Li, Yongdong Zhang, The optimal condition of LDA model for video retrieval, Chinese Journal of Computers, Vol. 31, no. 10, pp. 1780-1787, 2008.
    2. Juan Cao, Yongdong Zhang, Bailan Feng, Xiufeng Hua, Lei Bao, and Xu Zhang, MCG-ICT-CAS TRECVID2008 search task report, TREC Video Retrieval Evaluation Online Proceedings (TRECVID), 2008


    1. Juan Cao, Jintao Li, Yongdong Zhang, Sheng Tang, LDA-Based Retrieval Framework for Semantic News Video Retrieval. IEEE International Conference on Semantic Computing (ICSC), 155-160, 2007
    2. Juan Cao, Sheng Tang, Jintao Li, Yongdong Zhang, Xuefeng Pan, A Lexicon-Guided LSI Method for Semantic News Video Retrieval, PCM2007, pp. 187-195, 2007.
    3. Sheng Tang, YongDong Zhang, JinTao Li, Juan Cao, Huanbo Luan, Qiaoyan He, Xu Zhang. TRECVID 2007 Search Tasks by NUS-ICT, TREC Video Retrieval Evaluation Online Proceedings (TRECVID), 2007
    4. Xuefeng Pan, Jintao Li, Yongdong Zhang, Sheng Tang, Juan Cao, Retrieval Method for Same Video Content in Different Format based on Spatiotemporal Features, 29th European Conference on Information Retrieval (ECIR), Rome, Italy, 2-5 April 2007.


    1. Juan Cao, Jintao Li, Yongdong Zhang, and Sheng Tang, A Novel Method for Spoken Text Feature Extraction in Semantic Video Retrieval, PCM06, pp. 270-278, 2006