NERCMS Achieves Top Result in TRECVID Evaluation
According to reports, in November 2016 the team led by Prof. Hu Ruimin of the School of Computer Science, Wuhan University, achieved the best result in TRECVID, the international evaluation of video analysis and retrieval technology, with a mean average precision (MAP) of 0.758 over 30 query topics. The result marks the team's entry into the first echelon of international video retrieval.
The retrieval task requires each participating team to retrieve video clips of a particular person in a particular scene from a massive video collection of more than 470,000 clips (see Figure 1). Teams use multimedia retrieval, computer vision, machine learning, and other technologies to analyze and understand the video content and to identify the clips that match the officially specified topics. The more matching clips a system finds, the higher its average retrieval precision and the better the system is rated. The task lets a user specify two search conditions, a person and a scene, and the system must find the clips in the collection that satisfy both. For example, a user may want the clips showing "Mr. Obama at the White House". Finding clips of "Mr. Obama" is already difficult because his clothing and pose vary, and the system must additionally pick out, from the many similar-looking scenes in which he appears, those that take place at the White House. This made this year's evaluation task very challenging.
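The scoring idea described above can be illustrated with a minimal sketch of mean average precision in Python. This is the standard textbook definition, not the official evaluation tool, and the function names and clip identifiers are hypothetical, chosen only for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list for a single topic.

    ranked_ids:   clip IDs in the order the system returned them.
    relevant_ids: clip IDs judged to match the topic (ground truth).
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, clip_id in enumerate(ranked_ids, start=1):
        if clip_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Relevant clips that were never returned contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-topic average precision values.

    runs: list of (ranked_ids, relevant_ids) pairs, one pair per topic.
    """
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


# Two hypothetical topics with made-up clip IDs.
topic_a = (["c1", "c2", "c3", "c4"], {"c1", "c3"})  # AP = (1/1 + 2/3) / 2
topic_b = (["c5", "c6"], {"c6"})                    # AP = 1/2
print(round(mean_average_precision([topic_a, topic_b]), 4))  # 0.6667
```

A system that ranks more of the matching clips near the top of its list receives a higher score, which is why both finding more clips and ordering them well matter in this evaluation.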
To cope with target persons appearing at widely varying sizes, frequent pose changes, and heavy background interference, the joint team led by Prof. Hu Ruimin, Dean of the School of Computer Science, Wuhan University, proposed a multi-scale deconvolution regression face detection network and a deep-embedding face recognition system. For scenes, the team proposed a new retrieval method that combines local and global views, which effectively reduces the scene miss rate. On this basis, the team further integrated human prior knowledge with multi-source, cross-modal information to filter out large amounts of irrelevant content, such as shots without faces, outdoor scenes, and vehicles, thus greatly reducing the sources of noise. The organizer, the National Institute of Standards and Technology (NIST), commented: "Your system is unique, interesting, clever and ultimately informative." In the end, the team's system achieved the best score, an average retrieval precision of 0.758 over the 30 official query topics, marking the team's entry into the first echelon of international video retrieval.
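The general idea of filtering irrelevant clips and then ranking by how well a clip matches both the person and the scene can be sketched as follows. This is only an illustrative sketch and not the team's actual system: the `Clip` fields, the `has_face` filter, and the product fusion rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    person_score: float  # hypothetical face-match score in [0, 1]
    scene_score: float   # hypothetical scene-match score in [0, 1]
    has_face: bool       # hypothetical flag from a face detector


def rank_clips(clips, top_k=1000):
    """Drop clips without a detected face, then rank the rest.

    Assumption: the fused score is the product of the two scores,
    so a clip must match both the person and the scene to rank high.
    """
    candidates = [c for c in clips if c.has_face]
    candidates.sort(key=lambda c: c.person_score * c.scene_score, reverse=True)
    return [c.clip_id for c in candidates[:top_k]]


clips = [
    Clip("a", 0.9, 0.2, True),     # right person, wrong scene
    Clip("b", 0.7, 0.8, True),     # good match on both conditions
    Clip("c", 0.95, 0.95, False),  # no face detected, filtered out
    Clip("d", 0.5, 0.5, True),     # moderate match on both
]
print(rank_clips(clips))  # ['b', 'd', 'a']
```

With a product rule, a clip that matches only one of the two conditions is pushed down the list, which mirrors the requirement that both the person and the scene must match.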