Rapid Similarity Prediction, Forensic Search & Retrieval in Video

Download Project Report (Phase 2, Year 4)

Project Description

Overview and Significance

This project develops video analytics for maintaining airport and perimeter security. Our objectives include real-time detection of suspicious activity, seamless tracking of individuals across sparse multi-camera networks, and forensic search for individuals and activities in years of archived data.

Surveillance networks are becoming increasingly effective in the public and private sectors. Generally, these networks are used in either a real-time or a forensic capacity. For real-time use, the activities of interest are known a priori and the challenge is to detect those activities as they occur in the video; for forensic use, the data is archived until a user decides on an activity to search for. Forensic use therefore calls for a method of content-based retrieval in large video corpora, based on user-defined queries.

The significance of the real-time activity monitoring effort to the Department of Homeland Security (DHS) is that these methods will enable the real-time detection of suspicious activities and entities throughout an airport by seamlessly tagging and tracking objects. Suspicious activities include baggage drops, abandoned objects and other suspicious behavior. The significance of the forensic search capability is that it will allow for an autonomous search that matches user-defined activity queries in years of compressed data. Such queries include detecting incidents such as a baggage drop, as well as finding all precursors of an incident, such as who met with a given target or person. To put this into perspective, Boston Logan International Airport (BOS) currently has the capability to store ~1 month’s data, and much of the forensics requires significant human involvement. Current approaches are not scalable given the ever-increasing deployment of cameras.

We will describe ongoing efforts on both real-time monitoring and forensic search in more detail below. In general, identifying relevant information for tracking across multiple cameras with non-overlapping views is challenging because of the wide range of variations involved, from the traditional pose, illumination and scale issues to spatio-temporal variations of the scene itself. We propose to develop robust techniques for a variety of environments, including unstructured, highly cluttered and occluded scenarios. A significant focus of the project is the development of robust features. An important consideration is that the selected features should not only be informative and easy to extract from the raw video but should also be invariant to pose, illumination and scale variations. Traditional approaches have employed photometric properties. However, these features are sensitive either to pose, illumination and scale variations or to clutter. Moreover, they do not help capture the essential patterns of activity in the field of view. Consequently, they are not sufficiently informative for generalization within a multi-camera framework.

Our outlier detection will be coupled with longer-term semantic threat discovery. In this context, we plan to leverage our multi-camera tag and track algorithms.
Phase 2 Year 2 Annual Report
Project Leader
  • Venkatesh Saligrama
    Associate Professor
    Boston University

  • David Castañón
    Boston University

Faculty and Staff Currently Involved in Project
  • Ziming Zhang

Students Currently Involved in Project
  • Yuting Chen
    Boston University
  • Gregory Castanon
    Boston University
  • Marc Eder