Action Recognition and Localization using Weakly-Supervised Learning

 


 

We present a novel probabilistic model for recognizing actions by identifying and extracting information from discriminative regions in videos. The model is trained in a weakly-supervised manner: training videos are annotated only with the action label, without any action location information within the video. Additionally, we eliminate the need for any pre-processing steps to shortlist candidate action locations. Our localization experiments on the UCF Sports dataset show that the discriminative regions produced by this weakly-supervised system are comparable in quality to action locations produced by systems trained on datasets with fully annotated location information. Furthermore, our classification experiments on UCF Sports and two other major action recognition benchmarks, HMDB and UCF101, show that our recognition system significantly outperforms the baseline models and is better than or comparable to the state of the art.
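
As a rough illustration of the weak-supervision idea (a minimal sketch, not the paper's actual probabilistic model), the snippet below scores candidate regions with a linear model and aggregates the region scores into a single video-level prediction, so only video-level action labels are needed during training. The max-pooling aggregation, feature dimensions, and function names are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact model): score candidate regions and
# aggregate region scores into one video-level prediction, so training can use
# only video-level action labels.
import numpy as np

def region_scores(regions, W, b):
    """Per-class scores for each candidate region (regions: R x D feature matrix)."""
    return regions @ W + b                     # shape: R x C

def video_prediction(regions, W, b):
    """Aggregate region scores into a single video-level prediction (max over regions)."""
    scores = region_scores(regions, W, b)
    video_scores = scores.max(axis=0)          # most discriminative region per class
    return int(video_scores.argmax()), scores.argmax(axis=0)

# Toy usage: 5 candidate regions with 10-D features, 3 action classes.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 10))
W, b = rng.normal(size=(10, 3)), np.zeros(3)
label, best_region_per_class = video_prediction(regions, W, b)
print(label, best_region_per_class)
```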

 

Action Recognition by Weakly-Supervised Discriminative Region Localization. [paper] [poster] [bibtex]
Hakan Boyraz, Syed Zain Masood, Baoyuan Liu, Marshall F. Tappen and Hassan Foroosh.
Proc. of British Machine Vision Conference (BMVC), Nottingham, UK (September 2014).

Face Recognition using Social Network Information

 


 

We propose an album-oriented model that exploits album structure for face recognition in online social networks. Albums, usually associated with pictures of a small group of people at a certain event or occasion, provide vital information that can effectively reduce the list of candidate labels. We show how this intuition can be formalized into a model that expresses a prior on how albums tend to contain many pictures of a small number of people. We also show how the model can be extended to include other information available in a social network. Using two real-world datasets independently drawn from Facebook, we show that this model is broadly applicable and can significantly improve recognition rates.
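
The album prior can be illustrated with a small sketch (the additive scoring function, the penalty weight, and the identity names are our assumptions, not the paper's exact formulation): an assignment of identities to the faces in an album is scored by its appearance evidence minus a penalty on the number of distinct people it uses.

```python
# Illustrative sketch of the album-prior intuition (not the paper's exact model).
def album_score(assignment, face_scores, penalty=1.0):
    """assignment: one identity per face; face_scores: per-face dicts of identity -> score."""
    appearance = sum(face_scores[i][ident] for i, ident in enumerate(assignment))
    distinct_people = len(set(assignment))
    return appearance - penalty * distinct_people   # fewer distinct people per album is preferred

# Toy usage: two faces; reusing one identity wins despite slightly weaker appearance scores.
face_scores = [{"alice": 0.9, "bob": 0.8}, {"alice": 0.7, "bob": 0.75}]
print(album_score(["alice", "alice"], face_scores))   # 1.60 - 1.0 = 0.60
print(album_score(["alice", "bob"], face_scores))     # 1.65 - 2.0 = -0.35
```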

 

Exploring Album Structure for Face Recognition in Online Social Networks [paper] [bibtex]
Jason Hochreiter, Zhongkai Han, Syed Zain Masood, Spencer Fonte and Marshall F. Tappen.
Image and Vision Computing (January 2014)

 

Album-Oriented Face Recognition For Online Social Networks. [paper] [presentation] [bibtex]
(Best Student Paper Honorable Mention Award)
Zhongkai Han, Syed Zain Masood, Jason Hochreiter, Spencer Fonte and Marshall F. Tappen.
Proc. of IEEE International Conference on Automatic Face and Gesture Recognition (FG), Shanghai, China (April 2013).

The face recognition code is available here.
Face Recognition code: [code]

Reducing Observational Latency for Action Recognition

 


 

This work presents a novel dataset and algorithms for reducing the latency of action recognition. Classification latency is minimized with a logistic-regression-based classifier that uses canonical poses to identify the action. The classifier is trained on the dataset using a learning formulation that makes it possible to explicitly train for reduced latency. Compared against both a Bag-of-Words and a Conditional Random Field classifier, our approach is superior in both pre-segmented and online classification tasks.
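
A minimal sketch of the low-latency decision rule (the softmax classifier, the confidence threshold, and the feature dimensions are illustrative assumptions rather than the paper's exact formulation): each incoming frame's pose features are scored, and the system commits to an action as soon as the posterior for some class crosses a confidence threshold.

```python
# Hedged sketch of the low-latency decision idea: score each incoming frame with
# a softmax over pose features and commit to an action as soon as confidence
# crosses a threshold. Threshold and feature sizes are illustrative assumptions.
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def online_classify(frame_features, W, b, threshold=0.9):
    """Return (action, frame index) at the first confident frame, or None."""
    for t, x in enumerate(frame_features):
        p = softmax(W @ x + b)
        if p.max() >= threshold:
            return int(p.argmax()), t    # fire early: observational latency = t frames
    return None                          # never confident enough

# Toy usage: 30 frames of 20-D skeleton features, 4 actions.
rng = np.random.default_rng(1)
frames = rng.normal(size=(30, 20))
W, b = rng.normal(size=(4, 20)), np.zeros(4)
print(online_classify(frames, W, b, threshold=0.6))
```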

 

Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition [paper] [bibtex]
Syed Zain Masood, Chris Ellis, Marshall F. Tappen, Joseph J. LaViola Jr. and Rahul Sukthankar.
International Journal of Computer Vision, Volume 101 Issue 3 (February 2013)

 

Measuring and Reducing Observational Latency when Recognizing Actions. [paper] [presentation] [bibtex]
Syed Zain Masood, Chris Ellis, Adarsh Nagaraja, Marshall F. Tappen, Joseph J. LaViola Jr. and Rahul Sukthankar.
Proc. of IEEE International Conference on Computer Vision (ICCV) Workshops, Barcelona, Spain (November 2011).

The UCF Kinect dataset is available here.
UCF Kinect dataset: [dataset]

Correcting Cuboid Corruption for Action Recognition

 


 

The purpose of this research project is to identify weaknesses in recognition systems built on popular cuboid-based descriptors. We create a new synthetic dataset with complex backgrounds and show experimentally that introducing complex backgrounds causes a significant degradation in recognition performance. Parameter fine-tuning and better interest point selection are unable to resolve the issue, since the problem lies at the cuboid level. Only by eliminating background information within the cuboids do we obtain a significant improvement.
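
The correction can be sketched as follows (the cuboid size, the source of the foreground mask, and the function names are assumptions for illustration): background pixels inside each spatio-temporal cuboid are suppressed before the descriptor is computed, so background clutter cannot corrupt it.

```python
# Sketch of the correction described above (details are assumptions): zero out
# background pixels inside each spatio-temporal cuboid using a foreground mask
# before descriptor computation.
import numpy as np

def clean_cuboid(video, fg_mask, t, y, x, size=(9, 15, 15)):
    """Extract a (T, H, W) cuboid centered at (t, y, x) and suppress background."""
    dt, dy, dx = (s // 2 for s in size)
    cub = video[t - dt:t + dt + 1, y - dy:y + dy + 1, x - dx:x + dx + 1].astype(float)
    msk = fg_mask[t - dt:t + dt + 1, y - dy:y + dy + 1, x - dx:x + dx + 1]
    return cub * msk                      # background pixels set to zero

# Toy usage: random grayscale video and a random foreground mask.
video = np.random.rand(50, 120, 160)
fg_mask = (np.random.rand(50, 120, 160) > 0.5).astype(float)
cuboid = clean_cuboid(video, fg_mask, t=25, y=60, x=80)
print(cuboid.shape)                       # (9, 15, 15)
```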

 

Correcting Cuboid Corruption For Action Recognition In Complex Environment. [paper] [presentation] [bibtex]
Syed Zain Masood, Adarsh Nagaraja, Nazar Khan, Jiejie Zhu and Marshall F. Tappen.
Proc. of IEEE International Conference on Computer Vision (ICCV) Workshops, Barcelona, Spain (November 2011).

The synthetically created UCF Weizmann Dynamic dataset is available here.
UCF Weizmann Dynamic dataset: [dataset]

Shadow Detection in Monochromatic Images

 


 

This research focuses on detecting shadows in monochromatic natural images. We emphasize that shadow classification is challenging but possible in the absence of invariant color cues. We propose using shadow-variant and shadow-invariant cues derived from illumination, texture, and derivative characteristics, combined with a Boosted Decision Tree (BDT) classifier integrated with a Conditional Random Field (CRF).
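
A hedged sketch of such a pipeline is shown below (the toy features, tree parameters, and the simple neighborhood smoothing used in place of full CRF inference are our assumptions): a boosted decision tree produces per-pixel shadow probabilities, which are then smoothed to encourage label agreement between neighboring pixels.

```python
# Hedged sketch of the pipeline described above: per-pixel features scored by a
# boosted decision tree, followed by simple pairwise smoothing (a stand-in for
# full CRF inference). Features, parameters, and smoothing rule are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy data: per-pixel features (e.g., intensity, texture, derivative) on a 32x32 image.
rng = np.random.default_rng(2)
H = W = 32
features = rng.normal(size=(H * W, 3))
labels = (features[:, 0] < 0).astype(int)          # pretend: darker pixels are shadow

bdt = GradientBoostingClassifier(n_estimators=50).fit(features, labels)
unary = bdt.predict_proba(features)[:, 1].reshape(H, W)   # P(shadow) per pixel

def smooth(prob, iters=5, w=0.5):
    """Mix each pixel's probability with its 4-neighbors (wraps at borders; fine for a toy)."""
    p = prob.copy()
    for _ in range(iters):
        nb = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
              np.roll(p, 1, 1) + np.roll(p, -1, 1)) / 4.0
        p = (1 - w) * p + w * nb
    return p

shadow_mask = smooth(unary) > 0.5
print(shadow_mask.mean())                 # fraction of pixels labeled as shadow
```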

 

Learning to Recognize Shadows in Monochromatic Natural Images. [paper] [bibtex]
Jiejie Zhu, Kegan Samuel, Syed Zain Masood and Marshall F. Tappen.
Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA (June 2010).

Pixel Saturation Removal

 


 

The goal of this project is to devise an efficient and effective method for estimating the correct response of saturated pixels in an image. Saturation, the term used for pixels with color values greater than 235, can occur in some or all color channels. We rely on the ratios of the non-saturated color channels of a pixel and its neighbors to calculate the correct response for the saturated channel. If, however, these ratios cannot be estimated, we use neighboring pixel color information to compute the true color values.
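
A minimal sketch of the cross-channel ratio idea (the 235 threshold comes from the description above; the neighborhood size, fallback rule, and function names are illustrative assumptions): a clipped channel is re-estimated from its ratio to an unsaturated channel, measured at nearby pixels where both channels are valid.

```python
# Minimal sketch of the cross-channel idea described above; neighborhood size
# and the fallback are illustrative assumptions.
import numpy as np

SAT = 235

def correct_pixel(img, y, x, c_sat, c_ref, radius=2):
    """Estimate channel c_sat at (y, x) from its ratio to channel c_ref at valid neighbors."""
    y0, y1 = max(0, y - radius), y + radius + 1
    x0, x1 = max(0, x - radius), x + radius + 1
    patch = img[y0:y1, x0:x1].astype(float)
    ok = (patch[..., c_sat] < SAT) & (patch[..., c_ref] < SAT) & (patch[..., c_ref] > 0)
    if ok.any():
        ratio = (patch[..., c_sat][ok] / patch[..., c_ref][ok]).mean()
        return ratio * img[y, x, c_ref]
    # Fallback (assumption): no usable ratios, so use neighbors' unsaturated
    # values of the clipped channel directly.
    valid = patch[..., c_sat][patch[..., c_sat] < SAT]
    return valid.mean() if valid.size else float(img[y, x, c_sat])

# Toy usage: a pixel whose red channel clipped while green stayed valid.
img = np.full((5, 5, 3), 120, dtype=np.uint8)
img[..., 0] = 150                     # red is consistently 1.25x green nearby
img[2, 2] = (255, 200, 120)           # center pixel: red clipped at 255
print(correct_pixel(img, 2, 2, c_sat=0, c_ref=1))   # ~250 (1.25 * 200)
```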

 

Automatic Correction of Saturated Regions in Photographs using Cross-Channel Correlation. [paper] [presentation] [bibtex]
Syed Zain Masood, Jiejie Zhu and Marshall F. Tappen.
Computer Graphics Forum (CGF), the International Journal of the Eurographics Association, Volume 28 Issue 7 (October 2009)

The code and images are available here. The code only works with RAW images.
Saturation Removal in Raw Images: [code] [images]

Code for converting non-linear JPEG images to linear TIFF is also provided.
Image Linearization using RBFs: [code]