Action Recognition and Localization using Weakly-Supervised Learning
We present a novel probabilistic model for recognizing actions by identifying and extracting information
from discriminative regions in videos. The model is trained in a weakly-supervised manner: training
videos are annotated only with a class label, without any action location information within the video.
Additionally, we eliminate the need for any pre-processing measures to help shortlist candidate action
locations. Our localization experiments on the UCF Sports dataset show that the discriminative regions produced
by this weakly supervised system are comparable in quality to action locations produced by systems
that require training on datasets with fully annotated location information. Furthermore, our classification
experiments on UCF Sports and two other major action recognition benchmark datasets, HMDB and UCF101,
show that our recognition system significantly outperforms the baseline models and is better than or comparable
to the state-of-the-art.
Action Recognition by Weakly-Supervised Discriminative Region Localization. [paper] [poster] [bibtex]
Hakan Boyraz, Syed Zain Masood, Baoyuan Liu, Marshall F. Tappen and Hassan Foroosh.
Proc. of British Machine Vision Conference (BMVC), Nottingham, UK (September 2014).

Face Recognition using Social Network Information
We propose an album-oriented
face-recognition model that exploits the album structure for
face recognition in online social networks. Albums, usually
associated with pictures of a small group of people at a certain
event or occasion, provide vital information that can be used
to effectively reduce the possible list of candidate labels. We
show how this intuition can be formalized into a model that
expresses a prior on how albums tend to have many pictures
of a small number of people. We also show how it can be
extended to include other information available in a social
network. Using two real-world datasets independently drawn
from Facebook, we show that this model is broadly applicable
and can significantly improve recognition rates.
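The album intuition can be illustrated with a minimal sketch. This is not the authors' model: it assumes each face already has per-identity classifier scores, and it approximates the album prior with a simple two-pass reweighting. The function name `album_rerank` and the `strength` parameter are hypothetical.

```python
def album_rerank(face_scores, strength=1.0):
    """Relabel the faces of one album using an album-level identity prior.

    face_scores: list of dicts mapping identity -> positive classifier score,
                 one dict per face in the album (hypothetical input format).
    strength:    exponent on the prior; 0 disables it (assumed knob).
    """
    # Pass 1: accumulate normalized scores to see who dominates the album.
    totals = {}
    for scores in face_scores:
        z = sum(scores.values())
        for name, s in scores.items():
            totals[name] = totals.get(name, 0.0) + s / z
    n = len(face_scores)
    prior = {name: t / n for name, t in totals.items()}

    # Pass 2: reweight each face's scores by the album prior and pick the best.
    labels = []
    for scores in face_scores:
        best = max(scores, key=lambda name: scores[name] * prior[name] ** strength)
        labels.append(best)
    return labels
```

With `strength=0.0` the function reduces to independent per-face classification, which makes it easy to compare labels with and without the album prior.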
Exploring Album Structure for Face Recognition in Online Social Networks [paper] [bibtex]
Jason Hochreiter, Zhongkai Han, Syed Zain Masood, Spencer Fonte and Marshall F. Tappen.
Image and Vision Computing, (January 2014)

Album-Oriented Face Recognition For Online Social Networks. [paper] [presentation] [bibtex]
(Best Student Paper Honorable Mention Award)
Zhongkai Han, Syed Zain Masood, Jason Hochreiter, Spencer Fonte and Marshall F. Tappen.
Proc. of IEEE International Conference on Automatic Face and Gesture Recognition (FG), Shanghai, China (April 2013).

The recognition code is coming soon.
Face Recognition code: [code]

Reducing Observational Latency for Action Recognition
This work presents a novel dataset and algorithms for reducing latency in action recognition. Latency in classification is minimized with a classifier based on logistic
regression that uses canonical poses to identify the action. The classifier is trained from the dataset using a learning formulation that makes it possible to train the
classifier to reduce latency. The classifier is compared against both a Bag of Words and a Conditional Random Field classifier and is found to be superior in both pre-segmented
and on-line classification tasks.
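To make observational latency concrete, here is a minimal sketch of an on-line decision loop that scores each action with a logistic function of its distance to a canonical pose and stops as soon as one action is confident enough. It does not reproduce the learning formulation of the papers: the single canonical pose per action, the `weight`, `bias` and `threshold` values, and the function names are all assumptions made for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two pose feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_online(frames, canonical, weight=4.0, bias=3.0, threshold=0.9):
    """Return (label, frames_observed); label is None if never confident.

    frames:    sequence of pose feature vectors, in temporal order.
    canonical: dict mapping action label -> one canonical pose (assumed).
    """
    # Smallest distance seen so far between any frame and each canonical pose.
    best = {label: float("inf") for label in canonical}
    for t, pose in enumerate(frames, start=1):
        for label, proto in canonical.items():
            best[label] = min(best[label], sq_dist(pose, proto))
        # Logistic score: a close match to the canonical pose gives high probability.
        probs = {
            label: 1.0 / (1.0 + math.exp(-(bias - weight * d)))
            for label, d in best.items()
        }
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            return label, t  # latency = number of frames observed
    return None, len(frames)
```

The second return value is the number of frames consumed before a decision, which is the quantity that a latency-aware classifier tries to keep small without sacrificing accuracy.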
Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition [paper] [bibtex]
Syed Zain Masood, Chris Ellis, Marshall F. Tappen, Joseph J. LaViola Jr. and Rahul Sukthankar.
International Journal of Computer Vision, Volume 101 Issue 3 (February 2013)

Measuring and Reducing Observational Latency when Recognizing Actions. [paper] [presentation] [bibtex]
Syed Zain Masood, Chris Ellis, Adarsh Nagaraja, Marshall F. Tappen, Joseph J. LaViola Jr. and Rahul Sukthankar.
Proc. of IEEE International Conference on Computer Vision (ICCV) Workshops, Barcelona, Spain (November 2011).

The UCF Kinect dataset is available here.
UCF Kinect dataset: [dataset]

Correcting Cuboid Corruption for Action Recognition
The purpose of this research project is to identify weaknesses in systems based on popular descriptors. We create a new, complex synthetic dataset and show experimentally that
introducing a complex background causes a significant degradation in recognition performance. Parameter fine-tuning or better interest point selection cannot resolve the
issue, since the problem lies at the cuboid level. Only eliminating background information within cuboids leads to a significant improvement.
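Removing background information inside a cuboid can be sketched as a simple masking step. The snippet below assumes a per-pixel foreground mask is already available for every frame of the cuboid; how that mask is obtained is not described here, and the function name `mask_cuboid` and the `fill` value are hypothetical.

```python
def mask_cuboid(cuboid, fg_mask, fill=0):
    """Suppress background pixels inside a spatio-temporal cuboid.

    cuboid:  nested lists of pixel values indexed [t][y][x].
    fg_mask: nested lists of the same shape; truthy marks foreground (assumed given).
    fill:    value written over background pixels (assumed to be 0).
    """
    return [
        [
            [px if keep else fill for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(frame, mask_frame)
        ]
        for frame, mask_frame in zip(cuboid, fg_mask)
    ]
```

A descriptor computed on the masked cuboid then depends only on the foreground pixels, regardless of how cluttered the background is.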
Correcting Cuboid Corruption For Action Recognition In Complex Environment. [paper] [presentation] [bibtex]
Syed Zain Masood, Adarsh Nagaraja, Nazar Khan, Jiejie Zhu and Marshall F. Tappen.
Proc. of IEEE International Conference on Computer Vision (ICCV) Workshops, Barcelona, Spain (November 2011).

The synthetically created UCF Weizmann Dynamic dataset is available here.
UCF Weizmann Dynamic dataset: [dataset]

Shadow Detection in Monochromatic Images
This research task focuses on detecting shadows in monochromatic natural images. We emphasize the fact that shadow classification is challenging but possible in the absence of
invariant color cues. We propose using shadow-variant and shadow-invariant cues from illumination, texture and derivative characteristics with a Boosted Decision Tree (BDT) classifier
integrated with a Conditional Random Field (CRF).
Learning to Recognize Shadows in Monochromatic Natural Images. [paper] [bibtex]
Jiejie Zhu, Kegan Samuel, Syed Zain Masood and Marshall F. Tappen.
Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA (June 2010).

Pixel Saturation Removal
The goal of this project is to devise an efficient and effective method for estimating the correct response of saturated pixels in an image. Saturation, the term used here for pixels
exhibiting color values greater than 235, can occur in some or all color channels. We rely on the ratios of the non-saturated color channels of a pixel and its neighbors to calculate the
correct response for the saturated channel. If, however, the ratios cannot be estimated, we use neighboring pixel color information to compute the true color values.
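The ratio idea can be illustrated with a minimal per-pixel sketch. It is a simplification, not the published algorithm: channel ratios are averaged over the given neighbors, the fallback is a plain neighbor mean, and the estimate is never allowed to drop below the observed value. Only the threshold of 235 comes from the text; the function name `correct_pixel` and these choices are assumptions.

```python
SAT = 235  # saturation threshold stated in the text


def correct_pixel(pixel, neighbors, sat=SAT):
    """Estimate the true values of saturated channels of one RGB pixel.

    pixel:     (r, g, b) tuple.
    neighbors: list of (r, g, b) tuples from the surrounding region.
    """
    out = list(pixel)
    good = [c for c in range(3) if pixel[c] <= sat]  # non-saturated channels
    for c in range(3):
        if pixel[c] <= sat:
            continue
        estimates = []
        for ref in good:
            # Ratio between channel c and a reference channel, taken from
            # neighbors where both channels are non-saturated.
            ratios = [
                n[c] / n[ref]
                for n in neighbors
                if n[c] <= sat and n[ref] <= sat and n[ref] > 0
            ]
            if ratios:
                estimates.append(pixel[ref] * sum(ratios) / len(ratios))
        if estimates:
            value = sum(estimates) / len(estimates)
        elif neighbors:
            # Fallback when no ratio can be estimated: neighbor mean (assumed).
            value = sum(n[c] for n in neighbors) / len(neighbors)
        else:
            value = pixel[c]
        # A saturated channel was clipped, so the true value is at least
        # the observed one (assumption of this sketch).
        out[c] = max(value, pixel[c])
    return tuple(out)
```

Corrected values may exceed the original range, which is expected: the point of the estimate is to recover intensity that was clipped at capture time.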
Automatic Correction of Saturated Regions in Photographs using Cross-Channel Correlation. [paper] [presentation] [bibtex]
Syed Zain Masood, Jiejie Zhu and Marshall F. Tappen.
Computer Graphics Forum (CGF), International Journal of Eurographics Association, Volume 28 Issue 7 (October 2009)

The code and images are available here. The code only works for raw images.
Saturation Removal in Raw Images: [code] [images]

Code for converting non-linear JPEGs to linear TIFF is also provided.
Image Linearization using RBFs: [code]