Showing 1–1 of 1 results for author: Moghaddam, M M K
Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection
Authors:
Mohammad Mahdi Kazemi Moghaddam,
Ehsan Abbasnejad,
Javen Shi
Abstract:
Retailers have long been searching for ways to effectively understand their customers' behaviour in order to provide a smooth and pleasant shopping experience that attracts more customers every day and, consequently, maximises their revenue. Humans can flawlessly understand others' behaviour by combining different visual cues, from activity to gestures and facial expressions. Empowering computer vision systems to do so, however, is still an open problem due to its intrinsic challenges as well as extrinsic difficulties such as the lack of publicly available data and unique, in-the-wild environment conditions. In this work, we focus on the first and by far the most crucial cue in behaviour analysis: human activity detection. To this end, we introduce a framework that integrates human pose and object motion to both temporally detect and classify activities in a fine-grained manner (very short and similar activities). We incorporate partial human pose and interaction with objects in a multi-stream neural network architecture to guide the spatiotemporal attention mechanism for more efficient activity recognition. In the absence of pose supervision, we propose to use a Generative Adversarial Network (GAN) to generate exact joint locations from noisy probability heat maps. Additionally, based on the intuition that complex actions demand more than one source of information to be identified even by humans, we integrate a second stream of object motion into our network as prior knowledge, which we quantitatively show improves the recognition results. We empirically show the capability of our approach by achieving state-of-the-art results on the MERL Shopping dataset. We further investigate the effectiveness of this approach on a new shopping dataset that we have collected to address existing shortcomings.
Submitted 26 June, 2019; v1 submitted 10 May, 2019;
originally announced May 2019.