-
Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning
Authors:
Nobline Yoo,
Olga Russakovsky
Abstract:
The goal of 2D human pose estimation (HPE) is to localize anatomical landmarks, given an image of a person in a pose. SOTA techniques make use of thousands of labeled figures (finetuning transformers or training deep CNNs), acquired using labor-intensive crowdsourcing. On the other hand, self-supervised methods re-frame the HPE task as a reconstruction problem, enabling them to leverage the vast amount of unlabeled visual data, though at the present cost of accuracy. In this work, we explore ways to improve self-supervised HPE. We (1) analyze the relationship between reconstruction quality and pose estimation accuracy, (2) develop a model pipeline that outperforms the baseline which inspired our work, using less than one-third the amount of training data, and (3) offer a new metric suitable for self-supervised settings that measures the consistency of predicted body part length proportions. We show that a combination of well-engineered reconstruction losses and inductive priors can help coordinate pose learning alongside reconstruction in a self-supervised paradigm.
Submitted 5 November, 2023;
originally announced November 2023.
-
Point and Ask: Incorporating Pointing into Visual Question Answering
Authors:
Arjun Mani,
Nobline Yoo,
Will Hinthorn,
Olga Russakovsky
Abstract:
Visual Question Answering (VQA) has become one of the key benchmarks of visual recognition progress. Multiple VQA extensions have been explored to better simulate real-world settings: different question formulations, changing training and test distributions, conversational consistency in dialogues, and explanation-based answering. In this work, we further expand this space by considering visual questions that include a spatial point of reference. Pointing is a nearly universal gesture among humans, and real-world VQA is likely to involve a gesture towards the target region.
Concretely, we (1) introduce and motivate point-input questions as an extension of VQA, (2) define three novel classes of questions within this space, and (3) for each class, introduce both a benchmark dataset and a series of baseline models to handle its unique challenges. There are two key distinctions from prior work. First, we explicitly design the benchmarks to require the point input, i.e., we ensure that the visual question cannot be answered accurately without the spatial reference. Second, we explicitly explore the more realistic point spatial input rather than the standard but unnatural bounding box input. Through our exploration we uncover and address several visual recognition challenges, including the ability to infer human intent, reason both locally and globally about the image, and effectively combine visual, language and spatial inputs. Code is available at: https://github.com/princetonvisualai/pointingqa .
Submitted 18 February, 2022; v1 submitted 27 November, 2020;
originally announced November 2020.
-
Using the Higgs boson to probe the littlest Higgs model with T-parity through $Z_H W_H$ production at the LHC
Authors:
Kingman Cheung,
Kang Young Lee,
So Young Shim,
Jeonghyeon Song,
Namseok Yoo
Abstract:
In the littlest Higgs model with T-parity, the production cross section of the T-odd heavy gauge boson pair $Z_H W_H$ is quite sizable at the LHC. In addition, both the $W_H$ and $Z_H$ bosons have almost exclusively one decay channel into $W A_H$ and $H A_H$ respectively, where the dark matter candidate $A_H$ yields a large missing energy signal. Upon the discovery of the Higgs boson at 125 GeV, we study the discovery sensitivity of the final state $ pp \to b\bar{b} l+\rlap{\,/}{E}_T$ to probe the model at the LHC. We find that the standard model backgrounds are manageable by applying suitable kinematic cuts. The LHC running at $\sqrt{s}=14$ TeV with a 100/fb total luminosity is sensitive to the model if the symmetry breaking scale $f$ is below about 850 GeV.
Submitted 8 February, 2013; v1 submitted 4 February, 2013;
originally announced February 2013.