Showing 1–19 of 19 results for author: Jeni, L A

Searching in archive cs.
  1. arXiv:2406.07472  [pdf, other]

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2405.18438  [pdf, other]

    cs.CV

    GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts

    Authors: Zoltán Á. Milacski, Koichiro Niinuma, Ryosuke Kawamura, Fernando de la Torre, László A. Jeni

    Abstract: The connection between our 3D surroundings and the descriptive language that characterizes them would be well-suited for localizing and generating human motion in context but for one problem. The complexity introduced by multiple modalities makes capturing this connection challenging with a fixed set of descriptors. Specifically, closed vocabulary scene encoders, which require learning text-scene…

    Submitted 8 April, 2024; originally announced May 2024.

    Comments: 18 pages, 5 figures

  3. arXiv:2312.11894  [pdf, other]

    cs.CV cs.AI cs.LG

    3D-LFM: Lifting Foundation Model

    Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey

    Abstract: The lifting of 3D structure and camera from 2D landmarks is at the cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP) problems, but deep learning has expanded our capability to reconstruct a wide range of object classes (e.g. C3DPO and PAUL) with resilience to noise, occlusions, and p…

    Submitted 26 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Visit the project page at https://3dlfm.github.io for links to additional media, code, and videos. The site also features a custom GPT tailored to address queries related to 3D-LFM. Accepted at CVPR 2024

  4. arXiv:2312.05664  [pdf, other]

    cs.CV

    CoGS: Controllable Gaussian Splatting

    Authors: Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni

    Abstract: Capturing and re-animating the 3D structure of articulated objects present significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and ren…

    Submitted 22 April, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  5. arXiv:2312.00937  [pdf, other]

    cs.CV

    Zero-Shot Video Question Answering with Procedural Programs

    Authors: Rohan Choudhury, Koichiro Niinuma, Kris M. Kitani, László A. Jeni

    Abstract: We propose to answer zero-shot questions about videos by generating short procedural programs that derive a final answer from solving a sequence of visual subtasks. We present Procedural Video Querying (ProViQ), which uses a large language model to generate such programs from an input question and an API of visual modules in the prompt, then executes them to obtain the output. Recent similar proce…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 16 pages, 7 figures

  6. arXiv:2310.16832  [pdf, other]

    cs.CV

    LightSpeed: Light and Fast Neural Light Fields on Mobile Devices

    Authors: Aarush Gupta, Junli Cao, Chaoyang Wang, Ju Hu, Sergey Tulyakov, Jian Ren, László A Jeni

    Abstract: Real-time novel-view image synthesis on mobile devices is prohibitive due to the limited computational power and storage. Using volumetric rendering methods, such as NeRF and its derivatives, on mobile devices is not suitable due to the high computational cost of volumetric rendering. On the other hand, recent advances in neural light field representations have shown promising real-time view synth…

    Submitted 26 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Project Page: http://lightspeed-r2l.github.io/ . Add camera ready version

    Journal ref: NeurIPS 2023

  7. arXiv:2309.07910  [pdf, other]

    cs.CV

    TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting

    Authors: Rohan Choudhury, Kris Kitani, Laszlo A. Jeni

    Abstract: Existing volumetric methods for predicting 3D human pose estimation are accurate, but computationally expensive and optimized for single time-step prediction. We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatiotemporal representation, improving pose accuracy while also tracking and forecasting human pose. We significantly reduce computation compared to the s…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted at ICCV 2023

  8. arXiv:2304.12455  [pdf, other]

    cs.CV

    Unsupervised Style-based Explicit 3D Face Reconstruction from Single Image

    Authors: Heng Yu, Zoltan A. Milacski, Laszlo A. Jeni

    Abstract: Inferring 3D object structures from a single image is an ill-posed task due to depth ambiguity and occlusion. Typical resolutions in the literature include leveraging 2D or 3D ground truth for supervised learning, as well as imposing hand-crafted symmetry priors or using an implicit representation to hallucinate novel viewpoints for unsupervised methods. In this work, we propose a general adversar…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: CVPR workshop

  9. arXiv:2303.16333  [pdf, other]

    cs.CV

    Flow supervision for Deformable NeRF

    Authors: Chaoyang Wang, Lachlan Ewen MacDonald, Laszlo A. Jeni, Simon Lucey

    Abstract: In this paper we present a new method for deformable NeRF that can directly use optical flow as supervision. We overcome the major challenge with respect to the computationally inefficiency of enforcing the flow constraints to the backward deformation field, used by deformable NeRFs. Specifically, we show that inverting the backward deformation function is actually not needed for computing scene f…

    Submitted 28 March, 2023; originally announced March 2023.

  10. arXiv:2303.14243  [pdf, other]

    cs.CV

    DyLiN: Making Light Field Networks Dynamic

    Authors: Heng Yu, Joel Julin, Zoltan A. Milacski, Koichiro Niinuma, Laszlo A. Jeni

    Abstract: Light Field Networks, the re-formulations of radiance fields to oriented rays, are magnitudes faster than their coordinate network counterparts, and provide higher fidelity with respect to representing 3D structures from 2D observations. They would be well suited for generic scene representation and manipulation, but suffer from one problem: they are limited to holistic and static scenes. In this…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  11. arXiv:2211.08610  [pdf, other]

    cs.CV

    CoNFies: Controllable Neural Face Avatars

    Authors: Heng Yu, Koichiro Niinuma, Laszlo A. Jeni

    Abstract: Neural Radiance Fields (NeRF) are compelling techniques for modeling dynamic 3D scenes from 2D image collections. These volumetric representations would be well suited for synthesizing novel facial expressions but for two problems. First, deformable NeRFs are object agnostic and model holistic movement of the scene: they can replay how the motion changes over time, but they cannot alter it in an i…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: accepted by FG2023

  12. arXiv:2210.01721  [pdf, other]

    cs.CV cs.AI cs.LG

    MBW: Multi-view Bootstrapping in the Wild

    Authors: Mosam Dabhi, Chaoyang Wang, Tim Clifford, Laszlo Attila Jeni, Ian R. Fasel, Simon Lucey

    Abstract: Labeling articulated objects in unconstrained settings have a wide variety of applications including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022 conference. Project webpage and code: https://github.com/mosamdabhi/MBW

  13. arXiv:2202.12368  [pdf, other]

    cs.CV

    Instantaneous Physiological Estimation using Video Transformers

    Authors: Ambareesh Revanur, Ananyananda Dasari, Conrad S. Tucker, Laszlo A. Jeni

    Abstract: Video-based physiological signal estimation has been limited primarily to predicting episodic scores in windowed intervals. While these intermittent values are useful, they provide an incomplete picture of patients' physiological status and may lead to late detection of critical conditions. We propose a video Transformer for estimating instantaneous heart rate and respiration rate from face videos…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: 13 pages, 4 figures, AAAI workshop and Springer Studies in Computational Intelligence 2022. For project page see https://github.com/revanurambareesh/instantaneous_transformer

  14. arXiv:2109.10471  [pdf, other]

    cs.CY cs.CV cs.LG eess.IV

    The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation

    Authors: Ambareesh Revanur, Zhihua Li, Umur A. Ciftci, Lijun Yin, Laszlo A. Jeni

    Abstract: Telehealth has the potential to offset the high demand for help during public health emergencies, such as the COVID-19 pandemic. Remote Photoplethysmography (rPPG) - the problem of non-invasively estimating blood volume variations in the microvascular tissue from video - would be well suited for these situations. Over the past few years a number of research groups have made rapid advances in remot…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: ICCVw'21. V4V Dataset and Challenge: https://vision4vitals.github.io/

  15. arXiv:2106.05779  [pdf, other]

    cs.CV cs.GR

    Deep Implicit Surface Point Prediction Networks

    Authors: Rahul Venkatesh, Tejan Karmali, Sarthak Sharma, Aurobrata Ghosh, R. Venkatesh Babu, László A. Jeni, Maneesh Singh

    Abstract: Deep neural representations of 3D shapes as implicit functions have been shown to produce high fidelity models surpassing the resolution-memory trade-off faced by the explicit representations using meshes and point clouds. However, most such approaches focus on representing closed shapes. Unsigned distance function (UDF) based approaches have been proposed recently as a promising alternative to re…

    Submitted 14 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: 22 pages, 17 figures

  16. arXiv:2103.06498  [pdf, other]

    cs.CV cs.AI

    3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

    Authors: Xiangyu Xu, Hao Chen, Francesc Moreno-Noguer, Laszlo A. Jeni, Fernando De la Torre

    Abstract: 3D human pose and shape estimation from monocular images has been an active research area in computer vision. Existing deep learning methods for this task rely on high-resolution input, which however, is not always available in many scenarios such as video surveillance and sports broadcasting. Two common approaches to deal with low-resolution images are applying super-resolution techniques to the…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2007.13666

  17. arXiv:2010.10979  [pdf, other]

    cs.CV

    Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions

    Authors: Koichiro Niinuma, Itir Onal Ertugrul, Jeffrey F Cohn, László A Jeni

    Abstract: Critical obstacles in training classifiers to detect facial actions are the limited sizes of annotated video databases and the relatively low frequencies of occurrence of many actions. To address these problems, we propose an approach that makes use of facial expression generation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and…

    Submitted 21 October, 2020; originally announced October 2020.

  18. arXiv:2007.13666  [pdf, other]

    cs.CV cs.LG eess.IV

    3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning

    Authors: Xiangyu Xu, Hao Chen, Francesc Moreno-Noguer, Laszlo A. Jeni, Fernando De la Torre

    Abstract: 3D human shape and pose estimation from monocular images has been an active area of research in computer vision, having a substantial impact on the development of new applications, from activity recognition to creating virtual avatars. Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images; however, high-resolution visual content is no…

    Submitted 9 August, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: ECCV 2020, project page: https://sites.google.com/view/xiangyuxu/3d_eccv20

  19. arXiv:1702.04174  [pdf, other]

    cs.CV

    FERA 2017 - Addressing Head Pose in the Third Facial Expression Recognition and Analysis Challenge

    Authors: Michel F. Valstar, Enrique Sánchez-Lozano, Jeffrey F. Cohn, László A. Jeni, Jeffrey M. Girard, Zheng Zhang, Lijun Yin, Maja Pantic

    Abstract: The field of Automatic Facial Expression Analysis has grown rapidly in recent years. However, despite progress in new approaches as well as benchmarking efforts, most evaluations still focus on either posed expressions, near-frontal recordings, or both. This makes it hard to tell how existing expression recognition approaches perform under conditions where faces appear in a wide range of poses (or…

    Submitted 14 February, 2017; originally announced February 2017.

    Comments: FERA 2017 Baseline Paper