Showing 1–13 of 13 results for author: Kant, Y

Searching in archive cs.
  1. arXiv:2402.05235  [pdf, other]

    cs.CV

    SPAD : Spatially Aware Multiview Diffusers

    Authors: Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin

    Abstract: We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross-view interactions, and fine-tune it on a high quality subset of Objaverse. We find that a naive extension of the self-attention proposed in prior work (e.g. MVD…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Webpage: https://yashkant.github.io/spad

  2. arXiv:2402.00867  [pdf, other]

    cs.CV

    AToM: Amortized Text-to-Mesh using 2D Diffusion

    Authors: Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

    Abstract: We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly output representations other than polygonal meshes, AToM directly generates high-quality textured meshes in less than 1 second with around 10 times re…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 19 pages with appendix and references. Webpage: https://snap-research.github.io/AToM/

  3. arXiv:2312.14154  [pdf, other]

    cs.CV

    Virtual Pets: Animatable Animal Generation in 3D Scenes

    Authors: Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, Liangyan Gui, Hsin-Ying Lee

    Abstract: Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment. To circumvent the limited availability of 3D motion data aligned with environmental geometry, we leverage monocular internet videos and extract deformable NeRF representations for the fo…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Preprint. Project page: https://yccyenchicheng.github.io/VirtualPets/

  4. arXiv:2310.16167  [pdf, other]

    cs.CV

    iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis

    Authors: Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

    Abstract: We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaver…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to SIGGRAPH Asia, 2023 (Conference Papers)

  5. arXiv:2304.06937  [pdf, other]

    cs.CV

    CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos

    Authors: Tianshu Kuai, Akash Karthikeyan, Yash Kant, Ashkan Mirzaei, Igor Gilitschenski

    Abstract: Animating an object in 3D often requires an articulated structure, e.g. a kinematic chain or skeleton of the manipulated object with proper skinning weights, to obtain smooth movements and surface deformations. However, existing models that allow direct pose manipulations are either limited to specific object categories or built with specialized equipment. To reduce the work needed for creating an…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: Project Page: https://camm3d.github.io/

  6. arXiv:2302.09227  [pdf, other]

    cs.CV cs.GR

    Invertible Neural Skinning

    Authors: Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

    Abstract: Building animatable and editable models of clothed humans from raw 3D scans and poses is a challenging problem. Existing reposing methods suffer from the limited expressiveness of Linear Blend Skinning (LBS), require costly mesh extraction to generate each new pose, and typically do not preserve surface correspondences across different poses. In this work, we introduce Invertible Neural Skinning (…

    Submitted 4 March, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

  7. arXiv:2301.06866  [pdf, other]

    cs.CV

    Building Scalable Video Understanding Benchmarks through Sports

    Authors: Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant

    Abstract: Existing benchmarks for evaluating long video understanding fall short on two critical aspects, either lacking in scale or quality of annotations. These limitations arise from the difficulty in collecting dense annotations for long videos, which often require manually labeling each frame. In this work, we introduce an automated Annotation and Video Stream Alignment Pipeline (abbreviated ASAP). We…

    Submitted 26 March, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  8. arXiv:2207.01583  [pdf, other]

    cs.CV

    LaTeRF: Label and Text Driven Object Radiance Fields

    Authors: Ashkan Mirzaei, Yash Kant, Jonathan Kelly, Igor Gilitschenski

    Abstract: Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper we introduce LaTeRF, a method fo…

    Submitted 18 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Journal ref: European Conference on Computer Vision (ECCV) 2022

  9. arXiv:2205.10712  [pdf, other]

    cs.CV

    Housekeep: Tidying Virtual Households using Commonsense Reasoning

    Authors: Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal

    Abstract: We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging misplaced objects without explicit instructions specifying which objects need to be rearranged. Instead, the agent must learn from and is evaluated against human preferences of which objects belong where in a tidy house. Specifically, w…

    Submitted 21 May, 2022; originally announced May 2022.

  10. arXiv:2111.03994   

    cs.HC cs.CV cs.LG

    NarrationBot and InfoBot: A Hybrid System for Automated Video Description

    Authors: Shasta Ihorn, Yue-Ting Siu, Aditya Bodi, Lothar Narins, Jose M. Castanon, Yash Kant, Abhishek Das, Ilmi Yoon, Pooyan Fazli

    Abstract: Video accessibility is crucial for blind and low vision users for equitable engagements in education, employment, and entertainment. Despite the availability of professional and amateur services and tools, most human-generated descriptions are expensive and time consuming. Moreover, the rate of human-generated descriptions cannot match the speed of video production. To overcome the increasing gaps…

    Submitted 11 January, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: This article has been withdrawn by arXiv administration due to an unresolvable authorship dispute

  11. arXiv:2010.06087  [pdf, other]

    cs.CV

    Contrast and Classify: Training Robust VQA Models

    Authors: Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

    Abstract: Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions. Existing approaches address this by augmenting the dataset with question paraphrases from visual question generation models or adversarial perturbations. These approaches use the combined data to learn an answer classifier by m…

    Submitted 18 April, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

  12. arXiv:2007.12146  [pdf, other]

    cs.CV

    Spatially Aware Multimodal Transformers for TextVQA

    Authors: Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

    Abstract: Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop this assistive technology, we study the TextVQA task, i.e., reasoning about text in images to answer a question. Existing approaches are limited in their use of spatial relations and rely on fully-connected transformer-like architectures to implicitly learn the spatial structure of a scene. I…

    Submitted 22 December, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: Accepted at European Conference on Computer Vision, 2020

  13. arXiv:1901.09517  [pdf, other]

    cs.LG stat.ML

    ICLR Reproducibility Challenge Report (Padam : Closing The Generalization Gap Of Adaptive Gradient Methods in Training Deep Neural Networks)

    Authors: Harshal Mittal, Kartikey Pandey, Yash Kant

    Abstract: This work is part of the ICLR Reproducibility Challenge 2019; we try to reproduce the results in the conference submission PADAM: Closing The Generalization Gap of Adaptive Gradient Methods In Training Deep Neural Networks. Adaptive gradient methods proposed in the past demonstrate degraded generalization performance compared with stochastic gradient descent (SGD) with momentum. The authors try to address…

    Submitted 28 January, 2019; originally announced January 2019.

    Comments: ICLR Reproducibility Challenge 2019 Report for Padam (11 pages, 30 figures)