Showing 1–22 of 22 results for author: Zhuang, P

Searching in archive cs.
  1. arXiv:2406.07472

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2406.05649

    cs.CV cs.AI

    GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

    Authors: Peiye Zhuang, Songfang Han, Chaoyang Wang, Aliaksandr Siarohin, Jiaxu Zou, Michael Vasilkovsky, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. However, in our method, we introduce several important modifications that allow us to significantly enhance 3D reconstruction quali…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 19 pages, 17 figures. Project page: https://snap-research.github.io/GTR/

  3. arXiv:2403.16056

    cs.CL cs.AI

    Qibo: A Large Language Model for Traditional Chinese Medicine

    Authors: Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo

    Abstract: Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident pre…

    Submitted 22 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  4. arXiv:2402.00867

    cs.CV

    AToM: Amortized Text-to-Mesh using 2D Diffusion

    Authors: Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

    Abstract: We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly output representations other than polygonal meshes, AToM directly generates high-quality textured meshes in less than 1 second with around 10 times re…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 19 pages with appendix and references. Webpage: https://snap-research.github.io/AToM/

  5. arXiv:2401.05583

    cs.CV

    Diffusion Priors for Dynamic View Synthesis from Monocular Videos

    Authors: Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov

    Abstract: Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos. Existing methods struggle to distinguish between motion and structure, particularly in scenarios where camera poses are either unknown or constrained compared to object motion. Furthermore, with information solely from reference images, it is extremely challenging to hallucinate unseen regions t…

    Submitted 10 January, 2024; originally announced January 2024.

  6. arXiv:2312.08885

    cs.CV

    SceneWiz3D: Towards Text-guided 3D Scene Composition

    Authors: Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce Scen…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Project page: https://zqh0253.github.io/SceneWiz3D/

  7. arXiv:2305.18766

    cs.CV cs.AI cs.LG

    HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

    Authors: Junzhe Zhu, Peiye Zhuang, Sanmi Koyejo

    Abstract: The advancements in automatic text-to-3D generation have been remarkable. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations like Neural Radiance Fields (NeRFs) via latent-space denoising score matching. Yet, these methods often result in artifacts and inconsistencies across different views due to their suboptimal optimization approaches and limited…

    Submitted 11 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Project page: https://hifa-team.github.io/HiFA-site/

  8. arXiv:2303.00165

    cs.CV cs.AI

    Diffusion Probabilistic Fields

    Authors: Peiye Zhuang, Samira Abnar, Jiatao Gu, Alex Schwing, Joshua M. Susskind, Miguel Ángel Bautista

    Abstract: Diffusion probabilistic models have quickly become a major approach for generative modeling of images, 3D geometry, video and other domains. However, to adapt diffusion generative modeling to these domains the denoising network needs to be carefully designed for each domain independently, oftentimes under the assumption that data lives in a Euclidean grid. In this paper we introduce Diffusion Prob…

    Submitted 28 February, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023. 20 pages, 17 figures

  9. arXiv:2211.03930

    cs.CV cs.MM eess.IV

    ReLoc: A Restoration-Assisted Framework for Robust Image Tampering Localization

    Authors: Peiyu Zhuang, Haodong Li, Rui Yang, Jiwu Huang

    Abstract: With the spread of tampered images, locating the tampered regions in digital images has drawn increasing attention. The existing image tampering localization methods, however, suffer from severe performance degradation when the tampered images are subjected to some post-processing, as the tampering traces would be distorted by the post-processing operations. The poor robustness against post-proces…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 12 pages, 5 figures

  10. arXiv:2210.05835

    cs.CV cs.AI cs.LG

    Synthetic Power Analyses: Empirical Evaluation and Application to Cognitive Neuroimaging

    Authors: Peiye Zhuang, Bliss Chapman, Ran Li, Oluwasanmi Koyejo

    Abstract: In the experimental sciences, statistical power analyses are often used before data collection to determine the required sample size. However, traditional power analyses can be costly when data are difficult or expensive to collect. We propose synthetic power analyses: a framework for estimating statistical power at various sample sizes, and empirically explore the performance of synthetic power a…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to Asilomar 2019

  11. arXiv:2210.05828

    cs.CV cs.AI cs.LG

    AMICO: Amodal Instance Composition

    Authors: Peiye Zhuang, Jia-bin Huang, Ayush Saraf, Xuejian Rong, Changil Kim, Denis Demandolx

    Abstract: Image composition aims to blend multiple objects to form a harmonized image. Existing approaches often assume precisely segmented and intact objects. Such assumptions, however, are hard to satisfy in unconstrained scenarios. We present Amodal Instance Composition for compositing imperfect -- potentially incomplete and/or coarsely segmented -- objects onto a target image. We first develop object sh…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to BMVC 2021, 20 pages, 12 figures

  12. arXiv:2210.05825

    cs.CV cs.AI

    Controllable Radiance Fields for Dynamic Face Synthesis

    Authors: Peiye Zhuang, Liqian Ma, Oluwasanmi Koyejo, Alexander G. Schwing

    Abstract: Recent work on 3D-aware image synthesis has achieved compelling results using advances in neural rendering. However, 3D-aware synthesis of face dynamics hasn't received much attention. Here, we study how to explicitly control generative model synthesis of face dynamics exhibiting non-rigid motion (e.g., facial expression change), while simultaneously ensuring 3D-awareness. For this we propose a Co…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to 3DV 2022. 13 pages, 15 figures

  13. arXiv:2205.09185

    physics.ins-det cs.LG hep-ex nucl-ex physics.comp-ph

    AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

    Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

    Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…

    Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 16 pages, 18 figures, 2 appendices, 3 tables

  14. arXiv:2107.12618

    cs.CV

    Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021

    Authors: Haisheng Su, Peiqin Zhuang, Yukun Li, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao

    Abstract: This technical report presents an overview of our solution used in the submission to 2021 HACS Temporal Action Localization Challenge on both Supervised Learning Track and Weakly-Supervised Learning Track. Temporal Action Localization (TAL) requires to not only precisely locate the temporal boundaries of action instances, but also accurately classify the untrimmed videos into specific categories.…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: Winner of HACS21 Challenge Weakly Supervised Learning Track with extra data. arXiv admin note: text overlap with arXiv:2103.13141

  15. arXiv:2102.09471

    cs.CV cs.LG

    DeeperForensics Challenge 2020 on Real-World Face Forgery Detection: Methods and Results

    Authors: Liming Jiang, Zhengkui Guo, Wayne Wu, Zhaoyang Liu, Ziwei Liu, Chen Change Loy, Shuo Yang, Yuanjun Xiong, Wei Xia, Baoying Chen, Peiyu Zhuang, Sili Li, Shen Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Liujuan Cao, Rongrong Ji, Changlei Lu, Ganchao Tan

    Abstract: This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection. The challenge employs the DeeperForensics-1.0 dataset, one of the most extensive publicly available real-world face forgery detection datasets, with 60,000 videos constituted by a total of 17.6 million frames. The model evaluation is conducted online on a high-quality hidden test set…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: Technical report. Challenge website: https://competitions.codalab.org/competitions/25228

  16. arXiv:2102.01187

    cs.CV

    Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation

    Authors: Peiye Zhuang, Oluwasanmi Koyejo, Alexander G. Schwing

    Abstract: Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Net (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from attribute edits that are entangled, glo…

    Submitted 28 March, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted to ICLR 2021. 14 pages, 15 figures

  17. arXiv:2007.09867

    cs.CV

    Interpretable Foreground Object Search As Knowledge Distillation

    Authors: Boren Li, Po-Yu Zhuang, Jian Gu, Mingyang Li, Ping Tan

    Abstract: This paper proposes a knowledge distillation method for foreground object search (FoS). Given a background and a rectangle specifying the foreground location and scale, FoS retrieves compatible foregrounds in a certain category for later image composition. Foregrounds within the same category can be grouped into a small number of patterns. Instances within each pattern are compatible with any quer…

    Submitted 21 July, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: This paper will appear at ECCV 2020

  18. arXiv:2007.05597

    eess.IV cs.CV cs.LG

    EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision

    Authors: Siddharth Biswal, Peiye Zhuang, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Jimeng Sun

    Abstract: Deep generative models have enabled the automated synthesis of high-quality data for diverse applications. However, the most effective generative models are specialized to data from a single domain (e.g., images or text). Real-world applications such as healthcare require multi-modal data from multiple domains (e.g., both images and corresponding text), which are difficult to acquire due to limite…

    Submitted 15 January, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  19. arXiv:2002.10191

    cs.CV

    Learning Attentive Pairwise Interaction for Fine-Grained Classification

    Authors: Peiqin Zhuang, Yali Wang, Yu Qiao

    Abstract: Fine-grained classification is a challenging problem, due to subtle differences among highly-confused categories. Most approaches address this difficulty by learning discriminative representation of individual input image. On the other hand, humans can effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwi…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: Accepted at AAAI-2020

  20. arXiv:1907.06134

    cs.CV cs.LG eess.IV

    FMRI data augmentation via synthesis

    Authors: Peiye Zhuang, Alexander G. Schwing, Sanmi Koyejo

    Abstract: We present an empirical evaluation of fMRI data augmentation via synthesis. For synthesis we use generative models trained on real neuroimaging data to produce novel task-dependent functional brain images. Analyzed generative models include classic approaches such as the Gaussian mixture model (GMM), and modern implicit generative models such as the generative adversarial network (GAN) and the v…

    Submitted 13 July, 2019; originally announced July 2019.

  21. arXiv:1906.05251

    cs.CV cs.LG eess.IV

    Compressed Sensing MRI via a Multi-scale Dilated Residual Convolution Network

    Authors: Yuxiang Dai, Peixian Zhuang

    Abstract: Magnetic resonance imaging (MRI) reconstruction is an active inverse problem which can be addressed by conventional compressed sensing (CS) MRI algorithms that exploit the sparse nature of MRI in an iterative optimization-based manner. However, two main drawbacks of iterative optimization-based CSMRI methods are time-consuming and are limited in model capacity. Meanwhile, one main challenge for re…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: 27 pages and 13 figures

  22. arXiv:1710.00967

    cs.RO

    Mechanical Design of a Cartesian Manipulator for Warehouse Pick and Place

    Authors: M. McTaggart, D. Morrison, A. W. Tow, R. Smith, Norton Kelly-Boxall, Anton Milan, T. Pham, Zheyu Zhuang, J. Leitner, I. Reid, P. Corke, C. Lehnert

    Abstract: Robotic manipulation and grasping in cluttered and unstructured environments is a current challenge for robotics. Enabling robots to operate in these challenging environments has direct applications from automating warehouses to harvesting fruit in agriculture. One of the main challenges associated with these difficult robotic manipulation tasks is the motion planning and control problem for mult…

    Submitted 18 June, 2018; v1 submitted 2 October, 2017; originally announced October 2017.

    Comments: ACRV Tech Report

    Report number: ACRV-TR-2017-02