Showing 1–31 of 31 results for author: Yeh, R

Searching in archive cs.
  1. arXiv:2404.02155  [pdf, other]

    cs.CV

    Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

    Authors: Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich

    Abstract: Scale-ambiguity in 3D scene dimensions leads to magnitude-ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when scene size is halved, and vice versa. We call this property alpha invariance. For NeRFs to better maintain alpha invariance, we recommend 1) parameterizing both distance and volume densities in log space, and 2) a discretization-agnostic initializat…

    Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. project page https://pals.ttic.edu/p/alpha-invariance

  2. arXiv:2403.12553  [pdf, other]

    cs.LG

    Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs

    Authors: Md Ashiqur Rahman, Robert Joseph George, Mogab Elleithy, Daniel Leibovici, Zongyi Li, Boris Bonev, Colin White, Julius Berner, Raymond A. Yeh, Jean Kossaifi, Kamyar Azizzadenesheli, Anima Anandkumar

    Abstract: Existing neural operator architectures face challenges when solving multiphysics problems with coupled partial differential equations (PDEs), due to complex geometries, interactions between physical variables, and the lack of large amounts of high-resolution training data. To address these issues, we propose Codomain Attention Neural Operator (CoDA-NO), which tokenizes functions along the codomain…

    Submitted 5 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  3. arXiv:2312.02967  [pdf, other]

    cs.CV

    AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model

    Authors: Boheng Zhao, Rana Hanocka, Raymond A. Yeh

    Abstract: Ambigrams are calligraphic designs that have different meanings depending on the viewing orientation. Creating ambigrams is a challenging task even for skilled artists, as it requires maintaining the meaning under two different viewpoints at the same time. In this work, we propose to generate ambigrams by distilling a large-scale vision and language diffusion model, namely DeepFloyd IF, to optimiz…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://raymond-yeh.com/AmbiGen/

  4. arXiv:2311.18815  [pdf, other]

    cs.CV

    IMMA: Immunizing text-to-image Models against Malicious Adaptation

    Authors: Amber Yijia Zheng, Raymond A. Yeh

    Abstract: Advancements in text-to-image models and fine-tuning methods have led to the increasing risk of malicious adaptation, i.e., fine-tuning to generate harmful unauthorized content. Recent works, e.g., Glaze or MIST, have developed data-poisoning techniques which protect the data against adaptation methods. In this work, we consider an alternative paradigm for protection. We propose to "immunize" th…

    Submitted 16 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  5. arXiv:2311.02922  [pdf, other]

    cs.LG cs.CV

    Truly Scale-Equivariant Deep Nets with Fourier Layers

    Authors: Md Ashiqur Rahman, Raymond A. Yeh

    Abstract: In computer vision, models must be able to adapt to changes in image resolution to effectively carry out tasks such as image segmentation; this is known as scale-equivariance. Recent works have made progress in developing scale-equivariant convolutional neural networks, e.g., through weight-sharing and kernel resizing. However, these networks are not truly scale-equivariant in practice. Specifical…

    Submitted 6 November, 2023; originally announced November 2023.

  6. arXiv:2305.16316  [pdf, other]

    cs.CV

    Making Vision Transformers Truly Shift-Equivariant

    Authors: Renan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh

    Abstract: For computer vision, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs' output remains sensitive to small spatial shifts in the input, i.e., not shift invariant. To address this shortcoming, we introduce novel data-adaptive designs for each of the modules in ViTs, such as tokenization, self-attention…

    Submitted 28 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  7. arXiv:2212.11715  [pdf, other]

    cs.GR

    GeoCode: Interpretable Shape Programs

    Authors: Ofek Pearl, Itai Lang, Yuhua Hu, Raymond A. Yeh, Rana Hanocka

    Abstract: Mapping high-fidelity 3D geometry to a representation that allows for intuitive edits remains an elusive goal in computer vision and graphics. The key challenge is the need to model both continuous and discrete shape variations. Current approaches, such as implicit shape representation, lack straightforward interpretable encoding, while others that employ procedural methods output coarse geometry.…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: project page: https://threedle.github.io/GeoCode/

  8. arXiv:2212.00774  [pdf, other]

    cs.CV cs.LG

    Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation

    Authors: Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, Greg Shakhnarovich

    Abstract: A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D dat…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: project page https://pals.ttic.edu/p/score-jacobian-chaining

  9. arXiv:2210.08001  [pdf, other]

    cs.CV cs.AI cs.LG

    Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks

    Authors: Renan A. Rojas-Gomez, Teck-Yian Lim, Alexander G. Schwing, Minh N. Do, Raymond A. Yeh

    Abstract: We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be trained end-to-end from data and generalizes existing handcrafted downsampling layers. It is widely applicable as it can be integrated into any convolutional network by replacing down/upsampling layers. We evaluate LPS on ima…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  10. arXiv:2210.05735  [pdf, other]

    cs.CV cs.GR cs.LG

    TetGAN: A Convolutional Neural Network for Tetrahedral Mesh Generation

    Authors: William Gao, April Wang, Gal Metzer, Raymond A. Yeh, Rana Hanocka

    Abstract: We present TetGAN, a convolutional neural network designed to generate tetrahedral meshes. We represent shapes using an irregular tetrahedral grid which encodes an occupancy and displacement field. Our formulation enables defining tetrahedral convolution, pooling, and upsampling operations to synthesize explicit mesh connectivity with variable topological genus. The proposed neural network layers…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to BMVC2022

  11. arXiv:2209.03953  [pdf, other]

    cs.CV cs.LG

    Text-Free Learning of a Natural Language Interface for Pretrained Face Generators

    Authors: Xiaodan Du, Raymond A. Yeh, Nicholas Kolkin, Eli Shechtman, Greg Shakhnarovich

    Abstract: We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. Leveraging the recent advances in Contrastive Language-Image Pre-training (CLIP), no text data is required during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at…

    Submitted 8 September, 2022; originally announced September 2022.

  12. arXiv:2204.03647  [pdf, other]

    cs.CV cs.AI

    Adapting CLIP For Phrase Localization Without Further Training

    Authors: Jiahao Li, Greg Shakhnarovich, Raymond A. Yeh

    Abstract: Supervised or weakly supervised methods for phrase localization (textual grounding) either rely on human annotations or some other supervised models, e.g., object detectors. Obtaining these annotations is labor-intensive and may be difficult to scale in practice. We propose to leverage recent advances in contrastive language-vision models, CLIP, pre-trained on image and caption pairs collected fro…

    Submitted 7 April, 2022; originally announced April 2022.

  13. arXiv:2204.03643  [pdf, other]

    cs.CV

    Total Variation Optimization Layers for Computer Vision

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing

    Abstract: Optimization within a layer of a deep-net has emerged as a new direction for deep-net layer design. However, there are two main challenges when applying these layers to computer vision tasks: (a) which optimization problem within a layer is useful?; (b) how to ensure that computation within a layer remains efficient? To study question (a), in this work, we propose total variation (TV) minimization…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  14. arXiv:2204.03640  [pdf, other]

    cs.LG cs.CV

    Equivariance Discovery by Learned Parameter-Sharing

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing

    Abstract: Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: AISTATS 2022

  15. arXiv:2111.12299  [pdf, other]

    cs.LG

    EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search

    Authors: Qian Jiang, Xiaofan Zhang, Deming Chen, Minh N. Do, Raymond A. Yeh

    Abstract: In hardware-aware Differentiable Neural Architecture Search (DNAS), it is challenging to compute gradients of hardware metrics to perform architecture search. Existing works rely on linear approximations with limited support to customized hardware accelerators. In this work, we propose End-to-end Hardware-aware DNAS (EH-DNAS), a seamless integration of end-to-end hardware benchmarking, and fully a…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: 8 pages, 5 figures

  16. arXiv:2108.03319  [pdf, other]

    cs.AI

    Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning

    Authors: Iou-Jen Liu, Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Solving complex real-world tasks, e.g., autonomous fleet control, often involves a coordinated team of multiple agents which learn strategies from visual inputs via reinforcement learning. Many existing multi-agent reinforcement learning (MARL) algorithms however don't scale to environments where agents operate on visual inputs. To address this issue, algorithmically, recent works have focused on…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: IROS 2021; Project page: https://ioujenliu.github.io/SemanticTracklets/

  17. arXiv:2107.11444  [pdf, other]

    cs.AI

    Cooperative Exploration for Multi-Agent Deep Reinforcement Learning

    Authors: Iou-Jen Liu, Unnat Jain, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, existing methods suffer from a common challenge: agents struggle to ide…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: ICML 2021; Project Page: https://ioujenliu.github.io/CMAE/

  18. arXiv:2106.06927  [pdf, other]

    cs.CV cs.LG cs.NE

    Inverting Adversarially Robust Networks for Image Synthesis

    Authors: Renan A. Rojas-Gomez, Raymond A. Yeh, Minh N. Do, Anh Nguyen

    Abstract: Despite unconditional feature inversion being the foundation of many image synthesis applications, training an inverter demands a high computational budget, large decoding capacity and imposing conditions such as autoregressive priors. To address these limitations, we propose the use of adversarially robust representations as a perceptual primitive for feature inversion. We train an adversarially…

    Submitted 21 October, 2022; v1 submitted 13 June, 2021; originally announced June 2021.

    Comments: Accepted at the 16th Asian Conference on Computer Vision (ACCV 2022)

  19. arXiv:2105.08612  [pdf, other]

    cs.CV cs.GR cs.LG

    SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data

    Authors: Yuan-Ting Hu, Jiahong Wang, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Extracting detailed 3D information of objects from video data is an important goal for holistic scene understanding. While recent methods have shown impressive results when reconstructing meshes of objects from a single image, results often remain ambiguous as part of the object is unobserved. Moreover, existing image-based datasets for mesh reconstruction don't permit to study models which integr…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Oral

  20. arXiv:2012.09849  [pdf, other]

    cs.LG cs.AI

    High-Throughput Synchronous Deep RL

    Authors: Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Deep reinforcement learning (RL) is computationally demanding and requires processing of many data points. Synchronous methods enjoy training stability while having lower data throughput. In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to 'stale policies.' To combine the advantages of both methods we propose High-Throughput…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: Accepted to NeurIPS 2020; Project page: https://ioujenliu.github.io/HTS-RL/

  21. arXiv:2011.12022  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

    Authors: Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson

    Abstract: We propose an end-to-end trainable approach to single-channel speech separation with unknown number of speakers. Our approach extends the MulCat source separation backbone with additional output heads: a count-head to infer the number of speakers, and decoder-heads for reconstructing the original signals. Beyond the model, we also propose a metric on how to evaluate source separation with variable…

    Submitted 30 November, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Project Page: https://junzhejosephzhu.github.io/Multi-Decoder-DPRNN/ Submitted to ICASSP 2021

  22. arXiv:2007.01293  [pdf, other]

    cs.LG cs.CV stat.ML

    Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

    Authors: Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study how to use a different weight for every unlabeled example. Manual tuning of all those weights -- as done in prior work -- is no longer possible. Instead, we adjus…

    Submitted 29 October, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: NeurIPS camera ready

  23. arXiv:1911.00029  [pdf, other]

    cs.CV cs.LG

    Chirality Nets for Human Pose Regression

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Alexander G. Schwing

    Abstract: We propose Chirality Nets, a family of deep nets that is equivariant to the "chirality transform," i.e., the transformation to create a chiral pair. Through parameter sharing, odd and even symmetry, we propose and prove variants of standard building blocks of deep nets that satisfy the equivariance property, including fully connected layers, convolutional layers, batch-normalization, and LSTM/GRU…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Accepted to NeurIPS2019

  24. arXiv:1911.00025  [pdf, other]

    cs.LG cs.CV stat.ML

    PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

    Authors: Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Sample efficiency and scalability to a large number of agents are two important goals for multi-agent reinforcement learning systems. Recent works got us closer to those goals, addressing non-stationarity of the environment from a single agent's perspective by utilizing a deep net critic which depends on all observations and actions. The critic input concatenates agent observations and actions in…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Accepted to CORL2019

  25. arXiv:1811.08815  [pdf, other]

    cs.CV

    Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

    Authors: Khoi-Nguyen C. Mac, Dhiraj Joshi, Raymond A. Yeh, Jinjun Xiong, Rogerio S. Feris, Minh N. Do

    Abstract: Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction. Existing methods typically utilize a two-stage approach including extraction of local spatio-temporal features followed by temporal modeling to capture long-term dependencies. While most recent papers have focused on the latter (long-temporal modeling), here, we focus on produc…

    Submitted 6 November, 2019; v1 submitted 21 November, 2018; originally announced November 2018.

    Comments: Accepted at ICCV 2019 as oral

  26. arXiv:1803.11209  [pdf, other]

    cs.CV

    Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts

    Authors: Raymond A. Yeh, Jinjun Xiong, Wen-mei W. Hwu, Minh N. Do, Alexander G. Schwing

    Abstract: Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all po…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to NIPS 2017

  27. arXiv:1803.11185  [pdf, other]

    cs.CV

    Unsupervised Textual Grounding: Linking Words to Image Concepts

    Authors: Raymond A. Yeh, Minh N. Do, Alexander G. Schwing

    Abstract: Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. To train these deep net based approaches, access to a large-scale da…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to CVPR 2018

  28. arXiv:1702.03351  [pdf, other]

    cs.NI

    Forward Collision Vehicular Radar with IEEE 802.11: Feasibility Demonstration through Measurements

    Authors: Enoch R. Yeh, Robert C. Daniels, Robert W. Heath, Jr

    Abstract: Increasing safety and automation in transportation systems has led to the proliferation of radar and IEEE 802.11 dedicated short range communication (DSRC) in vehicles. Current implementations of vehicular radar devices, however, are expensive, use a substantial amount of bandwidth, and are susceptible to multiple security risks. Consider the feasibility of using an IEEE 802.11 orthogonal frequenc…

    Submitted 5 June, 2017; v1 submitted 10 February, 2017; originally announced February 2017.

  29. arXiv:1702.02463  [pdf, other]

    cs.CV cs.GR cs.LG

    Video Frame Synthesis using Deep Voxel Flow

    Authors: Ziwei Liu, Raymond A. Yeh, Xiaoou Tang, Yiming Liu, Aseem Agarwala

    Abstract: We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation), or subsequent to them (extrapolation). This problem is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucina…

    Submitted 5 August, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

    Comments: To appear in ICCV 2017 as an oral paper. More details at the project page: https://liuziwei7.github.io/projects/VoxelFlow.html

  30. arXiv:1611.09961  [pdf, other]

    cs.CV

    Semantic Facial Expression Editing using Autoencoded Flow

    Authors: Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala

    Abstract: High-level manipulation of facial expressions in images --- such as changing a smile to a neutral expression --- is challenging because facial expression changes are highly non-linear, and vary depending on the appearance of the face. We present a fully automatic approach to editing faces that combines the advantages of flow-based face manipulation with the more recent generative capabilities of V…

    Submitted 29 November, 2016; originally announced November 2016.

  31. arXiv:1607.07539  [pdf, other]

    cs.CV

    Semantic Image Inpainting with Deep Generative Models

    Authors: Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do

    Abstract: Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditionin…

    Submitted 13 July, 2017; v1 submitted 26 July, 2016; originally announced July 2016.