Showing 1–50 of 51 results for author: Tao, A

  1. arXiv:2405.19335  [pdf, other]

    cs.CV cs.CL cs.LG

    X-VILA: Cross-Modality Alignment for Large Language Model

    Authors: Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

    Abstract: We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To facilitate this cross-modality alignment, we curate an effectiv…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Technical Report

  2. arXiv:2405.13899  [pdf, ps, other]

    stat.ML cs.LG

    Symmetric Linear Bandits with Hidden Symmetry

    Authors: Nam Phuong Tran, The Anh Ta, Debmalya Mandal, Long Tran-Thanh

    Abstract: High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be available in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the…

    Submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2402.07067  [pdf, other]

    cs.GT cs.LG

    Learning the Expected Core of Strictly Convex Stochastic Cooperative Games

    Authors: Nam Phuong Tran, The Anh Ta, Shuqing Shi, Debmalya Mandal, Yali Du, Long Tran-Thanh

    Abstract: Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning. An important concept in reward allocation is the core, which is the set of stable allocations where no agent has the motivation to deviate from the grand coalition. In previous works, computing the core requires either knowledge of the reward function in dete…

    Submitted 22 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  4. arXiv:2312.07533  [pdf, other]

    cs.CV

    VILA: On Pre-training for Visual Language Models

    Authors: Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han

    Abstract: Visual language models (VLMs) rapidly progressed with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but lacks an in-depth study of the visual language pre-training process, where the model learns to perform joint modeling on both modalities. In this work, we examine the design options for VLM pre-trai…

    Submitted 16 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  5. arXiv:2308.10008  [pdf, ps, other]

    eess.SY cs.RO

    What is the Impact of Releasing Code with Publications? Statistics from the Machine Learning, Robotics, and Control Communities

    Authors: Siqi Zhou, Lukas Brunke, Allen Tao, Adam W. Hall, Federico Pizarro Bejarano, Jacopo Panerati, Angela P. Schoellig

    Abstract: Open-sourcing research publications is a key enabler for the reproducibility of studies and the collective scientific progress of a research community. As all fields of science develop more advanced algorithms, we become more dependent on complex computational toolboxes -- sharing research ideas solely through equations and proofs is no longer sufficient to communicate scientific developments. Ove…

    Submitted 19 August, 2023; originally announced August 2023.

  6. arXiv:2307.00440  [pdf, other]

    math.CO

    Friezes over $\mathbb Z[\sqrt{2}]$

    Authors: Esther Banaian, Libby Farrell, Amy Tao, Kayla Wright, Joy Zhichun Zhang

    Abstract: A frieze on a polygon is a map from the diagonals of the polygon to an integral domain which respects the Ptolemy relation. Conway and Coxeter previously studied positive friezes over $\mathbb{Z}$ and showed that they are in bijection with triangulations of a polygon. We extend their work by studying friezes over $\mathbb Z[\sqrt{2}]$ and their relationships to dissections of polygons. We largely…

    Submitted 1 July, 2023; originally announced July 2023.

  7. arXiv:2306.15840  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci physics.chem-ph

    Molecular-Scale Visualization of Steric Effects of Ligand Binding to Reconstructed Au(111) Surfaces

    Authors: Liya Bi, Sasawat Jamnuch, Amanda Chen, Alexandria Do, Krista P. Balto, Zhe Wang, Qingyi Zhu, Yufei Wang, Yanning Zhang, Andrea R. Tao, Tod A. Pascal, Joshua S. Figueroa, Shaowei Li

    Abstract: Direct imaging of single molecules at nanostructured interfaces is a grand challenge, with potential to enable new, precise material architectures and technologies. Of particular interest are the structural morphology and spectroscopic signatures of the adsorbed molecule, where modern probes are only now being developed with the necessary spatial and energetic resolution to provide detailed inform…

    Submitted 28 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  8. arXiv:2306.11071  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph

    ColabFit Exchange: open-access datasets for data-driven interatomic potentials

    Authors: Joshua A. Vita, Eric G. Fuemmeler, Amit Gupta, Gregory P. Wolfe, Alexander Quanming Tao, Ryan S. Elliott, Stefano Martiniani, Ellad B. Tadmor

    Abstract: Data-driven (DD) interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with…

    Submitted 6 September, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

  9. arXiv:2306.06189  [pdf, other]

    cs.CV cs.AI cs.LG

    FasterViT: Fast Vision Transformers with Hierarchical Attention

    Authors: Ali Hatamizadeh, Greg Heinrich, Hongxu Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

    Abstract: We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and global modeling properties in ViT. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-…

    Submitted 1 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: ICLR'24 Accepted Paper

  10. arXiv:2306.02991  [pdf, other]

    physics.atom-ph cond-mat.quant-gas

    Second-scale rotational coherence and dipolar interactions in a gas of ultracold polar molecules

    Authors: Philip D. Gregory, Luke M. Fernley, Albert Li Tao, Sarah L. Bromley, Jonathan Stepp, Zewen Zhang, Svetlana Kotochigova, Kaden R. A. Hazzard, Simon L. Cornish

    Abstract: Ultracold polar molecules uniquely combine a rich structure of long-lived internal states with access to controllable long-range, anisotropic dipole-dipole interactions. In particular, the rotational states of polar molecules confined in optical tweezers or optical lattices may be used to encode interacting qubits for quantum computation or pseudo-spins for simulating quantum magnetism. As with al…

    Submitted 11 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 12 pages, 7 figures (main text and supplementary information combined)

  11. arXiv:2305.11102  [pdf, other]

    cs.CV

    Progressive Learning of 3D Reconstruction Network from 2D GAN Data

    Authors: Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

    Abstract: This paper presents a method to reconstruct high-quality textured 3D models from single images. Current methods rely on datasets with expensive annotations; multi-view images and their camera parameters. Our method relies on GAN generated multi-view image datasets which have a negligible annotation cost. However, they are not strictly multi-view consistent and sometimes GANs output distorted image…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Web-page: https://research.nvidia.com/labs/adlr/progressive-3d-learning. arXiv admin note: text overlap with arXiv:2203.09362

  12. arXiv:2305.10474  [pdf, other]

    cs.CV cs.GR cs.LG

    Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

    Authors: Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is co…

    Submitted 25 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: ICCV 2023. Project webpage: https://research.nvidia.com/labs/dir/pyoco

  13. The James Webb Space Telescope Mission

    Authors: Jonathan P. Gardner, John C. Mather, Randy Abbott, James S. Abell, Mark Abernathy, Faith E. Abney, John G. Abraham, Roberto Abraham, Yasin M. Abul-Huda, Scott Acton, Cynthia K. Adams, Evan Adams, David S. Adler, Maarten Adriaensen, Jonathan Albert Aguilar, Mansoor Ahmed, Nasif S. Ahmed, Tanjira Ahmed, Rüdeger Albat, Loïc Albert, Stacey Alberts, David Aldridge, Mary Marsha Allen, Shaune S. Allen, Martin Altenburg, et al. (983 additional authors not shown)

    Abstract: Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least $4m$. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the $6.5m$ James Webb Space Telescope. A generation of astrono…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures

  14. arXiv:2211.02152  [pdf, other]

    math.OC

    Binary-Continuous Sum-of-ratios Optimization: Discretization, Approximations, and Convex Reformulations

    Authors: Tien Mai, Ngan Ha Duong, Thuy Anh Ta

    Abstract: We study a class of non-convex sum-of-ratios programs which can be used for decision-making in prominent areas such as product assortment and price optimization, facility location, and security games. Such an optimization problem involves both continuous and binary decision variables and is known to be highly non-convex and intractable to solve. We explore a discretization approach to approximate…

    Submitted 3 November, 2022; originally announced November 2022.

  15. Dynamics-aware Adversarial Attack of Adaptive Neural Networks

    Authors: An Tao, Yueqi Duan, Yingqi Wang, Jiwen Lu, Jie Zhou

    Abstract: In this paper, we investigate the dynamics-aware adversarial attack problem of adaptive neural networks. Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process. However, this assumption does not hold for many recently proposed adaptive neural networks, which adaptively deactivate unnecessary execution uni…

    Submitted 10 January, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2112.09428

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2024

  16. arXiv:2205.07345  [pdf, other]

    math.OC econ.EM

    Joint Location and Cost Planning in Maximum Capture Facility Location under Multiplicative Random Utility Maximization

    Authors: Ngan Ha Duong, Tien Thanh Dam, Thuy Anh Ta, Tien Mai

    Abstract: We study a joint facility location and cost planning problem in a competitive market under random utility maximization (RUM) models. The objective is to locate new facilities and make decisions on the costs (or budgets) to spend on the new facilities, aiming to maximize an expected captured customer demand, assuming that customers choose a facility among all available facilities according to a RUM…

    Submitted 11 February, 2023; v1 submitted 15 May, 2022; originally announced May 2022.

    Journal ref: Computer and Operations Research (2023)

  17. arXiv:2203.09362  [pdf, other]

    cs.CV

    Fine Detailed Texture Learning for 3D Meshes with Generative Models

    Authors: Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

    Abstract: This paper presents a method to reconstruct high-quality textured 3D models from both multi-view and single-view images. The reconstruction is posed as an adaptation problem and is done progressively where in the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network. In the generative learning pipeli…

    Submitted 17 March, 2022; originally announced March 2022.

  18. arXiv:2202.00011  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

    Authors: Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

    Abstract: Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this…

    Submitted 30 October, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: WACV 2024

  19. arXiv:2112.09428   

    cs.CV

    Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

    Authors: An Tao, Yueqi Duan, He Wang, Ziyi Wu, Pengliang Ji, Haowen Sun, Jie Zhou, Jiwen Lu

    Abstract: In this paper, we investigate the dynamics-aware adversarial attack problem in deep neural networks. Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process. However, this assumption does not hold for many recently proposed networks, e.g. 3D sparse convolution network, which contains input-dependent execut…

    Submitted 20 January, 2023; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: We have improved the quality of this work and updated a new version to address the limitations of the proposed method

  20. arXiv:2111.13587  [pdf, other]

    cs.CV cs.LG

    Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

    Authors: John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

    Abstract: Vision transformers have delivered tremendous success in representation learning. This is primarily due to effective token mixing through self attention. However, this scales quadratically with the number of pixels, which becomes infeasible for high-resolution inputs. To cope with this challenge, we propose Adaptive Fourier Neural Operator (AFNO) as an efficient token mixer that learns to mix in t…

    Submitted 27 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

  21. arXiv:2110.08497  [pdf, other]

    math.OC

    Robust Maximum Capture Facility Location under Random Utility Maximization Models

    Authors: Anh Thuy Ta, Tien Thanh Dam, Tien Mai

    Abstract: We study a robust version of the maximum capture facility location problem in a competitive market, assuming that each customer chooses among all available facilities according to a random utility maximization (RUM) model. We employ the generalized extreme value (GEV) family of models and assume that the parameters of the RUM model are not given exactly but lie in convex uncertainty sets. The prob…

    Submitted 11 February, 2023; v1 submitted 16 October, 2021; originally announced October 2021.

    Journal ref: European Journal of Operational Research (2023)

  22. arXiv:2108.13394  [pdf, ps, other]

    math.CO

    Topology of augmented Bergman complexes

    Authors: Elisabeth Bullock, Aidan Kelley, Victor Reiner, Kevin Ren, Gahl Shemy, Dawei Shen, Brian Sun, Amy Tao, Zhichun Joy Zhang

    Abstract: The augmented Bergman complex of a matroid is a simplicial complex introduced recently in work of Braden, Huh, Matherne, Proudfoot and Wang. It may be viewed as a hybrid of two well-studied pure shellable simplicial complexes associated to matroids: the independent set complex and Bergman complex. It is shown here that the augmented Bergman complex is also shellable, via two different families o…

    Submitted 16 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Very minor edits

    MSC Class: 05B35; 52B22; 06A07

  23. arXiv:2106.06533  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    View Generalization for Single Image Textured 3D Models

    Authors: Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, Bryan Catanzaro

    Abstract: Humans can easily infer the underlying 3D geometry and texture of an object only from a single 2D image. Current computer vision methods can do this, too, but suffer from view generalization problems - the models inferred tend to make poor predictions of appearance in novel views. As for generalization problems in machine learning, the difficulty is balancing single-view accuracy (cf. training err…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: CVPR 2021. Project website: https://nv-adlr.github.io/view-generalization

  24. arXiv:2104.02983  [pdf, other]

    math.NA

    Optimal fire allocation in a combat model of mixed NCW type

    Authors: My A. Vu, Nam H. Nguyen, Hanh Le T. Nguyen, Anh N. Ta, Mong H. Nguyen

    Abstract: In this work, we introduce a nonlinear Lanchester model of NCW-type and study a problem of finding the optimal fire allocation for this model. A Blue party $B$ will fight against a Red party consisting of $A$ and $R$, where $A$ is an independent force and $R$ fights with supports from a supply unit $N$. A battle may consist of several stages but we consider the problem of finding optimal fire allo…

    Submitted 7 April, 2021; originally announced April 2021.

  25. arXiv:2103.16748  [pdf, other]

    cs.CV cs.GR

    Dual Contrastive Loss and Attention for GANs

    Authors: Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, Mario Fritz

    Abstract: Generative Adversarial Networks (GANs) produce impressive results on unconditional image generation when powered with large-scale image datasets. Yet generated images are still easy to spot especially on datasets with high variance (e.g. bedroom, church). In this paper, we propose various improvements to further push the boundaries in image generation. Specifically, we propose a novel dual contras…

    Submitted 17 March, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted to ICCV'21

  26. Submodularity and Local Search Approaches for Maximum Capture Problems under Generalized Extreme Value Models

    Authors: Tien Thanh Dam, Thuy Anh Ta, Tien Mai

    Abstract: We study the maximum capture problem in facility location under random utility models, i.e., the problem of seeking to locate new facilities in a competitive market such that the captured user demand is maximized, assuming that each customer chooses among all available facilities according to a random utility maximization model. We employ the generalized extreme value (GEV) family of discrete choi…

    Submitted 10 February, 2021; originally announced February 2021.

    Journal ref: European Journal of Operational Research - 300(2022) 953-965

  27. SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation

    Authors: An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou

    Abstract: Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require point-level labels for every point in the scene. However, such strong supervision suffers from large annotation costs, arousing the need to study efficient annotating. In this paper, we discover that the locations of instances matter for both instance and semantic 3D scene…

    Submitted 24 July, 2022; v1 submitted 18 December, 2020; originally announced December 2020.

    Journal ref: IEEE Transactions on Image Processing, vol. 31, pp. 4952-4965, 2022

  28. arXiv:2009.00197  [pdf, other]

    eess.IV q-bio.QM

    Deep unsupervised learning for Microscopy-Based Malaria detection

    Authors: Alexander Tao, Boran Han

    Abstract: Malaria, a mosquito-borne disease caused by a parasite, kills over 1 million people globally each year. People, if left untreated, may develop severe complications, leading to death. Effective and accurate diagnosis is important for the management and control of malaria. Our research focuses on utilizing machine learning to improve the efficiency in Malaria diagnosis. We utilize a modified U-net a…

    Submitted 31 August, 2020; originally announced September 2020.

  29. arXiv:2008.05250  [pdf, ps, other]

    cs.AI math.NA

    Optimizing fire allocation in a NCW-type model

    Authors: Nam Hong Nguyen, My Anh Vu, Dinh Van Bui, Anh Ngoc Ta, Manh Duc Hy

    Abstract: In this paper, we introduce a non-linear Lanchester model of NCW-type and investigate an optimization problem for this model, where only the Red force is supplied by several supply agents. Optimal fire allocation of the Blue force is sought in the form of a piece-wise constant function of time. A threatening rate is computed for the Red force and each of its supply agents at the beginning of each…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: 6 pages on NCW-type model

  30. arXiv:2007.07243  [pdf, other]

    cs.CV cs.GR

    Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

    Authors: Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

    Abstract: Conventional CNNs for texture synthesis consist of a sequence of (de)-convolution and up/down-sampling layers, where each layer operates locally and lacks the ability to capture the long-term structural dependency required by texture synthesis. Thus, they often simply enlarge the input texture, rather than perform reasonable synthesis. As a compromise, many recent methods sacrifice generalizabilit…

    Submitted 14 July, 2020; originally announced July 2020.

  31. arXiv:2005.10821  [pdf, other]

    cs.CV

    Hierarchical Multi-Scale Attention for Semantic Segmentation

    Authors: Andrew Tao, Karan Sapra, Bryan Catanzaro

    Abstract: Multi-scale inference is commonly used to improve the results of semantic segmentation. Multiple image scales are passed through a network and then the results are combined with averaging or max pooling. In this work, we present an attention-based approach to combining multi-scale predictions. We show that predictions at certain scales are better at resolving particular failure modes, and that t…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: 11 pages, 5 figures

  32. arXiv:2004.10289  [pdf, other]

    cs.CV

    Panoptic-based Image Synthesis

    Authors: Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro

    Abstract: Conditional image synthesis for generating photorealistic images serves various applications for content editing to content generation. Previous conditional image synthesis algorithms mostly rely on semantic maps, and often fail in complex environments where multiple instances occlude each other. We propose a panoptic aware image synthesis network to generate high fidelity and photorealistic image…

    Submitted 21 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  33. arXiv:2001.09518  [pdf, other]

    cs.CV

    Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos

    Authors: Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

    Abstract: Unsupervised landmark learning is the task of learning semantic keypoint-like representations without the use of expensive input keypoint-level annotations. A popular approach is to factorize an image into a pose and appearance data stream, then to reconstruct the image from the factorized components. The pose representation should capture a set of consistent and tightly localized landmarks in ord…

    Submitted 26 January, 2020; originally announced January 2020.

  34. arXiv:1912.11683  [pdf, other]

    cs.CV cs.LG eess.IV

    Neural ODEs for Image Segmentation with Level Sets

    Authors: Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

    Abstract: We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method. Our approach parametrizes the evolution of an initial contour with a NODE that implicitly learns from data a speed function describing the evolution. In addition, for cases where an initial contour is not available and to alleviate the need for careful choice or…

    Submitted 25 December, 2019; originally announced December 2019.

  35. arXiv:1910.12713  [pdf, other]

    cs.CV cs.GR cs.LG

    Few-shot Video-to-Video Synthesis

    Authors: Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro

    Abstract: Video-to-video synthesis (vid2vid) aims at converting an input semantic video, such as videos of human poses or segmentation masks, to an output photorealistic video. While the state-of-the-art of vid2vid has advanced significantly, existing approaches share two major limitations. First, they are data-hungry. Numerous images of a target human subject or a scene are required for training. Second, a…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: In NeurIPS, 2019

  36. arXiv:1909.02749  [pdf, other]

    cs.CV cs.LG stat.ML

    Video Interpolation and Prediction with Unsupervised Landmarks

    Authors: Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

    Abstract: Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical flow based techniques generalize but are suitable only for short temporal ranges. Many methods opt to project the video frames to a low dimensional latent space,…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: Technical Report

  37. arXiv:1906.05928  [pdf, other]

    cs.CV

    Unsupervised Video Interpolation Using Cycle Consistency

    Authors: Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro

    Abstract: Learning to synthesize high frame rate videos via interpolation requires large quantities of high frame rate training videos, which, however, are scarce, especially at high resolutions. Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. For a triplet of consecutive frames, we optimize models to minimize the dis…

    Submitted 27 March, 2021; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Published in ICCV 2019. Codes are available at https://github.com/NVIDIA/unsupervised-video-interpolation. Project website https://nv-adlr.github.io/publication/2019-UnsupervisedVideoInterpolation

  38. Consecutive Detecting Arrays for Interaction Faults

    Authors: Ce Shi, Ling Jiang, Aiyuan Tao

    Abstract: The concept of detecting arrays was developed to locate and detect interaction faults arising between the factors in a component-based system during software testing. In this paper, we propose a family of consecutive detecting arrays (CDAs) in which the interactions between factors are considered to be ordered. CDAs can be used to generate test suites for locating and detecting interaction faults…

    Submitted 25 January, 2024; v1 submitted 26 May, 2019; originally announced May 2019.

    MSC Class: 05B15; 05B20; 62K15; 94C12

    Journal ref: Graphs and Combinatorics 36, 1203-1218 (2020)

  39. arXiv:1903.02728  [pdf, other]

    cs.CV

    Graphical Contrastive Losses for Scene Graph Parsing

    Authors: Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

    Abstract: Most scene graph parsers use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses mul…

    Submitted 16 August, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  40. arXiv:1812.01593  [pdf, other]

    cs.CV cs.AI cs.MM cs.RO

    Improving Semantic Segmentation via Video Propagation and Label Relaxation

    Authors: Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

    Abstract: Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models' ability to predict future frames in order to also predict future labels.…

    Submitted 2 July, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: CVPR 2019 Oral. Code link: https://github.com/NVIDIA/semantic-segmentation. YouTube link: https://www.youtube.com/watch?v=aEbXjGZDZSQ

  41. arXiv:1811.11718  [pdf, other]

    cs.CV

    Partial Convolution based Padding

    Authors: Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro

    Abstract: In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks. We call it partial convolution based padding, with the intuition that the padded region can be treated as holes and the original input as non-holes. Specifically, during the convolution operation, the convolution results are re-weighted near image borders…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: 11 pages; code is available at https://github.com/NVIDIA/partialconv

  42. arXiv:1811.09543  [pdf, other]

    cs.CV

    An Interpretable Model for Scene Graph Generation

    Authors: Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

    Abstract: We propose an efficient and interpretable scene graph generator. We consider three types of features: visual, spatial and semantic, and we use a late fusion strategy such that each feature's contribution can be explicitly investigated. We study the key factors about these features that have the most impact on the performance, and also visualize the learned visual features for relationships and inv…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.00662

  43. arXiv:1811.00684  [pdf, other]

    cs.CV

    SDCNet: Video Prediction Using Spatially-Displaced Convolution

    Authors: Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

    Abstract: We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned future optical flow, or on direct generation of pixels. Resampling based on flow is insufficient because it cannot deal with disocclusions. Generative models currently lead to blurry results. Recent app…

    Submitted 27 March, 2021; v1 submitted 1 November, 2018; originally announced November 2018.

    Comments: Published in ECCV 2018. Codes available at https://github.com/NVIDIA/semantic-segmentation/tree/sdcnet/sdcnet. Project page available at https://nv-adlr.github.io/publication/2018-SDCNet

  44. arXiv:1811.00662  [pdf, other]

    cs.CV

    Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

    Authors: Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

    Abstract: This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle. Three key factors contribute the most to our success: 1) language bias is a powerful baseline for this task. We build the empirical distribution $P(predicate|subject,object)$ in the training set and directly use that in testing. This baseline achieved the 2nd place…

    Submitted 7 November, 2018; v1 submitted 1 November, 2018; originally announced November 2018.

  45. arXiv:1808.06601  [pdf, other]

    cs.CV cs.GR cs.LG

    Video-to-Video Synthesis

    Authors: Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

    Abstract: We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored…

    Submitted 3 December, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

    Comments: In NeurIPS, 2018. Code, models, and more results are available at https://github.com/NVIDIA/vid2vid

  46. arXiv:1804.07723  [pdf, other]

    cs.CV

    Image Inpainting for Irregular Holes Using Partial Convolutions

    Authors: Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro

    Abstract: Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Post-processing is usually used to reduce such artifacts, bu…

    Submitted 15 December, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: Update: camera-ready; L1 loss is size-averaged; code of partial conv layer: https://github.com/NVIDIA/partialconv. Published at ECCV 2018

  47. arXiv:1711.11585  [pdf, other]

    cs.CV cs.GR cs.LG

    High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

    Authors: Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro

    Abstract: We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048x1024 visually appealing results with a novel adversaria…

    Submitted 20 August, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: v2: CVPR camera ready, adding more results for edge-to-photo examples

  48. arXiv:1512.02197  [pdf, other]

    physics.optics physics.ins-det physics.plasm-ph

    Photoemission-based microelectronic devices

    Authors: Ebrahim Forati, Tyler J. Dill, Andrea Tao, Dan Sievenpiper

    Abstract: The vast majority of modern microelectronic devices rely on carriers within semiconductors due to their integrability. Therefore, the performance of these devices is limited due to natural semiconductor properties such as band gap and electron velocity. Replacing the semiconductor channel in conventional microelectronic devices with a gas or vacuum channel may scale their speed, wavelength, and po…

    Submitted 19 April, 2016; v1 submitted 7 December, 2015; originally announced December 2015.

  49. arXiv:1501.02155  [pdf, ps, other]

    math.MG cs.LO

    A formal proof of the Kepler conjecture

    Authors: Thomas Hales, Mark Adams, Gertrud Bauer, Dat Tat Dang, John Harrison, Truong Le Hoang, Cezary Kaliszyk, Victor Magron, Sean McLaughlin, Thang Tat Nguyen, Truong Quang Nguyen, Tobias Nipkow, Steven Obua, Joseph Pleso, Jason Rute, Alexey Solovyev, An Hoai Thi Ta, Trung Nam Tran, Diep Thi Trieu, Josef Urban, Ky Khac Vu, Roland Zumkeller

    Abstract: This article describes a formal proof of the Kepler conjecture on dense sphere packings in a combination of the HOL Light and Isabelle proof assistants. This paper constitutes the official published account of the now completed Flyspeck project.

    Submitted 9 January, 2015; originally announced January 2015.

    Comments: 21 pages

  50. arXiv:1305.2659  [pdf]

    q-bio.PE q-bio.GN

    SARS-CoV originated from bats in 1998 and may still exist in humans

    Authors: Ailin Tao, Yuyi Huang, Peilu Li, Jun Liu, Nanshan Zhong, Chiyu Zhang

    Abstract: SARS-CoV is believed to originate from civets and was thought to have been eliminated as a threat after the 2003 outbreak. Here, we show that human SARS-CoV (huSARS-CoV) originated directly from bats, rather than civets, by a cross-species jump in 1991, and formed a human-adapted strain in 1998. Since then huSARS-CoV has evolved further into highly virulent strains with genotype T and a 29-nt dele…

    Submitted 13 May, 2013; v1 submitted 12 May, 2013; originally announced May 2013.

    Comments: 18 pages, 7 figures, 2 tables