Search | arXiv e-print repository

arXiv:2005.12444 [pdf, other]

SegAttnGAN: Text to Image Generation with Segmentation Attention

Authors: Yuchuan Gou, Qiancheng Wu, Minghao Li, Bo Gong, Mei Han

Abstract: In this paper, we propose a novel generative network (SegAttnGAN) that utilizes additional segmentation information for the text-to-image synthesis task. As the segmentation data introduced to the model provides useful guidance on the generator training, the proposed model can generate images with better realism quality and higher quantitative measures compared with the previous state-of-the-art methods. We achieved an Inception Score of 4.84 on the CUB dataset and 3.52 on the Oxford-102 dataset. Besides, we tested the self-attention SegAttnGAN which uses generated segmentation data instead of masks from datasets for attention and achieved similar high-quality results, suggesting that our model can be adapted for the text-to-image synthesis task.

Submitted 25 May, 2020; originally announced May 2020.

Comments: Accepted to the AI for Content Creation Workshop at CVPR 2020

arXiv:2005.01651 [pdf, ps, other]

Structured Distributed Compressive Channel Estimation over Doubly Selective Channels

Authors: Qibo Qin, Lin Gui, Bo Gong, Xiang Ren, Wen Chen

Abstract: For an orthogonal frequency-division multiplexing (OFDM) system over a doubly selective (DS) channel, a large number of pilot subcarriers are needed to estimate the numerous channel parameters, resulting in low spectral efficiency. In this paper, by exploiting temporal correlation of practical wireless channels, we propose a highly efficient structured distributed compressive sensing (SDCS) based joint multi-symbol channel estimation scheme. Specifically, by using the complex exponential basis expansion model (CE-BEM) and exploiting the sparsity in the delay domain within multiple OFDM symbols, we turn to estimate jointly sparse CE-BEM coefficient vectors rather than numerous channel taps. Then a sparse pilot pattern within multiple OFDM symbols is designed to obtain an ICI-free structure and transform the channel estimation problem into a joint-block-sparse model. Next, a novel block-based simultaneous orthogonal matching pursuit (BSOMP) algorithm is proposed to jointly recover coefficient vectors accurately. Finally, to reduce the CE-BEM modeling error, we carry out smoothing treatments of already estimated channel taps via piecewise linear approximation. Simulation results demonstrate that the proposed channel estimation scheme can achieve higher estimation accuracy than conventional schemes, although with a smaller number of pilot subcarriers.

Submitted 23 April, 2020; originally announced May 2020.

Comments: IEEE TVT

arXiv:2005.00570 [pdf, ps, other]

When Ensembling Smaller Models is More Efficient than Single Large Models

Authors: Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong

Abstract: Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is commonly reserved for the largest models, as it is commonly held that increasing the model size provides a more substantial reduction in error than ensembling smaller models. However, we show results from experiments on CIFAR-10 and ImageNet that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute, even when those individual models' weights and hyperparameters are highly optimized. Furthermore, this gap in improvement widens as models become large. This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models, especially when the models approach the size of what their dataset can foster. Instead of using the common practice of tuning a single large model, one can use ensembles as a more flexible trade-off between a model's inference speed and accuracy. This also potentially eases hardware design, e.g., an easier way to parallelize the model across multiple workers for real-time or distributed inference.

Submitted 1 May, 2020; originally announced May 2020.
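
Illustration: the aggregation step mentioned in the abstract above (combining the predictions of several independently trained models) can be sketched as plain probability averaging. This is a minimal sketch and not the authors' code; the helper names `ensemble_average` and `ensemble_predict` and the toy probabilities are hypothetical, and averaging is only one of several possible aggregation rules.

```python
from typing import List, Sequence


def ensemble_average(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models.

    Hypothetical helper: each inner sequence is one model's predicted
    probabilities over the same ordered set of classes.
    """
    if not member_probs:
        raise ValueError("need predictions from at least one model")
    num_classes = len(member_probs[0])
    if any(len(p) != num_classes for p in member_probs):
        raise ValueError("all models must predict over the same classes")
    n_models = len(member_probs)
    return [sum(p[c] for p in member_probs) / n_models for c in range(num_classes)]


def ensemble_predict(member_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = ensemble_average(member_probs)
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    # Three toy "models" voting on a two-class problem.
    probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    print(ensemble_average(probs), ensemble_predict(probs))
```

Because each member can be evaluated independently, the inner predictions could in principle be computed on separate workers before averaging, which is the parallelization point the abstract raises.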

arXiv:2004.10018 [pdf, ps, other]

Block Distributed Compressive Sensing Based Doubly Selective Channel Estimation and Pilot Design for Large-Scale MIMO Systems

Authors: Bo Gong, Lin Gui, Qibo Qin, Xiang Ren, Wen Chen

Abstract: The doubly selective (DS) channel estimation in the large-scale multiple-input multiple-output (MIMO) systems is a challenging problem due to the large number of the channel coefficients to be estimated, which requires unaffordable and prohibitive pilot overhead. In this paper, firstly we conduct the analysis about the common sparsity of the basis expansion model (BEM) coefficients among all the BEM orders and all the transmit-receive antenna pairs. Then a novel pilot pattern is proposed, which inserts the guard pilots to deal with the inter carrier interference (ICI) under the superimposed pilot pattern. Moreover, by exploiting the common sparsity of the BEM coefficients among different BEM orders and different antennas, we propose a block distributed compressive sensing (BDCS) based DS channel estimator for the large-scale MIMO systems. Its structured sparsity leads to the reduction of the pilot overhead under the premise of guaranteeing the accuracy of the estimation. Furthermore, taking consideration of the block structure, a pilot design algorithm referred to as block discrete stochastic optimization (BDSO) is proposed. It optimizes the pilot positions by reducing the coherence among different blocks of the measurement matrix. Besides, a linear smoothing method is extended to large-scale MIMO systems to improve the accuracy of the estimation. Simulation results verify the performance gains of our proposed estimator and the pilot design algorithm compared with the existing schemes.

Submitted 21 April, 2020; originally announced April 2020.

Comments: TVT

arXiv:2004.04588 [pdf, ps, other]

Finite Element Approximation of the Modified Maxwell's Stekloff Eigenvalues

Authors: Bo Gong, Jiguang Sun, Xinming Wu

Abstract: The modified Maxwell's Stekloff eigenvalue problem arises recently from the inverse electromagnetic scattering theory for inhomogeneous media. This paper contains a rigorous analysis of both the eigenvalue problem and the associated source problem on Lipschitz polyhedra. A new finite element method is proposed to compute Stekloff eigenvalues. By applying the Babuska-Osborn theory, we prove an error estimate without additional regularity assumptions. Numerical results are presented for validation.

Submitted 9 April, 2020; originally announced April 2020.

MSC Class: 65N25; 65N30; 35Q61

arXiv:2003.14032 [pdf, other]

PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

Authors: Yang Zhang, Zixiang Zhou, Philip David, Xiangyu Yue, Zerong Xi, Boqing Gong, Hassan Foroosh

Abstract: The need for fine-grained perception in autonomous driving systems has resulted in recently increased research on online semantic segmentation of single-scan LiDAR. Despite the emerging datasets and technological advancements, it remains challenging due to three reasons: (1) the need for near-real-time latency with limited hardware; (2) uneven or even long-tailed distribution of LiDAR points across space; and (3) an increasing number of extremely fine-grained semantic classes. In an attempt to jointly tackle all the aforementioned challenges, we propose a new LiDAR-specific, nearest-neighbor-free segmentation algorithm - PolarNet. Instead of using common spherical or bird's-eye-view projection, our polar bird's-eye-view representation balances the points across grid cells in a polar coordinate system, indirectly aligning a segmentation network's attention with the long-tailed distribution of the points along the radial axis. We find that our encoding scheme greatly increases the mIoU in three drastically different segmentation datasets of real urban LiDAR single scans while retaining near real-time throughput.

Submitted 26 April, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

Comments: Accepted by CVPR 2020; Code at https://github.com/edwardzhou130/PolarSeg
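
Illustration: the polar bird's-eye-view idea in the PolarNet abstract above (binning LiDAR points by radius and angle instead of by Cartesian x/y) can be sketched as a simple quantization step. This is a minimal sketch under stated assumptions and is not taken from the PolarSeg repository; the function name `polar_cell`, the 50 m range, and the grid sizes are all hypothetical.

```python
import math
from typing import Tuple


def polar_cell(
    x: float,
    y: float,
    max_radius: float = 50.0,  # assumed sensor range in metres (hypothetical)
    n_radial: int = 480,       # assumed number of rings (hypothetical)
    n_angular: int = 360,      # assumed number of angular sectors (hypothetical)
) -> Tuple[int, int]:
    """Map a point's ground-plane coordinates to a (ring, sector) grid cell."""
    radius = math.hypot(x, y)
    # atan2 returns an angle in (-pi, pi]; shift it into [0, 2*pi).
    angle = math.atan2(y, x) % (2.0 * math.pi)
    # Points at or beyond max_radius are clamped into the outermost ring.
    ring = min(int(radius / max_radius * n_radial), n_radial - 1)
    sector = min(int(angle / (2.0 * math.pi) * n_angular), n_angular - 1)
    return ring, sector


if __name__ == "__main__":
    for point in [(15.0, 0.0), (-15.0, 15.0), (100.0, 1.0)]:
        print(point, polar_cell(*point, n_radial=5, n_angular=4))
```

Cells near the sensor cover less area than cells far away, so the dense close-range points are spread over more cells than in a uniform Cartesian grid, which is the balancing effect the abstract describes.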

arXiv:2003.13960 [pdf, other]

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model

Authors: Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Abstract: We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner. Progress on this problem can significantly reduce the dependence on large-scale datasets for learning high-performing visual recognition models. There are two major challenges. One is that the number of queries into the teacher model should be minimized to save computational and/or financial costs. The other is that the number of images used for the knowledge distillation should be small; otherwise, it violates our expectation of reducing the dependence on large-scale datasets. To tackle these challenges, we propose an approach that blends mixup and active learning. The former effectively augments the few unlabeled images by a big pool of synthetic images sampled from the convex hull of the original images, and the latter actively chooses from the pool hard examples for the student neural network and queries their labels from the teacher model. We validate our approach with extensive experiments.

Submitted 31 March, 2020; originally announced March 2020.

Comments: Accepted to CVPR 2020
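
Illustration: the mixup step in the abstract above (sampling synthetic images from the convex hull of the original images) amounts to taking convex combinations of image pairs. The following is a minimal sketch, not the authors' implementation; the names `mixup` and `build_pool` and the fixed mixing weights are hypothetical, images are represented as flat lists of pixel values for simplicity, and the active-learning selection of hard examples is not shown.

```python
from itertools import combinations
from typing import List, Sequence


def mixup(img_a: Sequence[float], img_b: Sequence[float], lam: float) -> List[float]:
    """Return the convex combination lam * img_a + (1 - lam) * img_b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    return [lam * a + (1.0 - lam) * b for a, b in zip(img_a, img_b)]


def build_pool(images: Sequence[Sequence[float]], lams: Sequence[float]) -> List[List[float]]:
    """Build a pool of synthetic images from every image pair and mixing weight."""
    return [
        mixup(a, b, lam)
        for a, b in combinations(images, 2)
        for lam in lams
    ]


if __name__ == "__main__":
    originals = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    pool = build_pool(originals, lams=[0.25, 0.5, 0.75])
    print(len(pool), pool[0])
```

With three originals and three mixing weights, the pool already holds nine synthetic images, which shows how a small unlabeled set can be expanded before any teacher queries are spent.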

arXiv:2003.10780 [pdf, other]

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

Authors: Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Abstract: Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes. We analyze this mismatch from a domain adaptation point of view. First of all, we connect existing class-balanced methods for long-tailed classification to target shift, a well-studied scenario in domain adaptation. The connection reveals that these methods implicitly assume that the training data and test data share the same class-conditioned distribution, which does not hold in general and especially for the tail classes. While a head class could contain abundant and diverse training examples that well represent the expected data at inference time, the tail classes are often short of representative training data. To this end, we propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach. We validate our approach with six benchmark datasets and three loss functions.

Submitted 24 March, 2020; originally announced March 2020.

Comments: Accepted for publication at CVPR2020

arXiv:2003.02698 [pdf, ps, other]

Position-Based Interference Elimination for High Mobility OFDM Channel Estimation in Multi-cell Systems

Authors: Xiang Ren, Wen Chen, Bo Gong, Qibo Qin, Lin Gui

Abstract: Orthogonal frequency-division multiplexing (OFDM) and multi-cell architecture are widely adopted in current high speed train (HST) systems for providing high data rate wireless communications. In this paper, a typical multi-antenna OFDM HST communication system with multi-cell architecture is considered, where the inter-carrier interference (ICI) caused by high mobility and multi-cell interference (MCI) are both taken into consideration. By exploiting the train position information, a new position-based interference elimination method is proposed to eliminate both the MCI and ICI for a general basis expansion model (BEM). We show that the MCI and ICI can be completely eliminated by the proposed method to get the ICI-free pilots at each receive antenna. In addition, for the considered multi-cell HST system, we develop a low-complexity compressed channel estimation method and consider the optimal pilot pattern design. Both the proposed interference elimination method and the optimal pilot pattern are robust to the train speed and position, as well as the multi-cell multi-antenna system. Simulation results demonstrate the benefits and robustness of the proposed method in the multi-cell HST system.

Submitted 1 March, 2020; originally announced March 2020.

arXiv:2002.00169 [pdf, other]

Deep Multi-View Enhancement Hashing for Image Retrieval

Authors: Chenggang Yan, Biao Gong, Yuxuan Wei, Yue Gao

Abstract: Hashing is an efficient method for nearest neighbor search in large-scale data space by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. However, large-scale high-speed retrieval through binary code has a certain degree of reduction in retrieval accuracy compared to traditional retrieval methods. We have noticed that multi-view methods can well preserve the diverse characteristics of data. Therefore, we try to introduce the multi-view deep neural network into the hash learning field, and design an efficient and innovative retrieval model, which has achieved a significant improvement in retrieval performance. In this paper, we propose a supervised multi-view hash model which can enhance the multi-view information through neural networks. This is a completely new hash learning method that combines multi-view and deep learning methods. The proposed method utilizes an effective view stability evaluation method to actively explore the relationship among views, which will affect the optimization direction of the entire network. We have also designed a variety of multi-data fusion methods in the Hamming space to preserve the advantages of both convolution and multi-view. In order to avoid excessive computing resources on the enhancement procedure during retrieval, we set up a separate structure called memory network which participates in training together. The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets, and the results show that our method significantly outperforms the state-of-the-art single-view and multi-view hashing methods.

Submitted 15 June, 2020; v1 submitted 1 February, 2020; originally announced February 2020.
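
Illustration: the retrieval setting described in the hashing abstract above (nearest neighbor search over binary codes in Hamming space) can be sketched with integer bit codes. This is a minimal sketch that assumes the binary codes have already been produced by some hash model; the names `hamming_distance` and `nearest_code` and the 4-bit toy codes are hypothetical, and the paper's multi-view network is not modeled here.

```python
from typing import Sequence


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_code(query: int, database: Sequence[int]) -> int:
    """Return the index of the database code closest to the query.

    Ties are broken in favor of the earliest database entry.
    """
    if not database:
        raise ValueError("database must not be empty")
    return min(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


if __name__ == "__main__":
    database = [0b0000, 0b0100, 0b1010]
    query = 0b1011
    best = nearest_code(query, database)
    print(best, hamming_distance(query, database[best]))
```

Comparing codes needs only an XOR and a bit count, which is why binary hashing suits large-scale, high-speed retrieval at some cost in accuracy, as the abstract notes.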

arXiv:2001.06129 [pdf, ps, other]

doi 10.1038/s41598-020-72850-6

A Three-Dimensional Laser Interferometer Gravitational-Wave Detector

Authors: Mengxu Liu, Bi** Gong

Abstract: The gravitational wave (GW) has opened a new window to the universe beyond the electromagnetic spectrum. Since 2015, dozens of GW events have been caught by the ground-based GW detectors through laser interferometry. However, all the ground-based detectors are L-shaped Michelson interferometers, with very limited directional response to GW. Here we propose a three-dimensional (3-D) laser interferometer detector in the shape of a regular triangular pyramid, which has a more spherically symmetric antenna pattern. Moreover, the new configuration corresponds to much stronger constraints on parameters of GW sources, and is capable of constructing null-streams to get rid of the signal-like noise events. A 3-D detector of kilometer scale of such kind would shed new light on the joint search of GW and electromagnetic emission.

Submitted 6 October, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: 7 pages, 5 figures, published on Scientific Reports

Journal ref: Sci Rep 10, 16285 (2020)

arXiv:2001.05340 [pdf, ps, other]

Finite Element Approximation of Transmission Eigenvalues for Anisotropic Media

Authors: Bo Gong, Jiguang Sun, Tiara Turner, Chunxiong Zheng

Abstract: The transmission eigenvalue problem arises from the inverse scattering theory for inhomogeneous media and has important applications in many qualitative methods. The problem is posed as a system of two second order partial differential equations and is essentially nonlinear, non-selfadjoint, and of higher order. It is nontrivial to develop effective numerical methods and the proof of convergence is challenging. In this paper, we formulate the transmission eigenvalue problem for anisotropic media as an eigenvalue problem of a holomorphic Fredholm operator function of index zero. The Lagrange finite elements are used for discretization and the convergence is proved using the abstract approximation theory for holomorphic operator functions. A spectral indicator method is developed to compute the eigenvalues. Numerical examples are presented for validation.

Submitted 15 January, 2020; originally announced January 2020.

arXiv:2001.05332 [pdf, ps, other]

A new finite element approach for the Dirichlet eigenvalue problem

Authors: Wenqiang Xiao, Bo Gong, Jiguang Sun, Zhimin Zhang

Abstract: In this paper, we propose a new finite element approach, which is different than the classic Babuska-Osborn theory, to approximate Dirichlet eigenvalues. The Dirichlet eigenvalue problem is formulated as the eigenvalue problem of a holomorphic Fredholm operator function of index zero. Using conforming finite elements, the convergence is proved using the abstract approximation theory for holomorphic operator functions. The spectral indicator method is employed to compute the eigenvalues. A numerical example is presented to validate the theory.

Submitted 15 January, 2020; originally announced January 2020.

MSC Class: 65N25; 65N30; 35P30

arXiv:2001.02378 [pdf, other]

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

Authors: Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang

Abstract: Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. In our experiments, we show that our method can be applied to modern deep neural networks on a wide range of datasets, including Cifar-10, ImageNet, MNIST, and SVHN. For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve larger average certified radius.

Submitted 14 March, 2022; v1 submitted 8 January, 2020; originally announced January 2020.

Comments: Published in ICLR 2020. 20 Pages
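
Illustration: the certified l2 radius that the MACER abstract above refers to is commonly written, in the randomized smoothing literature, as R = (sigma / 2) * (Phi^-1(p_A) - Phi^-1(p_B)), where Phi^-1 is the inverse standard normal CDF, p_A is the smoothed probability of the top class, and p_B that of the runner-up. The formula is not stated in the abstract and is brought in here as background; the sketch below only evaluates it and does not show MACER's training objective. The function name `certified_radius` is hypothetical.

```python
from statistics import NormalDist


def certified_radius(p_top: float, p_runner_up: float, sigma: float) -> float:
    """Evaluate R = sigma / 2 * (Phi^-1(p_top) - Phi^-1(p_runner_up)).

    p_top and p_runner_up are class probabilities under Gaussian input noise
    with standard deviation sigma; both must lie strictly between 0 and 1.
    """
    if not (0.0 < p_runner_up <= p_top < 1.0):
        raise ValueError("need 0 < p_runner_up <= p_top < 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return 0.5 * sigma * (std_normal.inv_cdf(p_top) - std_normal.inv_cdf(p_runner_up))


if __name__ == "__main__":
    print(certified_radius(0.9, 0.1, sigma=0.5))
    print(certified_radius(0.99, 0.01, sigma=0.5))
```

The radius grows as the gap between the two probabilities widens, which gives some intuition for why a method might aim to increase it directly during training.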

arXiv:1912.11684 [pdf, other]

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

Authors: Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

Abstract: A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory inputs in an environment and to make a sequence of actions to reach their goals. In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data. To accomplish this task, the agent is required to learn from various modalities, i.e. relating the audio signal to the visual environment. Here we describe an approach to audio-visual embodied navigation that takes advantage of both visual and audio pieces of evidence. Our solution is based on three key ideas: a visual perception mapper module that constructs its spatial memory of the environment, a sound perception module that infers the relative location of the sound source from the agent, and a dynamic path planner that plans a sequence of actions based on the audio-visual observations and the spatial memory of the environment to navigate toward the goal. Experimental results on a newly collected Visual-Audio-Room dataset using the simulated multi-modal environment demonstrate the effectiveness of our approach over several competitive baselines.

Submitted 7 March, 2020; v1 submitted 25 December, 2019; originally announced December 2019.

Comments: Accepted by ICRA 2020. Project page: http://avn.csail.mit.edu

arXiv:1911.09665 [pdf, other]

Adversarial Examples Improve Image Recognition

Authors: Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

Abstract: Adversarial examples are commonly viewed as a threat to ConvNets. Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. We propose AdvProp, an enhanced adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to our method is the usage of a separate auxiliary batch norm for adversarial examples, as they have different underlying distributions to normal examples. We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger. For instance, by applying AdvProp to the latest EfficientNet-B7 [28] on ImageNet, we achieve significant improvements on ImageNet (+0.7%), ImageNet-C (+6.5%), ImageNet-A (+7.0%), Stylized-ImageNet (+4.8%). With an enhanced EfficientNet-B8, our method achieves the state-of-the-art 85.5% ImageNet top-1 accuracy without extra data. This result even surpasses the best model in [20] which is trained with 3.5B Instagram images (~3000X more than ImageNet) and ~9.4X more parameters. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.

Submitted 14 April, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: CVPR 2020, models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

arXiv:1909.03403 [pdf, other]

Open Compound Domain Adaptation

Authors: Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

Abstract: A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather). Whether the target contains a single homogeneous domain or multiple heterogeneous domains, existing works always assume that there exist clear distinctions between the domains, which is often not true in practice (e.g., changes in weather). We study an open compound domain adaptation (OCDA) problem, in which the target is a compound of multiple homogeneous domains without domain labels, reflecting realistic data collection from mixed and novel situations. We propose a new approach based on two technical insights into OCDA: 1) a curriculum domain adaptation strategy to bootstrap generalization across domains in a data-driven self-organizing fashion and 2) a memory module to increase the model's agility towards novel domains. Our experiments on digit classification, facial expression recognition, semantic segmentation, and reinforcement learning demonstrate the effectiveness of our approach.

Submitted 29 March, 2020; v1 submitted 8 September, 2019; originally announced September 2019.

Comments: To appear in CVPR 2020 as an oral presentation. Code, datasets and models are available at: https://liuziwei7.github.io/projects/CompoundDomain.html

arXiv:1909.00889 [pdf, other]

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization without Accessing Target Domain Data

Authors: Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong

Abstract: We propose to harness the potential of simulation for the semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any data of target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images with the styles of real images in terms of visual appearances using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on the generalization from GTA and SYNTHIA to Cityscapes, BDDS and Mapillary; and our method achieves superior results over the state-of-the-art techniques. Remarkably, our generalization results are on par with or even better than those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.

Submitted 10 August, 2022; v1 submitted 2 September, 2019; originally announced September 2019.

Comments: ICCV 2019

arXiv:1908.09547 [pdf, other]

Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach

Authors: Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong

Abstract: We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains. Our approach draws on an insight connecting two existing works: curriculum domain adaptation and self-training. Inspired by the former, PyCDA constructs a pyramid curriculum which contains various properties about the target domain. Those properties are mainly about the desired label distributions over the target domain images, image regions, and pixels. By enforcing the segmentation neural network to observe those properties, we can improve the network's generalization capability to the target domain. Motivated by the self-training, we infer this pyramid of properties by resorting to the semantic segmentation network itself. Unlike prior work, we do not need to maintain any additional models (e.g., logistic regression or discriminator networks) or to solve minmax problems which are often difficult to optimize. We report state-of-the-art results for the adaptation from both GTAV and SYNTHIA to Cityscapes, two popular settings in unsupervised domain adaptation for semantic segmentation.

Submitted 26 August, 2019; originally announced August 2019.

arXiv:1908.06354 [pdf, other]

A Fast and Accurate One-Stage Approach to Visual Grounding

Authors: Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo

Abstract: We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. The performances of existing propose-and-rank two-stage methods are capped by the quality of the region candidates they propose in the first stage --- if none of the candidates could cover the ground truth region, there is no hope in the second stage to rank the right region to the top. To avoid this caveat, we propose a one-stage model that enables end-to-end joint optimization. The main idea is as straightforward as fusing a text query's embedding into the YOLOv3 object detector, augmented by spatial features so as to account for spatial mentions in the query. Despite being simple, this one-stage approach shows great potential in terms of both accuracy and speed for both phrase localization and referring expression comprehension, according to our experiments. Given these results along with careful investigations into some popular region proposals, we advocate for visual grounding a paradigm shift from the conventional two-stage methods to the one-stage framework.

Submitted 17 August, 2019; originally announced August 2019.

Comments: ICCV 2019 Oral

arXiv:1906.06765 [pdf, other]

Defending Against Adversarial Attacks Using Random Forests

Authors: Yifan Ding, Liqiang Wang, Huan Zhang, Jinfeng Yi, Deliang Fan, Boqing Gong

Abstract: As deep neural networks (DNNs) have become increasingly important and popular, the robustness of DNNs is the key to the safety of both the Internet and the physical world. Unfortunately, some recent studies show that adversarial examples, which are hard to be distinguished from real examples, can easily fool DNNs and manipulate their predictions. Upon observing that adversarial examples are mostly generated by gradient-based methods, in this paper, we first propose to use a simple yet very effective non-differentiable hybrid model that combines DNNs and random forests, rather than hide gradients from attackers, to defend against the attacks. Our experiments show that our model can successfully and completely defend the white-box attacks, has a lower transferability, and is quite resistant to three representative types of black-box attacks; while at the same time, our model achieves similar classification accuracy as the original DNNs. Finally, we investigate and suggest a criterion to define where to grow random forests in DNNs.

Submitted 16 June, 2019; originally announced June 2019.

arXiv:1905.08641 [pdf, other]

doi 10.1103/PhysRevB.101.220510

Emergent superconductivity in single crystalline $\mathrm{MgTi}_2\mathrm{O}_4$ films via structural engineering

Authors: Wei Hu, Zhongpei Feng, Ben-Chao Gong, Ge He, Dong Li, Mingyang Qin, Yujun Shi, Qian Li, Qinghua Zhang, Jie Yuan, Beiyi Zhu, Kai Liu, Tao Xiang, Lin Gu, Fang Zhou, Xiaoli Dong, Zhongxian Zhao, Kui Jin

Abstract: Spinel compounds have demonstrated rich functionalities but rarely shown superconductivity. Here, we report the emergence of superconductivity in the spinel $\mathrm{MgTi}_2\mathrm{O}_4$, known to be an insulator with a complicated order. The superconducting transition is achieved by engineering a superlattice of $\mathrm{MgTi}_2\mathrm{O}_4$ and $\mathrm{SrTiO}_3$. The onset transition temperature in the $\mathrm{MgTi}_2\mathrm{O}_4$ layer can be tuned from 0 to 5 K in such geometry, concurrently with a stretched $c$-axis (from 8.51 to 8.53 Å) compared to the bulk material. Such a positive correlation without saturation suggests ample room for the further enhancement. Intriguingly, the superlattice exhibits isotropic upper critical field $H_{\mathrm{c}2}$ that breaks the Pauli limit, distinct from the highly anisotropic feature of interface superconductivity. The origin of superconductivity in the $\mathrm{MgTi}_2\mathrm{O}_4$ layer is understood in combination with the electron energy loss spectra and the first-principles electronic structure calculations, which point to the birth of superconductivity in the $\mathrm{MgTi}_2\mathrm{O}_4$ layer by preventing the Ti-Ti dimerization. Our discovery not only provides a platform to explore the interplay between the superconductivity and other exotic states, but also opens a new window to realize superconductivity in the spinel compounds as well as other titanium oxides.

Submitted 30 May, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. B 101, 220510 (2020)

arXiv:1905.04690 [pdf, other]

doi 10.1103/PhysRevA.100.012116

Binary Discrimination in Quantum Systems via Hypothesis Testing

Authors: Beili Gong, Wei Cui

Abstract: We investigate the discrimination of two candidates of an unknown parameter in quantum systems with continuous weak measurement, inspired by the application of hypothesis testing in distinguishing two Hamiltonians [Kiilerich and Mølmer, Phys. Rev. A, 98, 022103 (2018)]. Based on the measurement output and stochastic master equation, temporal evolutions of posterior probabilities of two hypotheses are given by Bayes' formula. The Bayes criterion is presented by the likelihood ratio conditioned on the outcome of measurements. Different from the calculation method based on maximum a posteriori criterion, the Bayes criterion based method for calculating the average probability of making errors is more suitable and efficient in general situation of binary discrimination. Finally, an example of distinguishing two candidate Hamiltonians is given and the running times of calculating the average probability of error under the Bayes criterion and the maximum a posteriori criterion are compared to illustrate the feasibility of the hypothesis testing in quickly distinguishing two candidates of the parameter to be estimated.

Submitted 12 May, 2019; originally announced May 2019.

Journal ref: Phys. Rev. A 100, 012116 (2019)

arXiv:1905.00441 [pdf, other]

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

Authors: Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Abstract: Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques. In this paper, we propose a black-box adversarial attack algorithm that can defeat both vanilla DNNs and those generated by various defense techniques developed recently. Instead of searching for an "optimal" adversarial example for a benign input to a targeted DNN, our algorithm finds a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely an adversarial example, without the need of accessing the DNN's internal layers or weights. Our approach is universal as it can successfully attack different neural networks by a single algorithm. It is also strong; according to the testing against 2 vanilla DNNs and 13 defended ones, it outperforms state-of-the-art black-box or white-box attack methods for most test cases. Additionally, our results reveal that adversarial training remains one of the best defense techniques, and the adversarial examples are not as transferable across defended DNNs as they are across vanilla DNNs.

Submitted 9 December, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

arXiv:1904.05160 [pdf, other]

Large-Scale Long-Tailed Recognition in an Open World

Authors: Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

Abstract: Real world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set which includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus only on one aspect and deliver poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state-of-the-art. Our code, datasets, and models enable future OLTR research and are publicly available at https://liuziwei7.github.io/projects/LongTail.html.

Submitted 16 April, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: To appear in CVPR 2019 as an oral presentation. Code, datasets and models are available at https://liuziwei7.github.io/projects/LongTail.html

arXiv:1904.03276 [pdf, other]

Synthesized Policies for Transfer and Adaptation across Tasks and Environments

Authors: Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha

Abstract: The ability to transfer in reinforcement learning is key towards building an agent of general artificial intelligence. In this paper, we consider the problem of learning to simultaneously transfer across both environments (ENV) and tasks (TASK), probably more importantly, by learning from only sparse (ENV, TASK) pairs out of all the possible combinations. We propose a novel compositional neural network architecture which depicts a meta rule for composing policies from the environment and task embeddings. Notably, one of the main challenges is to learn the embeddings jointly with the meta rule. We further propose new training methods to disentangle the embeddings, making them both distinctive signatures of the environments and tasks and effective building blocks for composing the policies. Experiments on GridWorld and Thor, of which the agent takes as input an egocentric view, show that our approach gives rise to high success rates on all the (ENV, TASK) pairs after learning from only 40% of them.

Submitted 26 May, 2021; v1 submitted 5 April, 2019; originally announced April 2019.

Comments: presented at NeurIPS 2018 as a Spotlight

arXiv:1903.12532 [pdf]

Magnetic exchange induced Weyl state in a semimetal EuCd2Sb2

Authors: Hao Su, Benchao Gong, Wujun Shi, Haifeng Yang, Hongyuan Wang, Wei Xia, Zhenhai Yu, Peng-Jie Guo, Jinhua Wang, Linchao Ding, Liangcai Xu, Xiaokang Li, Xia Wang, Zhiqiang Zou, Na Yu, Zengwei Zhu, Yulin Chen, Zhongkai Liu, Kai Liu, Gang Li, Yanfeng Guo

Abstract: Magnetic Weyl semimetals (WSMs), though long pursued, are still very rare. We herein identified magnetic exchange induced Weyl state in EuCd2Sb2, a semimetal in type IV magnetic space group, via performing high magnetic field (B) magneto-transport measurements and ab initio calculations. For the A-type antiferromagnetic (AFM) structure of EuCd2Sb2, external B larger than 3.2 T can align Eu spins to be fully polarized along the c-axis and consequently drive the system into a ferromagnetic (FM) state. Measurements up to B ~ 55 T revealed a striking Shubnikov-de Haas oscillation imposed by a nontrivial Berry phase. We unveiled a phase transition from a small-gap AFM topological insulator into a FM WSM in which Weyl points emerged along the Γ-Z path. Fermi arcs on (100) and (010) surfaces are also revealed. The results pave a way towards realization of various topological states in a single material through magnetic exchange manipulation.

Submitted 29 March, 2019; originally announced March 2019.

Comments: 31 pages, 10 figures, 2 tables

arXiv:1903.08161 [pdf, other]

doi 10.1088/1674-1137/43/8/083104

Perturbative QCD for $J/ψ$ Inclusive Production Via Initial State Radiation at $e^+e^-$ collider

Authors: Bin Gong, Yu-Dong Wang, Jian-Xiong Wang

Abstract: Up to the next-to-leading order (NLO) of quantum chromodynamics (QCD), the process $e^+e^-\to J/ψ+X$ with the center-of-mass (CM) energy range from 3.7 to 10.6 GeV is calculated. At 10.6 GeV, the result is consistent with the experimental results at Belle. However, the predictions are much smaller than the measurement at BESIII at low CM energy range from 3.7 to 4.6 GeV. This indicates that the convergence of QCD perturbative expansion becomes worse as the CM energy becomes lower and closer to the inclusive $J/ψ$ production threshold. For a further study of the QCD mechanism on $J/ψ$ production at $e^+e^-$ collider with different CM energy, the initial state radiation effect of $e^+e^-\to J/ψ+gg$ and $e^+e^-\to J/ψ+c \bar{c}$ are calculated at the QCD NLO. The results are plotted and the numbers of events for different CM energy bins are provided for the designed SuperKEKB. This provides a method to precisely test the validity of perturbative prediction on $J/ψ$ production in future measurements.

Submitted 19 March, 2019; originally announced March 2019.

Comments: 7 pages, 9 figures

arXiv:1903.01283 [pdf, other]

Force Tracking in Cavity Optomechanics with a Two-Level Quantum System by Kalman Filtering

Authors: Beili Gong, Daoyi Dong, Weizhou Su, Wei Cui

Abstract: This paper investigates waveform estimation (tracking) of the time-varying force in a two-level optomechanical system with backaction noise by Kalman filtering. It is assumed that the backaction and measurement noises are Gaussian and white. By discretizing the continuous-time optomechanical system, the state of the resulting system can be estimated by the unbiased minimum variance Kalman filtering. Then an estimator of the time-varying force is obtained, provided that the external force is also in discrete time. Furthermore, the accuracy of the force estimation, described by the mean squared error, is derived theoretically. Finally, the feasibility of the proposed algorithm is illustrated by comparing the theoretical accuracy with the numerical accuracy in a numerical example.

Submitted 4 March, 2019; originally announced March 2019.

arXiv:1902.09255 [pdf, other]

doi 10.1145/3308558.3313621

Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

Authors: Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang

Abstract: Real-time traffic volume inference is key to an intelligent city. It is a challenging task because accurate traffic volumes on the roads can only be measured at certain locations where sensors are installed. Moreover, the traffic evolves over time due to the influences of weather, events, holidays, etc. Existing solutions to the traffic volume inference problem often rely on dense GPS trajectories, which inevitably fail to account for the vehicles which carry no GPS devices or have them turned off. Consequently, the results are biased to taxicabs because they are almost always online for GPS tracking. In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems. Our approach employs a high-fidelity traffic simulator and deep reinforcement learning to recover full vehicle movements from the incomplete trajectories. In order to jointly model the recovered trajectories and dense GPS trajectories, we construct spatiotemporal graphs and use multi-view graph embedding to encode the multi-hop correlations between road segments into real-valued vectors. Finally, we infer the citywide traffic volumes by propagating the traffic values of monitored road segments to the unmonitored ones through masked pairwise similarities. Extensive experiments with two big regions in a provincial capital city in China verify the effectiveness of our approach.

Submitted 25 February, 2019; originally announced February 2019.

Comments: Accepted by The Web Conference (WWW) 2019

arXiv:1812.09953 [pdf, other]

A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes

Authors: Yang Zhang, Philip David, Hassan Foroosh, Boqing Gong

Abstract: During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving and augmented reality. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data hinders the models' performance. Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network, while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.

Submitted 9 January, 2019; v1 submitted 24 December, 2018; originally announced December 2018.

Comments: This is the journal version of arXiv:1707.09465

arXiv:1812.06423 [pdf, other]

Classifier and Exemplar Synthesis for Zero-Shot Learning

Authors: Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha

Abstract: Zero-shot learning (ZSL) enables solving a task without the need to see its examples. In this paper, we propose two ZSL frameworks that learn to synthesize parameters for novel unseen classes. First, we propose to cast the problem of ZSL as learning manifold embeddings from graphs composed of object classes, leading to a flexible approach that synthesizes "classifiers" for the unseen classes. Then, we define an auxiliary task of synthesizing "exemplars" for the unseen classes to be used as an automatic denoising mechanism for any existing ZSL approaches or as an effective ZSL model by itself. On five visual recognition benchmark datasets, we demonstrate the superior performances of our proposed frameworks in various scenarios of both conventional and generalized ZSL. Finally, we provide valuable insights through a series of empirical analyses, among which are a comparison of semantic representations on the full ImageNet benchmark as well as a comparison of metrics used in generalized ZSL. Our code and data are publicly available at https://github.com/pujols/Zero-shot-learning-journal

Submitted 18 July, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

Comments: Extended version of arXiv:1603.00550 (CVPR 2016) and arXiv:1605.08151 (ICCV 2017); Accepted for publication in International Journal of Computer Vision (IJCV)

arXiv:1812.02597 [pdf, other]

doi 10.1103/PhysRevD.103.095030

Signature of 2HDM at Higgs Factories

Authors: Wenhai Xie, R. Benbrik, Abdeljalil Habjia, Souad Taj, Bin Gong, Qi-Shu Yan

Abstract: The full one-loop corrections, both the weak and QED corrections, to the process $e^+ e^- \to Z φ$ ($φ=h^0,H^0$) in the two Higgs doublet model (2HDM) at the Higgs factories are presented. Up to the $O(α_{em})$ level, the virtual corrections are evaluated by using the FeynArts/FormCalc packages. The real emission corrections are computed using the Feynman Diagram Calculation (FDC) package and the collinear divergences are regularized by the structure functions of an electron. Using the FeynArts/FormCalc and the FDC packages, we study the corrections in the Standard Model (SM) and the 2HDM, respectively. Gauge dependence arising in the normalization of mixing angles is removed by using the pinch technique. After taking into account experimental constraints from the current LHC data, we propose four interesting benchmark scenarios for future colliders. By using these benchmark scenarios, we evaluate the deviation of $Δσ(e^+ e^- \to Z φ)$ from their SM values. We also examine Higgs boson decays $φ\to b\bar{b}$ and $φ\to τ^+τ^-$, which can have large electroweak (EW) contributions from triple Higgs couplings which are absent in the SM. It is found that for these benchmark scenarios, both EW and real emission corrections are sizeable and could be measured at future $e^+ e^-$ colliders such as the ILC, CLIC, and CEPC.

Submitted 15 April, 2021; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: 28 pages, 15 figures, to appear in PRD

Journal ref: Phys. Rev. D 103, 095030 (2021)

arXiv:1810.08989 [pdf, other]

doi 10.1103/PhysRevD.99.014044

The remaining parts for the long-standing J/psi polarization puzzle

Authors: Yu Feng, Bin Gong, Chao-Hsi Chang, Jian-Xiong Wang

Abstract: Based on the non-relativistic quantum chromodynamics factorization formalism, the polarization parameters $λ_{θφ}$ and $λ_φ$ of $J/ψ$ hadroproduction are analyzed in helicity frame and calculated at QCD next-to-leading order for the first time. For prompt $J/ψ$ production, we take into account the feeddown contributions from $χ_{cJ}$ and $ψ(2S)$ decays. The theoretical predictions for the polarization parameters $λ_{θφ}$ and $λ_φ$ of $J/ψ$ are presented. With the theoretical results we have done the fit to the experimental measurements on yield and polarization for $J/ψ$ hadroproduction simultaneously, and found that the results coincide with the experimental measurements at the LHC quite well.

Submitted 21 October, 2018; originally announced October 2018.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. D 99, 014044 (2019)

arXiv:1810.05853 [pdf, ps, other]

A Model of Solar Dynamo with Alternative Conversion of Large-Scale Magnetic Field and Production of Sunspots

Authors: Biping Gong

Abstract: Since the discovery of solar cycle related with magnetic field in 1908, deep seated oscillatory dynamo has been studied extensively. However, there are still open questions on the solar dynamo, e.g., asymmetric conversion between large-scale poloidal and toroidal field as well as physics underlying the butterfly pattern of sunspots. Here we report a new generation of large-scale magnetic field and process of energy release. The inductive action of fluid motions pervading the solar interior is represented by a RLC circuit in which the toroidal field built up through twisting of poloidal field, so called $ω$-effect, plays the role of a capacitor. Such a RLC circuit not only provides a self-sustained oscillatory system avoiding Cowling's antidynamo theorem, but also site of rapid magnetic reconnection which reproduces quadrupole magnetic field interpreting the behavior of sunspots and moving of foot-point in solar activities. Moreover, parameters of the circuit and the Sun are well consistent with the 22-year solar cycle.

Submitted 13 October, 2018; originally announced October 2018.

Comments: 5 pages, 3 figures

arXiv:1809.06557 [pdf, other]

doi 10.1145/3272127.3275060

Image Super-Resolution via Deterministic-Stochastic Synthesis and Local Statistical Rectification

Authors: Weifeng Ge, Bingchen Gong, Yizhou Yu

Abstract: Single image superresolution has been a popular research topic in the last two decades and has recently received a new wave of interest due to deep neural networks. In this paper, we approach this problem from a different perspective. With respect to a downsampled low resolution image, we model a high resolution image as a combination of two components, a deterministic component and a stochastic component. The deterministic component can be recovered from the low-frequency signals in the downsampled image. The stochastic component, on the other hand, contains the signals that have little correlation with the low resolution image. We adopt two complementary methods for generating these two components. While generative adversarial networks are used for the stochastic component, deterministic component reconstruction is formulated as a regression problem solved using deep neural networks. Since the deterministic component exhibits clearer local orientations, we design novel loss functions tailored for such properties for training the deep regression network. These two methods are first applied to the entire input image to produce two distinct high-resolution images. Afterwards, these two images are fused together using another deep neural network that also performs local statistical rectification, which tries to make the local statistics of the fused image match the same local statistics of the ground-truth image. Quantitative results and a user study indicate that the proposed method outperforms existing state-of-the-art algorithms by a clear margin.

Submitted 18 September, 2018; originally announced September 2018.

Comments: to appear in SIGGRAPH Asia 2018

arXiv:1808.02992 [pdf, other]

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation

Authors: Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong

Abstract: The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip. In this paper, for the sake of both furthering this exploration and our own interest in a realistic application, we study image-to-video translation and particularly focus on the videos of facial expressions. This problem challenges deep neural networks with an additional temporal dimension compared to image-to-image translation. Moreover, its single input image defeats most existing video generation methods, which rely on recurrent models. We propose a user-controllable approach so as to generate video clips of various lengths from a single face image. The lengths and types of the expressions are controlled by users. To this end, we design a novel neural network architecture that can incorporate the user input into its skip connections and propose several improvements to the adversarial training method for the neural network. Experiments and user studies verify the effectiveness of our approach. In particular, we highlight that even for face images in the wild (downloaded from the Web and the authors' own photos), our model can generate high-quality facial expression videos of which about 50% are labeled as real by Amazon Mechanical Turk workers.

Submitted 8 August, 2018; originally announced August 2018.

Comments: 10 pages

arXiv:1807.10957 [pdf, other]

Improving Sequential Determinantal Point Processes for Supervised Video Summarization

Authors: Aidean Sharghi, Ali Borji, Chengtao Li, Tianbao Yang, Boqing Gong

Abstract: It is now much easier than ever before to produce videos. While the ubiquitous video data is a great source for information discovery and extraction, the computational challenges are unparalleled. Automatically summarizing the videos has become a substantial need for browsing, searching, and indexing visual content. This paper is in the vein of supervised video summarization using the sequential determinantal point process (SeqDPP), which models diversity by a probabilistic distribution. We improve this model in two ways. In terms of learning, we propose a large-margin algorithm to address the exposure bias problem in SeqDPP. In terms of modeling, we design a new probabilistic distribution such that, when it is integrated into SeqDPP, the resulting model accepts user input about the expected length of the summary. Moreover, we also significantly extend a popular video summarization dataset with 1) more egocentric videos, 2) dense user annotations, and 3) a refined evaluation scheme. We conduct extensive experiments on this dataset (about 60 hours of videos in total) and compare our approach to several competitive baselines.

Submitted 24 October, 2018; v1 submitted 28 July, 2018; originally announced July 2018.

arXiv:1807.07948 [pdf, other]

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Authors: Zhezhi He, Boqing Gong, Deliang Fan

Abstract: Deep convolutional neural networks have achieved great success in many artificial intelligence applications. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms in low-power and resource-limited embedded systems. As a countermeasure to this problem, in this work, we propose statistical weight scaling and residual expansion methods to reduce the bit-width of the whole network's weight parameters to ternary values (i.e., -1, 0, +1), with the objectives of greatly reducing model size, computation cost, and the accuracy degradation caused by model compression. With about a 16x model compression rate, our ternarized ResNet-32/44/56 could outperform their full-precision counterparts by 0.12%, 0.24% and 0.18% on the CIFAR-10 dataset. We also test our ternarization method with AlexNet and ResNet-18 on the ImageNet dataset, which both achieve the best top-1 accuracy compared to recent similar works, with the same 16x compression rate. When our residual expansion method is further incorporated, compared to the full-precision counterpart, our ternarized ResNet-18 even improves the top-5 accuracy by 0.61% and degrades the top-1 accuracy by only 0.42% on the ImageNet dataset, with an 8x model compression rate. It outperforms the recent ABC-Net by 1.03% in top-1 accuracy and 1.78% in top-5 accuracy, with around 1.25x higher compression rate and more than 6x computation reduction due to the weight sparsity.

Submitted 20 July, 2018; originally announced July 2018.

arXiv:1807.06714 [pdf, other]

Defend Deep Neural Networks Against Adversarial Examples via Fixed and Dynamic Quantized Activation Functions

Authors: Adnan Siraj Rakin, **feng Yi, Boqing Gong, Deliang Fan

Abstract: Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks. To this end, many defense approaches that attempt to improve the robustness of DNNs have been proposed. In a separate and yet related area, recent works have explored quantizing neural network weights and activation functions into low bit-width to compress model size and reduce computational complexity. In this work, we find that these two different tracks, namely the pursuit of network compactness and robustness, can be merged into one and give rise to networks with both advantages. To the best of our knowledge, this is the first work that uses quantization of activation functions to defend against adversarial examples. We also propose to train robust neural networks by using adaptive quantization techniques for the activation functions. Our proposed Dynamic Quantized Activation (DQA) is verified through a wide range of experiments with the MNIST and CIFAR-10 datasets under different white-box attack methods, including FGSM, PGD, and C&W attacks. Furthermore, Zeroth Order Optimization and substitute model-based black-box attacks are also considered in this work. The experimental results clearly show that the robustness of DNNs could be greatly improved using the proposed DQA.

Submitted 18 December, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

arXiv:1807.04219 [pdf, other]

How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization

Authors: Yandong Li, Liqiang Wang, Tianbao Yang, Boqing Gong

Abstract: The large volume of video content and high viewing frequency demand automatic video summarization algorithms, of which a key property is the capability of modeling diversity. If videos are lengthy, like hours-long egocentric videos, it is necessary to track the temporal structures of the videos and enforce local diversity. Local diversity means that the shots selected from a short time duration are diverse, while visually similar shots are allowed to co-exist in the summary if they appear far apart in the video. In this paper, we propose a novel probabilistic model, built upon SeqDPP, to dynamically control the time span of a video segment upon which the local diversity is imposed. In particular, we enable SeqDPP to learn to automatically infer how local the local diversity is supposed to be from the input video. The resulting model is extremely involved to train by the hallmark maximum likelihood estimation (MLE), which further suffers from exposure bias and non-differentiable evaluation metrics. To tackle these problems, we instead devise a reinforcement learning algorithm for training the proposed model. Extensive experiments verify the advantages of our model and the new learning algorithm over MLE-based methods.

Submitted 23 August, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

Journal ref: European Conference on Computer Vision (ECCV 2018)

arXiv:1807.00202 [pdf, other]

Improved Techniques for Learning to Dehaze and Beyond: A Collective Study

Authors: Yu Liu, Guanlong Zhao, Boyuan Gong, Yang Li, Ritu Raj, Niraj Goel, Satya Kesav, Sandeep Gottimukkala, Zhangyang Wang, Wenqi Ren, Dacheng Tao

Abstract: Here we explore two related but important tasks based on the recently released REalistic Single Image DEhazing (RESIDE) benchmark dataset: (i) single image dehazing as a low-level image restoration problem; and (ii) high-level visual understanding (e.g., object detection) of hazy images. For the first task, we investigate a variety of loss functions and show that perception-driven loss significantly improves dehazing performance. In the second task, we provide multiple solutions including using advanced modules in the dehazing-detection cascade and domain-adaptive object detectors. In both tasks, our proposed solutions significantly improve performance. The GitHub repository URL is: https://github.com/guanlongzhao/dehaze

Submitted 29 July, 2018; v1 submitted 30 June, 2018; originally announced July 2018.

Comments: updated: typo fixed and some other improvements on writing

arXiv:1806.02478 [pdf, ps, other]

doi 10.1103/PhysRevA.99.012703

Bloch bound state of spin-orbit-coupled fermions in an optical lattice

Authors: Baihua Gong, Shuai Li, Xin-Hui Zhang, Bo Liu, Wei Yi

Abstract: Understanding the fundamentals of few-body physics provides an interesting bottom-up approach for the clarification of many-body properties. The remarkable experimental progress in realizing spin-orbit coupling (SOC) in optical Raman lattices offers a renewed thrust towards discovering novel few-body features induced by the interplay between SOC and optical lattices. Using the Wilson renormalization method to account for high-band effects, we study the low-energy two-body scattering processes of spin-$1/2$ fermions in spin-orbit coupled optical lattices. We demonstrate that, under weak SOC, adding a small lattice potential would destabilize shallow two-body bound states, contrary to conventional wisdom. On the other hand, when the lattice is sufficiently deep, two-body bound states are always stabilized by increasing the lattice depth. This intriguing non-monotonic behavior of the bound-state stability derives from the competition between SOC and optical lattices, and can be explained by analyzing the low-energy density of states. We also discuss the impact of high-band effects on this behavior, as well as potential experimental detections.

Submitted 14 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: 6 pages, 4 figures, including supplementary materials

Journal ref: Phys. Rev. A 99, 012703 (2019)

arXiv:1804.04519 [pdf, ps, other]

doi 10.1016/j.scib.2018.05.036

The melilite-type compound (Sr$_{1-x}$,$A_x$)$_2$MnGe$_2$S$_6$O ($A$=K, La) being a room temperature ferromagnetic semiconductor

Authors: Huan-Cheng Yang, Ben-Chao Gong, Kai Liu, Zhong-Yi Lu

Abstract: The search for room temperature ferromagnetic semiconductors, which take advantage of both the charge and spin degrees of freedom of electrons to realize a variety of functionalities in devices integrated with electronic, optical, and magnetic storage properties, has been a long-term goal of scientists and engineers. Here, by using spin-polarized density functional theory calculations, we predict a new series of high temperature ferromagnetic semiconductors based on the melilite-type oxysulfide Sr$_2$MnGe$_2$S$_6$O through hole (K) and electron (La) doping. Due to the lack of strong antiferromagnetic superexchange between Mn ions, the weak antiferromagnetic order in the parent compound Sr$_2$MnGe$_2$S$_6$O can be suppressed easily by charge doping with either $p$-type or $n$-type carriers, giving rise to the expected ferromagnetic order. At a doping concentration of 25%, both the hole-doped and electron-doped compounds can achieve a Curie temperature ($T_\text{c}$) above 300 K. The underlying mechanism is analyzed. Our study provides an effective approach for exploring new types of high temperature ferromagnetic semiconductors.

Submitted 12 April, 2018; originally announced April 2018.

Comments: 6 pages, 4 figures, 1 table

Journal ref: Sci. Bull. 63, 887 (2018)

arXiv:1804.00413 [pdf, other]

End-to-End Learning of Motion Representation for Video Understanding

Authors: Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

Abstract: Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally concatenated with other task-specific networks to formulate an end-to-end architecture, thus making our method more efficient than current multi-stage approaches by avoiding the need to pre-compute and store features on disk. Finally, the parameters of the TVNet can be further fine-tuned by end-to-end training. This enables TVNet to learn richer and task-specific patterns beyond exact optical flow. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach. Our TVNet achieves better accuracies than all compared methods, while being competitive with the fastest counterpart in terms of feature extraction time.

Submitted 2 April, 2018; originally announced April 2018.

Comments: CVPR 2018 spotlight. The first two authors contributed equally to this paper

arXiv:1803.07950 [pdf, other]

End-to-End Video Captioning with Multitask Reinforcement Learning

Authors: Lijun Li, Boqing Gong

Abstract: Although end-to-end (E2E) learning has led to impressive progress on a variety of visual understanding tasks, it is often impeded by hardware constraints (e.g., GPU memory) and is prone to overfitting. When it comes to video captioning, one of the most challenging benchmark tasks in computer vision, those limitations of E2E learning are especially amplified by the fact that both the input videos and output captions are lengthy sequences. Indeed, state-of-the-art methods for video captioning process video frames by convolutional neural networks and generate captions by unrolling recurrent neural networks. If we connect them in an E2E manner, the resulting model is both memory-consuming and data-hungry, making it extremely hard to train. In this paper, we propose a multitask reinforcement learning approach to training an E2E video captioning model. The main idea is to mine and construct as many effective tasks (e.g., attributes, rewards, and the captions) as possible from the human captioned videos such that they can jointly regulate the search space of the E2E neural network, from which an E2E video captioning model can be found and generalized to the testing phase. To the best of our knowledge, this is the first video captioning model that is trained end-to-end from the raw video input to the caption output. Experimental results show that such a model outperforms existing ones by a large margin on two benchmark video captioning datasets.

Submitted 1 January, 2019; v1 submitted 21 March, 2018; originally announced March 2018.

arXiv:1803.01541 [pdf, other]

Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

Authors: Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, Liqiang Wang

Abstract: Despite being impactful on a variety of problems and applications, the generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by \cite{arjovsky2017towards}, who also propose an alternative direction to avoid the caveats in the minmax two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to the inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds the accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.

Submitted 5 March, 2018; originally announced March 2018.

Comments: Accepted as a conference paper at the International Conference on Learning Representations (ICLR). Xiang Wei and Boqing Gong contributed equally to this work

arXiv:1802.02679 [pdf, other]

A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels

Authors: Yifan Ding, Liqiang Wang, Deliang Fan, Boqing Gong

Abstract: The recent success of deep neural networks is powered in part by large-scale well-labeled training data. However, it is a daunting task to laboriously annotate an ImageNet-like dataset. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweigh the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with a high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set and yet only a portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.

Submitted 21 March, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

Journal ref: IEEE Winter Conf. on Applications of Computer Vision 2018

arXiv:1802.01549 [pdf, other]

Blind Pre-Processing: A Robust Defense Method Against Adversarial Examples

Authors: Adnan Siraj Rakin, Zhezhi He, Boqing Gong, Deliang Fan

Abstract: Deep learning algorithms and networks are vulnerable to perturbed inputs, which are known as adversarial attacks. Many defense methodologies have been investigated to defend against such adversarial attacks. In this work, we propose a novel methodology to defend against the existing powerful attack model. For the first time, we introduce a new attacking scheme for the attacker and set a practical constraint for white box attacks. Under this proposed attacking scheme, we present the best defense ever reported against some of the recent strong attacks. It consists of a set of nonlinear functions to process the input data, which makes it more robust against adversarial attacks. However, we make this processing layer completely hidden from the attacker. Blind pre-processing improves the white box attack accuracy of MNIST from 94.3% to 98.7%. Even with increasing defense when other defenses completely fail, blind pre-processing remains one of the strongest ever reported. Another strength of our defense is that it eliminates the need for adversarial training, as it can significantly increase the MNIST accuracy without adversarial training. Additionally, blind pre-processing can also increase the inference accuracy in the face of a powerful attack on the CIFAR-10 and SVHN datasets without sacrificing much clean-data accuracy.

Submitted 7 February, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

arXiv:1711.02778 [pdf, ps, other]

doi 10.1103/PhysRevA.97.012119

Real-time quantum state estimation in circuit QED via Bayesian approach

Authors: Yang Yang, Beili Gong, Wei Cui

Abstract: Using a circuit QED device, we present a theoretical study of real-time quantum state estimation via the quantum Bayesian approach. Suitable conditions under which the Bayesian approach can accurately update the density matrix of the qubit are analyzed. We also consider the correlation between some basic and physically meaningful parameters of the circuit QED and the performance of the Bayesian approach. Our results advance the understanding of the quantum Bayesian approach and pave the way to study quantum feedback control and adaptive control.

Submitted 7 November, 2017; originally announced November 2017.

Journal ref: Phys. Rev. A 97, 012119 (2018)

Showing 101–150 of 204 results for author: Gong, B