
Open Long-Tailed Recognition in a Dynamic World

Authors: Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

Abstract: Real world data often exhibits a long-tailed and open-ended (with unseen classes) distribution. A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes). We define Open Long-Tailed Recognition++ (OLTR++) as learning from such naturally distributed data and optimizing for the classification accuracy over a balanced test set which includes both known and open classes. OLTR++ handles imbalanced classification, few-shot learning, open-set recognition, and active learning in one integrated algorithm, whereas existing classification approaches often focus only on one or two aspects and deliver poorly over the entire spectrum. The key challenges are: 1) how to share visual knowledge between head and tail classes, 2) how to reduce confusion between tail and open classes, and 3) how to actively explore open classes with learned knowledge. Our algorithm, OLTR++, maps images to a feature space such that visual concepts can relate to each other through a memory association mechanism and a learned metric (dynamic meta-embedding) that both respects the closed world classification of seen classes and acknowledges the novelty of open classes. Additionally, we propose an active learning scheme based on visual memory, which learns to recognize open classes in a data-efficient manner for future expansions. On three large-scale open long-tailed datasets we curated from ImageNet (object-centric), Places (scene-centric), and MS1M (face-centric) data, as well as three standard benchmarks (CIFAR-10-LT, CIFAR-100-LT, and iNaturalist-18), our approach, as a unified framework, consistently demonstrates competitive performance. Notably, our approach also shows strong potential for the active exploration of open classes and the fairness analysis of minority groups.

Submitted 17 August, 2022; originally announced August 2022.

Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. Extended version of our previous CVPR oral paper (arXiv:1904.05160)

arXiv:2207.08170 [pdf, other]

doi 10.1051/0004-6361/201937415

Possible tidal dissipation in millisecond pulsar binaries

Authors: D. Wang, B. P. Gong

Abstract: The post-Keplerian (PK) parameters inferred from pulsar timing provide a convenient way to test Einstein's general theory of relativity. However, before obtaining a pure orbital decay $\dot{P}_b$ induced by gravitational wave radiation, which is one of the PK parameters, a number of factors need to be accounted for carefully. The effect of tidal dissipation on $\dot{P}_b$ has been thought of as negligible. Here, we investigate the data for possible effects of tidal dissipation on $\dot{P}_b$. The possibility of tidal dissipation as a contributor to $\dot{P}_b$ in a large sample of millisecond pulsar binaries is investigated in detail. We collected a large sample of pulsar binaries with measured $\dot{P}_b$. All of the systems are millisecond pulsars. The residual $\dot{P}^{Res}_b$ of these systems was obtained by subtracting the three normal effects, that is to say the Shklovskii effect, line-of-sight acceleration, and gravitational radiation. Assuming that tidal dissipation is responsible for such a residual $\dot{P}^{Res}_b$, the tidal parameters of these systems can be calculated and compared with the tidal models. The residual $\dot{P}^{Res}_b$ values are distributed half positive and half negative. The dynamical tidal model can explain the residual $\dot{P}_b$ of millisecond pulsar-white dwarf binaries. The Love number of the main-sequence companion of PSR J1227-4853 can be derived as a reasonable value $k_2=0.177^{+0.098}_{-0.058}$ with the equilibrium tidal model. Those results are compatible with the scenario of tidal dissipation. Additionally, a weak correlation between the tidal parameter and orbital period is revealed, likely originating in the tidal process of the recycled stage, which is worthy of further investigation.

Submitted 17 July, 2022; originally announced July 2022.

Journal ref: A&A 663, A75 (2022)

arXiv:2207.01813 [pdf, other]

Stochastic Variational Methods in Generalized Hidden Semi-Markov Models to Characterize Functionality in Random Heteropolymers

Authors: Yun Zhou, Boying Gong, Tao Jiang, Ting Xu, Haiyan Huang

Abstract: Recent years have seen substantial advances in the development of biofunctional materials using synthetic polymers. The growing problem of elusive sequence-functionality relations for most biomaterials has driven researchers to seek more effective tools and analysis methods. In this study, statistical models are used to study sequence features of the recently reported random heteropolymers (RHP), which transport protons across lipid bilayers selectively and rapidly like natural proton channels. We utilized the probabilistic graphical model framework and developed a generalized hidden semi-Markov model (GHSMM-RHP) to extract the function-determining sequence features, including the transmembrane segments within a chain and the sequence heterogeneity among different chains. We developed stochastic variational methods for efficient inference on parameter estimation and predictions, and empirically studied their computational performance from a comparative perspective on Bayesian (i.e., stochastic variational Bayes) versus frequentist (i.e., stochastic variational expectation-maximization) frameworks that have been studied separately before. The real data results agree well with the laboratory experiments, and suggest GHSMM-RHP's potential in predicting protein-like behavior at the polymer-chain level.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.03091 [pdf, ps, other]

doi 10.3847/1538-4357/ac75d1

The discovery of a rotating radio transient J1918$-$0449 with intriguing emission properties with the five hundred meter aperture spherical radio telescope

Authors: J. L. Chen, Z. G. Wen, J. P. Yuan, N. Wang, D. Li, H. G. Wang, W. M. Yan, R. Yuen, P. Wang, Z. Wang, W. W. Zhu, J. R. Niu, C. C. Miao, M. Y. Xue, B. P. Gong

Abstract: In this study, we report on a detailed single pulse analysis of the radio emission from a rotating radio transient (RRAT) J1918$-$0449, which is the first RRAT discovered with the five hundred meter aperture spherical radio telescope (FAST). The sensitive observations were carried out on 30 April 2021 using the FAST with a central frequency of 1250 MHz and a short time resolution of 49.152 $μ$s, which forms a reliable basis to probe single pulse emission properties in detail. The source was successively observed for around 2 hours. A total of 83 dispersed bursts with significance above 6$σ$ are detected over 1.8 hours. The source's DM and rotational period are determined to be 116.1$\pm$0.4 pc cm$^{-3}$ and 2479.21$\pm$0.03 ms, respectively. The share of registered pulses out of the total number of observed periods is 3.12%. No underlying emission is detected in the averaged off pulse profile. For bursts with fluence larger than 10 Jy ms, the pulse energy follows a power-law distribution with an index of $-3.1\pm0.4$, suggesting the existence of bright pulse emission. We find that the distribution of time between subsequent pulses is consistent with a stationary Poisson process and find no evidence of clustering over the 1.8 h observations, giving a mean burst rate of one burst every 66 s. Close inspection of the detected bright pulses reveals that 21 pulses exhibit well-defined quasi-periodicities. The subpulse drifting is present in non-successive rotations with periodicity of $2.51\pm0.06$ periods. Finally, possible physical mechanisms are discussed.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 11 pages, 11 figures

arXiv:2206.00626 [pdf, ps, other]

Regular Convergence and Finite Element Methods for Eigenvalue Problems

Authors: Bo Gong, Jiguang Sun

Abstract: Regular convergence, together with various other types of convergence, has been studied since the 1970s for the discrete approximations of linear operators. In this paper, we consider the eigenvalue approximation of compact operators whose spectral problem can be written as the eigenvalue problem of some holomorphic Fredholm operator function. Focusing on the finite element methods (conforming, discontinuous Galerkin, etc.), we show that the regular convergence of discrete holomorphic operator functions follows from the approximation property of the finite element spaces and the compact convergence of the discrete operators in some suitable Sobolev space. The convergence for eigenvalues is then obtained using the discrete approximation theory for the eigenvalue problems of holomorphic Fredholm operator functions. The result can be used to show the convergence of various finite element methods for eigenvalue problems such as the Dirichlet eigenvalue problem and the biharmonic eigenvalue problem.

Submitted 19 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

MSC Class: 35P30; 65N30; 78A46

arXiv:2205.07505 [pdf, ps, other]

doi 10.1051/0004-6361/202142450

Cyanopolyyne line survey towards high-mass star-forming regions with TMRT

Authors: Y. X. Wang, J. S. Zhang, Y. T. Yan, J. J. Qiu, J. L. Chen, J. Y. Zhao, Y. P. Zou, X. C. Wu, X. L. He, Y. B. Gong, J. H. Cai

Abstract: We carried out a cyanopolyyne line survey towards a large sample of HMSFRs using the Shanghai Tian Ma 65m Radio Telescope (TMRT). Our sample consisted of 123 targets taken from the TMRT C band line survey. It included three kinds of sources, namely those with detection of the 6.7 GHz CH3OH maser alone, with detection of the radio recombination line (RRL) alone, and with detection of both (hereafter referred to as Maser-only, RRL-only, and Maser-RRL sources, respectively). We detected HC3N in 38 sources, HC5N in 11 sources, and HC7N in G24.790+0.084, with the highest detection rate being found for Maser-RRL sources and a very low detection rate found for RRL-only sources. Their column densities were derived using the rotational temperature measured from the NH3 lines. And we constructed and fitted the far-infrared (FIR) spectral energy distributions. Based on these, we derive their dust temperatures, H2 column densities, and abundances of cyanopolyynes relative to H2. The detection rate, the column density, and the relative abundance of HC3N increase from Maser-only to Maser-RRL sources and decrease from Maser-RRL to RRL-only sources. This trend is consistent with the proposed evolutionary trend of HC3N under the assumption that our Maser-only, Maser-RRL, and RRL-only sources correspond to massive young stellar objects, ultra-compact HII regions, and normal classical HII regions, respectively. Furthermore, a statistical analysis of the integrated line intensity and column density of HC3N and shock-tracing molecules (SiO, H2CO) enabled us to find positive correlations between them. This suggests that HC3N may be another tracer of shocks, and should therefore be the subject of further observations and corresponding chemical simulations. Our results indirectly support the idea that the neutral--neutral reaction between C2H2 and CN is the dominant formation pathway of HC3N.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 23 pages, 5 figures, 4 tables, Accepted to A&A

Journal ref: A&A 663, A177 (2022)

arXiv:2204.05376 [pdf, other]

medXGAN: Visual Explanations for Medical Classifiers through a Generative Latent Space

Authors: Amil Dravid, Florian Schiffers, Boqing Gong, Aggelos K. Katsaggelos

Abstract: Despite the surge of deep learning in the past decade, some users are skeptical to deploy these models in practice due to their black-box nature. Specifically, in the medical space where there are severe potential repercussions, we need to develop methods to gain confidence in the models' decisions. To this end, we propose a novel medical imaging generative adversarial framework, medXGAN (medical eXplanation GAN), to visually explain what a medical classifier focuses on in its binary predictions. By encoding domain knowledge of medical images, we are able to disentangle anatomical structure and pathology, leading to fine-grained visualization through latent interpolation. Furthermore, we optimize the latent space such that interpolation explains how the features contribute to the classifier's output. Our method outperforms baselines such as Gradient-Weighted Class Activation Mapping (Grad-CAM) and Integrated Gradients in localization and explanatory ability. Additionally, a combination of the medXGAN with Integrated Gradients can yield explanations more robust to noise. The code is available at: https://avdravid.github.io/medXGAN_page/.

Submitted 17 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 10 pages, 11 figures, accepted to CVPR TCV workshop

ACM Class: I.5.4; I.5.1; I.4.9; I.4.5; I.2.10

arXiv:2204.05237 [pdf, other]

doi 10.1103/PhysRevD.109.015009

Full one-loop radiative corrections to $e^+ e^-\to H^+H^-$ in the inert doublet model

Authors: Hamza Abouabid, Abdesslam Arhrib, Jaouad El Falaki, Bin Gong, Wenhai Xie, Qi-Shu Yan

Abstract: We compute the full one-loop radiative corrections for charged scalar pair production $e^{+}e^{-}\to H^{+}H^{-}$ in the inert doublet model. The on-shell renormalization scheme has been used. We take into account both the weak contributions as well as the soft and hard QED corrections. We compute both the real emission and the one-loop virtual corrections using the Feynman diagrammatic method. The resummed cross section is introduced to cure the Coulomb singularity which occurs in the QED corrections. We have analyzed the parameter space of the inert doublet model in three scenarios after taking into account theoretical constraints, the collider experimental bounds, and dark matter search bounds as well. It is found that the weak interaction dominates the radiative corrections, and its size is determined by the triple Higgs coupling $λ_{h^0 H^+ H^-}$, which is further connected to the mass of the charged scalar. In the scenario where all the constraints are taken into account, we find that for $\sqrt{s}=250$ GeV and $\sqrt{s}=500$ GeV, the weak corrections are around $-6\% \sim-5\%$ and $-10\% \sim -3\%$, respectively. While for $\sqrt{s}=1000$ GeV, the weak corrections can reach $-15\% \sim +25\%$. The new feature is that the weak corrections can be positive near the threshold when the charged scalar is heavier than 470 GeV. Six benchmark points for future collider searches have been proposed.

Submitted 21 January, 2024; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 45 pages, 5 figures, added 3 appendices, accepted in PRD

arXiv:2204.05179 [pdf]

Emergent superconductivity in van der Waals Kagome material Pd3P2S8 under high pressure

Authors: Qi Wang, Xiaole Qiu, Cuiying Pei, Benchao Gong, Lingling Gao, Yi Zhao, Weizheng Cao, Changhua Li, Shihao Zhu, Mingxin Zhang, Yulin Chen, Kai Liu, Yanpeng Qi

Abstract: Kagome lattice systems have been proposed to host rich physics, which provide an excellent platform to explore unusual quantum states. Here, we report on the discovery of superconductivity in van der Waals material Pd3P2S8 under pressure. The superconductivity is observed in Pd3P2S8 for those pressures where the temperature dependence of the resistivity changes from a semiconducting-like behavior to that of a normal metal. The superconducting transition temperature Tc increases with applied pressure and reaches ~ 6.83 K at 79.5 GPa. Combining high-pressure XRD, Raman spectroscopy and theoretical calculations, our results demonstrate that the observed superconductivity induced by high pressure in Pd3P2S8 is closely related to the formation of amorphous phase, which results from the structural instability due to the enhanced coupling between interlayer Pd and S atoms upon compression.

Submitted 11 April, 2022; originally announced April 2022.

Comments: 13 pages, 5 figures

arXiv:2203.14346 [pdf, other]

doi 10.1088/1674-1137/ac6a4e

A new approach for amplitudes with multiple fermion lines

Authors: Feng Zhang, Bin Gong, Jian-Xiong Wang

Abstract: A new approach for tree-level amplitudes with multiple fermion lines is presented. It mainly focuses on the simplification of fermion lines. By calculating two vectors recursively without any matrix multiplications, the result of a fermion line is reduced to a very compact form depending only on the two vectors. The comparisons with other packages are presented, and the results show that our package FDC gives a very good performance in the processes of multiple fermion lines with this new approach and some other improvements. A further comparison with WHIZARD shows that this new approach has a competitive efficiency in computing pure amplitude square without phase space integration.

Submitted 26 June, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: 9 pages, 1 figure

arXiv:2203.08065 [pdf, other]

Surrogate Gap Minimization Improves Sharpness-Aware Training

Authors: Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu

Abstract: The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by minimizing a perturbed loss defined as the maximum loss within a neighborhood in the parameter space. However, we show that both sharp and flat minima can have a low perturbed loss, implying that SAM does not always prefer flat minima. Instead, we define a surrogate gap, a measure equivalent to the dominant eigenvalue of Hessian at a local minimum when the radius of the neighborhood (to derive the perturbed loss) is small. The surrogate gap is easy to compute and feasible for direct minimization during training. Based on the above observations, we propose Surrogate Gap Guided Sharpness-Aware Minimization (GSAM), a novel improvement over SAM with negligible computation overhead. Conceptually, GSAM consists of two steps: 1) a gradient descent like SAM to minimize the perturbed loss, and 2) an ascent step in the orthogonal direction (after gradient decomposition) to minimize the surrogate gap and yet not affect the perturbed loss. GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities. Theoretically, we show the convergence of GSAM and provably better generalization than SAM. Empirically, GSAM consistently improves generalization (e.g., +3.2% over SAM and +5.4% over AdamW on ImageNet top-1 accuracy for ViT-B/32). Code is released at https://sites.google.com/view/gsam-iclr22/home.

Submitted 19 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: Paper accepted by ICLR22, https://openreview.net/forum?id=edONMAnhLu-

arXiv:2112.07074 [pdf, other]

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

Authors: Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

Abstract: In this paper, we explore the possibility of building a unified foundation model that can be adapted to both vision-only and text-only tasks. Starting from BERT and ViT, we design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads. To efficiently pre-train the proposed model jointly on unpaired images and text, we propose two novel techniques: (i) We employ the separately-trained BERT and ViT models as teachers and apply knowledge distillation to provide additional, accurate supervision signals for the joint training; (ii) We propose a novel gradient masking strategy to balance the parameter updates from the image and text pre-training losses. We evaluate the jointly pre-trained transformer by fine-tuning it on image classification tasks and natural language understanding tasks, respectively. The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.

Submitted 13 December, 2021; originally announced December 2021.

Comments: preliminary work

arXiv:2112.05181 [pdf, other]

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

Authors: Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Abstract: Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While being very effective on learning holistic image and video representations, such an objective becomes sub-optimal for learning spatio-temporally fine-grained features in videos, where scenes and instances evolve through space and time. In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. We first design a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features. Further, we introduce a simple network design that successfully reconciles the simultaneous learning process of both holistic and local representations. We evaluate our learned representations on a variety of downstream tasks and show that ConST-CL achieves competitive results on 6 datasets, including Kinetics, UCF, HMDB, AVA-Kinetics, AVA and OTB.

Submitted 1 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: CVPR 2022

arXiv:2112.04480 [pdf, other]

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

Authors: Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

Abstract: This work presents a self-supervised learning framework named TeG to explore Temporal Granularity in learning video representations. In TeG, we sample a long clip from a video and a short clip that lies inside the long clip. We then extract their dense temporal embeddings. The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips. Our study reveals the impact of temporal granularity with three major findings. 1) Different video tasks may require features of different temporal granularities. 2) Intriguingly, some tasks that are widely considered to require temporal awareness can actually be well addressed by temporally persistent features. 3) The flexibility of TeG gives rise to state-of-the-art results on 8 video benchmarks, outperforming supervised pre-training in most cases.

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2111.11222 [pdf, ps, other]

doi 10.1103/PhysRevB.105.155115

Evolution of ultra-flat band in van der Waals kagome semiconductor Pd$_{3}$P$_{2}$(S$_{1-x}$Se$_{x}$)$_{8}$

Authors: Shaohua Yan, Ben-Chao Gong, Lin Wang, **zhi Wu, Qiangwei Yin, Xinyu Cao, Xiao Zhang, Xiaofeng Liu, Zhong-Yi Lu, Kai Liu, Hechang Lei

Abstract: We investigate the evolutions of structural parameters, optical properties, and electronic structures of a van der Waals kagome semiconductor Pd$_{3}$P$_{2}$S$_{8}$ with Se doping. When the doping level of Se increases, the bandgaps of Pd$_{3}$P$_{2}$(S$_{1-x}$Se$_{x}$)$_{8}$ single crystals decrease gradually, accompanied by expanded unit cells. The first-principles calculations show that there is a flat band (FB) near the Fermi level in bulk Pd$_{3}$P$_{2}$S$_{8}$. This FB mainly originates from the $d_{z^2}$-like orbitals of Pd atoms in the Pd kagome lattice, which have a finite interlayer electron hopping perpendicular to the PdS$_4$ square plane. The interlayer hopping can be reinforced with Se doping, inducing a stronger interlayer coupling via the chalcogen atoms at apical sites, which reduces the bandgap and enhances the cleavage energy. In contrast, the vanishing interlayer hopping in the two-dimensional limit results in the formation of ultra-FB in the monolayers of these compounds. The easy exfoliation and the existence of unique ultra-FB near $E_{\rm F}$ make Pd$_{3}$P$_{2}$(S$_{1-x}$Se$_{x}$)$_{8}$ a model system to explore the exotic physics of FB in 2D kagome lattice.

Submitted 22 November, 2021; originally announced November 2021.

Comments: 6 pages, 4 figures

arXiv:2109.09023 [pdf, other]

Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks

Authors: Zihang Zou, Boqing Gong, Liqiang Wang

Abstract: We study protecting a user's data (images in this work) against a learner's unauthorized use in training neural networks. It is especially challenging when the user's data is only a tiny percentage of the learner's complete training set. We revisit the traditional watermarking under modern deep learning settings to tackle the challenge. We show that when a user watermarks images using a specialized linear color transformation, a neural network classifier will be imprinted with the signature so that a third-party arbitrator can verify the potentially unauthorized usage of the user data by inferring the watermark signature from the neural network. We also discuss what watermarking properties and signature spaces make the arbitrator's verification convincing. To our best knowledge, this work is the first to protect an individual user's data ownership from unauthorized use in training neural networks.

Submitted 1 August, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: Accepted to ECCV 2022

arXiv:2108.11976 [pdf, other]

JUWELS Booster -- A Supercomputer for Large-Scale AI Research

Authors: Stefan Kesselheim, Andreas Herten, Kai Krajsek, Jan Ebert, Jenia Jitsev, Mehdi Cherti, Michael Langguth, Bing Gong, Scarlet Stadtler, Amirpasha Mozaffari, Gabriele Cavallaro, Rocco Sedona, Alexander Schug, Alexandre Strube, Roshni Kamath, Martin G. Schultz, Morris Riedel, Thomas Lippert

Abstract: In this article, we present JUWELS Booster, a recently commissioned high-performance computing system at the Jülich Supercomputing Center. With its system architecture, most importantly its large number of powerful Graphics Processing Units (GPUs) and its fast interconnect via InfiniBand, it is an ideal machine for large-scale Artificial Intelligence (AI) research and applications. We detail its system architecture, parallel, distributed model training, and benchmarks indicating its outstanding performance. We exemplify its potential for research application by presenting large-scale AI research highlights from various scientific fields that require such a facility.

Submitted 30 June, 2021; originally announced August 2021.

Comments: 12 pages, 5 figures. Accepted at ISC 2021, Workshop Deep Learning on Supercomputers. This is a duplicate submission as my previous submission is on hold for several weeks now and my attempts to contact the moderators failed

Report number: 1234567Dummy

arXiv:2108.08187 [pdf, other]

doi 10.1109/ICCV48922.2021.01226

ME-PCN: Point Completion Conditioned on Mask Emptiness

Authors: Bingchen Gong, Yinyu Nie, Yiqun Lin, Xiaoguang Han, Yizhou Yu

Abstract: Point completion refers to completing the missing geometries of an object from incomplete observations. Main-stream methods predict the missing shapes by decoding a global feature learned from the input point cloud, which often leads to deficient results in preserving topology consistency and surface details. In this work, we present ME-PCN, a point completion network that leverages `emptiness' in 3D shape space. Given a single depth scan, previous methods often encode the occupied partial shapes while ignoring the empty regions (e.g. holes) in depth maps. In contrast, we argue that these `emptiness' clues indicate shape boundaries that can be used to improve topology representation and detail granularity on surfaces. Specifically, our ME-PCN encodes both the occupied point cloud and the neighboring `empty points'. It estimates coarse-grained but complete and reasonable surface points in the first stage, followed by a refinement stage to produce fine-grained surface details. Comprehensive experiments verify that our ME-PCN presents better qualitative and quantitative performance against the state-of-the-art. Besides, we further prove that our `emptiness' design is lightweight and easy to embed in existing methods, which shows consistent effectiveness in improving the CD and EMD scores.

Submitted 14 October, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV 2021; typos corrected

arXiv:2108.07792 [pdf, other]

Federated Multi-Target Domain Adaptation

Authors: Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang

Abstract: Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy. However, it is not always feasible to obtain high-quality supervisory signals from users, especially for vision tasks. Unlike typical federated settings with labeled client data, we consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server. We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data). Within this new Federated Multi-Target Domain Adaptation (FMTDA) task, we analyze the model performance of existing domain adaptation methods and propose an effective DualAdapt method to address the new challenges. Extensive experimental results on image classification and semantic segmentation tasks demonstrate that our method achieves high accuracy, incurs minimal communication cost, and requires low computational resources on client devices.

Submitted 17 August, 2021; originally announced August 2021.

arXiv:2107.08164 [pdf, ps, other]

Anonymous communication protocol over quantum networks

Authors: Beili Gong, Wei Cui

Abstract: We propose a W state-based protocol for anonymously transmitting quantum messages in a quantum network. Different from the existing protocols [A. Unnikrishnan, et al., Phys. Rev. Lett. 122, 240501 (2019)], the proposed protocol can be effectively implemented in the network only equipped with quantum channels and regular broadcast channels. Throughout the design procedure, we develop three sub-protocols using the W state, including the quantum collision detection protocol and the quantum notification protocol. Moreover, together with the conventional anonymous entanglement protocol, the whole anonymous communication protocol has been constructed. Finally, we examine the correctness and security of the proposed quantum anonymous communication protocol.

Submitted 16 July, 2021; originally announced July 2021.

arXiv:2107.02170 [pdf, other]

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Authors: Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Abstract: Vanilla models for object detection and instance segmentation suffer from the heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or re-weighting. In this paper, we investigate a largely overlooked approach -- post-processing calibration of confidence scores. We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance. On the LVIS dataset, NorCal can effectively improve nearly all the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into various modeling choices and mechanisms of our approach. Our code is publicly available at https://github.com/tydpan/NorCal/.

Submitted 29 November, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: Accepted to NeurIPS 2021
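
The recipe described in the NorCal abstract above (reweigh each class score by its training sample size, keep the background class separate, and normalize per proposal) can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' released code: the function name `norcal_calibrate`, the exponent `gamma`, and the convention that the background score comes last are all assumptions made for illustration.

```python
def norcal_calibrate(scores, class_counts, gamma=1.0):
    """Hypothetical sketch of count-based score calibration for one proposal.

    scores       -- foreground class scores followed by ONE background score
                    (assumed layout; the abstract does not specify it)
    class_counts -- training sample size of each foreground class
    gamma        -- assumed strength hyperparameter; gamma=0 leaves the
                    foreground scores unscaled
    """
    *foreground, background = scores
    if len(foreground) != len(class_counts):
        raise ValueError("need one count per foreground class")

    # Down-weight each foreground score by its class frequency.
    scaled = [p / (n ** gamma) for p, n in zip(foreground, class_counts)]

    # Background is handled separately: it is not divided by any count,
    # but it still takes part in the per-proposal normalization.
    denom = sum(scaled) + background
    return [s / denom for s in scaled] + [background / denom]


if __name__ == "__main__":
    # One frequent class (1000 samples) and one rare class (10 samples).
    print(norcal_calibrate([0.6, 0.2, 0.2], [1000, 10], gamma=1.0))
```

After calibration the rare class outranks the frequent one even though its raw score was lower, which is the intended effect of dividing by the class frequency.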

arXiv:2107.01845 [pdf, other]

doi 10.1007/JHEP02(2022)077

Gravitational waves from bubble collisions in FLRW spacetime

Authors: Haowen Zhong, Biping Gong, Taotao Qiu

Abstract: Stochastic gravitational wave background (SGWB) is a promising tool to probe the very early universe where the standard model of particle physics and cosmology are connected closely. As a possible component of SGWB, gravitational waves (GW) from bubble collisions during the first order cosmological phase transitions deserve comprehensive analyses. In 2017, Ryusuke Jinno and Masahiro Takimoto proposed an elegant analysis approach to derive the analytical expressions of energy spectra of GW from bubble collisions in Minkowski spacetime, avoiding large-scale numerical simulations for the first time [26]. However, they neglect the expansion of the universe and regard the duration of phase transitions as infinite in their derivation, which could cause their estimates to deviate from the true values. For these two reasons, we give a new expression of GW spectra by adopting their method, switching the spacetime background to FLRW spacetime, and considering a finite duration of phase transitions. Denoting by $σ$ the ratio of the speed of phase transitions to the expansion speed of the universe, we find that when $σ$ is around $\mathcal{O}(10)$, the maxima of the estimated GW energy spectra drop by around one order of magnitude relative to the results of their previous work. Even when $σ=100$, the maximum of the GW energy spectrum is only $65\%$ of their previous estimation. Such a significant decrease may bring about new challenges for the detectability of GW from bubble collisions. Luckily, by comparing the new spectra with PLI (power-law integrated) sensitivity curves of GW detectors, we find that the detection prospect for GW from bubble collisions is still promising for DECIGO, BBO, LISA, and TianQin in the foreseeable future.

Submitted 9 February, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 16 pages, 8 figures; This paper has been accepted by JHEP

arXiv:2106.12388 [pdf, ps, other]

doi 10.1103/PhysRevB.103.035143

LaO as a candidate substrate for realizing superconductivity in FeSe epitaxial film

Authors: Xiao-Le Qiu, Ben-Chao Gong, Huan-Cheng Yang, Zhong-Yi Lu, Kai Liu

Abstract: The significantly enhanced superconducting transition temperature ($T_c$) of an FeSe monolayer on SrTiO$_3$(001) substrate has attracted extensive attention in recent years. Here, based on first-principles electronic structure calculations, we propose another candidate substrate LaO(001) for the epitaxial growth of FeSe monolayer to realize superconductivity. Our calculations show that for the optimal adsorption structure of FeSe monolayer on LaO(001), the stripe antiferromagnetic state and the dimer antiferromagnetic state are almost energetically degenerate, indicating the existence of strong magnetic fluctuation that is beneficial to the appearance of superconductivity. According to the Bader charge analysis, the calculated electron doping from the LaO substrate to the FeSe monolayer is about 0.18 electrons per Fe atom, even larger than that in case of FeSe/SrTiO$_3$(001). Since LaO was also reported to be a superconductor with $T_c$ ~ 5 K, it may have a superconducting proximity effect on the epitaxial FeSe film and vice versa. These results suggest that LaO would be an interesting substrate to study the interface-related superconductivity.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 7 pages, 6 figures, 1 table

Journal ref: Phys. Rev. B 103, 035143 (2021)

arXiv:2106.12158 [pdf]

doi 10.1016/j.carbon.2021.01.153

Theoretical design of all-carbon networks with intrinsic magnetism

Authors: Yan Gao, Xiaolong Feng, Ben-Chao Gong, Chengyong Zhong, Shengyuan A. Yang, Kai Liu, Zhong-Yi Lu

Abstract: To induce intrinsic magnetism in the nominally nonmagnetic carbon materials containing only $s$ and $p$ electrons is an intriguing yet challenging task. Here, based on first-principles electronic structure calculations, we propose a universal approach inspired by Ovchinnikov's rule to guide the design of a series of imaginative magnetic all-carbon structures. The idea is to combine the differently stacked graphene layers via the acetylenic linkages (-C$\equiv$C-) to obtain a class of two-dimensional (2D) and three-dimensional (3D) carbon networks. With first-principles electronic structure calculations, we confirm the effectiveness of this approach via concrete examples of double-layer ALBG-C14, triple-layer ALTG-C22, and bulk IALG-C30. We show that these materials are antiferromagnetic (AFM) semiconductors with intralayer Néel and interlayer AFM couplings. Following this idea, our work not only provides a promising design scheme for magnetic all-carbon materials, but can also apply to other $π$-bonding network systems.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 19 pages, 5 pages, 3 tables

Journal ref: Carbon 177, 11 (2021)

arXiv:2106.10258 [pdf, other]

Bridging the Gap Between Object Detection and User Intent via Query-Modulation

Authors: Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard

Abstract: When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically: lack of a high-confidence detection on the object of interest, or detection with a wrong class label. The issue is especially severe when operating capacity-constrained mobile object detectors on-device. In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also outperform a specialized referring expression recognition system. Query-modulated detectors can also be trained to simultaneously solve for both localizing a user query and standard detection, even outperforming standard mobile detectors at the canonical COCO task.

Submitted 3 August, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

arXiv:2106.01899 [pdf, other]

Adversarially Adaptive Normalization for Single Domain Generalization

Authors: Xinjie Fan, Qifei Wang, Junjie Ke, Feng Yang, Boqing Gong, Mingyuan Zhou

Abstract: Single domain generalization aims to learn a model that performs well on many unseen domains with only one domain data for training. Existing works focus on studying the adversarial domain augmentation (ADA) to improve the model's generalization capability. The impact on domain generalization of the statistics of normalization layers is still underinvestigated. In this paper, we propose a generic normalization approach, adaptive standardization and rescaling normalization (ASR-Norm), to complement the missing part in previous works. ASR-Norm learns both the standardization and rescaling statistics via neural networks. This new form of normalization can be viewed as a generic form of the traditional normalizations. When trained with ADA, the statistics in ASR-Norm are learned to be adaptive to the data coming from different domains, which improves the model's generalization performance across domains, especially on target domains with a large discrepancy from the source domain. The experimental results show that ASR-Norm can bring consistent improvement to the state-of-the-art ADA approaches by 1.6%, 2.7%, and 6.3% on average on the Digits, CIFAR-10-C, and PACS benchmarks, respectively. As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.

Submitted 1 June, 2021; originally announced June 2021.

Comments: CVPR 2021

arXiv:2106.01548 [pdf, other]

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations

Authors: Xiangning Chen, Cho-Jui Hsieh, Boqing Gong

Abstract: Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures. Existing works empower the models by massive data, such as large-scale pre-training and/or repeated strong data augmentations, and still report optimization-related problems (e.g., sensitivity to initialization and learning rates). Hence, this paper investigates ViTs and MLP-Mixers from the lens of loss geometry, intending to improve the models' data efficiency at training and generalization at inference. Visualization and Hessian reveal extremely sharp local minima of converged models. By promoting smoothness with a recently proposed sharpness-aware optimizer, we substantially improve the accuracy and robustness of ViTs and MLP-Mixers on various tasks spanning supervised, adversarial, contrastive, and transfer learning (e.g., +5.3% and +11.0% top-1 accuracy on ImageNet for ViT-B/16 and Mixer-B/16, respectively, with the simple Inception-style preprocessing). We show that the improved smoothness attributes to sparser active neurons in the first few layers. The resultant ViTs outperform ResNets of similar size and throughput when trained from scratch on ImageNet without large-scale pre-training or strong data augmentations. Model checkpoints are available at https://github.com/google-research/vision_transformer.

Submitted 13 March, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: ICLR 2022 (spotlight)

arXiv:2104.12727 [pdf, other]

2.5D Visual Relationship Detection

Authors: Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

Abstract: Visual 2.5D perception involves understanding the semantics and geometry of a scene through reasoning about object relationships with respect to the viewer in an environment. However, existing works in visual recognition primarily focus on the semantics. To bridge this gap, we study 2.5D visual relationship detection (2.5VRD), in which the goal is to jointly detect objects and predict their relative depth and occlusion relationships. Unlike general VRD, 2.5VRD is egocentric, using the camera's viewpoint as a common reference for all 2.5D relationships. Unlike depth estimation, 2.5VRD is object-centric and not only focuses on depth. To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images. We analyze this dataset and conduct extensive experiments including benchmarking multiple state-of-the-art VRD models on this task. Our results show that existing models largely rely on semantic cues and simple heuristics to solve 2.5VRD, motivating further research on models for 2.5D perception. The new dataset is available at https://github.com/google-research-datasets/2.5vrd.

Submitted 26 April, 2021; originally announced April 2021.

arXiv:2104.11178 [pdf, other]

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Authors: Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Abstract: We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations that are rich enough to benefit a variety of downstream tasks. We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval. Furthermore, we study a modality-agnostic, single-backbone Transformer by sharing weights among the three modalities. We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks. Especially, VATT's vision Transformer achieves the top-1 accuracy of 82.1% on Kinetics-400, 83.6% on Kinetics-600, 72.7% on Kinetics-700, and 41.1% on Moments in Time, new records while avoiding supervised pre-training. Transferring to image classification leads to 78.7% top-1 accuracy on ImageNet compared to 64.7% by training the same Transformer from scratch, showing the generalizability of our model despite the domain gap between videos and images. VATT's audio Transformer also sets a new record on waveform-based audio event recognition by achieving the mAP of 39.4% on AudioSet without any supervised pre-training. VATT's source code is publicly available.

Submitted 6 December, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

Comments: Published in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2104.05279 [pdf, other]

Class-Balanced Distillation for Long-Tailed Visual Recognition

Authors: Ahmet Iscen, André Araujo, Boqing Gong, Cordelia Schmid

Abstract: Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions. An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance and class-balanced sampling, respectively. In this work, we introduce a new framework, by making the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting. Our main contribution is a new training method, referred to as Class-Balanced Distillation (CBD), that leverages knowledge distillation to enhance feature representations. CBD allows the feature representation to evolve in the second training stage, guided by the teacher learned in the first stage. The second stage uses class-balanced sampling, in order to focus on under-represented classes. This framework can naturally accommodate the usage of multiple teachers, unlocking the information from an ensemble of models to enhance recognition capabilities. Our experiments show that the proposed technique consistently outperforms the state of the art on long-tailed recognition benchmarks such as ImageNet-LT, iNaturalist17 and iNaturalist18.

Submitted 12 January, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: The code is available at https://github.com/google-research/google-research/tree/master/class_balanced_distillation
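
The two sampling schemes contrasted in the abstract above, instance sampling and class-balanced sampling, differ only in the probability assigned to each training example. The sketch below computes those per-example probabilities for a toy label list. It is an illustration under stated assumptions and is not taken from the linked repository: the function names and the toy labels are hypothetical.

```python
from collections import Counter


def instance_sampling_probs(labels):
    """Every example is equally likely, so frequent classes dominate."""
    n = len(labels)
    return [1.0 / n for _ in labels]


def class_balanced_sampling_probs(labels):
    """Every class is equally likely; within a class, examples are uniform.

    An example of class c is drawn with probability 1 / (C * n_c), where
    C is the number of classes and n_c the number of examples in class c.
    """
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[y]) for y in labels]


if __name__ == "__main__":
    # Toy long-tailed label set: 8 "head" examples, 2 "tail" examples.
    labels = ["head"] * 8 + ["tail"] * 2
    print(instance_sampling_probs(labels))
    print(class_balanced_sampling_probs(labels))
```

Under instance sampling the head class receives 80% of the draws in this toy set, whereas class-balanced sampling gives each class 50%, which is why the latter focuses training on under-represented classes.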

arXiv:2103.13886 [pdf, other]

Robust and Accurate Object Detection via Adversarial Learning

Authors: Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong

Abstract: Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection. Noting that most state-of-the-art object detectors benefit from fine-tuning a pre-trained classifier, we first study how the classifiers' gains from various data augmentations transfer to object detection. The results are discouraging; the gains diminish after fine-tuning in terms of either accuracy or robustness. This work instead augments the fine-tuning stage for object detectors by exploring adversarial examples, which can be viewed as a model-dependent data augmentation. Our method dynamically selects the stronger adversarial images sourced from a detector's classification and localization branches and evolves with the detector to ensure the augmentation policy stays current and relevant. This model-dependent augmentation generalizes to different object detectors better than AutoAugment, a model-agnostic augmentation policy searched based on one particular detector. Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the COCO object detection benchmark. It also improves the detectors' robustness against natural distortions by +3.8 mAP and against domain shift by +1.3 mAP. Models are available at https://github.com/google/automl/tree/master/efficientdet/Det-AdvProp.md

Submitted 26 March, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: CVPR 2021. Models are available at https://github.com/google/automl/tree/master/efficientdet/Det-AdvProp.md

arXiv:2103.11511 [pdf, other]

MoViNets: Mobile Video Networks for Efficient Video Recognition

Authors: Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong

Abstract: We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs) are accurate at video recognition but require large computation and memory budgets and do not support online inference, making them difficult to work on mobile devices. We propose a three-step approach to improve computational efficiency while substantially reducing the peak memory usage of 3D CNNs. First, we design a video network search space and employ neural architecture search to generate efficient and diverse 3D CNN architectures. Second, we introduce the Stream Buffer technique that decouples memory from video clip duration, allowing 3D CNNs to embed arbitrary-length streaming video sequences for both training and inference with a small constant memory footprint. Third, we propose a simple ensembling technique to improve accuracy further without sacrificing efficiency. These three progressive techniques allow MoViNets to achieve state-of-the-art accuracy and efficiency on the Kinetics, Moments in Time, and Charades video action recognition datasets. For instance, MoViNet-A5-Stream achieves the same accuracy as X3D-XL on Kinetics 600 while requiring 80% fewer FLOPs and 65% less memory. Code will be made available at https://github.com/tensorflow/models/tree/master/official/vision.

Submitted 18 April, 2021; v1 submitted 21 March, 2021; originally announced March 2021.

Comments: Accepted to CVPR 2021

arXiv:2102.08884 [pdf, other]

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

Authors: Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Abstract: Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e.g., in product images). Yet, these object-centric images are not effectively leveraged for improving object detection in scene-centric images. In this paper, we propose Mosaic of Object-centric images as Scene-centric images (MosaicOS), a simple and novel framework that is surprisingly effective at tackling the challenges of long-tailed object detection. Keys to our approach are three-fold: (i) pseudo scene-centric image construction from object-centric images for mitigating domain differences, (ii) high-quality bounding box imputation using the object-centric images' class labels, and (iii) a multi-stage training procedure. On LVIS object detection (and instance segmentation), MosaicOS leads to a massive 60% (and 23%) relative improvement in average precision for rare object categories. We also show that our framework can be compatibly used with other existing approaches to achieve even further gains. Our pre-trained models are publicly available at https://github.com/czhang0528/MosaicOS/.

Submitted 13 September, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: Accepted to ICCV 2021

arXiv:2102.07215 [pdf, other]

Large-Scale Meta-Learning with Continual Trajectory Shifting

Authors: Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

Abstract: Meta-learning of shared initialization parameters has been shown to be highly effective in solving few-shot learning tasks. However, extending the framework to many-shot scenarios, which may further enhance its practicality, has been relatively overlooked due to the technical difficulties of meta-learning over long chains of inner-gradient steps. In this paper, we first show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale task distributions, thus yielding better initialization points. Further, in order to increase the frequency of meta-updates even with the excessively long inner-optimization trajectories, we propose to estimate the required shift of the task-specific parameters with respect to the change of the initialization parameters. By doing so, we can arbitrarily increase the frequency of meta-updates and thus greatly improve the meta-level convergence as well as the quality of the learned initializations. We validate our method on a heterogeneous set of large-scale tasks and show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence, as well as multi-task learning and fine-tuning baselines.

Submitted 16 February, 2022; v1 submitted 14 February, 2021; originally announced February 2021.

Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9603-9613, 2021

arXiv:2101.09441 [pdf, ps, other]

DBL: Efficient Reachability Queries on Dynamic Graphs (Complete Version)

Authors: Qiuyi Lyu, Yuchen Li, Bingsheng He, Bin Gong

Abstract: Reachability query is a fundamental problem on graphs, which has been extensively studied in academia and industry. Since graphs are subject to frequent updates in many applications, it is essential to support efficient graph updates while offering good performance in reachability queries. Existing solutions compress the original graph with the Directed Acyclic Graph (DAG) and propose efficient query processing and index update techniques. However, they focus on optimizing the scenarios where the Strongly Connected Components (SCCs) remain unchanged and have overlooked the prohibitively high cost of the DAG maintenance when SCCs are updated. In this paper, we propose DBL, an efficient DAG-free index to support the reachability query on dynamic graphs with insertion-only updates. DBL builds on two complementary indexes: Dynamic Landmark (DL) label and Bidirectional Leaf (BL) label. The former leverages landmark nodes to quickly determine reachable pairs whereas the latter prunes unreachable pairs by indexing the leaf nodes in the graph. We evaluate DBL against the state-of-the-art approaches on dynamic reachability index with extensive experiments on real-world datasets. The results have demonstrated that DBL achieves orders of magnitude speedup in terms of index update, while still producing competitive query efficiency.

Submitted 15 April, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

arXiv:2101.03518 [pdf, other]

doi 10.1103/PhysRevB.103.L140501

Ionic liquid gating induced two superconductor-insulator phase transitions in spinel oxide Li$_{1 \pm x}$Ti$_2$O$_{4-δ}$

Authors: Zhongxu Wei, Qian Li, Ben-Chao Gong, Xinjian Wei, Wei Hu, Zhuang Ni, Ge He, Mingyang Qin, Anna Kusmartseva, Fedor V. Kusmartsev, Jie Yuan, Beiyi Zhu, Qihong Chen, Jian-Hao Chen, Kai Liu, Kui Jin

Abstract: The associations between emergent physical phenomena (e.g., superconductivity) and orbital, charge, and spin degrees of freedom of $3d$ electrons are intriguing in transition metal compounds. Here, we successfully manipulate the superconductivity of spinel oxide Li$_{1\pm x}$Ti$_2$O$_{4-δ}$ (LTO) by ionic liquid gating. A dome-shaped superconducting phase diagram is established, where two insulating phases are disclosed both in heavily electron-doping and hole-doping regions. The superconductor-insulator transition (SIT) in the hole-doping region can be attributed to the loss of Ti valence electrons. In the electron-doping region, LTO exhibits an unexpected SIT instead of a metallic behavior despite an increase in carrier density. Furthermore, a thermal hysteresis is observed in the normal state resistance curve, suggesting a first-order phase transition. We speculate that the SIT and the thermal hysteresis stem from the enhanced $3d$ electron correlations and the formation of orbital ordering by comparing the transport and structural results of LTO with the other spinel oxide superconductor MgTi$_2$O$_4$, as well as analysing the electronic structure by first-principles calculations. Further comprehension of the detailed interplay between superconductivity and orbital ordering would contribute to the revealing of unconventional superconducting pairing mechanism.

Submitted 8 April, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. B 103, 140501 (2021)

arXiv:2012.06985 [pdf, other]

Contrastive Learning for Label-Efficient Semantic Segmentation

Authors: Xiangyun Zhao, Raviteja Vemulapalli, Philip Mansfield, Boqing Gong, Bradley Green, Lior Shapira, Ying Wu

Abstract: Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases. This happens because deep CNNs trained with the de facto cross-entropy loss can easily overfit to small amounts of labeled data. To address this issue, we propose a simple and effective contrastive learning-based training strategy in which we first pretrain the network using a pixel-wise, label-based contrastive loss, and then fine-tune it using the cross-entropy loss. This approach increases intra-class compactness and inter-class separability, thereby resulting in a better pixel classifier. We demonstrate the effectiveness of the proposed training strategy using the Cityscapes and PASCAL VOC 2012 segmentation datasets. Our results show that pretraining with the proposed contrastive loss results in large performance gains (more than 20% absolute improvement in some settings) when the amount of labeled data is limited. In many settings, the proposed contrastive pretraining strategy, which does not use any additional data, is able to match or outperform the widely-used ImageNet pretraining strategy that uses more than a million additional labeled images.

Submitted 18 August, 2021; v1 submitted 13 December, 2020; originally announced December 2020.

Comments: International Conference on Computer Vision (ICCV), 2021

arXiv:2011.11200 [pdf, other]

Ranking Neural Checkpoints

Authors: Yandong Li, Xuhui Jia, Ruoxin Sang, Yukun Zhu, Bradley Green, Liqiang Wang, Boqing Gong

Abstract: This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for the transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers the best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to the checkpoints of different output types without knowing how the checkpoints are pre-trained on which dataset. They also incur low computation cost, making them practically meaningful. Our results suggest that the linear separability of the features extracted by the checkpoints is a strong indicator of transferability. We also arrive at a new ranking measure, NLEEP, which gives rise to the best performance in the experiments.

Submitted 27 August, 2022; v1 submitted 22 November, 2020; originally announced November 2020.

Comments: Accepted to CVPR 2021

arXiv:2011.04919 [pdf, other]

Tokoin: A Coin-Based Accountable Access Control Scheme for Internet of Things

Authors: Chunchi Liu, Minghui Xu, Hechuan Guo, Xiuzhen Cheng, Yinhao Xiao, Dongxiao Yu, Bei Gong, Arkady Yerukhimovich, Shengling Wang, Weifeng Lv

Abstract: With the prevalence of Internet of Things (IoT) applications, IoT devices interact closely with our surrounding environments, bringing us unparalleled smartness and convenience. However, the development of secure IoT solutions lags far behind, leaving us exposed to common unauthorized accesses that may bring malicious attacks and unprecedented danger to our daily life. Overprivilege attack, a widely reported phenomenon in IoT that accesses unauthorized or excessive resources, is notoriously hard to prevent, trace and mitigate. To tackle this challenge, we propose Tokoin-Based Access Control (TBAC), an accountable access control model enabled by blockchain and Trusted Execution Environment (TEE) technologies, to offer fine-graininess, strong auditability, and access procedure control for IoT. TBAC materializes the virtual access power into a definite-amount and secure cryptographic coin termed "tokoin" (token+coin), and manages it using atomic and accountable state-transition functions in a blockchain. We also realize access procedure control by mandating every tokoin a fine-grained access policy defining who is allowed to do what at when in where by how. The tokoin is peer-to-peer transferable, and can be modified only by the resource owner when necessary. We fully implement TBAC with well-studied cryptographic primitives and blockchain platforms and present a readily available APP for regular users. We also present a case study to demonstrate how TBAC is employed to enable autonomous in-home cargo delivery while guaranteeing the access policy compliance and home owner's physical security by regulating the physical behaviors of the deliveryman.

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2011.01509 [pdf, other]

MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors

Authors: Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu, Bei Gong, Shuaiwen Song, Jiguo Yu

Abstract: Deep learning is a thriving field currently stuffed with many practical applications and active research topics. It allows computers to learn from experience and to understand the world in terms of a hierarchy of concepts, with each being defined through its relations to simpler concepts. Relying on the strong capabilities of deep learning, we propose a convolutional generative adversarial network-based (Conv-GAN) framework titled MalFox, targeting adversarial malware example generation against third-party black-box malware detectors. Motivated by the rival game between malware authors and malware detectors, MalFox adopts a confrontational approach to produce perturbation paths, with each formed by up to three methods (namely Obfusmal, Stealmal, and Hollowmal) to generate adversarial malware examples. To demonstrate the effectiveness of MalFox, we collect a large dataset consisting of both malware and benignware programs, and investigate the performance of MalFox in terms of accuracy, detection rate, and evasive rate of the generated adversarial malware examples. Our evaluation indicates that the accuracy can be as high as 99.0% which significantly outperforms the other 12 well-known learning models. Furthermore, the detection rate is dramatically decreased by 56.8% on average, and the average evasive rate is noticeably improved by up to 56.2%.

Submitted 6 June, 2022; v1 submitted 3 November, 2020; originally announced November 2020.

arXiv:2010.02973 [pdf, other]

Applications of Differential Privacy in Social Network Analysis: A Survey

Authors: Honglu Jiang, Jian Pei, Dongxiao Yu, Jiguo Yu, Bei Gong, Xiuzhen Cheng

Abstract: Differential privacy is effective in sharing information and preserving privacy with a strong guarantee. As social network analysis has been extensively adopted in many applications, it opens a new arena for the application of differential privacy. In this article, we provide a comprehensive survey connecting the basic principles of differential privacy and applications in social network analysis. We present a concise review of the foundations of differential privacy and the major variants and discuss how differential privacy is applied to social network analysis, including privacy attacks in social networks, types of differential privacy in social network analysis, and a series of popular tasks, such as degree distribution analysis, subgraph counting and edge weights. We also discuss a series of challenges for future studies.

Submitted 14 April, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

Comments: 50 pages, 16 figures, 5 tables

arXiv:2009.03250 [pdf, other]

doi 10.1007/JHEP05(2021)100

One-loop radiative corrections to $e^+ e^-\to Zh^0/H^0A^0$ in the Inert Higgs Doublet Model

Authors: Hamza Abouabid, Abdesslam Arhrib, Rachid Benbrik, Jaouad El Falaki, Bin Gong, Wenhai Xie, Qi-Shu Yan

Abstract: We compute the full one-loop radiative corrections (including both weak and QED corrections) for two processes $e^{+}e^{-}\to Z h^0,H^0 A^{0}$ in the Inert Higgs Doublet model (IHDM). Up to $O(α_{w})$ and $O(α_{em})$ order, we use FeynArts/FormCalc to compute the one-loop virtual corrections and Feynman Diagram Calculation (FDC) to evaluate the real emission, respectively. Being equipped with these computing tools, we investigate radiative corrections of new physics for both the degenerate and non-degenerate scenarios with three typical collision energies of future electron-positron colliders: 250 GeV, 500 GeV, and 1000 GeV. By scanning the parameter space of IHDM, we identify the allowed regions which are consistent with constraints and bounds, from both theoretical and experimental sides. We find that the radiative corrections of the IHDM to $e^+ e^- \to Z h^0$ can be sizeable and are within the detection potentials of future Higgs factories. We also find that the new physics of IHDM could also be directly detected by observing the process $e^{+}e^{-}\to H^0 A^{0}$ which could have large enough production rate. We propose five benchmark points and examine their salient features which can serve as physics targets for future electron-positron colliders, such as CEPC/CLIC/FCC-ee/ILC as well as for LHC.

Submitted 5 April, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

Comments: 43 pages, 17 figures, minor changes

arXiv:2009.03028 [pdf, other]

doi 10.1088/1674-1137/abc682

The complete study on polarization of $Υ(nS)$ hadroproduction at QCD next-to-leading order

Authors: Yu Feng, Bin Gong, Chao-Hsi Chang, Jian-Xiong Wang

Abstract: Applying the nonrelativistic quantum chromodynamics factorization formalism to the $Υ(1S,2S,3S)$ hadroproduction, a complete analysis on the polarization parameters $λ_θ$, $λ_{θφ}$, $λ_φ$ for the production is presented at QCD next-to-leading order. With the long-distance matrix elements extracted from experimental data for the production rate and polarization parameter $λ_θ$ of $Υ$ hadroproduction, our results provide a good description for the measured parameters $λ_{θφ}$ and $λ_φ$ in both the helicity and the Collins-Soper frames. In our calculations the frame invariant parameter $\tildeλ$ is consistent in the two frames. Finally, it is pointed out that there are discrepancies for $\tildeλ$ between available experimental data and corresponding theoretical predictions.

Submitted 7 September, 2020; originally announced September 2020.

Comments: 9 pages, 8 figures

arXiv:2009.02782 [pdf, other]

Contextual Personalized Re-Ranking of Music Recommendations through Audio Features

Authors: Boning Gong, Mesut Kaya, Nava Tintarev

Abstract: Users are able to access millions of songs through music streaming services like Spotify, Pandora, and Deezer. Access to such large catalogs created a need for relevant song recommendations. However, user preferences are highly subjective in nature and change according to context (e.g., music that is suitable in the morning is not as suitable in the evening). Moreover, the music one user may prefer in a given context may be different from what another user prefers in the same context (i.e., what is considered good morning music differs across users). Accurately representing these preferences is essential to creating accurate and effective song recommendations. User preferences for songs can be based on high level audio features, such as tempo and valence. In this paper, we therefore propose a contextual re-ranking algorithm, based on audio feature representations of user preferences in specific contextual conditions. We evaluate the performance of our re-ranking algorithm using the #NowPlaying-RS dataset, which consists of user listening events crawled from Twitter and is enriched with song audio features. We compare a global (context for all users) and personalized (context for each user) model based on these audio features. The global model creates an audio feature representation of each contextual condition based on the preferences of all users. Unlike the global model, the personalized model creates user-specific audio feature representations of contextual conditions, and is measured across 333 distinct users. We show that the personalized model outperforms the global model when evaluated using the precision and mean average precision metrics.

Submitted 6 September, 2020; originally announced September 2020.

Comments: RecSys 2020: CARS 2.0: Workshop on Context-Aware Recommender Systems

arXiv:2008.03800 [pdf, other]

Spatiotemporal Contrastive Video Representation Learning

Authors: Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Abstract: We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn spatiotemporal visual representations from unlabeled videos. Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away. We study what makes for good data augmentations for video self-supervised learning and find that both spatial and temporal information are crucial. We carefully design data augmentations involving spatial and temporal cues. Concretely, we propose a temporally consistent spatial augmentation method to impose strong spatial augmentations on each frame of the video while maintaining the temporal consistency across frames. We also propose a sampling-based temporal augmentation method to avoid overly enforcing invariance on clips that are distant in time. On Kinetics-600, a linear classifier trained on the representations learned by CVRL achieves 70.4% top-1 accuracy with a 3D-ResNet-50 (R3D-50) backbone, outperforming ImageNet supervised pre-training by 15.7% and SimCLR unsupervised pre-training by 18.8% using the same inflated R3D-50. The performance of CVRL can be further improved to 72.9% with a larger R3D-152 (2x filters) backbone, significantly closing the gap between unsupervised and supervised video representation learning. Our code and models will be available at https://github.com/tensorflow/models/tree/master/official/.

Submitted 5 April, 2021; v1 submitted 9 August, 2020; originally announced August 2020.

Comments: CVPR2021 Camera ready

arXiv:2007.11540 [pdf, ps, other]

Finite Element Calculation of Photonic Band Structures for Frequency Dependent Materials

Authors: Wenqiang Xiao, Bo Gong, Jiguang Sun, Zhimin Zhang

Abstract: We consider the calculation of the band structure of frequency dependent photonic crystals. The associated eigenvalue problem is nonlinear and it is challenging to develop effective convergent numerical methods. In this paper, the band structure problem is formulated as the eigenvalue problem of a holomorphic Fredholm operator function of index zero. Lagrange finite elements are used to discretize the operator function. Then the convergence of the eigenvalues is proved using the abstract approximation theory for holomorphic operator functions. A spectral indicator method is developed to practically compute the eigenvalues. Numerical examples are presented to validate the theory and show the effectiveness of the proposed method.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2007.09162 [pdf, other]

Improving Object Detection with Selective Self-supervised Self-training

Authors: Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong

Abstract: We study how to leverage Web images to augment human-curated object detection datasets. Our approach is two-pronged. On the one hand, we retrieve Web images by image-to-image search, which incurs less domain shift from the curated data than other search methods. The Web images are diverse, supplying a wide variety of object poses, appearances, their interactions with the context, etc. On the other hand, we propose a novel learning method motivated by two parallel lines of work that explore unlabeled data for image classification: self-training and self-supervised learning. They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets. To tackle this challenge, we propose a selective net to rectify the supervision signals in Web images. It not only identifies positive bounding boxes but also creates a safe zone for mining hard negative boxes. We report state-of-the-art results on detecting backpacks and chairs from everyday scenes, along with other challenging object classes.

Submitted 24 July, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted to ECCV 2020

arXiv:2007.08488 [pdf, other]

Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds

Authors: Li Yi, Boqing Gong, Thomas Funkhouser

Abstract: We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors. Based on the observation that sparse 3D point clouds are sampled from 3D surfaces, we take a Complete and Label approach to recover the underlying surfaces before passing them to a segmentation network. Specifically, we design a Sparse Voxel Completion Network (SVCN) to complete the 3D surfaces of a sparse point cloud. Unlike semantic labels, obtaining training pairs for SVCN requires no manual labeling. We also introduce local adversarial learning to model the surface prior. The recovered 3D surfaces serve as a canonical domain, from which semantic labels can transfer across different LiDAR sensors. Experiments and ablation studies with our new benchmark for cross-domain semantic labeling of LiDAR data show that the proposed approach provides 8.2-36.6% better performance than previous domain adaptation methods.

Submitted 30 March, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

arXiv:2006.14536 [pdf, other]

Smooth Adversarial Training

Authors: Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

Abstract: It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. It is also generally believed that, unless making networks larger, network architectural elements would otherwise matter little in improving adversarial robustness. Here we present evidence to challenge these common beliefs by a careful study about adversarial training. Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training. The purpose of smooth activation functions in SAT is to allow it to find harder adversarial examples and compute better gradient updates during adversarial training. Compared to standard adversarial training, SAT improves adversarial robustness for "free", i.e., no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50's robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet. SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness. Models are available at https://github.com/cihangxie/SmoothAdversarialTraining.

Submitted 10 July, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: tech report

arXiv:2006.08339 [pdf, other]

Graph-Stega: Semantic Controllable Steganographic Text Generation Guided by Knowledge Graph

Authors: Zhongliang Yang, Baitao Gong, Yamin Li, Jinshuai Yang, Zhiwen Hu, Yongfeng Huang

Abstract: Most of the existing text generative steganographic methods are based on coding the conditional probability distribution of each word during the generation process, and then selecting specific words according to the secret information, so as to achieve information hiding. Such methods have their limitations which may bring potential security risks. Firstly, with the increase of embedding rate, these models will choose words with lower conditional probability, which will reduce the quality of the generated steganographic texts; secondly, they cannot control the semantic expression of the final generated steganographic text. This paper proposes a new text generative steganography method which is quite different from the existing models. We use a Knowledge Graph (KG) to guide the generation of steganographic sentences. On the one hand, we hide the secret information by coding the path in the knowledge graph, but not the conditional probability of each generated word; on the other hand, we can control the semantic expression of the generated steganographic text to a certain extent. The experimental results show that the proposed model can guarantee both the quality of the generated text and its semantic expression, which is a supplement and improvement to the current text generation steganography.

Submitted 2 June, 2020; originally announced June 2020.

Showing 51–100 of 204 results for author: Gong, B