-
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
Authors:
Peng Zhang,
Yunlu Xu,
Zhanzhan Cheng,
Shiliang Pu,
Jing Lu,
Liang Qiao,
Yi Niu,
Fei Wu
Abstract:
Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks, (1) text reading for detecting and recognizing texts in images and (2) information extraction for analyzing and extracting key elements from previously extracted plain text. However, they mainly focus on improving information extraction task, while neglecting the fact that text reading and information extraction are mutually correlated. In this paper, we propose a unified end-to-end text reading and information extraction network, where the two tasks can reinforce each other. Specifically, the multimodal visual and textual features of text reading are fused for information extraction and in turn, the semantics in information extraction contribute to the optimization of text reading. On three real-world datasets with diverse document images (from fixed layout to variable layout, from structured text to semi-structured text), our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
Submitted 25 October, 2021; v1 submitted 26 May, 2020;
originally announced May 2020.
-
SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition
Authors:
Chengwei Zhang,
Yunlu Xu,
Zhanzhan Cheng,
Shiliang Pu,
Yi Niu,
Fei Wu,
Futai Zou
Abstract:
Arbitrary text appearance poses a great challenge in scene text recognition tasks. Existing works mostly handle the problem by considering shape distortion, including perspective distortions, line curvature or other style variations. Therefore, methods based on spatial transformers are extensively studied. However, chromatic difficulties in complex scenes have not received much attention. In this work, we introduce a new learnable geometric-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which allows the color manipulation of source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream tasks, giving neural networks the ability to actively transform input intensity rather than the existing spatial rectification. It can also serve as a complementary module to known spatial transformations and work in both independent and collaborative ways with them. Extensive experiments show that the use of SPIN results in a significant improvement on multiple text recognition benchmarks compared to the state of the art.
Submitted 25 October, 2021; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Object-QA: Towards High Reliable Object Quality Assessment
Authors:
Jing Lu,
Baorui Zou,
Zhanzhan Cheng,
Shiliang Pu,
Shuigeng Zhou,
Yi Niu,
Fei Wu
Abstract:
In object recognition applications, object images usually appear with different quality levels. Practically, it is very important to indicate object image qualities for better application performance, e.g. filtering out low-quality object image frames to maintain robust video object recognition results and speed up inference. However, no previous works are explicitly proposed for addressing the problem. In this paper, we define the problem of object quality assessment for the first time and propose an effective approach named Object-QA to assess high-reliable quality scores for object images. Concretely, Object-QA first employs a well-designed relative quality assessing module that learns the intra-class-level quality scores by referring to the difference between object images and their estimated templates. Then an absolute quality assessing module is designed to generate the final quality scores by aligning the quality score distributions in inter-class. Besides, Object-QA can be implemented with only object-level annotations, and is also easily deployed to a variety of object recognition tasks. To our best knowledge this is the first work to put forward the definition of this problem and conduct quantitative evaluations. Validations on 5 different datasets show that Object-QA can not only assess high-reliable quality scores according with human cognition, but also improve application performance.
Submitted 26 May, 2020;
originally announced May 2020.
-
Fast and Accurate Langevin Simulations of Stochastic Hodgkin-Huxley Dynamics
Authors:
Shusen Pu,
Peter J. Thomas
Abstract:
Fox and Lu introduced a Langevin framework for discrete-time stochastic models of randomly gated ion channels such as the Hodgkin-Huxley (HH) system. They derived a Fokker-Planck equation with state-dependent diffusion tensor $D$ and suggested a Langevin formulation with noise coefficient matrix $S$ such that $SS^\intercal=D$. Subsequently, several authors introduced a variety of Langevin equations for the HH system. In this paper, we present a natural 14-dimensional dynamics for the HH system in which each directed edge in the ion channel state transition graph acts as an independent noise source, leading to a $14\times 28$ noise coefficient matrix $S$. We show that (i) the corresponding 14D system of ordinary differential equations is consistent with the classical 4D representation of the HH system; (ii) the 14D representation leads to a noise coefficient matrix $S$ that can be obtained cheaply on each timestep, without requiring a matrix decomposition; (iii) sample trajectories of the 14D representation are pathwise equivalent to trajectories of Fox and Lu's system, as well as trajectories of several existing Langevin models; (iv) our 14D representation (and those equivalent to it) give the most accurate interspike-interval distribution, not only with respect to moments but under both the $L_1$ and $L_\infty$ metric-space norms; and (v) the 14D representation gives an approximation to exact Markov chain simulations that are as fast and as efficient as all equivalent models. Our approach goes beyond existing models, in that it supports a stochastic shielding decomposition that dramatically simplifies $S$ with minimal loss of accuracy under both voltage- and current-clamp conditions.
Submitted 21 May, 2020;
originally announced May 2020.
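The edge-based noise construction described in the abstract above can be illustrated on a much smaller system. The following is a minimal sketch, assuming a hypothetical two-state channel (closed and open) with constant rates `alpha` and `beta`; it is not the 14D Hodgkin-Huxley model from the paper. Each directed edge contributes one column of $S$, so $S$ is written down directly and $D = SS^\intercal$ follows without any matrix decomposition. All function names are illustrative.

```python
import math
import random

def edge_noise_matrix(closed, opened, alpha, beta, n_channels):
    """Return the 2x2 matrix S (rows: states, columns: directed edges)."""
    a = math.sqrt(alpha * closed / n_channels)  # edge closed -> open
    b = math.sqrt(beta * opened / n_channels)   # edge open -> closed
    # Each column is the edge's stoichiometry vector times sqrt(flux / N).
    return [[-a, b],
            [a, -b]]

def diffusion_from_noise(S):
    """Compute D = S S^T for a small dense matrix given as nested lists."""
    rows = len(S)
    cols = len(S[0])
    return [[sum(S[i][k] * S[j][k] for k in range(cols))
             for j in range(rows)] for i in range(rows)]

def euler_maruyama_step(closed, opened, alpha, beta, n_channels, dt, rng):
    """Advance the state fractions by one Euler-Maruyama step of size dt."""
    net_flux = alpha * closed - beta * opened
    S = edge_noise_matrix(closed, opened, alpha, beta, n_channels)
    xi = [rng.gauss(0.0, 1.0) for _ in range(2)]  # one normal draw per directed edge
    scale = math.sqrt(dt)
    noise = [scale * (S[i][0] * xi[0] + S[i][1] * xi[1]) for i in range(2)]
    return closed - net_flux * dt + noise[0], opened + net_flux * dt + noise[1]

S = edge_noise_matrix(0.7, 0.3, 2.0, 3.0, 100)
D = diffusion_from_noise(S)
new_closed, new_open = euler_maruyama_step(0.7, 0.3, 2.0, 3.0, 100, 0.01,
                                           random.Random(0))
```

Because every column of `S` sums to zero, a step preserves the total occupancy `closed + opened`. The sketch does not guard against fractions leaving the interval [0, 1] over long runs, which a real simulation would need to handle.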
-
Recent developments in chiral and spin polarization effects in heavy-ion collisions
Authors:
Jian-Hua Gao,
Guo-Liang Ma,
Shi Pu,
Qun Wang
Abstract:
We give a brief overview of recent theoretical and experimental results on the chiral magnetic effect and spin polarization effect in heavy-ion collisions. We present updated experimental results for the chiral magnetic effect and related phenomena. The time evolution of the magnetic fields in different models is discussed. The newly developed quantum kinetic theory for massive fermions is reviewed. We present theoretical and experimental results for the polarization of $Λ$ hyperons and the $ρ_{00}$ value of vector mesons.
Submitted 3 August, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
Authors:
Qi She,
Fan Feng,
Qi Liu,
Rosa H. M. Chan,
Xinyue Hao,
Chuanlin Lan,
Qihan Yang,
Vincenzo Lomonaco,
German I. Parisi,
Heechul Bae,
Eoin Brophy,
Baoquan Chen,
Gabriele Graffieti,
Vidit Goel,
Hyonyoung Han,
Sathursan Kanagarajah,
Somesh Kumar,
Siew-Kei Lam,
Tin Lun Lam,
Liang Ma,
Davide Maltoni,
Lorenzo Pellegrini,
Duvindu Piyasena,
Shiliang Pu,
Debdoot Sheet
, et al. (11 additional authors not shown)
Abstract:
This report summarizes the IROS 2019 Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top $8$ finalists (out of over $150$ teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in the robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".
Submitted 26 April, 2020;
originally announced April 2020.
-
Anomalous magnetohydrodynamics with constant anisotropic electric conductivities
Authors:
Ren-jie Wang,
Patrick Copinger,
Shi Pu
Abstract:
We study anomalous magnetohydrodynamics in a longitudinal boost invariant Bjorken flow with constant anisotropic electric conductivities as outlined in Ref. [1]. For simplicity, we consider a neutral fluid and a force-free magnetic field in the transverse direction. We derived analytic solutions of the electromagnetic fields in the laboratory frame, the chiral density, and the energy density as functions of proper time.
Submitted 14 April, 2020;
originally announced April 2020.
-
A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks
Authors:
Shi Pu
Abstract:
In this paper, we consider the problem of distributed consensus optimization over multi-agent networks with directed network topology. Assuming each agent has a local cost function that is smooth and strongly convex, the global objective is to minimize the average of all the local cost functions. To solve the problem, we introduce a robust gradient tracking method (R-Push-Pull) adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits the advantages of Push-Pull and enjoys linear convergence to the optimal solution with exact communication. Under noisy information exchange, R-Push-Pull is more robust than the existing gradient tracking based algorithms; the solutions obtained by each agent reach a neighborhood of the optimum in expectation exponentially fast under a constant stepsize policy. We provide a numerical example that demonstrates the effectiveness of R-Push-Pull.
Submitted 20 August, 2020; v1 submitted 31 March, 2020;
originally announced March 2020.
-
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Authors:
Long Chen,
Xin Yan,
Jun Xiao,
Hanwang Zhang,
Shiliang Pu,
Yueting Zhuang
Abstract:
Although Visual Question Answering (VQA) has made impressive progress over the last few years, today's VQA models tend to capture superficial linguistic correlations in the train set and fail to generalize to the test set with different QA distributions. To reduce the language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominating performance on VQA-CP. However, due to the complexity of their design, current methods are unable to equip the ensemble-based models with two indispensable characteristics of an ideal VQA model: 1) visual-explainable: the model should rely on the right visual regions when making decisions. 2) question-sensitive: the model should be sensitive to the linguistic variations in question. To this end, we propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme. The CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions, and assigning different ground-truth answers. After training with the complementary samples (i.e., the original and generated samples), the VQA models are forced to focus on all critical objects and words, which significantly improves both visual-explainable and question-sensitive abilities. In return, the performance of these models is further boosted. Extensive ablations have shown the effectiveness of CSS. Particularly, by building on top of the model LMH, we achieve a record-breaking performance of 58.95% on VQA-CP v2, with 6.5% gains.
Submitted 14 March, 2020;
originally announced March 2020.
-
Neural Inheritance Relation Guided One-Shot Layer Assignment Search
Authors:
Rang Meng,
Weijie Chen,
Di Xie,
Yuan Zhang,
Shiliang Pu
Abstract:
Layer assignment is seldom picked out as an independent research topic in neural architecture search. In this paper, for the first time, we systematically investigate the impact of different layer assignments to the network performance by building an architecture dataset of layer assignment on CIFAR-100. Through analyzing this dataset, we discover a neural inheritance relation among the networks with different layer assignments, that is, the optimal layer assignments for deeper networks always inherit from those for shallow networks. Inspired by this neural inheritance relation, we propose an efficient one-shot layer assignment search approach via inherited sampling. Specifically, the optimal layer assignment searched in the shallow network can be provided as a strong sampling priori to train and search the deeper ones in supernet, which extremely reduces the network search space. Comprehensive experiments carried out on CIFAR-100 illustrate the efficiency of our proposed method. Our search results are strongly consistent with the optimal ones directly selected from the architecture dataset. To further confirm the generalization of our proposed method, we also conduct experiments on Tiny-ImageNet and ImageNet. Our searched results are remarkably superior to the handcrafted ones under the unchanged computational budgets. The neural inheritance relation discovered in this paper can provide insights to the universal neural architecture search.
Submitted 28 February, 2020;
originally announced February 2020.
-
Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units
Authors:
Zhanzhan Cheng,
Yunlu Xu,
Mingjian Cheng,
Yu Qiao,
Shiliang Pu,
Yi Niu,
Fei Wu
Abstract:
Recurrent neural networks (RNNs) have been widely studied in sequence learning tasks, while the mainstream models (e.g., LSTM and GRU) rely on the gating mechanism (in control of how information flows between hidden states). However, the vanilla gates in RNNs (e.g., the input gate in LSTM) suffer from the problem of gate undertraining, which can be caused by various factors, such as the saturating activation functions, the gate layouts (e.g., the gate number and gating functions), or even the suboptimal memory state. These may result in failures to learn gating switch roles and thus weak performance. In this paper, we propose a new gating mechanism within general gated recurrent neural networks to handle this issue. Specifically, the proposed gates directly short connect the extracted input features to the outputs of vanilla gates, denoted as refined gates. The refining mechanism allows enhancing gradient back-propagation as well as extending the gating activation scope, which can guide RNN to reach possibly deeper minima. We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU. Extensive experiments on 3 synthetic tasks, 3 language modeling tasks and 5 scene text recognition benchmarks demonstrate the effectiveness of our method.
Submitted 26 May, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Interplay between fractional quantum Hall liquid and crystal phases at low filling
Authors:
Zheng-Wei Zuo,
Ajit C. Balram,
Songyang Pu,
Jianyun Zhao,
Thierry Jolicoeur,
A. Wójs,
J. K. Jain
Abstract:
The nature of the state at low Landau-level filling factors has been a longstanding puzzle in the field of the fractional quantum Hall effect. While theoretical calculations suggest that a crystal is favored at filling factors $ν\lesssim 1/6$, experiments show, at somewhat elevated temperatures, minima in the longitudinal resistance that are associated with fractional quantum Hall effect at $ν=$ 1/7, 2/11, 2/13, 3/17, 3/19, 1/9, 2/15 and 2/17, which belong to the standard sequences $ν=n/(6n\pm 1)$ and $ν=n/(8n\pm 1)$. To address this paradox, we investigate the nature of some of the low-$ν$ states, specifically $ν=1/7$, $2/13$, and $1/9$, by variational Monte Carlo, density matrix renormalization group, and exact diagonalization methods. We conclude that in the thermodynamic limit, these are likely to be incompressible fractional quantum Hall liquids, albeit with strong short-range crystalline correlations. This suggests a natural explanation for the experimentally observed behavior and a rich phase diagram that admits, in the low-disorder limit, a multitude of crystal-FQHE liquid transitions as the filling factor is reduced.
Submitted 19 August, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
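The abstract above states that the eight observed fractions belong to the sequences $ν=n/(6n\pm 1)$ and $ν=n/(8n\pm 1)$. A short arithmetic check of that statement can be sketched as follows; the helper name `jain_sequence` and the cutoff `n_max = 3` are choices made here for illustration and do not come from the paper.

```python
from fractions import Fraction

def jain_sequence(m, n_max):
    """Return the set of fractions n/(m*n + 1) and n/(m*n - 1) for n = 1..n_max."""
    members = set()
    for n in range(1, n_max + 1):
        members.add(Fraction(n, m * n + 1))
        members.add(Fraction(n, m * n - 1))
    return members

# Fractions listed in the abstract, in the order given there.
observed = [Fraction(1, 7), Fraction(2, 11), Fraction(2, 13), Fraction(3, 17),
            Fraction(3, 19), Fraction(1, 9), Fraction(2, 15), Fraction(2, 17)]

six = jain_sequence(6, 3)
eight = jain_sequence(8, 3)

# Label each observed fraction with the first sequence that contains it.
classified = {}
for nu in observed:
    if nu in six:
        classified[nu] = "n/(6n±1)"
    elif nu in eight:
        classified[nu] = "n/(8n±1)"
    else:
        classified[nu] = None

for nu, label in classified.items():
    print(nu, label)
```

Note that $1/7$ can be written both as $n/(6n+1)$ and as $n/(8n-1)$ with $n=1$; the loop assigns it to the first sequence it matches.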
-
Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting
Authors:
Liang Qiao,
Sanli Tang,
Zhanzhan Cheng,
Yunlu Xu,
Yi Niu,
Shiliang Pu,
Fei Wu
Abstract:
Many approaches have recently been proposed to detect irregular scene text and achieved promising results. However, their localization results may not well satisfy the following text recognition part mainly because of two reasons: 1) recognizing arbitrary shaped text is still a challenging task, and 2) prevalent non-trainable pipeline strategies between text detection and text recognition will lead to suboptimal performances. To handle this incompatibility problem, in this paper we propose an end-to-end trainable text spotting approach named Text Perceptron. Concretely, Text Perceptron first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies without extra parameters. It unites text detection and the following recognition part into a whole framework, and helps the whole network achieve global optimization. Experiments show that our method achieves competitive performance on two standard text benchmarks, i.e., ICDAR 2013 and ICDAR 2015, and also obviously outperforms existing methods on irregular text benchmarks SCUT-CTW1500 and Total-Text.
Submitted 25 October, 2021; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Relativistic decomposition of the orbital and the spin angular momentum in chiral physics and Feynman's angular momentum paradox
Authors:
Kenji Fukushima,
Shi Pu
Abstract:
Over recent years we have witnessed tremendous progresses in our understanding on the angular momentum decomposition. In the context of the proton spin problem in high energy processes the angular momentum decomposition by Jaffe and Manohar, which is based on the canonical definition, and the alternative by Ji, which is based on the Belinfante improved one, have been revisited under light shed by Chen et al. leading to seminal works by Hatta, Wakamatsu, Leader, etc. In chiral physics as exemplified by the chiral vortical effect and applications to the relativistic nucleus-nucleus collisions, sometimes referred to as a relativistic extension of the Barnett and the Einstein--de Haas effects, such arguments of the angular momentum decomposition would be of crucial importance. We pay our special attention to the fermionic part in the canonical and the Belinfante conventions and discuss a difference between them, which is reminiscent of a classical example of Feynman's angular momentum paradox. We point out its possible relevance to early-time dynamics in the nucleus-nucleus collisions, resulting in excess by the electromagnetic angular momentum.
Submitted 19 April, 2020; v1 submitted 2 January, 2020;
originally announced January 2020.
-
Towards a full solution of relativistic Boltzmann equation for quark-gluon matter on GPUs
Authors:
Jun-Jie Zhang,
Hong-Zhong Wu,
Shi Pu,
Guang-You Qin,
Qun Wang
Abstract:
We have developed a numerical framework for a full solution of the relativistic Boltzmann equations for the quark-gluon matter using the multiple Graphics Processing Units (GPUs) on distributed clusters. Including all the $2 \to 2$ scattering processes of 3-flavor quarks and gluons, we compute the time evolution of distribution functions in both coordinate and momentum spaces for the cases of pure gluons, quarks and the mixture of quarks and gluons. By introducing a symmetrical sampling method on GPUs which ensures the particle number conservation, our framework is able to perform the space-time evolution of quark-gluon system towards thermal equilibrium with high performance. We also observe that the gluons naturally accumulate in the soft region at the early time, which may indicate the gluon condensation.
Submitted 9 December, 2019;
originally announced December 2019.
-
An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy
Authors:
Jiaxu Chen,
Jing Hao,
Kai Chen,
Di Xie,
Shicai Yang,
Shiliang Pu
Abstract:
Audio classification can distinguish different kinds of sounds, which is helpful for intelligent applications in daily life. However, it remains a challenging task since an audio clip may contain multiple, even overlapping, sound events. This paper introduces an end-to-end audio classification system based on raw waveforms and mix-training strategy. Compared to human-designed features which have been widely used in existing research, raw waveforms contain more complete information and are more appropriate for multi-label classification. Taking raw waveforms as input, our network consists of two variants of ResNet structure which can learn a discriminative representation. To explore the information in intermediate layers, a multi-level prediction with attention structure is applied in our model. Furthermore, we design a mix-training strategy to break the performance limitation caused by the amount of training data. Experiments show that the mean average precision of the proposed audio classification system on the Audio Set dataset is 37.2%. Without using extra training data, our system exceeds the state-of-the-art multi-level attention model.
Submitted 21 November, 2019;
originally announced November 2019.
-
Hall Viscosity of Composite Fermions
Authors:
Songyang Pu,
Mikael Fremling,
J. K. Jain
Abstract:
Hall viscosity, also known as the Lorentz shear modulus, has been proposed as a topological property of a quantum Hall fluid. Using a recent formulation of the composite fermion theory on the torus, we evaluate the Hall viscosities for a large number of fractional quantum Hall states at filling factors of the form $ν=n/(2pn\pm 1)$, where $n$ and $p$ are integers, from the explicit wave functions for these states. The calculated Hall viscosities $η^A$ agree with the expression $η^A=(\hbar/4) {\cal S}ρ$, where $ρ$ is the density and ${\cal S}=2p\pm n$ is the "shift" in the spherical geometry. We discuss the role of modular invariance of the wave functions, of the center-of-mass momentum, and also of the lowest-Landau-level projection. Finally, we show that the Hall viscosity for $ν={n\over 2pn+1}$ may be derived analytically from the microscopic wave functions, provided that the overall normalization factor satisfies a certain behavior in the thermodynamic limit. This derivation should be applicable to a class of states in the parton construction, which are products of integer quantum Hall states with magnetic fields pointing in the same direction.
Submitted 14 July, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization
Authors:
Chengwei Zhang,
Yunlu Xu,
Zhanzhan Cheng,
Yi Niu,
Shiliang Pu,
Fei Wu,
Futai Zou
Abstract:
Temporal action localization is an important yet challenging research topic due to its various applications. Since the frame-level or segment-level annotations of untrimmed videos require amounts of labor expenditure, studies on the weakly-supervised action detection have been springing up. However, most of existing frameworks rely on Class Activation Sequence (CAS) to localize actions by minimizi…
▽ More
Temporal action localization is an important yet challenging research topic due to its various applications. Since the frame-level or segment-level annotations of untrimmed videos require amounts of labor expenditure, studies on the weakly-supervised action detection have been springing up. However, most of existing frameworks rely on Class Activation Sequence (CAS) to localize actions by minimizing the video-level classification loss, which exploits the most discriminative parts of actions but ignores the minor regions. In this paper, we propose a novel weakly-supervised framework by adversarial learning of two modules for eliminating such demerits. Specifically, the first module is designed as a well-designed Seeded Sequence Growing (SSG) Network for progressively extending seed regions (namely the highly reliable regions initialized by a CAS-based framework) to their expected boundaries. The second module is a specific classifier for mining trivial or incomplete action regions, which is trained on the shared features after erasing the seeded regions activated by SSG. In this way, a whole network composed of these two modules can be trained in an adversarial manner. The goal of the adversary is to mine features that are difficult for the action classifier. That is, erasion from SSG will force the classifier to discover minor or even new action regions on the input feature sequence, and the classifier will drive the seeds to grow, alternately. At last, we could obtain the action locations and categories from the well-trained SSG and the classifier. Extensive experiments on two public benchmarks THUMOS'14 and ActivityNet1.3 demonstrate the impressive performance of our proposed method compared with the state-of-the-arts.
Submitted 6 August, 2019;
originally announced August 2019.
-
Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning
Authors:
Shi Pu,
Alex Olshevsky,
Ioannis Ch. Paschalidis
Abstract:
We provide a discussion of several recent results which, in certain scenarios, are able to overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of n nodes asymptotically converges to the optimal solution at a comparable rate to a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis for comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).
Submitted 18 February, 2020; v1 submitted 28 June, 2019;
originally announced June 2019.
-
A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent
Authors:
Shi Pu,
Alex Olshevsky,
Ioannis Ch. Paschalidis
Abstract:
This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the optimal network independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-\rho_w)^2}\right)$, where $1-\rho_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we show the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $\Omega\left(\frac{n}{(1-\rho_w)^2}\right)$, implying the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
Submitted 29 January, 2021; v1 submitted 6 June, 2019;
originally announced June 2019.
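As a rough illustration of the DSGD method named in this abstract, the following is a minimal sketch, not the authors' code. It assumes scalar quadratic costs f_i(x) = 0.5 * (x - b_i)^2, a ring network with uniform 1/3 mixing weights, Gaussian gradient noise, and a 1/(k+1) step size; the function name `dsgd` and all of these choices are hypothetical and made only for illustration. Each agent takes a noisy local gradient step and then averages with its two ring neighbours.

```python
import random


def dsgd(targets, steps=2000, noise_std=0.0, seed=0):
    """Sketch of distributed SGD on a ring (needs at least 3 agents).

    Agent i holds the assumed cost f_i(x) = 0.5 * (x - targets[i])**2,
    so the minimizer of the average cost is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # one local estimate per agent
    for k in range(steps):
        alpha = 1.0 / (k + 1)  # diminishing step size (an assumption)
        # Local step with a noisy gradient: (x_i - b_i) + Gaussian noise.
        half = [
            x[i] - alpha * ((x[i] - targets[i]) + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
        # Mixing step: average with the two ring neighbours, weight 1/3 each.
        # half[i - 1] wraps around to the last agent when i == 0.
        x = [(half[i - 1] + half[i] + half[(i + 1) % n]) / 3.0 for i in range(n)]
    return x


if __name__ == "__main__":
    estimates = dsgd([1.0, 2.0, 3.0, 4.0, 5.0], noise_std=1.0, seed=1)
    print(estimates)  # every entry should be near the mean of the targets, 3.0
```

The uniform ring weights form a doubly stochastic mixing matrix, so the network average of the estimates follows the same recursion as a centralized method that averages the n noisy gradients, which is the intuition behind comparing DSGD with centralized SGD.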
-
Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications
Authors:
Ming Lu,
Ming Cheng,
Yiling Xu,
Shiliang Pu,
Qiu Shen,
Zhan Ma
Abstract:
Networked video applications, e.g., video conferencing, often suffer from poor visual quality due to unexpected network fluctuation and limited bandwidth. In this paper, we develop a Quality Enhancement Network (QENet) to reduce video compression artifacts, leveraging spatial priors generated by multi-scale convolutions and temporal priors generated by warped temporal predictions in a recurrent fashion. We integrate QENet as a standalone post-processing subsystem into a High Efficiency Video Coding (HEVC) compliant decoder. Experimental results show that QENet achieves state-of-the-art performance against the default in-loop filters in HEVC and other deep learning based methods, with noticeable objective gains in Peak Signal-to-Noise Ratio (PSNR) and visible subjective gains.
Submitted 2 May, 2019;
originally announced May 2019.
-
Posterior-regularized REINFORCE for Instance Selection in Distant Supervision
Authors:
Qi Zhang,
Siliang Tang,
Xiang Ren,
Fei Wu,
Shiliang Pu,
Yueting Zhuang
Abstract:
This paper provides a new way to improve the efficiency of the REINFORCE training process. We apply it to the task of instance selection in distant supervision. Modeling the instance selection in one bag as a sequential decision process, a reinforcement learning agent is trained to determine whether an instance is valuable or not and to construct a new bag with less noisy instances. However, unbiased methods such as REINFORCE usually take a long time to train. This paper adopts posterior regularization (PR) to integrate domain-specific rules into instance selection using REINFORCE. As the experimental results show, this method remarkably improves both the performance of the relation classifier trained on the cleaned distant supervision dataset and the efficiency of REINFORCE training.
Submitted 16 April, 2019;
originally announced April 2019.
-
Anomalous magnetohydrodynamics with longitudinal boost invariance and chiral magnetic effect
Authors:
Irfan Siddique,
Ren-jie Wang,
Shi Pu,
Qun Wang
Abstract:
We study relativistic magnetohydrodynamics with longitudinal boost invariance in the presence of chiral magnetic effects and finite electric conductivity. With initial magnetic fields parallel or anti-parallel to electric fields, we derive the analytic solutions of electromagnetic fields and the chiral number and energy density in an expansion of several parameters determined by initial conditions. The numerical solutions show that such analytic solutions work well in weak fields or large chiral fluctuations. We also discuss the properties of electromagnetic fields in the laboratory frame.
Submitted 3 April, 2019;
originally announced April 2019.
-
All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
Authors:
Weijie Chen,
Di Xie,
Yuan Zhang,
Shiliang Pu
Abstract:
The shift operation is an efficient alternative to depthwise separable convolution. However, it is still bottlenecked by its implementation, namely memory movement. To push this direction forward, a novel basic component named Sparse Shift Layer (SSL) is introduced in this paper to construct efficient convolutional neural networks. In this family of architectures, the basic block is composed only of 1x1 convolutional layers, with only a few shift operations applied to the intermediate feature maps. To make this idea feasible, we introduce a shift operation penalty during optimization and further propose a quantization-aware shift learning method to make the learned displacements more friendly for inference. Extensive ablation studies indicate that only a few shift operations are sufficient to provide spatial information communication. Furthermore, to maximize the role of SSL, we redesign an improved network architecture to Fully Exploit the limited capacity of neural Network (FE-Net). Equipped with SSL, this network can achieve 75.0% top-1 accuracy on ImageNet with only 563M M-Adds. It surpasses counterparts constructed with depthwise separable convolution and networks searched by NAS in terms of accuracy and practical speed.
Submitted 12 March, 2019;
originally announced March 2019.
-
You Only Recognize Once: Towards Fast Video Text Spotting
Authors:
Zhanzhan Cheng,
Jing Lu,
Yi Niu,
Shiliang Pu,
Fei Wu,
Shuigeng Zhou
Abstract:
Video text spotting is still an important research topic due to its various real-world applications. Previous approaches usually follow a four-stage pipeline: detecting text in individual images, recognizing localized text regions frame by frame, tracking text streams, and generating final results with complicated post-processing, which may suffer from huge computational cost as well as interference from low-quality text. In this paper, we propose a fast and robust video text spotting framework that recognizes the localized text only once instead of frame by frame. Specifically, we first obtain text regions in videos with a well-designed spatial-temporal detector. Then we concentrate on developing a novel text recommender for selecting the highest-quality text from text streams and recognizing only the selected ones. Here, the recommender assembles text tracking, quality scoring and recognition into an end-to-end trainable module, which not only avoids interference from low-quality text but also dramatically speeds up the video text spotting process. In addition, we collect a larger scale video text dataset (LSVTD) for promoting the video text spotting community, which contains 100 text videos from 22 different real-life scenarios. Extensive experiments on two public benchmarks show that our method speeds up the recognition process by 71 times on average compared with the frame-wise manner, and also achieves remarkable state-of-the-art performance.
Submitted 25 October, 2021; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Collaborative Spatio-temporal Feature Learning for Video Action Recognition
Authors:
Chao Li,
Qiaoyong Zhong,
Di Xie,
Shiliang Pu
Abstract:
Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural operation which encodes spatio-temporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. In particular, we perform 2D convolution along three orthogonal views of volumetric video data, which learns spatial appearance and temporal motion cues respectively. By sharing the convolution kernels of different views, spatial and temporal features are collaboratively learned and thus benefit from each other. The complementary features are subsequently fused by a weighted summation whose coefficients are learned end-to-end. Our approach achieves state-of-the-art performance on large-scale benchmarks and won the 1st place in the Moments in Time Challenge 2018. Moreover, based on the learned coefficients of different views, we are able to quantify the contributions of spatial and temporal features. This analysis sheds light on the interpretability of the model and may also guide the future design of algorithms for video recognition.
Submitted 4 March, 2019;
originally announced March 2019.
-
Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction
Authors:
Yujin Yuan,
Liyuan Liu,
Siliang Tang,
Zhongfei Zhang,
Yueting Zhuang,
Shiliang Pu,
Fei Wu,
Xiang Ren
Abstract:
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations. However, the generated training data typically contain massive noise, and may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C$^2$SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of attention weights. Moreover, instead of treating all entity-pairs equally, we try to pay more attention to entity-pairs with a higher quality. Similarly, we adopt the selective attention mechanism to achieve this goal. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of our two proposed techniques.
Submitted 26 December, 2018;
originally announced December 2018.
-
A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks
Authors:
Weijie Chen,
Yuan Zhang,
Di Xie,
Shiliang Pu
Abstract:
Neuron pruning is an efficient method to compress a network into a slimmer one, reducing computational cost and storage overhead. Most state-of-the-art results are obtained in a layer-by-layer optimization mode, which discards the unimportant input neurons and uses the surviving ones to reconstruct output neurons that approximate the original ones, layer by layer. However, an unnoticed problem arises: the information loss accumulates as the number of layers increases, since the surviving neurons no longer encode the entire information as before. A better alternative is to propagate the entire useful information to reconstruct the pruned layer instead of directly discarding the less important neurons. To this end, we propose a novel Layer Decomposition-Recomposition Framework (LDRF) for neuron pruning, by which each layer's output information is recovered in an embedding space and then propagated to reconstruct the following pruned layers with useful information preserved. We mainly conduct our experiments on the ILSVRC-12 benchmark with VGG-16 and ResNet-50. It should be emphasized that our results before end-to-end fine-tuning are significantly superior owing to the information-preserving property of our proposed framework. With end-to-end fine-tuning, we achieve state-of-the-art results of 5.13x and 3x speed-up with only 0.5% and 0.65% top-5 accuracy drop respectively, which outperform the existing neuron pruning methods.
Submitted 16 December, 2018;
originally announced December 2018.
-
Learning Incremental Triplet Margin for Person Re-identification
Authors:
Yingying Zhang,
Qiaoyong Zhong,
Liang Ma,
Di Xie,
Shiliang Pu
Abstract:
Person re-identification (ReID) aims to match people across multiple non-overlapping video cameras deployed at different locations. To address this challenging problem, many metric learning approaches have been proposed, among which triplet loss is one of the state-of-the-art methods. In this work, we explore the margin between positive and negative pairs of triplets and prove that a large margin is beneficial. In particular, we propose a novel multi-stage training strategy which learns an incremental triplet margin and improves triplet loss effectively. Multiple levels of feature maps are exploited to make the learned features more discriminative. Besides, we introduce a global hard identity searching method to sample hard identities when generating a training batch. Extensive experiments on Market-1501, CUHK03, and DukeMTMCreID show that our approach yields a performance boost and outperforms most existing state-of-the-art methods.
Submitted 16 December, 2018;
originally announced December 2018.
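To make the role of the margin in this abstract concrete, here is a minimal sketch of the standard triplet loss, evaluated at two margins. It is not the paper's multi-stage training strategy: the Euclidean distance, the toy 2-D embeddings, and the margin values 0.3 and 1.0 are assumptions chosen only for illustration, and the paper learns its margins rather than fixing them by hand.

```python
import math


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = math.dist(anchor, positive)  # anchor-to-positive distance
    d_an = math.dist(anchor, negative)  # anchor-to-negative distance
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical): the negative is only slightly
    # farther from the anchor than the positive is.
    anchor, positive, negative = [0.0, 0.0], [3.0, 4.0], [0.0, 5.5]
    for margin in (0.3, 1.0):  # hand-picked margins, for illustration only
        print(margin, triplet_loss(anchor, positive, negative, margin))
```

With the smaller margin this triplet already satisfies the constraint and contributes no loss, while the larger margin makes it active again, which is one way to read the claim that increasing the margin keeps pushing positive and negative pairs apart.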
-
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Authors:
Long Chen,
Hanwang Zhang,
Jun Xiao,
Xiangnan He,
Shiliang Pu,
Shih-Fu Chang
Abstract:
Scene graphs -- objects as nodes and visual relationships as edges -- describe the whereabouts and interactions of the things and stuff in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects, fitting the dynamic nature of reasoning with visual context, e.g., "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the category confidence of the two objects. However, we argue that the scene dynamics are not properly learned by the prevailing cross-entropy based supervised learning paradigm, which is not sensitive to graph inconsistency: errors at the hub or non-hub nodes are unfortunately penalized equally. To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach to resolve the mismatch. CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward. In particular, to assign the reward properly to each agent, CMAT uses a counterfactual baseline that disentangles the agent-specific reward by fixing the dynamics of other agents. Extensive validations on the challenging Visual Genome benchmark show that CMAT achieves state-of-the-art performance with significant gains under various settings and metrics.
Submitted 9 August, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection
Authors:
Yunlu Xu,
Chengwei Zhang,
Zhanzhan Cheng,
Jianwen Xie,
Yi Niu,
Shiliang Pu,
Fei Wu
Abstract:
This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only video-level labels as supervision and predicts the intervals of multiple actions. Specifically, we first assemble video clips according to class labels by an attention mechanism that learns class-variable attention weights and thus helps relieve the noise from background or other actions. Secondly, we build temporal relationships between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of the recurrent neural network into the corresponding action distribution. In order to generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM) fused with attention weights. Experiments on THUMOS'14 and ActivityNet1.3 datasets show that our approach outperforms the state-of-the-art weakly-supervised method, and performs on par with the fully-supervised counterparts.
Submitted 18 November, 2018;
originally announced November 2018.
-
Push-Pull Gradient Methods for Distributed Optimization in Networks
Authors:
Shi Pu,
Wei Shi,
Jinming Xu,
Angelia Nedić
Abstract:
In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider new distributed gradient-based methods where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents' objective functions. From the viewpoint of an agent, the information about the gradients is pushed to the neighbors, while the information about the decision variable is pulled from the neighbors hence giving the name "push-pull gradient methods". The methods utilize two different graphs for the information exchange among agents, and as such, unify the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the proposed algorithms and their many variants converge linearly for strongly convex and smooth objective functions over a network (possibly with unidirectional data links) in both synchronous and asynchronous random-gossip settings. In particular, under the random-gossip setting, "push-pull" is the first class of algorithms for distributed optimization over directed graphs. Moreover, we numerically evaluate our proposed algorithms in both scenarios, and show that they outperform other existing linearly convergent schemes, especially for ill-conditioned problems and networks that are not well balanced.
Submitted 6 February, 2020; v1 submitted 15 October, 2018;
originally announced October 2018.
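The two-estimate scheme described in this abstract can be sketched as follows. This is a hedged illustration, not the authors' code: the update rule x_{k+1} = R(x_k - alpha * y_k), y_{k+1} = C y_k + grad F(x_{k+1}) - grad F(x_k) is the form commonly associated with push-pull and is not spelled out in the abstract, so treat it as an assumption. The scalar quadratic costs, the 3-node directed graph, the step size 0.1, and the names `push_pull` and `matvec` are likewise hypothetical. R is row-stochastic (decision variables are pulled) and C is column-stochastic (gradient estimates are pushed).

```python
def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def push_pull(R, C, targets, alpha=0.1, steps=500):
    """Sketch of a push-pull iteration for f_i(x) = 0.5 * (x - targets[i])**2.

    x[i] is agent i's estimate of the decision variable; y[i] tracks the
    gradient information pushed through the network.
    """
    n = len(targets)

    def grad(x):
        return [x[i] - targets[i] for i in range(n)]

    x = [0.0] * n
    y = grad(x)  # initialize the tracker with the local gradients
    for _ in range(steps):
        # Pull step: mix the locally updated decision variables with R.
        x_new = matvec(R, [x[i] - alpha * y[i] for i in range(n)])
        # Push step: mix the trackers with C, then add the gradient change.
        g_old, g_new = grad(x), grad(x_new)
        y = [cy + gn - go for cy, gn, go in zip(matvec(C, y), g_new, g_old)]
        x = x_new
    return x


if __name__ == "__main__":
    # Directed graph with edges 0->1, 1->2, 2->0, 0->2 (an assumed example).
    R = [[1 / 2, 0.0, 1 / 2], [1 / 2, 1 / 2, 0.0], [1 / 3, 1 / 3, 1 / 3]]  # rows sum to 1
    C = [[1 / 3, 0.0, 1 / 2], [1 / 3, 1 / 2, 0.0], [1 / 3, 1 / 2, 1 / 2]]  # columns sum to 1
    print(push_pull(R, C, [1.0, 2.0, 6.0]))  # entries should approach the mean, 3.0
```

Because C is column-stochastic, the sum of the y entries always equals the sum of the current local gradients, so a consensus fixed point can only occur where the summed gradient vanishes, i.e. at the minimizer of the sum of the costs.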
-
Deep Attentive Tracking via Reciprocative Learning
Authors:
Shi Pu,
Yibing Song,
Chao Ma,
Honggang Zhang,
Ming-Hsuan Yang
Abstract:
Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. For visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively paying attention to temporal robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights as the classifiers are not equipped with such mechanisms. In this paper, we propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. The proposed algorithm consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training. The deep classifier learns to attend to the regions of target objects robust to appearance changes. Extensive experiments on large-scale benchmark datasets show that the proposed attentive tracking method performs favorably against the state-of-the-art approaches.
Submitted 15 October, 2018; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Eddy magnetization from the chiral Barnett effect
Authors:
Kenji Fukushima,
Shi Pu,
Zebin Qiu
Abstract:
We discuss the spin, the angular momentum, and the magnetic moment of rotating chiral fermions using a kinetic theory. We find that, in addition to the chiral vortical contribution along the rotation axis, finite circular spin polarization is induced by the spin-momentum correlation of chiral fermions, which is canceled by a change in the orbital angular momentum. We point out that the eddy magnetic moment is nonvanishing due to the $g$-factors, exhibiting the chiral Barnett effect.
Submitted 11 April, 2019; v1 submitted 24 August, 2018;
originally announced August 2018.
-
Extreme Network Compression via Filter Group Approximation
Authors:
Bo Peng,
Wenming Tan,
Zheyang Li,
Shun Zhang,
Di Xie,
Shiliang Pu
Abstract:
In this paper we propose a novel decomposition method based on filter group approximation, which can significantly reduce the redundancy of deep convolutional neural networks (CNNs) while maintaining the majority of feature representation. Unlike other low-rank decomposition algorithms which operate on the spatial or channel dimension of filters, our proposed method mainly focuses on exploiting the filter group structure for each layer. For several commonly used CNN models, including VGG and ResNet, our method can reduce floating-point operations (FLOPs) by over 80% with less accuracy drop than state-of-the-art methods on various image classification datasets. Besides, experiments demonstrate that our method is conducive to alleviating degeneracy of the compressed network, which hurts the convergence and performance of the network.
Submitted 31 July, 2018; v1 submitted 30 July, 2018;
originally announced July 2018.
-
Non-Equilibrium Quantum Transport of Chiral Fluids from Kinetic Theory
Authors:
Yoshimasa Hidaka,
Shi Pu,
Di-Lun Yang
Abstract:
We introduce the quantum-field-theory (QFT) derivation of chiral kinetic theory (CKT) from the Wigner-function approach, which manifests side jumps and non-scalar distribution functions associated with Lorentz covariance and incorporates both background fields and collisions. The formalism is utilized to investigate second-order responses of chiral fluids near local equilibrium. Such non-equilibrium anomalous transport is dissipative and affected by interactions. Contributions from both quantum corrections in anomalous hydrodynamic equations of motion (EOM) and those from the CKT and Wigner functions (WF) are considered in a relaxation-time approximation (RTA). Anomalous charged Hall currents engendered by background electric fields and temperature/chemical-potential gradients are obtained. Furthermore, chiral magnetic/vortical effects (CME/CVE) receive viscous corrections as non-equilibrium modifications stemming from the interplay between side jumps, magnetic-moment coupling, and chiral anomaly.
Submitted 13 July, 2018;
originally announced July 2018.
-
Axial Ward identity and the Schwinger mechanism -- Applications to the real-time chiral magnetic effect and condensates
Authors:
Patrick Copinger,
Kenji Fukushima,
Shi Pu
Abstract:
We elucidate chirality production under parity breaking constant electromagnetic fields, with which we clarify qualitative differences in and out of equilibrium. For a strong magnetic field the pair production from the Schwinger mechanism increments the chirality. The pair production rate is exponentially suppressed with mass according to the Schwinger formula, while the mass dependence of chirality production in the axial Ward identity appears in the pseudo-scalar term. We demonstrate that in equilibrium field theory calculus the axial anomaly is canceled by the pseudo-scalar condensate for any mass. In a real-time formulation with in- and out-states, we show that the axial Ward identity leads to the chirality production rate consistent with the Schwinger formula. We illuminate that such an in- and out-states formulation makes clear the chiral magnetic effect in and out of equilibrium, and we discuss further applications to real-time condensates.
Submitted 12 July, 2018;
originally announced July 2018.
-
Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation
Authors:
Tao Song,
Leiyu Sun,
Di Xie,
Haiming Sun,
Shiliang Pu
Abstract:
A critical issue in pedestrian detection is to detect small-scale objects, which exhibit feeble contrast and motion blur in images and videos; in our opinion, this difficulty should be partially attributed to deep-rooted annotation bias. Motivated by this, we propose a novel method integrated with somatic topological line localization (TLL) and temporal feature aggregation for detecting multi-scale pedestrians, which works particularly well with small-scale pedestrians that are relatively far from the camera. Moreover, a post-processing scheme based on Markov Random Field (MRF) is introduced to eliminate ambiguities in occlusion cases. Applying these methodologies together, we achieve the best detection performance on the Caltech benchmark and significantly improve performance on small-scale objects (miss rate decreases from 74.53% to 60.79%). Beyond this, we also achieve competitive performance on the CityPersons dataset and show the existence of annotation bias in the KITTI dataset.
Submitted 3 July, 2018;
originally announced July 2018.
-
Swarming for Faster Convergence in Stochastic Optimization
Authors:
Shi Pu,
Alfredo Garcia
Abstract:
We study a distributed framework for stochastic optimization which is inspired by models of collective motion found in nature (e.g., swarming) with mild communication requirements. Specifically, we analyze a scheme in which each of $N > 1$ independent threads implements, in a distributed and unsynchronized fashion, a stochastic gradient-descent algorithm which is perturbed by a swarming potential. Assuming the overhead caused by synchronization is not negligible, we show the swarming-based approach exhibits better performance than a centralized algorithm (based upon the average of $N$ observations) in terms of (real-time) convergence speed. We also derive an error bound that is monotone decreasing in network size and connectivity. We characterize the scheme's finite-time performance for both convex and non-convex objective functions.
Submitted 6 August, 2018; v1 submitted 11 June, 2018;
originally announced June 2018.
-
Distributed Stochastic Gradient Tracking Methods
Authors:
Shi Pu,
Angelia Nedić
Abstract:
In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant stepsize choice). Under DSGT, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size $n$, which is a comparable performance to a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. Numerical examples further demonstrate the effectiveness of the proposed methods.
Submitted 10 March, 2020; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Berry phase of the composite-fermion Fermi Sea: Effect of Landau-level mixing
Authors:
Songyang Pu,
Mikael Fremling,
J. K. Jain
Abstract:
We construct explicit lowest-Landau-level wave functions for the composite-fermion Fermi sea and its low energy excitations following a recently developed approach [Pu, Wu and Jain, Phys. Rev. B 96, 195302 (2018)] and demonstrate them to be very accurate representations of the Coulomb eigenstates. We further ask how the Berry phase associated with a closed loop around the Fermi circle, predicted to be $π$ in a Dirac composite fermion theory satisfying particle-hole symmetry [D. T. Son, Phys. Rev. X 5, 031027 (2015)], is affected by Landau level mixing. For this purpose, we consider a simple model wherein we determine the variational ground state as a function of Landau level mixing within the space spanned by two basis functions: the lowest-Landau-level projected and the unprojected composite-fermion Fermi sea wave functions. We evaluate the Berry phase for a path around the Fermi circle within this model following a recent prescription, and find that it rotates rapidly as a function of Landau level mixing. We also consider the effect of a particle-hole symmetry breaking three-body interaction on the Berry phase while confining the Hilbert space to the lowest Landau level. Our study deepens the connection between the $π$ Berry phase and the exact particle-hole symmetry in the lowest Landau level.
Submitted 1 October, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
A practical convolutional neural network as loop filter for intra frame
Authors:
Xiaodan Song,
Jiabao Yao,
Lulu Zhou,
Li Wang,
Xiaoyang Wu,
Di Xie,
Shiliang Pu
Abstract:
Loop filters are used in video coding to remove artifacts or improve performance. Recent advances in deploying convolutional neural networks (CNNs) to replace traditional loop filters show large gains, but pose problems for practical application. First, a different model is used for frames encoded with each quantization parameter (QP), which is expensive for hardware. Second, floating-point operations in a CNN lead to inconsistency between encoding and decoding across different platforms. Third, redundancy within the CNN model consumes precious computational resources.
This paper proposes a CNN as the loop filter for intra frames and proposes a scheme to solve the above problems. It aims to design a single CNN model with low redundancy to adapt to decoded frames with different qualities and ensure consistency. To adapt to reconstructions with different qualities, both reconstruction and QP are taken as inputs. After training, the obtained model is compressed to reduce redundancy. To ensure consistency, dynamic fixed points (DFP) are adopted in testing CNN. Parameters in the compressed model are first quantized to DFP and then used for inference of CNN. Outputs of each layer in CNN are computed by DFP operations. Experimental results on JEM 7.0 report 3.14%, 5.21%, 6.28% BD-rate savings for luma and two chroma components with all intra configuration when replacing all traditional filters.
Submitted 16 May, 2018;
originally announced May 2018.
-
Edit Probability for Scene Text Recognition
Authors:
Fan Bai,
Zhanzhan Cheng,
Yi Niu,
Shiliang Pu,
Shuigeng Zhou
Abstract:
We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximum likelihood loss to optimize the models. When we train the model, the misalignment between the ground truth strings and the attention's output sequences of probability distribution, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and degrade the recognition accuracy. To handle this problem, we propose a novel method called edit probability (EP) for scene text recognition. EP tries to effectively estimate the probability of generating a string from the output sequence of probability distribution conditioned on the input image, while considering the possible occurrences of missing/superfluous characters. The advantage is that the training process can focus on the missing, superfluous and unrecognized characters, and thus the impact of the misalignment problem can be alleviated or even overcome. We conduct extensive experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets. Experimental results show that EP can substantially boost scene text recognition performance.
Submitted 9 May, 2018;
originally announced May 2018.
-
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Authors:
Chao Li,
Qiaoyong Zhong,
Di Xie,
Shiliang Pu
Abstract:
Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons' temporal evolutions. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. First, point-level information of each joint is encoded independently. Then the encoded features are assembled into a semantic representation in both spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which is able to learn superior joint co-occurrence features over local aggregation. Besides, raw skeleton coordinates as well as their temporal difference are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks like NTU RGB+D, SBU Kinect Interaction and PKU-MMD.
Submitted 17 April, 2018;
originally announced April 2018.
-
A Distributed Stochastic Gradient Tracking Method
Authors:
Shi Pu,
Angelia Nedić
Abstract:
In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method. We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant step size choice). More importantly, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size, which is a comparable performance to a centralized stochastic gradient algorithm. Numerical examples further demonstrate the effectiveness of the method.
Submitted 1 August, 2019; v1 submitted 21 March, 2018;
originally announced March 2018.
-
A Push-Pull Gradient Method for Distributed Optimization in Networks
Authors:
Shi Pu,
Wei Shi,
**ming Xu,
Angelia Nedić
Abstract:
In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider a new distributed gradient-based method where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents' objective functions. From the viewpoint of an agent, the information about the decision variable is pushed to the neighbors, while the information about the gradients is pulled from the neighbors (hence giving the name "push-pull gradient method"). The method unifies the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the algorithm converges linearly for strongly convex and smooth objective functions over a directed static network. In our numerical test, the algorithm performs well even for time-varying directed networks.
Submitted 1 August, 2019; v1 submitted 20 March, 2018;
originally announced March 2018.
-
Abelian and non-Abelian Berry curvatures in lattice QCD
Authors:
Shi Pu,
Arata Yamamoto
Abstract:
We studied the Berry curvature of the massive Dirac fermion in 3+1 dimensions. For the non-interacting Dirac fermion, the Berry curvature is non-Abelian because of the degeneracy of positive and negative helicity modes. We calculated the non-Abelian Berry curvature analytically and numerically. For the interacting Dirac fermion in QCD, the degeneracy is lost because gluons carry helicity and color charge. We calculated the Abelian Berry curvature in lattice QCD.
Submitted 5 June, 2018; v1 submitted 6 December, 2017;
originally announced December 2017.
-
AON: Towards Arbitrarily-Oriented Text Recognition
Authors:
Zhanzhan Cheng,
Yangliu Xu,
Fan Bai,
Yi Niu,
Shiliang Pu,
Shuigeng Zhou
Abstract:
Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite several decades of research on optical character recognition (OCR), recognizing texts from natural images is still a challenging task. This is because scene texts are often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequence. The whole network can be trained end-to-end by using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets, and is comparable to major existing methods on regular datasets.
Submitted 22 March, 2018; v1 submitted 11 November, 2017;
originally announced November 2017.
-
Cascade Region Proposal and Global Context for Deep Object Detection
Authors:
Qiaoyong Zhong,
Chao Li,
Yingying Zhang,
Di Xie,
Shicai Yang,
Shiliang Pu
Abstract:
A deep region-based object detector consists of a region proposal step and a deep object recognition step. In this paper, we make significant improvements on both steps. For region proposal we propose a novel lightweight cascade structure which can effectively improve RPN proposal quality. For object recognition we re-implement global context modeling with a few modifications and obtain a performance boost (4.2% mAP gain on the ILSVRC 2016 validation set). Besides, we apply the idea of pre-training extensively and show its importance in both steps. Together with common training and testing tricks, we improve the Faster R-CNN baseline by a large margin. In particular, we obtain 87.9% mAP on the PASCAL VOC 2012 test set, 65.3% on the ILSVRC 2016 test set and 36.8% on the COCO test-std set.
Submitted 29 October, 2017;
originally announced October 2017.
-
Nonlinear Responses of Chiral Fluids from Kinetic Theory
Authors:
Yoshimasa Hidaka,
Shi Pu,
Di-Lun Yang
Abstract:
The second-order nonlinear responses of inviscid chiral fluids near local equilibrium are investigated by applying the chiral kinetic theory (CKT) incorporating side-jump effects. It is shown that the local equilibrium distribution function can be non-trivially introduced in a co-moving frame with respect to the fluid velocity when the quantum corrections in collisions are involved. For the study of anomalous transport, contributions from both quantum corrections in anomalous hydrodynamic equations of motion and those from the CKT and Wigner functions are considered under the relaxation-time (RT) approximation, which result in anomalous charge Hall currents propagating along the cross product of the background electric field and the temperature (or chemical-potential) gradient and of the temperature and chemical-potential gradients. On the other hand, the nonlinear quantum correction on the charge density vanishes in the classical RT approximation, which in fact satisfies the matching condition given by the anomalous equation obtained from the CKT.
Submitted 29 May, 2018; v1 submitted 30 September, 2017;
originally announced October 2017.