-
Thermal stress around a smooth cavity in a plate subjected to uniform heat flux
Authors:
Zhaohang Lee,
Yu Tang,
Wennan Zou
Abstract:
The two-dimensional thermoelastic problem of an adiabatic cavity in an infinite isotropic homogeneous medium subjected to uniform heat flux is studied, where the shape of the cavity is characterized by the Laurent polynomial. By virtue of a novel tactic, the obtained K-M potentials can be explicitly worked out to satisfy the boundary conditions precisely, and the possible translation of the cavity is also available. The new and explicit analytical solutions are compared with those reported in the literature, and some serious problems are found and corrected. Finally, some discussions on the thermal stress concentration around the tips of three typical cavities are provided.
Submitted 2 March, 2021;
originally announced March 2021.
-
SEPAL: Towards a Large-scale Analysis of SEAndroid Policy Customization
Authors:
Dongsong Yu,
Guangliang Yang,
Guozhu Meng,
Xiaorui Gong,
Xiu Zhang,
Xiaobo Xiang,
Xiaoyu Wang,
Yue Jiang,
Kai Chen,
Wei Zou,
Wenke Lee,
Wenchang Shi
Abstract:
To investigate the status quo of SEAndroid policy customization, we propose SEPAL, a universal tool to automatically retrieve and examine the customized policy rules. SEPAL applies the NLP technique and employs and trains a wide&deep model to quickly and precisely predict whether one rule is unregulated or not. Our evaluation shows SEPAL is effective, practical and scalable. We verify SEPAL outperforms the state-of-the-art approach (i.e., EASEAndroid) by 15% accuracy rate on average. In our experiments, SEPAL successfully identifies 7,111 unregulated policy rules with a low false positive rate from 595,236 customized rules (extracted from 774 Android firmware images of 72 manufacturers). We further discover the policy customization problem is getting worse in newer Android versions (e.g., around 8% for Android 7 and nearly 20% for Android 9), even though more and more efforts are made. Then, we conduct a deep study and discuss why the unregulated rules are introduced and how they can compromise user devices. Last, we report some unregulated rules to seven vendors and so far four of them confirm our findings.
Submitted 19 February, 2021;
originally announced February 2021.
-
Ensemble perspective for understanding temporal credit assignment
Authors:
Wenxuan Zou,
Chan Li,
Hai** Huang
Abstract:
Recurrent neural networks are widely used for modeling spatio-temporal sequences in both natural language processing and neural population dynamics. However, understanding the temporal credit assignment is hard. Here, we propose that each individual connection in the recurrent computation is modeled by a spike and slab distribution, rather than a precise weight value. We then derive the mean-field algorithm to train the network at the ensemble level. The method is then applied to classify handwritten digits when pixels are read in sequence, and to the multisensory integration task that is a fundamental cognitive function of animals. Our model reveals important connections that determine the overall performance of the network. The model also shows how spatio-temporal information is processed through the hyperparameters of the distribution, and moreover reveals distinct types of emergent neural selectivity. To provide a mechanistic analysis of the ensemble learning, we first derive an analytic solution of the learning at the infinitely-large-network limit. We then carry out a low-dimensional projection of both neural and synaptic dynamics, analyze symmetry breaking in the parameter space, and finally demonstrate the role of stochastic plasticity in the recurrent computation. Therefore, our study sheds light on mechanisms of how weight uncertainty impacts the temporal credit assignment in recurrent neural networks from the ensemble perspective.
Submitted 7 March, 2022; v1 submitted 7 February, 2021;
originally announced February 2021.
-
DEAL: Decremental Energy-Aware Learning in a Federated System
Authors:
Wenting Zou,
Li Li,
Zichen Xu,
Chengzhong Xu
Abstract:
Federated learning struggles with its heavy energy footprint on battery-powered devices. The learning process keeps all devices awake while draining expensive battery power to train a shared model collaboratively, yet it may still leak sensitive personal information. Traditional energy management techniques in system kernel mode can force the training device to enter low-power states, but this may violate the SLO of the collaborative learning. To address the conflict between learning SLO and energy efficiency, we propose DEAL, an energy-efficient learning system that saves energy and preserves privacy with a decremental learning design. DEAL reduces the energy footprint from two layers: 1) an optimization layer that selects a subset of workers with sufficient capacity and maximum rewards; 2) a specified decremental learning algorithm that actively provides decremental and incremental update functions, which allows the kernel to correctly tune the local DVFS. We prototyped DEAL in containerized services with modern smartphone profiles and evaluated it with several learning benchmarks with realistic traces. We observed that DEAL achieves 75.6%-82.4% less energy footprint in different datasets, compared to the traditional methods. All learning processes are faster than state-of-the-practice FL frameworks, by up to 2-4X in model convergence.
Submitted 5 February, 2021;
originally announced February 2021.
-
Quasilinear Schrödinger equations: ground state and infinitely many normalized solutions
Authors:
Houwang Li,
Wenming Zou
Abstract:
In the present paper, we study the normalized solutions for the following quasilinear Schrödinger equations:
$$-Δu-uΔu^2+λu=|u|^{p-2}u \quad \text{in}~\mathbb R^N,$$ with prescribed mass
$$\int_{\mathbb R^N} u^2=a^2.$$ We first consider the mass-supercritical case $p>4+\frac{4}{N}$, which has not been studied before. By using a perturbation method, we succeed in proving the existence of ground state normalized solutions, and by applying the index theory, we obtain the existence of infinitely many normalized solutions. Then we turn to study the mass-critical case, i.e., $p=4+\frac{4}{N}$, and obtain some new existence results. Moreover, we also observe a concentration behavior of the ground state solutions.
Submitted 19 January, 2021;
originally announced January 2021.
-
Temporal Pyramid Network for Pedestrian Trajectory Prediction with Multi-Supervision
Authors:
Rongqin Liang,
Yuanman Li,
Xia Li,
Yi Tang,
Jiantao Zhou,
Wenbin Zou
Abstract:
Predicting human motion behavior in a crowd is important for many applications, ranging from the natural navigation of autonomous vehicles to intelligent security systems of video surveillance. Previous works all model and predict the trajectory at a single resolution, which is rather inefficient and makes it difficult to simultaneously exploit the long-range information (e.g., the destination of the trajectory) and the short-range information (e.g., the walking direction and speed at a certain time) of the motion behavior. In this paper, we propose a temporal pyramid network for pedestrian trajectory prediction through a squeeze modulation and a dilation modulation. Our hierarchical framework builds a feature pyramid with increasingly richer temporal information from top to bottom, which can better capture the motion behavior at various tempos. Furthermore, we propose a coarse-to-fine fusion strategy with multi-supervision. By progressively merging the top coarse features of global context to the bottom fine features of rich local context, our method can fully exploit both the long-range and short-range information of the trajectory. Experimental results on several benchmarks demonstrate the superiority of our method.
Submitted 3 December, 2020; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Authors:
Dongwei Jiang,
Wubo Li,
Miao Cao,
Wei Zou,
Xiangang Li
Abstract:
Self-supervised visual pretraining has shown significant progress recently. Among those methods, SimCLR greatly advanced the state of the art in self-supervised and semi-supervised learning on ImageNet. The input feature representations for speech and visual tasks are both continuous, so it is natural to consider applying a similar objective to speech representation learning. In this paper, we propose Speech SimCLR, a new self-supervised objective for speech representation learning. During training, Speech SimCLR applies augmentation on raw speech and its spectrogram. Its objective is the combination of a contrastive loss that maximizes agreement between differently augmented samples in the latent space and a reconstruction loss on the input representation. The proposed method achieved competitive results on speech emotion recognition and speech recognition.
Submitted 4 July, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Authors:
Wubo Li,
Dongwei Jiang,
Wei Zou,
Xiangang Li
Abstract:
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing a given video. The previous state-of-the-art model shows superior performance for this task using a Transformer-based architecture. However, there remain some limitations in learning better representation of modalities. Inspired by Neural Machine Translation (NMT), we propose the Transformer-based Modal Translator (TMT) to learn the representations of the source modal sequence by translating the source modal sequence to the related target modal sequence in a supervised manner. Based on Multimodal Transformer Networks (MTN), we apply TMT to video and dialog, proposing MTN-TMT for the video-grounded dialog system. On the AVSD track of the Dialog System Technology Challenge 7, MTN-TMT outperforms the MTN and other submission models in both the Video and Text task and the Text Only task. Compared with MTN, MTN-TMT improves all metrics, in particular achieving a relative improvement of up to 14.1% on CIDEr. Index Terms: multimodal learning, audio-visual scene-aware dialog, neural machine translation, multi-task learning
Submitted 21 October, 2020;
originally announced October 2020.
-
Fast Video Salient Object Detection via Spatiotemporal Knowledge Distillation
Authors:
Yi Tang,
Yuanman Li,
Wenbin Zou
Abstract:
Since the wide employment of deep learning frameworks in video salient object detection, the accuracy of the recent approaches has made stunning progress. These approaches mainly adopt the sequential modules, based on optical flow or recurrent neural network (RNN), to learn robust spatiotemporal features. These modules are effective but significantly increase the computational burden of the corresponding deep models. In this paper, to simplify the network and maintain the accuracy, we present a lightweight network tailored for video salient object detection through the spatiotemporal knowledge distillation. Specifically, in the spatial aspect, we combine a saliency guidance feature embedding structure and spatial knowledge distillation to refine the spatial features. In the temporal aspect, we propose a temporal knowledge distillation strategy, which allows the network to learn the robust temporal features through the infer-frame feature encoding and distilling information from adjacent frames. The experiments on widely used video datasets (e.g., DAVIS, DAVSOD, SegTrack-V2) prove that our approach achieves competitive performance. Furthermore, without the employment of the complex sequential modules, the proposed network can obtain high efficiency with 0.01s per frame.
Submitted 17 March, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
DiDiSpeech: A Large Scale Mandarin Speech Corpus
Authors:
Tingwei Guo,
Cheng Wen,
Dongwei Jiang,
Ne Luo,
Ruixiong Zhang,
Shuaijiang Zhao,
Wubo Li,
Cheng Gong,
Wei Zou,
Kun Han,
Xiangang Li
Abstract:
This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech. It consists of about 800 hours of speech data at a 48 kHz sampling rate from 6000 speakers and the corresponding texts. All speech data in the corpus is recorded in a quiet environment and is suitable for various speech processing tasks, such as voice conversion, multi-speaker text-to-speech and automatic speech recognition. We conduct experiments with multiple speech tasks and evaluate the performance, showing that it is promising to use the corpus for both academic research and practical application. The corpus is available at https://outreach.didichuxing.com/research/opendata/.
Submitted 8 February, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
No vortex in straight flows -- on the eigen-representations of velocity gradient
Authors:
Xiangyang Xu,
Zhiwen Xu,
Changxin Tang,
Xiaohang Zhang,
Wennan Zou
Abstract:
Velocity gradient is the basis of many vortex recognition methods, such as Q criterion, $Δ$ criterion, $λ_{2}$ criterion, $λ_{ci}$ criterion and $Ω$ criterion, etc. Except for the $λ_{ci}$ criterion, all these criteria recognize vortices by designing various invariants, based on the Helmholtz decomposition that decomposes velocity gradient into strain rate and spin. In recent years, the intuition of 'no vortex in straight flows' has prompted people to analyze the vortex state directly from the velocity gradient, in which vortex can be distinguished from the situation that the velocity gradient has a pair of complex eigenvalues. A specious viewpoint to adopt the simple shear as an independent flow mode was emphasized by many authors; among them, Kolar proposed the triple decomposition of motion by extracting a so-called effective pure shearing motion; Li et al. introduced the so-called quaternion decomposition of velocity gradient and proposed the concept of eigen rotation; Liu et al. further mined the characteristic information of velocity gradient and put forward an effective algorithm of Liutex, and then developed the vortex recognition method. However, there is another explanation for the increasingly clear representation of velocity gradient, that is, the local streamline pattern based on critical-point theory. In this paper, the tensorial expressions of the right/left real Schur forms of velocity gradient are clarified from the characteristic problem of velocity gradient. The relations between the involved parameters are derived and numerically verified. Compared with the geometrical features of the local streamline pattern, we confirm that the parameters in the right eigen-representation based on the right real Schur form of velocity gradient have good meanings to reveal the local streamline pattern. Some illustrative examples from the DNS data are presented.
Submitted 20 September, 2020;
originally announced September 2020.
-
Variable Stiffness Control with Strict Frequency Domain Constraints for Physical Human-Robot Interaction
Authors:
Wulin Zou,
Pu Duan,
Yawen Chen,
Ningbo Yu,
Ling Shi
Abstract:
Variable impedance control is advantageous for physical human-robot interaction to improve safety, adaptability and many other aspects. This paper presents a gain-scheduled variable stiffness control approach under strict frequency-domain constraints. Firstly, to reduce conservativeness, we characterize and constrain the impedance rendering, actuator saturation, disturbance/noise rejection and passivity requirements into their specific frequency bands. This relaxation makes sense because of the restricted frequency properties of the interactive robots. Secondly, a gain-scheduled method is taken to regulate the controller gains with respect to the desired stiffness. Thirdly, the scheduling function is parameterized via a nonsmooth optimization method. Finally, the proposed approach is validated by simulations, experiments and comparisons with a gain-fixed passivity-based PID method.
Submitted 9 August, 2020;
originally announced August 2020.
-
SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration
Authors:
Mengzuo Huang,
Feng Li,
Wuhe Zou,
Weidong Zhang
Abstract:
Dialogue systems in the open domain have achieved great success due to the easily obtained single-turn corpus and the development of deep learning, but the multi-turn scenario is still a challenge because of the frequent coreference and information omission. In this paper, we investigate the incomplete utterance restoration which has brought general improvement over multi-turn dialogue systems in recent studies. Meanwhile, jointly inspired by the autoregression for text generation and the sequence labeling for text editing, we propose a novel semi autoregressive generator (SARG) with high efficiency and flexibility. Moreover, experiments on two benchmarks show that our proposed model significantly outperforms the state-of-the-art models in terms of quality and inference speed.
Submitted 20 December, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Transformer based unsupervised pre-training for acoustic representation learning
Authors:
Ruixiong Zhang,
Haiwei Wu,
Wubo Li,
Dongwei Jiang,
Wei Zou,
Xiangang Li
Abstract:
Recently, a variety of acoustic tasks and related applications have arisen. For many acoustic tasks, the labeled data size may be limited. To handle this problem, we propose an unsupervised pre-training method using a Transformer-based encoder to learn a general and robust high-level representation for all acoustic tasks. Experiments have been conducted on three kinds of acoustic tasks: speech emotion recognition, sound event detection and speech translation. All the experiments have shown that pre-training using its own training data can significantly improve the performance. With a larger pre-training dataset combining the MuST-C, Librispeech and ESC-US datasets, for speech emotion recognition, the UAR improves by a further 4.3% absolute on the IEMOCAP dataset. For sound event detection, the F1 score improves by a further 1.5% absolute on the DCASE2018 task5 development set and 2.1% on the evaluation set. For speech translation, the BLEU score improves by a further 12.2% relative on the En-De dataset and 8.4% on the En-Fr dataset.
Submitted 8 February, 2021; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Data-driven effective model shows a liquid-like deep learning
Authors:
Wenxuan Zou,
Hai** Huang
Abstract:
The geometric structure of an optimization landscape is argued to be fundamentally important to support the success of deep neural network learning. A direct computation of the landscape beyond two layers is hard. Therefore, to capture the global view of the landscape, an interpretable model of the network-parameter (or weight) space must be established. However, the model is lacking so far. Furthermore, it remains unknown what the landscape looks like for deep networks of binary synapses, which play a key role in robust and energy efficient neuromorphic computation. Here, we propose a statistical mechanics framework by directly building a least structured model of the high-dimensional weight space, considering realistic structured data, stochastic gradient descent training, and the computational depth of neural networks. We also consider whether the number of network parameters outnumbers the number of supplied training data, namely, over- or under-parametrization. Our least structured model reveals that the weight spaces of the under-parametrization and over-parametrization cases belong to the same class, in the sense that these weight spaces are well-connected without any hierarchical clustering structure. In contrast, the shallow network has a broken weight space, characterized by a discontinuous phase transition, thereby clarifying the benefit of depth in deep learning from the angle of high dimensional geometry. Our effective model also reveals that inside a deep network, there exists a liquid-like central part of the architecture in the sense that the weights in this part behave as randomly as possible, providing algorithmic implications. Our data-driven model thus provides a statistical mechanics insight about why deep learning is unreasonably effective in terms of the high-dimensional weight space, and how deep networks are different from shallow ones.
Submitted 28 July, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Normalized ground states for semilinear elliptic systems with critical and subcritical nonlinearities
Authors:
Houwang Li,
Wenming Zou
Abstract:
In the present paper, we study the normalized solutions with least energy to the following system: $$\begin{cases} -Δu+λ_1u=μ_1 |u|^{p-2}u+βr_1|u|^{r_1-2}|v|^{r_2}u\quad &\hbox{in}\;\mathbb R^N,\\ -Δv+λ_2v=μ_2 |v|^{q-2}v+βr_2|u|^{r_1}|v|^{r_2-2}v\quad&\hbox{in}\;\mathbb R^N,\\ \int_{\mathbb R^N}u^2=a_1^2\quad\hbox{and}\;\int_{\mathbb R^N}v^2=a_2^2, \end{cases}$$ where $p,q,r_1+r_2$ can be Sobolev critical. To this end, we study the geometry of the Pohozaev manifold and the associated minimization problem. Under some assumptions on $a_1,a_2$ and $β$, we obtain the existence of the positive normalized ground state solution to the above system. We have solved some unsolved open problems in this area.
Submitted 8 January, 2021; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Hardware-irrelevant parallel processing system
Authors:
Xiuting Zou,
Shaofu Xu,
Anyi Deng,
Rui Wang,
Weiwen Zou
Abstract:
Parallel processing technology has been a primary tool for achieving high-speed, high-accuracy, and broadband processing for many years across modern information systems and data processing such as optical and radar, synthetic aperture radar imaging, digital beam forming, and digital filtering systems. However, hardware deviations in a parallel processing system (PPS) severely degrade system performance and pose an urgent challenge. We propose a hardware-irrelevant PPS of which the performance is unaffected by hardware deviations. In this system, an embedded convolutional recurrent autoencoder (CRAE), which learns inherent system patterns as well as acquires and removes adverse effects brought by hardware deviations, is adopted. We implement a hardware-irrelevant PPS into a parallel photonic sampling system to accomplish a high-performance analog-to-digital conversion for microwave signals with high frequency and broad bandwidth. Under one system state, a category of signals with two different mismatch degrees is utilized to train the CRAE, which can then compensate for mismatches in various categories of signals with multiple mismatch degrees under random system states. Our approach is extensively applicable to achieving hardware-irrelevant PPSs which are either discrete or integrated in photonic, electric, and other fields.
Submitted 23 June, 2020;
originally announced June 2020.
-
MMA Regularization: Decorrelating Weights of Neural Networks by Maximizing the Minimal Angles
Authors:
Zhennan Wang,
Canqun Xiang,
Wenbin Zou,
Chen Xu
Abstract:
The strong correlation between neurons or filters can significantly weaken the generalization ability of neural networks. Inspired by the well-known Tammes problem, we propose a novel diversity regularization method to address this issue, which makes the normalized weight vectors of neurons or filters distributed on a hypersphere as uniformly as possible, through maximizing the minimal pairwise angles (MMA). This method can easily exert its effect by plugging the MMA regularization term into the loss function with negligible computational overhead. The MMA regularization is simple, efficient, and effective. Therefore, it can be used as a basic regularization method in neural network training. Extensive experiments demonstrate that MMA regularization is able to enhance the generalization ability of various modern models and achieves considerable performance improvements on CIFAR100 and TinyImageNet datasets. In addition, experiments on face verification show that MMA regularization is also effective for feature learning. Code is available at: https://github.com/wznpub/MMA_Regularization.
Submitted 23 March, 2021; v1 submitted 6 June, 2020;
originally announced June 2020.
-
A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition
Authors:
Dongwei Jiang,
Wubo Li,
Ruixiong Zhang,
Miao Cao,
Ne Luo,
Yang Han,
Wei Zou,
Xiangang Li
Abstract:
Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Coding (MPC) achieved significant improvements on various speech recognition datasets with a BERT-like Masked Reconstruction loss and a Transformer backbone. However, many aspects of MPC have not been fully investigated. In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks. Experiments revealed that pre-training data with a matching speaking style is more useful on downstream recognition tasks. A unified training objective with APC and MPC provided an 8.46% relative error reduction on a streaming model trained on HKUST. Also, the combination of target data adaption and layer-wise discriminative training helped the knowledge transfer of MPC, which achieved a 3.99% relative error reduction on AISHELL over a strong baseline.
Submitted 22 June, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Generalized bioinspired approach to a daytime radiative cooling "skin"
Authors:
Meng Yang,
Weizhi Zou,
Jing Guo,
Zhenchao Qian,
Heng Luo,
Shijia Yang,
Ning Zhao,
Lorenzo Pattelli,
Jian Xu,
Diederik S. Wiersma
Abstract:
Energy-saving cooling materials with strong operability are desirable for sustainable thermal management. Inspired by the cooperative thermo-optical effect in the fur of the polar bear, we develop a flexible and reusable cooling skin by laminating a polydimethylsiloxane film with a highly scattering polyethylene aerogel. Owing to its high porosity of 97.9% and tailored pore size of 3.8 ± 1.4 micrometers, a superior solar reflectance of 0.96 and a high transparency to irradiated thermal energy of 0.8 can be achieved at a thickness of 2.7 mm. Combined with the low thermal conductivity of 0.032 W/m/K of the aerogel, the cooling skin exerts midday sub-ambient temperature drops of 5-6 degrees in a metropolitan environment, with an estimated limit of 14 degrees under ideal service conditions. We envision that this generalized bilayer approach will construct a bridge from night-time to daytime radiative cooling and pave the way for economical, scalable, flexible and reusable cooling materials.
Submitted 18 May, 2020;
originally announced May 2020.
-
Learning Continuous Treatment Policy and Bipartite Embeddings for Matching with Heterogeneous Causal Effects
Authors:
Will Y. Zou,
Smitha Shyam,
Michael Mui,
Mingshi Wang,
Jan Pedersen,
Zoubin Ghahramani
Abstract:
Causal inference methods are widely applied in the fields of medicine, policy, and economics. Central to these applications is the estimation of treatment effects to make decisions. Current methods make binary yes-or-no decisions based on the treatment effect of a single outcome dimension. These methods are unable to capture continuous space treatment policies with a measure of intensity. They also lack the capacity to consider the complexity of treatment, such as matching candidate treatments with the subject. We propose to formulate the effectiveness of treatment as a parametrizable model, expanding to a multitude of treatment intensities and complexities through the continuous policy treatment function and the likelihood of matching. Our proposal to decompose treatment effect functions into effectiveness factors presents a framework to model a rich space of actions using causal inference. We utilize deep learning to optimize the desired holistic metric space instead of predicting a single-dimensional treatment counterfactual. This approach employs a population-wide effectiveness measure and significantly improves the overall effectiveness of the model. The performance of our algorithms is demonstrated with experiments. When using generic continuous space treatments and matching architecture, we observe a 41% improvement upon prior art in cost-effectiveness and a 68% improvement upon a similar method in the average treatment effect. The algorithms capture subtle variations in treatment space, structure efficient optimization techniques, and open up the arena for many applications.
Submitted 20 April, 2020;
originally announced April 2020.
-
Heterogeneous Causal Learning for Effectiveness Optimization in User Marketing
Authors:
Will Y. Zou,
Shuyang Du,
James Lee,
Jan Pedersen
Abstract:
User marketing is a key focus of consumer-based internet companies. Learning algorithms are effective at optimizing marketing campaigns, which increase user engagement and facilitate cross-marketing to related products. By attracting users with rewards, marketing methods are effective at boosting user activity in the desired products. Rewards incur significant cost that can be offset by an increase in future revenue. Most methodologies make marketing decisions based on churn predictions to prevent losing users, which cannot capture uplift across counterfactual outcomes with business metrics. Other predictive models are capable of estimating heterogeneous treatment effects, but fail to capture the balance of cost versus benefit. We propose a treatment effect optimization methodology for user marketing. This algorithm learns from past experiments and utilizes novel optimization methods to optimize cost efficiency with respect to user selection. The method optimizes decisions using deep learning optimization models to treat and reward users, which is effective in producing cost-effective, impactful marketing campaigns. Our methodology demonstrates superior algorithmic flexibility in integrating with deep learning methods and dealing with business constraints. The effectiveness of our model surpasses the quasi-oracle estimation (R-learner) model and causal forests. We also established evaluation metrics that reflect cost-efficiency and real-world business value. Our proposed constrained and direct optimization algorithms outperform the best-performing method in prior art and baseline methods by 24.6%. The methodology is useful in many product scenarios such as optimal treatment allocation, and it has been deployed in production world-wide.
Submitted 20 April, 2020;
originally announced April 2020.
-
Tunable electronic structure and stoichiometry dependent disorder in Nanostructured VO$_x$ films
Authors:
A. Delia,
S. J. Rezvani,
N. Zema,
F. Zuccaro,
M. Fanetti,
Blaz Belec,
B. W. Li,
C. W. Zou,
C. Spezzani,
M. Sacchi,
A. Marcelli,
M. Coreno
Abstract:
We present and discuss an original method to synthesize disordered Nanostructured (NS) VO$_x$ films with controlled stoichiometry and tunable electronic structures. In these NS films, the original lattice symmetry of the bulk vanadium oxides is broken and atoms are arranged in a highly disordered structure. The stoichiometry-dependent disorder as a function of the oxygen concentration has been characterized by in-situ X-ray Absorption Near-Edge Structure (XANES) spectroscopy, identifying the spectroscopic fingerprints. Results show structural rearrangements that deviate from the octahedral symmetry with different coexisting disordered phases. The modulation of the electronic structure of the NS films based on the resulting stoichiometry and the quantum confinement in the NS particles are also discussed. We demonstrate the possibility to modulate the electronic structure of VO$_x$ NS films, accessing new disordered atomic configurations with a controlled stoichiometry that provides an extraordinary opportunity to match a wide number of technological applications.
Submitted 15 April, 2020;
originally announced April 2020.
-
Strain induced orbital dynamics across the Metal Insulator transition in thin VO2/TiO2(001) films
Authors:
A. D'Elia,
S. J. Rezvani,
A. Cossaro,
M. Stredansky,
C. Grazioli,
B. W. Li,
C. W. Zou,
M. Coreno,
A. Marcelli
Abstract:
VO2 is a strongly correlated material, which undergoes a reversible metal insulator transition (MIT) coupled to a structural phase transition upon heating (T = 67 °C). Since its discovery, the nature of the insulating state has long been debated and different solid-state mechanisms have been proposed to explain it: Mott-Hubbard correlation, Peierls distortion, or a combination of both. Moreover, there is still a lack of consensus on the interplay between the different degrees of freedom (charge, lattice, orbital) and how they contribute to the MIT. In this manuscript we investigate across the MIT the orbital evolution induced by a tensile strain applied to thin VO2 films. The strained films allowed us to study the interplay between orbital and lattice degrees of freedom and to clarify MIT properties.
Submitted 12 January, 2020;
originally announced January 2020.
-
SAIS: Single-stage Anchor-free Instance Segmentation
Authors:
Canqun Xiang,
Shishun Tian,
Wenbin Zou,
Chen Xu
Abstract:
In this paper, we propose a simple yet efficient instance segmentation approach based on the single-stage anchor-free detector, termed SAIS. In our approach, the instance segmentation task consists of two parallel subtasks which respectively predict the mask coefficients and the mask prototypes. Then, instance masks are generated by linearly combining the prototypes with the mask coefficients. To enhance the quality of the instance masks, the information from regression and classification is fused to predict the mask coefficients. In addition, a center-aware target is designed to preserve the center coordination of each instance, which achieves a stable improvement in instance segmentation. The experiment on MS COCO shows that SAIS achieves the performance of the existing state-of-the-art single-stage methods with a much smaller memory footprint.
Submitted 2 December, 2019;
originally announced December 2019.
-
Quality Assessment of DIBR-synthesized views: An Overview
Authors:
Shishun Tian,
Lu Zhang,
Wenbin Zou,
Xia Li,
Ting Su,
Luce Morin,
Olivier Deforges
Abstract:
Depth-Image-Based-Rendering (DIBR) is one of the fundamental techniques to generate new views in 3D video applications, such as Multi-View Videos (MVV), Free-Viewpoint Videos (FVV) and Virtual Reality (VR). However, the quality assessment of DIBR-synthesized views is quite different from that of traditional 2D images/videos. In recent years, several efforts have been made towards this topic, but there is a lack of detailed survey in the literature. In this paper, we provide a comprehensive survey on various current approaches for DIBR-synthesized views. The current accessible datasets of DIBR-synthesized views are firstly reviewed, followed by a summary analysis of the representative state-of-the-art objective metrics. Then, the performances of different objective metrics are evaluated and discussed on all available datasets. Finally, we discuss the potential challenges and suggest possible directions for future research.
Submitted 27 April, 2021; v1 submitted 16 November, 2019;
originally announced November 2019.
-
TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
Authors:
Wubo Li,
Wei Zou,
Xiangang Li
Abstract:
Multimodalities provide more promising performance than unimodality in most tasks. However, learning the semantics of the representations from multimodalities efficiently is extremely challenging. To tackle this, we propose the Transformer based Cross-modal Translator (TCT) to learn unimodal sequence representations by translating from other related multimodal sequences with a supervised learning method. Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on video-grounded dialogue, which uses multimodality. The proposed method reports new state-of-the-art performance on video-grounded dialogue, which indicates that representations learned by TCT are more semantic compared to directly using unimodality.
Submitted 23 October, 2019;
originally announced November 2019.
-
A Reinforced Generation of Adversarial Examples for Neural Machine Translation
Authors:
Wei Zou,
Shujian Huang,
Jun Xie,
Xinyu Dai,
Jiajun Chen
Abstract:
Neural machine translation systems tend to fail on less decent inputs despite their significant efficacy, which may significantly harm the credibility of these systems. Fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance. Instead of collecting and analyzing bad cases using limited handcrafted error features, here we investigate this issue by generating adversarial examples via a new paradigm based on reinforcement learning. Our paradigm could expose pitfalls for a given performance metric, e.g., BLEU, and could target any given neural machine translation architecture. We conduct experiments of adversarial attacks on two mainstream neural machine translation architectures, RNN-search and Transformer. The results show that our method efficiently produces stable attacks with meaning-preserving adversarial examples. We also present a qualitative and quantitative analysis for the preference pattern of the attack, demonstrating its capability of pitfall exposure.
Submitted 26 May, 2020; v1 submitted 9 November, 2019;
originally announced November 2019.
-
Performance evaluation of an integrated photonic convolutional neural network based on delay buffering and wavelength division multiplexing
Authors:
Shaofu Xu,
Jing Wang,
Weiwen Zou
Abstract:
Photonic technologies have shown a promising way to build high-speed and high-energy-efficiency neural network accelerators. In previously presented photonic neural networks, architectures are mainly designed for fully-connected layers. When convolutional layers are executed in such neural networks, the large-scale electrooptic modulation array heavily increases the energy dissipation on chip. To increase the energy efficiency, here we show an integrated photonic architecture specifically for convolutional layer calculations. Optical delay lines replace electronics to execute data manipulations on optical chip, reducing the scale of electro-optic modulation array. Consequently, the energy dissipation of these parts is mitigated. Powered by wavelength division multiplexing, the footprint of delay lines is significantly reduced compared with previous art, thus being practical to fabricate. We evaluate the potential performance of the proposed architecture with respect to component flaws in practical fabrications. According to the results, with well-controlled system insertion loss, energy efficiency of the proposed architecture would surpass previously presented works and the state-of-art electronic processors. We anticipate the proposed architecture is beneficial for future fast and energy-efficient convolutional neural network accelerators.
Submitted 28 February, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Cross-task pre-training for on-device acoustic scene classification
Authors:
Ruixiong Zhang,
Wei Zou,
Xiangang Li
Abstract:
Acoustic scene classification (ASC) and acoustic event detection (AED) are different but related tasks. Acoustic events can provide useful information for recognizing acoustic scenes. However, most of the datasets are provided without either the acoustic event or scene labels. To utilize the acoustic event information to improve the performance of ASC tasks, we present the cross-task pre-training mechanism which utilizes acoustic event information from the pre-trained AED model for ASC tasks. On the other hand, most of the models were designed and implemented on platforms with rich computing resources, and the on-device applications were limited. To solve this problem, we use model distillation method to compress our cross-task model to enable on-device acoustic scene classification. In this paper, the cross-task models and their student model were trained and evaluated on two datasets: TAU Urban Acoustic Scenes 2019 dataset and TUT Acoustic Scenes 2017 dataset. Results have shown that cross-task pre-training mechanism can significantly improve the performance of ASC tasks. The performance of our best model improved relatively 9.5% in the TAU Urban Acoustic Scenes 2019 dataset, and also improved 10% in the TUT Acoustic Scenes 2017 dataset compared with the official baseline. At the same time, the performance of the student model is much better than that of the model without teachers.
Submitted 24 October, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
Authors:
Dongwei Jiang,
Xiaoning Lei,
Wubo Li,
Ne Luo,
Yuxuan Hu,
Wei Zou,
Xiangang Li
Abstract:
Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, an unsupervised pre-training method called Masked Predictive Coding is proposed, which can be applied for unsupervised pre-training with Transformer based models. Experiments on HKUST show that using the same training data, we can achieve a CER of 23.3%, exceeding the best end-to-end model by over 0.2% absolute CER. With more pre-training data, we can further reduce the CER to 21.0%, an 11.8% relative CER reduction over the baseline.
Submitted 31 October, 2019; v1 submitted 22 October, 2019;
originally announced October 2019.
-
EPOSIT: An Absolute Pose Estimation Method for Pinhole and Fish-Eye Cameras
Authors:
Zhaobing Kang,
Wei Zou,
Zheng Zhu,
Chi Zhang,
Hongxuan Ma
Abstract:
This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera. Different from existing methods, relative positions of 3D points rather than absolute coordinates in the world coordinate system are employed in our method, and it has a unique solution. The application scope of POSIT (Pose from Orthography and Scaling with Iteration) algorithm is generalized to fish-eye cameras by combining with the radially symmetric projection model. The image point relationship between the pinhole camera and the fish-eye camera is derived based on their projection model. The general pose expression which fits for different cameras can be acquired by four noncoplanar object points and their corresponding image points. Accurate estimation results are calculated iteratively. Experimental results on synthetic and real data show that the pose estimation results of our method are more stable and accurate than state-of-the-art methods. The source code is available at https://github.com/k032131/EPOSIT.
Submitted 18 September, 2019;
originally announced September 2019.
-
The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera
Authors:
Hongxuan Ma,
Wei Zou,
Zheng Zhu,
Siyang Sun,
Zhaobing Kang
Abstract:
In the field of navigation and visual servo, it is common to calculate relative pose by feature points on markers, so keeping markers in the camera's view is an important problem. In this paper, we propose a novel approach to calculate the field-of-view (FOV) constraint of markers for a camera. Our method can make the camera maintain the visibility of all feature points during the motion of the mobile robot. According to the angular aperture of the camera, the mobile robot can obtain the FOV constraint region where the camera cannot keep all feature points in an image. Based on the FOV constraint region, the mobile robot can be guided to move from the initial position to the destination. Finally, simulations and experiments are conducted based on a mobile robot equipped with a pan-tilt camera, which validate the effectiveness of the method to obtain the FOV constraints.
Submitted 23 September, 2019;
originally announced September 2019.
-
Human Following for Wheeled Robot with Monocular Pan-tilt Camera
Authors:
Zheng Zhu,
Hongxuan Ma,
Wei Zou
Abstract:
Human following on mobile robots has witnessed significant advances due to its potential for real-world applications. Currently, most human following systems are equipped with depth sensors to obtain distance information between human and robot, which suffer from perception requirements and noise. In this paper, we design a wheeled mobile robot system with a monocular pan-tilt camera to follow a human, which can keep the target in the field of view and keep following simultaneously. The system consists of a fast human detector, a real-time and accurate visual tracker, and a unified controller for the mobile robot and pan-tilt camera. In the visual tracking algorithm, both Siamese networks and optical flow information are exploited to locate and regress the human simultaneously. In order to perform following with a monocular camera, the constraint of human height is introduced to design the controller. In experiments, human following is conducted and analysed in simulations and on a real robot platform, which demonstrates the effectiveness and robustness of the overall system.
Submitted 13 September, 2019;
originally announced September 2019.
-
Normalized solutions for a coupled Schrödinger system
Authors:
Thomas Bartsch,
Xuexiu Zhong,
Wenming Zou
Abstract:
In the present paper, we prove the existence of solutions $(λ_1,λ_2,u,v)\in\mathbb{R}^2\times H^1(\mathbb{R}^3,\mathbb{R}^2)$ to systems of coupled Schrödinger equations $$ \begin{cases} -Δu+λ_1u=μ_1 u^3+βuv^2\quad &\hbox{in}\;\mathbb{R}^3\\ -Δv+λ_2v=μ_2 v^3+βu^2v\quad&\hbox{in}\;\mathbb{R}^3\\ u,v>0&\hbox{in}\;\mathbb{R}^3 \end{cases} $$ satisfying the normalization constraint $ \displaystyle\int_{\mathbb{R}^3}u^2=a^2\quad\hbox{and}\;\int_{\mathbb{R}^3}v^2=b^2, $ which appear in binary mixtures of Bose-Einstein condensates or in nonlinear optics. The parameters $μ_1,μ_2,β>0$ are prescribed as are the masses $a,b>0$. The system has been considered mostly in the fixed frequency case. And when the masses are prescribed, the standard approach to this problem is variational with $λ_1,λ_2$ appearing as Lagrange multipliers. Here we present a new approach based on bifurcation theory and the continuation method. We obtain the existence of normalized solutions for any given $a,b>0$ for $β$ in a large range. We also give a result about the nonexistence of positive solutions. From which one can see that our existence theorem is almost the best. Especially, if $μ_1=μ_2$ we prove that normalized solutions exist for all $β>0$ and all $a,b>0$.
Submitted 14 January, 2020; v1 submitted 30 August, 2019;
originally announced August 2019.
-
High Performance Visual Object Tracking with Unified Convolutional Networks
Authors:
Zheng Zhu,
Wei Zou,
Guan Huang,
Dalong Du,
Chang Huang
Abstract:
Convolutional neural network (CNN) based tracking approaches have shown favorable performance in recent benchmarks. Nonetheless, the chosen CNN features are always pre-trained in different tasks and individual components in tracking systems are learned separately, thus the achieved tracking performance may be suboptimal. Besides, most of these trackers are not designed towards real-time applications because of their time-consuming feature extraction and complex optimization details. In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT). Specifically, the UCT treats the feature extractor and tracking process both as convolution operations and trains them jointly, which enables the learned CNN features to be tightly coupled with the tracking process. During online tracking, an efficient model updating method is proposed by introducing a peak-versus-noise ratio (PNR) criterion, and scale changes are handled efficiently by incorporating a scale branch into the network. Experiments are performed on four challenging tracking datasets: OTB2013, OTB2015, VOT2015 and VOT2016. Our method achieves leading performance on these benchmarks while maintaining beyond real-time speed.
Submitted 25 August, 2019;
originally announced August 2019.
-
Camera Pose Correction in SLAM Based on Bias Values of Map Points
Authors:
Zhaobing Kang,
Wei Zou,
Zheng Zhu
Abstract:
Accurate camera pose estimation is essential for visual SLAM (VSLAM). This paper presents a novel pose correction method to improve the accuracy of the VSLAM system. Firstly, the relationship between the camera pose estimation error and the bias values of map points is derived based on the optimized function in VSLAM. Secondly, the bias value of each map point is calculated by a statistical method. Finally, the camera pose estimation error is compensated according to the derived relationship. After the pose correction, procedures of the original system, such as the bundle adjustment (BA) optimization, can be executed as before. Compared with existing methods, our algorithm is compact and effective and can be easily generalized to different VSLAM systems. Additionally, our method is more robust to system noise than feature selection methods, because all original system information is preserved in our algorithm while only a subset is employed in the latter. Experimental results on benchmark datasets show that our approach leads to considerable improvements over state-of-the-art algorithms for absolute pose estimation.
Submitted 23 August, 2019;
originally announced August 2019.
-
FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks
Authors:
Jiabin Zhang,
Zheng Zhu,
Wei Zou,
Peng Li,
Yanwei Li,
Hu Su,
Guan Huang
Abstract:
Both accuracy and efficiency are significant for pose estimation and tracking in videos. State-of-the-art performance is dominated by two-stage top-down methods. Despite the leading results, these methods are impractical for real-world applications due to their separate architectures and complicated computation. This paper addresses the task of articulated multi-person pose estimation and tracking at real-time speed. An end-to-end multi-task network (MTN) is designed to perform human detection, pose estimation, and person re-identification (Re-ID) tasks simultaneously. To alleviate the performance bottleneck caused by the scale variation problem, a paradigm that exploits scale-normalized image and feature pyramids (SIFP) is proposed to boost both performance and speed. Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state and make better use of the Re-ID feature. In experiments, we demonstrate that the pose estimation and tracking performance improves steadily with SIFP across different backbones. Using ResNet-18 and ResNet-50 as backbones, the overall pose tracking framework achieves competitive performance with 29.4 FPS and 12.2 FPS, respectively. Additionally, the occlusion-aware Re-ID feature decreases the identification switches by 37% in the pose tracking process.
Submitted 15 August, 2019;
originally announced August 2019.
-
Mutation Testing for Ethereum Smart Contract
Authors:
Haoran Wu,
Xingya Wang,
Jiehui Xu,
Weiqin Zou,
Lingming Zhang,
Zhenyu Chen
Abstract:
A smart contract is a special program that manages digital assets on a blockchain. It is difficult to recover the loss if users make transactions through buggy smart contracts, which cannot be directly fixed. Hence, it is important to ensure the correctness of smart contracts before deploying them. This paper proposes a systematic framework for mutation testing of smart contracts on Ethereum, which is currently the most popular open blockchain for deploying and running smart contracts. Fifteen novel mutation operators have been designed for Ethereum Smart Contracts (ESC), in terms of keyword, global variable/function, variable unit, and error handling. An empirical study on 26 smart contracts in four Ethereum DApps has been conducted to evaluate the effectiveness of mutation testing. The experimental results show that our approach can outperform the coverage-based approach on defect detection rate (96.01% vs. 55.68%). The ESC mutation operators are effective at revealing real defects, and we found that 117 out of 729 real bug reports are related to our operators. These results show the great potential of using mutation testing for quality assurance of ESC.
Submitted 10 August, 2019;
originally announced August 2019.
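The general idea of mutation testing referred to in the entry above can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python, not the paper's framework: the function `transfer_allowed`, the three hand-written mutants, and the test table are all hypothetical, and none of them corresponds to the fifteen ESC mutation operators, which target Solidity constructs.

```python
from typing import Callable, List, Tuple

Check = Callable[[int, int], bool]
TestCase = Tuple[Tuple[int, int], bool]


def transfer_allowed(balance: int, amount: int) -> bool:
    """Hypothetical program under test: a transfer needs a positive amount within the balance."""
    return amount > 0 and amount <= balance


# Hand-written mutants, each applying one small syntactic change (hypothetical operators).
def mutant_relational(balance: int, amount: int) -> bool:
    return amount > 0 and amount < balance      # "<=" replaced by "<"


def mutant_logical(balance: int, amount: int) -> bool:
    return amount > 0 or amount <= balance      # "and" replaced by "or"


def mutant_boundary(balance: int, amount: int) -> bool:
    return amount >= 1 and amount <= balance    # "> 0" replaced by ">= 1" (same behavior on integers)


MUTANTS: List[Check] = [mutant_relational, mutant_logical, mutant_boundary]

# Each test case is (arguments, expected result of the original program).
TESTS: List[TestCase] = [
    ((10, 5), True),
    ((10, 10), True),
    ((10, 11), False),
    ((10, 0), False),
]


def is_killed(mutant: Check, tests: List[TestCase]) -> bool:
    """A mutant is killed when at least one test observes a wrong result."""
    return any(mutant(*args) != expected for args, expected in tests)


def mutation_score(mutants: List[Check], tests: List[TestCase]) -> float:
    """Fraction of mutants killed by the test suite."""
    killed = sum(is_killed(m, tests) for m in mutants)
    return killed / len(mutants)


if __name__ == "__main__":
    print(f"mutation score: {mutation_score(MUTANTS, TESTS):.2f}")
```

Here the suite kills two of the three mutants; the surviving one behaves identically to the original on integer inputs, which illustrates why a mutation score below 1 does not always indicate a weak test suite.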
-
DELTA: A DEep learning based Language Technology plAtform
Authors:
Kun Han,
Junwen Chen,
Hui Zhang,
Haiyang Xu,
Yiping Peng,
Yun Wang,
Ning Ding,
Hui Deng,
Yonghu Gao,
Tingwei Guo,
Yi Zhang,
Yahao He,
Baochang Ma,
Yulong Zhou,
Kangli Zhang,
Chao Liu,
Ying Lyu,
Chenxi Wang,
Cheng Gong,
Yunbo Wang,
Wei Zou,
Hui Song,
Xiangang Li
Abstract:
In this paper we present DELTA, a deep learning based language technology platform. DELTA is an end-to-end platform designed to solve industry level natural language and speech processing problems. It integrates most popular neural network models for training as well as comprehensive deployment tools for production. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. We demonstrate the reliable performance with DELTA on several natural language processing and speech tasks, including text classification, named entity recognition, natural language inference, speech recognition, speaker verification, etc. DELTA has been used for developing several state-of-the-art algorithms for publications and delivering real production to serve millions of users.
Submitted 1 August, 2019;
originally announced August 2019.
-
Deep learning scheme for recovery of broadband microwave photonic receiving systems in transceivers without expert knowledge and system priors
Authors:
Shaofu Xu,
Rui Wang,
Jianping Chen,
Lei Yu,
Weiwen Zou
Abstract:
In regular microwave photonic (MWP) receiving systems, broadband signals are processed in the analog domain before they are transformed to the digital domain for further processing and storage. However, the quality of the signals may be degraded by defective photonic analog links, especially in a complicated MWP system. Here, we show a unified deep learning scheme that recovers the distorted broadband signals as they are transformed to the digital domain. The neural network can automatically learn the end-to-end inverse responses of the distortion effects of actual photonic analog links from data, without expert knowledge and system priors. Hence, by shifting or augmenting the datasets, the neural network has the potential to generalize to various MWP receiving systems. We conduct experiments on nontrivial MWP systems with complicated waveforms. Results validate the effectiveness, general applicability and noise robustness of the proposed scheme, showing its superior performance in practical MWP systems. Therefore, the proposed deep learning scheme facilitates the low-cost performance improvement of MWP receiving systems, as well as the next-generation broadband transceivers, including radars, communications, and microwave imaging.
Submitted 25 October, 2019; v1 submitted 16 July, 2019;
originally announced July 2019.
-
Biomimetic Polymer Film with Brilliant Brightness Using a One-Step Water Vapor-Induced Phase Separation Method
Authors:
Weizhi Zou,
Lorenzo Pattelli,
Jing Guo,
Shijia Yang,
Meng Yang,
Ning Zhao,
Jian Xu,
Diederik S. Wiersma
Abstract:
The scales of the white Cyphochilus beetles are endowed with unusual whiteness arising from the exceptional scattering efficiency of their disordered ultrastructure optimized through millions of years of evolution. Here, a simple, one-step method based on water vapor-induced phase separation (VIPS) is developed to prepare ultra-thin polystyrene (PS) films with similar microstructure and comparable optical performance. A typical biomimetic 3.5 µm PS film exhibits a diffuse reflectance of 61% at 500 nm, which translates into a transport mean free path below 1 µm. A complete optical characterization through Monte Carlo simulations reveals how such scattering performance arises from the scattering coefficient and scattering anisotropy, whose interplay provides insight into the morphological properties of the material. The potential of bright-white coatings as smart sensors or wearable devices is highlighted using a treated ultra-thin film as a real-time sensor for human exhalation.
Submitted 11 June, 2019;
originally announced June 2019.
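For readers of the entry above, the interplay between scattering coefficient and scattering anisotropy can be made concrete with the standard radiative-transport definition of the transport mean free path. The symbols below are assumptions introduced here, not notation taken from the paper: $\mu_s$ is the scattering coefficient, $\ell_s = 1/\mu_s$ the scattering mean free path, and $g$ the anisotropy factor (mean cosine of the scattering angle).

```latex
\ell_t \;=\; \frac{\ell_s}{1 - g} \;=\; \frac{1}{\mu_s\,(1 - g)},
\qquad g = \langle \cos\theta \rangle
```

Under this definition, a short transport mean free path can result either from strong scattering (large $\mu_s$) or from nearly isotropic scattering ($g$ close to 0), which is why the two quantities have to be characterized together.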
-
PR Product: A Substitute for Inner Product in Neural Networks
Authors:
Zhennan Wang,
Wenbin Zou,
Chen Xu
Abstract:
In this paper, we analyze the inner product of weight vector w and data vector x in neural networks from the perspective of vector orthogonal decomposition and prove that the direction gradient of w decreases as the angle between them approaches 0 or π. We propose the Projection and Rejection Product (PR Product) to make the direction gradient of w independent of the angle and consistently larger than the one in the standard inner product while keeping the forward propagation identical. As a reliable substitute for the standard inner product, the PR Product can be applied to many existing deep learning modules, so we develop the PR Product version of the fully connected layer, convolutional layer and LSTM layer. In static image classification, the experiments on CIFAR10 and CIFAR100 datasets demonstrate that the PR Product can robustly enhance the ability of various state-of-the-art classification networks. On the task of image captioning, even without any bells and whistles, our PR Product version of the captioning model can compete with or outperform the state-of-the-art models on the MS COCO dataset. Code has been made available at: https://github.com/wzn0828/PR_Product.
Submitted 16 August, 2019; v1 submitted 30 April, 2019;
originally announced April 2019.
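The vanishing direction gradient mentioned in the entry above follows from elementary calculus on the standard inner product. The block below is only a sketch of that observation, with $\theta$ denoting the angle between $\mathbf{w}$ and $\mathbf{x}$ and $P$ a symbol introduced here for the inner product; it is not the paper's definition of the PR Product.

```latex
P(\mathbf{w}, \mathbf{x}) \;=\; \mathbf{w}^{\top}\mathbf{x}
\;=\; \|\mathbf{w}\|\,\|\mathbf{x}\|\cos\theta,
\qquad
\frac{\partial P}{\partial \theta} \;=\; -\,\|\mathbf{w}\|\,\|\mathbf{x}\|\sin\theta
```

Since $\sin\theta \to 0$ as $\theta \to 0$ or $\theta \to \pi$, the sensitivity of the output to the direction of $\mathbf{w}$ shrinks exactly when the two vectors are nearly parallel or antiparallel, which is the behavior the PR Product is designed to avoid.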
-
Sharp blow up estimates and precise asymptotic behavior of singular positive solutions to fractional Hardy-Hénon equations
Authors:
Hui Yang,
Wenming Zou
Abstract:
In this paper, we study the asymptotic behavior of positive solutions of the fractional Hardy-Hénon equation $$ (-Δ)^σ u = |x|^α u^p \quad \text{in} \quad B_1 \backslash \{0\} $$ with an isolated singularity at the origin, where $σ \in (0, 1)$ and the punctured unit ball $B_1 \backslash \{0\} \subset \mathbb{R}^n$ with $n \geq 2$. When $-2σ < α < 2σ$ and $\frac{n+α}{n-2σ} < p < \frac{n+2σ}{n-2σ}$, we give a classification of isolated singularities of positive solutions, and in particular, this implies sharp blow up estimates of singular solutions. Further, we describe the precise asymptotic behavior of solutions near the singularity. More generally, we classify isolated boundary singularities and describe the precise asymptotic behavior of singular solutions for a relevant degenerate elliptic equation with a nonlinear Neumann boundary condition. These results parallel those known for the Laplacian counterpart proved by Gidas and Spruck (Comm. Pure Appl. Math. 34: 525-598, 1981), but the methods are very different, since the ODE analysis is a missing ingredient in the fractional case. Our proofs are based on a monotonicity formula, combined with blow up (down) arguments, Kelvin transformation and uniqueness of solutions of related degenerate equations on $\mathbb{S}^{n}_+$. We also investigate isolated singularities located at infinity of fractional Hardy-Hénon equations.
Submitted 14 August, 2020; v1 submitted 31 March, 2019;
originally announced April 2019.
-
Passivity guaranteed stiffness control with multiple frequency band specifications for a cable-driven series elastic actuator
Authors:
Ningbo Yu,
Wulin Zou,
Yubo Sun
Abstract:
Impedance control and specifically stiffness control are widely applied for physical human-robot interaction. The series elastic actuator (SEA) provides inherent compliance, safety and further benefits. This paper aims to improve the stiffness control performance of a cable-driven SEA. Existing impedance controllers were designed within the full frequency domain, though human-robot interaction commonly falls in the low frequency range. We enhance the stiffness rendering performance under formulated constraints on passivity, actuator limitation, disturbance attenuation and noise rejection in their specific frequency ranges. Firstly, we reformulate this multiple frequency-band optimization problem into the $H_\infty$ synthesis framework. Then, the performance goals are quantitatively characterized by respective restricted frequency-domain specifications as norm bounds. Further, a structured controller is directly synthesized to satisfy all the competing performance requirements. Both simulation and experimental results showed that the produced controller enabled good interaction performance for each desired stiffness varying from 0 to 1 times the physical spring constant. Compared with the passivity-based PID method, the proposed $H_\infty$ synthesis method achieved more accurate and robust stiffness control performance with guaranteed passivity.
Submitted 22 March, 2019;
originally announced March 2019.
-
Impedance control of a cable-driven SEA with mixed $H_2/H_\infty$ synthesis
Authors:
Ningbo Yu,
Wulin Zou
Abstract:
Purpose: This paper presents an impedance control method with mixed $H_2/H_\infty$ synthesis and relaxed passivity for a cable-driven series elastic actuator to be applied for physical human-robot interaction.
Design/methodology/approach: To shape the system's impedance to match a desired dynamic model, the impedance control problem was reformulated into an impedance matching structure. The desired competing performance requirements as well as constraints from the physical system can be characterized with weighting functions for respective signals. Considering the frequency properties of human movements, the passivity constraint for stable human-robot interaction, which is required on the entire frequency spectrum and may bring conservative solutions, has been relaxed in such a way that it only restrains the low frequency band. Thus, impedance control became a mixed $H_2/H_\infty$ synthesis problem, and a dynamic output feedback controller can be obtained.
Findings: The proposed impedance control strategy has been tested for various desired impedances in both simulation and experiments on the cable-driven series elastic actuator platform. The actual interaction torque tracked the desired torque well within the desired norm bounds, and the control input was regulated below the motor velocity limit. The closed loop system can guarantee relaxed passivity at low frequency. Both simulation and experimental results have validated the feasibility and efficacy of the proposed method.
Originality/value: This impedance control strategy with mixed $H_2/H_\infty$ synthesis and relaxed passivity provides a novel, effective and less conservative method for physical human-robot interaction control.
Submitted 22 March, 2019;
originally announced March 2019.
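The passivity relaxation described in the entry above can be illustrated with the textbook positive-real condition for a stable linear time-invariant impedance. The notation is an assumption made here for illustration: $Z(s)$ is the rendered impedance from interaction velocity to torque, and $\omega_c$ is a hypothetical cutoff frequency bounding the band of human movement; the paper's exact formulation may differ.

```latex
\underbrace{\operatorname{Re} Z(j\omega) \;\ge\; 0 \quad \forall\, \omega \in \mathbb{R}}_{\text{passivity on the full spectrum}}
\qquad \longrightarrow \qquad
\underbrace{\operatorname{Re} Z(j\omega) \;\ge\; 0 \quad \forall\, |\omega| \le \omega_c}_{\text{relaxed, low-frequency passivity}}
```

Restricting the inequality to $|\omega| \le \omega_c$ removes the constraint where interaction energy is negligible, which is the sense in which the relaxed requirement is less conservative.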
-
Geometrical and topological description of chirality-relevant flow structures
Authors:
Wennan Zou,
Jian-Zhou Zhu,
Xin Liu
Abstract:
Issues relevant to flow chirality and structure are the focus, and new theoretical results, including even a distinctive theory, are introduced. It is hoped that the presentation, with a low starting point but a steep rise, is appropriate for a broad spectrum of audiences ranging from students to researchers; thus, illustrations of differential forms and relevant basic topological concepts are also offered, followed by a formulation of the classical Navier-Stokes flow theory in differential forms and discussions of recent studies in fundamental fluid mechanics and turbulence.
Submitted 29 May, 2019; v1 submitted 13 March, 2019;
originally announced March 2019.
-
A Polynomially Irreducible Functional Basis of Hemitropic Invariants of Piezoelectric Tensors
Authors:
Y. Chen,
Z. Ming,
L. Qi,
W. Zou
Abstract:
For piezoelectric tensors, Olive (2014) proposed a minimal integrity basis of 495 hemitropic invariants, which is also a functional basis. In this article, we construct a new functional basis of hemitropic invariants of piezoelectric tensors, using the approach of Smith and Zheng. By eliminating invariants that are polynomials in other invariants, we obtain a new functional basis with 260 polynomially irreducible hemitropic invariants. Thus, the number of hemitropic invariants in the new functional basis is substantially smaller than the number of invariants in a minimal integrity basis.
Submitted 7 January, 2019;
originally announced January 2019.
-
Motion Control on Bionic Eyes: A Comprehensive Review
Authors:
Zheng Zhu,
Qingbin Wang,
Wei Zou,
Feng Zhang
Abstract:
Biology can provide biomimetic components and new control principles for robotics. Developing a robot system equipped with bionic eyes is a difficult but exciting task. Researchers have been studying the control mechanisms of bionic eyes for many years and a considerable number of models are available. In this paper, control models for bionic eyes and their implementation on robots are reviewed, covering saccade, smooth pursuit, vergence, vestibulo-ocular reflex (VOR), optokinetic reflex (OKR) and eye-head coordination. Moreover, some problems and possible solutions in the field of bionic eyes are discussed and analyzed. This review paper can be used as a guide for researchers to identify potential research problems and solutions in the motion control of bionic eyes.
Submitted 5 January, 2019;
originally announced January 2019.
-
Action Machine: Rethinking Action Recognition in Trimmed Videos
Authors:
Jiagang Zhu,
Wei Zou,
Liang Xu,
Yiming Hu,
Zheng Zhu,
Manyu Chang,
Junjie Huang,
Guan Huang,
Dalong Du
Abstract:
Existing methods in video action recognition mostly do not distinguish the human body from the environment and easily overfit the scenes and objects. In this work, we present a conceptually simple, general and high-performance framework for action recognition in trimmed videos, aiming at person-centric modeling. The method, called Action Machine, takes as input videos cropped by person bounding boxes. It extends the Inflated 3D ConvNet (I3D) by adding a branch for human pose estimation and a 2D CNN for pose-based action recognition, while remaining fast to train and test. Action Machine can benefit from the multi-task training of action recognition and pose estimation, and from the fusion of predictions from RGB images and poses. On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject, respectively. Action Machine also achieves competitive performance on another three smaller action recognition datasets: Northwestern UCLA Multiview Action3D, MSR Daily Activity3D and UTD-MHAD. Code will be made available.
Submitted 17 December, 2018; v1 submitted 13 December, 2018;
originally announced December 2018.