-
Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots
Authors:
Zipeng Fu,
Ashish Kumar,
Jitendra Malik,
Deepak Pathak
Abstract:
Legged locomotion is commonly studied and expressed as a discrete set of gait patterns, like walk, trot, gallop, which are usually treated as given and pre-programmed in legged robots for efficient locomotion at different speeds. However, fixing a set of pre-programmed gaits limits the generality of locomotion. Recent animal motor studies show that these conventional gaits are only prevalent in ideal flat terrain conditions while real-world locomotion is unstructured and more like bouts of intermittent steps. What principles could lead to both structured and unstructured patterns across mammals and how to synthesize them in robots? In this work, we take an analysis-by-synthesis approach and learn to move by minimizing mechanical energy. We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots. The emergent gaits are structured in ideal terrains and look similar to those of horses and sheep. The same approach leads to unstructured gaits in rough terrains, which is consistent with the findings in animal motor control. We validate our hypothesis in both simulation and real hardware across natural terrains. Videos at https://energy-locomotion.github.io
Submitted 25 October, 2021;
originally announced November 2021.
-
Multi network InfoMax: A pre-training method involving graph convolutional networks
Authors:
Usman Mahmood,
Zening Fu,
Vince Calhoun,
Sergey Plis
Abstract:
Discovering distinct features and their relations from data can help us uncover valuable knowledge crucial for various tasks, e.g., classification. In neuroimaging, these features could help to understand, classify, and possibly prevent brain disorders. Model introspection of highly performant overparameterized deep learning (DL) models could help find these features and relations. However, to achieve a high level of performance, DL models require numerous labeled training samples ($n$), which are rarely available in many fields. This paper presents a pre-training method involving graph convolutional/neural networks (GCNs/GNNs), based on maximizing mutual information between two high-level embeddings of an input sample. Many of the recently proposed pre-training methods pre-train one of many possible networks of an architecture. Since almost every DL model is an ensemble of multiple networks, we take our high-level embeddings from two different networks of a model: a convolutional network and a graph network. The learned high-level graph latent representations help increase performance for downstream graph classification tasks and bypass the need for a large number of labeled data samples. We apply our method to a neuroimaging dataset for classifying subjects into healthy control (HC) and schizophrenia (SZ) groups. Our experiments show that the pre-trained model significantly outperforms the non-pre-trained model and requires $50\%$ less data for similar performance.
Submitted 14 February, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Brain dynamics via Cumulative Auto-Regressive Self-Attention
Authors:
Usman Mahmood,
Zening Fu,
Vince Calhoun,
Sergey Plis
Abstract:
Multivariate dynamical processes can often be intuitively described by a weighted connectivity graph between components representing each individual time series. Even a simple representation of this graph as a Pearson correlation matrix may be informative and predictive, as demonstrated in the brain imaging literature. However, there is a consensus expectation that powerful graph neural networks (GNNs) should perform better in similar settings. In this work, we present a model that is considerably shallower than deep GNNs, yet outperforms them in predictive accuracy in a brain imaging application. Our model learns the autoregressive structure of individual time series and estimates directed connectivity graphs between the learned representations via a self-attention mechanism in an end-to-end fashion. The supervised training of the model as a classifier between patients and controls results in a model that generates directed connectivity graphs and highlights the components of the time series that are predictive for each subject. We demonstrate our results on a functional neuroimaging dataset classifying schizophrenia patients and controls.
Submitted 14 February, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
False Correlation Reduction for Offline Reinforcement Learning
Authors:
Zhihong Deng,
Zuyue Fu,
Lingxiao Wang,
Zhuoran Yang,
Chenjia Bai,
Tianyi Zhou,
Zhaoran Wang,
Jing Jiang
Abstract:
Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions while we investigate a broader issue, the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.
Submitted 1 November, 2023; v1 submitted 24 October, 2021;
originally announced October 2021.
-
FLiText: A Faster and Lighter Semi-Supervised Text Classification with Convolution Networks
Authors:
Chen Liu,
Mengchao Zhang,
Zhibin Fu,
Pan Hou,
Yu Li
Abstract:
In natural language processing (NLP), state-of-the-art (SOTA) semi-supervised learning (SSL) frameworks have shown great performance on deep pre-trained language models such as BERT, and are expected to significantly reduce the demand for manual labeling. However, our empirical studies indicate that these frameworks are not suitable for lightweight models such as TextCNN and LSTM. In this work, we develop a new SSL framework called FLiText, which stands for Faster and Lighter semi-supervised Text classification. FLiText introduces an inspirer network together with the consistency regularization framework, which leverages a generalized regular constraint on the lightweight models for efficient SSL. As a result, FLiText obtains new SOTA performance for lightweight models across multiple SSL benchmarks on text classification. Compared with existing SOTA SSL methods on TextCNN, FLiText improves the accuracy of the lightweight model TextCNN from 51.00% to 90.49% on IMDb, from 39.8% to 58.06% on Yelp-5, and from 55.3% to 65.08% on Yahoo. In addition, compared with the fully supervised method on the full dataset, FLiText uses less than 1% of the labeled data to improve the accuracy by 6.59%, 3.94%, and 3.22% on the IMDb, Yelp-5, and Yahoo datasets, respectively.
Submitted 12 September, 2021;
originally announced October 2021.
-
EvoGAN: An Evolutionary Computation Assisted GAN
Authors:
Feng Liu,
HanYang Wang,
Jiahao Zhang,
Ziwang Fu,
Aimin Zhou,
Jiayin Qi,
Zhibin Li
Abstract:
Image synthesis techniques are relatively well established and can generate facial images that are indistinguishable from real ones even by human beings. However, all of these approaches use gradients to condition the output, so the same input always yields the same image. Also, they can only generate images with a basic expression or mimic an expression instead of generating compound expressions. In real life, however, human expressions are of great diversity and complexity. In this paper, we propose an evolutionary algorithm (EA) assisted GAN, named EvoGAN, to generate various compound expressions with any accurate target compound expression. EvoGAN uses an EA to search for target results in the data distribution learned by the GAN. Specifically, we use the Facial Action Coding System (FACS) as the encoding of an EA and use a pre-trained GAN to generate human facial images, and then use a pre-trained classifier to recognize the expression composition of the synthesized images as the fitness function to guide the search of the EA. Combined with a random searching algorithm, various images with the target expression can be easily synthesized. Quantitative and qualitative results are presented on several compound expressions, and the experimental results demonstrate the feasibility and the potential of EvoGAN.
Submitted 22 October, 2021;
originally announced October 2021.
-
Electronic confinement in quantum dots of twisted bilayer graphene
Authors:
Xiao-Feng Zhou,
Yi-Wen Liu,
Hong-Yi Yan,
Zhong-Qiu Fu,
Haiwen Liu,
Lin He
Abstract:
Electronic properties of quantum dots (QDs) depend sensitively on their parent materials. Therefore, confined electronic states in graphene QDs (GQDs) of monolayer and Bernal-stacked bilayer graphene are quite different. Twisted bilayer graphene (TBG) is distinct from monolayer and Bernal-stacked bilayer graphene because of the new degree of freedom: twist angle. In the past few years, numerous efforts have been made to realize the GQDs of monolayer and Bernal-stacked bilayer graphene and achieved great success. Thus far, however, strategies for realizing GQDs of TBG have been elusive. Here, we demonstrate a general approach for fabricating stationary GQDs of TBG by introducing nanoscale p-n junctions with sharp boundaries in the TBG. We verify the confinement of low-energy massless Dirac fermions via whispering-gallery modes in the GQDs of TBG. Unexpectedly, electronic states around van Hove singularities of the TBG are also strongly modified around the GQDs. Such a feature has never been reported and is attributed to spatial variation of the interlayer coupling in the TBG induced by the GQDs.
Submitted 20 October, 2021;
originally announced October 2021.
-
Towards Toxic and Narcotic Medication Detection with Rotated Object Detector
Authors:
Jiao Peng,
Feifan Wang,
Zhongqiang Fu,
Yiying Hu,
Zichen Chen,
Xinghan Zhou,
Lijun Wang
Abstract:
Recent years have witnessed the advancement of deep learning vision technologies and applications in the medical industry. Intelligent devices for special medication management are in great need, and they require more precise detection algorithms to identify the specifications and locations. In this work, YOLO (You only look once) based object detectors are tailored for toxic and narcotic medication detection tasks. Specifically, a more flexible annotation with rotation angle ranging from $0^\circ$ to $90^\circ$ and a mask-mapping-based non-maximum suppression method are proposed to achieve a feasible and efficient medication detector aiming at arbitrarily oriented bounding boxes. Extensive experiments demonstrate that the rotated YOLO detectors are more suitable for identifying densely arranged drugs. The best shot mean average precision of the proposed network reaches 0.811 while the inference time is less than 300 ms.
Submitted 19 October, 2021;
originally announced October 2021.
-
FedSEAL: Semi-Supervised Federated Learning with Self-Ensemble Learning and Negative Learning
Authors:
Jieming Bian,
Zhu Fu,
Jie Xu
Abstract:
Federated learning (FL), a popular decentralized and privacy-preserving machine learning (ML) framework, has received extensive research attention in recent years. The majority of existing works focus on supervised learning (SL) problems where it is assumed that clients carry labeled datasets while the server has no data. However, in realistic scenarios, clients are often unable to label their data due to the lack of expertise and motivation while the server may host a small amount of labeled data. How to reasonably utilize the server's labeled data and the clients' unlabeled data is thus of paramount practical importance. In this paper, we propose a new FL algorithm, called FedSEAL, to solve this Semi-Supervised Federated Learning (SSFL) problem. Our algorithm utilizes self-ensemble learning and complementary negative learning to enhance both the accuracy and the efficiency of clients' unsupervised learning on unlabeled data, and orchestrates the model training on both the server side and the clients' side. Our experimental results on Fashion-MNIST and CIFAR10 datasets in the SSFL setting validate the effectiveness of our method, which outperforms the state-of-the-art SSFL methods by a large margin.
Submitted 8 June, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
Authors:
Ye Du,
Zehua Fu,
Qingjie Liu,
Yunhong Wang
Abstract:
Though image-level weakly supervised semantic segmentation (WSSS) has achieved great progress with Class Activation Maps (CAMs) as the cornerstone, the large supervision gap between classification and segmentation still hampers the model from generating more complete and precise pseudo masks for segmentation. In this study, we propose weakly-supervised pixel-to-prototype contrast that can provide pixel-level supervisory signals to narrow the gap. Guided by two intuitive priors, our method is executed across different views and within each single view of an image, aiming to impose cross-view feature semantic consistency regularization and facilitate intra(inter)-class compactness(dispersion) of the feature space. Our method can be seamlessly incorporated into existing WSSS models without any changes to the base networks and does not incur any extra inference burden. Extensive experiments show that our method consistently improves two strong baselines by large margins, demonstrating its effectiveness. Specifically, built on top of SEAM, we improve the initial seed mIoU on PASCAL VOC 2012 from 55.4% to 61.5%. Moreover, armed with our method, we increase the segmentation mIoU of EPS from 70.8% to 73.6%, achieving a new state of the art.
Submitted 13 March, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Emergence of Theory of Mind Collaboration in Multiagent Systems
Authors:
Luyao Yuan,
Zipeng Fu,
Linqi Zhou,
Kexin Yang,
Song-Chun Zhu
Abstract:
Currently, in the study of multiagent systems, the intentions of agents are usually ignored. Nonetheless, as pointed out by Theory of Mind (ToM), people regularly reason about others' mental states, including beliefs, goals, and intentions, to obtain a performance advantage in competition, cooperation or coalition. However, due to its intrinsic recursion and intractable modeling of distribution over belief, integrating ToM in multiagent planning and decision making is still a challenge. In this paper, we incorporate ToM in a multiagent partially observable Markov decision process (POMDP) and propose an adaptive training algorithm to develop effective collaboration between agents with ToM. We evaluate our algorithms with two games, where our algorithm surpasses all previous decentralized execution algorithms without modeling ToM.
Submitted 30 September, 2021;
originally announced October 2021.
-
Non-Hermitian physics and engineering in silicon photonics
Authors:
Changqing Wang,
Zhoutian Fu,
Lan Yang
Abstract:
Silicon photonics has been studied as an integratable optical platform where numerous applicable devices and systems are created based on modern physics and state-of-the-art nanotechnologies. The implementation of quantum mechanics has been the driving force of the most intriguing design of photonic structures, since the optical systems are found of great capability and potential in realizing the analogues of quantum concepts and phenomena. Non-Hermitian physics, which breaks the conventional scope of quantum mechanics based on Hermitian Hamiltonian, has been widely explored in the platform of silicon photonics, with promising design of optical refractive index, modal coupling and gain-loss distribution. As we will discuss in this chapter, the unconventional properties of exceptional points and parity-time symmetry realized in silicon photonics have created new opportunities for ultrasensitive sensors, laser engineering, control of light propagation, topological mode conversion, etc. The marriage between the quantum non-Hermiticity and classical silicon platforms not only spurs numerous studies on the fundamental physics, but also enriches the potential functionalities of the integrated photonic systems.
Submitted 30 September, 2021;
originally announced September 2021.
-
Segmentation of Roads in Satellite Images using specially modified U-Net CNNs
Authors:
Jonas Bokstaller,
Yihang She,
Zhehan Fu,
Tommaso Macrì
Abstract:
The image classification problem has been deeply investigated by the research community, with computer vision algorithms and with the help of Neural Networks. The aim of this paper is to build an image classifier for satellite images of urban scenes that identifies the portions of the images in which a road is located, separating these portions from the rest. Unlike conventional computer vision algorithms, convolutional neural networks (CNNs) provide accurate and reliable results on this task. Our novel approach uses a sliding window to extract patches out of the whole image, data augmentation for generating more training/testing data and lastly a series of specially modified U-Net CNNs. This proposed technique outperforms all other baselines tested in terms of mean F-score metric.
Submitted 29 September, 2021;
originally announced September 2021.
-
Dense Contrastive Visual-Linguistic Pretraining
Authors:
Lei Shi,
Kai Shuang,
Shijie Geng,
Peng Gao,
Zuohui Fu,
Gerard de Melo,
Yunpeng Chen,
Sen Su
Abstract:
Inspired by the success of BERT, several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. In particular, LXMERT and UNITER adopt visual region feature regression and label classification as pretext tasks. However, they tend to suffer from the problems of noisy labels and sparse semantic annotations, based on the visual features having been pretrained on a crowdsourced dataset with limited and inconsistent semantic labeling. To overcome these issues, we propose unbiased Dense Contrastive Visual-Linguistic Pretraining (DCVLP), which replaces the region regression and classification with cross-modality region contrastive learning that requires no annotations. Two data augmentation strategies (Mask Perturbation and Intra-/Inter-Adversarial Perturbation) are developed to improve the quality of negative samples used in contrastive learning. Overall, DCVLP allows cross-modality dense region contrastive learning in a self-supervised setting independent of any object annotations. We compare our method against prior visual-linguistic pretraining frameworks to validate the superiority of dense contrastive learning on multimodal representation learning.
Submitted 24 September, 2021;
originally announced September 2021.
-
Facial Anatomical Landmark Detection using Regularized Transfer Learning with Application to Fetal Alcohol Syndrome Recognition
Authors:
Zeyu Fu,
Jianbo Jiao,
Michael Suttie,
J. Alison Noble
Abstract:
Fetal alcohol syndrome (FAS) caused by prenatal alcohol exposure can result in a series of cranio-facial anomalies, and behavioral and neurocognitive problems. Current diagnosis of FAS is typically done by identifying a set of facial characteristics, which are often obtained by manual examination. Anatomical landmark detection, which provides rich geometric information, is important to detect the presence of FAS-associated facial anomalies. This imaging application is characterized by large variations in data appearance and limited availability of labeled data. Current deep learning-based heatmap regression methods designed for facial landmark detection in natural images assume availability of large datasets and are therefore not well suited for this application. To address this restriction, we develop a new regularized transfer learning approach that exploits the knowledge of a network learned on large facial recognition datasets. In contrast to standard transfer learning, which focuses on adjusting the pre-trained weights, the proposed learning approach regularizes the model behavior. It explicitly reuses the rich visual semantics of a domain-similar source model on the target task data as an additional supervisory signal for regularizing landmark detection optimization. Specifically, we develop four regularization constraints for the proposed transfer learning, including constraining the feature outputs from classification and intermediate layers, as well as matching activation attention maps at both spatial and channel levels. Experimental evaluation on a collected clinical imaging dataset demonstrates that the proposed approach can effectively improve model generalizability under limited training samples, and compares favorably with other approaches in the literature.
Submitted 12 September, 2021;
originally announced September 2021.
-
Ultrabroadband THz/IR upconversion and photovoltaic response in semi-conductor ratchet based upconverter
Authors:
Peng Bai,
Ning Yang,
Weidong Chu,
Yueheng Zhang,
Wenzhong Shen,
Zhanglong Fu,
Dixiang Shao,
Kang Zhou,
Zhiyong Tan,
Hua Li,
Juncheng Cao,
Lianhe Li,
Edmund Harold Linfield,
Yan Xie,
Ziran Zhao
Abstract:
An ultrabroadband upconversion device is demonstrated by direct tandem integration of a p-type GaAs/Al$_x$Ga$_{1-x}$As ratchet photodetector (RP) with a GaAs double heterojunction LED (DH-LED) using molecular beam epitaxy (MBE). An ultrabroadband photoresponse from the terahertz (THz) to the near infrared (NIR) region (4-200 THz) was realized that covers a much wider frequency range compared with the existing upconversion devices. Broadband IR/THz radiation from a 1000 K blackbody is successfully upconverted into NIR photons which can be detected by a commercial Si-based device. The normal incidence absorption of the RP simplifies the structure of the RP-LED device and makes it more compact compared with the inter-subband transition based upconverters. In addition to the upconversion function, the proposed upconverter is also tested as a photovoltaic detector in the infrared region (15-200 THz) without an applied bias voltage due to the ratchet effect.
Submitted 10 September, 2021;
originally announced September 2021.
-
From Cloud to Edge: A First Look at Public Edge Platforms
Authors:
Mengwei Xu,
Zhe Fu,
Xiao Ma,
Li Zhang,
Yanan Li,
Feng Qian,
Shangguang Wang,
Ke Li,
Jingyu Yang,
Xuanzhe Liu
Abstract:
Public edge platforms have drawn increasing attention from both academia and industry. In this study, we perform a first-of-its-kind measurement study on a leading public edge platform that has been densely deployed in China. Based on this measurement, we quantitatively answer two critical yet unexplored questions. First, from end users' perspective, what is the performance of commodity edge platforms compared to cloud, in terms of the end-to-end network delay, throughput, and the application QoE? Second, from the edge service provider's perspective, how are the edge workloads different from cloud, in terms of their VM subscription, monetary cost, and resource usage? Our study quantitatively reveals the status quo of today's public edge platforms, and provides crucial insights towards developing and operating future edge services.
Submitted 8 November, 2021; v1 submitted 7 September, 2021;
originally announced September 2021.
-
High fidelity entanglement of neutral atoms via a Rydberg-mediated single-modulated-pulse controlled-PHASE gate
Authors:
Zhuo Fu,
Peng Xu,
Yuan Sun,
Yangyang Liu,
Xiaodong He,
Xiao Li,
Min Liu,
Runbing Li,
Jin Wang,
Liang Liu,
Mingsheng Zhan
Abstract:
The neutral atom platform has become an attractive choice to study the science of quantum information and quantum simulation, where intense efforts have been devoted to the entangling processes between individual atoms. For the development of this area, the two-qubit controlled-PHASE gate via Rydberg blockade is one of the most essential elements. Recent theoretical studies have suggested the advantages of introducing non-trivial waveform modulation into the gate protocol, which is anticipated to improve its performance towards the next stage. We report our recent experimental results in realizing a two-qubit controlled-PHASE ($C_Z$) gate via off-resonant modulated driving (ORMD) embedded in a two-photon transition for Rb atoms. It relies upon a single modulated driving pulse with a carefully calculated smooth waveform to gain the appropriate phase accumulations required by the two-qubit gate. Combining this $C_Z$ gate with global microwave pulses, two-atom entanglement is generated with a raw fidelity of 0.945(6). Accounting for state preparation and measurement (SPAM) errors, we extract the entanglement operation fidelity to be 0.980(7). Our work features completing the $C_Z$ gate operation within a single pulse to avoid shelved Rydberg population, thus demonstrating another promising route for realizing high-fidelity two-qubit gates on the neutral atom platform.
Submitted 6 September, 2021;
originally announced September 2021.
-
Force Detection Sensitivity Spectrum Calibration of Levitated Nanomechanical Sensor Using Harmonic Coulomb Force
Authors:
Zhenhai Fu,
Shaochong Zhu,
Ying Dong,
Xingfan Chen,
Huizhu Hu,
Xiaowen Gao
Abstract:
Oscillators based on levitated particles are promising for the development of ultrasensitive force detectors. The theoretical performance of levitated nanomechanical sensors is usually characterized by the so-called thermal noise limit force detection sensitivity, which does not exhibit spectral specificity in practical measurements. To characterize the actual detection performance, we propose a method for the force detection sensitivity calibration of a levitated nanomechanical sensor based on the harmonic Coulomb force. Utilizing the measured transfer function, we obtained the force detection sensitivity spectrum from the position spectrum. Although the thermal noise limit force detection sensitivity of the system reached $(4.39 \pm 0.62) \times 10^{-20}\ \mathrm{N/Hz^{1/2}}$ at $2.4 \times 10^{-6}\ \mathrm{mbar}$ with feedback cooling, the measured sensitivity away from the resonance was of the order of $10^{-17}\ \mathrm{N/Hz^{1/2}}$ based on the existing detection noise level. The calibration method established in our study is applicable to the performance evaluation of any optical levitation system for high-sensitivity force measurements.
Submitted 31 August, 2021;
originally announced September 2021.
-
Deep Natural Language Processing for LinkedIn Search
Authors:
Weiwei Guo,
Xiaowei Liu,
Sida Wang,
Michaeel Kazi,
Zhiwei Wang,
Zhoutong Fu,
Jun Jia,
Liang Zhang,
Huiji Gao,
Bo Long
Abstract:
Many search systems work with large amounts of natural language data, e.g., search queries, user profiles, and documents. Building a successful search system requires a thorough understanding of textual data semantics, where deep learning based natural language processing techniques (deep NLP) can be of great help. In this paper, we introduce a comprehensive study of applying deep NLP techniques to five representative tasks in search systems: query intent prediction (classification), query tagging (sequential tagging), document ranking (ranking), query auto completion (language modeling), and query suggestion (sequence to sequence). We also introduce BERT pre-training as a sixth task that can be applied to many of the other tasks. Through the model design and experiments of the six tasks, readers can find answers to three important questions: (1) When is deep NLP helpful/not helpful in search systems? (2) How to address latency challenges? (3) How to ensure model robustness? This work builds on existing efforts of LinkedIn search, and is tested at scale on LinkedIn's commercial search engines. We believe our experiences can provide useful insights for the industry and research communities.
Submitted 16 August, 2021;
originally announced August 2021.
-
Spatially homogeneous few-cycle compression of Yb lasers via all-solid-state free-space soliton management
Authors:
Bingbing Zhu,
Zongyuan Fu,
Yudong Chen,
Sainan Peng,
Cheng **,
Guangyu Fan,
Sheng Zhang,
Shunjia Wang,
Hao Ru,
Chuanshan Tian,
Yihua Wang,
Henry Kapteyn,
Margaret Murnane,
Zhensheng Tao
Abstract:
The high power and variable repetition rate of Yb femtosecond lasers make them very attractive for ultrafast science. However, for capturing sub-200 fs dynamics, efficient, high-fidelity, and high-stability pulse compression techniques are essential. Spectral broadening using an all-solid-state free-space geometry is particularly attractive, as it is simple, robust, and low-cost. However, spatial and temporal losses caused by spatio-spectral inhomogeneities have been a major challenge to date, due to coupled space-time dynamics associated with unguided nonlinear propagation. In this work, we use all-solid-state free-space compressors to demonstrate compression of 170 fs pulses at a wavelength of 1030nm from a Yb:KGW laser to ~9.2 fs, with a highly spatially homogeneous mode. This is achieved by ensuring that the nonlinear beam propagation in periodic layered Kerr media occurs in soliton modes and confining the nonlinear phase through each material layer to less than 1.0 rad. A remarkable spatio-spectral homogeneity of ~0.87 can be realized, which yields a high efficiency of >50% for few-cycle compression. The universality of the method is demonstrated by implementing high-quality pulse compression under a wide range of laser conditions. The high spatiotemporal quality and the exceptional stability of the compressed pulses are further verified by high-harmonic generation. This work represents the highest efficiency and the best spatio-spectral quality ever achieved by an all-solid-state free-space pulse compressor for few-cycle-pulse generation.
Submitted 24 August, 2021;
originally announced August 2021.
-
Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation
Authors:
Zhihan Liu,
Yufeng Zhang,
Zuyue Fu,
Zhuoran Yang,
Zhaoran Wang
Abstract:
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes.
For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAP achieves $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2}+H^{2}d^{3/2}N_2^{-1/2}+H^{3/2}dN_1^{-1/2})$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.
Submitted 19 August, 2021;
originally announced August 2021.
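A short worked step may help in reading the online bound quoted in the abstract above: dividing the stated regret by the number of episodes $K$ gives a per-episode figure. The notation $\mathrm{Regret}(K)$ is shorthand introduced here rather than taken from the abstract, and $H$ is not defined in the abstract (it is presumably the episode horizon, which is an assumption).

```latex
\[
\frac{\mathrm{Regret}(K)}{K}
= \widetilde{\mathcal{O}}\!\left(H^{2} d^{3/2} K^{-1/2} + H^{3/2} d\, N_1^{-1/2}\right)
\]
```

Under this reading, the first term vanishes as $K$ grows, while the second does not depend on $K$ and shrinks only with the number of expert trajectories $N_1$. The same $H^{3/2} d N_1^{-1/2}$ term appears in the offline optimality gap stated for PGAP.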
-
Limits on astrophysical antineutrinos with the KamLAND experiment
Authors:
S. Abe,
S. Asami,
A. Gando,
Y. Gando,
T. Gima,
A. Goto,
T. Hachiya,
K. Hata,
S. Hayashida,
K. Hosokawa,
K. Ichimura,
S. Ieki,
H. Ikeda,
K. Inoue,
K. Ishidoshiro,
Y. Kamei,
N. Kawada,
T. Kinoshita,
Y. Kishimoto,
M. Koga,
N. Maemura,
T. Mitsui,
H. Miyake,
K. Nakamura,
K. Nakamura
, et al. (45 additional authors not shown)
Abstract:
We report on a search for electron antineutrinos ($\bar{\nu}_e$) from astrophysical sources in the neutrino energy range 8.3 to 30.8 MeV with the KamLAND detector. In an exposure of 6.72 kton-year of the liquid scintillator, we observe 18 candidate events via the inverse beta decay reaction. Although there is a large background uncertainty from neutral current atmospheric neutrino interactions, we find no significant excess over background model predictions. Assuming several supernova relic neutrino spectra, we give upper flux limits of 60--110 cm$^{-2}$ s$^{-1}$ (90% CL) in the analysis range and present a model-independent flux. We also set limits on the annihilation rates for light dark matter pairs to neutrino pairs. These data improve on the upper probability limit of $^{8}$B solar neutrinos converting into $\bar{\nu}_e$'s, $P_{\nu_e \rightarrow \bar{\nu}_e} < 3.5\times10^{-5}$ (90% CL) assuming an undistorted $\bar{\nu}_e$ shape. This corresponds to a solar $\bar{\nu}_e$ flux of 60 cm$^{-2}$ s$^{-1}$ (90% CL) in the analysis energy range.
Submitted 22 October, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Deep Natural Language Processing for LinkedIn Search Systems
Authors:
Weiwei Guo,
Xiaowei Liu,
Sida Wang,
Michaeel Kazi,
Zhoutong Fu,
Huiji Gao,
Jun Jia,
Liang Zhang,
Bo Long
Abstract:
Many search systems work with large amounts of natural language data, e.g., search queries, user profiles and documents, where deep learning based natural language processing techniques (deep NLP) can be of great help. In this paper, we introduce a comprehensive study of applying deep NLP techniques to five representative tasks in search engines. Through the model design and experiments of the five tasks, readers can find answers to three important questions: (1) When is deep NLP helpful/not helpful in search systems? (2) How to address latency challenges? (3) How to ensure model robustness? This work builds on existing efforts of LinkedIn search, and is tested at scale on a commercial search engine. We believe our experiences can provide useful insights for the industry and research communities.
Submitted 30 July, 2021;
originally announced August 2021.
-
Force-feedback based Whole-body Stabilizer for Position-Controlled Humanoid Robots
Authors:
Shunpeng Yang,
Hua Chen,
Zhen Fu,
Wei Zhang
Abstract:
This paper studies stabilizer design for position-controlled humanoid robots. Stabilizers are an essential part for position-controlled humanoids, whose primary objective is to adjust the control input sent to the robot to assist the tracking controller to better follow the planned reference trajectory. To achieve this goal, this paper develops a novel force-feedback based whole-body stabilizer that fully exploits the six-dimensional force measurement information and the whole-body dynamics to improve tracking performance. Relying on rigorous analysis of whole-body dynamics of position-controlled humanoids under unknown contact, the developed stabilizer leverages quadratic-programming based technique that allows cooperative consideration of both the center-of-mass tracking and contact force tracking. The effectiveness of the proposed stabilizer is demonstrated on the UBTECH Walker robot in the MuJoCo simulator. Simulation validations show a significant improvement in various scenarios as compared to commonly adopted stabilizers based on the zero-moment-point feedback and the linear inverted pendulum model.
Submitted 14 August, 2021;
originally announced August 2021.
-
Realization of ultrabroadband THz/IR photoresponse in a bias-tunable ratchet photodetector
Authors:
Peng Bai,
Xiaohong Li,
Ning Yang,
Weidong Chu,
Xueqi Bai,
Siheng Huang,
Yueheng Zhang,
Wenzhong Shen,
Zhanglong Fu,
Dixiang Shao,
Zhiyong Tan,
Hua Li,
Juncheng Cao,
Lianhe Li,
Edmund Harold Linfield,
Yan Xie,
Ziran Zhao
Abstract:
High-performance terahertz (THz) photodetectors have drawn wide attention and improved greatly owing to their significant applications in biomedicine, astrophysics, nondestructive inspection, sixth-generation communication systems, and national security. Here we demonstrate a novel broadband photon-type THz/infrared (IR) photodetector based on the GaAs/AlxGa1-xAs ratchet structure. This kind of photodetector realizes a THz photon-response based on electrically pumped hot hole injection and overcomes the spectral response limit set by the internal workfunction. An ultrabroadband photoresponse from 4 THz to 300 THz and a peak responsivity of 50.3 mA/W are realized at a negative bias voltage of -1 V. The photodetector also presents a bias-tunable photon-response characteristic due to the asymmetric structure. The ratchet structure also induces an evident photocurrent even at zero bias voltage, which indicates that the detector can be regarded as a broadband photovoltaic-like detector. The rectification characteristic and the possibility of high-temperature operation of the photodetector are also discussed. This work not only demonstrates a novel ultrabroadband THz/IR photodetector, but also provides a new method to study the light-responsive ratchet.
Submitted 12 August, 2021;
originally announced August 2021.
-
The pickup and delivery problem with synchronized en-route transfers for microtransit planning
Authors:
Zhexi Fu,
Joseph Y. J. Chow
Abstract:
Microtransit and other flexible transit fleet services can reduce costs by incorporating transfers. However, transfers are costly to users if they must get off a vehicle and wait at a stop for another pickup. A mixed integer linear programming (MILP) model is proposed to solve pickup and delivery problems with vehicle-synchronized en-route transfers (PDPSET). The transfer location is determined by the model and can be located at any candidate node in the network rather than a static facility defined in advance. The transfer operation is strictly synchronized between vehicles within a hard time window. A heuristic algorithm is proposed to solve the problem with an acceptable solution in a much shorter computation time than commercial software. Two sets of synthetic numerical experiments are tested: small-scale instances based on a 5x5 grid network, and large-scale instances of varying network sizes up to 250x250 grids to test scalability. The results show that adding synchronized en-route transfers in microtransit can further reduce the total cost by 10% on average, and maximum savings can reach up to 19.6% in our small-scale test instances. The heuristic on average has an optimality gap of less than 1.5% while requiring a fraction of the run time, and can scale up to 250x250 grids with run times within 1 minute. Two large-scale examples demonstrate that over 50% of vehicle routes can be further improved by synchronized en-route transfers and that the maximum savings in vehicle travel distance can reach up to 20.37% for the instance with 100 vehicles and 300 requests on a 200x200 network.
Submitted 19 January, 2022; v1 submitted 17 July, 2021;
originally announced July 2021.
-
RMA: Rapid Motor Adaptation for Legged Robots
Authors:
Ashish Kumar,
Zipeng Fu,
Deepak Pathak,
Jitendra Malik
Abstract:
Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains including rocky, slippery, deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/
Submitted 8 July, 2021;
originally announced July 2021.
-
TableSense: Spreadsheet Table Detection with Convolutional Neural Networks
Authors:
Haoyu Dong,
Shijie Liu,
Shi Han,
Zhouyu Fu,
Dongmei Zhang
Abstract:
Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges. Automatic table detection is a key enabling technique and an initial step in spreadsheet data intelligence. However, the detection task is challenged by the diversity of table structures and table layouts on the spreadsheet. Considering the analogy between a cell matrix as spreadsheet and a pixel matrix as image, and encouraged by the successful application of Convolutional Neural Networks (CNN) in computer vision, we have developed TableSense, a novel end-to-end framework for spreadsheet table detection. First, we devise an effective cell featurization scheme to better leverage the rich information in each cell; second, we develop an enhanced convolutional neural network model for table detection to meet the domain-specific requirement on precise table boundary detection; third, we propose an effective uncertainty metric to guide an active learning based smart sampling algorithm, which enables the efficient build-up of a training dataset with 22,176 tables on 10,220 sheets with broad coverage of diverse table structures and layouts. Our evaluation shows that TableSense is highly effective with 91.3% recall and 86.5% precision in EoB-2 metric, a significant improvement over both the current detection algorithms that are used in commodity spreadsheet tools and state-of-the-art convolutional neural networks in computer vision.
Submitted 25 June, 2021;
originally announced June 2021.
-
Weakly-Supervised Photo-realistic Texture Generation for 3D Face Reconstruction
Authors:
Xiangnan Yin,
Di Huang,
Zehua Fu,
Yunhong Wang,
Liming Chen
Abstract:
Although much progress has been made recently in 3D face reconstruction, most previous work has been devoted to predicting accurate and fine-grained 3D shapes. In contrast, relatively little work has focused on generating high-fidelity face textures. Compared with the prosperity of photo-realistic 2D face image generation, high-fidelity 3D face texture generation has yet to be studied. In this paper, we propose a novel UV map generation model that predicts the UV map from a single face image. The model consists of a UV sampler and a UV generator. By selectively sampling the input face image's pixels and adjusting their relative locations, the UV sampler generates an incomplete UV map that could faithfully reconstruct the original face. Missing textures in the incomplete UV map are then filled in by the UV generator. The training is based on pseudo ground truth blended by the 3DMM texture and the input face texture, and is thus weakly supervised. To deal with the artifacts in the imperfect pseudo UV map, multiple partial UV map discriminators are leveraged.
Submitted 14 June, 2021;
originally announced June 2021.
-
Pixel Sampling for Style Preserving Face Pose Editing
Authors:
Xiangnan Yin,
Di Huang,
Hongyu Yang,
Zehua Fu,
Yunhong Wang,
Liming Chen
Abstract:
The existing auto-encoder based face pose editing methods primarily focus on modeling the identity preserving ability during pose synthesis, but are less able to preserve the image style properly, which refers to the color, brightness, saturation, etc. In this paper, we take advantage of the well-known frontal/profile optical illusion and present a novel two-stage approach to solve the aforementioned dilemma, where the task of face pose manipulation is cast into face inpainting. By selectively sampling pixels from the input face and slightly adjusting their relative locations with the proposed "Pixel Attention Sampling" module, the face editing result faithfully keeps the identity information as well as the image style unchanged. By leveraging high-dimensional embedding at the inpainting stage, finer details are generated. Further, with the 3D facial landmarks as guidance, our method is able to manipulate face pose in three degrees of freedom, i.e., yaw, pitch, and roll, resulting in more flexible face pose editing than merely controlling the yaw angle as usually achieved by the current state-of-the-art. Both the qualitative and quantitative evaluations validate the superiority of the proposed approach.
Submitted 14 June, 2021;
originally announced June 2021.
-
Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Kim Byeoung-su,
Radu Timofte,
Angeline Pouget,
Fenglong Song,
Cheng Li,
Shuai Xiao,
Zhongqian Fu,
Matteo Maggioni,
Yibin Huang,
Shen Cheng,
Xin Lu,
Yifeng Zhou,
Liangyu Chen,
Donghao Liu,
Xiangyu Zhang,
Haoqiang Fan,
Jian Sun,
Shuaicheng Liu,
Minsu Kwon,
Myungje Lee,
Jaeyoon Yoo,
Changbeom Kang,
Shinjo Wang,
Bin Huang
, et al. (7 additional authors not shown)
Abstract:
Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images under 40-80 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
-
Seeing All From a Few: Nodes Selection Using Graph Pooling for Graph Clustering
Authors:
Yiming Wang,
Dongxia Chang,
Zhiqiang Fu,
Yao Zhao
Abstract:
Recently, there has been considerable research interest in graph clustering aimed at data partition using the graph information. However, one limitation of most graph-based methods is that they assume the graph structure they operate on is fixed and reliable. In practice, there are inevitably some edges in the graph that are not conducive to graph clustering, which we call spurious edges. This paper is the first attempt to employ a graph pooling technique for node clustering, and we propose a novel dual graph embedding network (DGEN), which is designed as a two-step graph encoder connected by a graph pooling layer to learn the graph embedding. In our model, it is assumed that if a node and its nearest neighboring node are close to the same clustering center, this node is an informative node and this edge can be considered a cluster-friendly edge. Based on this assumption, the neighbor cluster pooling (NCPool) is devised to select the most informative subset of nodes and the corresponding edges based on the distance of nodes and their nearest neighbors to the cluster centers. This can effectively alleviate the impact of the spurious edges on the clustering. Finally, to obtain the clustering assignment of all nodes, a classifier is trained using the clustering results of the selected nodes. Experiments on five benchmark graph datasets demonstrate the superiority of the proposed method over state-of-the-art algorithms.
Submitted 7 June, 2021; v1 submitted 30 April, 2021;
originally announced May 2021.
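The node-selection assumption stated in the abstract above (a node is informative when it and its nearest neighboring node are close to the same cluster center) can be illustrated with a toy sketch. This is only one simplified reading of that sentence, not the paper's NCPool layer: the function names, the use of Euclidean distance, the "closest center" test, and the example data are all assumptions made for illustration.

```python
import math


def nearest_center(point, centers):
    """Index of the cluster center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def select_informative_nodes(features, adjacency, centers):
    """Keep node i if i and its nearest graph neighbor share the same closest center.

    features:  list of coordinate tuples, one per node (hypothetical embeddings)
    adjacency: dict mapping node index -> list of neighboring node indices
    centers:   list of cluster-center coordinates
    """
    selected = []
    for i, neighbors in adjacency.items():
        if not neighbors:
            continue  # isolated nodes have no nearest neighbor to compare with
        # Nearest neighbor = adjacent node with the smallest feature distance.
        j = min(neighbors, key=lambda n: math.dist(features[i], features[n]))
        if nearest_center(features[i], centers) == nearest_center(features[j], centers):
            selected.append(i)
    return selected


# Toy data: nodes 0-1 sit near center 0, nodes 2-3 near center 1.
# Node 4 lies closer to center 1 but is linked only to node 1 (a "spurious" edge).
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.45, 0.6)]
adjacency = {0: [1], 1: [0, 4], 2: [3], 3: [2], 4: [1]}
centers = [(0.0, 0.0), (1.0, 1.0)]

print(select_informative_nodes(features, adjacency, centers))
```

In this toy graph, node 4 is dropped because its only neighbor falls nearest to a different center, which mirrors the idea of filtering out nodes attached through spurious edges.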
-
Consistent Multiple Graph Embedding for Multi-View Clustering
Authors:
Yiming Wang,
Dongxia Chang,
Zhiqiang Fu,
Yao Zhao
Abstract:
Graph-based multi-view clustering, which aims to obtain a partition of data across multiple views, has received considerable attention in recent years. Although great efforts have been made for graph-based multi-view clustering, it remains a challenge to fuse characteristics from various views to learn a common representation for clustering. In this paper, we propose a novel Consistent Multiple Graph Embedding Clustering (CMGEC) framework. Specifically, a multiple graph auto-encoder (M-GAE) is designed to flexibly encode the complementary information of multi-view data using a multi-graph attention fusion encoder. To guide the learned common representation to maintain the similarity of the neighboring characteristics in each view, a Multi-view Mutual Information Maximization (MMIM) module is introduced. Furthermore, a graph fusion network (GFN) is devised to explore the relationship among graphs from different views and provide the common consensus graph needed in M-GAE. By jointly training these models, a common latent representation can be obtained that encodes more complementary information from multiple views and depicts data more comprehensively. Experiments on three types of multi-view datasets demonstrate that CMGEC outperforms state-of-the-art clustering methods.
Submitted 20 December, 2021; v1 submitted 11 May, 2021;
originally announced May 2021.
-
Visual Grounding with Transformers
Authors:
Ye Du,
Zehua Fu,
Qingjie Liu,
Yunhong Wang
Abstract:
In this paper, we propose a transformer based approach for visual grounding. Unlike previous proposal-and-rank frameworks that rely heavily on pretrained object detectors or proposal-free frameworks that upgrade an off-the-shelf one-stage detector by fusing textual embeddings, our approach is built on top of a transformer encoder-decoder and is independent of any pretrained detectors or word embedding models. Termed VGTR -- Visual Grounding with TRansformers, our approach is designed to learn semantic-discriminative visual features under the guidance of the textual description without harming their location ability. This information flow gives VGTR a strong capability for capturing context-level semantics of both vision and language modalities, enabling us to aggregate accurate visual clues implied by the description to locate the object instance of interest. Experiments show that our method outperforms state-of-the-art proposal-free approaches by a considerable margin on five benchmarks while maintaining fast inference speed.
Submitted 13 March, 2022; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Equivalent formulations of the oxygen depletion problem, other implicit free boundary value problems, and implications for numerical approximation
Authors:
Xinyu Cheng,
Zhaohui Fu,
Brian Wetton
Abstract:
The Oxygen Depletion problem is an implicit free boundary value problem. The dynamics allow topological changes in the free boundary. We show several mathematical formulations of this model from the literature and give a new formulation based on a gradient flow with constraint. All formulations are shown to be equivalent. We explore the possibilities for the numerical approximation of the problem that arise from the different formulations. We show a convergence result for an approximation based on the gradient flow with constraint formulation that applies to the general dynamics including topological changes. More general (vector, higher order) implicit free boundary value problems are discussed. Several open problems are described.
Submitted 20 May, 2022; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Search for Solar Flare Neutrinos with the KamLAND detector
Authors:
S. Abe,
S. Asami,
A. Gando,
Y. Gando,
T. Gima,
A. Goto,
T. Hachiya,
K. Hata,
S. Hayashida,
K. Hosokawa,
K. Ichimura,
S. Ieki,
H. Ikeda,
K. Inoue,
K. Ishidoshiro,
Y. Kamei,
N. Kawada,
Y. Kishimoto,
T. Kinoshita,
M. Koga,
N. Maemura,
T. Mitsui,
H. Miyake,
K. Nakamura,
K. Nakamura
, et al. (44 additional authors not shown)
Abstract:
We report the result of a search for neutrinos in coincidence with solar flares from the GOES flare database. The search was performed on a 10.8 kton-year exposure of KamLAND collected from 2002 to 2019. This large exposure allows us to explore previously unconstrained parameter space for solar flare neutrinos. We found no statistical excess of neutrinos and established 90% confidence level upper limits of $8.4 \times 10^7$ cm$^{-2}$ ($3.0 \times 10^{9}$ cm$^{-2}$) on electron anti-neutrino (electron neutrino) fluence at 20 MeV normalized to the X12 flare, assuming that the neutrino fluence is proportional to the X-ray intensity.
Submitted 26 October, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Fusing multimodal neuroimaging data with a variational autoencoder
Authors:
Eloy Geenjaar,
Noah Lewis,
Zening Fu,
Rohan Venkatdas,
Sergey Plis,
Vince Calhoun
Abstract:
Neuroimaging studies often involve the collection of multiple data modalities. These modalities contain both shared and mutually exclusive information about the brain. This work aims at finding a scalable and interpretable method to fuse the information of multiple neuroimaging modalities using a variational autoencoder (VAE). To provide an initial assessment, this work evaluates the representations that are learned using a schizophrenia classification task. A support vector machine trained on the representations achieves an area under the curve for the classifier's receiver operating characteristic (ROC-AUC) of 0.8610.
Submitted 3 May, 2021;
originally announced May 2021.
-
Auto-weighted low-rank representation for clustering
Authors:
Zhiqiang Fu,
Yao Zhao,
Dongxia Chang,
Xingxing Zhang,
Yiming Wang
Abstract:
In this paper, a novel unsupervised low-rank representation model, i.e., Auto-weighted Low-Rank Representation (ALRR), is proposed to construct a more favorable similarity graph (SG) for clustering. In particular, ALRR enhances the discriminability of SG by capturing the multi-subspace structure and extracting the salient features simultaneously. Specifically, an auto-weighted penalty is introduced to learn a similarity graph by highlighting the effective features, and meanwhile, overshadowing the disturbed features. Consequently, ALRR obtains a similarity graph that can preserve the intrinsic geometrical structures within the data by enforcing a smaller similarity on two dissimilar samples. Moreover, we employ a block-diagonal regularizer to guarantee the learned graph contains $k$ diagonal blocks. This can facilitate a more discriminative representation learning for clustering tasks. Extensive experimental results on synthetic and real databases demonstrate the superiority of ALRR over other state-of-the-art methods with a margin of 1.8\%$\sim$10.8\%.
Submitted 25 April, 2021;
originally announced April 2021.
-
Efficient Non-Sampling Knowledge Graph Embedding
Authors:
Zelong Li,
Jianchao Ji,
Zuohui Fu,
Yingqiang Ge,
Shuyuan Xu,
Chong Chen,
Yongfeng Zhang
Abstract:
Knowledge Graph (KG) is a flexible structure that is able to describe the complex relationship between data entities. Currently, most KG embedding models are trained based on negative sampling, i.e., the model aims to maximize some similarity of the connected entities in the KG, while minimizing the similarity of the sampled disconnected entities. Negative sampling helps to reduce the time complexity of model learning by only considering a subset of negative instances, which may fail to deliver stable model performance due to the uncertainty in the sampling procedure. To avoid such deficiency, we propose a new framework for KG embedding -- Efficient Non-Sampling Knowledge Graph Embedding (NS-KGE). The basic idea is to consider all of the negative instances in the KG for model learning, and thus to avoid negative sampling. The framework can be applied to square-loss based knowledge graph embedding models or models whose loss can be converted to a square loss. A natural side-effect of this non-sampling strategy is the increased computational complexity of model learning. To solve the problem, we leverage mathematical derivations to reduce the complexity of non-sampling loss function, which eventually provides us both better efficiency and better accuracy in KG embedding compared with existing models. Experiments on benchmark datasets show that our NS-KGE framework can achieve a better performance on efficiency and accuracy over traditional negative sampling based models, and that the framework is applicable to a large class of knowledge graph embedding models.
Submitted 16 June, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results
Authors:
Ren Yang,
Radu Timofte,
Jing Liu,
Yi Xu,
Xinjian Zhang,
Minyi Zhao,
Shuigeng Zhou,
Kelvin C. K. Chan,
Shangchen Zhou,
Xiangyu Xu,
Chen Change Loy,
Xin Li,
Fanglong Liu,
He Zheng,
Lielin Jiang,
Qi Zhang,
Dongliang He,
Fu Li,
Qingqing Dang,
Yibin Huang,
Matteo Maggioni,
Zhongqian Fu,
Shuai Xiao,
Cheng Li,
Thomas Tanay
, et al. (47 additional authors not shown)
Abstract:
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing videos compressed by x265 at a fixed bit-rate. In addition, Tracks 1 and 3 target improving fidelity (PSNR), and Track 2 targets enhancing perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12 teams, 8 teams and 11 teams submitted final results for Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh
Submitted 31 August, 2022; v1 submitted 21 April, 2021;
originally announced April 2021.
-
User-oriented Fairness in Recommendation
Authors:
Yunqi Li,
Hanxiong Chen,
Zuohui Fu,
Yingqiang Ge,
Yongfeng Zhang
Abstract:
As a highly data-driven application, recommender systems could be affected by data bias, resulting in unfair results for different data groups, which could be a reason that affects the system performance. Therefore, it is important to identify and solve the unfairness issues in recommendation scenarios. In this paper, we address the unfairness problem in recommender systems from the user perspective. We group users into advantaged and disadvantaged groups according to their level of activity, and conduct experiments to show that current recommender systems will behave unfairly between two groups of users. Specifically, the advantaged users (active) who only account for a small proportion in data enjoy much higher recommendation quality than those disadvantaged users (inactive). Such bias can also affect the overall performance since the disadvantaged users are the majority. To solve this problem, we provide a re-ranking approach to mitigate this unfairness problem by adding constraints over evaluation metrics. The experiments we conducted on several real-world datasets with various recommendation algorithms show that our approach can not only improve group fairness of users in recommender systems, but also achieve better overall recommendation performance.
Submitted 21 April, 2021;
originally announced April 2021.
-
Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport
Authors:
Yang Yang,
Zhao-Yang Fu,
De-Chuan Zhan,
Zhi-Bin Liu,
Yuan Jiang
Abstract:
Complex objects are usually with multiple labels, and can be represented by multiple modal representations, e.g., the complex articles contain text and image information as well as multiple annotations. Previous methods assume that the homogeneous multi-modal data are consistent, while in real applications, the raw data are disordered, e.g., the article constitutes with variable number of inconsistent text and image instances. Therefore, Multi-modal Multi-instance Multi-label (M3) learning provides a framework for handling such task and has exhibited excellent performance. However, M3 learning is facing two main challenges: 1) how to effectively utilize label correlation; 2) how to take advantage of multi-modal learning to process unlabeled instances. To solve these problems, we first propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which considers M3 learning in an end-to-end multi-modal deep network and utilizes consistency principle among different modal bag-level predictions. Based on the M3DN, we learn the latent ground label metric with the optimal transport. Moreover, we introduce the extrinsic unlabeled multi-modal multi-instance data, and propose the M3DNS, which considers the instance-level auto-encoder for single modality and modified bag-level optimal transport to strengthen the consistency among modalities. Thereby M3DNS can better predict label and exploit label correlation simultaneously. Experiments on benchmark datasets and real world WKG Game-Hub dataset validate the effectiveness of the proposed methods.
Submitted 17 April, 2021;
originally announced April 2021.
-
Context-Aware Interaction Network for Question Matching
Authors:
Zhe Hu,
Zuohui Fu,
Yu Yin,
Gerard de Melo
Abstract:
Impressive milestones have been achieved in text matching by adopting a cross-attention mechanism to capture pertinent semantic connections between two sentence representations. However, regular cross-attention focuses on word-level links between the two input sequences, neglecting the importance of contextual information. We propose a context-aware interaction network (COIN) to properly align two sequences and infer their semantic relationship. Specifically, each interaction block includes (1) a context-aware cross-attention mechanism to effectively integrate contextual information when aligning two sequences, and (2) a gate fusion layer to flexibly interpolate aligned representations. We apply multiple stacked interaction blocks to produce alignments at different levels and gradually refine the attention results. Experiments on two question matching datasets and detailed analyses demonstrate the effectiveness of our model.
Submitted 18 September, 2021; v1 submitted 17 April, 2021;
originally announced April 2021.
-
Faithfully Explainable Recommendation via Neural Logic Reasoning
Authors:
Yaxin Zhu,
Yikun Xian,
Zuohui Fu,
Gerard de Melo,
Yongfeng Zhang
Abstract:
Knowledge graphs (KG) have become increasingly important to endow modern recommender systems with the ability to generate traceable reasoning paths to explain the recommendation process. However, prior research rarely considers the faithfulness of the derived explanations to justify the decision making process. To the best of our knowledge, this is the first work that models and evaluates faithfully explainable recommendation under the framework of KG reasoning. Specifically, we propose neural logic reasoning for explainable recommendation (LOGER) by drawing on interpretable logical rules to guide the path reasoning process for explanation generation. We experiment on three large-scale datasets in the e-commerce domain, demonstrating the effectiveness of our method in delivering high-quality recommendations as well as ascertaining the faithfulness of the derived explanation.
Submitted 15 April, 2021;
originally announced April 2021.
-
Residual Gaussian Process: A Tractable Nonparametric Bayesian Emulator for Multi-fidelity Simulations
Authors:
Wei W. Xing,
Akeel A. Shah,
Peng Wang,
Shandian Zhe,
Qian Fu,
Robert M. Kirby
Abstract:
Challenges in multi-fidelity modeling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. The resulting model is equipped with a closed-form solution for the predictive posterior, making it applicable to advanced, high-dimensional tasks that require uncertainty estimation. Its advantages are demonstrated on univariate benchmarks and on three challenging multivariate problems. It is shown how active learning can be used to enhance the model, especially with a limited computational budget. Furthermore, error bounds are derived for the mean prediction in the univariate case.
Submitted 8 April, 2021;
originally announced April 2021.
-
STMTrack: Template-free Visual Tracking with Space-time Memory Networks
Authors:
Zhihong Fu,
Qingjie Liu,
Zehua Fu,
Yunhong Wang
Abstract:
Boosting performance of the offline trained siamese trackers is getting harder nowadays since the fixed information of the template cropped from the first frame has been almost thoroughly mined, but they are poorly capable of resisting target appearance changes. Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance, hindering them from real-time tracking and practical applications. In this paper, we propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target for better adapting to appearance variations during tracking. Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame. Furthermore, the pixel-level similarity computation of the memory network enables our tracker to generate much more accurate bounding boxes of the target. Extensive experiments and comparisons with many competitive trackers on challenging large-scale benchmarks, OTB-2015, TrackingNet, GOT-10k, LaSOT, UAV123, and VOT2018, show that, without bells and whistles, our tracker outperforms all previous state-of-the-art real-time methods while running at 37 FPS. The code is available at https://github.com/fzh0917/STMTrack.
Submitted 2 April, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Contrastive Embedding for Generalized Zero-Shot Learning
Authors:
Zongyan Han,
Zhenyong Fu,
Shuo Chen,
Jian Yang
Abstract:
Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and unseen classes when only labeled examples from seen classes are provided. Recent feature generation methods learn a generative model that can synthesize the missing visual features of unseen classes to mitigate the data-imbalance problem in GZSL. However, the original visual feature space is suboptimal for GZSL classification since it lacks discriminative information. To tackle this issue, we propose to integrate the generation model with the embedding model, yielding a hybrid GZSL framework. The hybrid GZSL approach maps both the real and the synthetic samples produced by the generation model into an embedding space, where we perform the final GZSL classification. Specifically, we propose a contrastive embedding (CE) for our hybrid GZSL framework. The proposed contrastive embedding can leverage not only class-wise supervision but also instance-wise supervision, where the latter is usually neglected by existing GZSL research. We evaluate our proposed hybrid GZSL framework with contrastive embedding, named CE-GZSL, on five benchmark datasets. The results show that our CE-GZSL method can outperform the state of the art by a significant margin on three datasets. Our code is available at https://github.com/Hanzy1996/CE-GZSL.
Submitted 30 March, 2021;
originally announced March 2021.
-
Efficient Multi-Stage Video Denoising with Recurrent Spatio-Temporal Fusion
Authors:
Matteo Maggioni,
Yibin Huang,
Cheng Li,
Shuai Xiao,
Zhongqian Fu,
Fenglong Song
Abstract:
In recent years, denoising methods based on deep learning have achieved unparalleled performance at the cost of large computational complexity. In this work, we propose an Efficient Multi-stage Video Denoising algorithm, called EMVD, to drastically reduce the complexity while maintaining or even improving the performance. First, a fusion stage reduces the noise through a recursive combination of all past frames in the video. Then, a denoising stage removes the noise in the fused frame. Finally, a refinement stage restores the missing high frequency in the denoised frame. All stages operate on a transform-domain representation obtained by learnable and invertible linear operators which simultaneously increase accuracy and decrease complexity of the model. A single loss on the final output is sufficient for successful convergence, hence making EMVD easy to train. Experiments on real raw data demonstrate that EMVD outperforms the state of the art when complexity is constrained, and even remains competitive against methods whose complexities are several orders of magnitude higher. Further, the low complexity and memory requirements of EMVD enable real-time video denoising on commercial SoC in mobile devices.
Submitted 30 March, 2023; v1 submitted 9 March, 2021;
originally announced March 2021.
-
The Observation of Ferroelastic and Ferrielectric Domains in AgNbO3 Single Crystal
Authors:
Wei Zhao,
Zhengqian Fu,
Jianming Deng,
Song Li,
Yifeng Han,
Man-Rong Li,
Xueyun Wang,
Jiawang Hong
Abstract:
Compared to AgNbO3-based ceramics, experimental investigations of single-crystalline AgNbO3, especially its ground state and ferroic domain structures, are far less developed. In this work, using an AgNbO3 single crystal successfully synthesized by the flux method, we observed the coexistence of ferroelastic and ferrielectric domain structures through a combined study with polarized light microscopy and piezoresponse force microscopy. This finding may provide a new perspective for studying AgNbO3. The result also suggests a weak electromechanical response from the ferrielectric phase of AgNbO3, which is also supported by transmission electron microscopy characterization. Our results reveal that the AgNbO3 single crystal is in a polar ferrielectric phase at room temperature, clarifying its ground state, which has been controversial in AgNbO3 ceramic materials.
Submitted 5 March, 2021;
originally announced March 2021.