-
Gated Cross-Attention Network for Depth Completion
Authors:
Xiaogang Jia,
Songlei Jian,
Yusong Tan,
Yonggang Che,
Wei Chen,
Zhengfa Liang
Abstract:
Depth completion is a popular research direction in the field of depth estimation. The fusion of color and depth features is the current critical challenge in this task, mainly due to the asymmetry between the rich scene details in color images and the sparse pixels in depth maps. To tackle this issue, we design an efficient Gated Cross-Attention Network that propagates confidence via a gating mec…
▽ More
Depth completion is a popular research direction in the field of depth estimation. The fusion of color and depth features is the current critical challenge in this task, mainly due to the asymmetry between the rich scene details in color images and the sparse pixels in depth maps. To tackle this issue, we design an efficient Gated Cross-Attention Network that propagates confidence via a gating mechanism, simultaneously extracting and refining key information in both color and depth branches to achieve local spatial feature fusion. Additionally, we employ an attention network based on the Transformer in low-dimensional space to effectively fuse global features and increase the network's receptive field. With a simple yet efficient gating mechanism, our proposed method achieves fast and accurate depth completion without the need for additional branches or post-processing steps. At the same time, we use the Ray Tune mechanism with the AsyncHyperBandScheduler scheduler and the HyperOptSearch algorithm to automatically search for the optimal number of module iterations, which also allows us to achieve performance comparable to state-of-the-art methods. We conduct experiments on both indoor and outdoor scene datasets. Our fast network achieves Pareto-optimal solutions in terms of time and accuracy, and at the time of submission, our accurate network ranks first among all published papers on the KITTI official website in terms of accuracy.
△ Less
Submitted 21 January, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
RoSAS: Deep Semi-Supervised Anomaly Detection with Contamination-Resilient Continuous Supervision
Authors:
Hongzuo Xu,
Yijie Wang,
Guansong Pang,
Songlei Jian,
Ning Liu,
Yongjun Wang
Abstract:
Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (su…
▽ More
Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises \textit{contamination-resilient continuous supervisory signals}. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies. The source code is available at https://github.com/xuhongzuo/rosas/.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning
Authors:
Hongzuo Xu,
Yijie Wang,
Juhui Wei,
Songlei Jian,
Yizhou Li,
Ning Liu
Abstract:
Due to the unsupervised nature of anomaly detection, the key to fueling deep models is finding supervisory signals. Different from current reconstruction-guided generative models and transformation-based contrastive models, we devise novel data-driven supervision for tabular data by introducing a characteristic -- scale -- as data labels. By representing varied sub-vectors of data instances, we de…
▽ More
Due to the unsupervised nature of anomaly detection, the key to fueling deep models is finding supervisory signals. Different from current reconstruction-guided generative models and transformation-based contrastive models, we devise novel data-driven supervision for tabular data by introducing a characteristic -- scale -- as data labels. By representing varied sub-vectors of data instances, we define scale as the relationship between the dimensionality of original sub-vectors and that of representations. Scales serve as labels attached to transformed representations, thus offering ample labeled data for neural network training. This paper further proposes a scale learning-based anomaly detection method. Supervised by the learning objective of scale distribution alignment, our approach learns the ranking of representations converted from varied subspaces of each data instance. Through this proxy task, our approach models inherent regularities and patterns within data, which well describes data "normality". Abnormal degrees of testing instances are obtained by measuring whether they fit these learned patterns. Extensive experiments show that our approach leads to significant improvement over state-of-the-art generative/contrastive anomaly detection methods.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection
Authors:
Hongzuo Xu,
Yijie Wang,
Songlei Jian,
Qing Liao,
Yongjun Wang,
Guansong Pang
Abstract:
Time series anomaly detection is instrumental in maintaining system availability in various domains. Current work in this research line mainly focuses on learning data normality deeply and comprehensively by devising advanced neural network structures and new reconstruction/prediction learning objectives. However, their one-class learning process can be misled by latent anomalies in training data…
▽ More
Time series anomaly detection is instrumental in maintaining system availability in various domains. Current work in this research line mainly focuses on learning data normality deeply and comprehensively by devising advanced neural network structures and new reconstruction/prediction learning objectives. However, their one-class learning process can be misled by latent anomalies in training data (i.e., anomaly contamination) under the unsupervised paradigm. Their learning process also lacks knowledge about the anomalies. Consequently, they often learn a biased, inaccurate normality boundary. To tackle these problems, this paper proposes calibrated one-class classification for anomaly detection, realizing contamination-tolerant, anomaly-informed learning of data normality via uncertainty modeling-based calibration and native anomaly-based calibration. Specifically, our approach adaptively penalizes uncertain predictions to restrain irregular samples in anomaly contamination during optimization, while simultaneously encouraging confident predictions on regular samples to ensure effective normality learning. This largely alleviates the negative impact of anomaly contamination. Our approach also creates native anomaly examples via perturbation to simulate time series abnormal behaviors. Through discriminating these dummy anomalies, our one-class learning is further calibrated to form a more precise normality boundary. Extensive experiments on ten real-world datasets show that our model achieves substantial improvement over sixteen state-of-the-art contenders.
△ Less
Submitted 24 April, 2024; v1 submitted 25 July, 2022;
originally announced July 2022.
-
SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments
Authors:
Jiafei Duan,
Samson Yu Bai Jian,
Cheston Tan
Abstract:
Recent advancements in deep learning, computer vision, and embodied AI have given rise to synthetic causal reasoning video datasets. These datasets facilitate the development of AI algorithms that can reason about physical interactions between objects. However, datasets thus far have primarily focused on elementary physical events such as rolling or falling. There is currently a scarcity of datase…
▽ More
Recent advancements in deep learning, computer vision, and embodied AI have given rise to synthetic causal reasoning video datasets. These datasets facilitate the development of AI algorithms that can reason about physical interactions between objects. However, datasets thus far have primarily focused on elementary physical events such as rolling or falling. There is currently a scarcity of datasets that focus on the physical interactions that humans perform daily with objects in the real world. To address this scarcity, we introduce SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments. The SPACE simulator allows us to generate the SPACE dataset, a synthetic video dataset in a 3D environment, to systematically evaluate physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability and contact. These events make up the vast majority of the basic physical interactions between objects. We then further evaluate it with a state-of-the-art physics-based deep model and show that the SPACE dataset improves the learning of intuitive physics with an approach inspired by curriculum learning. Repository: https://github.com/jiafei1224/SPACE
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
Aspect Sentiment Triplet Extraction Using Reinforcement Learning
Authors:
Samson Yu Bai Jian,
Tapas Nayak,
Navonil Majumder,
Soujanya Poria
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting triplets of aspect terms, their associated sentiments, and the opinion terms that provide evidence for the expressed sentiments. Previous approaches to ASTE usually simultaneously extract all three components or first identify the aspect and opinion terms, then pair them up to predict their sentiment polarities. In this work, we…
▽ More
Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting triplets of aspect terms, their associated sentiments, and the opinion terms that provide evidence for the expressed sentiments. Previous approaches to ASTE usually simultaneously extract all three components or first identify the aspect and opinion terms, then pair them up to predict their sentiment polarities. In this work, we present a novel paradigm, ASTE-RL, by regarding the aspect and opinion terms as arguments of the expressed sentiment in a hierarchical reinforcement learning (RL) framework. We first focus on sentiments expressed in a sentence, then identify the target aspect and opinion terms for that sentiment. This takes into account the mutual interactions among the triplet's components while improving exploration and sample efficiency. Furthermore, this hierarchical RLsetup enables us to deal with multiple and overlap** triplets. In our experiments, we evaluate our model on existing datasets from laptop and restaurant domains and show that it achieves state-of-the-art performance. The implementation of this work is publicly available at https://github.com/declare-lab/ASTE-RL.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
DRAM Failure Prediction in AIOps: Empirical Evaluation, Challenges and Opportunities
Authors:
Zhiyue Wu,
Hongzuo Xu,
Guansong Pang,
Fengyuan Yu,
Yijie Wang,
Songlei Jian,
Yongjun Wang
Abstract:
DRAM failure prediction is a vital task in AIOps, which is crucial to maintain the reliability and sustainable service of large-scale data centers. However, limited work has been done on DRAM failure prediction mainly due to the lack of public available datasets. This paper presents a comprehensive empirical evaluation of diverse machine learning techniques for DRAM failure prediction using a larg…
▽ More
DRAM failure prediction is a vital task in AIOps, which is crucial to maintain the reliability and sustainable service of large-scale data centers. However, limited work has been done on DRAM failure prediction mainly due to the lack of public available datasets. This paper presents a comprehensive empirical evaluation of diverse machine learning techniques for DRAM failure prediction using a large-scale multi-source dataset, including more than three millions of records of kernel, address, and mcelog data, provided by Alibaba Cloud through PAKDD 2021 competition. Particularly, we first formulate the problem as a multi-class classification task and exhaustively evaluate seven popular/state-of-the-art classifiers on both the individual and multiple data sources. We then formulate the problem as an unsupervised anomaly detection task and evaluate three state-of-the-art anomaly detectors. Further, based on the empirical results and our experience of attending this competition, we discuss major challenges and present future research opportunities in this task.
△ Less
Submitted 3 May, 2021; v1 submitted 30 April, 2021;
originally announced April 2021.
-
Hierarchical Adaptive Pooling by Capturing High-order Dependency for Graph Representation Learning
Authors:
Ning Liu,
Songlei Jian,
Dongsheng Li,
Yiming Zhang,
Zhiquan Lai,
Hongzuo Xu
Abstract:
Graph neural networks (GNN) have been proven to be mature enough for handling graph-structured data on node-level graph representation learning tasks. However, the graph pooling technique for learning expressive graph-level representation is critical yet still challenging. Existing pooling methods either struggle to capture the local substructure or fail to effectively utilize high-order dependenc…
▽ More
Graph neural networks (GNN) have been proven to be mature enough for handling graph-structured data on node-level graph representation learning tasks. However, the graph pooling technique for learning expressive graph-level representation is critical yet still challenging. Existing pooling methods either struggle to capture the local substructure or fail to effectively utilize high-order dependency, thus diminishing the expression capability. In this paper we propose HAP, a hierarchical graph-level representation learning framework, which is adaptively sensitive to graph structures, i.e., HAP clusters local substructures incorporating with high-order dependencies. HAP utilizes a novel cross-level attention mechanism MOA to naturally focus more on close neighborhood while effectively capture higher-order dependency that may contain crucial information. It also learns a global graph content GCont that extracts the graph pattern properties to make the pre- and post-coarsening graph content maintain stable, thus providing global guidance in graph coarsening. This novel innovation also facilitates generalization across graphs with the same form of features. Extensive experiments on fourteen datasets show that HAP significantly outperforms twelve popular graph pooling methods on graph classification task with an maximum accuracy improvement of 22.79%, and exceeds the performance of state-of-the-art graph matching and graph similarity learning algorithms by over 3.5% and 16.7%.
△ Less
Submitted 13 April, 2021;
originally announced April 2021.
-
Recognizing Emotion Cause in Conversations
Authors:
Soujanya Poria,
Navonil Majumder,
Devamanyu Hazarika,
Deepanway Ghosal,
Rishabh Bhardwaj,
Samson Yu Bai Jian,
Pengfei Hong,
Romila Ghosh,
Abhinaba Roy,
Niyati Chhaya,
Alexander Gelbukh,
Rada Mihalcea
Abstract:
We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines. The dataset is available at https://github.com/declare-lab/RECCON.
Introduction: Recognizing the cause behind emotions in text is a fundamental yet under-explored area of research in NL…
▽ More
We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines. The dataset is available at https://github.com/declare-lab/RECCON.
Introduction: Recognizing the cause behind emotions in text is a fundamental yet under-explored area of research in NLP. Advances in this area hold the potential to improve interpretability and performance in affect-based models. Identifying emotion causes at the utterance level in conversations is particularly challenging due to the intermingling dynamics among the interlocutors.
Method: We introduce the task of Recognizing Emotion Cause in CONversations with an accompanying dataset named RECCON, containing over 1,000 dialogues and 10,000 utterance cause-effect pairs. Furthermore, we define different cause types based on the source of the causes, and establish strong Transformer-based baselines to address two different sub-tasks on this dataset: causal span extraction and causal emotion entailment.
Result: Our Transformer-based baselines, which leverage contextual pre-trained embeddings, such as RoBERTa, outperform the state-of-the-art emotion cause extraction approaches
Conclusion: We introduce a new task highly relevant for (explainable) emotion-aware artificial intelligence: recognizing emotion cause in conversations, provide a new highly challenging publicly available dialogue-level dataset for this task, and give strong baseline results on this dataset.
△ Less
Submitted 28 July, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Towards Generalization and Data Efficient Learning of Deep Robotic Gras**
Authors:
Zhixin Chen,
Mengxiang Lin,
Zhixin Jia,
Shibo Jian
Abstract:
Deep reinforcement learning (DRL) has been proven to be a powerful paradigm for learning complex control policy autonomously. Numerous recent applications of DRL in robotic gras** have successfully trained DRL robotic agents end-to-end, map** visual inputs into control instructions directly, but the amount of training data required may hinder these applications in practice. In this paper, we p…
▽ More
Deep reinforcement learning (DRL) has been proven to be a powerful paradigm for learning complex control policy autonomously. Numerous recent applications of DRL in robotic gras** have successfully trained DRL robotic agents end-to-end, map** visual inputs into control instructions directly, but the amount of training data required may hinder these applications in practice. In this paper, we propose a DRL based robotic visual gras** framework, in which visual perception and control policy are trained separately rather than end-to-end. The visual perception produces physical descriptions of grasped objects and the policy takes use of them to decide optimal actions based on DRL. Benefiting from the explicit representation of objects, the policy is expected to be endowed with more generalization power over new objects and environments. In addition, the policy can be trained in simulation and transferred in real robotic system without any further training. We evaluate our framework in a real world robotic system on a number of robotic gras** tasks, such as semantic gras**, clustered object gras**, moving object gras**. The results show impressive robustness and generalization of our system.
△ Less
Submitted 2 July, 2020;
originally announced July 2020.
-
Vision-based Robot Manipulation Learning via Human Demonstrations
Authors:
Zhixin Jia,
Mengxiang Lin,
Zhixin Chen,
Shibo Jian
Abstract:
Vision-based learning methods provide promise for robots to learn complex manipulation tasks. However, how to generalize the learned manipulation skills to real-world interactions remains an open question. In this work, we study robotic manipulation skill learning from a single third-person view demonstration by using activity recognition and object detection in computer vision. To facilitate gene…
▽ More
Vision-based learning methods provide promise for robots to learn complex manipulation tasks. However, how to generalize the learned manipulation skills to real-world interactions remains an open question. In this work, we study robotic manipulation skill learning from a single third-person view demonstration by using activity recognition and object detection in computer vision. To facilitate generalization across objects and environments, we propose to use a prior knowledge base in the form of a text corpus to infer the object to be interacted with in the context of a robot. We evaluate our approach in a real-world robot, using several simple and complex manipulation tasks commonly performed in daily life. The experimental results show that our approach achieves good generalization performance even from small amounts of training data.
△ Less
Submitted 29 February, 2020;
originally announced March 2020.
-
Rethinking Class Relations: Absolute-relative Supervised and Unsupervised Few-shot Learning
Authors:
Hongguang Zhang,
Piotr Koniusz,
Songlei Jian,
Hongdong Li,
Philip H. S. Torr
Abstract:
The majority of existing few-shot learning methods describe image relations with binary labels. However, such binary relations are insufficient to teach the network complicated real-world relations, due to the lack of decision smoothness. Furthermore, current few-shot learning models capture only the similarity via relation labels, but they are not exposed to class concepts associated with objects…
▽ More
The majority of existing few-shot learning methods describe image relations with binary labels. However, such binary relations are insufficient to teach the network complicated real-world relations, due to the lack of decision smoothness. Furthermore, current few-shot learning models capture only the similarity via relation labels, but they are not exposed to class concepts associated with objects, which is likely detrimental to the classification performance due to underutilization of the available class labels. To paraphrase, children learn the concept of tiger from a few of actual examples as well as from comparisons of tiger to other animals. Thus, we hypothesize that in fact both similarity and class concept learning must be occurring simultaneously. With these observations at hand, we study the fundamental problem of simplistic class modeling in current few-shot learning methods. We rethink the relations between class concepts, and propose a novel Absolute-relative Learning paradigm to fully take advantage of label information to refine the image representations and correct the relation understanding in both supervised and unsupervised scenarios. Our proposed paradigm improves the performance of several the state-of-the-art models on publicly available datasets.
△ Less
Submitted 9 June, 2021; v1 submitted 12 January, 2020;
originally announced January 2020.
-
STA: Adversarial Attacks on Siamese Trackers
Authors:
Xugang Wu,
** Wang,
Xu Zhou,
Songlei Jian
Abstract:
Recently, the majority of visual trackers adopt Convolutional Neural Network (CNN) as their backbone to achieve high tracking accuracy. However, less attention has been paid to the potential adversarial threats brought by CNN, including Siamese network.
In this paper, we first analyze the existing vulnerabilities in Siamese trackers and propose the requirements for a successful adversarial attac…
▽ More
Recently, the majority of visual trackers adopt Convolutional Neural Network (CNN) as their backbone to achieve high tracking accuracy. However, less attention has been paid to the potential adversarial threats brought by CNN, including Siamese network.
In this paper, we first analyze the existing vulnerabilities in Siamese trackers and propose the requirements for a successful adversarial attack. On this basis, we formulate the adversarial generation problem and propose an end-to-end pipeline to generate a perturbed texture map for the 3D object that causes the trackers to fail. Finally, we conduct thorough experiments to verify the effectiveness of our algorithm. Experiment results show that adversarial examples generated by our algorithm can successfully lower the tracking accuracy of victim trackers and even make them drift off. To the best of our knowledge, this is the first work to generate 3D adversarial examples on visual trackers.
△ Less
Submitted 8 September, 2019;
originally announced September 2019.
-
SFace: An Efficient Network for Face Detection in Large Scale Variations
Authors:
Jianfeng Wang,
Ye Yuan,
Boxun Li,
Gang Yu,
Sun Jian
Abstract:
Face detection serves as a fundamental research topic for many applications like face recognition. Impressive progress has been made especially with the recent development of convolutional neural networks. However, the issue of large scale variations, which widely exists in high resolution images/videos, has not been well addressed in the literature. In this paper, we present a novel algorithm cal…
▽ More
Face detection serves as a fundamental research topic for many applications like face recognition. Impressive progress has been made especially with the recent development of convolutional neural networks. However, the issue of large scale variations, which widely exists in high resolution images/videos, has not been well addressed in the literature. In this paper, we present a novel algorithm called SFace, which efficiently integrates the anchor-based method and anchor-free method to address the scale issues. A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extreme large scale variations. The SFace architecture shows promising results on the new 4K-Face benchmarks. In addition, our method can run at 50 frames per second (fps) with an accuracy of 80% AP on the standard WIDER FACE dataset, which outperforms the state-of-art algorithms by almost one order of magnitude in speed while achieves comparative performance.
△ Less
Submitted 23 April, 2018; v1 submitted 18 April, 2018;
originally announced April 2018.