-
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Authors:
Wenyue Hua,
Jiang Guo,
Mingwen Dong,
Henghui Zhu,
Patrick Ng,
Zhiguo Wang
Abstract:
Current approaches to knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common reasoning schemes in the real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We find that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis of the chain-of-thought generation of edited models further uncovers key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving aspects of fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.
Submitted 30 January, 2024;
originally announced January 2024.
-
A Benchmark Study on Calibration
Authors:
Linwei Tao,
Younan Zhu,
Haolan Guo,
Minjing Dong,
Chang Xu
Abstract:
Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through the use of specific loss functions, data preprocessing and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different datasets? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS. The project page can be found at https://www.taolinwei.com/calibration-study
Submitted 22 March, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Measurement of Investment activity in China based on Natural language processing technology
Authors:
Xiaobin Tang,
Tong Shen,
Manru Dong
Abstract:
The purpose of this study is to propose a new index to measure and reflect China's investment activity in a timely manner, and to analyze the changes in China's investment activity over the past five years. This study first uses the NEZHA model for semantic representation and expands the indicator system based on semantic similarity. We then calculate China's investment activity index using network search data. The study shows that China's investment activity began to decline in 2019, rebounded for a period of time after the outbreak of COVID-19 in 2020, and then continued on a downward trend. Private investment activity has declined significantly, while government investment activity has increased. Among the provinces in the Chinese Mainland, the investment activity of economically developed provinces has decreased significantly, while the investment activity of some economically less developed provinces in the north and south is higher. After the outbreak of COVID-19, the investment period became shorter. Our research will provide timely investment information for the government, decision makers and managers, and will offer other researchers interested in investment a perspective other than investment in fixed assets.
Submitted 5 April, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Multiple Imputation Methods for Missing Multilevel Ordinal Outcomes
Authors:
Mei Dong,
Aya Mitani
Abstract:
Multiple imputation (MI) is an established technique to handle missing data in observational studies. Joint modeling (JM) and fully conditional specification (FCS) are commonly used methods for imputing multilevel clustered data. However, MI approaches for ordinal clustered outcome variables have not been well studied, especially when there is informative cluster size (ICS). The purpose of this study is to describe different imputation and analysis strategies for the multilevel ordinal outcome when ICS exists. We conducted comprehensive Monte Carlo simulation studies to compare five different methods: complete case analysis (CCA), FCS, FCS+CS (which includes cluster size (CS) when performing the imputation), JM, and JM+CS under different scenarios. We evaluated their performance using a proportional odds logistic regression model estimated with cluster weighted generalized estimating equations (CWGEE). The simulation results show that including cluster size in imputation can significantly improve imputation accuracy when ICS exists. FCS provides more accurate and robust estimation than JM, followed by CCA for multilevel ordinal outcomes. We further applied those methods to a real dental study.
Submitted 26 September, 2022;
originally announced September 2022.
-
MAMO: Memory-Augmented Meta-Optimization for Cold-start Recommendation
Authors:
Manqing Dong,
Feng Yuan,
Lina Yao,
Xiwei Xu,
Liming Zhu
Abstract:
A common challenge for most current recommender systems is the cold-start problem. Due to the lack of user-item interactions, the fine-tuned recommender systems are unable to handle situations with new users or new items. Recently, some works have introduced the meta-optimization idea into recommendation scenarios, i.e., predicting the user preference from only a few past interacted items. The core idea is to learn a globally shared initialization parameter for all users and then learn the local parameters for each user separately. However, most meta-learning based recommendation approaches adopt model-agnostic meta-learning for parameter initialization, where the globally shared parameter may lead the model into local optima for some users. In this paper, we design two memory matrices that can store task-specific memories and feature-specific memories. Specifically, the feature-specific memories are used to guide the model with personalized parameter initialization, while the task-specific memories are used to guide the model in quickly predicting the user preference. We adopt a meta-optimization approach for optimizing the proposed method. We test the model on two widely used recommendation datasets and consider four cold-start situations. The experimental results show the effectiveness of the proposed methods.
Submitted 6 July, 2020;
originally announced July 2020.
-
Towards Certified Robustness of Distance Metric Learning
Authors:
Xiaochen Yang,
Yiwen Guo,
Mingzhi Dong,
Jing-Hao Xue
Abstract:
Metric learning aims to learn a distance metric such that semantically similar instances are pulled together while dissimilar instances are pushed away. Many existing methods consider maximizing or at least constraining a distance margin in the feature space that separates similar and dissimilar pairs of instances to guarantee their generalization ability. In this paper, we advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms. We first show that the adversarial margin, defined as the distance between training instances and their closest adversarial examples in the input space, takes account of both the distance margin in the feature space and the correlation between the metric and triplet constraints. Next, to enhance robustness to instance perturbation, we propose to enlarge the adversarial margin by minimizing a novel derived loss function termed the perturbation loss. The proposed loss can be viewed as a data-dependent regularizer and easily plugged into any existing metric learning method. Finally, we show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness. Experimental results on 16 datasets demonstrate the superiority of the proposed method over existing state-of-the-art methods in both discrimination accuracy and robustness against possible noise.
Submitted 16 August, 2022; v1 submitted 10 June, 2020;
originally announced June 2020.
-
DCMD: Distance-based Classification Using Mixture Distributions on Microbiome Data
Authors:
Konstantin Shestopaloff,
Mei Dong,
Fan Gao,
Wei Xu
Abstract:
Current advances in next generation sequencing techniques have allowed researchers to conduct comprehensive research on microbiome and human diseases, with recent studies identifying associations between human microbiome and health outcomes for a number of chronic conditions. However, microbiome data structure, characterized by sparsity and skewness, presents challenges to building effective classifiers. To address this, we present an innovative approach for distance-based classification using mixture distributions (DCMD). The method aims to improve classification performance when using microbiome community data, where the predictors are composed of sparse and heterogeneous count data. This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data, and representing each observation as a distribution, conditional on observed counts and the estimated mixture, which are then used as inputs for distance-based classification. The method is implemented into a k-means and k-nearest neighbours framework and we identify two distance metrics that produce optimal results. The performance of the model is assessed using simulations and applied to a human microbiome study, with results compared against a number of existing machine learning and distance-based approaches. The proposed method is competitive when compared to the machine learning approaches and showed a clear improvement over commonly used distance-based classifiers. The range of applicability and robustness make the proposed method a viable alternative for classification using sparse microbiome count data.
Submitted 29 March, 2020;
originally announced March 2020.
-
A Hierarchy of Graph Neural Networks Based on Learnable Local Features
Authors:
Michael Lingzhi Li,
Meng Dong,
Jiawei Zhou,
Alexander M. Rush
Abstract:
Graph neural networks (GNNs) are a powerful tool to learn representations on graphs by iteratively aggregating features from node neighbourhoods. Many variant models have been proposed, but there is limited understanding on both how to compare different architectures and how to construct GNNs systematically. Here, we propose a hierarchy of GNNs based on their aggregation regions. We derive theoretical results about the discriminative power and feature representation capabilities of each class. Then, we show how this framework can be utilized to systematically construct arbitrarily powerful GNNs. As an example, we construct a simple architecture that exceeds the expressiveness of the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theory on both synthetic and real-world benchmarks, and demonstrate our example's theoretical power translates to strong results on node classification, graph classification, and graph regression tasks.
Submitted 12 November, 2019;
originally announced November 2019.
-
Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning
Authors:
Xiang Zhang,
Xiaocong Chen,
Lina Yao,
Chang Ge,
Manqing Dong
Abstract:
Deep learning algorithms have recently achieved excellent performance in a wide range of fields (e.g., computer vision). However, a severe challenge faced by deep learning is its high dependency on hyper-parameters. The results of an algorithm may fluctuate dramatically under different hyper-parameter configurations. To address this issue, this paper presents an efficient Orthogonal Array Tuning Method (OATM) for deep learning hyper-parameter tuning. We describe the OATM approach in five detailed steps and elaborate on it using two widely used deep neural network structures (Recurrent Neural Networks and Convolutional Neural Networks). The proposed method is compared to state-of-the-art hyper-parameter tuning methods, including manual (e.g., grid search and random search) and automatic (e.g., Bayesian Optimization) ones. The experimental results show that OATM can significantly reduce the tuning time compared to the state-of-the-art methods while preserving satisfactory performance. The code is available on GitHub (https://github.com/xiangzhang1015/OATM).
Submitted 28 February, 2020; v1 submitted 31 July, 2019;
originally announced July 2019.
-
Combining Unsupervised and Supervised Learning for Asset Class Failure Prediction in Power Systems
Authors:
Ming Dong
Abstract:
In power systems, an asset class is a group of power equipment that has the same function and shares similar electrical or mechanical characteristics. Predicting failures for different asset classes is critical for electric utilities in developing cost-effective asset management strategies. Previously, the physical-age-based Weibull distribution has been widely used for failure prediction. However, this mathematical model cannot incorporate asset condition data such as inspection or testing results. As a result, the prediction cannot be very specific and accurate for individual assets. To solve this important problem, this paper proposes a novel and comprehensive data-driven approach based on asset condition data: K-means clustering as an unsupervised learning method is used to analyze the inner structure of historical asset condition data and produce the asset conditional ages; logistic regression as a supervised learning method takes in both asset physical ages and conditional ages to classify and predict asset statuses. Furthermore, an index called average aging rate is defined to quantify, track and estimate the relationship between asset physical age and conditional age. This approach was applied to an urban distribution system in West Canada to predict medium-voltage cable failures. Case studies and a comparison with the standard Weibull distribution are provided. The proposed approach demonstrates superior performance and practicality for predicting asset class failures in power systems.
Submitted 1 July, 2020; v1 submitted 5 January, 2019;
originally announced January 2019.
-
A Hybrid Distribution Feeder Long-Term Load Forecasting Method Based on Sequence Prediction
Authors:
Ming Dong,
L. S. Grumbach
Abstract:
Distribution feeder long-term load forecast (LTLF) is a critical task many electric utility companies perform on an annual basis. The goal of this task is to forecast the annual load of distribution feeders. The previous top-down and bottom-up LTLF methods are unable to incorporate different levels of information. This paper proposes a hybrid modeling method using sequence prediction for this classic and important task. The proposed method can seamlessly integrate top-down, bottom-up and sequential information hidden in multi-year data. Two advanced sequence prediction models Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are investigated in this paper. They successfully solve the vanishing and exploding gradient problems a standard recurrent neural network has. This paper firstly explains the theories of LSTM and GRU networks and then discusses the steps of feature selection, feature engineering and model implementation in detail. In the end, a real-world application example for a large urban grid in West Canada is provided. LSTM and GRU networks under different sequential configurations and traditional models including bottom-up, ARIMA and feed-forward neural network are all implemented and compared in detail. The proposed method demonstrates superior performance and great practicality.
Submitted 1 July, 2020; v1 submitted 9 December, 2018;
originally announced December 2018.
-
HAR-Net: Fusing Deep Representation and Hand-crafted Features for Human Activity Recognition
Authors:
Mingtao Dong,
Jindong Han
Abstract:
Wearable computing and context awareness have recently been focuses of study in the field of artificial intelligence. One of the most appealing as well as challenging applications is Human Activity Recognition (HAR) using smart phones. Conventional HAR based on Support Vector Machines relies on subjective, manually extracted features. This approach is time and energy consuming, and its predictions are immature because humans have only a partial view of which features should be extracted. With the rise of deep learning, artificial intelligence has been making progress toward being a mature technology. This paper proposes a new approach based on deep learning and traditional feature engineering, called HAR-Net, to address these issues in HAR. The study used data collected by gyroscopes and acceleration sensors in Android smart phones. The raw sensor data was fed into the proposed HAR-Net, which fuses the hand-crafted features with high-level features extracted by a convolutional network to make predictions. The performance of the proposed method was shown to be 0.9% higher than that of the original MC-SVM approach. The experimental results on the UCI dataset demonstrate that fusing the two kinds of features can make up for the shortcomings of traditional feature engineering and deep learning techniques.
Submitted 25 October, 2018;
originally announced October 2018.
-
Dynamic Ensemble Active Learning: A Non-Stationary Bandit with Expert Advice
Authors:
Kunkun Pang,
Mingzhi Dong,
Yang Wu,
Timothy M. Hospedales
Abstract:
Active learning aims to reduce annotation cost by predicting which samples are useful for a human teacher to label. However, it has become clear there is no best active learning algorithm. Inspired by various philosophies about what constitutes a good criterion, different algorithms perform well on different datasets. This has motivated research into ensembles of active learners that learn what constitutes a good criterion in a given scenario, typically via multi-armed bandit algorithms. Though algorithm ensembles can lead to better results, they overlook the fact that algorithm efficacy varies not only across datasets but also during a single active learning session. That is, the best criterion is non-stationary. This breaks existing algorithms' guarantees and hampers their performance in practice. In this paper, we propose dynamic ensemble active learning as a more general and promising research direction. We develop a dynamic ensemble active learner based on a non-stationary multi-armed bandit with expert advice algorithm. Our dynamic ensemble selects the right criterion at each step of active learning. It has theoretical guarantees, and shows encouraging results on 13 popular datasets.
Submitted 29 September, 2018;
originally announced October 2018.
-
GrCAN: Gradient Boost Convolutional Autoencoder with Neural Decision Forest
Authors:
Manqing Dong,
Lina Yao,
Xianzhi Wang,
Boualem Benatallah,
Shuai Zhang
Abstract:
Random forest and deep neural network are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high dimensional data. In view that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work, we further combine convolutional autoencoder with neural decision forest, where autoencoder has its advantages in finding the hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve the performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. The extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods.
Submitted 24 June, 2018; v1 submitted 21 June, 2018;
originally announced June 2018.
-
Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning
Authors:
Kunkun Pang,
Mingzhi Dong,
Yang Wu,
Timothy Hospedales
Abstract:
Active learning (AL) aims to enable training high performance classifiers with low annotation cost by predicting which subset of unlabelled instances would be most beneficial to label. The importance of AL has motivated extensive research, proposing a wide variety of manually designed AL algorithms with diverse theoretical and intuitive motivations. In contrast to this body of research, we propose to treat active learning algorithm design as a meta-learning problem and learn the best criterion from data. We model an active learning algorithm as a deep neural network that takes as input the base learner state and the unlabelled point set and predicts the best point to annotate next. Training this active query policy network with reinforcement learning produces the best non-myopic policy for a given dataset. The key challenge in achieving a general solution to AL then becomes that of learner generalisation, particularly across heterogeneous datasets. We propose a multi-task dataset-embedding approach that allows dataset-agnostic active learners to be trained. Our evaluation shows that AL algorithms trained in this way can directly generalise across diverse problems.
Submitted 12 June, 2018;
originally announced June 2018.
-
Machine learning applied to single-shot x-ray diagnostics in an XFEL
Authors:
A. Sanchez-Gonzalez,
P. Micaelli,
C. Olivier,
T. R. Barillot,
M. Ilchen,
A. A. Lutman,
A. Marinelli,
T. Maxwell,
A. Achner,
M. Agåker,
N. Berrah,
C. Bostedt,
J. Buck,
P. H. Bucksbaum,
S. Carron Montero,
B. Cooper,
J. P. Cryan,
M. Dong,
R. Feifel,
L. J. Frasinski,
H. Fukuzawa,
A. Galler,
G. Hartmann,
N. Hartmann,
W. Helml
, et al. (17 additional authors not shown)
Abstract:
X-ray free-electron lasers (XFELs) are the only sources currently able to produce bright few-fs pulses with tunable photon energies from 100 eV to more than 10 keV. Due to the stochastic SASE operating principles and other technical issues the output pulses are subject to large fluctuations, making it necessary to characterize the x-ray pulses on every shot for data sorting purposes. We present a technique that applies machine learning tools to predict x-ray pulse properties using simple electron beam and x-ray parameters as input. Using this technique at the Linac Coherent Light Source (LCLS), we report mean errors below 0.3 eV for the prediction of the photon energy at 530 eV and below 1.6 fs for the prediction of the delay between two x-ray pulses. We also demonstrate spectral shape prediction with a mean agreement of 97%. This approach could potentially be used at the next generation of high-repetition-rate XFELs to provide accurate knowledge of complex x-ray pulses at the full repetition rate.
Submitted 11 October, 2016;
originally announced October 2016.