Search | arXiv e-print repository

The Cambridge Law Corpus: A Dataset for Legal AI Research

Authors: Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

Abstract: We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases,… ▽ More We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases, done by legal experts. Using our annotated data, we have trained and evaluated case outcome extraction with GPT-3, GPT-4 and RoBERTa models to provide benchmarks. We include an extensive legal and ethical discussion to address the potentially sensitive nature of this material. As a consequence, the corpus will only be released for research purposes under certain restrictions. △ Less

Submitted 1 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Journal ref: Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2023

arXiv:2308.12925 [pdf, other]

doi 10.1109/MLSP55844.2023.10285979

Low-count Time Series Anomaly Detection

Authors: Philipp Renz, Kurt Cutajar, Niall Twomey, Gavin K. C. Cheung, Hanting Xie

Abstract: Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative o… ▽ More Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising of low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores. △ Less

Submitted 24 August, 2023; originally announced August 2023.

Comments: 6 pages, 7 figures, to be published in IEEE 2023 Workshop on Machine Learning for Signal Processing (MLSP)

Journal ref: 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:2212.06332 [pdf, other]

Evaluating Airline Service Quality Through the Comprehensive Text-mining and TOPSIS-VIKOR-AISM Analysis

Authors: Haotian Xie, Yi Li, Yang Pu, Chen Zhang, Junlin Huang

Abstract: Service quality rankings are pivotal for maintaining sustainability in the fiercely competitive airline industry. However, prior research in this domain has often fallen short in aspects of sample size, efficiency, and dependability. This study introduces refined insights into this area and establishes a comprehensive, yet highly elucidative, ranking framework. Initially, we employ Latent Semantic… ▽ More Service quality rankings are pivotal for maintaining sustainability in the fiercely competitive airline industry. However, prior research in this domain has often fallen short in aspects of sample size, efficiency, and dependability. This study introduces refined insights into this area and establishes a comprehensive, yet highly elucidative, ranking framework. Initially, we employ Latent Semantic Analysis (LSA) to distill principal themes and sentiments from online reviews of 80 airlines. Subsequently, we utilize the SentiWordNet lexicon and the TextBlob package for conducting sentiment analysis based on these reviews. Following this, we construct a hierarchical structure using the computation of compromise solutions, employing an integrated Technique for Order Preference by Similarity to Ideal Solution, vis-à-vis Kriterijumska Optimizacija I Kompromisno Resenje-Adversarial Interpretive Structural Model (TOPSIS-VIKOR-AISM) methodology. Beyond aiding consumer decision-making and fostering airline growth, this study contributes novel viewpoints on evaluating the efficacy of airlines and other sectors. △ Less

Submitted 11 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

arXiv:2209.04117 [pdf, other]

clusterBMA: Bayesian model averaging for clustering

Authors: Owen Forbes, Edgar Santos-Fernandez, Paul Pao-Yen Wu, Hong-Bo Xie, Paul E. Schwenn, Jim Lagopoulos, Lia Mills, Dashiell D. Sacks, Daniel F. Hermens, Kerrie Mengersen

Abstract: Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one `best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and… ▽ More Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one `best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a consensus matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name. △ Less

Submitted 25 March, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

arXiv:2206.08776 [pdf, other]

Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

Authors: Xuchuang Wang, Hong Xie, John C. S. Lui

Abstract: We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting, in which several plays can share the same arm. Furthermore, each shareable arm has a finite reward capacity and a ''per-load'' reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent, which is the "per-load" reward multiplying either the number… ▽ More We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting, in which several plays can share the same arm. Furthermore, each shareable arm has a finite reward capacity and a ''per-load'' reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent, which is the "per-load" reward multiplying either the number of plays pulling the arm, or its reward capacity when the number of plays exceeds the capacity limit. When the "per-load" reward follows a Gaussian distribution, we prove a sample complexity lower bound of learning the capacity from load-dependent rewards and also a regret lower bound of this new MP-MAB problem. We devise a capacity estimator whose sample complexity upper bound matches the lower bound in terms of reward means and capacities. We also propose an online learning algorithm to address the problem and prove its regret upper bound. This regret upper bound's first term is the same as regret lower bound's, and its second and third terms also evidently correspond to lower bound's. Extensive experiments validate our algorithm's performance and also its gain in 5G & 4G base station selection. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: to appear in ICML 2022

arXiv:2206.02536 [pdf, other]

The impact of spatio-temporal travel distance on epidemics using an interpretable attention-based sequence-to-sequence model

Authors: Yukang Jiang, Ting Tian, Huajun Xie, Hailiang Guo, Xueqin Wang

Abstract: Amidst the COVID-19 pandemic, travel restrictions have emerged as crucial interventions for mitigating the spread of the virus. In this study, we enhance the predictive capabilities of our model, Sequence-to-Sequence Epidemic Attention Network (S2SEA-Net), by incorporating an attention module, allowing us to assess the impact of distinct classes of travel distances on epidemic dynamics. Furthermor… ▽ More Amidst the COVID-19 pandemic, travel restrictions have emerged as crucial interventions for mitigating the spread of the virus. In this study, we enhance the predictive capabilities of our model, Sequence-to-Sequence Epidemic Attention Network (S2SEA-Net), by incorporating an attention module, allowing us to assess the impact of distinct classes of travel distances on epidemic dynamics. Furthermore, our model provides forecasts for new confirmed cases and deaths. To achieve this, we leverage daily data on population movement across various travel distance categories, coupled with county-level epidemic data in the United States. Our findings illuminate a compelling relationship between the volume of travelers at different distance ranges and the trajectories of COVID-19. Notably, a discernible spatial pattern emerges with respect to these travel distance categories on a national scale. We unveil the geographical variations in the influence of population movement at different travel distances on the dynamics of epidemic spread. This will contribute to the formulation of strategies for future epidemic prevention and public health policies. △ Less

Submitted 12 November, 2023; v1 submitted 26 May, 2022; originally announced June 2022.

Comments: 18 pages, 7 figures

arXiv:2206.01802 [pdf, other]

Do-Operation Guided Causal Representation Learning with Reduced Supervision Strength

Authors: Jiageng Zhu, Hanchen Xie, Wael AbdAlmageed

Abstract: Causal representation learning has been proposed to encode relationships between factors presented in the high dimensional data. However, existing methods suffer from merely using a large amount of labeled data and ignore the fact that samples generated by the same causal mechanism follow the same causal relationships. In this paper, we seek to explore such information by leveraging do-operation t… ▽ More Causal representation learning has been proposed to encode relationships between factors presented in the high dimensional data. However, existing methods suffer from merely using a large amount of labeled data and ignore the fact that samples generated by the same causal mechanism follow the same causal relationships. In this paper, we seek to explore such information by leveraging do-operation to reduce supervision strength. We propose a framework that implements do-operation by swap** latent cause and effect factors encoded from a pair of inputs. Moreover, we also identify the inadequacy of existing causal representation metrics empirically and theoretically and introduce new metrics for better evaluation. Experiments conducted on both synthetic and real datasets demonstrate the superiorities of our method compared with state-of-the-art methods. △ Less

Submitted 7 November, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022 Workshop CML4Impact Workshop Camera Ready

arXiv:2106.13423 [pdf, other]

Federated Graph Classification over Non-IID Graphs

Authors: Han Xie, **g Ma, Li Xiong, Carl Yang

Abstract: Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collab… ▽ More Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation towards such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe the gradients of GNNs to be rather fluctuating in GCFL which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time war** (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks. △ Less

Submitted 7 November, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2011.14301

Automated Prostate Cancer Diagnosis Based on Gleason Grading Using Convolutional Neural Network

Authors: Haotian Xie, Yong Zhang, Jun Wang, **g**g Zhang, Yifan Ma, Zhaogang Yang

Abstract: The Gleason grading system using histological images is the most powerful diagnostic and prognostic predictor of prostate cancer. The current standard inspection is evaluating Gleason H&E-stained histopathology images by pathologists. However, it is complicated, time-consuming, and subject to observers. Deep learning (DL) based-methods that automatically learn image features and achieve higher gen… ▽ More The Gleason grading system using histological images is the most powerful diagnostic and prognostic predictor of prostate cancer. The current standard inspection is evaluating Gleason H&E-stained histopathology images by pathologists. However, it is complicated, time-consuming, and subject to observers. Deep learning (DL) based-methods that automatically learn image features and achieve higher generalization ability have attracted significant attention. However, challenges remain especially using DL to train the whole slide image (WSI), a predominant clinical source in the current diagnostic setting, containing billions of pixels, morphological heterogeneity, and artifacts. Hence, we proposed a convolutional neural network (CNN)-based automatic classification method for accurate grading of PCa using whole slide histopathology images. In this paper, a data augmentation method named Patch-Based Image Reconstruction (PBIR) was proposed to reduce the high resolution and increase the diversity of WSIs. In addition, a distribution correction (DC) module was developed to enhance the adaption of pretrained model to the target dataset by adjusting the data distribution. Besides, a Quadratic Weighted Mean Square Error (QWMSE) function was presented to reduce the misdiagnosis caused by equal Euclidean distances. Our experiments indicated the combination of PBIR, DC, and QWMSE function was necessary for achieving superior expert-level performance, leading to the best results (0.8885 quadratic-weighted kappa coefficient). △ Less

Submitted 29 November, 2020; originally announced November 2020.

Comments: This article has been removed by arXiv administrators because the submitter did not have the authority to grant the license applied at the time of submission

arXiv:2005.05802 [pdf, other]

doi 10.1103/PhysRevLett.125.203603

Bayesian optimal control of GHZ states in Rydberg lattices

Authors: Rick Mukherjee, Harry Xie, Florian Mintert

Abstract: The ability to prepare non-classical states in a robust manner is essential for quantum sensors beyond the standard quantum limit. We demonstrate that Bayesian optimal control is capable of finding control pulses that drive trapped Rydberg atoms into highly entangled GHZ states. The control sequences have a physically intuitive functionality based on the quasi-integrability of the Ising dynamics.… ▽ More The ability to prepare non-classical states in a robust manner is essential for quantum sensors beyond the standard quantum limit. We demonstrate that Bayesian optimal control is capable of finding control pulses that drive trapped Rydberg atoms into highly entangled GHZ states. The control sequences have a physically intuitive functionality based on the quasi-integrability of the Ising dynamics. They can be constructed in laboratory experiments resulting in preparation times that scale very favourably with the system size. △ Less

Submitted 21 August, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

Comments: 4+2 pages, 5+2 figures

Journal ref: Phys. Rev. Lett. 125, 203603 (2020)

arXiv:2004.05914 [pdf, other]

Blind Adversarial Training: Balance Accuracy and Robustness

Authors: Haidong Xie, Xueshuang Xiang, Nai** Liu, Bin Dong

Abstract: Adversarial training (AT) aims to improve the robustness of deep learning models by mixing clean data and adversarial examples (AEs). Most existing AT approaches can be grouped into restricted and unrestricted approaches. Restricted AT requires a prescribed uniform budget to constrain the magnitude of the AE perturbations during training, with the obtained results showing high sensitivity to the b… ▽ More Adversarial training (AT) aims to improve the robustness of deep learning models by mixing clean data and adversarial examples (AEs). Most existing AT approaches can be grouped into restricted and unrestricted approaches. Restricted AT requires a prescribed uniform budget to constrain the magnitude of the AE perturbations during training, with the obtained results showing high sensitivity to the budget. On the other hand, unrestricted AT uses unconstrained AEs, resulting in the use of AEs located beyond the decision boundary; these overestimated AEs significantly lower the accuracy on clean data. These limitations mean that the existing AT approaches have difficulty in obtaining a comprehensively robust model with high accuracy and robustness when confronting attacks with varying strengths. Considering this problem, this paper proposes a novel AT approach named blind adversarial training (BAT) to better balance the accuracy and robustness. The main idea of this approach is to use a cutoff-scale strategy to adaptively estimate a nonuniform budget to modify the AEs used in the training, ensuring that the strengths of the AEs are dynamically located in a reasonable range and ultimately improving the overall robustness of the AT model. The experimental results obtained using BAT for training classification models on several benchmarks demonstrate the competitive performance of this method. △ Less

Submitted 9 April, 2020; originally announced April 2020.

arXiv:2004.05913 [pdf, other]

doi 10.1109/ICME51207.2021.9428149

Blind Adversarial Pruning: Balance Accuracy, Efficiency and Robustness

Authors: Haidong Xie, Lixin Qian, Xueshuang Xiang, Nai** Liu

Abstract: With the growth of interest in the attack and defense of deep neural networks, researchers are focusing more on the robustness of applying them to devices with limited memory. Thus, unlike adversarial training, which only considers the balance between accuracy and robustness, we come to a more meaningful and critical issue, i.e., the balance among accuracy, efficiency and robustness (AER). Recentl… ▽ More With the growth of interest in the attack and defense of deep neural networks, researchers are focusing more on the robustness of applying them to devices with limited memory. Thus, unlike adversarial training, which only considers the balance between accuracy and robustness, we come to a more meaningful and critical issue, i.e., the balance among accuracy, efficiency and robustness (AER). Recently, some related works focused on this issue, but with different observations, and the relations among AER remain unclear. This paper first investigates the robustness of pruned models with different compression ratios under the gradual pruning process and concludes that the robustness of the pruned model drastically varies with different pruning processes, especially in response to attacks with large strength. Second, we test the performance of mixing the clean data and adversarial examples (generated with a prescribed uniform budget) into the gradual pruning process, called adversarial pruning, and find the following: the pruned model's robustness exhibits high sensitivity to the budget. Furthermore, to better balance the AER, we propose an approach called blind adversarial pruning (BAP), which introduces the idea of blind adversarial training into the gradual pruning process. The main idea is to use a cutoff-scale strategy to adaptively estimate a nonuniform budget to modify the AEs used during pruning, thus ensuring that the strengths of AEs are dynamically located within a reasonable range at each pruning step and ultimately improving the overall AER of the pruned model. The experimental results obtained using BAP for pruning classification models based on several benchmarks demonstrate the competitive performance of this method: the robustness of the model pruned by BAP is more stable among varying pruning processes, and BAP exhibits better overall AER than adversarial pruning. △ Less

Submitted 9 April, 2020; originally announced April 2020.

arXiv:2001.05699 [pdf, other]

Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decision

Authors: Li Ye, Yishi Lin, Hong Xie, John C. S. Lui

Abstract: A fundamental question for companies with large amount of logged data is: How to use such logged data together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users' experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone… ▽ More A fundamental question for companies with large amount of logged data is: How to use such logged data together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users' experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone to make decisions. However, these decisions are not adaptive to the new incoming data, and so a wrong decision will continuously hurt users' experiences. To overcome the aforementioned limitations, we propose a framework to unify offline causal inference algorithms (e.g., weighting, matching) and online learning algorithms (e.g., UCB, LinUCB). We propose novel algorithms and derive bounds on the decision accuracy via the notion of "regret". We derive the first upper regret bound for forest-based online bandit algorithms. Experiments on two real datasets show that our algorithms outperform other algorithms that use only logged data or online feedbacks, or algorithms that do not use the data properly. △ Less

Submitted 7 November, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: 27 pages, 35 figures, 4 tables

arXiv:2001.03520 [pdf, other]

doi 10.1088/1367-2630/ab8677

Preparation of ordered states in ultra-cold gases using Bayesian optimization

Authors: Rick Mukherjee, Frederic Sauvage, Harry Xie, Robert Löw, Florian Mintert

Abstract: Ultra-cold atomic gases are unique in terms of the degree of controllability, both for internal and external degrees of freedom. This makes it possible to use them for the study of complex quantum many-body phenomena. However in many scenarios, the prerequisite condition of faithfully preparing a desired quantum state despite decoherence and system imperfections is not always adequately met. To pa… ▽ More Ultra-cold atomic gases are unique in terms of the degree of controllability, both for internal and external degrees of freedom. This makes it possible to use them for the study of complex quantum many-body phenomena. However in many scenarios, the prerequisite condition of faithfully preparing a desired quantum state despite decoherence and system imperfections is not always adequately met. To path the way to a specific target state, we explore quantum optimal control framework based on Bayesian optimization. The probabilistic modeling and broad exploration aspects of Bayesian optimization is particularly suitable for quantum experiments where data acquisition can be expensive. Using numerical simulations for the superfluid to Mott-insulator transition for bosons in a lattice as well for the formation of Rydberg crystals as explicit examples, we demonstrate that Bayesian optimization is capable of finding better control solutions with regards to finite and noisy data compared to existing methods of optimal control. △ Less

Submitted 12 May, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: 29 pages, 10 figures

Journal ref: New Journal of Physics 22, 075001 (2020)

arXiv:1912.04278 [pdf, other]

doi 10.1109/ACCESS.2020.3033795

Deep Efficient End-to-end Reconstruction (DEER) Network for Few-view Breast CT Image Reconstruction

Authors: Huidong Xie, Hongming Shan, Wenxiang Cong, Chi Liu, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Ge Wang

Abstract: Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of small calcification (down to a few hundred microns in size) and subtle density differences. Since breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose, few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-… ▽ More Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of small calcification (down to a few hundred microns in size) and subtle density differences. Since breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose, few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-end Reconstruction (DEER) network for few-view breast CT image reconstruction. The major merits of our network include high dose efficiency, excellent image quality, and low model complexity. By the design, the proposed network can learn the reconstruction process with as few as O(N) parameters, where N is the side length of an image to be reconstructed, which represents orders of magnitude improvements relative to the state-of-the-art deep-learning-based reconstruction methods that map raw data to tomographic images directly. Also, validated on a cone-beam breast CT dataset prepared by Koning Corporation on a commercial scanner, our method demonstrates a competitive performance over the state-of-the-art reconstruction networks in terms of image quality. The source code of this paper is available at: https://github.com/HuidongXie/DEER. △ Less

Submitted 3 November, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

arXiv:1906.01219 [pdf, other]

Conversational Contextual Bandit: Algorithm and Application

Authors: Xiaoying Zhang, Hong Xie, Hang Li, John C. S. Lui

Abstract: Contextual bandit algorithms provide principled online learning solutions to balance the exploitation-exploration trade-off in various applications such as recommender systems. However, the learning speed of the traditional contextual bandit algorithms is often slow due to the need for extensive exploration. This poses a critical issue in applications like recommender systems, since users may need… ▽ More Contextual bandit algorithms provide principled online learning solutions to balance the exploitation-exploration trade-off in various applications such as recommender systems. However, the learning speed of the traditional contextual bandit algorithms is often slow due to the need for extensive exploration. This poses a critical issue in applications like recommender systems, since users may need to provide feedbacks on a lot of uninterested items. To accelerate the learning speed, we generalize contextual bandit to conversational contextual bandit. Conversational contextual bandit leverages not only behavioral feedbacks on arms (e.g., articles in news recommendation), but also occasional conversational feedbacks on key-terms from the user. Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation. We then design the Conversational UCB algorithm (ConUCB) to address two challenges in conversational contextual bandit: (1) which key-terms to select to conduct conversation, (2) how to leverage conversational feedbacks to accelerate the speed of bandit learning. We theoretically prove that ConUCB can achieve a smaller regret upper bound than the traditional contextual bandit algorithm LinUCB, which implies a faster learning speed. Experiments on synthetic data, as well as real datasets from Yelp and Toutiao, demonstrate the efficacy of the ConUCB algorithm. △ Less

Submitted 26 January, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: 11 pages; Accepted by WWW2020

arXiv:1905.11381 [pdf, other]

Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks

Authors: Jirong Yi, Hui Xie, Leixin Zhou, Xiaodong Wu, Weiyu Xu, Raghuraman Mudumbai

Abstract: Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this… ▽ More Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. Drawing on ideas from information and coding theory, we propose a general class of defenses for detecting classifier errors caused by abnormally small input perturbations. We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense methods. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system. △ Less

Submitted 25 May, 2019; originally announced May 2019.

Comments: 44 Pages, 2 Theorems, 35 Figures, 29 Tables. arXiv admin note: substantial text overlap with arXiv:1901.09413

arXiv:1905.01926 [pdf]

Zero-Shot Audio Classification Based on Class Label Embeddings

Authors: Huang Xie, Tuomas Virtanen

Abstract: This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear model, which takes audio feature embeddings and semantic class label embeddings as input, and measures the compatibility between an audio feature embedding and a… ▽ More This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear model, which takes audio feature embeddings and semantic class label embeddings as input, and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with small training dataset. It can achieve accuracy (26 % on average) better than random guess (10 %) on each audio category. Particularly, it reaches up to 39.7 % for the category of natural audio classes. △ Less

Submitted 7 August, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

arXiv:1904.08936 [pdf, other]

Language Modeling through Long Term Memory Network

Authors: Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie

Abstract: Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Memory Networks which contain memory are popularly used to learn patterns in sequential data. Sequential data has long sequences that hold relationships. RNN can handle long sequences but suffers from the vanishing and exploding gradient problems. While LSTM and other memory networks address this problem, they are not cap… ▽ More Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Memory Networks which contain memory are popularly used to learn patterns in sequential data. Sequential data has long sequences that hold relationships. RNN can handle long sequences but suffers from the vanishing and exploding gradient problems. While LSTM and other memory networks address this problem, they are not capable of handling long sequences (50 or more data points long sequence patterns). Language modelling requiring learning from longer sequences are affected by the need for more information in memory. This paper introduces Long Term Memory network (LTM), which can tackle the exploding and vanishing gradient problems and handles long sequences without forgetting. LTM is designed to scale data in the memory and gives a higher weight to the input in the sequence. LTM avoid overfitting by scaling the cell state after achieving the optimal results. The LTM is tested on Penn treebank dataset, and Text8 dataset and LTM achieves test perplexities of 83 and 82 respectively. 650 LTM cells achieved a test perplexity of 67 for Penn treebank, and 600 cells achieved a test perplexity of 77 for Text8. LTM achieves state of the art results by only using ten hidden LTM cells for both datasets. △ Less

Submitted 18 April, 2019; originally announced April 2019.

Comments: The paper is accepted to be published in IJCNN 2019

arXiv:1902.09255 [pdf, other]

doi 10.1145/3308558.3313621

Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

Authors: Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang

Abstract: Real-time traffic volume inference is key to an intelligent city. It is a challenging task because accurate traffic volumes on the roads can only be measured at certain locations where sensors are installed. Moreover, the traffic evolves over time due to the influences of weather, events, holidays, etc. Existing solutions to the traffic volume inference problem often rely on dense GPS trajectories… ▽ More Real-time traffic volume inference is key to an intelligent city. It is a challenging task because accurate traffic volumes on the roads can only be measured at certain locations where sensors are installed. Moreover, the traffic evolves over time due to the influences of weather, events, holidays, etc. Existing solutions to the traffic volume inference problem often rely on dense GPS trajectories, which inevitably fail to account for the vehicles which carry no GPS devices or have them turned off. Consequently, the results are biased to taxicabs because they are almost always online for GPS tracking. In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems. Our approach employs a high-fidelity traffic simulator and deep reinforcement learning to recover full vehicle movements from the incomplete trajectories. In order to jointly model the recovered trajectories and dense GPS trajectories, we construct spatiotemporal graphs and use multi-view graph embedding to encode the multi-hop correlations between road segments into real-valued vectors. Finally, we infer the citywide traffic volumes by propagating the traffic values of monitored road segments to the unmonitored ones through masked pairwise similarities. Extensive experiments with two big regions in a provincial capital city in China verify the effectiveness of our approach. △ Less

Submitted 25 February, 2019; originally announced February 2019.

Comments: Accepted by The Web Conference (WWW) 2019

arXiv:1809.02596 [pdf, ps, other]

VOS: a Method for Variational Oversampling of Imbalanced Data

Authors: Val Andrei Fajardo, David Findlay, Roshanak Houmanfar, Charu Jaiswal, Jiaxi Liang, Honglei Xie

Abstract: Class imbalanced datasets are common in real-world applications that range from credit card fraud detection to rare disease diagnostics. Several popular classification algorithms assume that classes are approximately balanced, and hence build the accompanying objective function to maximize an overall accuracy rate. In these situations, optimizing the overall accuracy will lead to highly skewed pre… ▽ More Class imbalanced datasets are common in real-world applications that range from credit card fraud detection to rare disease diagnostics. Several popular classification algorithms assume that classes are approximately balanced, and hence build the accompanying objective function to maximize an overall accuracy rate. In these situations, optimizing the overall accuracy will lead to highly skewed predictions towards the majority class. Moreover, the negative business impact resulting from false positives (positive samples incorrectly classified as negative) can be detrimental. Many methods have been proposed to address the class imbalance problem, including methods such as over-sampling, under-sampling and cost-sensitive methods. In this paper, we consider the over-sampling method, where the aim is to augment the original dataset with synthetically created observations of the minority classes. In particular, inspired by the recent advances in generative modelling techniques (e.g., Variational Inference and Generative Adversarial Networks), we introduce a new oversampling technique based on variational autoencoders. Our experiments show that the new method is superior in augmenting datasets for downstream classification tasks when compared to traditional oversampling methods. △ Less

Submitted 7 September, 2018; originally announced September 2018.

arXiv:1712.06391 [pdf, other]

On the Effectiveness of Least Squares Generative Adversarial Networks

Authors: Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, Stephen Paul Smolley

Abstract: Unsupervised learning with generative adversarial networks (GANs) has proven to be hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generat… ▽ More Unsupervised learning with generative adversarial networks (GANs) has proven to be hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss for both the discriminator and the generator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson $χ^2$ divergence. We also show that the derived objective function that yields minimizing the Pearson $χ^2$ divergence performs better than the classical one of using least squares for classification. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stably during the learning process. For evaluating the image quality, we conduct both qualitative and quantitative experiments, and the experimental results show that LSGANs can generate higher quality images than regular GANs. Furthermore, we evaluate the stability of LSGANs in two groups. One is to compare between LSGANs and regular GANs without gradient penalty. We conduct three experiments, including Gaussian mixture distribution, difficult architectures, and a newly proposed method --- datasets with small variability, to illustrate the stability of LSGANs. The other one is to compare between LSGANs with gradient penalty (LSGANs-GP) and WGANs with gradient penalty (WGANs-GP). The experimental results show that LSGANs-GP succeed in training for all the difficult architectures used in WGANs-GP, including 101-layer ResNet. △ Less

Submitted 21 September, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

arXiv:1701.04831 [pdf, other]

doi 10.1103/PhysRevB.97.085104

Equivalence of restricted Boltzmann machines and tensor network states

Authors: **g Chen, Song Cheng, Haidong Xie, Lei Wang, Tao Xiang

Abstract: The restricted Boltzmann machine (RBM) is one of the fundamental building blocks of deep learning. RBM finds wide applications in dimensional reduction, feature extraction, and recommender systems via modeling the probability distributions of a variety of input data including natural images, speech signals, and customer ratings, etc. We build a bridge between RBM and tensor network states (TNS) wi… ▽ More The restricted Boltzmann machine (RBM) is one of the fundamental building blocks of deep learning. RBM finds wide applications in dimensional reduction, feature extraction, and recommender systems via modeling the probability distributions of a variety of input data including natural images, speech signals, and customer ratings, etc. We build a bridge between RBM and tensor network states (TNS) widely used in quantum many-body physics research. We devise efficient algorithms to translate an RBM into the commonly used TNS. Conversely, we give sufficient and necessary conditions to determine whether a TNS can be transformed into an RBM of given architectures. Revealing these general and constructive connections can cross-fertilize both deep learning and quantum many-body physics. Notably, by exploiting the entanglement entropy bound of TNS, we can rigorously quantify the expressive power of RBM on complex data sets. Insights into TNS and its entanglement capacity can guide the design of more powerful deep learning architectures. On the other hand, RBM can represent quantum many-body states with fewer parameters compared to TNS, which may allow more efficient classical simulations. △ Less

Submitted 5 February, 2018; v1 submitted 17 January, 2017; originally announced January 2017.

Comments: 18 pages, 12 figures + 2 appendices; Code implementations at https://github.com/yzcj105/rbm2mps

Journal ref: Phys. Rev. B 97, 085104 (2018)

arXiv:1608.07019 [pdf, ps, other]

doi 10.1016/j.compbiolchem.2016.09.010

Comparison among dimensionality reduction techniques based on Random Projection for cancer classification

Authors: Haozhe Xie, Jie Li, Qiaosheng Zhang, Yadong Wang

Abstract: Random Projection (RP) technique has been widely applied in many scenarios because it can reduce high-dimensional features into low-dimensional space within short time and meet the need of real-time analysis of massive data. There is an urgent need of dimensionality reduction with fast increase of big genomics data. However, the performance of RP is usually lower. We attempt to improve classificat… ▽ More Random Projection (RP) technique has been widely applied in many scenarios because it can reduce high-dimensional features into low-dimensional space within short time and meet the need of real-time analysis of massive data. There is an urgent need of dimensionality reduction with fast increase of big genomics data. However, the performance of RP is usually lower. We attempt to improve classification accuracy of RP through combining other reduction dimension methods such as Principle Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Feature Selection (FS). We compared classification accuracy and running time of different combination methods on three microarray datasets and a simulation dataset. Experimental results show a remarkable improvement of 14.77% in classification accuracy of FS followed by RP compared to RP on BC-TCGA dataset. LDA followed by RP also helps RP to yield a more discriminative subspace with an increase of 13.65% on classification accuracy on the same dataset. FS followed by RP outperforms other combination methods in classification accuracy on most of the datasets. △ Less

Submitted 17 June, 2017; v1 submitted 25 August, 2016; originally announced August 2016.

Journal ref: Computational biology and chemistry, 65: 165-172, 2016

Showing 1–24 of 24 results for author: Xie, H