Showing 1–24 of 24 results for author: Xie, H

Searching in archive stat.
  1. arXiv:2309.12269  [pdf, other]

    cs.CL cs.CY stat.AP

    The Cambridge Law Corpus: A Dataset for Legal AI Research

    Authors: Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

    Abstract: We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases,…

    Submitted 1 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Journal ref: Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2023

  2. Low-count Time Series Anomaly Detection

    Authors: Philipp Renz, Kurt Cutajar, Niall Twomey, Gavin K. C. Cheung, Hanting Xie

    Abstract: Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative o…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 6 pages, 7 figures, to be published in IEEE 2023 Workshop on Machine Learning for Signal Processing (MLSP)

    Journal ref: 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP)

  3. arXiv:2212.06332  [pdf, other]

    stat.AP

    Evaluating Airline Service Quality Through the Comprehensive Text-mining and TOPSIS-VIKOR-AISM Analysis

    Authors: Haotian Xie, Yi Li, Yang Pu, Chen Zhang, Junlin Huang

    Abstract: Service quality rankings are pivotal for maintaining sustainability in the fiercely competitive airline industry. However, prior research in this domain has often fallen short in aspects of sample size, efficiency, and dependability. This study introduces refined insights into this area and establishes a comprehensive, yet highly elucidative, ranking framework. Initially, we employ Latent Semantic…

    Submitted 11 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

  4. arXiv:2209.04117  [pdf, other]

    stat.ME stat.AP stat.ML

    clusterBMA: Bayesian model averaging for clustering

    Authors: Owen Forbes, Edgar Santos-Fernandez, Paul Pao-Yen Wu, Hong-Bo Xie, Paul E. Schwenn, Jim Lagopoulos, Lia Mills, Dashiell D. Sacks, Daniel F. Hermens, Kerrie Mengersen

    Abstract: Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and…

    Submitted 25 March, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

  5. arXiv:2206.08776  [pdf, other]

    cs.LG stat.ML

    Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

    Authors: Xuchuang Wang, Hong Xie, John C. S. Lui

    Abstract: We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting, in which several plays can share the same arm. Furthermore, each shareable arm has a finite reward capacity and a "per-load" reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent, which is the "per-load" reward multiplying either the number…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: to appear in ICML 2022

  6. arXiv:2206.02536  [pdf, other]

    physics.soc-ph cs.AI q-bio.PE stat.AP

    The impact of spatio-temporal travel distance on epidemics using an interpretable attention-based sequence-to-sequence model

    Authors: Yukang Jiang, Ting Tian, Huajun Xie, Hailiang Guo, Xueqin Wang

    Abstract: Amidst the COVID-19 pandemic, travel restrictions have emerged as crucial interventions for mitigating the spread of the virus. In this study, we enhance the predictive capabilities of our model, Sequence-to-Sequence Epidemic Attention Network (S2SEA-Net), by incorporating an attention module, allowing us to assess the impact of distinct classes of travel distances on epidemic dynamics. Furthermor…

    Submitted 12 November, 2023; v1 submitted 26 May, 2022; originally announced June 2022.

    Comments: 18 pages, 7 figures

  7. arXiv:2206.01802  [pdf, other]

    cs.LG cs.AI stat.ME

    Do-Operation Guided Causal Representation Learning with Reduced Supervision Strength

    Authors: Jiageng Zhu, Hanchen Xie, Wael AbdAlmageed

    Abstract: Causal representation learning has been proposed to encode relationships between factors presented in the high dimensional data. However, existing methods suffer from merely using a large amount of labeled data and ignore the fact that samples generated by the same causal mechanism follow the same causal relationships. In this paper, we seek to explore such information by leveraging do-operation t…

    Submitted 7 November, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022 Workshop CML4Impact Workshop Camera Ready

  8. arXiv:2106.13423  [pdf, other]

    cs.LG cs.AI cs.DC stat.ML

    Federated Graph Classification over Non-IID Graphs

    Authors: Han Xie, Jing Ma, Li Xiong, Carl Yang

    Abstract: Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collab…

    Submitted 7 November, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2021

  9. arXiv:2011.14301   

    eess.IV cs.CV stat.ML

    Automated Prostate Cancer Diagnosis Based on Gleason Grading Using Convolutional Neural Network

    Authors: Haotian Xie, Yong Zhang, Jun Wang, Jingjing Zhang, Yifan Ma, Zhaogang Yang

    Abstract: The Gleason grading system using histological images is the most powerful diagnostic and prognostic predictor of prostate cancer. The current standard inspection is evaluating Gleason H&E-stained histopathology images by pathologists. However, it is complicated, time-consuming, and subject to observers. Deep learning (DL) based-methods that automatically learn image features and achieve higher gen…

    Submitted 29 November, 2020; originally announced November 2020.

    Comments: This article has been removed by arXiv administrators because the submitter did not have the authority to grant the license applied at the time of submission

  10. arXiv:2005.05802  [pdf, other]

    quant-ph physics.atom-ph stat.ML

    Bayesian optimal control of GHZ states in Rydberg lattices

    Authors: Rick Mukherjee, Harry Xie, Florian Mintert

    Abstract: The ability to prepare non-classical states in a robust manner is essential for quantum sensors beyond the standard quantum limit. We demonstrate that Bayesian optimal control is capable of finding control pulses that drive trapped Rydberg atoms into highly entangled GHZ states. The control sequences have a physically intuitive functionality based on the quasi-integrability of the Ising dynamics.…

    Submitted 21 August, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: 4+2 pages, 5+2 figures

    Journal ref: Phys. Rev. Lett. 125, 203603 (2020)

  11. arXiv:2004.05914  [pdf, other]

    cs.LG stat.ML

    Blind Adversarial Training: Balance Accuracy and Robustness

    Authors: Haidong Xie, Xueshuang Xiang, Naijin Liu, Bin Dong

    Abstract: Adversarial training (AT) aims to improve the robustness of deep learning models by mixing clean data and adversarial examples (AEs). Most existing AT approaches can be grouped into restricted and unrestricted approaches. Restricted AT requires a prescribed uniform budget to constrain the magnitude of the AE perturbations during training, with the obtained results showing high sensitivity to the b…

    Submitted 9 April, 2020; originally announced April 2020.

  12. Blind Adversarial Pruning: Balance Accuracy, Efficiency and Robustness

    Authors: Haidong Xie, Lixin Qian, Xueshuang Xiang, Naijin Liu

    Abstract: With the growth of interest in the attack and defense of deep neural networks, researchers are focusing more on the robustness of applying them to devices with limited memory. Thus, unlike adversarial training, which only considers the balance between accuracy and robustness, we come to a more meaningful and critical issue, i.e., the balance among accuracy, efficiency and robustness (AER). Recentl…

    Submitted 9 April, 2020; originally announced April 2020.

  13. arXiv:2001.05699  [pdf, other]

    cs.LG stat.ML

    Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decision

    Authors: Li Ye, Yishi Lin, Hong Xie, John C. S. Lui

    Abstract: A fundamental question for companies with large amount of logged data is: How to use such logged data together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users' experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone…

    Submitted 7 November, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

    Comments: 27 pages, 35 figures, 4 tables

  14. arXiv:2001.03520  [pdf, other]

    quant-ph physics.atom-ph stat.ML

    Preparation of ordered states in ultra-cold gases using Bayesian optimization

    Authors: Rick Mukherjee, Frederic Sauvage, Harry Xie, Robert Löw, Florian Mintert

    Abstract: Ultra-cold atomic gases are unique in terms of the degree of controllability, both for internal and external degrees of freedom. This makes it possible to use them for the study of complex quantum many-body phenomena. However in many scenarios, the prerequisite condition of faithfully preparing a desired quantum state despite decoherence and system imperfections is not always adequately met. To pa…

    Submitted 12 May, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

    Comments: 29 pages, 10 figures

    Journal ref: New Journal of Physics 22, 075001 (2020)

  15. arXiv:1912.04278  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Deep Efficient End-to-end Reconstruction (DEER) Network for Few-view Breast CT Image Reconstruction

    Authors: Huidong Xie, Hongming Shan, Wenxiang Cong, Chi Liu, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Ge Wang

    Abstract: Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of small calcification (down to a few hundred microns in size) and subtle density differences. Since breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose, few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-…

    Submitted 3 November, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

  16. arXiv:1906.01219  [pdf, other]

    cs.LG cs.IR stat.ML

    Conversational Contextual Bandit: Algorithm and Application

    Authors: Xiaoying Zhang, Hong Xie, Hang Li, John C. S. Lui

    Abstract: Contextual bandit algorithms provide principled online learning solutions to balance the exploitation-exploration trade-off in various applications such as recommender systems. However, the learning speed of the traditional contextual bandit algorithms is often slow due to the need for extensive exploration. This poses a critical issue in applications like recommender systems, since users may need…

    Submitted 26 January, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: 11 pages; Accepted by WWW2020

  17. arXiv:1905.11381  [pdf, other]

    cs.CR cs.CV cs.LG stat.ML

    Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks

    Authors: Jirong Yi, Hui Xie, Leixin Zhou, Xiaodong Wu, Weiyu Xu, Raghuraman Mudumbai

    Abstract: Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this…

    Submitted 25 May, 2019; originally announced May 2019.

    Comments: 44 Pages, 2 Theorems, 35 Figures, 29 Tables. arXiv admin note: substantial text overlap with arXiv:1901.09413

  18. arXiv:1905.01926  [pdf]

    cs.LG cs.SD eess.AS stat.ML

    Zero-Shot Audio Classification Based on Class Label Embeddings

    Authors: Huang Xie, Tuomas Virtanen

    Abstract: This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear model, which takes audio feature embeddings and semantic class label embeddings as input, and measures the compatibility between an audio feature embedding and a…

    Submitted 7 August, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

  19. arXiv:1904.08936  [pdf, other]

    cs.CL cs.LG stat.ML

    Language Modeling through Long Term Memory Network

    Authors: Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie

    Abstract: Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Memory Networks which contain memory are popularly used to learn patterns in sequential data. Sequential data has long sequences that hold relationships. RNN can handle long sequences but suffers from the vanishing and exploding gradient problems. While LSTM and other memory networks address this problem, they are not cap…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: The paper is accepted to be published in IJCNN 2019

  20. arXiv:1902.09255  [pdf, other]

    cs.LG cs.SI stat.ML

    Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

    Authors: Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang

    Abstract: Real-time traffic volume inference is key to an intelligent city. It is a challenging task because accurate traffic volumes on the roads can only be measured at certain locations where sensors are installed. Moreover, the traffic evolves over time due to the influences of weather, events, holidays, etc. Existing solutions to the traffic volume inference problem often rely on dense GPS trajectories…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: Accepted by The Web Conference (WWW) 2019

  21. arXiv:1809.02596  [pdf, ps, other]

    stat.ML cs.LG

    VOS: a Method for Variational Oversampling of Imbalanced Data

    Authors: Val Andrei Fajardo, David Findlay, Roshanak Houmanfar, Charu Jaiswal, Jiaxi Liang, Honglei Xie

    Abstract: Class imbalanced datasets are common in real-world applications that range from credit card fraud detection to rare disease diagnostics. Several popular classification algorithms assume that classes are approximately balanced, and hence build the accompanying objective function to maximize an overall accuracy rate. In these situations, optimizing the overall accuracy will lead to highly skewed pre…

    Submitted 7 September, 2018; originally announced September 2018.

  22. arXiv:1712.06391  [pdf, other]

    cs.CV cs.LG stat.ML

    On the Effectiveness of Least Squares Generative Adversarial Networks

    Authors: Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, Stephen Paul Smolley

    Abstract: Unsupervised learning with generative adversarial networks (GANs) has proven to be hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generat…

    Submitted 21 September, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

  23. arXiv:1701.04831  [pdf, other]

    cond-mat.str-el quant-ph stat.ML

    Equivalence of restricted Boltzmann machines and tensor network states

    Authors: Jing Chen, Song Cheng, Haidong Xie, Lei Wang, Tao Xiang

    Abstract: The restricted Boltzmann machine (RBM) is one of the fundamental building blocks of deep learning. RBM finds wide applications in dimensional reduction, feature extraction, and recommender systems via modeling the probability distributions of a variety of input data including natural images, speech signals, and customer ratings, etc. We build a bridge between RBM and tensor network states (TNS) wi…

    Submitted 5 February, 2018; v1 submitted 17 January, 2017; originally announced January 2017.

    Comments: 18 pages, 12 figures + 2 appendices; Code implementations at https://github.com/yzcj105/rbm2mps

    Journal ref: Phys. Rev. B 97, 085104 (2018)

  24. Comparison among dimensionality reduction techniques based on Random Projection for cancer classification

    Authors: Haozhe Xie, Jie Li, Qiaosheng Zhang, Yadong Wang

    Abstract: Random Projection (RP) technique has been widely applied in many scenarios because it can reduce high-dimensional features into low-dimensional space within short time and meet the need of real-time analysis of massive data. There is an urgent need of dimensionality reduction with fast increase of big genomics data. However, the performance of RP is usually lower. We attempt to improve classificat…

    Submitted 17 June, 2017; v1 submitted 25 August, 2016; originally announced August 2016.

    Journal ref: Computational biology and chemistry, 65: 165-172, 2016