Search | arXiv e-print repository

Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

Authors: Jiancong Xiao, Ruoyu Sun, Qi Long, Weijie J. Su

Abstract: Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies by Khim and Loh (2018) and Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfactory bound remains an elusive goal. Existing works on DNNs either apply to a surrogate loss instead of the robust loss or yield bounds that are notably looser than their standard counterparts. In the latter case, the bounds have a higher dependency on the width $m$ of the DNNs or the dimension $d$ of the data, with an extra factor of at least $\mathcal{O}(\sqrt{m})$ or $\mathcal{O}(\sqrt{d})$. This paper presents upper bounds for the adversarial Rademacher complexity of DNNs that match the best-known upper bounds in standard settings, as established in the work of Bartlett et al. (2017), with the dependency on width and dimension being $\mathcal{O}(\ln(dm))$. The central challenge addressed is calculating the covering number of adversarial function classes. We aim to construct a new cover that possesses two properties: 1) compatibility with adversarial examples, and 2) precision comparable to covers used in standard settings. To this end, we introduce a new variant of covering number called the \emph{uniform covering number}, specifically designed and proven to reconcile these two properties.
Consequently, our method effectively bridges the gap between Rademacher complexity in robust and standard generalization.

Submitted 8 June, 2024; originally announced June 2024.

Comments: COLT 2024
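The abstract is built around empirical Rademacher complexity. As a point of reference only, the sketch below estimates it by Monte Carlo for the simplest standard (non-adversarial) class, linear predictors $x \mapsto \langle w, x\rangle$ with $\|w\|_2 \le B$, where the supremum over $w$ has a closed form by Cauchy-Schwarz. It does not implement the paper's adversarial bounds or its uniform covering number; the function names, sample sizes and number of sign draws are hypothetical choices.

```python
import numpy as np


def empirical_rademacher_linear(X, B, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    F = {x -> <w, x> : ||w||_2 <= B} on a fixed sample X of shape (n, d).

    For this class, sup_w (1/n) * sum_i sigma_i <w, x_i> equals
    (B / n) * ||sum_i sigma_i x_i||_2, so only the expectation over the
    Rademacher signs sigma has to be sampled.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # i.i.d. Rademacher signs
    sups = (B / n) * np.linalg.norm(sigma @ X, axis=1)  # closed-form supremum per draw
    return float(sups.mean())


def jensen_upper_bound(X, B):
    """Classical bound (B / n) * sqrt(sum_i ||x_i||^2), obtained via Jensen's inequality."""
    n = X.shape[0]
    return float((B / n) * np.sqrt(np.sum(X ** 2)))


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))  # hypothetical sample: n = 200 points in d = 10
estimate = empirical_rademacher_linear(X, B=1.0)
bound = jensen_upper_bound(X, B=1.0)
```

In this linear case the estimate scales linearly in $B$ and, for bounded inputs, decays roughly like $1/\sqrt{n}$, which is the behaviour the Jensen bound makes explicit.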

arXiv:2405.16455

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

Authors: Jiancong Xiao, Ziniu Li, Xingyu Xie, Emily Getzen, Cong Fang, Qi Long, Weijie J. Su

Abstract: Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that reinforcement learning from human feedback (RLHF), the predominant approach for aligning LLMs with human preferences through a reward model, suffers from an inherent algorithmic bias due to its Kullback-Leibler-based regularization in optimization. In extreme cases, this bias could lead to a phenomenon we term preference collapse, where minority preferences are virtually disregarded. To mitigate this algorithmic bias, we introduce preference matching (PM) RLHF, a novel approach that provably aligns LLMs with the preference distribution of the reward model under the Bradley-Terry-Luce/Plackett-Luce model. Central to our approach is a PM regularizer that takes the form of the negative logarithm of the LLM's policy probability distribution over responses, which helps the LLM balance response diversification and reward maximization. Notably, we obtain this regularizer by solving an ordinary differential equation that is necessary for the PM property. For practical implementation, we introduce a conditional variant of PM RLHF that is tailored to natural language generation. Finally, we empirically validate the effectiveness of conditional PM RLHF through experiments on the OPT-1.3B and Llama-2-7B models, demonstrating a 29% to 41% improvement in alignment with human preferences, as measured by a certain metric, compared to standard RLHF.

Submitted 26 May, 2024; originally announced May 2024.
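A toy illustration of the two regularizers on a single prompt with a finite set of responses, under assumptions the abstract does not state: the reward values are given, and the KL-regularized objective is $\mathbb{E}_\pi[r] - \beta\,\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})$, whose maximizer is the standard $\pi \propto \pi_{\mathrm{ref}} \exp(r/\beta)$. With the regularizer $-\log \pi$ (an entropy term with unit weight), the maximizer of $\mathbb{E}_\pi[r - \log \pi]$ is $\mathrm{softmax}(r)$, which coincides with the Plackett-Luce probability of each response being ranked first. This is a simplified sketch with hypothetical names, not the paper's conditional PM RLHF.

```python
import numpy as np


def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def kl_rlhf_policy(rewards, ref_probs, beta):
    """Maximizer of E_pi[r] - beta * KL(pi || pi_ref): pi proportional to pi_ref * exp(r / beta)."""
    return softmax(np.log(ref_probs) + np.asarray(rewards, dtype=float) / beta)


def preference_matching_policy(rewards):
    """Maximizer of E_pi[r - log pi]: pi proportional to exp(r), i.e. the
    Plackett-Luce probability of each response being ranked first."""
    return softmax(rewards)


rewards = np.array([1.0, 0.0])  # hypothetical: majority-preferred vs minority-preferred response
uniform_ref = np.array([0.5, 0.5])

pm = preference_matching_policy(rewards)                        # matches the BTL preference split
kl_small_beta = kl_rlhf_policy(rewards, uniform_ref, beta=0.1)  # minority response nearly vanishes
```

As $\beta$ shrinks, the KL-regularized policy concentrates on the highest-reward response, a toy version of what the abstract calls preference collapse; even at $\beta = 1$ it equals the preference distribution only when $\pi_{\mathrm{ref}}$ is uniform.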

arXiv:2311.10246

Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning

Authors: Amartya Banerjee, Christopher J. Hazard, Jacob Beel, Cade Mack, Jack Xia, Michael Resnick, Will Goddin

Abstract: Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to its simplicity and familiarity, one of the most well-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Motivated by the use of machine learning in safety-critical applications, in this work we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model. We can determine data point weights as well as feature contributions by calculating the conditional entropy for adding a feature, without the need for explicit model training. This yields feature contributions together with detailed data point influence weights with perfect attribution, which can also be used to query counterfactuals. Instead of using a traditional distance measure, which needs to be scaled and contextualized, we use a novel formulation of $\textit{surprisal}$ (the amount of information required to explain the difference between the observed and expected result). Finally, our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection, while also attaining competitive results for regression across a statistically significant number of datasets.

Submitted 2 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2306.16890

Trajectory Poisson multi-Bernoulli mixture filter for traffic monitoring using a drone

Authors: Ángel F. García-Fernández, Jimin Xiao

Abstract: This paper proposes a multi-object tracking (MOT) algorithm for traffic monitoring using a drone equipped with optical and thermal cameras. Object detections on the images are obtained using a neural network for each type of camera. The cameras are modelled as direction-of-arrival (DOA) sensors. Each DOA detection follows a von Mises-Fisher distribution, whose mean direction is obtained by projecting a vehicle position on the ground to the camera. We then use the trajectory Poisson multi-Bernoulli mixture (TPMBM) filter, which is a Bayesian MOT algorithm, to optimally estimate the set of vehicle trajectories. We have also developed a parameter estimation algorithm for the measurement model. We have tested the accuracy of the resulting TPMBM filter on synthetic and experimental data sets.

Submitted 28 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: accepted in IEEE Transactions on Vehicular Technology
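The DOA measurement model can be sketched as follows, under simplifying assumptions of our own: the vehicle lies on the ground plane $z = 0$, the camera position is known in the same world frame, camera orientation is ignored (directions are expressed in world coordinates), and the concentration $\kappa > 0$ is fixed rather than estimated. The von Mises-Fisher density on the unit sphere $S^2$ is $f(u) = \frac{\kappa}{4\pi \sinh \kappa} \exp(\kappa\, \mu^\top u)$. Function names and the numeric values are hypothetical.

```python
import numpy as np


def doa_mean_direction(vehicle_xy, camera_pos):
    """Unit vector from the camera to a vehicle on the ground plane z = 0 (world frame, assumed)."""
    p = np.array([vehicle_xy[0], vehicle_xy[1], 0.0])
    v = p - np.asarray(camera_pos, dtype=float)
    return v / np.linalg.norm(v)


def vmf_logpdf_s2(u, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on S^2 (requires kappa > 0).

    Uses log(sinh k) = k + log1p(-exp(-2k)) - log(2) to avoid overflow for large kappa.
    """
    log_sinh = kappa + np.log1p(-np.exp(-2.0 * kappa)) - np.log(2.0)
    return np.log(kappa) - np.log(4.0 * np.pi) - log_sinh + kappa * float(np.dot(mu, u))


camera = (0.0, 0.0, 100.0)  # hypothetical drone position, 100 m above the origin
mu = doa_mean_direction((30.0, 40.0), camera)
detection = mu + np.array([0.01, -0.01, 0.0])
detection /= np.linalg.norm(detection)  # a noisy, unit-norm DOA measurement
loglik = vmf_logpdf_s2(detection, mu, kappa=500.0)
```

In a filter, `loglik` would play the role of a single-detection likelihood for one hypothesized vehicle position; the abstract's parameter estimation step would replace the fixed `kappa` used here.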

arXiv:2302.10364

Gaussian processes at the Helm(holtz): A more fluid model for ocean currents

Authors: Renato Berlinghieri, Brian L. Trippe, David R. Burt, Ryan Giordano, Kaushik Srinivasan, Tamay Özgökmen, Junfei Xia, Tamara Broderick

Abstract: Given sparse observations of buoy velocities, oceanographers are interested in reconstructing ocean currents away from the buoys and identifying divergences in a current vector field. As a first and modular step, we focus on the time-stationary case (for instance, by restricting to short time periods). Since we expect current velocity to be a continuous but highly non-linear function of spatial location, Gaussian processes (GPs) offer an attractive model. However, we show that applying a GP with a standard stationary kernel directly to buoy data can struggle with both current reconstruction and divergence identification, due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method with theory and experiments on synthetic and real ocean data.

Submitted 20 June, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: 51 pages, 16 figures

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:2113-2163, 2023
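To see why the decomposition helps, the sketch below builds a 2-D field as $F = \nabla\Phi + \operatorname{rot}\Psi$, using one common convention $\operatorname{rot}\Psi = (\partial_y \Psi, -\partial_x \Psi)$, from two hypothetical potentials and finite differences. The gradient part has zero curl and the rotational part has zero divergence, so any divergence in $F$ comes from $\Phi$ alone. In the paper's setting these components would carry GP priors; here the potentials, grid and helper names are arbitrary choices, and no GP inference is implemented.

```python
import numpy as np

# Grid on [-2, 2]^2; with indexing="xy", x varies along axis 1 and y along axis 0.
x = np.linspace(-2.0, 2.0, 201)
y = np.linspace(-2.0, 2.0, 201)
X, Y = np.meshgrid(x, y, indexing="xy")
h = x[1] - x[0]

# Hypothetical potentials standing in for the two Helmholtz components.
phi = np.exp(-(X ** 2 + Y ** 2))  # scalar potential -> curl-free part
psi = np.sin(X) * np.cos(Y)       # stream function  -> divergence-free part


def d_dx(f):
    return np.gradient(f, h, axis=1)


def d_dy(f):
    return np.gradient(f, h, axis=0)


def divergence(u, v):
    return d_dx(u) + d_dy(v)


def curl(u, v):
    # Scalar curl of a 2-D field: dv/dx - du/dy.
    return d_dx(v) - d_dy(u)


# Helmholtz composition: F = grad(phi) + rot(psi), with rot(psi) = (dpsi/dy, -dpsi/dx).
u_cf, v_cf = d_dx(phi), d_dy(phi)
u_df, v_df = d_dy(psi), -d_dx(psi)
u, v = u_cf + u_df, v_cf + v_df

curl_of_gradient = np.abs(curl(u_cf, v_cf)).max()        # zero up to rounding
div_of_rotational = np.abs(divergence(u_df, v_df)).max()  # zero up to rounding
```

Because the finite-difference operators along the two axes commute, both residuals vanish up to floating-point rounding, mirroring the exact identities $\nabla \times \nabla\Phi = 0$ and $\nabla \cdot \operatorname{rot}\Psi = 0$.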

arXiv:2210.05538

Estimating optimal treatment regimes in survival contexts using an instrumental variable

Authors: Junwen Xia, Zishu Zhan, **gxiao Zhang

Abstract: In survival contexts, substantial literature exists on estimating optimal treatment regimes, where treatments are assigned based on personal characteristics to maximize the survival probability. These methods assume that a set of covariates is sufficient to deconfound the treatment-outcome relationship. Nevertheless, this assumption may not hold in observational studies or in randomized trials in which non-adherence occurs. Thus, we propose a novel approach for estimating the optimal treatment regime when certain confounders are not observable and a binary instrumental variable is available. Specifically, using the binary instrumental variable, we propose two semiparametric estimators of the optimal treatment regime, obtained by maximizing Kaplan-Meier-like estimators within a pre-defined class of regimes; one of these estimators possesses the desirable property of double robustness. Because the Kaplan-Meier-like estimators are jagged, we incorporate kernel smoothing methods to enhance their performance. Under appropriate regularity conditions, the asymptotic properties are rigorously established. Furthermore, the finite sample performance is assessed through simulation studies. Finally, we illustrate our method using data from the National Cancer Institute's (NCI) prostate, lung, colorectal, and ovarian cancer screening trial.

Submitted 30 October, 2023; v1 submitted 11 October, 2022; originally announced October 2022.
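For readers unfamiliar with the building block, the classical Kaplan-Meier product-limit estimator $\hat S(t) = \prod_{t_j \le t} (1 - d_j / n_j)$, with $d_j$ events and $n_j$ subjects at risk at event time $t_j$, can be sketched as below. The paper's estimators are only described as Kaplan-Meier-like, with instrumental-variable weighting and kernel smoothing on top; neither is reproduced here, and the toy data are made up.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  observed follow-up times.
    events: 1 if the event was observed at that time, 0 if censored.
    Returns the distinct event times t_j and S(t_j) = prod_{t_k <= t_j} (1 - d_k / n_k).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])  # sorted distinct event times
    survival = []
    s = 1.0
    for t in event_times:
        n_at_risk = np.sum(times >= t)            # still under observation at t
        d = np.sum((times == t) & (events == 1))  # events occurring at t
        s *= 1.0 - d / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)


t_j, s_j = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

In the toy call, the subject censored at time 2 is still counted as at risk at time 2, the usual convention that censoring happens just after events recorded at the same time. The resulting step function is exactly the kind of jagged curve the abstract smooths with kernels.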

arXiv:2202.11424

Towards Speaker Age Estimation with Label Distribution Learning

Authors: Shi**g Si, Jianzong Wang, Junqing Peng, **g Xiao

Abstract: Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise age identification remains a challenge due to label ambiguity, i.e., utterances from the same person at adjacent ages are often indistinguishable. To address this, we utilize the ambiguous information among the age labels, convert each age label into a discrete label distribution, and leverage the label distribution learning (LDL) method to fit the data. For each audio sample, our method produces an age distribution for its speaker, on top of which we also perform two other tasks: age prediction and age uncertainty minimization. In this way, our method naturally combines the age classification and regression approaches, which enhances its robustness. We conduct experiments on the public NIST SRE08-10 dataset and a real-world dataset, which show that our method outperforms baseline methods by a relatively large margin, yielding a 10% reduction in mean absolute error (MAE) on the real-world dataset.

Submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted by the 47th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022)
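A minimal sketch of the label-distribution idea, under assumptions the abstract does not specify: ages are integers from 0 to 100, each label becomes a discretized Gaussian with a fixed standard deviation `sigma`, the fitting loss is the KL divergence, the predicted age is the expectation of the distribution, and its standard deviation serves as the uncertainty measure. All names and the value of `sigma` are hypothetical.

```python
import numpy as np

AGES = np.arange(0, 101)  # hypothetical label support: integer ages 0..100


def age_to_distribution(age, sigma=2.0):
    """Convert a scalar age label into a discretized Gaussian over AGES (sigma is assumed)."""
    p = np.exp(-0.5 * ((AGES - age) / sigma) ** 2)
    return p / p.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over AGES."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def predicted_age(q):
    """Age prediction as the expectation of the predicted distribution."""
    return float(np.sum(AGES * q))


def age_uncertainty(q):
    """Standard deviation of the predicted distribution, used here as an uncertainty measure."""
    mean = predicted_age(q)
    return float(np.sqrt(np.sum(q * (AGES - mean) ** 2)))


target = age_to_distribution(40)
model_output = age_to_distribution(42, sigma=6.0)  # stand-in for a network's softmax output
loss = kl_divergence(target, model_output)
```

Reading out both an expectation (regression-style) and a full distribution (classification-style) from the same output is one way to see how LDL combines the two approaches the abstract mentions.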

arXiv:2109.14569

An Expert System for Redesigning Software for Cloud Applications

Authors: Rahul Yedida, Rahul Krishna, Anup Kalia, Tim Menzies, ** Xiao, Maja Vukovic

Abstract: Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently, there has been much work using machine learning to simplify this partitioning task. Despite much research, no single partitioning method can be recommended as generally useful. More specifically, those prior solutions are "brittle"; i.e., if they work well for one kind of goal in one dataset, they can be sub-optimal when applied to many datasets and multiple goals. In order to find a generally useful partitioning method, we propose DEEPLY. This new algorithm extends the CO-GCN deep learning partition generator with (a) a novel loss function and (b) some hyper-parameter optimization. As shown by our experiments, DEEPLY generally outperforms prior work (including CO-GCN and others) across multiple datasets and goals. To the best of our knowledge, this is the first report in SE of such stable hyper-parameter optimization. To aid reuse of this work, DEEPLY is available online at https://bit.ly/2WhfFlB.

Submitted 27 June, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: version 3

arXiv:2101.02908

NVAE-GAN Based Approach for Unsupervised Time Series Anomaly Detection

Authors: Liang Xu, Liying Zheng, Weijun Li, Zhenbo Chen, Weishun Song, Yue Deng, Yongzhe Chang, **g Xiao, Bo Yuan

Abstract: Much recent work has addressed time series anomaly detection by applying Variational Auto-Encoders (VAEs). Time series anomaly detection is a very common but challenging task in many industries and plays an important role in network monitoring, facility maintenance, information security, and so on. However, detecting anomalies in time series with high accuracy is very difficult, owing to noisy data collected from the real world and complicated abnormal patterns. Inspired by recent work on Nouveau VAE (NVAE), we propose an anomaly detection model, Time series to Image VAE (T2IVAE): an unsupervised model based on NVAE for univariate series that transforms a 1D time series into a 2D image as input and uses the reconstruction error to detect anomalies. In addition, we apply Generative Adversarial Network based techniques to the T2IVAE training strategy to reduce overfitting. We evaluate our model on three datasets and compare it with several other popular models using the F1 score. T2IVAE achieves 0.639 on the Numenta Anomaly Benchmark, 0.651 on a public dataset from NASA, and 0.504 on our dataset collected from a real-world scenario, outperforming the other models compared.

Submitted 8 January, 2021; originally announced January 2021.

arXiv:2009.09590

Generalized Clustering and Multi-Manifold Learning with Geometric Structure Preservation

Authors: Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan Z. Li

Abstract: Though manifold-based clustering has become a popular research topic, we observe that one important factor has been omitted by these works, namely that the defined clustering loss may corrupt the local and global structure of the latent space. In this paper, we propose a novel Generalized Clustering and Multi-manifold Learning (GCML) framework with geometric structure preservation for generalized data, i.e., data not limited to 2-D images, with a wide range of applications in the speech, text, and biology domains. In the proposed framework, manifold clustering is done in the latent space guided by a clustering loss. To overcome the problem that the clustering-oriented loss may deteriorate the geometric structure of the latent space, an isometric loss is proposed for preserving intra-manifold structure locally and a ranking loss for inter-manifold structure globally. Extensive experimental results show that GCML outperforms its counterparts in terms of qualitative visualizations and quantitative metrics, demonstrating the effectiveness of preserving geometric structure.

Submitted 8 October, 2021; v1 submitted 20 September, 2020; originally announced September 2020.

arXiv:2009.07455

FedSmart: An Auto Updating Federated Learning Optimization Mechanism

Authors: Anxun He, Jianzong Wang, Zhangcheng Huang, **g Xiao

Abstract: Federated learning has made an important contribution to data privacy preservation. Many previous works are based on the assumption that the data are independent and identically distributed (IID). As a result, model performance on non-IID data, which is the situation encountered in practice, falls short of expectations. Some existing methods for ensuring model robustness on non-IID data, such as data-sharing strategies or pretraining, may lead to privacy leakage. In addition, some participants may try to poison the model with low-quality data. In this paper, we introduce a performance-based parameter return method for optimization, which we term FederatedSmart (FedSmart). It optimizes a different model for each client by sharing global gradients; it extracts data from each client as a local validation set, and the accuracy the model achieves in round t determines the weights for the next round. Experimental results show that FedSmart enables participants to allocate greater weight to those with similar data distributions.

Submitted 15 September, 2020; originally announced September 2020.

Comments: presented at APWeb-WAIM 2020
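The abstract only says that the accuracy each model achieves on a client's local validation set in round $t$ sets the weights for round $t+1$. One way to read that, sketched below with hypothetical names and a simple row-normalization rule that the abstract does not specify, is to give each client a personalized average of the candidate models, weighted by how well each one scores on that client's validation data. The gradient-sharing part of FedSmart is not modelled.

```python
import numpy as np


def accuracy_weights(val_acc):
    """val_acc[i, j]: accuracy of model j on client i's local validation set in round t.

    Returns the weights client i uses in round t + 1. Normalizing each row by its
    sum is an assumption for illustration, not the paper's stated rule.
    """
    val_acc = np.asarray(val_acc, dtype=float)
    return val_acc / val_acc.sum(axis=1, keepdims=True)


def personalized_models(weights, params):
    """params[j]: flattened parameter vector of model j; returns one vector per client."""
    return weights @ np.asarray(params, dtype=float)


# Hypothetical round: clients 0 and 1 share a data distribution, client 2 differs.
val_acc = np.array([
    [0.90, 0.85, 0.30],
    [0.88, 0.91, 0.25],
    [0.20, 0.22, 0.93],
])
params = np.array([[1.0, 0.0], [1.1, 0.1], [-1.0, 2.0]])
W = accuracy_weights(val_acc)
client_params = personalized_models(W, params)
```

Under this rule, clients with similar data distributions end up placing more weight on each other's models, which is the qualitative behaviour the abstract reports.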

arXiv:2009.04899

Meta-learning based Alternating Minimization Algorithm for Non-convex Optimization

Authors: **gyuan Xia, Shengxi Li, Jun-Jie Huang, Imad Jaimoukha, Deniz Gunduz

Abstract: In this paper, we propose a novel solution for non-convex problems of multiple variables, especially those typically solved by an alternating minimization (AM) strategy, which splits the original optimization problem into a set of sub-problems corresponding to each variable and then iteratively optimizes each sub-problem using a fixed updating rule. However, due to the intrinsic non-convexity of the original optimization problem, the optimization can usually become trapped in a spurious local minimum even when each sub-problem is solved optimally at each iteration. Meanwhile, learning-based approaches, such as deep unfolding algorithms, are highly limited by the lack of labelled data and restricted explainability. To tackle these issues, we propose a meta-learning based alternating minimization (MLAM) method, which aims to minimize part of the global losses over iterations instead of carrying out minimization on each sub-problem, and which learns an adaptive strategy to replace the handcrafted counterpart, resulting in superior performance. Meanwhile, the proposed MLAM still maintains the original algorithmic principle, which contributes to better interpretability. We evaluate the proposed method on two representative problems: matrix completion, a bi-linear inverse problem, and Gaussian mixture models, a non-linear problem.
The experimental results validate that our proposed approach outperforms AM-based methods in standard settings and is able to achieve effective optimization in challenging cases where the other compared methods typically fail.

Submitted 26 June, 2022; v1 submitted 9 September, 2020; originally announced September 2020.

arXiv:2006.05622

P-ADMMiRNN: Training RNN with Stable Convergence via An Efficient and Paralleled ADMM Approach

Authors: Yu Tang, Zhigang Kan, Dequan Sun, **g**g Xiao, Zhiquan Lai, Linbo Qiao, Dongsheng Li

Abstract: It is hard to train Recurrent Neural Networks (RNNs) with stable convergence while avoiding gradient vanishing and exploding problems, as the weights in the recurrent unit are repeated from iteration to iteration. Moreover, RNNs are sensitive to the initialization of weights and biases, which makes training difficult. The Alternating Direction Method of Multipliers (ADMM) has become a promising algorithm for training neural networks beyond traditional stochastic gradient algorithms, owing to its gradient-free nature and immunity to unsatisfactory conditions. However, ADMM cannot be applied directly to train RNNs, since the state in the recurrent unit is repeatedly updated over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of RNNs to address the above challenges simultaneously. We also provide novel update rules and a theoretical convergence analysis. Instead of using vanilla ADMM, we explicitly specify the essential update rules in the iterations of ADMMiRNN, with constructed approximation techniques and solutions to each sub-problem. Numerical experiments are conducted on MNIST, IMDb, and text classification tasks. ADMMiRNN achieves convergent results and outperforms the compared baselines. Furthermore, ADMMiRNN trains RNNs more stably than stochastic gradient algorithms, without gradient vanishing or exploding.
We also provide a distributed parallel algorithm for ADMMiRNN, named P-ADMMiRNN, including Synchronous Parallel ADMMiRNN (SP-ADMMiRNN) and Asynchronous Parallel ADMMiRNN (AP-ADMMiRNN), which is the first to train RNNs with ADMM in an asynchronous parallel manner. The source code is publicly available.

Submitted 28 March, 2022; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: 13 pages, 12 figures

arXiv:2004.13344

Robust Generative Adversarial Network

Authors: Shufei Zhang, Zhuang Qian, Kaizhu Huang, Jimin Xiao, Yuan He

Abstract: Generative adversarial networks (GANs) are powerful generative models, but they usually suffer from instability and generalization problems, which may lead to poor generations. Most existing works focus on stabilizing the training of the discriminator while ignoring the generalization properties. In this work, we aim to improve the generalization capability of GANs by promoting local robustness within a small neighborhood of the training samples. We also prove that robustness in a small neighborhood of the training set can lead to better generalization. In particular, we design a robust optimization framework where the generator and discriminator compete with each other in a \textit{worst-case} setting within a small Wasserstein ball. The generator tries to map \textit{the worst input distribution} (rather than the Gaussian distribution used in most GANs) to the real data distribution, while the discriminator attempts to distinguish the real and fake distributions \textit{with the worst perturbation}. We prove that our robust method can obtain a tighter generalization upper bound than traditional GANs under mild assumptions, establishing the theoretical superiority of RGAN over GANs. A series of experiments on the CIFAR-10, STL-10 and CelebA datasets indicates that our proposed robust framework can improve on five baseline GAN models substantially and consistently.

Submitted 28 April, 2020; originally announced April 2020.

Comments: This paper was submitted to ICLR on Sep. 25, 2019

arXiv:2003.09821

BS-NAS: Broadening-and-Shrinking One-Shot NAS with Searchable Numbers of Channels

Authors: Zan Shen, Jiang Qian, Bo** Zhuang, Shaojun Wang, **g Xiao

Abstract: One-Shot methods have evolved into one of the most popular approaches in Neural Architecture Search (NAS) due to weight sharing and single training of a supernet. However, existing methods generally suffer from two issues: a predetermined number of channels in each layer, which is suboptimal; and model averaging effects and poor ranking correlation caused by weight coupling and a continuously expanding search space. To explicitly address these issues, this paper proposes a Broadening-and-Shrinking One-Shot NAS (BS-NAS) framework, in which 'broadening' refers to broadening the search space with a spring block that enables searching for the number of channels during training of the supernet, while 'shrinking' refers to a novel shrinking strategy that gradually turns off underperforming operations. These innovations first broaden the search space for wider representation and then shrink it by gradually removing underperforming operations; an evolutionary algorithm then efficiently searches for the optimal architecture. Extensive experiments on ImageNet demonstrate the effectiveness of the proposed BS-NAS and its state-of-the-art performance.

Submitted 22 March, 2020; originally announced March 2020.

Comments: 14 pages

arXiv:2003.01575

Evaluation Framework For Large-scale Federated Learning

Authors: Lifeng Liu, Fengda Zhang, Jun Xiao, Chao Wu

Abstract: Federated learning is proposed as a machine learning setting that enables distributed edge devices, such as mobile phones, to collaboratively learn a shared prediction model while keeping all the training data on device. This not only takes full advantage of data distributed across millions of nodes to train a good model but also protects data privacy. However, learning in this scenario poses new challenges. In fact, data across a massive number of unreliable devices is likely to be non-IID (not independent and identically distributed), which may make the performance of models trained by federated learning unstable. In this paper, we introduce a framework designed for large-scale federated learning, which consists of approaches to generating datasets and a modular evaluation framework. First, we construct a suite of open-source non-IID datasets covering three aspects, namely covariate shift, prior probability shift, and concept shift, which are grounded in real-world assumptions. In addition, we design several rigorous evaluation metrics, including the number of network nodes, the size of datasets, the number of communication rounds, and communication resources. Finally, we present an open-source benchmark for large-scale federated learning research.

Submitted 11 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.
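The abstract above lists prior probability shift (clients differing in their label distribution P(y)) among its non-IID settings but does not say how the datasets realize it. Purely as an illustration, the sketch below uses a common, separate technique, Dirichlet label-skew partitioning: for each class, the share given to each client is drawn from a symmetric Dirichlet(alpha), so a smaller alpha yields more skewed clients. The names label_skew_partition, dirichlet, num_clients, and alpha are hypothetical and not part of the paper's benchmark.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet(alpha, ..., alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def label_skew_partition(
    labels: Sequence[Hashable], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Hypothetical helper: split sample indices so each client's label mix P(y) differs."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for y, idxs in by_class.items():
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into cut points; the last client gets the remainder.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        start = 0
        for client, end in enumerate(cuts + [len(idxs)]):
            clients[client].extend(idxs[start:end])
            start = end
    return clients
```

For example, label_skew_partition([i % 10 for i in range(1000)], num_clients=5, alpha=0.5) assigns every sample to exactly one of five clients with uneven class mixes; large alpha values approach an IID split.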

arXiv:2002.05780 [pdf, other]

Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States

Authors: Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Jun Xiao, Bo Li

Abstract: Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuously deriving valuable information from various data sources and optimizing sequential decisions, which makes it a promising research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity: the collected information for each asset is usually diverse, noisy, and imbalanced (e.g., news articles); and (2) environment uncertainty: the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, SARL augments the asset information with price movement predictions as additional states, where the predictions can be based solely on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) the Bitcoin market and (ii) the HighTech stock market with 7 years of Reuters news articles, validate the effectiveness of SARL over existing PM approaches in terms of both accumulated profits and risk-adjusted profits. Moreover, extensive simulations demonstrate the importance of the proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM methods and other baselines.

Submitted 9 February, 2020; originally announced February 2020.

Comments: AAAI 2020

arXiv:1910.08699 [pdf, ps, other]

doi 10.1016/j.jclepro.2019.118573

Application of a new information priority accumulated grey model with time power to predict short-term wind turbine capacity

Authors: Jie Xia, Xin Ma, Wenqing Wu, Baolian Huang, Wanpeng Li

Abstract: Wind energy makes a significant contribution to global power generation, and predicting wind turbine capacity is becoming increasingly crucial for cleaner production. For this purpose, a new information priority accumulated grey model with time power is proposed to predict short-term wind turbine capacity. First, the computational formulas for the time response sequence and the predicted values are derived using the grey modeling technique and the trapezoidal approximation formula for definite integrals. Second, an intelligent algorithm based on particle swarm optimization is applied to determine the optimal nonlinear parameters of the novel model. Third, three real numerical examples are given to examine the accuracy of the new model in comparison with six existing prediction models. Finally, based on wind turbine capacity data from 2007 to 2017, the proposed model is used to predict the total wind turbine capacity in Europe, North America, Asia, and the world. The numerical results show that the novel model is superior to the other forecasting models and has a clear advantage for small samples with new characteristic behaviors. In addition, reasonable suggestions are put forward from the standpoint of practitioners and governments, which have high potential to advance the sustainable improvement of clean energy production in the future.

Submitted 19 October, 2019; originally announced October 2019.

Journal ref: Journal of Cleaner Production, Volume 244, 2020, 118573
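The model in the entry above builds on the grey modeling technique. As a reference point only, the sketch below implements the classic GM(1,1) model, not the paper's information priority accumulated model with time power: it accumulates the series, forms background values with the trapezoidal rule, fits the development coefficient a and grey input b by least squares, and inverts the accumulation to forecast. The function names gm11_fit and gm11_predict are hypothetical, and the particle swarm parameter search described in the abstract is omitted.

```python
import math
from typing import List, Sequence, Tuple


def gm11_fit(x0: Sequence[float]) -> Tuple[float, float]:
    """Estimate the development coefficient a and grey input b of a classic GM(1,1)."""
    if len(x0) < 3:
        raise ValueError("GM(1,1) needs at least 3 observations")
    # Accumulated generating operation (AGO): running sums of the raw series.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values by the trapezoidal rule: z(k) = (x1(k) + x1(k-1)) / 2.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(x0[1:])
    # Grey equation x0(k) + a*z(k) = b, i.e. y = slope*z + intercept with
    # slope = -a and intercept = b; solve it by ordinary least squares.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    a, b = -slope, intercept
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; GM(1,1) is degenerate")
    return a, b


def gm11_predict(x0: Sequence[float], steps: int = 0) -> List[float]:
    """Fitted values for the observed points followed by `steps` forecasts."""
    a, b = gm11_fit(x0)

    def x1_hat(k: int) -> float:
        # Time response of dx1/dt + a*x1 = b with x1(0) = x0[0] (0-based k).
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: consecutive differences of the accumulated predictions.
    return [float(x0[0])] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, len(x0) + steps)]
```

On an exactly geometric series such as [100, 110, 121, 133.1, 146.41] (ratio r = 1.1), the least-squares fit recovers a = -2(r - 1)/(r + 1), and the fitted values stay well under 1% from the data; the residual error comes from the trapezoidal background values, which is the kind of approximation the paper's new formulas revisit.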

arXiv:1909.01541 [pdf, other]

Graph Transfer Learning via Adversarial Domain Adaptation with Graph Convolution

Authors: Quanyu Dai, Xiao-Ming Wu, Jiaren Xiao, Xiao Shen, Dan Wang

Abstract: This paper studies the problem of cross-network node classification to overcome the insufficiency of labeled data in a single network. It aims to leverage the label information in a partially labeled source network to assist node classification in a completely unlabeled or partially labeled target network. Existing methods for single-network learning cannot solve this problem because of the domain shift across networks. Some multi-network learning methods rely heavily on the existence of cross-network connections and are thus inapplicable to this problem. To tackle this problem, we propose a novel graph transfer learning framework, AdaGCN, which leverages the techniques of adversarial domain adaptation and graph convolution. It consists of two components: a semi-supervised learning component and an adversarial domain adaptation component. The former aims to learn class-discriminative node representations with the given label information of the source and target networks, while the latter helps mitigate the distribution divergence between the source and target domains to facilitate knowledge transfer. Extensive empirical evaluations on real-world datasets show that AdaGCN can successfully transfer class information with a low label rate on the source network and a substantial divergence between the source and target domains. The source code for reproducing the experimental results is available at https://github.com/daiquanyu/AdaGCN.

Submitted 30 July, 2022; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering

arXiv:1908.01718 [pdf]

Discovery of Bias and Strategic Behavior in Crowdsourced Performance Assessment

Authors: Yifei Huang, Matt Shum, Xi Wu, Jason Zezhong Xiao

Abstract: With the industry trend of shifting from a traditional hierarchical approach to a flatter management structure, crowdsourced performance assessment has gained mainstream popularity. One fundamental challenge of crowdsourced performance assessment is the risk that personal interest can distort facts, especially when the system is used to determine merit pay or promotion. In this paper, we develop a method to identify bias and strategic behavior in crowdsourced performance assessment, using a rich dataset collected from a professional service firm in China. We find a pattern of "discriminatory generosity" in peer evaluation, where raters downgrade peer coworkers who have passed objective promotion requirements while overrating peer coworkers who have not yet passed them. This introduces two types of bias: the first is aimed against more competent competitors, and the second favors less eligible peers, which can serve as a mask for the first. This paper also aims to bring the perspective of fairness-aware data mining to talent and management computing. Historical decision records, such as performance ratings, often contain subjective judgments that are prone to bias and strategic behavior. For practitioners of predictive talent analytics, it is important to investigate the potential bias and strategic behavior underlying historical decision records.

Submitted 12 October, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: International Workshop of Talent and Management Computing, KDD 2019

arXiv:1905.05987 [pdf, ps, other]

EasiCS: the objective and fine-grained classification method of cervical spondylosis dysfunction

Authors: Nana Wang, Li Cui, Xi Huang, Yingcong Xiang, **g Xiao, Yi Rao

Abstract: Precise diagnosis is of great significance in developing precise treatment plans to restore neck function and reduce the burden posed by cervical spondylosis (CS). However, currently available neck function assessment methods are subjective and coarse-grained. In this paper, based on the relationship among CS, cervical structure, cervical vertebra function, and surface electromyography (sEMG), we seek to develop clustering algorithms on an sEMG dataset collected in a clinical environment and use them to implement the division. We propose and develop the EasiCS framework, which consists of dimension reduction, the clustering algorithm EasiSOM, and the spectral clustering algorithm EasiSC. Overall, EasiCS outperforms seven commonly used algorithms.

Submitted 15 May, 2019; originally announced May 2019.

arXiv:1812.04912 [pdf, ps, other]

EasiCSDeep: A deep learning model for Cervical Spondylosis Identification using surface electromyography signal

Authors: Nana Wang, Li Cui, Xi Huang, Yingcong Xiang, **g Xiao

Abstract: Cervical spondylosis (CS) is a common chronic disease that affects up to two-thirds of the population and poses a serious burden on individuals and society. Early identification has significant value in improving the cure rate and reducing costs. However, the pathology is complex, and mild symptoms increase the difficulty of diagnosis, especially in the early stage. Moreover, the time and cost of hospital medical services reduce attention to CS identification. Thus, a convenient, low-cost intelligent CS identification method is urgently needed. In this paper, we present an intelligent deep learning method to identify CS using the surface electromyography (sEMG) signal. To cope with the complexity, high dimensionality, and weak usability of the sEMG signal, we propose and develop a multi-channel EasiCSDeep algorithm based on a convolutional neural network, which consists of feature extraction, spatial relationship representation, and a classification algorithm. To the best of our knowledge, EasiCSDeep is the first effort to employ deep learning and sEMG data to identify CS. Compared with the previous state-of-the-art algorithm, our algorithm achieves a significant improvement.

Submitted 12 December, 2018; originally announced December 2018.

Showing 1–22 of 22 results for author: Xiao, J