Search | arXiv e-print repository

Multi-Agent Based Transfer Learning for Data-Driven Air Traffic Applications

Authors: Chuhao Deng, Hong-Cheol Choi, Hyunsang Park, Inseok Hwang

Abstract: Research in develo** data-driven models for Air Traffic Management (ATM) has gained a tremendous interest in recent years. However, data-driven models are known to have long training time and require large datasets to achieve good performance. To address the two issues, this paper proposes a Multi-Agent Bidirectional Encoder Representations from Transformers (MA-BERT) model that fully considers… ▽ More Research in develo** data-driven models for Air Traffic Management (ATM) has gained a tremendous interest in recent years. However, data-driven models are known to have long training time and require large datasets to achieve good performance. To address the two issues, this paper proposes a Multi-Agent Bidirectional Encoder Representations from Transformers (MA-BERT) model that fully considers the multi-agent characteristic of the ATM system and learns air traffic controllers' decisions, and a pre-training and fine-tuning transfer learning framework. By pre-training the MA-BERT on a large dataset from a major airport and then fine-tuning it to other airports and specific air traffic applications, a large amount of the total training time can be saved. In addition, for newly adopted procedures and constructed airports where no historical data is available, this paper shows that the pre-trained MA-BERT can achieve high performance by updating regularly with little data. The proposed transfer learning framework and MA-BERT are tested with the automatic dependent surveillance-broadcast data recorded in 3 airports in South Korea in 2019. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Comments: 12 pages, 8 figures, submitted for IEEE Transactions on Intelligent Transportation System

arXiv:2306.17378 [pdf, other]

Global Optimality in Bivariate Gradient-based DAG Learning

Authors: Chang Deng, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

Abstract: Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other no… ▽ More Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting. △ Less

Submitted 29 June, 2023; originally announced June 2023.

Comments: 39 pages, 13 figures

arXiv:2306.06895 [pdf, other]

MPPN: Multi-Resolution Periodic Pattern Network For Long-Term Time Series Forecasting

Authors: Xing Wang, Zhendong Wang, Kexin Yang, Junlan Feng, Zhiyan Song, Chao Deng, Lin zhu

Abstract: Long-term time series forecasting plays an important role in various real-world scenarios. Recent deep learning methods for long-term series forecasting tend to capture the intricate patterns of time series by decomposition-based or sampling-based methods. However, most of the extracted patterns may include unpredictable noise and lack good interpretability. Moreover, the multivariate series forec… ▽ More Long-term time series forecasting plays an important role in various real-world scenarios. Recent deep learning methods for long-term series forecasting tend to capture the intricate patterns of time series by decomposition-based or sampling-based methods. However, most of the extracted patterns may include unpredictable noise and lack good interpretability. Moreover, the multivariate series forecasting methods usually ignore the individual characteristics of each variate, which may affecting the prediction accuracy. To capture the intrinsic patterns of time series, we propose a novel deep learning network architecture, named Multi-resolution Periodic Pattern Network (MPPN), for long-term series forecasting. We first construct context-aware multi-resolution semantic units of time series and employ multi-periodic pattern mining to capture the key patterns of time series. Then, we propose a channel adaptive module to capture the perceptions of multivariate towards different patterns. In addition, we present an entropy-based method for evaluating the predictability of time series and providing an upper bound on the prediction accuracy before forecasting. Our experimental evaluation on nine real-world benchmarks demonstrated that MPPN significantly outperforms the state-of-the-art Transformer-based, decomposition-based and sampling-based methods for long-term series forecasting. △ Less

Submitted 12 June, 2023; originally announced June 2023.

Comments: 21 pages

arXiv:2305.17277 [pdf, other]

Optimizing NOTEARS Objectives via Topological Swaps

Authors: Chang Deng, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

Abstract: Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimization challenges associated with this class of non-convex prog… ▽ More Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimization challenges associated with this class of non-convex programs. To address these challenges, we propose a bi-level algorithm that leverages the non-convex constraint in a novel way. The outer level of the algorithm optimizes over topological orders by iteratively swap** pairs of nodes within the topological order of a DAG. A key innovation of our approach is the development of an effective method for generating a set of candidate swap** pairs for each iteration. At the inner level, given a topological order, we utilize off-the-shelf solvers that can handle linear constraints. The key advantage of our proposed algorithm is that it is guaranteed to find a local minimum or a KKT point under weaker conditions compared to previous work and finds solutions with lower scores. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in terms of achieving a better score. Additionally, our method can also be used as a post-processing algorithm to significantly improve the score of other algorithms. Code implementing the proposed method is available at https://github.com/duntrain/topo. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 39 pages, 12 figures, ICML 2023

arXiv:2302.03832 [pdf, other]

Group LASSO Variable Selection Method for Treatment Effect Generalization

Authors: Chuyu Deng, Brandon Koch, David M. Vock, Joseph S. Koopmeiners

Abstract: Often in public health, we are interested in the treatment effect of an intervention on a population that is systemically different from the experimental population the intervention was originally evaluated in. When treatment effect heterogeneity is present in a randomized controlled trial, generalizing the treatment effect from this experimental population to a target population of interest is a… ▽ More Often in public health, we are interested in the treatment effect of an intervention on a population that is systemically different from the experimental population the intervention was originally evaluated in. When treatment effect heterogeneity is present in a randomized controlled trial, generalizing the treatment effect from this experimental population to a target population of interest is a complex problem; it requires the characterization of both the treatment effect heterogeneity and the baseline covariate mismatch between the two populations. Despite the importance of this problem, the literature for variable selection in this context is limited. In this paper, we present a Group LASSO-based approach to variable selection in the context of treatment effect generalization, with an application to generalize the treatment effect of very low nicotine content cigarettes to the overall U.S. smoking population. △ Less

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2111.08741 [pdf, other]

Practical Guidance on Modeling Choices for the Virtual Twins Method

Authors: Chuyu Deng, David M. Vock, Dana M. Carroll, Jeffrey A. Boatman, Dorothy K. Hatsukami, Ning Leng, Joseph S. Koopmeiners

Abstract: Individuals can vary drastically in their response to the same treatment, and this heterogeneity has driven the push for more personalized medicine. Accurate and interpretable methods to identify subgroups that respond to the treatment differently from the population average are necessary to achieving this goal. The Virtual Twins (VT) method by Foster et al. \cite{Foster} is a highly cited and imp… ▽ More Individuals can vary drastically in their response to the same treatment, and this heterogeneity has driven the push for more personalized medicine. Accurate and interpretable methods to identify subgroups that respond to the treatment differently from the population average are necessary to achieving this goal. The Virtual Twins (VT) method by Foster et al. \cite{Foster} is a highly cited and implemented method for subgroup identification because of its intuitive framework. However, since its initial publication, many researchers still rely heavily on the authors' initial modeling suggestions without examining newer and more powerful alternatives. This leaves much of the potential of the method untapped. We comprehensively evaluate the performance of VT with different combinations of methods in each of its component steps, under a collection of linear and nonlinear problem settings. Our simulations show that the method choice for step 1 of VT is highly influential in the overall accuracy of the method, and Superlearner is a promising choice. We illustrate our findings by using VT to identify subgroups with heterogeneous treatment effects in a randomized, double-blind nicotine reduction trial. △ Less

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2105.08230 [pdf, other]

High-Dimensional Sparse Single-Index Regression Via Hilbert-Schmidt Independence Criterion

Authors: Runxiong Wu, Chang Deng, Xin Chen

Abstract: Hilbert-Schmidt Independence Criterion (HSIC) has recently been used in the field of single-index models to estimate the directions. Compared with some other well-established methods, it requires relatively weaker conditions. However, its performance has not yet been studied in the high-dimensional scenario, where the number of covariates is much larger than the sample size. In this article, we pr… ▽ More Hilbert-Schmidt Independence Criterion (HSIC) has recently been used in the field of single-index models to estimate the directions. Compared with some other well-established methods, it requires relatively weaker conditions. However, its performance has not yet been studied in the high-dimensional scenario, where the number of covariates is much larger than the sample size. In this article, we propose a new efficient sparse estimate in HSIC based single-index model. This new method estimates the subspace spanned by the linear combinations of the covariates directly and performs variable selection simultaneously. Due to the non-convexity of the objective function, we use a majorize-minimize approach together with the linearized alternating direction method of multipliers algorithm to solve the optimization problem. The algorithm does not involve the inverse of the covariance matrix and therefore can handle the large p small n scenario naturally. Through extensive simulation studies and a real data analysis, we show our proposal is efficient and effective in the high-dimensional setting. The Matlab codes for this method are available online. △ Less

Submitted 17 May, 2021; originally announced May 2021.

Comments: 24 pages, 4 figures

arXiv:2010.10711 [pdf, other]

On the Global Self-attention Mechanism for Graph Convolutional Networks

Authors: Chen Wang, Chengyuan Deng

Abstract: Applying Global Self-attention (GSA) mechanism over features has achieved remarkable success on Convolutional Neural Networks (CNNs). However, it is not clear if Graph Convolutional Networks (GCNs) can similarly benefit from such a technique. In this paper, inspired by the similarity between CNNs and GCNs, we study the impact of the Global Self-attention mechanism on GCNs. We find that consistent… ▽ More Applying Global Self-attention (GSA) mechanism over features has achieved remarkable success on Convolutional Neural Networks (CNNs). However, it is not clear if Graph Convolutional Networks (GCNs) can similarly benefit from such a technique. In this paper, inspired by the similarity between CNNs and GCNs, we study the impact of the Global Self-attention mechanism on GCNs. We find that consistent with the intuition, the GSA mechanism allows GCNs to capture feature-based vertex relations regardless of edge connections; As a result, the GSA mechanism can introduce extra expressive power to the GCNs. Furthermore, we analyze the impacts of the GSA mechanism on the issues of overfitting and over-smoothing. We prove that the GSA mechanism can alleviate both the overfitting and the over-smoothing issues based on some recent technical developments. Experiments on multiple benchmark datasets illustrate both superior expressive power and less significant overfitting and over-smoothing problems for the GSA-augmented GCNs, which corroborate the intuitions and the theoretical results. △ Less

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2008.06233 [pdf, other]

Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning

Authors: Bin Gu, An Xu, Zhouyuan Huo, Cheng Deng, Heng Huang

Abstract: The privacy-preserving federated learning for vertically partitioned data has shown promising results as the solution of the emerging multi-party joint modeling application, in which the data holders (such as government branches, private finance and e-business companies) collaborate throughout the learning process rather than relying on a trusted third party to hold data. However, existing federat… ▽ More The privacy-preserving federated learning for vertically partitioned data has shown promising results as the solution of the emerging multi-party joint modeling application, in which the data holders (such as government branches, private finance and e-business companies) collaborate throughout the learning process rather than relying on a trusted third party to hold data. However, existing federated learning algorithms for vertically partitioned data are limited to synchronous computation. To improve the efficiency when the unbalanced computation/communication resources are common among the parties in the federated learning system, it is essential to develop asynchronous training algorithms for vertically partitioned data while kee** the data privacy. In this paper, we propose an asynchronous federated SGD (AFSGD-VP) algorithm and its SVRG and SAGA variants on the vertically partitioned data. Moreover, we provide the convergence analyses of AFSGD-VP and its SVRG and SAGA variants under the condition of strong convexity. We also discuss their model privacy, data privacy, computational complexities and communication costs. To the best of our knowledge, AFSGD-VP and its SVRG and SAGA variants are the first asynchronous federated learning algorithms for vertically partitioned data. Extensive experimental results on a variety of vertically partitioned datasets not only verify the theoretical results of AFSGD-VP and its SVRG and SAGA variants, but also show that our algorithms have much higher efficiency than the corresponding synchronous algorithms. △ Less

Submitted 14 August, 2020; originally announced August 2020.

arXiv:2006.15296 [pdf, other]

doi 10.1109/ACCESS.2022.3142174

Simulation and Optimisation of Air Conditioning Systems using Machine Learning

Authors: Rakshitha Godahewa, Chang Deng, Arnaud Prouzeau, Christoph Bergmeir

Abstract: In building management, usually static thermal setpoints are used to maintain the inside temperature of a building at a comfortable level irrespective of its occupancy. This strategy can cause a massive amount of energy wastage and therewith increase energy related expenses. This paper explores how to optimise the setpoints used in a particular room during its unoccupied periods using machine lear… ▽ More In building management, usually static thermal setpoints are used to maintain the inside temperature of a building at a comfortable level irrespective of its occupancy. This strategy can cause a massive amount of energy wastage and therewith increase energy related expenses. This paper explores how to optimise the setpoints used in a particular room during its unoccupied periods using machine learning approaches. We introduce a deep-learning model based on Recurrent Neural Networks (RNN) that can predict the temperatures of a future period directly where a particular room is unoccupied and by using these predicted temperatures, we define the optimal thermal setpoints to be used inside the room during the unoccupied period. We show that RNNs are particularly suitable for this learning task as they enable us to learn across many relatively short series, which is necessary to focus on particular operation modes of the air conditioning (AC) system. We evaluate the prediction accuracy of our RNN model against a set of state-of-the-art models and are able to outperform those by a large margin. We furthermore analyse the usage of our RNN model in optimising the energy consumption of an AC system in a real-world scenario using the temperature data from a university lecture theatre. Based on the simulations, we show that our RNN model can lead to savings around 20% compared with the traditional temperature controlling model that does not use optimisation techniques. △ Less

Submitted 27 June, 2020; originally announced June 2020.

Report number: Volume 10

Journal ref: IEEE Access 2022

arXiv:2002.04813 [pdf, other]

Deep Multi-Task Augmented Feature Learning via Hierarchical Graph Neural Network

Authors: Pengxin Guo, Chang Deng, Linjie Xu, Xiaonan Huang, Yu Zhang

Abstract: Deep multi-task learning attracts much attention in recent years as it achieves good performance in many applications. Feature learning is important to deep multi-task learning for sharing common information among tasks. In this paper, we propose a Hierarchical Graph Neural Network (HGNN) to learn augmented features for deep multi-task learning. The HGNN consists of two-level graph neural networks… ▽ More Deep multi-task learning attracts much attention in recent years as it achieves good performance in many applications. Feature learning is important to deep multi-task learning for sharing common information among tasks. In this paper, we propose a Hierarchical Graph Neural Network (HGNN) to learn augmented features for deep multi-task learning. The HGNN consists of two-level graph neural networks. In the low level, an intra-task graph neural network is responsible of learning a powerful representation for each data point in a task by aggregating its neighbors. Based on the learned representation, a task embedding can be generated for each task in a similar way to max pooling. In the second level, an inter-task graph neural network updates task embeddings of all the tasks based on the attention mechanism to model task relations. Then the task embedding of one task is used to augment the feature representation of data points in this task. Moreover, for classification tasks, an inter-class graph neural network is introduced to conduct similar operations on a finer granularity, i.e., the class level, to generate class embeddings for each class in all the tasks use class embeddings to augment the feature representation. The proposed feature augmentation strategy can be used in many deep multi-task learning models. we analyze the HGNN in terms of training and generalization losses. Experiments on real-world datastes show the significant performance improvement when using this strategy. △ Less

Submitted 12 February, 2020; originally announced February 2020.

arXiv:2002.01927 [pdf, other]

doi 10.1038/s41467-021-27713-7

Self-Directed Online Machine Learning for Topology Optimization

Authors: Changyu Deng, Yizhou Wang, Can Qin, Yun Fu, Wei Lu

Abstract: Topology optimization by optimally distributing materials in a given domain requires non-gradient optimizers to solve highly complicated problems. However, with hundreds of design variables or more involved, solving such problems would require millions of Finite Element Method (FEM) calculations whose computational cost is huge and impractical. Here we report Self-directed Online Learning Optimiza… ▽ More Topology optimization by optimally distributing materials in a given domain requires non-gradient optimizers to solve highly complicated problems. However, with hundreds of design variables or more involved, solving such problems would require millions of Finite Element Method (FEM) calculations whose computational cost is huge and impractical. Here we report Self-directed Online Learning Optimization (SOLO) which integrates Deep Neural Network (DNN) with FEM calculations. A DNN learns and substitutes the objective as a function of design variables. A small number of training data is generated dynamically based on the DNN's prediction of the optimum. The DNN adapts to the new training data and gives better prediction in the region of interest until convergence. The optimum predicted by the DNN is proved to converge to the true global optimum through iterations. Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization. It reduced the computational time by 2 ~ 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments. This approach enables solving large multi-dimensional optimization problems. △ Less

Submitted 25 January, 2022; v1 submitted 4 February, 2020; originally announced February 2020.

arXiv:1911.11984 [pdf, other]

SAG-VAE: End-to-end Joint Inference of Data Representations and Feature Relations

Authors: Chen Wang, Chengyuan Deng, Vladimir Ivanov

Abstract: Variational Autoencoders (VAEs) are powerful in data representation inference, but it cannot learn relations between features with its vanilla form and common variations. The ability to capture relations within data can provide the much needed inductive bias necessary for building more robust Machine Learning algorithms with more interpretable results. In this paper, inspired by recent advances in… ▽ More Variational Autoencoders (VAEs) are powerful in data representation inference, but it cannot learn relations between features with its vanilla form and common variations. The ability to capture relations within data can provide the much needed inductive bias necessary for building more robust Machine Learning algorithms with more interpretable results. In this paper, inspired by recent advances in relational learning using Graph Neural Networks, we propose the Self-Attention Graph Variational AutoEncoder (SAG-VAE) network which can simultaneously learn feature relations and data representations in an end-to-end manner. SAG-VAE is trained by jointly inferring the posterior distribution of two types of latent variables, which denote the data representation and a shared graph structure, respectively. Furthermore, we introduce a novel self-attention graph network that improves the generative capabilities of SAG-VAE by parameterizing the generative distribution allowing SAG-VAE to generate new data via graph convolution, while still trainable via backpropagation. A learnable relational graph representation enhances SAG-VAE's robustness to perturbation and noise, while also providing deeper intuition into model performance. Experiments based on graphs show that SAG-VAE is capable of approximately retrieving edges and links between nodes based entirely on feature observations. Finally, results on image data illustrate that SAG-VAE is fairly robust against perturbations in image reconstruction and sampling. △ Less

Submitted 22 July, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

arXiv:1910.02370 [pdf, other]

GraphZoom: A multi-level spectral approach for accurate and scalable graph embedding

Authors: Chenhui Deng, Zhiqiang Zhao, Yongyu Wang, Zhiru Zhang, Zhuo Feng

Abstract: Graph embedding techniques have been increasingly deployed in a multitude of different applications that involve learning on non-Euclidean data. However, existing graph embedding models either fail to incorporate node attribute information during training or suffer from node attribute noise, which compromises the accuracy. Moreover, very few of them scale to large graphs due to their high computat… ▽ More Graph embedding techniques have been increasingly deployed in a multitude of different applications that involve learning on non-Euclidean data. However, existing graph embedding models either fail to incorporate node attribute information during training or suffer from node attribute noise, which compromises the accuracy. Moreover, very few of them scale to large graphs due to their high computational complexity and memory usage. In this paper we propose GraphZoom, a multi-level framework for improving both accuracy and scalability of unsupervised graph embedding algorithms. GraphZoom first performs graph fusion to generate a new graph that effectively encodes the topology of the original graph and the node attribute information. This fused graph is then repeatedly coarsened into much smaller graphs by merging nodes with high spectral similarities. GraphZoom allows any existing embedding methods to be applied to the coarsened graph, before it progressively refine the embeddings obtained at the coarsest level to increasingly finer graphs. We have evaluated our approach on a number of popular graph datasets for both transductive and inductive tasks. Our experiments show that GraphZoom can substantially increase the classification accuracy and significantly accelerate the entire graph embedding process by up to 40.8x, when compared to the state-of-the-art unsupervised embedding methods. △ Less

Submitted 17 February, 2020; v1 submitted 6 October, 2019; originally announced October 2019.

Comments: Published as a conference paper at ICLR 2020

Journal ref: International Conference on Learning Representations, ICLR 2020

arXiv:1908.03595 [pdf, other]

Adaptive Ensemble of Classifiers with Regularization for Imbalanced Data Classification

Authors: Chen Wang, Chengyuan Deng, Zhoulu Yu, Dafeng Hui, Xiaofeng Gong, Ruisen Luo

Abstract: The dynamic ensemble selection of classifiers is an effective approach for processing label-imbalanced data classifications. However, such a technique is prone to overfitting, owing to the lack of regularization methods and the dependence of the aforementioned technique on local geometry. In this study, focusing on binary imbalanced data classification, a novel dynamic ensemble method, namely adap… ▽ More The dynamic ensemble selection of classifiers is an effective approach for processing label-imbalanced data classifications. However, such a technique is prone to overfitting, owing to the lack of regularization methods and the dependence of the aforementioned technique on local geometry. In this study, focusing on binary imbalanced data classification, a novel dynamic ensemble method, namely adaptive ensemble of classifiers with regularization (AER), is proposed, to overcome the stated limitations. The method solves the overfitting problem through implicit regularization. Specifically, it leverages the properties of stochastic gradient descent to obtain the solution with the minimum norm, thereby achieving regularization; furthermore, it interpolates the ensemble weights by exploiting the global geometry of data to further prevent overfitting. According to our theoretical proofs, the seemingly complicated AER paradigm, in addition to its regularization capabilities, can actually reduce the asymptotic time and memory complexities of several other algorithms. We evaluate the proposed AER method on seven benchmark imbalanced datasets from the UCI machine learning repository and one artificially generated GMM-based dataset with five variations. The results show that the proposed algorithm outperforms the major existing algorithms based on multiple metrics in most cases, and two hypothesis tests (McNemar's and Wilcoxon tests) verify the statistical significance further. In addition, the proposed method has other preferred properties such as special advantages in dealing with highly imbalanced data, and it pioneers the research on the regularization for dynamic ensemble methods. △ Less

Submitted 5 November, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

Comments: Major revision; Change of authors due to contributions

arXiv:1908.01672 [pdf, other]

Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost

Authors: Chen Wang, Chengyuan Deng, Suzhen Wang

Abstract: The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though a small-scale program in terms of size, the package is, to the best of the authors' knowledge, the first of its kind which provides an integrated implementation for the two losses on XGBoost and brings a gen… ▽ More The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though a small-scale program in terms of size, the package is, to the best of the authors' knowledge, the first of its kind which provides an integrated implementation for the two losses on XGBoost and brings a general-purpose extension on XGBoost for label-imbalanced scenarios. In this paper, the design and usage of the package are described with exemplar code listings, and its convenience to be integrated into Python-driven Machine Learning projects is illustrated. Furthermore, as the first- and second-order derivatives of the loss functions are essential for the implementations, the algebraic derivation is discussed and it can be deemed as a separate algorithmic contribution. The performances of the algorithms implemented in the package are empirically evaluated on Parkinson's disease classification data set, and multiple state-of-the-art performances have been observed. Given the scalable nature of XGBoost, the package has great potentials to be applied to real-life binary classification tasks, which are usually of large-scale and label-imbalanced. △ Less

Submitted 22 August, 2021; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: Updated version, published in Pattern Recognition Letters 136(2020):190-197

Journal ref: Pattern Recognition Letters 136 (2020): 190-197

arXiv:1904.13113 [pdf, other]

Deep Spectral Clustering using Dual Autoencoder Network

Authors: Xu Yang, Cheng Deng, Feng Zheng, Junchi Yan, Wei Liu

Abstract: The clustering methods have recently absorbed even-increasing attention in learning and vision. Deep clustering combines embedding and clustering together to obtain optimal embedding subspace for clustering, which can be more effective compared with conventional clustering methods. In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering. We first d… ▽ More The clustering methods have recently absorbed even-increasing attention in learning and vision. Deep clustering combines embedding and clustering together to obtain optimal embedding subspace for clustering, which can be more effective compared with conventional clustering methods. In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering. We first devise a dual autoencoder network, which enforces the reconstruction constraint for the latent representations and their noisy versions, to embed the inputs into a latent space for clustering. As such the learned latent representations can be more robust to noise. Then the mutual information estimation is utilized to provide more discriminative information from the inputs. Furthermore, a deep spectral clustering method is applied to embed the latent representations into the eigenspace and subsequently clusters them, which can fully exploit the relationship between inputs to achieve optimal clustering results. Experimental results on benchmark datasets show that our method can significantly outperform state-of-the-art clustering approaches. △ Less

Submitted 30 April, 2019; originally announced April 2019.

arXiv:1611.03981 [pdf, other]

Dual Teaching: A Practical Semi-supervised Wrapper Method

Authors: Fuqaing Liu, Chenwei Deng, Fukun Bi, Yiding Yang

Abstract: Semi-supervised wrapper methods are concerned with building effective supervised classifiers from partially labeled data. Though previous works have succeeded in some fields, it is still difficult to apply semi-supervised wrapper methods to practice because the assumptions those methods rely on tend to be unrealistic in practice. For practical use, this paper proposes a novel semi-supervised wrapp… ▽ More Semi-supervised wrapper methods are concerned with building effective supervised classifiers from partially labeled data. Though previous works have succeeded in some fields, it is still difficult to apply semi-supervised wrapper methods to practice because the assumptions those methods rely on tend to be unrealistic in practice. For practical use, this paper proposes a novel semi-supervised wrapper method, Dual Teaching, whose assumptions are easy to set up. Dual Teaching adopts two external classifiers to estimate the false positives and false negatives of the base learner. Only if the recall of every external classifier is greater than zero and the sum of the precision is greater than one, Dual Teaching will train a base learner from partially labeled data as effectively as the fully-labeled-data-trained classifier. The effectiveness of Dual Teaching is proved in both theory and practice. △ Less

Submitted 12 November, 2016; originally announced November 2016.

Comments: 7 pages, 4 figures, 1 table

arXiv:1607.02804 [pdf, other]

Estimating the number of species to attain sufficient representation in a random sample

Authors: Chao Deng, Timothy Daley, Peter Calabrese, Jie Ren, Andrew D. Smith

Abstract: The statistical problem of using an initial sample to estimate the number of species in a larger sample has found important applications in fields far removed from ecology. Here we address the general problem of estimating the number of species that will be represented by at least a number r of observations in a future sample. The number r indicates species with sufficient observations, which are… ▽ More The statistical problem of using an initial sample to estimate the number of species in a larger sample has found important applications in fields far removed from ecology. Here we address the general problem of estimating the number of species that will be represented by at least a number r of observations in a future sample. The number r indicates species with sufficient observations, which are commonly used as a necessary condition for any robust statistical inference. We derive a procedure to construct consistent estimators that apply universally for a given population: once constructed, they can be evaluated as a simple function of r. Our approach is based on a relation between the number of species represented at least r times and the higher derivatives of the expected number of species discovered per unit of time. Combining this relation with a rational function approximation, we propose nonparametric estimators that are accurate for both large values of r and long-range extrapolations. We further show that our estimators retain asymptotic behaviors that are essential for applications on large-scale datasets. We evaluate the performance of this approach by both simulation and real data applications for inferences of the vocabulary of Shakespeare and Dickens, the topology of a Twitter social network, and molecular diversity in DNA sequencing data. △ Less

Submitted 15 May, 2018; v1 submitted 10 July, 2016; originally announced July 2016.

Comments: 19 pages, 5 figures, 3 tables

Showing 1–19 of 19 results for author: Deng, C