Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network

Authors: Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng

Abstract: Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. Most existing works have focused on horizontal or vertical data distributions, where each client possesses different samples with shared features, or each client fully shares only sample indices, respectively. However, the hybrid scheme is much less studied, even though it is much more common in the real world. Therefore, in this paper, we propose a generalized algorithm, FedGraph, that introduces a graph convolutional neural network to capture feature-sharing information while learning features from a subset of clients. We also develop a simple but effective clustering algorithm that aggregates features produced by the deep neural networks of each client while preserving data privacy.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2402.04417 [pdf, ps, other]

Decentralized Blockchain-based Robust Multi-agent Multi-armed Bandit

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study a robust multi-agent multi-armed bandit problem where multiple clients or participants are distributed on a fully decentralized blockchain, with the possibility of some being malicious. The rewards of arms are homogeneous among the clients, following time-invariant stochastic distributions that are revealed to the participants only when the system is secure enough. The system's objective is to efficiently ensure the cumulative rewards gained by the honest participants. To this end and to the best of our knowledge, we are the first to incorporate advanced techniques from blockchains, as well as novel mechanisms, into the system to design optimal strategies for honest participants. This allows various malicious behaviors and the maintenance of participant privacy. More specifically, we randomly select a pool of validators who have access to all participants, design a brand-new consensus mechanism based on digital signatures for these validators, invent a UCB-based strategy that requires less information from participants through secure multi-party computation, and design the chain-participant interaction and an incentive mechanism to encourage participants' participation. Notably, we are the first to prove the theoretical guarantee of the proposed algorithms by regret analyses in the context of optimality in blockchains. Unlike existing work that integrates blockchains with learning problems such as federated learning which mainly focuses on numerical optimality, we demonstrate that the regret of honest participants is upper bounded by $\log{T}$. This is consistent with the multi-agent multi-armed bandit problem without malicious participants and the robust multi-agent multi-armed bandit problem with purely Byzantine attacks.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 16 pages

arXiv:2311.16135 [pdf, other]

Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics

Authors: Garrett Blum, Ryan Doris, Diego Klabjan, Horacio Espinosa, Ron Szalkowski

Abstract: Stress-strain curves, or more generally, stress functions, are an extremely important characterization of a material's mechanical properties. However, stress functions are often difficult to derive and are narrowly tailored to a specific material. Further, large deformations, high strain-rates, temperature sensitivity, and effect of material parameters compound modeling challenges. We propose a generalized deep neural network approach to model stress as a state function with quantile regression to capture uncertainty. We extend these models to uniaxial impact mechanics using stochastic differential equations to demonstrate a use case and provide a framework for implementing this uncertainty-aware stress function. We provide experiments benchmarking our approach against leading constitutive, machine learning, and transfer learning approaches to stress and impact mechanics modeling on publicly available and newly presented data sets. We also provide a framework to optimize material parameters given multiple competing impact scenarios.

Submitted 19 December, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: Index Terms: Stress, Uncertainty, Impact Mechanics, Deep Learning, Neural Network. 10 pages, 9 figures, 6 tables

arXiv:2311.07027 [pdf, other]

Robust softmax aggregation on blockchain based federated learning with convergence guarantee

Authors: Huiyu Wu, Diego Klabjan

Abstract: Blockchain based federated learning is a distributed learning scheme that allows model training without participants sharing their local data sets, where the blockchain components eliminate the need for a trusted central server compared to traditional Federated Learning algorithms. In this paper we propose a softmax aggregation blockchain based federated learning framework. First, we propose a new blockchain based federated learning architecture that utilizes the well-tested proof-of-stake consensus mechanism on an existing blockchain network to select validators and miners to aggregate the participants' updates and compute the blocks. Second, to ensure the robustness of the aggregation process, we design a novel softmax aggregation method based on approximated population loss values that relies on our specific blockchain architecture. Additionally, we show our softmax aggregation technique converges to the global minimum in the convex setting with non-restricting assumptions. Our comprehensive experiments show that our framework outperforms existing robust aggregation algorithms in various settings by large margins.

Submitted 28 December, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.03745 [pdf, other]

Unsupervised Video Summarization

Authors: Hanqing Li, Diego Klabjan, Jean Utke

Abstract: This paper introduces a new, unsupervised method for automatic video summarization using ideas from generative adversarial networks but eliminating the discriminator, having a simple loss function, and separating training of different parts of the model. An iterative training strategy is also applied by alternately training the reconstructor and the frame selector for multiple iterations. Furthermore, a trainable mask vector is added to the model in summary generation during training and evaluation. The method also includes an unsupervised model selection algorithm. Results from experiments on two public datasets (SumMe and TVSum) and four datasets we created (Soccer, LoL, MLB, and ShortMLB) demonstrate the effectiveness of each component on the model performance, particularly the iterative training strategy. Evaluations and comparisons with the state-of-the-art methods highlight the advantages of the proposed method in performance, stability, and training efficiency.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.02546 [pdf, ps, other]

On the Second-Order Convergence of Biased Policy Gradient Algorithms

Authors: Siqiao Mu, Diego Klabjan

Abstract: Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points. Existing results only consider vanilla policy gradient algorithms with unbiased gradient estimators, but practical implementations under the infinite-horizon discounted reward setting are biased due to finite-horizon sampling. Moreover, actor-critic methods, whose second-order convergence has not yet been established, are also biased due to the critic approximation of the value function. We provide a novel second-order analysis of biased policy gradient methods, including the vanilla gradient estimator computed from Monte-Carlo sampling of trajectories as well as the double-loop actor-critic algorithm, where in the inner loop the critic improves the approximation of the value function via TD(0) learning. Separately, we also establish the convergence of TD(0) on Markov chains irrespective of initial state distribution.

Submitted 13 May, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.10611 [pdf, other]

IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation

Authors: Taejong Joo, Diego Klabjan

Abstract: Reasoning about a model's accuracy on a test sample from its confidence is a central problem in machine learning, being connected to important applications such as uncertainty representation, model selection, and exploration. While these connections have been well-studied in the i.i.d. settings, distribution shifts pose significant challenges to the traditional methods. Therefore, model calibration and model selection remain challenging in the unsupervised domain adaptation problem--a scenario where the goal is to perform well in a distribution shifted domain without labels. In this work, we tackle difficulties coming from distribution shifts by developing a novel importance weighted group accuracy estimator. Specifically, we formulate an optimization problem for finding an importance weight that leads to an accurate group accuracy estimation in the distribution shifted domain with theoretical analyses. Extensive experiments show the effectiveness of group accuracy estimation on model calibration and model selection. Our results emphasize the significance of group accuracy estimation for addressing challenges in unsupervised domain adaptation, as an orthogonal improvement direction with improving transferability of accuracy.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2309.01063 [pdf, other]

Semi-supervised 3D Video Information Retrieval with Deep Neural Network and Bi-directional Dynamic-time Warping Algorithm

Authors: Yintai Ma, Diego Klabjan

Abstract: This paper presents a novel semi-supervised deep learning algorithm for retrieving similar 2D and 3D videos based on visual content. The proposed approach combines the power of deep convolutional and recurrent neural networks with dynamic time warping as a similarity measure. The proposed algorithm is designed to handle large video datasets and retrieve the most related videos to a given inquiry video clip based on its graphical frames and contents. We split both the candidate and the inquiry videos into a sequence of clips and convert each clip to a representation vector using an autoencoder-backed deep neural network. We then calculate a similarity measure between the sequences of embedding vectors using a bi-directional dynamic time-warping method. This approach is tested on multiple public datasets, including CC\_WEB\_VIDEO, Youtube-8m, S3DIS, and Synthia, and showed good results compared to state-of-the-art. The algorithm effectively solves video retrieval tasks and outperforms the benchmarked state-of-the-art deep learning model.

Submitted 2 September, 2023; originally announced September 2023.

Comments: 10 pages, submitted to IEEE Conference Big Data 2023

arXiv:2309.00626 [pdf, other]

An Ensemble Method of Deep Reinforcement Learning for Automated Cryptocurrency Trading

Authors: Shuyang Wang, Diego Klabjan

Abstract: We propose an ensemble method to improve the generalization performance of trading strategies trained by deep reinforcement learning algorithms in a highly stochastic environment of intraday cryptocurrency portfolio trading. We adopt a model selection method that evaluates on multiple validation periods, and propose a novel mixture distribution policy to effectively ensemble the selected models. We provide a distributional view of the out-of-sample performance on granular test periods to demonstrate the robustness of the strategies in evolving market conditions, and retrain the models periodically to address non-stationarity of financial data. Our proposed ensemble method improves the out-of-sample performance compared with the benchmarks of a deep reinforcement learning strategy and a passive investment strategy.

Submitted 27 July, 2023; originally announced September 2023.

arXiv:2308.08046 [pdf, ps, other]

Regret Lower Bounds in Multi-agent Multi-armed Bandit

Authors: Mengfan Xu, Diego Klabjan

Abstract: Multi-armed Bandit motivates methods with provable upper bounds on regret and also the counterpart lower bounds have been extensively studied in this context. Recently, Multi-agent Multi-armed Bandit has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is the overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for adversarial settings, which, however, has a gap with the best known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate a lower bound of order $O(\log T)$ for instance-dependent bounds and $\sqrt{T}$ for mean-gap independent bounds which are tight. Assuming adversarial rewards, we establish a lower bound $O(T^{\frac{2}{3}})$ for connected graphs, thereby bridging the gap between the lower and upper bound in the prior work. We also show a linear regret lower bound when the graph is disconnected. While previous works have explored these settings with upper bounds, we provide a thorough study on tight lower bounds.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 10 pages

arXiv:2307.07529 [pdf, other]

Learning Multiple Coordinated Agents under Directed Acyclic Graph Constraints

Authors: Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang

Abstract: This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints. Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance. Theoretically, we propose a novel surrogate value function based on a MARL model with synthetic rewards (MARLM-SR) and prove that it serves as a lower bound of the optimal value function. Computationally, we propose a practical training algorithm that exploits the new notions of a leader agent and a reward generator and distributor agent to guide the decomposed follower agents to better explore the parameter space in environments with DAG constraints. Empirically, we use four DAG environments, including a real-world scheduling task for one of Intel's high-volume packaging and test factories, to benchmark our method and show that it outperforms the non-DAG approaches.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.00226 [pdf, other]

S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture

Authors: Ye Xue, Diego Klabjan, Jean Utke

Abstract: Multimodal multitask learning has attracted increasing interest in recent years. Singlemodal models have been advancing rapidly and have achieved astonishing results on various tasks across multiple domains. Multimodal learning offers opportunities for further improvements by integrating data from multiple modalities. Many methods have been proposed to learn on a specific type of multimodal data, such as vision and language data. A few of them are designed to handle several modalities and tasks at a time. In this work, we extend and improve Omninet, an architecture that is capable of handling multiple modalities and tasks at a time, by introducing cross-cache attention, integrating patch embeddings for vision inputs, and supporting structured data. The proposed Structured-data-enhanced Omninet (S-Omninet) is a universal model that is capable of learning from structured data of various dimensions effectively with unstructured data through cross-cache attention, which enables interactions among spatial, temporal, and structured features. We also enhance spatial representations in a spatial cache with patch embeddings. We evaluate the proposed model on several multimodal datasets and demonstrate a significant improvement over the baseline, Omninet.

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.05579 [pdf, other]

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional doubly stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.

Submitted 17 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 58 pages, to appear at Advances in Neural Information Processing Systems (NeurIPS 2023 Spotlight)

arXiv:2305.01151 [pdf, ps, other]

Early Classifying Multimodal Sequences

Authors: Alexander Cao, Jean Utke, Diego Klabjan

Abstract: Often pieces of information are received sequentially over time. When did one collect enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems that have recently gained attention as a means of adapting classification to more dynamic environments. However, so far results have been limited to unimodal sequences. In this pilot study, we expand into early classifying multimodal sequences by combining existing methods. We show our new method yields experimental AUC advantages of up to 8.7%.

Submitted 1 May, 2023; originally announced May 2023.

Comments: 7 pages, 5 figures

arXiv:2304.11268 [pdf, other]

Stochastic Scale Invariant Power Iteration for KL-divergence Nonnegative Matrix Factorization

Authors: Cheolmin Kim, Youngseok Kim, Diego Klabjan

Abstract: We introduce a mini-batch stochastic variance-reduced algorithm to solve finite-sum scale invariant problems which cover several examples in machine learning and statistics such as principal component analysis (PCA) and estimation of mixture proportions. The algorithm is a stochastic generalization of scale invariant power iteration, specializing to power iteration when full-batch is used for the PCA problem. In convergence analysis, we show the expectation of the optimality gap decreases at a linear rate under some conditions on the step size, epoch length, batch size and initial iterate. Numerical experiments on the non-negative factorization problem with the Kullback-Leibler divergence using real and synthetic datasets demonstrate that the proposed stochastic approach not only converges faster than state-of-the-art deterministic algorithms but also produces excellent quality robust solutions.

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.03463 [pdf, ps, other]

A Policy for Early Sequence Classification

Authors: Alexander Cao, Jean Utke, Diego Klabjan

Abstract: Sequences are often not received in their entirety at once, but instead, received incrementally over time, element by element. Early predictions yielding a higher benefit, one aims to classify a sequence as accurately as possible, as soon as possible, without having to wait for the last element. For this early sequence classification, we introduce our novel classifier-induced stopping. While previous methods depend on exploration during training to learn when to stop and classify, ours is a more direct, supervised approach. Our classifier-induced stopping achieves an average Pareto frontier AUC increase of 11.8% over multiple experiments.

Submitted 6 April, 2023; originally announced April 2023.

Comments: 12 pages, 6 figures

arXiv:2302.14299 [pdf, other]

Gradient-Boosted Based Structured and Unstructured Learning

Authors: Andrea Treviño Gavito, Diego Klabjan, Jean Utke

Abstract: We propose two frameworks to deal with problem settings in which both structured and unstructured data are available. Structured data problems are best solved by traditional machine learning models such as boosting and tree-based algorithms, whereas deep learning has been widely applied to problems dealing with images, text, audio, and other unstructured data sources. However, for the setting in which both structured and unstructured data are accessible, it is not obvious what the best modeling approach is to enhance performance on both data sources simultaneously. Our proposed frameworks allow joint learning on both kinds of data by integrating the paradigms of boosting models and deep neural networks. The first framework, the boosted-feature-vector deep learning network, learns features from the structured data using gradient boosting and combines them with embeddings from unstructured data via a two-branch deep neural network. Secondly, the two-weak-learner boosting framework extends the boosting paradigm to the setting with two input data sources. We present and compare first- and second-order methods of this framework. Our experimental results on both public and real-world datasets show performance gains achieved by the frameworks over selected baselines by magnitudes of 0.1% - 4.7%.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.14278 [pdf, other]

Multi-Layer Attention-Based Explainability via Transformers for Tabular Data

Authors: Andrea Treviño Gavito, Diego Klabjan, Jean Utke

Abstract: We propose a graph-oriented attention-based explainability method for tabular data. Tasks involving tabular data have been solved mostly using traditional tree-based machine learning models which have the challenges of feature selection and engineering. With that in mind, we consider a transformer architecture for tabular data, which is amenable to explainability, and present a novel way to leverage the self-attention mechanism to provide explanations by taking into account the attention matrices of all heads and layers as a whole. The matrices are mapped to a graph structure where groups of features correspond to nodes and attention values to arcs. By finding the maximum probability paths in the graph, we identify groups of features providing larger contributions to explain the model's predictions. To assess the quality of multi-layer attention-based explanations, we compare them with popular attention-, gradient-, and perturbation-based explainability methods.

Submitted 3 June, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2212.11360 [pdf, other]

Feature Acquisition using Monte Carlo Tree Search

Authors: Sungsoo Lim, Diego Klabjan, Mark Shapiro

Abstract: Feature acquisition algorithms address the problem of acquiring informative features while balancing the costs of acquisition to improve the learning performance of ML models. Previous approaches have focused on calculating the expected utility values of features to determine the acquisition sequences. Other approaches formulated the problem as a Markov Decision Process (MDP) and applied reinforcement learning based algorithms. In comparison to previous approaches, we focus on 1) formulating the feature acquisition problem as an MDP and applying Monte Carlo Tree Search, 2) calculating the intermediary rewards for each acquisition step based on model improvements and acquisition costs and 3) simultaneously optimizing model improvement and acquisition costs with multi-objective Monte Carlo Tree Search. With Proximal Policy Optimization and Deep Q-Network algorithms as benchmarks, we show the effectiveness of our proposed approach with an experimental study.

Submitted 21 December, 2022; originally announced December 2022.

Comments: 13 pages, 7 figures

arXiv:2212.00884 [pdf, other]

Pareto Regret Analyses in Multi-objective Multi-armed Bandit

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality compared to scalarized regrets. We also present new algorithms assuming both with and without prior information of the multi-objective multi-armed bandit setting. The algorithms are shown optimal in adversarial settings and nearly optimal up to a logarithmic factor in stochastic settings simultaneously by our established upper bounds and lower bounds on Pareto regrets. Moreover, the lower bound analyses show that the new regrets are consistent with the existing Pareto regret for stochastic settings and extend an adversarial attack mechanism from bandit to the multi-objective one.

Submitted 30 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: 19 pages; accepted at ICML 2023 and to be published in Proceedings of Machine Learning Research (PMLR)

arXiv:2210.08106 [pdf, other]

A Primal-Dual Algorithm for Hybrid Federated Learning

Authors: Tom Overman, Garrett Blum, Diego Klabjan

Abstract: Very few methods for hybrid federated learning, where clients only hold subsets of both features and samples, exist. Yet, this scenario is extremely important in practical settings. We provide a fast, robust algorithm for hybrid federated learning that hinges on Fenchel Duality. We prove the convergence of the algorithm to the same solution as if the model is trained centrally in a variety of practical regimes. Furthermore, we provide experimental results that demonstrate the performance improvements of the algorithm over a commonly used method in federated learning, FedAvg, and an existing hybrid FL algorithm, HyFEM. We also provide privacy considerations and necessary steps to protect client data.

Submitted 9 February, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted by AAAI 2024. To appear in AAAI proceedings

arXiv:2210.05607 [pdf, other]

Divergence Results and Convergence of a Variance Reduced Version of ADAM

Authors: Ruiqi Wang, Diego Klabjan

Abstract: Stochastic optimization algorithms using exponential moving averages of the past gradients, such as ADAM, RMSProp and AdaGrad, have had great success in many applications, especially in training deep neural networks. ADAM in particular stands out as efficient and robust. Despite its outstanding performance, ADAM has been proved to be divergent for some specific problems. We revisit the divergence question and provide divergent examples under stronger conditions such as in expectation or high probability. Under a variance reduction assumption, we show that an ADAM-type algorithm converges, which means that it is the variance of gradients that causes the divergence of original ADAM. To this end, we propose a variance reduced version of ADAM and provide a convergence analysis of the algorithm. Numerical experiments show that the proposed algorithm has as good performance as ADAM. Our work suggests a new direction for fixing the convergence issues.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2205.00548 [pdf, other]

Large-Scale Multi-Document Summarization with Information Extraction and Compression

Authors: Ning Wang, Han Liu, Diego Klabjan

Abstract: We develop an abstractive summarization framework independent of labeled data for multiple heterogeneous documents. Unlike existing multi-document summarization methods, our framework processes documents telling different stories instead of documents on the same topic. We also enhance an existing sentence fusion method with a uni-directional language model to prioritize fused sentences with higher sentence probability with the goal of increasing readability. Lastly, we construct a total of twelve dataset variations based on CNN/Daily Mail and the NewsRoom datasets, where each document group contains a large and diverse collection of documents to evaluate the performance of our model in comparison with other baseline systems. Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.

Submitted 1 May, 2022; originally announced May 2022.

arXiv:2203.00762 [pdf, other]

Topic Analysis for Text with Side Data

Authors: Biyi Fang, Kripa Rajshekhar, Diego Klabjan

Abstract: Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with side data to tackle these limitations. We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model, which is a four-level hierarchical Bayesian model. In the model, each document is modeled as a finite mixture over an underlying set of topics and each topic is modeled as an infinite mixture over an underlying set of topic probabilities. Furthermore, each topic probability is modeled as a finite mixture over side data. In the context of text, the neural network provides an overview distribution about side data for the corresponding text, which is the prior distribution in LDA to help perform topic grouping. The approach is evaluated on several different datasets, where the model is shown to outperform standard LDA and Dirichlet-multinomial regression (DMR) in terms of topic grouping, model perplexity, classification and comment generation.

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2203.00761 [pdf, other]

Tricks and Plugins to GBM on Images and Sequences

Authors: Biyi Fang, Jean Utke, Diego Klabjan

Abstract: Convolutional neural networks (CNNs) and transformers, which are composed of multiple processing layers and blocks to learn the representations of data with multiple abstract levels, are the most successful machine learning models in recent years. However, millions of parameters and many blocks make them difficult to be trained, and sometimes several days or weeks are required to find an ideal architecture or tune the parameters. Within this paper, we propose a new algorithm for boosting Deep Convolutional Neural Networks (BoostCNN) to combine the merits of dynamic feature selection and BoostCNN, and another new family of algorithms combining boosting and transformers. To learn these new models, we introduce subgrid selection and importance sampling strategies and propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least squares objective function. These algorithms not only reduce the required manual effort for finding an appropriate network architecture but also result in superior performance and lower running time. Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks.

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2201.02923 [pdf, ps, other]

Open-Set Recognition of Breast Cancer Treatments

Authors: Alexander Cao, Diego Klabjan, Yuan Luo

Abstract: Open-set recognition generalizes a classification task by classifying test samples as one of the known classes from training or "unknown." As novel cancer drug cocktails with improved treatment are continually discovered, predicting cancer treatments can naturally be formulated in terms of an open-set recognition problem. Drawbacks, due to modeling unknown samples during training, arise from straightforward implementations of prior work in healthcare open-set learning. Accordingly, we reframe the problem methodology and apply a recent existing Gaussian mixture variational autoencoder model, which achieves state-of-the-art results for image datasets, to breast cancer patient data. Not only do we obtain more accurate and robust classification results, with a 24.5% average F1 increase compared to a recent method, but we also reexamine open-set recognition in terms of deployability to a clinical setting.

Submitted 8 January, 2022; originally announced January 2022.

Comments: 22 pages, 9 figures and 9 tables

arXiv:2111.08577 [pdf, other]

Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation

Authors: Abdolghani Ebrahimi, Diego Klabjan

Abstract: Existing methods of pruning deep neural networks focus on removing unnecessary parameters of the trained network and fine tuning the model afterwards to find a good solution that recovers the initial performance of the trained model. Unlike other works, our method pays special attention to the quality of the solution in the compressed model and inference computation time by pruning neurons. The proposed algorithm directs the parameters of the compressed model toward a flatter solution by exploring the spectral radius of the Hessian, which results in better generalization on unseen data. Moreover, the method does not work with a pre-trained network and performs training and pruning simultaneously. Our result shows that it improves the state-of-the-art results on neuron compression. The method is able to achieve very small networks with small accuracy degradation across different neural network models.

Submitted 16 November, 2021; originally announced November 2021.

Comments: 15 pages, 5 figures

arXiv:2108.07433 [pdf, other]

Aggregation Delayed Federated Learning

Authors: Ye Xue, Diego Klabjan, Yuan Luo

Abstract: Federated learning is a distributed machine learning paradigm where multiple data owners (clients) collaboratively train one machine learning model while keeping data on their own devices. The heterogeneity of client datasets is one of the most important challenges of federated learning algorithms. Studies have found performance reduction with standard federated algorithms, such as FedAvg, on non-IID data. Many existing works on handling non-IID data adopt the same aggregation framework as FedAvg and focus on improving model updates either on the server side or on clients. In this work, we tackle this challenge in a different view by introducing redistribution rounds that delay the aggregation. We perform experiments on multiple tasks and show that the proposed framework significantly improves the performance on non-IID data.

Submitted 17 August, 2021; originally announced August 2021.

arXiv:2107.02845 [pdf, other]

Logit-based Uncertainty Measure in Classification

Authors: Huiyu Wu, Diego Klabjan

Abstract: We introduce a new, reliable, and agnostic uncertainty measure for classification tasks called logit uncertainty. It is based on logit outputs of neural networks. We in particular show that this new uncertainty measure yields a superior performance compared to existing uncertainty measures on different tasks, including out of sample detection and finding erroneous predictions. We analyze theoretical foundations of the measure and explore a relationship with high density regions. We also demonstrate how to test uncertainty using intermediate outputs in training of generative adversarial networks. We propose two potential ways to utilize logit-based uncertainty in real world applications, and show that the uncertainty measure outperforms.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2105.10065 [pdf, other]

A Probabilistic Approach to Neural Network Pruning

Authors: Xin Qian, Diego Klabjan

Abstract: Neural network pruning techniques reduce the number of parameters without compromising predicting ability of a network. Many algorithms have been developed for pruning both over-parameterized fully-connected networks (FCNs) and convolutional neural networks (CNNs), but analytical studies of capabilities and compression ratios of such pruned sub-networks are lacking. We theoretically study the performance of two pruning techniques (random and magnitude-based) on FCNs and CNNs. Given a target network whose weights are independently sampled from appropriate distributions, we provide a universal approach to bound the gap between a pruned and the target network in a probabilistic sense. The results establish that there exist pruned networks with expressive power within any specified bound from the target network.

Submitted 20 May, 2021; originally announced May 2021.

arXiv:2102.11210 [pdf, other]

Non-Convex Optimization with Spectral Radius Regularization

Authors: Adam Sandler, Diego Klabjan, Yuan Luo

Abstract: We develop a regularization method which finds flat minima during the training of deep neural networks and other machine learning models. These minima generalize better than sharp minima, allowing models to better generalize to real-world test data, which may be distributed differently from the training data. Specifically, we propose a method of regularized optimization to reduce the spectral radius of the Hessian of the loss function. Additionally, we derive algorithms to efficiently perform this optimization on neural networks and prove convergence results for these algorithms. Furthermore, we demonstrate that our algorithm works effectively on multiple real-world applications in multiple domains including healthcare. In order to show our models generalize well, we introduce different methods of testing generalizability.

Submitted 22 February, 2021; originally announced February 2021.

Comments: 12 pages

arXiv:2102.00380 [pdf, other]

Classification Models for Partially Ordered Sequences

Authors: Stephanie Ger, Diego Klabjan, Jean Utke

Abstract: Many models such as Long Short Term Memory (LSTMs), Gated Recurrent Units (GRUs) and transformers have been developed to classify time series data with the assumption that events in a sequence are ordered. On the other hand, fewer models have been developed for set based inputs, where order does not matter. There are several use cases where data is given as partially-ordered sequences because of the granularity or uncertainty of time stamps. We introduce a novel transformer based model for such prediction tasks, and benchmark against extensions of existing order invariant models. We also discuss how transition probabilities between events in a sequence can be used to improve model performance. We show that the transformer-based equal-time model outperforms extensions of existing set models on three data sets.

Submitted 31 January, 2021; originally announced February 2021.

arXiv:2101.02561 [pdf, other]

Open Set Domain Adaptation by Extreme Value Theory

Authors: Yiming Xu, Diego Klabjan

Abstract: Common domain adaptation techniques assume that the source domain and the target domain share an identical label space, which is problematic since when target samples are unlabeled we have no knowledge on whether the two domains share the same label space. When this is not the case, the existing methods fail to perform well because the additional unknown classes are also matched with the source domain during adaptation. In this paper, we tackle the open set domain adaptation problem under the assumption that the source and the target label spaces only partially overlap, and the task becomes when the unknown classes exist, how to detect the target unknown classes and avoid aligning them with the source domain. We propose to utilize an instance-level reweighting strategy for domain adaptation where the weights indicate the likelihood of a sample belonging to known classes and to model the tail of the entropy distribution with Extreme Value Theory for unknown class detection. Experiments on conventional domain adaptation datasets show that the proposed method outperforms the state-of-the-art models.

Submitted 22 December, 2020; originally announced January 2021.

arXiv:2012.04759 [pdf, other]

Concept Drift and Covariate Shift Detection Ensemble with Lagged Labels

Authors: Yiming Xu, Diego Klabjan

Abstract: In model serving, having one fixed model during the entire often life-long inference process is usually detrimental to model performance, as data distribution evolves over time, resulting in lack of reliability of the model trained on historical data. It is important to detect changes and retrain the model in time. The existing methods generally have three weaknesses: 1) using only classification error rate as signal, 2) assuming ground truth labels are immediately available after features from samples are received and 3) being unable to decide what data to use to retrain the model when change occurs. We address the first problem by utilizing six different signals to capture a wide range of characteristics of data, and we address the second problem by allowing lag of labels, where labels of corresponding features are received after a lag in time. For the third problem, our proposed method automatically decides what data to use to retrain based on the signals. Extensive experiments on structured and unstructured data for different types of data changes establish that our method consistently outperforms the state-of-the-art methods by a large margin.

Submitted 14 December, 2020; v1 submitted 8 December, 2020; originally announced December 2020.

arXiv:2009.14111 [pdf, other]

Inverse Classification with Limited Budget and Maximum Number of Perturbed Samples

Authors: Jaehoon Koo, Diego Klabjan, Jean Utke

Abstract: Most recent machine learning research focuses on developing new classifiers for the sake of improving classification accuracy. With many well-performing state-of-the-art classifiers available, there is a growing need for understanding interpretability of a classifier necessitated by practical purposes such as to find the best diet recommendation for a diabetes patient. Inverse classification is a post modeling process to find changes in input features of samples to alter the initially predicted class. It is useful in many business applications to determine how to adjust a sample input data such that the classifier predicts it to be in a desired class. In real world applications, a budget on perturbations of samples corresponding to customers or patients is usually considered, and in this setting, the number of successfully perturbed samples is key to increase benefits. In this study, we propose a new framework to solve inverse classification that maximizes the number of perturbed samples subject to per-feature budget limits and favorable classification classes of the perturbed samples. We design algorithms to solve this optimization problem based on gradient methods, stochastic processes, Lagrangian relaxations, and the Gumbel trick. In experiments, we find that our algorithms based on stochastic processes exhibit an excellent performance in different budget settings and they scale well.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.09538 [pdf, other]

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in reinforcement learning either assume costly interactions with an environment or propose algorithms finding potentially low quality local maxima. Motivated by EXP-type methods that integrate multiple agents (experts) for exploration in bandits with the assumption that rewards are bounded, we propose new algorithms, namely EXP4.P and EXP4-RL for exploration in the unbounded reward case, and demonstrate their effectiveness in these new settings. Unbounded rewards introduce challenges as the regret cannot be limited by the number of trials, and selecting suboptimal arms may lead to infinite regret. Specifically, we establish EXP4.P's regret upper bounds in both bounded and unbounded linear and stochastic contextual bandits. Surprisingly, we also find that by including one sufficiently competent expert, EXP4.P can achieve global optimality in the linear case. This unbounded reward result is also applicable to a revised version of EXP3.P in the Multi-armed Bandit scenario. In EXP4-RL, we extend EXP4.P from bandit scenarios to reinforcement learning to incentivize exploration by multiple agents, including one high-performing agent, for both efficiency and excellence. This algorithm has been tested on difficult-to-explore games and shows significant improvements in exploration compared to state-of-the-art.

Submitted 3 May, 2024; v1 submitted 20 September, 2020; originally announced September 2020.

Comments: 40 pages, 8 figures

arXiv:2006.04027 [pdf, ps, other]

Efficient Architecture Search for Continual Learning

Authors: Qiang Gao, Zhipeng Luo, Diego Klabjan

Abstract: Continual learning with neural networks is an important learning framework in AI that aims to learn a sequence of tasks well. However, it is often confronted with three challenges: (1) overcome the catastrophic forgetting problem, (2) adapt the current network to new tasks, and meanwhile (3) control its model complexity. To reach these goals, we propose a novel approach named Continual Learning with Efficient Architecture Search, or CLEAS for short. CLEAS works closely with neural architecture search (NAS) which leverages reinforcement learning techniques to search for the best neural architecture that fits a new task. In particular, we design a neuron-level NAS controller that decides which old neurons from previous tasks should be reused (knowledge transfer), and which new neurons should be added (to learn new knowledge). Such a fine-grained controller allows one to find a very concise architecture that can fit each new task well. Meanwhile, since we do not alter the weights of the reused neurons, we perfectly memorize the knowledge learned from previous tasks. We evaluate CLEAS on numerous sequential classification tasks, and the results demonstrate that CLEAS outperforms other state-of-the-art alternative methods, achieving higher classification accuracy while using simpler neural architectures.

Submitted 9 June, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

Comments: 12 pages, 11 figures

arXiv:2006.02003 [pdf, other]

Open-Set Recognition with Gaussian Mixture Variational Autoencoders

Authors: Alexander Cao, Yuan Luo, Diego Klabjan

Abstract: In inference, open-set classification is to either classify a sample into a known class from training or reject it as an unknown class. Existing deep open-set classifiers train explicit closed-set classifiers, in some cases disjointly utilizing reconstruction, which we find dilutes the latent representation's ability to distinguish unknown classes. In contrast, we train our model to cooperatively learn reconstruction and perform class-based clustering in the latent space. With this, our Gaussian mixture variational autoencoder (GMVAE) achieves more accurate and robust open-set classification results, with an average F1 improvement of 29.5%, through extensive experiments aided by analytical results.

Submitted 2 June, 2020; originally announced June 2020.

Comments: 12 pages including 8 figures and 4 tables, plus 6 pages of supplementary material

arXiv:2004.14203 [pdf, other]

Neural Network Retraining for Model Serving

Authors: Diego Klabjan, Xiaofeng Zhu

Abstract: We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference during model serving. As such, this is a life-long learning process. We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining. If we combine all past and new data it can easily become intractable to retrain the neural network model. On the other hand, if the model is retrained using only new data, it can easily suffer catastrophic forgetting and thus it is paramount to strike the right balance. Moreover, if we retrain all weights of the model every time new data is collected, retraining tends to require too many computing resources. To solve these two issues, we propose a novel retraining model that can select important samples and important weights utilizing multi-armed bandits. To further address forgetting, we propose a new regularization term focusing on synapse and neuron importance. We analyze multiple datasets to document the outcome of the proposed retraining methods. Various experiments demonstrate that our retraining methodologies mitigate the catastrophic forgetting problem while boosting model performance.

Submitted 29 April, 2020; originally announced April 2020.

arXiv:2004.13146 [pdf, other]

The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent

Authors: Xin Qian, Diego Klabjan

Abstract: The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models, in particular deep learning models. We study SGD dynamics under linear regression and two-layer linear networks, with an easy extension to deeper linear networks, by focusing on the variance of the gradients, which is the first study of this nature. In the linear regression case, we show that in each iteration the norm of the gradient is a decreasing function of the mini-batch size $b$ and thus the variance of the stochastic gradient estimator is a decreasing function of $b$. For deep neural networks with $L_2$ loss we show that the variance of the gradient is a polynomial in $1/b$. The results back the important intuition that smaller batch sizes yield lower loss function values, which is a common belief among researchers. The proof techniques exhibit a relationship between stochastic gradient estimators and initial weights, which is useful for further research on the dynamics of SGD. We empirically provide further insights into our results on various datasets and commonly used deep network structures.

Submitted 27 April, 2020; originally announced April 2020.
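
The abstract's claim that the variance of the stochastic gradient estimator decreases with the mini-batch size $b$ can be checked empirically on a toy problem. The following is only an illustrative sketch, not the paper's analysis: it assumes a scalar linear regression model, a synthetic data set, and mini-batches drawn with replacement, and every name in it (`per_sample_grad`, `minibatch_grad_variance`, `variances`) is hypothetical.

```python
import random


def per_sample_grad(w, x, y):
    # Gradient of 0.5 * (w * x - y) ** 2 with respect to the scalar weight w.
    return (w * x - y) * x


def minibatch_grad_variance(data, w, b, trials, rng):
    # Empirical variance of the mini-batch gradient estimator for batch size b.
    grads = []
    for _ in range(trials):
        # Assumption: each mini-batch is sampled with replacement.
        batch = [rng.choice(data) for _ in range(b)]
        grads.append(sum(per_sample_grad(w, x, y) for x, y in batch) / b)
    mean = sum(grads) / trials
    return sum((g - mean) ** 2 for g in grads) / (trials - 1)


rng = random.Random(0)

# Synthetic data (assumption): y = 2x + Gaussian noise.
data = []
for _ in range(200):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

# Estimate the gradient variance at the fixed weight w = 0 for several batch sizes.
variances = {
    b: minibatch_grad_variance(data, w=0.0, b=b, trials=4000, rng=rng)
    for b in (1, 4, 16, 64)
}

for b, v in variances.items():
    print(f"b={b:3d}  empirical gradient variance={v:.4f}")
```

Under sampling with replacement, the variance of the averaged gradient is the per-sample gradient variance divided by $b$, so the printed values should shrink roughly fourfold from one batch size to the next.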

arXiv:2001.07866 [pdf, other]

Keyword-based Topic Modeling and Keyword Selection

Authors: Xingyu Wang, Lida Zhang, Diego Klabjan

Abstract: Certain types of documents, such as tweets, are collected by specifying a set of keywords. As topics of interest change with time, it is beneficial to adjust keywords dynamically. The challenge is that these need to be specified ahead of knowing the forthcoming documents and the underlying topics. The future topics should mimic past topics of interest, yet there should be some novelty in them. We develop a keyword-based topic model that dynamically selects a subset of keywords to be used to collect future documents. The generative process first selects keywords and then the underlying documents based on the specified keywords. The model is trained by using a variational lower bound and stochastic gradient optimization. The inference consists of finding a subset of keywords where, given a subset, the model predicts the underlying topic-word matrix for the unknown forthcoming documents. We compare the keyword topic model against a benchmark model using viral predictions of tweets combined with a topic model. The keyword-based topic model outperforms this sophisticated baseline model by 67%.

Submitted 21 January, 2020; originally announced January 2020.

arXiv:2001.01828 [pdf, other]

doi 10.1145/3336191.3371814

Listwise Learning to Rank by Exploring Unique Ratings

Authors: Xiaofeng Zhu, Diego Klabjan

Abstract: In this paper, we propose new listwise learning-to-rank models that mitigate the shortcomings of existing ones. Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations. (1) Its permutation probabilities overlook ties, i.e., a situation when more than one document has the same rating with respect to a query. This can lead to imprecise permutation probabilities and inefficient training because of selecting documents one by one. (2) It does not favor documents having high relevance. (3) It has a loose assumption that sampling documents at different steps is independent. To overcome the first two limitations, we model ranking as selecting documents from a candidate set based on unique rating levels in decreasing order. The number of steps in training is determined by the number of unique rating levels. We propose a new loss function and four associated models for the entire sequence of weighted classification tasks by assigning high weights to the selected documents with high ratings for optimizing Normalized Discounted Cumulative Gain (NDCG). To overcome the final limitation, we further propose a novel and efficient way of refining prediction scores by combining an adapted Vanilla Recurrent Neural Network (RNN) model with pooling given selected documents at previous steps. We encode all of the documents already selected by an RNN model. In a single step, we rank all of the documents with the same ratings using the last cell of the RNN multiple times.
We have implemented our models using three settings: neural networks, neural networks with gradient boosting, and regression trees with gradient boosting. We have conducted experiments on four public datasets. The experiments demonstrate that the models notably outperform state-of-the-art learning-to-rank models.

Submitted 22 January, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Journal ref: WSDM 2020

arXiv:1911.12426 [pdf, other]

Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis

Authors: Adam Sandler, Diego Klabjan, Yuan Luo

Abstract: We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.

Submitted 27 December, 2022; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: 38 pages, 8 figures, 5 tables

arXiv:1908.04209 [pdf, other]

doi 10.1109/BigData47090.2019.9005672

Mixture-based Multiple Imputation Model for Clinical Data with a Temporal Dimension

Authors: Ye Xue, Diego Klabjan, Yuan Luo

Abstract: The problem of missing values in multivariable time series is a key challenge in many applications such as clinical data mining. Although many imputation methods show their effectiveness in many applications, few of them are designed to accommodate clinical multivariable time series. In this work, we propose a multiple imputation model that captures both cross-sectional information and temporal correlations. We integrate Gaussian processes with mixture models and introduce individualized mixing weights to handle the variance of predictive confidence of Gaussian process models. The proposed model is compared with several state-of-the-art imputation algorithms on both real-world and synthetic datasets. Experiments show that our best model can provide more accurate imputation than the benchmarks on all of our datasets.

Submitted 2 March, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

arXiv:1906.11906 [pdf, other]

Data Extraction from Charts via Single Deep Neural Network

Authors: Xiaoyi Liu, Diego Klabjan, Patrick NBless

Abstract: Automatic data extraction from charts is challenging for two reasons: there exist many relations among objects in a chart, which is not a common consideration in general computer vision problems; and different types of charts may not be processed by the same model. To address these problems, we propose a framework of a single deep neural network, which consists of object detection, text recognition and object matching modules. The framework handles both bar and pie charts, and it may also be extended to other types of charts by slight revisions and by augmenting the training data. Our model performs successfully on 79.4% of test simulated bar charts and 88.0% of test simulated pie charts, while for charts outside of the training domain its success rate degrades to 57.5% and 62.3%, respectively.

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1905.10540 [pdf, other]

Dynamic Cell Structure via Recursive-Recurrent Neural Networks

Authors: Xin Qian, Matthew Kennedy, Diego Klabjan

Abstract: In a recurrent setting, conventional approaches to neural architecture search find and fix a general model for all data samples and time steps. We propose a novel algorithm that can dynamically search for the structure of cells in a recurrent neural network model. Based on a combination of recurrent and recursive neural networks, our algorithm is able to construct customized cell structures for each data sample and time step, allowing for a more efficient architecture search than existing models. Experiments on three common datasets show that the algorithm discovers high-performance cell architectures and achieves better prediction accuracy compared to the GRU structure for language modelling and sentiment analysis.

Submitted 25 May, 2019; originally announced May 2019.

arXiv:1905.09882 [pdf, other]

Scale Invariant Power Iteration

Authors: Cheolmin Kim, Youngseok Kim, Diego Klabjan

Abstract: Power iteration has been generalized to solve many interesting problems in machine learning and statistics. Despite its striking success, theoretical understanding of when and how such an algorithm enjoys good convergence properties is limited. In this work, we introduce a new class of optimization problems called scale invariant problems and prove that they can be efficiently solved by scale invariant power iteration (SCI-PI) with a generalized convergence guarantee of power iteration. By deriving that a stationary point is an eigenvector of the Hessian evaluated at the point, we show that scale invariant problems indeed resemble the leading eigenvector problem near a local optimum. Also, based on a novel reformulation, we geometrically derive SCI-PI, which has a general form of power iteration. The convergence analysis shows that SCI-PI attains local linear convergence with a rate being proportional to the top two eigenvalues of the Hessian at the optimum. Moreover, we discuss some extended settings of scale invariant problems and provide similar convergence results for them. In numerical experiments, we introduce applications to independent component analysis, Gaussian mixtures, and non-negative matrix factorization. Experimental results demonstrate that SCI-PI is competitive with state-of-the-art benchmark algorithms and often yields better solutions.

Submitted 11 June, 2020; v1 submitted 23 May, 2019; originally announced May 2019.
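
For readers unfamiliar with the baseline that the abstract generalizes, the following is a minimal sketch of classical power iteration for the leading eigenvector of a symmetric matrix. It is not SCI-PI itself; the function name `power_iteration`, the example matrix, the fixed iteration count, and the starting vector are all assumptions chosen for illustration.

```python
import math


def power_iteration(matrix, iters=100):
    # Classical power iteration: repeatedly multiply by the matrix and renormalize.
    n = len(matrix)
    # Assumption: this start vector is not orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # The Rayleigh quotient v^T A v estimates the leading eigenvalue (v has unit norm).
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * av[i] for i in range(n))
    return eigenvalue, v


# Example matrix (assumption) with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalue, vector = power_iteration(A)
print(eigenvalue, vector)
```

For this matrix the error contracts by the eigenvalue ratio 1/3 per iteration, which illustrates why convergence rates of power-type methods are usually stated in terms of the top two eigenvalues.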

arXiv:1905.09356 [pdf, other]

Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network

Authors: Biyi Fang, Diego Klabjan

Abstract: Nowadays, online learning is an appealing learning paradigm, which is of great interest in practice due to the recent emergence of large-scale applications such as online advertising placement and online web ranking. Standard online learning assumes a finite number of samples, while in practice data is streamed infinitely. In such a setting gradient descent with a diminishing learning rate does not work. We first introduce regret with rolling window, a new performance metric for online streaming learning, which measures the performance of an algorithm on every fixed number of contiguous samples. At the same time, we propose a family of algorithms based on gradient descent with a constant or adaptive learning rate and provide very technical analyses establishing regret bound properties of the algorithms. We cover the convex setting, showing regret of the order of the square root of the size of the window in the constant and dynamic learning rate scenarios. Our proof is applicable also to the standard online setting where we provide the first analysis of the same regret order (the previous proofs have flaws). We also study a two-layer neural network setting with ReLU activation. In this case we establish that if initial weights are close to a stationary point, the same square root regret bound is attainable. We conduct computational experiments demonstrating a superior performance of the proposed algorithms.

Submitted 25 November, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

arXiv:1903.04360 [pdf, other]

Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data

Authors: Yiming Xu, Dnyanesh Rajpathak, Ian Gibbs, Diego Klabjan

Abstract: Ontology learning is a critical task in industry, dealing with identifying and extracting concepts captured in text data such that these concepts can be used in different tasks, e.g. information retrieval. Ontology learning is non-trivial for several reasons, and only a limited amount of prior research automatically learns a domain-specific ontology from data. In our work, we propose a two-stage classification system to automatically learn an ontology from unstructured text data. We first collect candidate concepts, which are classified into concepts and irrelevant collocates by our first classifier. The concepts from the first classifier are further classified by the second classifier into different concept types. The proposed system is deployed as a prototype at a company and its performance is validated by using complaint and repair verbatim data collected in the automotive industry from different data sources.

Submitted 7 March, 2019; originally announced March 2019.

arXiv:1901.08179 [pdf, ps, other]

Stochastic Variance-Reduced Heavy Ball Power Iteration

Authors: Cheolmin Kim, Diego Klabjan

Abstract: We present a stochastic variance-reduced heavy ball power iteration algorithm for solving PCA and provide a convergence analysis for it. The algorithm is an extension of heavy ball power iteration, incorporating a step size so that progress can be controlled depending on the magnitude of the variance of stochastic gradients. The algorithm works with any size of the mini-batch, and if the step size is appropriately chosen, it attains global linear convergence to the first eigenvector of the covariance matrix in expectation. The global linear convergence result in expectation is analogous to those of stochastic variance-reduced gradient methods for convex optimization but due to non-convexity of PCA, it has never been shown for previous stochastic variants of power iteration since it requires very different techniques. We provide the first such analysis and stress that our framework can be used to establish convergence of the previous stochastic algorithms for any initial vector and in expectation. Experimental results show that the algorithm attains acceleration in a large batch regime, outperforming benchmark algorithms especially when the eigen-gap is small.

Submitted 23 January, 2019; originally announced January 2019.

Showing 1–50 of 93 results for author: Klabjan, D