Showing 1–50 of 54 results for author: Klabjan, D

Searching in archive stat.
  1. arXiv:2310.10611  [pdf, other]

    cs.LG stat.ML

    IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation

    Authors: Taejong Joo, Diego Klabjan

    Abstract: Reasoning about a model's accuracy on a test sample from its confidence is a central problem in machine learning, being connected to important applications such as uncertainty representation, model selection, and exploration. While these connections have been well-studied in the i.i.d. settings, distribution shifts pose significant challenges to the traditional methods. Therefore, model calibratio…

    Submitted 16 October, 2023; originally announced October 2023.

  2. arXiv:2308.08046  [pdf, ps, other]

    cs.LG stat.ML

    Regret Lower Bounds in Multi-agent Multi-armed Bandit

    Authors: Mengfan Xu, Diego Klabjan

    Abstract: Multi-armed Bandit motivates methods with provable upper bounds on regret and also the counterpart lower bounds have been extensively studied in this context. Recently, Multi-agent Multi-armed Bandit has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is the overall system performance, typically measured by reg…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 10 pages

  3. arXiv:2306.05579  [pdf, other]

    cs.LG stat.ML

    Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

    Authors: Mengfan Xu, Diego Klabjan

    Abstract: We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-gaussian distributions. Each client pulls an a…

    Submitted 17 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 58 pages, to appear at Advances in Neural Information Processing Systems (NeurIPS 2023 Spotlight)

  4. arXiv:2212.00884  [pdf, other]

    cs.LG stat.ML

    Pareto Regret Analyses in Multi-objective Multi-armed Bandit

    Authors: Mengfan Xu, Diego Klabjan

    Abstract: We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality compared to scalarized regrets. We also present new algorithms assuming both…

    Submitted 30 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: 19 pages; accepted at ICML 2023 and to be published in Proceedings of Machine Learning Research (PMLR)

  5. arXiv:2102.00380  [pdf, other]

    cs.LG stat.ML

    Classification Models for Partially Ordered Sequences

    Authors: Stephanie Ger, Diego Klabjan, Jean Utke

    Abstract: Many models such as Long Short Term Memory (LSTMs), Gated Recurrent Units (GRUs) and transformers have been developed to classify time series data with the assumption that events in a sequence are ordered. On the other hand, fewer models have been developed for set based inputs, where order does not matter. There are several use cases where data is given as partially-ordered sequences because of t…

    Submitted 31 January, 2021; originally announced February 2021.

  6. arXiv:2101.02561  [pdf, other]

    stat.ML cs.AI cs.LG

    Open Set Domain Adaptation by Extreme Value Theory

    Authors: Yiming Xu, Diego Klabjan

    Abstract: Common domain adaptation techniques assume that the source domain and the target domain share an identical label space, which is problematic since when target samples are unlabeled we have no knowledge on whether the two domains share the same label space. When this is not the case, the existing methods fail to perform well because the additional unknown classes are also matched with the source do…

    Submitted 22 December, 2020; originally announced January 2021.

  7. arXiv:2009.14111  [pdf, other]

    cs.LG stat.ML

    Inverse Classification with Limited Budget and Maximum Number of Perturbed Samples

    Authors: Jaehoon Koo, Diego Klabjan, Jean Utke

    Abstract: Most recent machine learning research focuses on developing new classifiers for the sake of improving classification accuracy. With many well-performing state-of-the-art classifiers available, there is a growing need for understanding interpretability of a classifier necessitated by practical purposes such as to find the best diet recommendation for a diabetes patient. Inverse classification is a…

    Submitted 29 September, 2020; originally announced September 2020.

  8. arXiv:2009.09538  [pdf, other]

    cs.LG cs.AI stat.ML

    Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

    Authors: Mengfan Xu, Diego Klabjan

    Abstract: We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in reinforcement learning either assume costly interactions with an environment or propose algorithms finding potentially low quality local maxima. Motivated by EXP-t…

    Submitted 3 May, 2024; v1 submitted 20 September, 2020; originally announced September 2020.

    Comments: 40 pages, 8 figures

  9. arXiv:2006.04027  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Efficient Architecture Search for Continual Learning

    Authors: Qiang Gao, Zhipeng Luo, Diego Klabjan

    Abstract: Continual learning with neural networks is an important learning framework in AI that aims to learn a sequence of tasks well. However, it is often confronted with three challenges: (1) overcome the catastrophic forgetting problem, (2) adapt the current network to new tasks, and meanwhile (3) control its model complexity. To reach these goals, we propose a novel approach named as Continual Learning…

    Submitted 9 June, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: 12 pages, 11 figures

  10. arXiv:2006.02003  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Open-Set Recognition with Gaussian Mixture Variational Autoencoders

    Authors: Alexander Cao, Yuan Luo, Diego Klabjan

    Abstract: In inference, open-set classification is to either classify a sample into a known class from training or reject it as an unknown class. Existing deep open-set classifiers train explicit closed-set classifiers, in some cases disjointly utilizing reconstruction, which we find dilutes the latent representation's ability to distinguish unknown classes. In contrast, we train our model to cooperatively…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: 12 pages including 8 figures and 4 tables, plus 6 pages of supplementary material

  11. arXiv:2004.14203  [pdf, other]

    cs.LG stat.ML

    Neural Network Retraining for Model Serving

    Authors: Diego Klabjan, Xiaofeng Zhu

    Abstract: We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference during model serving. As such, this is a life-long learning process. We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining. If we combine all past and new data it can easily become intractable to retrain the neural network model. On the…

    Submitted 29 April, 2020; originally announced April 2020.

  12. arXiv:2001.07866  [pdf, other]

    stat.ML cs.IR cs.LG

    Keyword-based Topic Modeling and Keyword Selection

    Authors: Xingyu Wang, Lida Zhang, Diego Klabjan

    Abstract: Certain type of documents such as tweets are collected by specifying a set of keywords. As topics of interest change with time it is beneficial to adjust keywords dynamically. The challenge is that these need to be specified ahead of knowing the forthcoming documents and the underlying topics. The future topics should mimic past topics of interest yet there should be some novelty in them. We devel…

    Submitted 21 January, 2020; originally announced January 2020.

  13. arXiv:2001.01828  [pdf, other]

    cs.IR cs.LG stat.ML

    Listwise Learning to Rank by Exploring Unique Ratings

    Authors: Xiaofeng Zhu, Diego Klabjan

    Abstract: In this paper, we propose new listwise learning-to-rank models that mitigate the shortcomings of existing ones. Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations. (1) Its permutation probabilities overlook ties, i.e., a situation when more than one document has the same rating with respect to a query. This can…

    Submitted 22 January, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

    Journal ref: WSDM 2020

  14. arXiv:1911.12426  [pdf, other]

    cs.LG stat.ME stat.ML

    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis

    Authors: Adam Sandler, Diego Klabjan, Yuan Luo

    Abstract: We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allo…

    Submitted 27 December, 2022; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: 38 pages, 8 figures, 5 tables

  15. Mixture-based Multiple Imputation Model for Clinical Data with a Temporal Dimension

    Authors: Ye Xue, Diego Klabjan, Yuan Luo

    Abstract: The problem of missing values in multivariable time series is a key challenge in many applications such as clinical data mining. Although many imputation methods show their effectiveness in many applications, few of them are designed to accommodate clinical multivariable time series. In this work, we propose a multiple imputation model that capture both cross-sectional information and temporal cor…

    Submitted 2 March, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

  16. arXiv:1905.10540  [pdf, other]

    cs.LG cs.NE stat.ML

    Dynamic Cell Structure via Recursive-Recurrent Neural Networks

    Authors: Xin Qian, Matthew Kennedy, Diego Klabjan

    Abstract: In a recurrent setting, conventional approaches to neural architecture search find and fix a general model for all data samples and time steps. We propose a novel algorithm that can dynamically search for the structure of cells in a recurrent neural network model. Based on a combination of recurrent and recursive neural networks, our algorithm is able to construct customized cell structures for ea…

    Submitted 25 May, 2019; originally announced May 2019.

  17. arXiv:1905.09882  [pdf, other]

    math.OC cs.LG stat.ML

    Scale Invariant Power Iteration

    Authors: Cheolmin Kim, Youngseok Kim, Diego Klabjan

    Abstract: Power iteration has been generalized to solve many interesting problems in machine learning and statistics. Despite its striking success, theoretical understanding of when and how such an algorithm enjoys good convergence property is limited. In this work, we introduce a new class of optimization problems called scale invariant problems and prove that they can be efficiently solved by scale invari…

    Submitted 11 June, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

  18. arXiv:1905.09356  [pdf, other]

    cs.LG cs.DS stat.ML

    Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network

    Authors: Biyi Fang, Diego Klabjan

    Abstract: Nowadays, online learning is an appealing learning paradigm, which is of great interest in practice due to the recent emergence of large scale applications such as online advertising placement and online web ranking. Standard online learning assumes a finite number of samples while in practice data is streamed infinitely. In such a setting gradient descent with a diminishing learning rate does not…

    Submitted 25 November, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

  19. arXiv:1903.04360  [pdf, other]

    cs.IR cs.LG stat.ML

    Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data

    Authors: Yiming Xu, Dnyanesh Rajpathak, Ian Gibbs, Diego Klabjan

    Abstract: Ontology learning is a critical task in industry, dealing with identifying and extracting concepts captured in text data such that these concepts can be used in different tasks, e.g. information retrieval. Ontology learning is non-trivial due to several reasons with limited amount of prior research work that automatically learns a domain specific ontology from data. In our work, we propose a two-s…

    Submitted 7 March, 2019; originally announced March 2019.

  20. arXiv:1901.02514  [pdf, other]

    cs.LG stat.ML

    Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification

    Authors: Stephanie Ger, Yegna Subramanian Jambunath, Diego Klabjan

    Abstract: Generative Adversarial Networks (GANs) have been used in many different applications to generate realistic synthetic data. We introduce a novel GAN with Autoencoder (GAN-AE) architecture to generate synthetic samples for variable length, multi-feature sequence datasets. In this model, we develop a GAN architecture with an additional autoencoder component, where recurrent neural networks (RNNs) are…

    Submitted 6 October, 2022; v1 submitted 8 January, 2019; originally announced January 2019.

  21. arXiv:1812.02335  [pdf, other]

    cs.LG stat.ML

    Layer Flexible Adaptive Computational Time

    Authors: Lida Zhang, Abdolghani Ebrahimi, Diego Klabjan

    Abstract: Deep recurrent neural networks perform well on sequence data and are the model of choice. However, it is a daunting task to decide the structure of the networks, i.e. the number of layers, especially considering different computational needs of a sequence. We propose a layer flexible recurrent neural network with adaptive computation time, and expand it to a sequence to sequence model. Different f…

    Submitted 4 January, 2021; v1 submitted 5 December, 2018; originally announced December 2018.

    Comments: 11 pages, 5 figures

    ACM Class: I.2.6

  22. arXiv:1809.09574  [pdf, other]

    cs.LG stat.ML

    Combined convolutional and recurrent neural networks for hierarchical classification of images

    Authors: Jaehoon Koo, Diego Klabjan, Jean Utke

    Abstract: Deep learning models based on CNNs are predominantly used in image classification tasks. Such approaches, assuming independence of object categories, normally use a CNN as a feature learner and apply a flat classifier on top of it. Object classes in many settings have hierarchical relations, and classifiers exploiting these relations should perform better. We propose hierarchical classification mo…

    Submitted 18 November, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

  23. arXiv:1809.08717  [pdf, other]

    stat.ML cs.LG

    Unified recurrent neural network for many feature types

    Authors: Alexander Stec, Diego Klabjan, Jean Utke

    Abstract: There are time series that are amenable to recurrent neural network (RNN) solutions when treated as sequences, but some series, e.g. asynchronous time series, provide a richer variation of feature types than current RNN cells take into account. In order to address such situations, we introduce a unified RNN that handles five different feature types, each in a different manner. Our RNN framework se…

    Submitted 23 September, 2018; originally announced September 2018.

  24. arXiv:1808.10430  [pdf, other]

    stat.ML cs.LG

    Nested multi-instance classification

    Authors: Alexander Stec, Diego Klabjan, Jean Utke

    Abstract: There are classification tasks that take as inputs groups of images rather than single images. In order to address such situations, we introduce a nested multi-instance deep network. The approach is generic in that it is applicable to general data instances, not just images. The network has several convolutional neural networks grouped together at different stages. This primarily differs from othe…

    Submitted 30 August, 2018; originally announced August 2018.

  25. arXiv:1807.00425  [pdf, other]

    cs.LG cs.AI stat.ML

    Dynamic Prediction Length for Time Series with Sequence to Sequence Networks

    Authors: Mark Harmon, Diego Klabjan

    Abstract: Recurrent neural networks and sequence to sequence models require a predetermined length for prediction output length. Our model addresses this by allowing the network to predict a variable length output in inference. A new loss function with a tailored gradient computation is developed that trades off prediction accuracy and output length. The model utilizes a function to determine whether a part…

    Submitted 18 August, 2019; v1 submitted 1 July, 2018; originally announced July 2018.

  26. arXiv:1806.01486  [pdf, other]

    stat.ML cs.LG

    Forecasting Crime with Deep Learning

    Authors: Alexander Stec, Diego Klabjan

    Abstract: The objective of this work is to take advantage of deep neural networks in order to make next day crime count predictions in a fine-grain city partition. We make predictions using Chicago and Portland crime data, which is augmented with additional datasets covering weather, census data, and public transportation. The crime counts are broken into 10 bins and our model predicts the most likely bin f…

    Submitted 5 June, 2018; originally announced June 2018.

  27. arXiv:1805.01867  [pdf, other]

    stat.ML cs.LG

    Bayesian active learning for choice models with deep Gaussian processes

    Authors: Jie Yang, Diego Klabjan

    Abstract: In this paper, we propose an active learning algorithm and models which can gradually learn individual's preference through pairwise comparisons. The active learning scheme aims at finding individual's most preferred choice with minimized number of pairwise comparisons. The pairwise comparisons are encoded into probabilistic models based on assumptions of choice models and deep Gaussian processes.…

    Submitted 4 May, 2018; originally announced May 2018.

  28. arXiv:1804.11214  [pdf, other]

    cs.LG cs.AI stat.ML

    k-Nearest Neighbors by Means of Sequence to Sequence Deep Neural Networks and Memory Networks

    Authors: Yiming Xu, Diego Klabjan

    Abstract: k-Nearest Neighbors is one of the most fundamental but effective classification models. In this paper, we propose two families of models built on a sequence to sequence model and a memory network model to mimic the k-Nearest Neighbors model, which generate a sequence of labels, a sequence of out-of-sample feature vectors and a final label for classification, and thus they could also function as ov…

    Submitted 26 November, 2019; v1 submitted 27 April, 2018; originally announced April 2018.

  29. arXiv:1804.09812  [pdf, ps, other]

    stat.ML cs.LG

    Improved Classification Based on Deep Belief Networks

    Authors: Jaehoon Koo, Diego Klabjan

    Abstract: For better classification generative models are used to initialize the model and model features before training a classifier. Typically it is needed to solve separate unsupervised and supervised learning problems. Generative restricted Boltzmann machines and deep belief networks are widely used for unsupervised learning. We developed several supervised models based on DBN in order to improve this…

    Submitted 12 August, 2019; v1 submitted 25 April, 2018; originally announced April 2018.

  30. arXiv:1803.11287  [pdf, other]

    stat.ML cs.LG

    A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

    Authors: Biyi Fang, Diego Klabjan

    Abstract: As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention, in particular when either observations or features are dist…

    Submitted 8 December, 2019; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: 11 figures, 41 pages

  31. arXiv:1802.05786  [pdf]

    cs.AI stat.ML

    Truth Validation with Evidence

    Authors: Papis Wongchaisuwat, Diego Klabjan

    Abstract: In the modern era, abundant information is easily accessible from various sources, however only a few of these sources are reliable as they mostly contain unverified contents. We develop a system to validate the truthfulness of a given statement together with underlying evidence. The proposed system provides supporting evidence when the statement is tagged as false. Our work relies on an inference…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: 40 pages (including Appendix), 3 tables, 3 figures

  32. arXiv:1801.02124  [pdf, ps, other]

    stat.ML cs.LG

    Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations

    Authors: Xingyu Wang, Diego Klabjan

    Abstract: This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for t…

    Submitted 5 June, 2018; v1 submitted 6 January, 2018; originally announced January 2018.

    Comments: 31 pages, to be presented at ICML 2018

  33. arXiv:1711.09545  [pdf, other]

    stat.CO stat.ML

    OSTSC: Over Sampling for Time Series Classification in R

    Authors: Matthew Dixon, Diego Klabjan, Lan Wei

    Abstract: The OSTSC package is a powerful oversampling approach for classifying univariant, but multinomial time series data in R. This article provides a brief overview of the oversampling methodology implemented by the package. A tutorial of the OSTSC package is provided. We begin by providing three test cases for the user to quickly validate the functionality in the package. To demonstrate the performanc…

    Submitted 27 November, 2017; originally announced November 2017.

  34. A Simple and Fast Algorithm for L1-norm Kernel PCA

    Authors: Cheolmin Kim, Diego Klabjan

    Abstract: We present an algorithm for L1-norm kernel PCA and provide a convergence analysis for it. While an optimal solution of L2-norm kernel PCA can be obtained through matrix decomposition, finding that of L1-norm kernel PCA is not trivial due to its non-convexity and non-smoothness. We provide a novel reformulation through which an equivalent, geometrically interpretable problem is obtained. Based on t…

    Submitted 11 June, 2020; v1 submitted 28 September, 2017; originally announced September 2017.

    Comments: 14 pages, 7 figures

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)

  35. arXiv:1706.01833  [pdf]

    stat.ML cs.LG q-fin.CP

    Online Adaptive Machine Learning Based Algorithm for Implied Volatility Surface Modeling

    Authors: Yaxiong Zeng, Diego Klabjan

    Abstract: In this work, we design a machine learning based method, online adaptive primal support vector regression (SVR), to model the implied volatility surface (IVS). The algorithm proposed is the first derivation and implementation of an online primal kernel SVR. It features enhancements that allow efficient online adaptive learning by embedding the idea of local fitness and budget maintenance to dynami…

    Submitted 7 June, 2018; v1 submitted 6 June, 2017; originally announced June 2017.

    Comments: 34 Pages

  36. arXiv:1705.10033  [pdf, ps, other]

    cs.LG stat.ML

    Improving the Expected Improvement Algorithm

    Authors: Chao Qin, Diego Klabjan, Daniel Russo

    Abstract: The expected improvement (EI) algorithm is a popular strategy for information collection in optimization under uncertainty. The algorithm is widely known to be too greedy, but nevertheless enjoys wide use due to its simplicity and ability to handle uncertainty and noise in a coherent decision theoretic framework. To provide rigorous insight into EI, we study its properties in a simple setting of B…

    Submitted 29 May, 2017; originally announced May 2017.

    Comments: Submitted to NIPS 2017

  37. arXiv:1702.07790  [pdf, other]

    stat.ML cs.LG

    Activation Ensembles for Deep Neural Networks

    Authors: Mark Harmon, Diego Klabjan

    Abstract: Many activation functions have been proposed in the past, but selecting an adequate one requires trial and error. We propose a new methodology of designing activation functions within a neural network at each layer. We call this technique an "activation ensemble" because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, $α$, at each…

    Submitted 24 February, 2017; originally announced February 2017.

  38. arXiv:1702.05137  [pdf]

    stat.ML cs.LG

    Semi-supervised Learning for Discrete Choice Models

    Authors: Jie Yang, Sergey Shebalov, Diego Klabjan

    Abstract: We introduce a semi-supervised discrete choice model to calibrate discrete choice models when relatively few requests have both choice sets and stated preferences but the majority only have the choice sets. Two classic semi-supervised learning algorithms, the expectation maximization algorithm and the cluster-and-label algorithm, have been adapted to our choice modeling problem setting. We also de…

    Submitted 16 February, 2017; originally announced February 2017.

  39. Subset Selection for Multiple Linear Regression via Optimization

    Authors: Young Woong Park, Diego Klabjan

    Abstract: Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and minimal-redundancy-maximal-relevance criteria. The proposed models are tes…

    Submitted 13 January, 2020; v1 submitted 26 January, 2017; originally announced January 2017.

    Journal ref: Journal of Global Optimization 77-3(2020): 543-574

  40. arXiv:1701.05654  [pdf, other]

    stat.ML cs.DS

    Bayesian Network Learning via Topological Order

    Authors: Young Woong Park, Diego Klabjan

    Abstract: We propose a mixed integer programming (MIP) model and iterative algorithms based on topological orders to solve optimization problems with acyclic constraints on a directed graph. The proposed MIP model has a significantly lower number of constraints compared to popular MIP models based on cycle elimination constraints and triangular inequalities. The proposed iterative algorithms use gradient de…

    Submitted 20 August, 2017; v1 submitted 19 January, 2017; originally announced January 2017.

    Journal ref: Journal of Machine Learning Research 18(99) 1-32, 2017

  41. arXiv:1610.10060  [pdf, other]

    stat.ML cs.LG

    Optimization for Large-Scale Machine Learning with Distributed Features and Observations

    Authors: Alexandros Nathan, Diego Klabjan

    Abstract: As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention in the literature. Although previous research has mostly foc…

    Submitted 14 April, 2017; v1 submitted 31 October, 2016; originally announced October 2016.

  42. arXiv:1609.04849  [pdf, other]

    stat.ML cs.LG

    Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories

    Authors: Mark Harmon, Abdolghani Ebrahimi, Patrick Lucey, Diego Klabjan

    Abstract: In this paper, we predict the likelihood of a player making a shot in basketball from multiagent trajectories. Previous approaches to similar problems center on hand-crafting features to capture domain specific knowledge. Although intuitive, recent work in deep learning has shown this approach is prone to missing important predictive features. To circumvent this issue, we present a convolutional n…

    Submitted 15 January, 2021; v1 submitted 15 September, 2016; originally announced September 2016.

  43. Iteratively Reweighted Least Squares Algorithms for L1-Norm Principal Component Analysis

    Authors: Young Woong Park, Diego Klabjan

    Abstract: Principal component analysis (PCA) is often used to reduce the dimension of data by selecting a few orthonormal vectors that explain most of the variance structure of the data. L1 PCA uses the L1 norm to measure error, whereas the conventional PCA uses the L2 norm. For the L1 PCA problem minimizing the fitting error of the reconstructed data, we propose an exact reweighted and an approximate algor…

    Submitted 19 September, 2016; v1 submitted 10 September, 2016; originally announced September 2016.

    Journal ref: 2016 IEEE 16th International Conference on Data Mining, Barcelona, Spain (2016): 430-438

  44. arXiv:1607.03202  [pdf, other]

    stat.ML cs.SI stat.AP

    Rapid Prediction of Player Retention in Free-to-Play Mobile Games

    Authors: Anders Drachen, Eric Thurston Lundquist, Yungjen Kung, Pranav Simha Rao, Diego Klabjan, Rafet Sifa, Julian Runge

    Abstract: Predicting and improving player retention is crucial to the success of mobile Free-to-Play games. This paper explores the problem of rapid retention prediction in this context. Heuristic modeling approaches are introduced as a way of building simple rules for predicting short-term retention. Compared to common classification algorithms, our heuristic-based approach achieves reasonable and comparab…

    Submitted 11 July, 2016; originally announced July 2016.

    Comments: Draft Submitted to AIIDE-16. 7 pages, 5 figures, 3 tables

  45. Algorithms for Generalized Cluster-wise Linear Regression

    Authors: Young Woong Park, Yan Jiang, Diego Klabjan, Loren Williams

    Abstract: Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. We generalize the CLR problem by allowing each entity to have more than one observation, and refer to it as generalized CLR. W…

    Submitted 11 July, 2016; v1 submitted 5 July, 2016; originally announced July 2016.

    Journal ref: INFORMS Journal on Computing 29-2(2017): 301 - 317

  46. An Aggregate and Iterative Disaggregate Algorithm with Proven Optimality in Machine Learning

    Authors: Young Woong Park, Diego Klabjan

    Abstract: We propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning, where we start the algorithm by aggregating the original data, solving the problem on aggregated data, and then in subsequent steps gradually disaggregate the aggregated data. We apply the algorithm to common machine learning problems such as the least absolute deviation regression problem…

    Submitted 5 July, 2016; originally announced July 2016.

    Journal ref: Machine Learning 105 (2016) 199 - 232

  47. arXiv:1607.00706  [pdf]

    stat.ML

    A Semi-supervised learning approach to enhance health care Community-based Question Answering: A case study in alcoholism

    Authors: Papis Wongchaisuwat, Diego Klabjan, Siddhartha R. Jonnalagadda

    Abstract: Community-based Question Answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for online health communities. In this study, we developed an algorithm to automatically answer health-related questions based on pas…

    Submitted 3 July, 2016; originally announced July 2016.

    Comments: 28 pages, 6 figures, 4 tables

  48. arXiv:1603.07738  [pdf]

    stat.ML

    Skill-Based Differences in Spatio-Temporal Team Behavior in Defence of The Ancients 2

    Authors: Anders Drachen, Matthew Yancey, John Maguire, Derrek Chu, Iris Yuhui Wang, Tobias Mahlmann, Matthias Schubert, Diego Klabjan

    Abstract: Multiplayer Online Battle Arena (MOBA) games are among the most played digital games in the world. In these games, teams of players fight against each other in arena environments, and the gameplay is focused on tactical combat. Mastering MOBAs requires extensive practice, as is exemplified in the popular MOBA Defence of the Ancients 2 (DotA 2). In this paper, we present three data-driven measures…

    Submitted 24 March, 2016; originally announced March 2016.

    Journal ref: 6th IEEE Consumer Electronics Society Games, Entertainment, Media Conference, Toronto, 2014

  49. arXiv:1603.07692  [pdf]

    cs.CY cs.HC stat.ML

    Predictive Analytics Using Smartphone Sensors for Depressive Episodes

    Authors: Taeheon Jeong, Diego Klabjan, Justin Starren

    Abstract: The behaviors of patients with depression are usually difficult to predict because the patients demonstrate the symptoms of a depressive episode without a warning at unexpected times. The goal of this research is to build algorithms that detect signals of such unusual moments so that doctors can be proactive in approaching already diagnosed patients before they fall in depression. Each patient is…

    Submitted 24 March, 2016; originally announced March 2016.

    Comments: HIAI 2016, Expanding the Boundaries of Health Informatics using AI, Phoenix, AZ

  50. arXiv:1603.07624  [pdf]

    cs.CL cs.IR cs.SI stat.ML

    Semantic Properties of Customer Sentiment in Tweets

    Authors: Eun Hee Ko, Diego Klabjan

    Abstract: An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to experiences in consumption is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim at discovering in particular tweets semantic patterns in consumers' discussions on social media. Specifically, the pu…

    Submitted 24 March, 2016; originally announced March 2016.

    Comments: The 28th IEEE International Conference on Advanced Information Networking and Applications. Victoria, Canada, 2014