Search | arXiv e-print repository

Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond

Authors: Jaeyoung Cha, Jaewook Lee, Chulhee Yun

Abstract: We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results, which focus on final-iterate lower bounds in terms of the number of components $n$ and the number of epochs $K$, we seek bounds for arbitrary weighted average iterates that are tight in all factors, including the condition number $\kappa$. For SGD with Random Reshuffling, we present lower bounds with tighter $\kappa$ dependencies than existing bounds. Our results are the first to perfectly close the gap between lower and upper bounds for weighted average iterates in both the strongly-convex and convex cases. We also prove weighted average iterate lower bounds for arbitrary permutation-based SGD, which apply to all variants that carefully choose the best permutation. Our bounds improve the existing bounds in factors of $n$ and $\kappa$ and thereby match the upper bounds shown for a recently proposed algorithm called GraB.

Submitted 9 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: 58 pages
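
The Random Reshuffling scheme studied in the abstract above can be illustrated with a minimal Python sketch. It assumes a hypothetical one-dimensional finite sum of quadratics $F(x) = \frac{1}{n}\sum_i \frac{a_i}{2}(x - b_i)^2$ (not an instance from the paper) and returns both the final iterate and a weighted average of the end-of-epoch iterates; the function name `random_reshuffling_sgd` and its `weights` parameter are illustrative only, and this is not the construction used to prove the lower bounds.

```python
import random


def random_reshuffling_sgd(a, b, lr, epochs, x0=0.0, weights=None, seed=0):
    """Without-replacement (Random Reshuffling) SGD on the hypothetical 1-D
    finite sum F(x) = (1/n) * sum_i (a_i / 2) * (x - b_i)**2.

    Returns the final iterate and a weighted average of the end-of-epoch
    iterates. `weights` holds one non-negative entry per epoch; by default
    the second half of the epochs is averaged uniformly (tail averaging).
    """
    rng = random.Random(seed)
    n = len(a)
    if weights is None:
        weights = [1.0 if k >= epochs // 2 else 0.0 for k in range(epochs)]
    x = x0
    weighted_sum = 0.0
    for k in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)              # fresh random permutation every epoch
        for i in perm:                 # each component is used exactly once per epoch
            grad_i = a[i] * (x - b[i])
            x -= lr * grad_i
        weighted_sum += weights[k] * x  # accumulate the end-of-epoch iterate
    return x, weighted_sum / sum(weights)
```

With $a = (1, 2, 3, 4)$ and $b = (-1, 0, 2, 5)$ the minimizer is $\sum_i a_i b_i / \sum_i a_i = 2.5$, and for a small step such as `lr=1e-3` over 1000 epochs both returned values land close to it; the remaining gap comes from the fixed step size and shrinks as `lr` decreases.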

arXiv:2202.07389

doi 10.1080/09332480.2022.2066414

Spam four ways: Making sense of text data

Authors: Nicholas J. Horton, Jie Chao, William Finzer, Phebe Palmer

Abstract: The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. The approaches include use of a Model Eliciting Activity, exploration with CODAP, modeling with a specially designed Shiny app, and coding more sophisticated analyses using R. The approaches vary in their use of technology and code, but all share the common goal of using data to make better decisions and assessing the accuracy of those decisions.

Submitted 11 February, 2022; originally announced February 2022.

Comments: in press, CHANCE

arXiv:2105.03855

GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers

Authors: Seung Jee Yang, Kyung Joon Cha

Abstract: Classification of imbalanced data is a common problem in data mining. Imbalanced data substantially affects the performance of standard classification models. Data-level approaches mainly address the problem with oversampling methods such as the Synthetic Minority Oversampling Technique (SMOTE). However, because methods such as SMOTE generate instances by linear interpolation, the synthetic data space may look polygonal. In addition, these oversampling methods generate outliers of the minority class. In this paper, we propose the Gaussian based minority oversampling technique (GMOTE), which takes a statistical perspective on imbalanced datasets. To avoid linear interpolation and to account for outliers, the proposed method generates instances with a Gaussian Mixture Model. Motivated by the clustering-based multivariate Gaussian outlier score (CMGOS), we adapt the tail probability of instances through the Mahalanobis distance to account for local outliers. Experiments were carried out on a representative set of benchmark datasets, and the performance of GMOTE is compared with that of other methods such as SMOTE. When GMOTE is combined with classification and regression trees (CART) or support vector machines (SVM), it achieves better accuracy and F1-score. These experimental results demonstrate the robust performance of GMOTE.

Submitted 9 May, 2021; originally announced May 2021.

Comments: 20 pages, 6 figures

MSC Class: 62P99
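
Two statistical ingredients named in the GMOTE abstract, Gaussian sampling in place of SMOTE-style linear interpolation and a Mahalanobis-distance tail probability, can be sketched in plain Python. This is not the authors' GMOTE: it assumes a single 2-D Gaussian instead of a Gaussian Mixture Model, and it relies on the fact that the squared Mahalanobis distance $D^2$ of a $d$-dimensional Gaussian sample follows a $\chi^2_d$ distribution, so for $d = 2$ the tail probability is $e^{-D^2/2}$. The rejection threshold `min_tail` and all function names are hypothetical.

```python
import math
import random


def fit_gaussian_2d(points):
    """Sample mean and (unbiased) covariance (s11, s12, s22) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    s11 = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (s11, s12, s22)


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance delta^T Sigma^{-1} delta via the 2x2 inverse."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (s22 * dx * dx - 2 * s12 * dx * dy + s11 * dy * dy) / det


def tail_probability(d2):
    """P(D^2 >= d2) for a 2-D Gaussian: the chi-square(2) survival function."""
    return math.exp(-d2 / 2)


def gaussian_oversample(points, n_new, min_tail=0.05, seed=0):
    """Draw synthetic points from the fitted Gaussian, rejecting far-tail draws."""
    mean, cov = fit_gaussian_2d(points)
    s11, s12, s22 = cov
    # Cholesky factor L of the covariance, so that mean + L z ~ N(mean, Sigma)
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    rng = random.Random(seed)
    out = []
    while len(out) < n_new:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)
        if tail_probability(mahalanobis_sq(x, mean, cov)) >= min_tail:
            out.append(x)
    return out
```

Rejecting draws whose tail probability falls below `min_tail` is one simple way to keep synthetic points away from the outskirts of the fitted distribution; with `min_tail=0.05`, about 95% of draws are accepted.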

arXiv:2011.08930

Distributed Online Learning with Multiple Kernels

Authors: Jeongmin Chae, Songnam Hong

Abstract: In Internet-of-Things (IoT) systems, a massive number of IoT devices (e.g., sensors) provide plenty of informative data. Learning a function from such data is of great interest in machine learning tasks for IoT systems. Focusing on streaming (or sequential) data, we present a privacy-preserving distributed online learning framework with multiple kernels (named DOMKL). DOMKL is devised by leveraging the principles of an online alternating direction method of multipliers (OADMM) and a distributed Hedge algorithm. We theoretically prove that DOMKL over T time slots can achieve an optimal sublinear regret, implying that every learned function achieves the performance of the best function in hindsight, as in state-of-the-art centralized online learning methods. Moreover, the learned functions of any two neighboring learners are guaranteed to have a negligible difference as T grows, i.e., the so-called consensus constraints hold. Through experiments on various real datasets, we verify the effectiveness of DOMKL on regression and time-series prediction tasks.

Submitted 17 November, 2020; originally announced November 2020.
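
DOMKL uses a distributed variant of the Hedge algorithm. The standard centralized Hedge (exponential-weights) update can be sketched as below, treating each kernel-based learner as an expert. The distributed, OADMM-based, and privacy-preserving parts of DOMKL are not shown, and the class name, the learning rate `eta`, and the losses in the usage are hypothetical.

```python
import math


class Hedge:
    """Exponential-weights (Hedge) aggregation over a fixed set of experts."""

    def __init__(self, n_experts, eta):
        self.eta = eta
        self.weights = [1.0] * n_experts

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def predict(self, expert_predictions):
        # Combined prediction: probability-weighted average of the experts
        p = self.probabilities()
        return sum(pi * yi for pi, yi in zip(p, expert_predictions))

    def update(self, losses):
        # Multiplicative update: each weight shrinks by exp(-eta * loss)
        self.weights = [w * math.exp(-self.eta * l) for w, l in zip(self.weights, losses)]
        # Renormalize to avoid underflow over long horizons
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

For example, if three hypothetical kernel learners always predict 0.0, 0.9, and 2.0 for a target of 1.0, then with squared losses clipped to $[0, 1]$ the weight concentrates on the second learner within a few dozen rounds.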

arXiv:2005.03188

Active Learning with Multiple Kernels

Authors: Songnam Hong, Jeongmin Chae

Abstract: Online multiple kernel learning (OMKL) has shown attractive performance in nonlinear function learning tasks. The major drawback of OMKL, known as the curse of dimensionality, has recently been alleviated by leveraging a random feature approximation. In this paper, we introduce a new research problem, termed (stream-based) active multiple kernel learning (AMKL), in which a learner is allowed to request labels for selected data from an oracle according to a selection criterion. This is necessary in many real-world applications, where acquiring true labels is costly or time-consuming. We prove that AMKL achieves an optimal sublinear regret, implying that the proposed selection criterion indeed avoids unnecessary label requests. Furthermore, we propose AMKL with adaptive kernel selection (AMKL-AKS), in which irrelevant kernels can be excluded from a kernel dictionary 'on the fly'. This approach can improve the efficiency of active learning as well as the accuracy of function approximation. Numerical tests on various real datasets demonstrate that AMKL-AKS yields similar or better performance than the best-known OMKL method while using fewer labeled data.

Submitted 6 May, 2020; originally announced May 2020.
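
The random feature approximation mentioned in the AMKL abstract is commonly realized with random Fourier features. For a Gaussian kernel $k(x, y) = \exp(-\lVert x - y\rVert^2 / (2\sigma^2))$, drawing $w \sim \mathcal{N}(0, \sigma^{-2} I)$ and $b \sim \mathrm{Uniform}[0, 2\pi)$ gives features $z(x) = \sqrt{2/D}\,\cos(w^\top x + b)$ with $z(x)^\top z(y) \approx k(x, y)$. The sketch below assumes this construction; the function names and the number of features $D$ are illustrative, and the code is not taken from the AMKL paper.

```python
import math
import random


def make_rff(dim, n_features, bandwidth, seed=0):
    """Return a map x -> z(x) of random Fourier features for a Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies w ~ N(0, bandwidth^{-2} I): the Gaussian kernel's spectral density
    W = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)] for _ in range(n_features)]
    # Phases b ~ Uniform[0, 2*pi)
    B = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
                for w, b in zip(W, B)]

    return features


def gaussian_kernel(x, y, bandwidth):
    """Exact kernel value, for comparison with the feature inner product."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2 * bandwidth ** 2))
```

In random-feature OMKL-style methods, a linear online learner per kernel can then be run on these fixed-length features instead of on a kernel expansion that grows with every new sample.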

arXiv:1909.12291

Exascale Deep Learning to Accelerate Cancer Research

Authors: Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Junghoon Chae, Le Hou, Shahira Abousamra, Dimitris Samaras, Joel Saltz

Abstract: Deep learning, through the use of neural networks, has demonstrated a remarkable ability to automate many routine tasks when presented with sufficient training data. The neural network architecture (e.g., the number of layers, types of layers, and connections between layers) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. Neural network architectures, especially those trained on ImageNet, have tended to grow ever deeper and more complex. The result has been ever-increasing accuracy on benchmark datasets at the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated and tailored to a specific application with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL, an HPC-enabled software stack for neural architecture search, we generate a neural network for a cancer pathology dataset that achieves accuracy comparable to state-of-the-art networks while being $16\times$ faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.

Submitted 26 September, 2019; originally announced September 2019.

Comments: Submitted to IEEE Big Data

arXiv:1906.00852

Hierarchical Auxiliary Learning

Authors: Jaehoon Cha, Kyeong Soo Kim, Sanghyuk Lee

Abstract: Conventional application of convolutional neural networks (CNNs) for image classification and recognition is based on the assumption that all target classes are equal (i.e., no hierarchy) and exclusive of one another (i.e., no overlap). CNN-based image classifiers built on this assumption therefore cannot take into account an innate hierarchy among target classes (e.g., cats and dogs in animal image classification) or additional information that can be easily derived from the data (e.g., numbers larger than five in the recognition of handwritten digits), which results in scalability issues when the number of target classes is large. Combining two related but slightly different ideas, hierarchical classification and logical learning by auxiliary inputs, we propose a new learning framework called hierarchical auxiliary learning, which not only addresses the scalability issues with a large number of classes but could also further reduce classification/recognition errors with a reasonable number of classes. In hierarchical auxiliary learning, target classes are semantically or non-semantically grouped into superclasses, which turns the original problem of mapping between an image and its target class into a new problem of mapping between a pair of an image and its superclass and the target class.
To take advantage of superclasses, we introduce an auxiliary block into a neural network, which generates auxiliary scores used as additional information for final classification/recognition; in this paper, we add the auxiliary block between the last residual block and the fully-connected output layer of the ResNet. Experimental results demonstrate that the proposed hierarchical auxiliary learning can reduce classification errors by up to 0.56, 1.6, and 3.56 percent on the MNIST, SVHN, and CIFAR-10 datasets, respectively.

Submitted 3 June, 2019; originally announced June 2019.
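
The superclass grouping in hierarchical auxiliary learning can be illustrated by how training samples might be relabelled. The sketch below assumes two hypothetical groupings based on the abstract's examples: a non-semantic MNIST grouping into digits larger than five versus the rest, and a semantic CIFAR-10 grouping into vehicles versus animals (using the standard CIFAR-10 label order). It only builds ((image, superclass), class) training pairs; the auxiliary block inside the ResNet is not shown, and all names are illustrative.

```python
# Standard CIFAR-10 label order: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
# 5 dog, 6 frog, 7 horse, 8 ship, 9 truck.
CIFAR10_VEHICLES = {0, 1, 8, 9}


def mnist_superclass(digit):
    """Hypothetical non-semantic grouping: 1 if the digit is larger than five, else 0."""
    return 1 if digit > 5 else 0


def cifar10_superclass(label):
    """Hypothetical semantic grouping: 0 for vehicles, 1 for animals."""
    return 0 if label in CIFAR10_VEHICLES else 1


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def make_auxiliary_pairs(images, labels, superclass_fn, n_superclasses):
    """Turn (image, class) samples into ((image, superclass one-hot), class) samples."""
    return [((image, one_hot(superclass_fn(label), n_superclasses)), label)
            for image, label in zip(images, labels)]
```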

arXiv:1901.08479

On the Transformation of Latent Space in Autoencoders

Authors: Jaehoon Cha, Kyeong Soo Kim, Sanghyuk Lee

Abstract: Noting the importance of latent variables in inference and learning, we propose a novel framework for autoencoders based on a homeomorphic transformation of latent variables, which could reduce the distance between vectors in the transformed space while preserving the topological properties of the original space, and we investigate the effect of the latent space transformation on learning generative models and denoising corrupted data. The experimental results demonstrate that, owing to the transformation, our generative and denoising models based on the proposed framework can provide better performance than conventional variational and denoising autoencoders. We evaluate the performance of the generative and denoising models in terms of the Hausdorff distance between the set of training images and the set of processed (i.e., either generated or denoised) images, which can objectively measure their differences, as well as through direct comparison of the visual characteristics of the processed images.

Submitted 3 June, 2019; v1 submitted 24 January, 2019; originally announced January 2019.
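
The Hausdorff distance used above to compare the set of training images with the set of processed images is $d_H(A, B) = \max\{\sup_{a \in A} \inf_{b \in B} d(a, b),\ \sup_{b \in B} \inf_{a \in A} d(a, b)\}$: the largest distance from a point in either set to its nearest point in the other set. A brute-force sketch, assuming each image has been flattened into a vector and that Euclidean distance is used between images (the abstract does not state the underlying metric):

```python
import math


def euclidean(u, v):
    """Euclidean distance between two flattened images (equal-length sequences)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest neighbour in B."""
    return max(min(euclidean(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

This version performs $|A| \cdot |B|$ distance evaluations in each direction, so it is only practical for small sets.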

arXiv:1801.05755

Discussions on non-probabilistic convex modelling for uncertain problems

Authors: Ni Bingyu, Jiang Chao, Huang Zhiliang

Abstract: A non-probabilistic convex model uses a convex set to quantify the uncertainty domain of uncertain-but-bounded parameters, which is very effective for structural uncertainty analysis with limited or poor-quality experimental data. To overcome the complexity and diversity of the formulations of current convex models, this paper proposes a unified framework for constructing non-probabilistic convex models. By introducing a correlation analysis technique, the mathematical expression of a convex model can be conveniently formulated once the correlation matrix of the uncertain parameters has been created. More importantly, at the theoretical level, an evaluation criterion for convex modelling methods is proposed, which can serve as a test standard for verifying the validity of newly proposed convex modelling methods. At the practical level, two model assessment indexes are proposed, with which the suitability of different convex models for a specific uncertain problem with given experimental samples can be estimated. Four numerical examples are investigated to demonstrate the effectiveness of the present study.

Submitted 29 November, 2017; originally announced January 2018.
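
One common correlation-based convex model is a multidimensional ellipsoid. The sketch below assumes the two-parameter form $\delta^\top \rho^{-1} \delta \le 1$, where $\delta_i = (x_i - x_i^c)/r_i$ is the deviation of parameter $i$ from its interval midpoint $x_i^c$ scaled by its radius $r_i$, and $\rho$ is the correlation coefficient estimated from samples. It illustrates how a correlation estimate fixes the model once it is available; it is an assumption, not necessarily the exact formulation of the unified framework in the paper, and the function names are hypothetical.

```python
import math


def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def in_ellipsoid_model(x, center, radius, rho):
    """True if the 2-D point x lies in the correlated ellipsoidal convex model.

    Assumes |rho| < 1 so that the correlation matrix [[1, rho], [rho, 1]] is invertible.
    """
    d1 = (x[0] - center[0]) / radius[0]
    d2 = (x[1] - center[1]) / radius[1]
    # delta^T rho^{-1} delta using the closed-form inverse of [[1, rho], [rho, 1]]
    q = (d1 * d1 - 2 * rho * d1 * d2 + d2 * d2) / (1 - rho * rho)
    return q <= 1.0
```

A positive correlation tilts the ellipse along the diagonal: with unit radii centered at the origin, the point (0.8, 0.8) lies inside the model for $\rho = 0.5$ but outside the uncorrelated one, while (0.8, -0.8) lies outside both.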

arXiv:1710.06618

On Optimal Operational Sequence of Components in a Warm Standby System

Authors: M. Finkelstein, N. K. Hazra, J. H. Cha

Abstract: We consider the open problem of the optimal operational sequence for the $1$-out-of-$n$ system with warm standby. Using the virtual age concept and the cumulative exposure model, we show that the components should be activated in accordance with the increasing sequence of their lifetimes. Lifetimes of the components and the system are compared with respect to the stochastic precedence order. Only specific cases of this optimization problem have previously been considered in the literature.

Submitted 6 December, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

Comments: The proof of one of the theorems is erroneous. Apart from this, there are some other technical issues.

Showing 1–10 of 10 results for author: Cha, J