Search | arXiv e-print repository

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Abstract: We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish the local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee in \cite{chatterjee2022convergence} and an additional local structural assumption of the loss function landscape. A key component of our proof is to ensure that the whole trajectories of SGD stay inside the local region with a positive probability. We also provide examples of neural networks with finite widths such that our assumptions hold.

Submitted 12 January, 2024; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: v2 fixed several mistakes. Some parts have been rewritten

arXiv:2303.03027 [pdf, other]

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

Authors: Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar

Abstract: We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. The Hessian of this loss at low-rank matrices can theoretically blow up, which creates challenges to analyze convergence of gradient optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss as well as convergence results for finite step size gradient descent under certain assumptions on the initial weights.

Submitted 13 July, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 42 pages, 3 figures, accepted at ICML 2023

arXiv:2103.02565 [pdf, other]

GLAMOUR: Graph Learning over Macromolecule Representations

Authors: Somesh Mohapatra, Joyce An, Rafael Gómez-Bombarelli

Abstract: The near-infinite chemical diversity of natural and artificial macromolecules arises from the vast range of possible component monomers, linkages, and polymers topologies. This enormous variety contributes to the ubiquity and indispensability of macromolecules but hinders the development of general machine learning methods with macromolecules as input. To address this, we developed GLAMOUR, a framework for chemistry-informed graph representation of macromolecules that enables quantifying structural similarity, and interpretable supervised learning for macromolecules.

Submitted 23 August, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

Comments: Main text: 4 pages, 2 figures; Appendix: 33 pages, 46 figures, 7 in-text tables, 4 supplementary tables

ACM Class: J.2.4; J.3.1

arXiv:2009.13447 [pdf, other]

Why resampling outperforms reweighting for correcting sampling bias with stochastic gradients

Authors: Jing An, Lexing Ying, Yuhua Zhu

Abstract: A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for the bias. We consider two commonly-used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Though statistically equivalent, it has been observed that resampling outperforms reweighting when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together while addressing the sampling bias.

Submitted 27 August, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2004.08581 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207111

Are You A Risk Taker? Adversarial Learning of Asymmetric Cross-Domain Alignment for Risk Tolerance Prediction

Authors: Zhe Liu, Lina Yao, Xianzhi Wang, Lei Bai, Jake An

Abstract: Most current studies on survey analysis and risk tolerance modelling lack professional knowledge and domain-specific models. Given the effectiveness of generative adversarial learning in cross-domain information, we design an Asymmetric cross-Domain Generative Adversarial Network (ADGAN) for domain scale inequality. ADGAN utilizes the information-sufficient domain to provide extra information to improve the representation learning on the information-insufficient domain via domain alignment. We provide data analysis and user model on two data sources: Consumer Consumption Information and Survey Information. We further test ADGAN on a real-world dataset with view embedding structures and show ADGAN can better deal with the class imbalance and unqualified data space than state-of-the-art, demonstrating the effectiveness of leveraging asymmetrical domain information.

Submitted 18 April, 2020; originally announced April 2020.

arXiv:1911.08252 [pdf, other]

IC-Network: Efficient Structure for Convolutional Neural Networks

Authors: Junyi An, Fengshan Liu, Jian Zhao, Furao Shen

Abstract: Neural networks have been widely used, and most networks achieve excellent performance by stacking certain types of basic units. Compared to increasing the depth and width of the network, designing more effective basic units has become an important research topic. Inspired by the elastic collision model in physics, we present a universal structure that could be integrated into the existing network structures to speed up the training process and increase their generalization abilities. We term this structure the "Inter-layer Collision" (IC) structure. We built two kinds of basic computational units (IC layer and IC block) that compose the convolutional neural networks (CNNs) by combining the IC structure with the convolution operation. Compared to traditional convolutions, both of the proposed computational units have a stronger non-linear representation ability and can filter features useful for a given task. Using these computational units to build networks, we bring significant improvements in performance for existing state-of-the-art CNNs. On the imagenet experiment, we integrate the IC block into ResNet-50 and reduce the top-1 error from 22.85% to 21.49%, which also exceeds the top-1 error of ResNet-100 (21.75%).

Submitted 4 June, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: 9 pages, 4 figures

arXiv:1909.02391 [pdf, other]

Data-driven simulation for general purpose multibody dynamics using deep neural networks

Authors: Hee-Sun Choi, Junmo An, Jin-Gyun Kim, Jae-Yoon Jung, Juhwan Choi, Grzegorz Orzechowski, Aki Mikkola, Jin Hwan Choi

Abstract: In this paper, a machine learning-based simulation framework of general-purpose multibody dynamics is introduced. The aim of the framework is to generate a well-trained meta-model of multibody dynamics (MBD) systems. To this end, deep neural network (DNN) is employed to the framework so as to construct data-based meta-model representing multibody systems. Constructing well-defined training data set with time variable is essential to get accurate and reliable motion data such as displacement, velocity, acceleration, and forces. As a result of the introduced approach, the meta-model provides motion estimation of system dynamics without solving the analytical equations of motion. The performance of the proposed DNN meta-modeling was evaluated to represent several MBD systems.

Submitted 2 September, 2019; originally announced September 2019.

Comments: 32 pages, 17 figures, 11 tables

arXiv:1907.05286 [pdf]

Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds

Authors: Bei Wang, Jianping An, Jiayan Cao

Abstract: Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-FPN, a novel one-stage 3D object detector that utilizes raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. Encoder extracts multi-scale voxel information in a bottom-up manner while decoder fuses multiple feature maps from various scales in a top-down way. Extensive experiments show that the proposed method has better performance on extracting features from point data and demonstrates its superiority over some baselines on the challenging KITTI-3D benchmark, obtaining good performance on both speed and accuracy in real-world scenarios.

Submitted 16 July, 2019; v1 submitted 28 June, 2019; originally announced July 2019.

arXiv:1811.00246 [pdf, other]

SARN: Relational Reasoning through Sequential Attention

Authors: Jinwon An, Sungwon Lyu, Sungzoon Cho

Abstract: This paper proposes an attention module augmented relational network called SARN (Sequential Attention Relational Network) that can carry out relational reasoning by extracting reference objects and making efficient pairing between objects. SARN greatly reduces the computational and memory requirements of the relational network, which computes all object pairs. It also shows high accuracy on the Sort-of-CLEVR dataset compared to other models, especially on relational questions.

Submitted 1 November, 2018; originally announced November 2018.

arXiv:1805.08244 [pdf, other]

doi 10.1093/imaiai/iaz030

Stochastic modified equations for the asynchronous stochastic gradient descent

Authors: Jing An, Jianfeng Lu, Lexing Ying

Abstract: We propose a stochastic modified equations (SME) for modeling the asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous time limit, as well as the SME's precise prediction to the trajectories of ASGD with various forcing terms. As an application of the SME, we propose an optimal mini-batching strategy for ASGD via solving the optimal control problem of the associated SME.

Submitted 27 September, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: Final version. To appear in Information and Inference

arXiv:1612.05021 [pdf, other]

Dynamic Modeling of Price Responsive Demand in Real-time Electricity Market: Empirical Analysis

Authors: Jaeyong An, P. R. Kumar, Le Xie

Abstract: In this paper, we study the price responsiveness of electricity consumption from empirical commercial and industrial load data obtained from Texas. Employing a dynamical system perspective, we show that price responsive demand can be modeled as a hybrid of a Hammerstein model with delay following a price surge, and a linear ARX model under moderate price changes. It is observed that electricity consumption therefore has unique characteristics including (1) qualitatively distinct response between moderate and extremely high prices; and (2) a time delay associated with the response to high prices. It is shown that these observed features may render traditional approaches to demand response and retail pricing based on classical economic theories ineffective. In particular, ultimate real-time retail pricing may be limitedly beneficial than as considered in classical economic theories.

Submitted 15 December, 2016; originally announced December 2016.

arXiv:1409.7489 [pdf, other]

Recommending Investors for Crowdfunding Projects

Authors: Jisun An, Daniele Quercia, Jon Crowcroft

Abstract: To bring their innovative ideas to market, those embarking in new ventures have to raise money, and, to do so, they have often resorted to banks and venture capitalists. Nowadays, they have an additional option: that of crowdfunding. The name refers to the idea that funds come from a network of people on the Internet who are passionate about supporting others' projects. One of the most popular crowdfunding sites is Kickstarter. In it, creators post descriptions of their projects and advertise them on social media sites (mainly Twitter), while investors look for projects to support. The most common reason for project failure is the inability of founders to connect with a sufficient number of investors, and that is mainly because hitherto there has not been any automatic way of matching creators and investors. We thus set out to propose different ways of recommending investors found on Twitter for specific Kickstarter projects. We do so by conducting hypothesis-driven analyses of pledging behavior and translate the corresponding findings into different recommendation strategies. The best strategy achieves, on average, 84% of accuracy in predicting a list of potential investors' Twitter accounts for any given project. Our findings also produced key insights about the whys and wherefores of investors deciding to support innovative efforts.

Submitted 12 October, 2014; v1 submitted 26 September, 2014; originally announced September 2014.

Comments: Published in Proc. of WWW 2014

arXiv:1308.4017 [pdf, other]

A Study on Stroke Rehabilitation through Task-Oriented Control of a Haptic Device via Near-Infrared Spectroscopy-Based BCI

Authors: Berdakh Abibullaev, Jinung An, Seung-Hyun Lee, Jeon-Il Moon

Abstract: This paper presents a study in task-oriented approach to stroke rehabilitation by controlling a haptic device via near-infrared spectroscopy-based brain-computer interface (BCI). The task is to command the haptic device to move in opposing directions of leftward and rightward movement. Our study consists of data acquisition, signal preprocessing, and classification. In data acquisition, we conduct experiments based on two different mental tasks: one on pure motor imagery, and another on combined motor imagery and action observation. The experiments were conducted in both offline and online modes. In the signal preprocessing, we use localization method to eliminate channels that are irrelevant to the mental task, as well as perform feature extraction for subsequent classification. We propose multiple support vector machine classifiers with a majority-voting scheme for improved classification results. And lastly, we present test results to demonstrate the efficacy of our proposed approach to possible stroke rehabilitation practice.

Submitted 14 April, 2014; v1 submitted 19 August, 2013; originally announced August 2013.

Comments: 13 pages, 6 figures

arXiv:1209.5467

Minimizing inter-subject variability in fNIRS based Brain Computer Interfaces via multiple-kernel support vector learning

Authors: Berdakh Abibullaev, Jinung An, Seung-Hyun Lee, Sang-Hyeon Jin, Jeon-Il Moon

Abstract: Brain signal variability in the measurements obtained from different subjects during different sessions significantly deteriorates the accuracy of most brain-computer interface (BCI) systems. Moreover these variabilities, also known as inter-subject or inter-session variabilities, require lengthy calibration sessions before the BCI system can be used. Furthermore, the calibration session has to be repeated for each subject independently and before use of the BCI due to the inter-session variability. In this study, we present an algorithm in order to minimize the above-mentioned variabilities and to overcome the time-consuming and usually error-prone calibration time. Our algorithm is based on linear programming support-vector machines and their extensions to a multiple kernel learning framework. We tackle the inter-subject or -session variability in the feature spaces of the classifiers. This is done by incorporating each subject- or session-specific feature spaces into much richer feature spaces with a set of optimal decision boundaries. Each decision boundary represents the subject- or a session specific spatio-temporal variabilities of neural signals. Consequently, a single classifier with multiple feature spaces will generalize well to new unseen test patterns even without the calibration steps. We demonstrate that classifiers maintain good performances even under the presence of a large degree of BCI variability. The present study analyzes BCI variability related to oxy-hemoglobin neural signals measured using a functional near-infrared spectroscopy.

Submitted 7 May, 2013; v1 submitted 24 September, 2012; originally announced September 2012.

Comments: This paper has been withdrawn by the author due to an error in equation 19

Showing 1–14 of 14 results for author: An, J