Search | arXiv e-print repository

Agile gesture recognition for low-power applications: customisation for generalisation

Authors: Ying Liu, Liucheng Guo, Valeri A. Makarovc, Alexander Gorbana, Evgeny Mirkesa, Ivan Y. Tyukin

Abstract: Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has predominantly focused on scenarios with access to a continuous flow of hand images. This focus has been driven by the widespread use of cameras and the abundant availability of image data. However, there is an increasing demand for gesture recognition technologies that operate on low-power sensor devices. This is due to rising concerns about data leakage and end-user privacy, as well as the limited battery capacity and computing power of low-cost devices. Moreover, the challenge of collecting data for individually designed hardware also hinders the generalisation of a gesture recognition model. In this study, we unveil a novel methodology for pattern recognition systems using adaptive and agile error correction, designed to enhance the performance of legacy gesture recognition models on devices with limited battery capacity and computing power. This system comprises a compact Support Vector Machine as the base model for live gesture recognition. Additionally, it features an adaptive agile error corrector that employs few-shot learning within the feature space induced by high-dimensional kernel mappings. The error corrector can be customised for each user, allowing for dynamic adjustments to the gesture prediction based on their movement patterns while maintaining the agile performance of its base model on a low-cost and low-power micro-controller. This proposed system is distinguished by its compact size, rapid processing speed, and low power consumption, making it ideal for a wide range of embedded systems.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07213 [pdf, other]

Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

Authors: Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li

Abstract: Web-based applications such as chatbots, search engines and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of LLMs. Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions like whether to employ a costly API-based LLM or a locally finetuned small LLM, weighing cost against performance. Traditional selection methods often evaluate every candidate model before choosing one, which is becoming impractical given the rising costs of training and finetuning LLMs. Moreover, it is undesirable to allocate excessive resources towards exploring poor-performing models. While some recent works leverage online bandit algorithms to manage this exploration-exploitation trade-off in model selection, they tend to overlook the increasing-then-converging trend in model performance as the model is iteratively finetuned, leading to less accurate predictions and suboptimal model selections. In this paper, we propose a time-increasing bandit algorithm, TI-UCB, which effectively predicts the increase of model performance due to finetuning and efficiently balances exploration and exploitation in model selection. To further capture the converging points of models, we develop a change detection mechanism by comparing consecutive increase predictions. We theoretically prove that our algorithm achieves a logarithmic regret upper bound in a typical increasing bandit setting, which implies a fast convergence rate. The advantage of our method is also empirically validated through extensive experiments on classification model selection and online selection of LLMs. Our results highlight the importance of utilizing the increasing-then-converging pattern for more efficient and economic model selection in the deployment of LLMs.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Accepted by WWW'24 (Oral)
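
As a point of reference for the bandit framing in the entry above, here is a minimal sketch of plain UCB1 applied to model selection. It is not TI-UCB: the abstract's increase prediction and change detection are not reproduced here. The function name `ucb1` and the reward callables are hypothetical, and each callable is assumed to receive the number of times its model has already been selected, so a caller can imitate a reward that rises and then converges.

```python
import math

def ucb1(reward_fns, horizon):
    """Plain UCB1 over candidate models (illustrative baseline, not TI-UCB).

    reward_fns[a](n) returns the reward of model a after it has already
    been selected n times (hypothetical interface for this sketch).
    Returns the number of times each model was selected.
    """
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # how often each model was selected
    means = [0.0] * n_arms   # running mean reward per model
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # select every model once to initialise
        else:
            # choose the model with the largest upper confidence bound
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](counts[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts

# Two models with constant rewards; the better one should dominate.
counts = ucb1([lambda n: 0.2, lambda n: 0.8], horizon=500)
print(counts)
```

Because UCB1 averages all past rewards, it reacts slowly when a model's reward keeps rising with finetuning, which is the gap the abstract says TI-UCB targets.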

arXiv:2401.02080 [pdf, other]

Energy based diffusion generator for efficient sampling of Boltzmann distributions

Authors: Yan Wang, Ling Guo, Hao Wu, Tao Zhou

Abstract: We introduce a novel sampler called the energy based diffusion generator for generating samples from arbitrary target distributions. The sampling model employs a structure similar to a variational autoencoder, utilizing a decoder to transform latent variables from a simple distribution into random variables approximating the target distribution, and we design an encoder based on the diffusion model. Leveraging the powerful modeling capacity of the diffusion model for complex distributions, we can obtain an accurate variational estimate of the Kullback-Leibler divergence between the distributions of the generated samples and the target. Moreover, we propose a decoder based on generalized Hamiltonian dynamics to further enhance sampling performance. Through empirical evaluation, we demonstrate the effectiveness of our method across various complex distribution functions, showcasing its superiority compared to existing methods.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2311.18506 [pdf, other]

Global Convergence of Online Identification for Mixed Linear Regression

Authors: Yu**g Liu, Zhixin Liu, Lei Guo

Abstract: Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships by utilizing a mixture of linear regression sub-models. The identification of MLR is a fundamental problem, where most of the existing results focus on offline algorithms, rely on independent and identically distributed (i.i.d.) data assumptions, and provide local convergence results only. This paper investigates the online identification and data clustering problems for two basic classes of MLRs, by introducing two corresponding new online identification algorithms based on the expectation-maximization (EM) principle. It is shown that both algorithms will converge globally without resorting to the traditional i.i.d. data assumptions. The main challenge in our investigation lies in the fact that the gradient of the maximum likelihood function does not have a unique zero, and a key step in our analysis is to establish the stability of the corresponding differential equation in order to apply the celebrated Ljung's ODE method. It is also shown that the within-cluster error and the probability that the new data is categorized into the correct cluster are asymptotically the same as those in the case of known parameters. Finally, numerical simulations are provided to verify the effectiveness of our online algorithms.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.07983 [pdf, other]

Revisiting Decentralized ProxSkip: Achieving Linear Speedup

Authors: Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, **de Cao

Abstract: The ProxSkip algorithm for decentralized and federated learning is gaining increasing attention due to its proven benefits in accelerating communication complexity while maintaining robustness against data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not achieve linear speedup, where convergence performance increases linearly with respect to the number of nodes. So far, questions remain open about how ProxSkip behaves in the non-convex setting and whether linear speedup is achievable. In this paper, we revisit decentralized ProxSkip and address both questions. We demonstrate that the leading communication complexity of ProxSkip is $\mathcal{O}\left(\frac{pσ^2}{nε^2}\right)$ for non-convex and convex settings, and $\mathcal{O}\left(\frac{pσ^2}{nε}\right)$ for the strongly convex setting, where $n$ represents the number of nodes, $p$ denotes the probability of communication, $σ^2$ signifies the level of stochastic noise, and $ε$ denotes the desired accuracy level. This result illustrates that ProxSkip achieves linear speedup and can asymptotically reduce communication overhead proportional to the probability of communication. Additionally, for the strongly convex setting, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes.

Submitted 19 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2305.07625 [pdf, other]

Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn

Authors: Ondrej Bohdal, Yinbing Tian, Yongshuo Zong, Ruchika Chavhan, Da Li, Henry Gouk, Li Guo, Timothy Hospedales

Abstract: Meta-learning and other approaches to few-shot learning are widely studied for image recognition, and are increasingly applied to other vision tasks such as pose estimation and dense prediction. This naturally raises the question of whether there is any few-shot meta-learning algorithm capable of generalizing across these diverse task types. To support the community in answering this question, we introduce Meta Omnium, a dataset-of-datasets spanning multiple vision tasks including recognition, keypoint localization, semantic segmentation and regression. We experiment with popular few-shot meta-learning baselines and analyze their ability to generalize across tasks and to transfer knowledge between them. Meta Omnium enables meta-learning researchers to evaluate model generalization to a much wider array of tasks than previously possible, and provides a single framework for evaluating meta-learners across a wide suite of vision applications in a consistent manner.

Submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted at CVPR 2023. Project page: https://edi-meta-learning.github.io/meta-omnium

arXiv:2303.17758 [pdf, other]

Commuter Count: Inferring Travel Patterns from Location Data

Authors: Nathan Musoke, Emily Kendall, Mateja Gosenca, Lillian Guo, Lerh Feng Low, Angela Xue, Richard Easther

Abstract: In this Working Paper we analyse computational strategies for using aggregated spatio-temporal population data acquired from telecommunications networks to infer travel and movement patterns between geographical regions. Specifically, we focus on hour-by-hour cellphone counts for the SA-2 geographical regions covering the whole of New Zealand. This Working Paper describes the implementation of the inference algorithms and their ability to produce models of travel patterns during the day, and lays out opportunities for future development.

Submitted 30 March, 2023; originally announced March 2023.

Comments: Submitted to Covid-19 Modelling Aotearoa

arXiv:2008.08931 [pdf, other]

doi 10.1145/3340531.3412681

A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction

Authors: Liyi Guo, Rui Lu, Haoqi Zhang, Junqi **, Zhenzhe Zheng, Fan Wu, ** Li, Haiyang Xu, Han Li, Wenkai Lu, Jian Xu, Kun Gai

Abstract: For e-commerce platforms such as Taobao and Amazon, advertisers play an important role in the entire digital ecosystem: their behaviors explicitly influence users' browsing and shopping experience; more importantly, advertisers' expenditure on advertising constitutes a primary source of platform revenue. Therefore, providing better services for advertisers is essential for the long-term prosperity of e-commerce platforms. To achieve this goal, the ad platform needs to have an in-depth understanding of advertisers in terms of both their marketing intents and their satisfaction with the advertising performance, based on which further optimization could be carried out to serve the advertisers in the correct direction. In this paper, we propose a novel Deep Satisfaction Prediction Network (DSPN), which models advertiser intent and satisfaction simultaneously. It employs a two-stage network structure where the advertiser intent vector and satisfaction are jointly learned by considering the features of the advertiser's action information and advertising performance indicators. Experiments on an Alibaba advertisement dataset and online evaluations show that our proposed DSPN outperforms state-of-the-art baselines and has stable performance in terms of AUC in the online environment. Further analyses show that DSPN not only predicts advertisers' satisfaction accurately but also learns an explainable advertiser intent, revealing the opportunities to optimize the advertising performance further.

Submitted 20 August, 2020; originally announced August 2020.

Journal ref: CIKM 2020, Virtual Event, Ireland

arXiv:2007.01231 [pdf, other]

Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs

Authors: Kian Ahrabian, Daniel Tarlow, Hehuimin Cheng, ** L. C. Guo

Abstract: We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such a representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link prediction and ii) extrapolated time-conditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. Meanwhile, they also reveal the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regard to past events.

Submitted 12 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: 11 pages, 1 figure. 37th International Conference on Machine Learning (ICML 2020) - Workshop on Graph Representation Learning and Beyond

arXiv:2006.16865 [pdf, other]

doi 10.5194/nhess-21-2109-2021

Space-time clustering of flash floods in a changing climate (China, 1950-2015)

Authors: Nan Wang, Luigi Lombardo, Marj Tonini, Weiming Cheng, Liang Guo, Junnan Xiong

Abstract: The persistence over space and time of flash flood disasters -- flash floods that have caused economic losses, loss of life, or both -- is a diagnostic measure of areas subjected to hydrological risk. The concept of persistence can be assessed via clustering analyses, performed here to analyse the national inventory of flash flood disasters that occurred in China in the period 1950-2015. Specifically, we investigated the spatio-temporal pattern distribution of the flash floods and their clustering behavior by using both global and local methods: the first based on Ripley's K-function, and the second on scan statistics. As a result, we could visualize patterns of aggregated events, estimate the cluster duration and make assumptions about their evolution over time, also with respect to precipitation trends. Due to the large spatial scale (the whole Chinese territory) and temporal scale (66 years) of the dataset, we were able to capture whether certain clusters gather in specific locations and times, but also whether their magnitude tends to increase or decrease. Overall, the eastern regions in China are much more subjected to flash floods compared to the rest of the country. Detected clusters revealed that these phenomena predominantly occur between July and October, a period coinciding with the wet season in China. The number of detected clusters increases with time, but the associated duration drastically decreases in the recent period. This may indicate a change towards triggering mechanisms which are typical of short-duration extreme rainfall events. Finally, because flash floods are directly linked to precipitation and its extreme realizations, we indirectly assessed whether the magnitude of the trigger itself has also varied through space and time, enabling considerations in the context of climatic changes.

Submitted 23 June, 2020; originally announced June 2020.

arXiv:2005.11560 [pdf, ps, other]

Adversarial Attack on Hierarchical Graph Pooling Neural Networks

Authors: Haoteng Tang, Guixiang Ma, Yurong Chen, Lei Guo, Wei Wang, Bo Zeng, Liang Zhan

Abstract: Recent years have witnessed the emergence and development of graph neural networks (GNNs), which have been shown to be a powerful approach for graph representation learning in many tasks, such as node classification and graph classification. Research on the robustness of these models has also started to attract attention in the machine learning field. However, most of the existing work in this area focuses on GNNs for node-level tasks, while little work has been done to study the robustness of GNNs for the graph classification task. In this paper, we aim to explore the vulnerability of Hierarchical Graph Pooling (HGP) Neural Networks, which are advanced GNNs that perform very well in graph classification in terms of prediction accuracy. We propose an adversarial attack framework for this task. Specifically, we design a surrogate model that consists of convolutional and pooling operators to generate adversarial samples to fool the hierarchical GNN-based graph classification models. We set the nodes preserved by the pooling operator as our attack targets, and then we perturb the attack targets slightly to fool the pooling operator in hierarchical GNNs so that they will select the wrong nodes to preserve. We show that the adversarial samples generated from multiple datasets by our surrogate model have enough transferability to attack current state-of-the-art graph classification models. Furthermore, we conduct robust training on the target models and demonstrate that the retrained graph classification models are able to better defend against attacks from the adversarial samples. To the best of our knowledge, this is the first work on adversarial attacks against hierarchical GNN-based graph classification models.

Submitted 23 May, 2020; originally announced May 2020.

arXiv:2001.09027 [pdf, other]

Weakly Supervised Learning Meets Ride-Sharing User Experience Enhancement

Authors: Lan-Zhe Guo, Feng Kuang, Zhang-Xun Liu, Yu-Feng Li, Nan Ma, Xiao-Hu Qie

Abstract: Weakly supervised learning aims at coping with scarce labeled data. Previous weakly supervised studies typically assume that there is only one kind of weak supervision in data. In many applications, however, raw data usually contains more than one kind of weak supervision at the same time. For example, in user experience enhancement from Didi, one of the largest online ride-sharing platforms, the ride comment data contains severe label noise (due to the subjective factors of passengers) and severe label distribution bias (due to the sampling bias). We call such a problem "compound weakly supervised learning". In this paper, we propose the CWSL method to address this problem based on Didi ride-sharing comment data. Specifically, an instance reweighting strategy is employed to cope with severe label noise in comment data, where the weights for harmful noisy instances are small. Robust criteria like AUC rather than accuracy and the validation performance are optimized for the correction of biased data labels. Alternating optimization and stochastic gradient methods accelerate the optimization on large-scale data. Experiments on Didi ride-sharing comment data clearly validate the effectiveness. We hope this work may shed some light on applying weakly supervised learning to complex real situations.

Submitted 19 January, 2020; originally announced January 2020.

Comments: AAAI 2020

arXiv:1912.04783 [pdf, other]

Frivolous Units: Wider Networks Are Not Really That Wide

Authors: Stephen Casper, Xavier Boix, Vanessa D'Amario, Ling Guo, Martin Schrimpf, Kasper Vinken, Gabriel Kreiman

Abstract: A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their accuracy does not degrade when the network's width is increased. Recent evidence suggests that developing compressible representations is key for adjusting the complexity of large networks to the learning task at hand. However, these compressible representations are poorly understood. A promising strand of research inspired by biology is understanding representations at the unit level as it offers a more granular and intuitive interpretation of the neural mechanisms. In order to better understand what facilitates increases in width without decreases in accuracy, we ask: Are there mechanisms at the unit level by which networks control their effective complexity as their width is increased? If so, how do these depend on the architecture, dataset, and training parameters? We identify two distinct types of "frivolous" units that proliferate when the network's width is increased: prunable units which can be dropped out of the network without significant change to the output and redundant units whose activities can be expressed as a linear combination of others. These units imply complexity constraints as the function the network represents could be expressed by a network without them. We also identify how the development of these units can be influenced by architecture and a number of training factors. Together, these results help to explain why the accuracy of DNNs does not degrade when width is increased and highlight the importance of frivolous units toward understanding implicit regularization in DNNs.

Submitted 31 May, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2021

arXiv:1907.06582 [pdf, other]

AMAD: Adversarial Multiscale Anomaly Detection on High-Dimensional and Time-Evolving Categorical Data

Authors: Zheng Gao, Lin Guo, Chi Ma, Xiao Ma, Kai Sun, Hang Xiang, Xiaoqiang Zhu, Hongsong Li, Xiaozhong Liu

Abstract: Anomaly detection is facing emerging challenges in many important industry domains, such as cyber security and online recommendation and advertising. The recent trend in these areas calls for anomaly detection on time-evolving data with high-dimensional categorical features without labeled samples. Also, there is an increasing demand for identifying and monitoring irregular patterns at multiple resolutions. In this work, we propose a unified end-to-end approach to solve these challenges by combining the advantages of Adversarial Autoencoder and Recurrent Neural Network. The model learns data representations across different scales with attention mechanisms, on which an enhanced two-resolution anomaly detector is developed for both instances and data blocks. Extensive experiments are performed over three types of datasets to demonstrate the efficacy of our method and its superiority over the state-of-the-art approaches.

Submitted 12 July, 2019; originally announced July 2019.

Comments: Accepted by 2019 KDD Workshop on Deep Learning Practice for High-Dimensional Sparse Data

arXiv:1907.02437 [pdf, other]

Subsampling Bias and The Best-Discrepancy Systematic Cross Validation

Authors: Liang Guo, Jianya Liu, Ruodan Lu

Abstract: Statistical machine learning models should be evaluated and validated before being put to work. The conventional k-fold Monte Carlo Cross-Validation (MCCV) procedure uses a pseudo-random sequence to partition instances into k subsets, which usually causes subsampling bias, inflates generalization errors and jeopardizes the reliability and effectiveness of cross-validation. Based on ordered systematic sampling theory in statistics and low-discrepancy sequence theory in number theory, we propose a new k-fold cross-validation procedure by replacing a pseudo-random sequence with a best-discrepancy sequence, which ensures low subsampling bias and leads to more precise Expected-Prediction-Error (EPE) estimates. Experiments with 156 benchmark datasets and three classifiers (logistic regression, decision tree and naive Bayes) show that, in general, our cross-validation procedure can reduce the subsampling bias of the MCCV, lowering the EPE by around 7.18% and the variances by around 26.73%. In comparison, the stratified MCCV can reduce the EPE and variances of the MCCV by around 1.58% and 11.85% respectively. The Leave-One-Out (LOO) procedure can lower the EPE by around 2.50%, but its variances are much higher than those of any other CV procedure. The computational time of our cross-validation procedure is just 8.64% of the MCCV, 8.67% of the stratified MCCV and 16.72% of the LOO. Experiments also show that our approach is more beneficial for datasets characterized by relatively small size and large aspect ratio. This makes our approach particularly pertinent when solving bioscience classification problems. Our proposed systematic subsampling technique could be generalized to other machine learning algorithms that involve a random subsampling mechanism.

Submitted 4 July, 2019; originally announced July 2019.

Comments: SCIENCE China Mathematics. 2019

MSC Class: 62-07; 11J71; 62G09; 68T05
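
The idea in the entry above, partitioning instances into k folds with a low-discrepancy sequence instead of a pseudo-random one, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not define its best-discrepancy sequence, so the sketch substitutes the golden-ratio Kronecker sequence frac(i * alpha), and the function name `low_discrepancy_folds` is hypothetical.

```python
import math

def low_discrepancy_folds(n_samples, k):
    """Assign instance indices 0..n_samples-1 to k folds.

    Instance i is mapped to u_i = frac((i + 1) * alpha), with
    alpha = (sqrt(5) - 1) / 2, and placed in fold floor(k * u_i).
    The sequence choice is an assumption made for this sketch.
    """
    alpha = (math.sqrt(5) - 1) / 2
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        u = ((i + 1) * alpha) % 1.0              # point in [0, 1)
        folds[min(int(u * k), k - 1)].append(i)  # guard against rounding
    return folds

folds = low_discrepancy_folds(100, 5)
for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(100) if i not in held_out]
    print(fold_id, len(train_idx), len(test_idx))
```

The assignment is deterministic, so repeated runs give identical folds, unlike a pseudo-random partition that depends on the seed.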

arXiv:1905.01205 [pdf, other]

Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks

Authors: Dongkun Zhang, Ling Guo, George Em Karniadakis

Abstract: One of the open problems in scientific computing is the long-time integration of nonlinear stochastic partial differential equations (SPDEs). We address this problem by taking advantage of recent advances in scientific machine learning and the dynamically orthogonal (DO) and bi-orthogonal (BO) methods for representing stochastic processes. Specifically, we propose two new Physics-Informed Neural Networks (PINNs) for solving time-dependent SPDEs, namely the NN-DO/BO methods, which incorporate the DO/BO constraints into the loss function with an implicit form instead of generating explicit expressions for the temporal derivatives of the DO/BO modes. Hence, the proposed methods overcome some of the drawbacks of the original DO/BO methods: we do not need the assumption that the covariance matrix of the random coefficients is invertible as in the original DO method, and we can remove the assumption of no eigenvalue crossing as in the original BO method. Moreover, the NN-DO/BO methods can be used to solve time-dependent stochastic inverse problems with the same formulation and computational complexity as for forward problems. We demonstrate the capability of the proposed methods via several numerical examples: (1) a linear stochastic advection equation with deterministic initial condition where the original DO/BO method would fail; (2) long-time integration of the stochastic Burgers' equation with many eigenvalue crossings during the whole time evolution where the original BO method fails; (3) a nonlinear reaction diffusion equation, for which we consider both the forward and the inverse problem, including noisy initial data, to investigate the flexibility of the NN-DO/BO methods in handling inverse and mixed type problems. Taken together, these simulation results demonstrate that the NN-DO/BO methods can be employed to effectively quantify uncertainty propagation in a wide range of physical problems.

Submitted 3 September, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

arXiv:1904.10450 [pdf, other]

Latent Variable Algorithms for Multimodal Learning and Sensor Fusion

Authors: Lijiang Guo

Abstract: Multimodal learning has been lacking principled ways of combining information from different modalities and learning a low-dimensional manifold of meaningful representations. We study multimodal learning and sensor fusion from a latent variable perspective. We first present a regularized recurrent attention filter for sensor fusion. This algorithm can dynamically combine information from different types of sensors in a sequential decision making task. Each sensor is bonded with a modular neural network to maximize utility of its own information. A gating modular neural network dynamically generates a set of mixing weights for outputs from sensor networks by balancing utility of all sensors' information. We design a co-learning mechanism to encourage co-adaption and independent learning of each sensor at the same time, and propose a regularization based co-learning method. In the second part, we focus on recovering the manifold of latent representation. We propose a co-learning approach using probabilistic graphical model which imposes a structural prior on the generative model: multimodal variational RNN (MVRNN) model, and derive a variational lower bound for its objective functions. In the third part, we extend the siamese structure to sensor fusion for robust acoustic event detection. We perform experiments to investigate the latent representations that are extracted; works will be done in the following months. Our experiments show that the recurrent attention filter can dynamically combine different sensor inputs according to the information carried in the inputs. We consider MVRNN can identify latent representations that are useful for many downstream tasks such as speech synthesis, activity recognition, and control and planning. Both algorithms are general frameworks which can be applied to other tasks where different types of sensors are jointly used for decision making.

Submitted 23 April, 2019; originally announced April 2019.

arXiv:1904.09743 [pdf, other]

Reliable Weakly Supervised Learning: Maximize Gain and Maintain Safeness

Authors: Lan-Zhe Guo, Yu-Feng Li, Ming Li, Jin-Feng Yi, Bo-Wen Zhou, Zhi-Hua Zhou

Abstract: Weakly supervised data are widespread and have attracted much attention. However, since label quality is often difficult to guarantee, sometimes the use of weakly supervised data will lead to unsatisfactory performance, i.e., performance degradation or poor performance gains. Moreover, it is usually not feasible to manually increase the label quality, which results in weakly supervised learning being somewhat difficult to rely on. In view of this crucial issue, this paper proposes a simple and novel weakly supervised learning framework. We guide the optimization of label quality through a small amount of validation data, and to ensure the safeness of performance while maximizing performance gain. As validation set is a good approximation for describing generalization risk, it can effectively avoid the unsatisfactory performance caused by incorrect data distribution assumptions. We formalize this underlying consideration into a novel Bi-Level optimization and give an effective solution. Extensive experimental results verify that the new framework achieves impressive performance on weakly supervised learning with a small amount of validation data.

Submitted 22 April, 2019; originally announced April 2019.

arXiv:1902.00956 [pdf, ps, other]

Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances

Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Lijiang Guo, Aswin Sivaraman, Minje Kim

Abstract: We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score of the vocals nor the accompaniment exists: It predicts the amount of correction from the relationship between the spectral contents of the vocal and accompaniment tracks. Hence, the pitch shift in cents suggested by the model can be used to make the voice sound in tune with the accompaniment. This approach differs from commercially used automatic pitch correction systems, where notes in the vocal tracks are shifted to be centered around notes in a user-defined score or mapped to the closest pitch among the twelve equal-tempered scale degrees. We train the model using a dataset of 4,702 amateur karaoke performances selected for good intonation. We present a Convolutional Gated Recurrent Unit (CGRU) model to accomplish this task. This method can be extended into unsupervised pitch correction of a vocal performance, popularly referred to as autotuning.

Submitted 3 February, 2019; originally announced February 2019.
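
Note on the "pitch shift in cents" mentioned in the abstract above: a cent is one hundredth of an equal-tempered semitone, so 1200 cents span one octave. The sketch below only illustrates that unit conversion; it is not the paper's CGRU model, and the function names and the 435 Hz / 440 Hz example values are hypothetical.

```python
import math


def cents_between(f_ref_hz, f_hz):
    """Signed interval from f_ref_hz to f_hz, in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


def shift_frequency(f_hz, cents):
    """Frequency obtained by shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)


# Hypothetical example: a note sung slightly flat at 435 Hz, target 440 Hz.
sung_hz = 435.0
target_hz = 440.0
correction_cents = cents_between(sung_hz, target_hz)  # positive: shift upward
corrected_hz = shift_frequency(sung_hz, correction_cents)
print(f"correction: {correction_cents:.2f} cents, corrected: {corrected_hz:.2f} Hz")
```

In this reading, a model that outputs a shift in cents could be applied to a vocal track by scaling its fundamental frequency with `shift_frequency`; how the paper applies the shift in practice is not stated in the abstract.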

arXiv:1901.06237 [pdf, other]

BUOCA: Budget-Optimized Crowd Worker Allocation

Authors: Mehrnoosh Sameki, Sha Lai, Kate K. Mays, Lei Guo, Prakash Ishwar, Margrit Betke

Abstract: Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. We here show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze easy-to-label data and more workers to analyze data that requires extra scrutiny. Our main contribution is to show how the allocations of the number of workers to a task can be computed optimally based on task features alone, without using worker profiles. Our target tasks are delineating cells in microscopy images and analyzing the sentiment toward the 2016 U.S. presidential candidates in tweets. We first propose an algorithm that computes budget-optimized crowd worker allocation (BUOCA). We next train a machine learning system (BUOCA-ML) that predicts an optimal number of crowd workers needed to maximize the accuracy of the labeling. We show that the computed allocation can yield large savings in the crowdsourcing budget (up to 49 percent points) while maintaining labeling accuracy. Finally, we envisage a human-machine system for performing budget-optimized data analysis at a scale beyond the feasibility of crowdsourcing.

Submitted 11 January, 2019; originally announced January 2019.

arXiv:1810.12582 [pdf, other]

DSKG: A Deep Sequential Model for Knowledge Graph Completion

Authors: Lingbing Guo, Qingheng Zhang, Weiyi Ge, Wei Hu, Yuzhong Qu

Abstract: Knowledge graph (KG) completion aims to fill the missing facts in a KG, where a fact is represented as a triple in the form of $(subject, relation, object)$. Current KG completion models compel two-thirds of a triple provided (e.g., $subject$ and $relation$) to predict the remaining one. In this paper, we propose a new model, which uses a KG-specific multi-layer recurrent neural network (RNN) to model triples in a KG as sequences. It outperformed several state-of-the-art KG completion models on the conventional entity prediction task for many evaluation metrics, based on two benchmark datasets and a more difficult dataset. Furthermore, our model is enabled by the sequential characteristic and thus capable of predicting the whole triples only given one entity. Our experiments demonstrated that our model achieved promising performance on this new triple prediction task.

Submitted 30 December, 2018; v1 submitted 30 October, 2018; originally announced October 2018.

Comments: CCKS (China Conference on Knowledge Graph and Semantic Computing) Best English Paper Award 2018

arXiv:1809.08327 [pdf, other]

doi 10.1016/j.jcp.2019.07.048

Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems

Authors: Dongkun Zhang, Lu Lu, Ling Guo, George Em Karniadakis

Abstract: Physics-informed neural networks (PINNs) have recently emerged as an alternative way of solving partial differential equations (PDEs) without the need of building elaborate grids, instead, using a straightforward implementation. In particular, in addition to the deep neural network (DNN) for the solution, a second DNN is considered that represents the residual of the PDE. The residual is then combined with the mismatch in the given data of the solution in order to formulate the loss function. This framework is effective but is lacking uncertainty quantification of the solution due to the inherent randomness in the data or due to the approximation limitations of the DNN architecture. Here, we propose a new method with the objective of endowing the DNN with uncertainty quantification for both sources of uncertainty, i.e., the parametric uncertainty and the approximation uncertainty. We first account for the parametric uncertainty when the parameter in the differential equation is represented as a stochastic process. Multiple DNNs are designed to learn the modal functions of the arbitrary polynomial chaos (aPC) expansion of its solution by using stochastic data from sparse sensors. We can then make predictions from new sensor measurements very efficiently with the trained DNNs. Moreover, we employ dropout to correct the over-fitting and also to quantify the uncertainty of DNNs in approximating the modal functions. We then design an active learning strategy based on the dropout uncertainty to place new sensors in the domain to improve the predictions of DNNs. Several numerical tests are conducted for both the forward and the inverse problems to quantify the effectiveness of PINNs combined with uncertainty quantification. This NN-aPC new paradigm of physics-informed deep learning with uncertainty quantification can be readily applied to other types of stochastic PDEs in multi-dimensions.

Submitted 21 September, 2018; originally announced September 2018.

arXiv:1806.08541 [pdf, other]

Visualizing and Understanding Deep Neural Networks in CTR Prediction

Authors: Lin Guo, Hui Ye, Wenbo Su, Henhuan Liu, Kai Sun, Hang Xiang

Abstract: Although deep learning techniques have been successfully applied to many tasks, interpreting deep neural network models is still a big challenge to us. Recently, many works have been done on visualizing and analyzing the mechanism of deep neural networks in the areas of image processing and natural language processing. In this paper, we present our approaches to visualize and understand deep neural networks for a very important commercial task--CTR (Click-through rate) prediction. We conduct experiments on the productive data from our online advertising system with daily varying distribution. To understand the mechanism and the performance of the model, we inspect the model's inner status at neuron level. Also, a probe approach is implemented to measure the layer-wise performance of the model. Moreover, to measure the influence from the input features, we calculate saliency scores based on the back-propagated gradients. Practical applications are also discussed, for example, in understanding, monitoring, diagnosing and refining models and algorithms.

Submitted 22 June, 2018; originally announced June 2018.

Comments: Accept by 2018 SIGIR Workshop on eCommerce
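
Note on the gradient-based saliency scores mentioned in the abstract above: the general idea is to score each input feature by the magnitude of the gradient of the model output with respect to that feature. The sketch below is a minimal illustration on a single-layer logistic model with a hand-derived gradient, which is an assumption made for brevity; the paper's deep CTR network, its features, and its exact scoring rule are not given in the abstract, and all names and values here are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Predicted click probability of a toy logistic model (hypothetical)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def saliency(weights, bias, features):
    """Per-feature saliency |dp/dx_i|, where dp/dx_i = p * (1 - p) * w_i."""
    p = predict(weights, bias, features)
    return [abs(p * (1.0 - p) * w) for w in weights]


# Hypothetical weights and a single input example.
weights = [2.0, -1.0, 0.0]
bias = 0.0
features = [0.0, 0.0, 0.0]
scores = saliency(weights, bias, features)
print(scores)
```

For a multi-layer network the same quantity would normally come from automatic differentiation rather than a closed-form derivative; the closed form is used here only because the toy model has a single layer.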

arXiv:1802.04350 [pdf, other]

Cost-Aware Learning for Improved Identifiability with Multiple Experiments

Authors: Longyun Guo, Jean Honorio, John Morgan

Abstract: We analyze the sample complexity of learning from multiple experiments where the experimenter has a total budget for obtaining samples. In this problem, the learner should choose a hypothesis that performs well with respect to multiple experiments, and their related data distributions. Each collected sample is associated with a cost which depends on the particular experiments. In our setup, a learner performs $m$ experiments, while incurring a total cost $C$. We first show that learning from multiple experiments allows to improve identifiability. Additionally, by using a Rademacher complexity approach, we show that the gap between the training and generalization error is $O(C^{-1/2})$. We also provide some examples for linear prediction, two-layer neural networks and kernel methods.

Submitted 13 July, 2019; v1 submitted 12 February, 2018; originally announced February 2018.

Comments: 17 pages, 4 figures

Journal ref: IEEE International Symposium on Information Theory (ISIT) 2019
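
Note on the $O(C^{-1/2})$ rate quoted in the abstract above: as a rough reading, suppose the gap is bounded by $B(C) = K\,C^{-1/2}$ for some constant $K > 0$ (the constant $K$ and the function $B$ are hypothetical and introduced only for illustration; the abstract gives the rate, not the constant). Then quadrupling the total cost halves the bound:

$$\frac{B(4C)}{B(C)} = \frac{K\,(4C)^{-1/2}}{K\,C^{-1/2}} = 4^{-1/2} = \frac{1}{2}.$$

Equivalently, shrinking the bound by a factor of 10 would require about a 100-fold increase in $C$ under this assumption.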

arXiv:1802.03708

A Time-Varying Network for Cryptocurrencies

Authors: Li Guo, Wolfgang Karl Härdle, Yubo Tao

Abstract: Cryptocurrencies return cross-predictability and technological similarity yield information on risk propagation and market segmentation. To investigate these effects, we build a time-varying network for cryptocurrencies, based on the evolution of return cross-predictability and technological similarities. We develop a dynamic covariate-assisted spectral clustering method to consistently estimate the latent community structure of cryptocurrencies network that accounts for both sets of information. We demonstrate that investors can achieve better risk diversification by investing in cryptocurrencies from different communities. A cross-sectional portfolio that implements an inter-crypto momentum trading strategy earns a 1.08% daily return. By dissecting the portfolio returns on behavioral factors, we confirm that our results are not driven by behavioral mechanisms.

Submitted 17 November, 2022; v1 submitted 11 February, 2018; originally announced February 2018.

Comments: Duplicate with arXiv:2108.11921

MSC Class: 62H30; 62F12 (Primary); 91D30 (Secondary)

arXiv:1409.2232 [pdf, ps, other]

When coding meets ranking: A joint framework based on local learning

Authors: Jim Jing-Yan Wang, Xuefeng Cui, Ge Yu, Lili Guo, Xin Gao

Abstract: Sparse coding, which represents a data point as a sparse reconstruction code with regard to a dictionary, has been a popular data representation method. Meanwhile, in database retrieval problems, learning the ranking scores from data points plays an important role. Up to now, these two problems have always been considered separately, assuming that data coding and ranking are two independent and irrelevant problems. However, is there any internal relationship between sparse coding and ranking score learning? If yes, how to explore and make use of this internal relationship? In this paper, we try to answer these questions by developing the first joint sparse coding and ranking score learning algorithm. To explore the local distribution in the sparse code space, and also to bridge coding and ranking problems, we assume that in the neighborhood of each data point, the ranking scores can be approximated from the corresponding sparse codes by a local linear function. By considering the local approximation error of ranking scores, the reconstruction error and sparsity of sparse coding, and the query information provided by the user, we construct a unified objective function for learning of sparse codes, the dictionary and ranking scores. We further develop an iterative algorithm to solve this optimization problem.

Submitted 2 November, 2016; v1 submitted 8 September, 2014; originally announced September 2014.

Showing 1–26 of 26 results for author: Guo, L