Search | arXiv e-print repository

Hyperparameter Estimation for Sparse Bayesian Learning Models

Authors: Feng Yu, Lixin Shen, Guohui Song

Abstract: Sparse Bayesian Learning (SBL) models are extensively used in signal processing and machine learning for promoting sparsity through hierarchical priors. The hyperparameters in SBL models are crucial for the model's performance, but they are often difficult to estimate due to the non-convexity and the high-dimensionality of the associated objective function. This paper presents a comprehensive framework for hyperparameter estimation in SBL models, encompassing well-known algorithms such as the expectation-maximization (EM), MacKay, and convex bounding (CB) algorithms. These algorithms are cohesively interpreted within an alternating minimization and linearization (AML) paradigm, distinguished by their unique linearized surrogate functions. Additionally, a novel algorithm within the AML framework is introduced, showing enhanced efficiency, especially under low signal noise ratios. This is further improved by a new alternating minimization and quadratic approximation (AMQ) paradigm, which includes a proximal regularization term. The paper substantiates these advancements with thorough convergence analysis and numerical experiments, demonstrating the algorithm's effectiveness in various noise conditions and signal-to-noise ratios.

Submitted 4 January, 2024; originally announced January 2024.

MSC Class: 62F15; 65K10; 65F22

arXiv:2302.14247 [pdf, ps, other]

Sequential edge detection using joint hierarchical Bayesian learning

Authors: Yao Xiao, Anne Gelb, Guohui Song

Abstract: This paper introduces a new sparse Bayesian learning (SBL) algorithm that jointly recovers a temporal sequence of edge maps from noisy and under-sampled Fourier data. The new method is cast in a Bayesian framework and uses a prior that simultaneously incorporates intra-image information to promote sparsity in each individual edge map with inter-image information to promote similarities in any unchanged regions. By treating both the edges as well as the similarity between adjacent images as random variables, there is no need to separately form regions of change. Thus we avoid both additional computational cost as well as any information loss resulting from pre-processing the image. Our numerical examples demonstrate that our new method compares favorably with more standard SBL approaches.

Submitted 27 February, 2023; originally announced February 2023.

MSC Class: 15A29; 62F15; 65F22; 65K10; 68U10

arXiv:2302.04582 [pdf, other]

Reliable event rates for disease mapping

Authors: Harrison Quick, Guangzi Song

Abstract: When analyzing spatially referenced event data, the criteria for declaring rates as "reliable" is still a matter of dispute. What these varying criteria have in common, however, is that they are rarely satisfied for crude estimates in small area analysis settings, prompting the use of spatial models to improve reliability. While reasonable, recent work has quantified the extent to which popular models from the spatial statistics literature can overwhelm the information contained in the data, leading to oversmoothing. Here, we begin by providing a definition for a "reliable" estimate for event rates that can be used for crude and model-based estimates and allows for discrete and continuous statements of reliability. We then construct a spatial Bayesian framework that allows users to infuse prior information into their models to improve reliability while also guarding against oversmoothing. We apply our approach to county-level birth data from Pennsylvania, highlighting the effect of oversmoothing in spatial models and how our approach can allow users to better focus their attention to areas where sufficient data exists to drive inferential decisions. We then conclude with a brief discussion of how this definition of reliability can be used in the design of small area studies.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2202.10574 [pdf, other]

A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

Authors: Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu

Abstract: The two-sided markets such as ride-sharing companies often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smart phones and internet of things, they have substantially transformed the transportation landscape of human beings. In this paper we consider large-scale fleet management in ride-sharing companies that involve multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in those studies because (i) spatial and temporal proximities induce interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying policy evaluation in these studies. We propose novel estimators for mean outcomes under different products that are consistent despite the high-dimensionality of state-action space. The proposed estimator works favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at https://github.com/RunzheStat/CausalMARL.

Submitted 26 March, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

arXiv:2106.10571 [pdf, other]

Geographic and Racial Disparities in the Incidence of Low Birthweight in Pennsylvania

Authors: Guangzi Song, Loni Philip Tabb, Harrison Quick

Abstract: Babies born with low and very low birthweights -- i.e., birthweights below 2,500 and 1,500 grams, respectively -- have an increased risk of complications compared to other babies, and the proportion of babies with a low birthweight is a common metric used when evaluating public health in a population. While many factors increase the risk of a baby having a low birthweight, many can be linked to the mother's socioeconomic status, which in turn contributes to large racial disparities in the incidence of low weight births. Here, we employ Bayesian statistical models to analyze the proportion of babies with low birthweight in Pennsylvania counties by race/ethnicity. Due to the small number of births -- and low weight births -- in many Pennsylvania counties when stratified by race/ethnicity, our methods must walk a fine line. On one hand, leveraging spatial structure can help improve the precision of our estimates. On the other hand, we must be cautious to avoid letting the model overwhelm the information in the data and produce spurious conclusions. As such, we first develop a framework by which we can measure (and control) the informativeness of our spatial model. After demonstrating the properties of our framework via simulation, we analyze the low birthweight data from Pennsylvania and examine the extent to which the commonly used conditional autoregressive model can lead to oversmoothing. We then reanalyze the data using our proposed framework and highlight its ability to detect (or not detect) evidence of racial disparities in the incidence of low birthweight.

Submitted 19 June, 2021; originally announced June 2021.

arXiv:2106.06044 [pdf, other]

Convergence and Alignment of Gradient Descent with Random Backpropagation Weights

Authors: Ganlin Song, Ruitu Xu, John Lafferty

Abstract: Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure -- updating one neuron's synaptic weights requires knowledge of synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neural networks as a tool for understanding the biological principles of information processing in the brain. Lillicrap et al. (2016) propose a more biologically plausible "feedback alignment" algorithm that uses random and fixed backpropagation weights, and show promising simulations. In this paper we study the mathematical properties of the feedback alignment procedure by analyzing convergence and alignment for two-layer networks under squared error loss. In the overparameterized setting, we prove that the error converges to zero exponentially fast, and also that regularization is necessary in order for the parameters to become aligned with the random backpropagation weights. Simulations are given that are consistent with this analysis and suggest further generalizations. These results contribute to our understanding of how biologically plausible algorithms might carry out weight learning in a manner different from Hebbian learning, with performance that is comparable with the full non-local backpropagation algorithm.

Submitted 22 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: 35 pages

arXiv:2012.08784 [pdf, other]

Tensor Completion by Multi-Rank via Unitary Transformation

Authors: Guang-Jing Song, Michael K. Ng, Xiongjun Zhang

Abstract: One of the key problems in tensor completion is the number of uniformly random sample entries required for recovery guarantee. The main aim of this paper is to study $n_1 \times n_2 \times n_3$ third-order tensor completion based on transformed tensor singular value decomposition, and provide a bound on the number of required sample entries. Our approach is to make use of the multi-rank of the underlying tensor instead of its tubal rank in the bound. In numerical experiments on synthetic and imaging data sets, we demonstrate the effectiveness of our proposed bound for the number of sample entries. Moreover, our theoretical results are valid to any unitary transformation applied to $n_3$-dimension under transformed tensor singular value decomposition.

Submitted 24 January, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

arXiv:2009.11510 [pdf, other]

EPNE: Evolutionary Pattern Preserving Network Embedding

Authors: Junshan Wang, Yilun Jin, Guojie Song, Xiaojun Ma

Abstract: Information networks are ubiquitous and are ideal for modeling relational data. Networks being sparse and irregular, network embedding algorithms have caught the attention of many researchers, who came up with numerous embeddings algorithms in static networks. Yet in real life, networks constantly evolve over time. Hence, evolutionary patterns, namely how nodes develop itself over time, would serve as a powerful complement to static structures in embedding networks, on which relatively few works focus. In this paper, we propose EPNE, a temporal network embedding model preserving evolutionary patterns of the local structure of nodes. In particular, we analyze evolutionary patterns with and without periodicity and design strategies correspondingly to model such patterns in time-frequency domains based on causal convolutions. In addition, we propose a temporal objective function which is optimized simultaneously with proximity ones such that both temporal and structural information are preserved. With the adequate modeling of temporal information, our model is able to outperform other competitive methods in various prediction tasks.

Submitted 24 September, 2020; originally announced September 2020.

Comments: 8 pages

arXiv:2009.10951 [pdf, other]

doi 10.1145/3340531.3411963

Streaming Graph Neural Networks via Continual Learning

Authors: Junshan Wang, Guojie Song, Yi Wu, Liang Wang

Abstract: Graph neural networks (GNNs) have achieved strong performance in various applications. In the real world, network data is usually formed in a streaming fashion. The distributions of patterns that refer to neighborhood information of nodes may shift over time. The GNN model needs to learn the new patterns that cannot yet be captured. But learning incrementally leads to the catastrophic forgetting problem that historical knowledge is overwritten by newly learned knowledge. Therefore, it is important to train GNN model to learn new patterns and maintain existing patterns simultaneously, which few works focus on. In this paper, we propose a streaming GNN model based on continual learning so that the model is trained incrementally and up-to-date node representations can be obtained at each time step. Firstly, we design an approximation algorithm to detect new coming patterns efficiently based on information propagation. Secondly, we combine two perspectives of data replaying and model regularization for existing pattern consolidation. Specially, a hierarchy-importance sampling strategy for nodes is designed and a weighted regularization term for GNN parameters is derived, achieving greater stability and generalization of knowledge consolidation. Our model is evaluated on real and synthetic data sets and compared with multiple baselines. The results of node classification prove that our model can efficiently update model parameters and achieve comparable performance to model retraining. In addition, we also conduct a case study on the synthetic data, and carry out some specific analysis for each part of our model, illustrating its ability to learn new knowledge and maintain existing knowledge from different perspectives.

Submitted 4 December, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: 10 pages, 4 figures, CIKM 2020

arXiv:2009.03998 [pdf, other]

Tangent Space Based Alternating Projections for Nonnegative Low Rank Matrix Approximation

Authors: Guangjing Song, Michael K. Ng, Tai-Xiang Jiang

Abstract: In this paper, we develop a new alternating projection method to compute nonnegative low rank matrix approximation for nonnegative matrices. In the nonnegative low rank matrix approximation method, the projection onto the manifold of fixed rank matrices can be expensive as the singular value decomposition is required. We propose to use the tangent space of the point in the manifold to approximate the projection onto the manifold in order to reduce the computational cost. We show that the sequence generated by the alternating projections onto the tangent spaces of the fixed rank matrices manifold and the nonnegative matrix manifold, converge linearly to a point in the intersection of the two manifolds where the convergent point is sufficiently close to optimal solutions. This convergence result based inexact projection onto the manifold is new and is not studied in the literature. Numerical examples in data clustering, pattern recognition and hyperspectral data analysis are given to demonstrate that the performance of the proposed method is better than that of nonnegative matrix factorization methods in terms of computational time and accuracy.

Submitted 2 September, 2020; originally announced September 2020.

arXiv:2007.10559 [pdf, other]

doi 10.1016/j.sste.2021.100420

Evaluating the Informativeness of the Besag-York-Mollié CAR Model

Authors: Harrison Quick, Guangzi Song, Loni Tabb

Abstract: The use of the conditional autoregressive framework proposed by Besag, York, and Mollié (1991; BYM) is ubiquitous in Bayesian disease mapping and spatial epidemiology. While it is understood that Bayesian inference is based on a combination of the information contained in the data and the information contributed by the model, quantifying the contribution of the model relative to the information in the data is often non-trivial. Here, we provide a measure of the contribution of the BYM framework by first considering the simple Poisson-gamma setting in which quantifying the prior's contribution is quite clear. We then propose a relationship between gamma and lognormal priors that we then extend to cover the framework proposed by BYM. Following a brief simulation study in which we illustrate the accuracy of our lognormal approximation of the gamma prior, we analyze a dataset comprised of county-level heart disease-related death data across the United States. In addition to demonstrating the potential for the BYM framework to correspond to a highly informative prior specification, we also illustrate the sensitivity of death rate estimates to changes in the informativeness of the BYM framework.

Submitted 20 July, 2020; originally announced July 2020.

Comments: 11 pages, 4 figures

Journal ref: Spatial and Spatio-temporal Epidemiology, 37, 100420 (2021)
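
Illustrative note on the Poisson-gamma setting mentioned in the abstract above. The sketch below is not taken from the paper; it only shows the textbook conjugate update, under the assumption that the event count follows y ~ Poisson(n * lambda) with exposure n and that the rate has a Gamma(a, b) prior (shape a, rate b). In that case the posterior is Gamma(a + y, b + n), so the prior acts like a pseudo-events observed over b units of exposure, and the posterior mean is a weighted average of the prior mean a/b and the crude rate y/n with prior weight b / (b + n). All function and parameter names here are hypothetical.

```python
def gamma_poisson_posterior(prior_shape, prior_rate, events, exposure):
    """Conjugate update: Gamma(a, b) prior, y ~ Poisson(exposure * rate).

    Returns the posterior (shape, rate) pair (a + y, b + exposure).
    """
    return prior_shape + events, prior_rate + exposure


def prior_weight(prior_rate, exposure):
    """Share of the posterior mean attributable to the prior mean."""
    return prior_rate / (prior_rate + exposure)


def posterior_mean(prior_shape, prior_rate, events, exposure):
    """Posterior mean of the event rate, (a + y) / (b + exposure)."""
    shape, rate = gamma_poisson_posterior(prior_shape, prior_rate, events, exposure)
    return shape / rate


if __name__ == "__main__":
    # Hypothetical numbers: prior worth 2 events in 1000 units of exposure,
    # data of 3 events in 500 units of exposure.
    a, b, y, n = 2.0, 1000.0, 3, 500.0
    w = prior_weight(b, n)
    print("prior weight:", w)
    print("posterior mean:", posterior_mean(a, b, y, n))
    print("weighted average:", w * (a / b) + (1 - w) * (y / n))
```

In this reading, a prior rate parameter b that is large relative to the exposure n means the prior dominates the estimate, which is one simple way to think about an "informative" prior in small-area settings.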

arXiv:2006.14278 [pdf, other]

Graph Structural-topic Neural Network

Authors: Qingqing Long, Yilun Jin, Guojie Song, Yi Li, Wei Lin

Abstract: Graph Convolutional Networks (GCNs) achieved tremendous success by effectively gathering local features for nodes. However, commonly do GCNs focus more on node features but less on graph structures within the neighborhood, especially higher-order structural patterns. However, such local structural patterns are shown to be indicative of node properties in numerous fields. In addition, it is not just single patterns, but the distribution over all these patterns matter, because networks are complex and the neighborhood of each node consists of a mixture of various nodes and structural patterns. Correspondingly, in this paper, we propose Graph Structural-topic Neural Network, abbreviated GraphSTONE, a GCN model that utilizes topic models of graphs, such that the structural topics capture indicative graph structures broadly from a probabilistic aspect rather than merely a few structures. Specifically, we build topic models upon graphs using anonymous walks and Graph Anchor LDA, an LDA variant that selects significant structural patterns first, so as to alleviate the complexity and generate structural topics efficiently. In addition, we design multi-view GCNs to unify node features and structural topic features and utilize structural topics to guide the aggregation. We evaluate our model through both quantitative and qualitative experiments, where our model exhibits promising performance, high efficiency, and clear interpretability.

Submitted 4 July, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

arXiv:1911.10699 [pdf, other]

Multi-Component Graph Convolutional Collaborative Filtering

Authors: Xiao Wang, Ruijia Wang, Chuan Shi, Guojie Song, Qingyong Li

Abstract: The interactions of users and items in recommender system could be naturally modeled as a user-item bipartite graph. In recent years, we have witnessed an emerging research effort in exploring user-item graph for collaborative filtering methods. Nevertheless, the formation of user-item interactions typically arises from highly complex latent purchasing motivations, such as high cost performance or eye-catching appearance, which are indistinguishably represented by the edges. The existing approaches still remain the differences between various purchasing motivations unexplored, rendering the inability to capture fine-grained user preference. Therefore, in this paper we propose a novel Multi-Component graph convolutional Collaborative Filtering (MCCF) approach to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Specifically, there are two elaborately designed modules, decomposer and combiner, inside MCCF. The former first decomposes the edges in user-item graph to identify the latent components that may cause the purchasing relationship; the latter then recombines these latent components automatically to obtain unified embeddings for prediction. Furthermore, the sparse regularizer and weighted random sample strategy are utilized to alleviate the overfitting problem and accelerate the optimization. Empirical results on three real datasets and a synthetic dataset not only show the significant performance gains of MCCF, but also well demonstrate the necessity of considering multiple components.

Submitted 24 November, 2019; originally announced November 2019.

arXiv:1911.07675 [pdf, other]

GraLSP: Graph Neural Networks with Local Structural Patterns

Authors: Yilun Jin, Guojie Song, Chuan Shi

Abstract: It is not until recently that graph neural networks (GNNs) are adopted to perform graph representation learning, among which, those based on the aggregation of features within the neighborhood of a node achieved great success. However, despite such achievements, GNNs illustrate defects in identifying some common structural patterns which, unfortunately, play significant roles in various network phenomena. In this paper, we propose GraLSP, a GNN framework which explicitly incorporates local structural patterns into the neighborhood aggregation through random anonymous walks. Specifically, we capture local graph structures via random anonymous walks, powerful and flexible tools that represent structural patterns. The walks are then fed into the feature aggregation, where we design various mechanisms to address the impact of structural features, including adaptive receptive radius, attention and amplification. In addition, we design objectives that capture similarities between structures and are optimized jointly with node proximity objectives. With the adequate leverage of structural patterns, our model is able to outperform competitive counterparts in various prediction tasks in multiple datasets.

Submitted 7 December, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.04143 [pdf, other]

doi 10.1609/aaai.v34i04.5769

Time2Graph: Revisiting Time Series Modeling with Dynamic Shapelets

Authors: Ziqiang Cheng, Yang Yang, Wei Wang, Wenjie Hu, Yueting Zhuang, Guojie Song

Abstract: Time series modeling has attracted extensive research efforts; however, achieving both reliable efficiency and interpretability from a unified model still remains a challenging problem. Among the literature, shapelets offer interpretable and explanatory insights in the classification tasks, while most existing works ignore the differing representative power at different time slices, as well as (more importantly) the evolution pattern of shapelets. In this paper, we propose to extract time-aware shapelets by designing a two-level timing factor. Moreover, we define and construct the shapelet evolution graph, which captures how shapelets evolve over time and can be incorporated into the time series embeddings by graph embedding algorithms. To validate whether the representations obtained in this way can be applied effectively in various scenarios, we conduct experiments based on three public time series datasets, and two real-world datasets from different domains. Experimental results clearly show the improvements achieved by our approach compared with 17 state-of-the-art baselines.

Submitted 30 November, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

Comments: An extended version with 11 pages including appendix; Accepted by AAAI'2020

MSC Class: 10010147.10010257.10010258.10010259.10010263

arXiv:1909.03261 [pdf, other]

Active learning to optimise time-expensive algorithm selection

Authors: Riccardo Volpato, Guangyan Song

Abstract: Hard optimisation problems such as Boolean Satisfiability typically have long solving times and can usually be solved by many algorithms, although the performance can vary widely in practice. Research has shown that no single algorithm outperforms all the others; thus, it is crucial to select the best algorithm for a given problem. Supervised machine learning models can accurately predict which solver is best for a given problem, but they require first to run every solver in the portfolio for all examples available to create labelled data. As this approach cannot scale, we developed an active learning framework that addresses this problem by constructing an optimal training set, so that the learner can achieve higher or equal performances with less training data. Our work proves that active learning is beneficial for algorithm selection techniques and provides practical guidance to incorporate into existing systems.

Submitted 7 September, 2019; originally announced September 2019.

Comments: 11 pages, 3 figures, 3 tables and 1 appendix

arXiv:1908.04979 [pdf, other]

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

Authors: Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Abstract: Multimodal learning aims to discover the relationship between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval. This paper attempts to address the modality heterogeneity problem based on Gaussian process latent variable models (GPLVMs) to represent multimodal data in a common space. Previous multimodal GPLVM extensions generally adopt individual learning schemes on latent representations and kernel hyperparameters, which ignore their intrinsic relationship. To exploit strong complementarity among different modalities and GPLVM components, we develop a novel learning scheme called Harmonization, where latent model parameters are jointly learned from each other. Beyond the correlation fitting or intra-modal structure preservation paradigms widely used in existing studies, the harmonization is derived in a model-driven manner to encourage the agreement between modality-specific GP kernels and the similarity of latent representations. We present a range of multimodal learning models by incorporating the harmonization mechanism into several representative GPLVM-based approaches. Experimental results on four benchmark datasets show that the proposed models outperform the strong baselines for cross-modal retrieval tasks, and that the harmonized multimodal learning method is superior in discovering semantically consistent latent representation.

Submitted 14 August, 2019; originally announced August 2019.

arXiv:1907.08653 [pdf, other]

Surfing: Iterative optimization over incrementally trained deep networks

Authors: Ganlin Song, Zhou Fan, John Lafferty

Abstract: We investigate a sequential optimization procedure to minimize the empirical risk functional $f_{\hat{\theta}}(x) = \frac{1}{2}\|G_{\hat{\theta}}(x) - y\|^2$ for certain families of deep networks $G_{\theta}(x)$. The approach is to optimize a sequence of objective functions that use network parameters obtained during different stages of the training process. When initialized with random parameters $\theta_0$, we show that the objective $f_{\theta_0}(x)$ is "nice" and easy to optimize with gradient descent. As learning is carried out, we obtain a sequence of generative networks $x \mapsto G_{\theta_t}(x)$ and associated risk functions $f_{\theta_t}(x)$, where $t$ indicates a stage of stochastic gradient descent during training. Since the parameters of the network do not change by very much in each step, the surface evolves slowly and can be incrementally optimized. The algorithm is formalized and analyzed for a family of expansive networks. We call the procedure "surfing" since it rides along the peak of the evolving (negative) empirical risk function, starting from a smooth surface at the beginning of learning and ending with a wavy nonconvex surface after learning is complete. Experiments show how surfing can be used to find the global optimum and for compressed sensing even when direct gradient descent on the final learned network fails.

Submitted 19 July, 2019; originally announced July 2019.

arXiv:1907.01113 [pdf, other]

Robust Tensor Completion Using Transformed Tensor SVD

Authors: Guangjing Song, Michael K. Ng, Xiongjun Zhang

Abstract: In this paper, we study robust tensor completion by using transformed tensor singular value decomposition (SVD), which employs unitary transform matrices instead of the discrete Fourier transform matrix used in the traditional tensor SVD. The main motivation is that a tensor of lower tubal rank can be obtained by using other unitary transform matrices than by using the discrete Fourier transform matrix. This would be more effective for robust tensor completion. Experimental results for hyperspectral, video and face datasets have shown that the recovery performance for the robust tensor completion problem by using transformed tensor SVD is better in PSNR than that by using Fourier transform and other robust tensor completion methods.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1906.00684 [pdf, other]

DANE: Domain Adaptive Network Embedding

Authors: Yizhou Zhang, Guojie Song, Lun Du, Shuwen Yang, Yilun Jin

Abstract: Recent works reveal that network embedding techniques enable many machine learning models to handle diverse downstream tasks on graph structured data. However, as previous methods usually focus on learning embeddings for a single network, they cannot learn representations transferable on multiple networks. Hence, it is important to design a network embedding algorithm that supports downstream model transferring on different networks, known as domain adaptation. In this paper, we propose a novel Domain Adaptive Network Embedding framework, which applies graph convolutional network to learn transferable embeddings. In DANE, nodes from multiple networks are encoded to vectors via a shared set of learnable parameters so that the vectors share an aligned embedding space. The distributions of embeddings on different networks are further aligned by adversarial learning regularization. In addition, DANE's advantage in learning transferable network embedding can be guaranteed theoretically. Extensive experiments reflect that the proposed framework outperforms other state-of-the-art network embedding baselines in cross-network domain adaptation tasks.

Submitted 19 August, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: 7 pages, 4 figures, accepted by IJCAI 2019

arXiv:1905.12835 [pdf, ps, other]

Adversarial Sub-sequence for Text Generation

Authors: Xingyuan Chen, Yanzhe Li, Peng Jin, Jiuhua Zhang, Xinyu Dai, Jiajun Chen, Gang Song

Abstract: Generative adversarial nets (GANs) have been successfully introduced for generating text to alleviate the exposure bias. However, discriminators in these models only evaluate the entire sequence, which causes feedback sparsity and mode collapse. To tackle these problems, we propose a novel mechanism. It first segments the entire sequence into several sub-sequences. Then these sub-sequences, together with the entire sequence, are evaluated individually by the discriminator. Finally, these feedback signals are all used to guide the learning of the GAN. This mechanism learns the generation of both the entire sequence and the sub-sequences simultaneously. Learning to generate sub-sequences is easy and is helpful in generating an entire sequence. It is easy to improve the existing GAN-based models with this mechanism. We rebuild three previous well-designed models with our mechanism, and the experimental results on benchmark data show these models are improved significantly, and the best one outperforms the state-of-the-art model. All code and data are available at https://github.com/liyzcj/seggan.git

Submitted 29 May, 2019; originally announced May 2019.

arXiv:1905.03041 [pdf, other]

doi 10.1145/3308558.3313622

Tag2Vec: Learning Tag Representations in Tag Networks

Authors: Junshan Wang, Zhicong Lu, Guojie Song, Yue Fan, Lun Du, Wei Lin

Abstract: Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks. In real networks, nodes may have multiple tags but existing methods ignore the abundant semantic and hierarchical information of tags. This information is useful to many network applications and usually very stable. In this paper, we propose a tag representation learning model, Tag2Vec, which mixes nodes and tags into a hybrid network. Firstly, for tag networks, we define semantic distance as the proximity between tags and design a novel strategy, parameterized random walk, to generate context with semantic and hierarchical information of tags adaptively. Then, we propose a hyperbolic Skip-gram model to better express the complex hierarchical structure with lower output dimensions. We evaluate our model on the NBER U.S. patent dataset and WordNet dataset. The results show that our model can learn tag representations with rich semantic information and it outperforms other baselines.

Submitted 24 September, 2020; v1 submitted 19 April, 2019; originally announced May 2019.

Comments: 6 pages

arXiv:1902.09817 [pdf, other]

GCN-LASE: Towards Adequately Incorporating Link Attributes in Graph Convolutional Networks

Authors: Ziyao Li, Liang Zhang, Guojie Song

Abstract: Graph Convolutional Networks (GCNs) have proved to be a most powerful architecture in aggregating local neighborhood information for individual graph nodes. Low-rank proximities and node features are successfully leveraged in existing GCNs; however, attributes that graph links may carry are commonly ignored, as almost all of these models simplify graph links into binary or scalar values describing node connectedness. In our paper instead, links are reverted to hypostatic relationships between entities with descriptional attributes. We propose GCN-LASE (GCN with Link Attributes and Sampling Estimation), a novel GCN model taking both node and link attributes as inputs. To adequately capture the interactions between link and node attributes, their tensor product is used as neighbor features, based on which we define several graph kernels and further develop according architectures for LASE. Besides, to accelerate the training process, the sum of features in entire neighborhoods is estimated through the Monte Carlo method, with novel sampling strategies designed for LASE to minimize the estimation variance. Our experiments show that LASE outperforms strong baselines over various graph datasets, and further experiments corroborate the informativeness of link attributes and our model's ability of adequately leveraging them.

Submitted 30 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: IJCAI2019 Accepted Paper

arXiv:1811.05614 [pdf, other]

SepNE: Bringing Separability to Network Embedding

Authors: Ziyao Li, Liang Zhang, Guojie Song

Abstract: Many successful methods have been proposed for learning low dimensional representations on large-scale networks, while almost all existing methods are designed in inseparable processes, learning embeddings for entire networks even when only a small proportion of nodes are of interest. This leads to great inconvenience, especially on super-large or dynamic networks, where these methods become almost impossible to implement. In this paper, we formalize the problem of separated matrix factorization, based on which we elaborate a novel objective function that preserves both local and global information. We further propose SepNE, a simple and flexible network embedding algorithm which independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm reduces the redundant efforts to embed irrelevant nodes, yielding scalability to super-large networks, automatic implementation in distributed learning and further adaptations. We demonstrate the effectiveness of this approach on several real-world networks with different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running times on large networks.

Submitted 26 February, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

Comments: 8 pages, 4 figures, accepted in the Proceedings of the 33rd AAAI's Conference on Artificial Intelligence

arXiv:1811.02629 [pdf, other]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko , et al. (402 additional authors not shown)

Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018.
Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

arXiv:1805.11761 [pdf, other]

Collaborative Learning for Deep Neural Networks

Authors: Guocong Song, Wei Chai

Abstract: We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediate-level representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise.

Submitted 6 November, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: To appear in NIPS 2018

arXiv:1101.4439 [pdf, ps, other]

Reproducing Kernel Banach Spaces with the l1 Norm II: Error Analysis for Regularized Least Square Regression

Authors: Guohui Song, Haizhang Zhang

Abstract: A typical approach in estimating the learning rate of a regularized learning scheme is to bound the approximation error by the sum of the sampling error, the hypothesis error and the regularization error. Using a reproducing kernel space that satisfies the linear representer theorem brings the advantage of discarding the hypothesis error from the sum automatically. Following this direction, we illustrate how reproducing kernel Banach spaces with the l1 norm can be applied to improve the learning rate estimate of l1-regularization in machine learning.

Submitted 27 January, 2011; v1 submitted 23 January, 2011; originally announced January 2011.

arXiv:1101.4388 [pdf, ps, other]

doi 10.1016/j.acha.2012.03.009

Reproducing Kernel Banach Spaces with the l1 Norm

Authors: Guohui Song, Haizhang Zhang, Fred J. Hickernell

Abstract: Targeting at sparse learning, we construct Banach spaces B of functions on an input space X with the properties that (1) B possesses an l1 norm in the sense that it is isometrically isomorphic to the Banach space of integrable functions on X with respect to the counting measure; (2) point evaluations are continuous linear functionals on B and are representable through a bilinear form with a kernel function; (3) regularized learning schemes on B satisfy the linear representer theorem. Examples of kernel functions admissible for the construction of such spaces are given.

Submitted 28 March, 2012; v1 submitted 23 January, 2011; originally announced January 2011.

Comments: 28 pages, an extra section was added

Journal ref: Appl. Comput. Harmon. Anal., 34:96-116, 2013

arXiv:0705.4588 [pdf, ps, other]

Variable Selection Incorporating Prior Constraint Information into Lasso

Authors: Shurong Zheng, Guodong Song, Ning-Zhong Shi

Abstract: We propose a variable selection procedure that incorporates prior constraint information into lasso. The proposed procedure combines the sample and prior information, and selects significant variables for responses in a narrower region where the true parameters lie. It increases the efficiency of choosing the true model correctly. The proposed procedure can be executed by many constrained quadratic programming methods, and the initial estimator can be found by least squares or the Monte Carlo method. The proposed procedure also enjoys good theoretical properties. Moreover, the proposed procedure can be used not only for linear models but also for generalized linear models (GLM), Cox models, quantile regression models and many others with the help of Wang and Leng (2007)'s LSA, which approximates these models by linear models. The idea of combining sample and prior constraint information can also be used for other modified lasso procedures. Some examples are used to illustrate the idea of incorporating prior constraint information in variable selection procedures.

Submitted 31 May, 2007; originally announced May 2007.

Comments: 15 pages

Showing 1–29 of 29 results for author: Song, G