Search | arXiv e-print repository

Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation

Authors: Yueming Lyu, Kim Yong Tan, Yew Soon Ong, Ivor W. Tsang

Abstract: Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reverse-time stochastic differential equation (SDE) associated with a pre-trained diffusion model as a sequential black-box optimization problem. Furthermore, we propose a novel covariance-adaptive sequential optimization algorithm to optimize cumulative black-box scores under unknown transition dynamics. Theoretically, we prove an $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex functions without smoothness or strong convexity assumptions. Empirically, experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores.

Submitted 8 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2304.14618 [pdf, other]

Recognizable Information Bottleneck

Authors: Yilin Lyu, Xin Liu, Mingyang Song, Xinyue Wang, Yaxin Peng, Tieyong Zeng, Liping Jing

Abstract: Information Bottlenecks (IBs) learn representations that generalize to unseen data by information compression. However, existing IBs are practically unable to guarantee generalization in real-world scenarios due to the vacuous generalization bound. The recent PAC-Bayes IB uses information complexity instead of information compression to establish a connection with the mutual information generalization bound. However, it requires the computation of expensive second-order curvature, which hinders its practical application. In this paper, we establish the connection between the recognizability of representations and the recent functional conditional mutual information (f-CMI) generalization bound, which is significantly easier to estimate. On this basis we propose a Recognizable Information Bottleneck (RIB) which regularizes the recognizability of representations through a recognizability critic optimized by density ratio matching under the Bregman divergence. Extensive experiments on several commonly used datasets demonstrate the effectiveness of the proposed method in regularizing the model and estimating the generalization gap.

Submitted 27 April, 2023; originally announced April 2023.

Comments: 12 pages. To appear in IJCAI 2023

arXiv:2207.10796 [pdf, other]

Multiple Robust Learning for Recommendation

Authors: Haoxuan Li, Quanyu Dai, Yuru Li, Yan Lyu, Zhenhua Dong, Xiao-Hua Zhou, Peng Wu

Abstract: In recommender systems (RS), a common problem is the presence of various biases in the collected data, which deteriorates the generalization ability of the recommendation models and leads to inaccurate predictions. Doubly robust (DR) learning has been studied in many tasks in RS, with the advantage that unbiased learning can be achieved when either a single imputation or a single propensity model is accurate. In this paper, we propose a multiple robust (MR) estimator that can take advantage of multiple candidate imputation and propensity models to achieve unbiasedness. Specifically, the MR estimator is unbiased when any of the imputation or propensity models, or a linear combination of these models, is accurate. Theoretical analysis shows that the proposed MR is an enhanced version of DR when only having a single imputation and propensity model, and has a smaller bias. Inspired by the generalization error bound of MR, we further propose a novel multiple robust learning approach with stabilization. We conduct extensive experiments on real-world and semi-synthetic datasets, which demonstrate the superiority of the proposed approach over state-of-the-art methods.

Submitted 19 December, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: Accepted by AAAI'23

arXiv:2203.10258 [pdf, other]

TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations

Authors: Haoxuan Li, Yan Lyu, Chunyuan Zheng, Peng Wu

Abstract: Bias is a common problem inherent in recommender systems, which is entangled with users' preferences and poses a great challenge to unbiased learning. For debiasing tasks, the doubly robust (DR) method and its variants show superior performance due to the double robustness property, that is, DR is unbiased when either imputed errors or learned propensities are accurate. However, our theoretical analysis reveals that DR usually has a large variance. Meanwhile, DR would suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which usually occur in practice. In this paper, we propose a principled approach that can effectively reduce bias and variance simultaneously for existing DR approaches when the error imputation model is misspecified. In addition, we further propose a novel semi-parametric collaborative learning approach that decomposes imputed errors into parametric and nonparametric parts and updates them collaboratively, resulting in more accurate predictions. Both theoretical analysis and experiments demonstrate the superiority of the proposed methods compared with existing debiasing methods.

Submitted 2 March, 2023; v1 submitted 19 March, 2022; originally announced March 2022.

arXiv:2110.11562 [pdf, other]

Temporal Point Process Graphical Models

Authors: Yalong Lyu, Huiyuan Wang, Wei Lin

Abstract: Many real-world objects can be modeled as a stream of events on the nodes of a graph. In this paper, we propose a class of graphical event models named temporal point process graphical models for representing the temporal dependencies among different components of a multivariate point process. In our model, the intensity of an event stream can depend on the historical events in a nonlinear way. We provide a procedure that allows us to estimate the parameters in the model with a convex loss function in the high-dimensional setting. For the approximation error introduced during the implementation, we also establish the error bound for our estimators. We demonstrate the performance of our method with extensive simulations and a spike train data set.

Submitted 21 October, 2021; originally announced October 2021.

Comments: 21 pages, 5 figures

MSC Class: Primary 62M08; secondary 62H08; 60G08

arXiv:2106.06097 [pdf, other]

Neural Optimization Kernel: Towards Robust Deep Learning

Authors: Yueming Lyu, Ivor Tsang

Abstract: Deep neural networks (NN) have achieved great success in many applications. However, why deep neural networks generalize well in the over-parameterized regime is still unclear. To better understand deep NN, we establish the connection between deep NN and a novel kernel family, i.e., Neural Optimization Kernel (NOK). The architecture of structured approximation of NOK performs monotonic descent updates of implicit regularization problems. We can implicitly choose the regularization problems by employing different activation functions, e.g., ReLU, max pooling, and soft-thresholding. We further establish a new generalization bound of our deep structured approximated NOK architecture. Our unsupervised structured approximated NOK block can serve as a simple plug-in of popular backbones for good generalization against input noise.

Submitted 30 November, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Deep Learning, Kernel Methods, Deep Learning Theory, Kernel Approximation, Integral Approximation

arXiv:2011.06446 [pdf, other]

Subgroup-based Rank-1 Lattice Quasi-Monte Carlo

Authors: Yueming Lyu, Yuan Yuan, Ivor W. Tsang

Abstract: Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling for simulation in science, etc. In the QMC area, the rank-1 lattice is important due to its simple operation, and nice properties for point set construction. However, the construction of the generating vector of the rank-1 lattice is usually time-consuming because of an exhaustive computer search. To address this issue, we propose a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. We theoretically prove a lower and an upper bound of the minimum pairwise distance of any non-degenerate rank-1 lattice. Empirically, our methods can generate a near-optimal rank-1 lattice compared with the Korobov exhaustive search regarding the $l_1$-norm and $l_2$-norm minimum distance. Moreover, experimental results show that our method achieves superior approximation performance on benchmark integration test problems and kernel approximation problems.

Submitted 28 October, 2020; originally announced November 2020.

Comments: NeurIPS 2020

arXiv:2006.15061 [pdf, other]

Intrinsic Reward Driven Imitation Learning via Generative Model

Authors: Xingrui Yu, Yueming Lyu, Ivor W. Tsang

Abstract: Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such a high-dimensional environment, e.g., the Atari domain. To address this challenge, we propose a novel reward learning module to generate intrinsic reward signals via a generative model. Our generative method can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment. Thus, our module provides the imitation agent with both the intrinsic intention of the demonstrator and a better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with one-life demonstration. Remarkably, our method achieves performance that is up to 5 times the performance of the demonstration.

Submitted 11 September, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

arXiv:1910.04301 [pdf, other]

Black-box Optimizer with Implicit Natural Gradient

Authors: Yueming Lyu, Ivor W. Tsang

Abstract: Black-box optimization is primarily important for many compute-intensive applications, including reinforcement learning (RL), robot control, etc. This paper presents a novel theoretical framework for black-box optimization, in which our method performs stochastic updates with the implicit natural gradient of an exponential-family distribution. Theoretically, we prove the convergence rate of our framework with full matrix update for convex functions. Our theoretical results also hold for continuous non-differentiable black-box functions. Our methods are very simple and contain fewer hyper-parameters than CMA-ES \cite{hansen2006cma}. Empirically, our method with full matrix update achieves competitive performance compared with the state-of-the-art method CMA-ES on benchmark test problems. Moreover, our methods can achieve high optimization precision on some challenging test functions (e.g., $l_1$-norm ellipsoid test problem and Levy test problem), while methods with explicit natural gradient, i.e., IGO \cite{ollivier2017information} with full matrix update, cannot. This shows the efficiency of our methods.

Submitted 9 September, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Black-box Optimization

arXiv:1909.12383 [pdf, other]

Graph-Preserving Grid Layout: A Simple Graph Drawing Method for Graph Classification using CNNs

Authors: Yecheng Lyu, Xinming Huang, Ziming Zhang

Abstract: Graph convolutional networks (GCNs) suffer from the irregularity of graphs, while more widely-used convolutional neural networks (CNNs) benefit from regular grids. To bridge the gap between GCN and CNN, in contrast to previous works on generalizing the basic operations in CNNs to graph data, in this paper we address the problem of how to project undirected graphs onto the grid in a principled way where CNNs can be used as backbone for geometric deep learning. To this end, inspired by the literature of graph drawing, we propose a novel graph-preserving grid layout (GPGL), an integer program that minimizes the topological loss on the grid. Technically we propose solving GPGL approximately using a regularized Kamada-Kawai algorithm, a well-known nonconvex optimization technique in graph drawing, with a vertex separation penalty that improves the rounding performance on top of the solutions from relaxation. Using GPGL we can easily conduct data augmentation as every local minimum will lead to a grid layout for the same graph. Together with the help of multi-scale maxout CNNs, we demonstrate the empirical success of our method for graph classification.

Submitted 26 September, 2019; originally announced September 2019.

arXiv:1905.10045 [pdf, other]

Curriculum Loss: Robust Learning and Generalization against Label Corruption

Authors: Yueming Lyu, Ivor W. Tsang

Abstract: Deep neural networks (DNNs) have great expressive power, which can even memorize samples with wrong labels. It is vitally important to reiterate robustness and generalization in DNNs against label corruption. To this end, this paper studies the 0-1 loss, which has a monotonic relationship with an empirical adversary (reweighted) risk \citep{hu2016does}. Although the 0-1 loss has some robust properties, it is difficult to optimize. To efficiently optimize the 0-1 loss while keeping its robust properties, we propose a very simple and efficient loss, i.e. curriculum loss (CL). Our CL is a tighter upper bound of the 0-1 loss compared with conventional summation based surrogate losses. Moreover, CL can adaptively select samples for model training. As a result, our loss can be deemed as a novel perspective of curriculum sample selection strategy, which bridges a connection between curriculum learning and robust learning. Experimental results on benchmark datasets validate the robustness of the proposed loss.

Submitted 20 February, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

Comments: ICLR 2020

arXiv:1905.10041 [pdf, other]

Efficient Batch Black-box Optimization with Deterministic Regret Bounds

Authors: Yueming Lyu, Yuan Yuan, Ivor W. Tsang

Abstract: In this work, we investigate black-box optimization from the perspective of frequentist kernel methods. We propose a novel batch optimization algorithm, which jointly maximizes the acquisition function and selects points from a whole batch in a holistic way. Theoretically, we derive regret bounds for both the noise-free and perturbation settings irrespective of the choice of kernel. Moreover, we analyze the property of the adversarial regret that is required by a robust initialization for Bayesian Optimization (BO). We prove that the adversarial regret bounds decrease with the decrease of covering radius, which provides a criterion for generating a point set to minimize the bound. We then propose fast searching algorithms to generate a point set with a small covering radius for the robust initialization. Experimental results on both synthetic benchmark problems and real-world problems show the effectiveness of the proposed algorithms.

Submitted 27 March, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

arXiv:1904.13341 [pdf, other]

Learning Fair Representations via an Adversarial Framework

Authors: Rui Feng, Yang Yang, Yuehan Lyu, Chenhao Tan, Yizhou Sun, Chunping Wang

Abstract: Fairness has become a central issue for our research community as classification algorithms are adopted in societally critical domains such as recidivism prediction and loan approval. In this work, we consider the potential bias based on protected attributes (e.g., race and gender), and tackle this problem by learning latent representations of individuals that are statistically indistinguishable between protected groups while sufficiently preserving other information for classification. To do that, we develop a minimax adversarial framework with a generator to capture the data distribution and generate latent representations, and a critic to ensure that the distributions across different protected groups are similar. Our framework provides a theoretical guarantee with respect to statistical parity and individual fairness. Empirical results on four real-world datasets also show that the learned representation can effectively be used for classification tasks such as credit risk prediction while obstructing information related to protected groups, especially when removing protected attributes is not sufficient for fair classification.

Submitted 30 April, 2019; originally announced April 2019.

Showing 1–13 of 13 results for author: Lyu, Y