Search | arXiv e-print repository

Self-Distilled Disentangled Learning for Counterfactual Prediction

Authors: Xinshu Li, Mingming Gong, Lina Yao

Abstract: The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenari… ▽ More The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenarios, especially within high-dimensional spaces. To circumvent this challenge, we propose the Self-Distilled Disentanglement framework, referred to as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs for high-dimensional representations. Our comprehensive experiments, conducted on both synthetic and real-world datasets, confirms the effectiveness of our approach in facilitating counterfactual inference in the presence of both observed and unobserved confounders. △ Less

Submitted 14 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.09810 [pdf, other]

Trajectory-Based Individualized Treatment Rules

Authors: Lanqiu Yao, Thaddeus Tarpey

Abstract: A core component of precision medicine research involves optimizing individualized treatment rules (ITRs) based on patient characteristics. Many studies used to estimate ITRs are longitudinal in nature, collecting outcomes over time. Yet, to date, methods developed to estimate ITRs often ignore the longitudinal structure of the data. Information available from the longitudinal nature of the data c… ▽ More A core component of precision medicine research involves optimizing individualized treatment rules (ITRs) based on patient characteristics. Many studies used to estimate ITRs are longitudinal in nature, collecting outcomes over time. Yet, to date, methods developed to estimate ITRs often ignore the longitudinal structure of the data. Information available from the longitudinal nature of the data can be especially useful in mental health studies. Although treatment means might appear similar, understanding the trajectory of outcomes over time can reveal important differences between treatments and placebo effects. This longitudinal perspective is especially beneficial in mental health research, where subtle shifts in outcome patterns can hold significant implications. Despite numerous studies involving the collection of outcome data across various time points, most precision medicine methods used to develop ITRs overlook the information available from the longitudinal structure. The prevalence of missing data in such studies exacerbates the issue, as neglecting the longitudinal nature of the data can significantly impair the effectiveness of treatment rules. This paper develops a powerful longitudinal trajectory-based ITR construction method that incorporates baseline variables, via a single-index or biosignature, into the modeling of longitudinal outcomes. This trajectory-based ITR approach substantially minimizes the negative impact of missing data compared to more traditional ITR approaches. The approach is illustrated through simulation studies and a clinical trial for depression, contrasting it with more traditional ITRs that ignore longitudinal information. △ Less

Submitted 27 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2402.15301 [pdf, other]

Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models

Authors: Yuzhe Zhang, Yipeng Zhang, Yidong Gan, Lina Yao, Chen Wang

Abstract: Causal graph recovery is traditionally done using statistical estimation-based methods or based on individual's knowledge about variables of interests. They often suffer from data collection biases and limitations of individuals' knowledge. The advance of large language models (LLMs) provides opportunities to address these problems. We propose a novel method that leverages LLMs to deduce causal re… ▽ More Causal graph recovery is traditionally done using statistical estimation-based methods or based on individual's knowledge about variables of interests. They often suffer from data collection biases and limitations of individuals' knowledge. The advance of large language models (LLMs) provides opportunities to address these problems. We propose a novel method that leverages LLMs to deduce causal relationships in general causal graph recovery tasks. This method leverages knowledge compressed in LLMs and knowledge LLMs extracted from scientific publication database as well as experiment data about factors of interest to achieve this goal. Our method gives a prompting strategy to extract associational relationships among those factors and a mechanism to perform causality verification for these associations. Comparing to other LLM-based methods that directly instruct LLMs to do the highly complex causal reasoning, our method shows clear advantage on causal graph quality on benchmark datasets. More importantly, as causality among some factors may change as new research results emerge, our method show sensitivity to new evidence in the literature and can provide useful information for updating causal graphs accordingly. △ Less

Submitted 18 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.01827 [pdf, other]

Extracting Scalar Measures from Curves

Authors: Lanqiu Yao, Thaddeus Tarpey

Abstract: The ability to order outcomes is necessary to make comparisons which is complicated when there is no natural ordering on the space of outcomes, as in the case of functional outcomes. This paper examines methods for extracting a scalar summary from functional or longitudinal outcomes based on an average rate of change which can be used to compare curves. Common approaches used in practice use a cha… ▽ More The ability to order outcomes is necessary to make comparisons which is complicated when there is no natural ordering on the space of outcomes, as in the case of functional outcomes. This paper examines methods for extracting a scalar summary from functional or longitudinal outcomes based on an average rate of change which can be used to compare curves. Common approaches used in practice use a change score or an analysis of covariance (ANCOVA) to make comparisons. However, these standard approaches only use a fraction of the available data and are inefficient. We derive measures of performance of an averaged rate of change of a functional outcome and compare this measure to standard measures. Simulations and data from a depression clinical trial are used to illustrate results. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.14549 [pdf, other]

Privacy-preserving Quantile Treatment Effect Estimation for Randomized Controlled Trials

Authors: Leon Yao, Paul Yiming Li, Jiannan Lu

Abstract: In accordance with the principle of "data minimization", many internet companies are opting to record less data. However, this is often at odds with A/B testing efficacy. For experiments with units with multiple observations, one popular data minimizing technique is to aggregate data for each unit. However, exact quantile estimation requires the full observation-level data. In this paper, we devel… ▽ More In accordance with the principle of "data minimization", many internet companies are opting to record less data. However, this is often at odds with A/B testing efficacy. For experiments with units with multiple observations, one popular data minimizing technique is to aggregate data for each unit. However, exact quantile estimation requires the full observation-level data. In this paper, we develop a method for approximate Quantile Treatment Effect (QTE) analysis using histogram aggregation. In addition, we can also achieve formal privacy guarantees using differential privacy. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted to 2023 CODE conference as a parallel presentation

arXiv:2306.07652 [pdf]

Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes

Authors: Qi Wan, Ying Ling Yao, XingYu Lv, Li Hong Geng, Yue Wang, Enoch Appiah Adu-Gyamfi, Xue Jiao Wang, Yue Qian, Juan Yang, Ming Xing Chend, Zhao Hui Zhong, Yuan Li, Yu Bin Ding

Abstract: Background: The objective of this study is to evaluate the impact of COVID-19 inactivated vaccine administration on the outcomes of in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) cycles in infertile couples in China. Methods: We collected data from the CYART prospective cohort, which included couples undergoing IVF treatment from January 2021 to September 2022 at Sichuan… ▽ More Background: The objective of this study is to evaluate the impact of COVID-19 inactivated vaccine administration on the outcomes of in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) cycles in infertile couples in China. Methods: We collected data from the CYART prospective cohort, which included couples undergoing IVF treatment from January 2021 to September 2022 at Sichuan **xin Xinan Women & Children's Hospital. Based on whether they received vaccination before ovarian stimulation, the couples were divided into the vaccination group and the non-vaccination group. We compared the laboratory parameters and pregnancy outcomes between the two groups. Findings: After performing propensity score matching (PSM), the analysis demonstrated similar clinical pregnancy rates, biochemical pregnancy and ongoing pregnancy rates between vaccinated and unvaccinated women. No significant disparities were found in terms of embryo development and laboratory parameters among the groups. Moreover, male vaccination had no impact on patient performance or pregnancy outcomes in assisted reproductive technology treatments. Additionally, there were no significant differences observed in the effects of vaccination on embryo development and pregnancy outcomes among couples undergoing ART. Interpretation: The findings suggest that COVID-19 vaccination did not have a significant effect on patients undergoing IVF/ICSI with fresh embryo transfer. Therefore, it is recommended that couples should receive COVID-19 vaccination as scheduled to help mitigate the COVID-19 pandemic. △ Less

Submitted 13 June, 2023; originally announced June 2023.

Comments: 26 pages, 4 figures and 5 tables

arXiv:2301.02225 [pdf, ps, other]

$l_{1-2}$ GLasso: $L_{1-2}$ Regularized Multi-task Graphical Lasso for Joint Estimation of eQTL Map** and Gene Network

Authors: Wei Miao, Lan Yao

Abstract: A critical problem in genetics is to discover how gene expression is regulated within cells. Two major tasks of regulatory association learning are : (i) identifying SNP-gene relationships, known as eQTL map**, and (ii) determining gene-gene relationships, known as gene network estimation. To share information between these two tasks, we focus on the unified model for joint estimation of eQTL ma… ▽ More A critical problem in genetics is to discover how gene expression is regulated within cells. Two major tasks of regulatory association learning are : (i) identifying SNP-gene relationships, known as eQTL map**, and (ii) determining gene-gene relationships, known as gene network estimation. To share information between these two tasks, we focus on the unified model for joint estimation of eQTL map** and gene network, and propose a $L_{1-2}$ regularized multi-task graphical lasso, named $L_{1-2}$ GLasso. Numerical experiments on artificial datasets demonstrate the competitive performance of $L_{1-2}$ GLasso on capturing the true sparse structure of eQTL map** and gene network. $L_{1-2}$ GLasso is further applied to real dataset of ADNI-1 and experimental results show that $L_{1 -2}$ GLasso can obtain sparser and more accurate solutions than other commonly-used methods. △ Less

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2208.06748 [pdf, other]

Learning to Infer Counterfactuals: Meta-Learning for Estimating Multiple Imbalanced Treatment Effects

Authors: Guanglin Zhou, Lina Yao, Xiwei Xu, Chen Wang, Liming Zhu

Abstract: We regularly consider answering counterfactual questions in practice, such as "Would people with diabetes take a turn for the better had they choose another medication?". Observational studies are growing in significance in answering such questions due to their widespread accumulation and comparatively easier acquisition than Randomized Control Trials (RCTs). Recently, some works have introduced r… ▽ More We regularly consider answering counterfactual questions in practice, such as "Would people with diabetes take a turn for the better had they choose another medication?". Observational studies are growing in significance in answering such questions due to their widespread accumulation and comparatively easier acquisition than Randomized Control Trials (RCTs). Recently, some works have introduced representation learning and domain adaptation into counterfactual inference. However, most current works focus on the setting of binary treatments. None of them considers that different treatments' sample sizes are imbalanced, especially data examples in some treatment groups are relatively limited due to inherent user preference. In this paper, we design a new algorithmic framework for counterfactual inference, which brings an idea from Meta-learning for Estimating Individual Treatment Effects (MetaITE) to fill the above research gaps, especially considering multiple imbalanced treatments. Specifically, we regard data episodes among treatment groups in counterfactual inference as meta-learning tasks. We train a meta-learner from a set of source treatment groups with sufficient samples and update the model by gradient descent with limited samples in target treatment. Moreover, we introduce two complementary losses. One is the supervised loss on multiple source treatments. The other loss which aligns latent distributions among various treatment groups is proposed to reduce the discrepancy. We perform experiments on two real-world datasets to evaluate inference accuracy and generalization ability. Experimental results demonstrate that the model MetaITE matches/outperforms state-of-the-art methods. △ Less

Submitted 13 August, 2022; originally announced August 2022.

Comments: 11 pages

arXiv:2206.04907 [pdf, other]

Efficient Heterogeneous Treatment Effect Estimation With Multiple Experiments and Multiple Outcomes

Authors: Leon Yao, Caroline Lo, Israel Nir, Sarah Tan, Ariel Evnine, Adam Lerer, Alex Peysakhovich

Abstract: Learning heterogeneous treatment effects (HTEs) is an important problem across many fields. Most existing methods consider the setting with a single treatment arm and a single outcome metric. However, in many real world domains, experiments are run consistently - for example, in internet companies, A/B tests are run every day to measure the impacts of potential changes across many different metric… ▽ More Learning heterogeneous treatment effects (HTEs) is an important problem across many fields. Most existing methods consider the setting with a single treatment arm and a single outcome metric. However, in many real world domains, experiments are run consistently - for example, in internet companies, A/B tests are run every day to measure the impacts of potential changes across many different metrics of interest. We show that even if an analyst cares only about the HTEs in one experiment for one metric, precision can be improved greatly by analyzing all of the data together to take advantage of cross-experiment and cross-outcome metric correlations. We formalize this idea in a tensor factorization framework and propose a simple and scalable model which we refer to as the low rank or LR-learner. Experiments in both synthetic and real data suggest that the LR-learner can be much more precise than independent HTE estimation. △ Less

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2203.03978 [pdf, other]

Contrastive Conditional Neural Processes

Authors: Zesheng Ye, Lina Yao

Abstract: Conditional Neural Processes~(CNPs) bridge neural networks with probabilistic inference to approximate functions of Stochastic Processes under meta-learning settings. Given a batch of non-{\it i.i.d} function instantiations, CNPs are jointly optimized for in-instantiation observation prediction and cross-instantiation meta-representation adaptation within a generative reconstruction pipeline. Ther… ▽ More Conditional Neural Processes~(CNPs) bridge neural networks with probabilistic inference to approximate functions of Stochastic Processes under meta-learning settings. Given a batch of non-{\it i.i.d} function instantiations, CNPs are jointly optimized for in-instantiation observation prediction and cross-instantiation meta-representation adaptation within a generative reconstruction pipeline. There can be a challenge in tying together such two targets when the distribution of function observations scales to high-dimensional and noisy spaces. Instead, noise contrastive estimation might be able to provide more robust representations by learning distributional matching objectives to combat such inherent limitation of generative models. In light of this, we propose to equip CNPs by 1) aligning prediction with encoded ground-truth observation, and 2) decoupling meta-representation adaptation from generative reconstruction. Specifically, two auxiliary contrastive branches are set up hierarchically, namely in-instantiation temporal contrastive learning~({\tt TCL}) and cross-instantiation function contrastive learning~({\tt FCL}), to facilitate local predictive alignment and global function consistency, respectively. We empirically show that {\tt TCL} captures high-level abstraction of observations, whereas {\tt FCL} helps identify underlying functions, which in turn provides more efficient representations. Our model outperforms other CNPs variants when evaluating function distribution reconstruction and parameter identification across 1D, 2D and high-dimensional time-series. △ Less

Submitted 25 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: accepted to CVPR2022

arXiv:2203.03523 [pdf, other]

A Single Index Model for Longitudinal Outcomes to Optimize Individual Treatment Decision Rules

Authors: Lanqiu Yao, Thaddeus Tarpey

Abstract: A pressing challenge in medical research is to identify optimal treatments for individual patients. This is particularly challenging in mental health settings where mean responses are often similar across multiple treatments. For example, the mean longitudinal trajectories for patients treated with an active drug and placebo may be very similar but different treatments may exhibit distinctly diffe… ▽ More A pressing challenge in medical research is to identify optimal treatments for individual patients. This is particularly challenging in mental health settings where mean responses are often similar across multiple treatments. For example, the mean longitudinal trajectories for patients treated with an active drug and placebo may be very similar but different treatments may exhibit distinctly different individual trajectory shapes. Most precision medicine approaches using longitudinal data often ignore information from the longitudinal data structure. This paper investigates a powerful precision medicine approach by examining the impact of baseline covariates on longitudinal outcome trajectories to guide treatment decisions instead of traditional scalar outcome measures derived from longitudinal data, such as a change score. We introduce a method of estimating "biosignatures" defined as linear combinations of baseline characteristics (i.e., a single index) that optimally separate longitudinal trajectories among different treatment groups. The criterion used is to maximize the Kullback-Leibler Divergence between different treatment outcome distributions. The approach is illustrated via simulation studies and a depression clinical trial. The approach is also contrasted with more traditional methods and compares performance in the presence of missing data. △ Less

Submitted 7 March, 2022; originally announced March 2022.

arXiv:2105.00991 [pdf, ps, other]

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

Authors: Yunwen Chen, Zuotao Liu, Daqi Ji, Yingwei Xin, Wenguang Wang, Lu Yao, Yi Zou

Abstract: This paper describes the solution of Shanda Innovations team to Task 1 of KDD-Cup 2012. A novel approach called Multifaceted Factorization Models is proposed to incorporate a great variety of features in social networks. Social relationships and actions between users are integrated as implicit feedbacks to improve the recommendation accuracy. Keywords, tags, profiles, time and some other features… ▽ More This paper describes the solution of Shanda Innovations team to Task 1 of KDD-Cup 2012. A novel approach called Multifaceted Factorization Models is proposed to incorporate a great variety of features in social networks. Social relationships and actions between users are integrated as implicit feedbacks to improve the recommendation accuracy. Keywords, tags, profiles, time and some other features are also utilized for modeling user interests. In addition, user behaviors are modeled from the durations of recommendation records. A context-aware ensemble framework is then applied to combine multiple predictors and produce final recommendation results. The proposed approach obtained 0.43959 (public score) / 0.41874 (private score) on the testing dataset, which achieved the 2nd place in the KDD-Cup competition. △ Less

Submitted 3 May, 2021; originally announced May 2021.

Comments: KDD 2012

arXiv:2011.12182 [pdf, other]

A New Algorithm for Convex Biclustering and Its Extension to the Compositional Data

Authors: Binhuan Wang, Lanqiu Yao, Jiyuan Hu, Huilin Li

Abstract: Biclustering is a powerful data mining technique that allows simultaneously clustering rows (observations) and columns (features) in a matrix-format data set, which can provide results in a checkerboard-like pattern for visualization and exploratory analysis in a wide array of domains. Multiple biclustering algorithms have been developed in the past two decades, among which the convex biclustering… ▽ More Biclustering is a powerful data mining technique that allows simultaneously clustering rows (observations) and columns (features) in a matrix-format data set, which can provide results in a checkerboard-like pattern for visualization and exploratory analysis in a wide array of domains. Multiple biclustering algorithms have been developed in the past two decades, among which the convex biclustering can guarantee a global optimum by formulating in as a convex optimization problem. On the other hand, the application of biclustering has not progressed in parallel with the algorithm techniques. For example, biclustering for increasingly popular microbiome research data is under-applied possibly due to its compositional constraints for each sample. In this manuscript, we propose a new convex biclustering algorithm, called the bi-ADMM, under general setups based on the ADMM algorithm, which is free of extra smoothing steps to visualize informative biclusters required by existing convex biclustering algorithms. Furthermore, we tailor it to the algorithm named biC-ADMM specifically to tackle compositional constraints confronted in microbiome data. The key step of our methods utilizes the Sylvester Equation to derive the ADMM algorithm, which is new to the clustering research. The effectiveness of the proposed methods is examined through a variety of numerical experiments and a microbiome data application. △ Less

Submitted 8 June, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

arXiv:2007.06831 [pdf, other]

doi 10.1145/3394486.3403054

Spectrum-Guided Adversarial Disparity Learning

Authors: Zhe Liu, Lina Yao, Lei Bai, Xianzhi Wang, Can Wang

Abstract: It has been a significant challenge to portray intraclass disparity precisely in the area of activity recognition, as it requires a robust representation of the correlation between subject-specific variation for each activity class. In this work, we propose a novel end-to-end knowledge directed adversarial learning framework, which portrays the class-conditioned intraclass disparity using two comp… ▽ More It has been a significant challenge to portray intraclass disparity precisely in the area of activity recognition, as it requires a robust representation of the correlation between subject-specific variation for each activity class. In this work, we propose a novel end-to-end knowledge directed adversarial learning framework, which portrays the class-conditioned intraclass disparity using two competitive encoding distributions and learns the purified latent codes by denoising learned disparity. Furthermore, the domain knowledge is incorporated in an unsupervised manner to guide the optimization and further boosts the performance. The experiments on four HAR benchmark datasets demonstrate the robustness and generalization of our proposed methods over a set of state-of-the-art. We further prove the effectiveness of automatic domain knowledge incorporation in performance enhancement. △ Less

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2007.06758 [pdf, other]

Recommender Systems for the Internet of Things: A Survey

Authors: May Altulyan, Lina Yao, Xianzhi Wang, Chaoran Huang, Salil S Kanhere, Quan Z Sheng

Abstract: Recommendation represents a vital stage in develo** and promoting the benefits of the Internet of Things (IoT). Traditional recommender systems fail to exploit ever-growing, dynamic, and heterogeneous IoT data. This paper presents a comprehensive review of the state-of-the-art recommender systems, as well as related techniques and application in the vibrant field of IoT. We discuss several limit… ▽ More Recommendation represents a vital stage in develo** and promoting the benefits of the Internet of Things (IoT). Traditional recommender systems fail to exploit ever-growing, dynamic, and heterogeneous IoT data. This paper presents a comprehensive review of the state-of-the-art recommender systems, as well as related techniques and application in the vibrant field of IoT. We discuss several limitations of applying recommendation systems to IoT and propose a reference framework for comparing existing studies to guide future research and practices. △ Less

Submitted 13 July, 2020; originally announced July 2020.

arXiv:2007.03183 [pdf, other]

MAMO: Memory-Augmented Meta-Optimization for Cold-start Recommendation

Authors: Manqing Dong, Feng Yuan, Lina Yao, Xiwei Xu, Liming Zhu

Abstract: A common challenge for most current recommender systems is the cold-start problem. Due to the lack of user-item interactions, the fine-tuned recommender systems are unable to handle situations with new users or new items. Recently, some works introduce the meta-optimization idea into the recommendation scenarios, i.e. predicting the user preference by only a few of past interacted items. The core… ▽ More A common challenge for most current recommender systems is the cold-start problem. Due to the lack of user-item interactions, the fine-tuned recommender systems are unable to handle situations with new users or new items. Recently, some works introduce the meta-optimization idea into the recommendation scenarios, i.e. predicting the user preference by only a few of past interacted items. The core idea is learning a global sharing initialization parameter for all users and then learning the local parameters for each user separately. However, most meta-learning based recommendation approaches adopt model-agnostic meta-learning for parameter initialization, where the global sharing parameter may lead the model into local optima for some users. In this paper, we design two memory matrices that can store task-specific memories and feature-specific memories. Specifically, the feature-specific memories are used to guide the model with personalized parameter initialization, while the task-specific memories are used to guide the model fast predicting the user preference. And we adopt a meta-optimization approach for optimizing the proposed method. We test the model on two widely used recommendation datasets and consider four cold-start situations. The experimental results show the effectiveness of the proposed methods. △ Less

Submitted 6 July, 2020; originally announced July 2020.

arXiv:2007.02842 [pdf, other]

Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting

Authors: Lei Bai, Lina Yao, Can Li, Xianzhi Wang, Can Wang

Abstract: Modeling complex spatial and temporal correlations in the correlated time series data is indispensable for understanding the traffic dynamics and predicting the future status of an evolving traffic system. Recent works focus on designing complicated graph neural network architectures to capture shared patterns with the help of pre-defined graphs. In this paper, we argue that learning node-specific… ▽ More Modeling complex spatial and temporal correlations in the correlated time series data is indispensable for understanding the traffic dynamics and predicting the future status of an evolving traffic system. Recent works focus on designing complicated graph neural network architectures to capture shared patterns with the help of pre-defined graphs. In this paper, we argue that learning node-specific patterns is essential for traffic forecasting while the pre-defined graph is avoidable. To this end, we propose two adaptive modules for enhancing Graph Convolutional Network (GCN) with new capabilities: 1) a Node Adaptive Parameter Learning (NAPL) module to capture node-specific patterns; 2) a Data Adaptive Graph Generation (DAGG) module to infer the inter-dependencies among different traffic series automatically. We further propose an Adaptive Graph Convolutional Recurrent Network (AGCRN) to capture fine-grained spatial and temporal correlations in traffic series automatically based on the two modules and recurrent networks. Our experiments on two real-world traffic datasets show AGCRN outperforms state-of-the-art by a significant margin without pre-defined graphs about spatial connections. △ Less

Submitted 21 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: NeurIPS 2020

arXiv:2007.00767 [pdf, other]

NP-PROV: Neural Processes with Position-Relevant-Only Variances

Authors: Xuesong Wang, Lina Yao, Xianzhi Wang, Fei** Nie

Abstract: Neural Processes (NPs) families encode distributions over functions to a latent representation, given context data, and decode posterior mean and variance at unknown locations. Since mean and variance are derived from the same latent space, they may fail on out-of-domain tasks where fluctuations in function values amplify the model uncertainty. We present a new member named Neural Processes with P… ▽ More Neural Processes (NPs) families encode distributions over functions to a latent representation, given context data, and decode posterior mean and variance at unknown locations. Since mean and variance are derived from the same latent space, they may fail on out-of-domain tasks where fluctuations in function values amplify the model uncertainty. We present a new member named Neural Processes with Position-Relevant-Only Variances (NP-PROV). NP-PROV hypothesizes that a target point close to a context point has small uncertainty, regardless of the function value at that position. The resulting approach derives mean and variance from a function-value-related space and a position-related-only latent space separately. Our evaluation on synthetic and real-world datasets reveals that NP-PROV can achieve state-of-the-art likelihood while retaining a bounded variance when drifts exist in the function value. △ Less

Submitted 15 June, 2020; originally announced July 2020.

Comments: 10 pages, 5 figures

arXiv:2005.05556 [pdf, other]

Agglomerative Neural Networks for Multi-view Clustering

Authors: Zhe Liu, Yun Li, Lina Yao, Xianzhi Wang, Fei** Nie

Abstract: Conventional multi-view clustering methods seek for a view consensus through minimizing the pairwise discrepancy between the consensus and subviews. However, the pairwise comparison cannot portray the inter-view relationship precisely if some of the subviews can be further agglomerated. To address the above challenge, we propose the agglomerative analysis to approximate the optimal consensus view,… ▽ More Conventional multi-view clustering methods seek for a view consensus through minimizing the pairwise discrepancy between the consensus and subviews. However, the pairwise comparison cannot portray the inter-view relationship precisely if some of the subviews can be further agglomerated. To address the above challenge, we propose the agglomerative analysis to approximate the optimal consensus view, thereby describing the subview relationship within a view structure. We present Agglomerative Neural Network (ANN) based on Constrained Laplacian Rank to cluster multi-view data directly while avoiding a dedicated postprocessing step (e.g., using K-means). We further extend ANN with learnable data space to handle data of complex scenarios. Our evaluations against several state-of-the-art multi-view clustering approaches on four popular datasets show the promising view-consensus analysis ability of ANN. We further demonstrate ANN's capability in analyzing complex view structures and extensibility in our case study and explain its robustness and effectiveness of data-driven modifications. △ Less

Submitted 12 May, 2020; originally announced May 2020.

arXiv:2004.13245 [pdf, other]

Deep Conversational Recommender Systems: A New Frontier for Goal-Oriented Dialogue Systems

Authors: Dai Hoang Tran, Quan Z. Sheng, Wei Emma Zhang, Salma Abdalla Hamad, Munazza Zaib, Nguyen H. Tran, Lina Yao, Nguyen Lu Dang Khoa

Abstract: In recent years, the emerging topics of recommender systems that take advantage of natural language processing techniques have attracted much attention, and one of their applications is the Conversational Recommender System (CRS). Unlike traditional recommender systems with content-based and collaborative filtering approaches, CRS learns and models user's preferences through interactive dialogue c… ▽ More In recent years, the emerging topics of recommender systems that take advantage of natural language processing techniques have attracted much attention, and one of their applications is the Conversational Recommender System (CRS). Unlike traditional recommender systems with content-based and collaborative filtering approaches, CRS learns and models user's preferences through interactive dialogue conversations. In this work, we provide a summarization of the recent evolution of CRS, where deep learning approaches are applied to CRS and have produced fruitful results. We first analyze the research problems and present key challenges in the development of Deep Conversational Recommender Systems (DCRS), then present the current state of the field taken from the most recent researches, including the most common deep learning models that benefit DCRS. Finally, we discuss future directions for this vibrant area. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: 7 pages, 3 figures, 1 table

arXiv:2004.08581 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207111

Are You A Risk Taker? Adversarial Learning of Asymmetric Cross-Domain Alignment for Risk Tolerance Prediction

Authors: Zhe Liu, Lina Yao, Xianzhi Wang, Lei Bai, Jake An

Abstract: Most current studies on survey analysis and risk tolerance modelling lack professional knowledge and domain-specific models. Given the effectiveness of generative adversarial learning in cross-domain information, we design an Asymmetric cross-Domain Generative Adversarial Network (ADGAN) for domain scale inequality. ADGAN utilizes the information-sufficient domain to provide extra information to i… ▽ More Most current studies on survey analysis and risk tolerance modelling lack professional knowledge and domain-specific models. Given the effectiveness of generative adversarial learning in cross-domain information, we design an Asymmetric cross-Domain Generative Adversarial Network (ADGAN) for domain scale inequality. ADGAN utilizes the information-sufficient domain to provide extra information to improve the representation learning on the information-insufficient domain via domain alignment. We provide data analysis and user model on two data sources: Consumer Consumption Information and Survey Information. We further test ADGAN on a real-world dataset with view embedding structures and show ADGAN can better deal with the class imbalance and unqualified data space than state-of-the-art, demonstrating the effectiveness of leveraging asymmetrical domain information. △ Less

Submitted 18 April, 2020; originally announced April 2020.

arXiv:2004.08068 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207010

Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation

Authors: Xiaocong Chen, Chaoran Huang, Lina Yao, Xianzhi Wang, Wei Liu, Wenjie Zhang

Abstract: Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy. Reinforcement learning is inherently advantageous for co** with dynamic environments and thus has attracted increasing attention in interactive recommendation research. Inspired by knowledge-aware recommendation, we proposed Knowledge-Guided deep Reinforcement learni… ▽ More Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy. Reinforcement learning is inherently advantageous for co** with dynamic environments and thus has attracted increasing attention in interactive recommendation research. Inspired by knowledge-aware recommendation, we proposed Knowledge-Guided deep Reinforcement learning (KGRL) to harness the advantages of both reinforcement learning and knowledge graphs for interactive recommendation. This model is implemented upon the actor-critic network framework. It maintains a local knowledge network to guide decision-making and employs the attention mechanism to capture long-term semantics between items. We have conducted comprehensive experiments in a simulated online environment with six public real-world datasets and demonstrated the superiority of our model over several state-of-the-art methods. △ Less

Submitted 17 April, 2020; originally announced April 2020.

arXiv:2002.02770 [pdf, other]

A Survey on Causal Inference

Authors: Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, **g Gao, Aidong Zhang

Abstract: Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidl… ▽ More Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods. △ Less

Submitted 5 February, 2020; originally announced February 2020.

arXiv:1910.04689 [pdf, other]

Graph Spectral Embedding for Parsimonious Transmission of Multivariate Time Series

Authors: Lihan Yao, Paul Bendich

Abstract: We propose a graph spectral representation of time series data that 1) is parsimoniously encoded to user-demanded resolution; 2) is unsupervised and performant in data-constrained scenarios; 3) captures event and event-transition structure within the time series; and 4) has near-linear computational complexity in both signal length and ambient dimension. This representation, which we call Laplacia… ▽ More We propose a graph spectral representation of time series data that 1) is parsimoniously encoded to user-demanded resolution; 2) is unsupervised and performant in data-constrained scenarios; 3) captures event and event-transition structure within the time series; and 4) has near-linear computational complexity in both signal length and ambient dimension. This representation, which we call Laplacian Events Signal Segmentation (LESS), can be computed on time series of arbitrary dimension and originating from sensors of arbitrary type. Hence, time series originating from sensors of heterogeneous type can be compressed to levels demanded by constrained-communication environments, before being fused at a common center. Temporal dynamics of the data is summarized without explicit partitioning or probabilistic modeling. As a proof-of-principle, we apply this technique on high dimensional wavelet coefficients computed from the Free Spoken Digit Dataset to generate a memory efficient representation that is interpretable. Due to its unsupervised and non-parametric nature, LESS representations remain performant in the digit classification task despite the absence of labels and limited data. △ Less

Submitted 10 October, 2019; originally announced October 2019.

arXiv:1907.13359 [pdf, other]

Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning

Authors: Xiang Zhang, Xiaocong Chen, Lina Yao, Chang Ge, Manqing Dong

Abstract: Deep learning algorithms have achieved excellent performance lately in a wide range of fields (e.g., computer version). However, a severe challenge faced by deep learning is the high dependency on hyper-parameters. The algorithm results may fluctuate dramatically under the different configuration of hyper-parameters. Addressing the above issue, this paper presents an efficient Orthogonal Array Tun… ▽ More Deep learning algorithms have achieved excellent performance lately in a wide range of fields (e.g., computer version). However, a severe challenge faced by deep learning is the high dependency on hyper-parameters. The algorithm results may fluctuate dramatically under the different configuration of hyper-parameters. Addressing the above issue, this paper presents an efficient Orthogonal Array Tuning Method (OATM) for deep learning hyper-parameter tuning. We describe the OATM approach in five detailed steps and elaborate on it using two widely used deep neural network structures (Recurrent Neural Networks and Convolutional Neural Networks). The proposed method is compared to the state-of-the-art hyper-parameter tuning methods including manually (e.g., grid search and random search) and automatically (e.g., Bayesian Optimization) ones. The experiment results state that OATM can significantly save the tuning time compared to the state-of-the-art methods while preserving the satisfying performance. The codes are open in GitHub (https://github.com/xiangzhang1015/OATM) △ Less

Submitted 28 February, 2020; v1 submitted 31 July, 2019; originally announced July 2019.

Journal ref: Published on ICONIP 2019

arXiv:1905.10760 [pdf, other]

DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns

Authors: Feng Yuan, Lina Yao, Boualem Benatallah

Abstract: Cross-domain recommendation has long been one of the major topics in recommender systems. Recently, various deep models have been proposed to transfer the learned knowledge across domains, but most of them focus on extracting abstract transferable features from auxilliary contents, e.g., images and review texts, and the patterns in the rating matrix itself is rarely touched. In this work, inspired… ▽ More Cross-domain recommendation has long been one of the major topics in recommender systems. Recently, various deep models have been proposed to transfer the learned knowledge across domains, but most of them focus on extracting abstract transferable features from auxilliary contents, e.g., images and review texts, and the patterns in the rating matrix itself is rarely touched. In this work, inspired by the concept of domain adaptation, we proposed a deep domain adaptation model (DARec) that is capable of extracting and transferring patterns from rating matrices {\em only} without relying on any auxillary information. We empirically demonstrate on public datasets that our method achieves the best performance among several state-of-the-art alternative cross-domain recommendation models. △ Less

Submitted 26 May, 2019; originally announced May 2019.

arXiv:1905.10069 [pdf, other]

STG2Seq: Spatial-temporal Graph to Sequence Model for Multi-step Passenger Demand Forecasting

Authors: Lei Bai, Lina Yao, Salil. S Kanhere, Xianzhi Wang, Quan. Z Sheng

Abstract: Multi-step passenger demand forecasting is a crucial task in on-demand vehicle sharing services. However, predicting passenger demand over multiple time horizons is generally challenging due to the nonlinear and dynamic spatial-temporal dependencies. In this work, we propose to model multi-step citywide passenger demand prediction based on a graph and use a hierarchical graph convolutional structu… ▽ More Multi-step passenger demand forecasting is a crucial task in on-demand vehicle sharing services. However, predicting passenger demand over multiple time horizons is generally challenging due to the nonlinear and dynamic spatial-temporal dependencies. In this work, we propose to model multi-step citywide passenger demand prediction based on a graph and use a hierarchical graph convolutional structure to capture both spatial and temporal correlations simultaneously. Our model consists of three parts: 1) a long-term encoder to encode historical passenger demands; 2) a short-term encoder to derive the next-step prediction for generating multi-step prediction; 3) an attention-based output module to model the dynamic temporal and channel-wise information. Experiments on three real-world datasets show that our model consistently outperforms many baseline methods and state-of-the-art models. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: 7 pages

arXiv:1905.04042 [pdf, other]

Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph

Authors: Lu Liu, Tianyi Zhou, Guodong Long, **g Jiang, Lina Yao, Chengqi Zhang

Abstract: A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, they do not fully explore weakly-supervised informatio… ▽ More A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, they do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose prototype propagation network (PPN) trained on few-shot tasks together with data annotated by coarse-label. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings which differ according to whether the test task can leverage any weakly-supervised information as in training. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data. △ Less

Submitted 2 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: Accepted to IJCAI 2019, Code is publicly available at: https://github.com/liulu112601/PPN

arXiv:1905.02361 [pdf, other]

doi 10.1145/3292500.3330966

Adversarial Variational Embedding for Robust Semi-supervised Learning

Authors: Xiang Zhang, Lina Yao, Feng Yuan

Abstract: Semi-supervised learning is sought for leveraging the unlabelled data when labelled data is difficult or expensive to acquire. Deep generative models (e.g., Variational Autoencoder (VAE)) and semisupervised Generative Adversarial Networks (GANs) have recently shown promising performance in semi-supervised classification for the excellent discriminative representing ability. However, the latent cod… ▽ More Semi-supervised learning is sought for leveraging the unlabelled data when labelled data is difficult or expensive to acquire. Deep generative models (e.g., Variational Autoencoder (VAE)) and semisupervised Generative Adversarial Networks (GANs) have recently shown promising performance in semi-supervised classification for the excellent discriminative representing ability. However, the latent code learned by the traditional VAE is not exclusive (repeatable) for a specific input sample, which prevents it from excellent classification performance. In particular, the learned latent representation depends on a non-exclusive component which is stochastically sampled from the prior distribution. Moreover, the semi-supervised GAN models generate data from pre-defined distribution (e.g., Gaussian noises) which is independent of the input data distribution and may obstruct the convergence and is difficult to control the distribution of the generated data. To address the aforementioned issues, we propose a novel Adversarial Variational Embedding (AVAE) framework for robust and effective semi-supervised learning to leverage both the advantage of GAN as a high quality generative model and VAE as a posterior distribution learner. The proposed approach first produces an exclusive latent code by the model which we call VAE++, and meanwhile, provides a meaningful prior distribution for the generator of GAN. The proposed approach is evaluated over four different real-world applications and we show that our method outperforms the state-of-the-art models, which confirms that the combination of VAE++ and GAN can provide significant improvements in semisupervised classification. △ Less

Submitted 7 May, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

Comments: 9 pages, Accepted by Research Track in KDD 2019

arXiv:1904.10281 [pdf, other]

Quaternion Knowledge Graph Embeddings

Authors: Shuai Zhang, Yi Tay, Lina Yao, Qi Liu

Abstract: In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings. More specifically, quaternion embeddings, hypercomplex-valued embeddings with three imaginary components, are utilized to represent entities. Relations are modelled as rotations in the quaternion space.… ▽ More In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings. More specifically, quaternion embeddings, hypercomplex-valued embeddings with three imaginary components, are utilized to represent entities. Relations are modelled as rotations in the quaternion space. The advantages of the proposed approach are: (1) Latent inter-dependencies (between all components) are aptly captured with Hamilton product, encouraging a more compact interaction between entities and relations; (2) Quaternions enable expressive rotation in four-dimensional space and have more degree of freedom than rotation in complex plane; (3) The proposed framework is a generalization of ComplEx on hypercomplex space while offering better geometrical interpretations, concurrently satisfying the key desiderata of relational representation learning (i.e., modeling symmetry, anti-symmetry and inversion). Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks. △ Less

Submitted 31 October, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

Comments: Accepted by NeurIPS 2019

arXiv:1904.01638 [pdf, other]

A Strong Baseline for Domain Adaptation and Generalization in Medical Imaging

Authors: Li Yao, Jordan Prosky, Ben Covington, Kevin Lyman

Abstract: This work provides a strong baseline for the problem of multi-source multi-target domain adaptation and generalization in medical imaging. Using a diverse collection of ten chest X-ray datasets, we empirically demonstrate the benefits of training medical imaging deep learning models on varied patient populations for generalization to out-of-sample domains. This work provides a strong baseline for the problem of multi-source multi-target domain adaptation and generalization in medical imaging. Using a diverse collection of ten chest X-ray datasets, we empirically demonstrate the benefits of training medical imaging deep learning models on varied patient populations for generalization to out-of-sample domains. △ Less

Submitted 2 April, 2019; originally announced April 2019.

Comments: Extended abstract of a journal submission

arXiv:1904.00326 [pdf, other]

doi 10.1016/j.jbi.2022.104000

MedGCN: Medication recommendation and lab test imputation via graph convolutional networks

Authors: Chengsheng Mao, Liang Yao, Yuan Luo

Abstract: Laboratory testing and medication prescription are two of the most important routines in daily clinical practice. Develo** an artificial intelligence system that can automatically make lab test imputations and medication recommendations can save costs on potentially redundant lab tests and inform physicians of a more effective prescription. We present an intelligent medical system (named MedGCN)… ▽ More Laboratory testing and medication prescription are two of the most important routines in daily clinical practice. Develo** an artificial intelligence system that can automatically make lab test imputations and medication recommendations can save costs on potentially redundant lab tests and inform physicians of a more effective prescription. We present an intelligent medical system (named MedGCN) that can automatically recommend the patients' medications based on their incomplete lab tests, and can even accurately estimate the lab values that have not been taken. In our system, we integrate the complex relations between multiple types of medical entities with their inherent features in a heterogeneous graph. Then we model the graph to learn a distributed representation for each entity in the graph based on graph convolutional networks (GCN). By the propagation of graph convolutional networks, the entity representations can incorporate multiple types of medical information that can benefit multiple medical tasks. Moreover, we introduce a cross regularization strategy to reduce overfitting for multi-task training by the interaction between the multiple tasks. In this study, we construct a graph to associate 4 types of medical entities, i.e., patients, encounters, lab tests, and medications, and applied a graph neural network to learn node embeddings for medication recommendation and lab test imputation. we validate our MedGCN model on two real-world datasets: NMEDW and MIMIC-III. The experimental results on both datasets demonstrate that our model can outperform the state-of-the-art in both tasks. We believe that our innovative system can provide a promising and reliable way to assist physicians to make medication prescriptions and to save costs on potentially redundant lab tests. △ Less

Submitted 3 February, 2022; v1 submitted 30 March, 2019; originally announced April 2019.

Journal ref: ournal of Biomedical Informatics, Volume 127, 2022

arXiv:1811.02757 [pdf, other]

Early Prediction of Acute Kidney Injury in Critical Care Setting Using Clinical Notes

Authors: Yikuan Li, Liang Yao, Chengsheng Mao, Anand Srivastava, Xiaoqian Jiang, Yuan Luo

Abstract: Acute kidney injury (AKI) in critically ill patients is associated with significant morbidity and mortality. Development of novel methods to identify patients with AKI earlier will allow for testing of novel strategies to prevent or reduce the complications of AKI. We developed data-driven prediction models to estimate the risk of new AKI onset. We generated models from clinical notes within the f… ▽ More Acute kidney injury (AKI) in critically ill patients is associated with significant morbidity and mortality. Development of novel methods to identify patients with AKI earlier will allow for testing of novel strategies to prevent or reduce the complications of AKI. We developed data-driven prediction models to estimate the risk of new AKI onset. We generated models from clinical notes within the first 24 hours following intensive care unit (ICU) admission extracted from Medical Information Mart for Intensive Care III (MIMIC-III). From the clinical notes, we generated clinically meaningful word and concept representations and embeddings, respectively. Five supervised learning classifiers and knowledge-guided deep learning architecture were used to construct prediction models. The best configuration yielded a competitive AUC of 0.779. Our work suggests that natural language processing of clinical notes can be applied to assist clinicians in identifying the risk of incident AKI onset in critically ill patients upon admission to the ICU. △ Less

Submitted 9 November, 2018; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: 4 pages, 3 figures, accepted by BIBM 2018

arXiv:1809.08106 [pdf, other]

Distribution Networks for Open Set Learning

Authors: Chengsheng Mao, Liang Yao, Yuan Luo

Abstract: In open set learning, a model must be able to generalize to novel classes when it encounters a sample that does not belong to any of the classes it has seen before. Open set learning poses a realistic learning scenario that is receiving growing attention. Existing studies on open set learning mainly focused on detecting novel classes, but few studies tried to model them for differentiating novel c… ▽ More In open set learning, a model must be able to generalize to novel classes when it encounters a sample that does not belong to any of the classes it has seen before. Open set learning poses a realistic learning scenario that is receiving growing attention. Existing studies on open set learning mainly focused on detecting novel classes, but few studies tried to model them for differentiating novel classes. In this paper, we recognize that novel classes should be different from each other, and propose distribution networks for open set learning that can model different novel classes based on probability distributions. We hypothesize that, through a certain map**, samples from different classes with the same classification criterion should follow different probability distributions from the same distribution family. A deep neural network is learned to map the samples in the original feature space to a latent space where the distributions of known classes can be jointly learned with the network. We additionally propose a distribution parameter transfer and updating strategy for novel class modeling when a novel class is detected in the latent space. By novel class modeling, the detected novel classes can serve as known classes to the subsequent classification. Our experimental results on image datasets MNIST and CIFAR10 show that the distribution networks can detect novel classes accurately, and model them well for the subsequent classification tasks. △ Less

Submitted 23 November, 2018; v1 submitted 19 September, 2018; originally announced September 2018.

arXiv:1806.08079 [pdf, other]

GrCAN: Gradient Boost Convolutional Autoencoder with Neural Decision Forest

Authors: Manqing Dong, Lina Yao, Xianzhi Wang, Boualem Benatallah, Shuai Zhang

Abstract: Random forest and deep neural network are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high dimensional data. In view that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work, w… ▽ More Random forest and deep neural network are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high dimensional data. In view that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work, we further combine convolutional autoencoder with neural decision forest, where autoencoder has its advantages in finding the hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve the performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. The extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods. △ Less

Submitted 24 June, 2018; v1 submitted 21 June, 2018; originally announced June 2018.

arXiv:1802.04407 [pdf, other]

Adversarially Regularized Graph Autoencoder for Graph Embedding

Authors: Shirui Pan, Ruiqi Hu, Guodong Long, **g Jiang, Lina Yao, Chengqi Zhang

Abstract: Graph embedding is an effective method to represent graph data in a low dimensional space for graph analytics. Most existing embedding algorithms typically focus on preserving the topological structure or minimizing the reconstruction errors of graph data, but they have mostly ignored the data distribution of the latent codes from the graphs, which often results in inferior embedding in real-world… ▽ More Graph embedding is an effective method to represent graph data in a low dimensional space for graph analytics. Most existing embedding algorithms typically focus on preserving the topological structure or minimizing the reconstruction errors of graph data, but they have mostly ignored the data distribution of the latent codes from the graphs, which often results in inferior embedding in real-world graph data. In this paper, we propose a novel adversarial graph embedding framework for graph data. The framework encodes the topological structure and node content in a graph to a compact representation, on which a decoder is trained to reconstruct the graph structure. Furthermore, the latent representation is enforced to match a prior distribution via an adversarial training scheme. To learn a robust embedding, two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), are developed. Experimental studies on real-world graphs validate our design and demonstrate that our algorithms outperform baselines by a wide margin in link prediction, graph clustering, and graph visualization tasks. △ Less

Submitted 7 January, 2019; v1 submitted 12 February, 2018; originally announced February 2018.

arXiv:1511.04590 [pdf, other]

Oracle performance for visual captioning

Authors: Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

Abstract: The task of associating images and videos with a natural language description has attracted a great amount of attention recently. Rapid progress has been made in terms of both develo** novel algorithms and releasing new datasets. Indeed, the state-of-the-art results on some of the standard datasets have been pushed into the regime where it has become more and more difficult to make significant i… ▽ More The task of associating images and videos with a natural language description has attracted a great amount of attention recently. Rapid progress has been made in terms of both develo** novel algorithms and releasing new datasets. Indeed, the state-of-the-art results on some of the standard datasets have been pushed into the regime where it has become more and more difficult to make significant improvements. Instead of proposing new models, this work investigates the possibility of empirically establishing performance upper bounds on various visual captioning datasets without extra data labelling effort or human evaluation. In particular, it is assumed that visual captioning is decomposed into two steps: from visual inputs to visual concepts, and from visual concepts to natural language descriptions. One would be able to obtain an upper bound when assuming the first step is perfect and only requiring training a conditional language model for the second step. We demonstrate the construction of such bounds on MS-COCO, YouTube2Text and LSMDC (a combination of M-VAD and MPII-MD). Surprisingly, despite of the imperfect process we used for visual concept extraction in the first step and the simplicity of the language model for the second step, we show that current state-of-the-art models fall short when being compared with the learned upper bounds. Furthermore, with such a bound, we quantify several important factors concerning image and video captioning: the number of visual concepts captured by different models, the trade-off between the amount of visual elements captured and their accuracy, and the intrinsic difficulty and blessing of different datasets. △ Less

Submitted 14 September, 2016; v1 submitted 14 November, 2015; originally announced November 2015.

Comments: BMVC2016 (Oral paper)

arXiv:1502.08029 [pdf, other]

Describing Videos by Exploiting Temporal Structure

Authors: Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

Abstract: Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully… ▽ More Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatial temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second we propose a temporal attention mechanism that allows to go beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state-of-art for both BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired video and natural language descriptions. △ Less

Submitted 30 September, 2015; v1 submitted 27 February, 2015; originally announced February 2015.

Comments: Accepted to ICCV15. This version comes with code release and supplementary material

arXiv:1409.0585 [pdf, other]

On the Equivalence Between Deep NADE and Generative Stochastic Networks

Authors: Li Yao, Sherjil Ozair, Kyunghyun Cho, Yoshua Bengio

Abstract: Neural Autoregressive Distribution Estimators (NADEs) have recently been shown as successful alternatives for modeling high dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for $P(\mathbf{x})$. This issue has been recently addressed by a variant of NADE called Orderless NADEs and its deeper version, Deep Orderless NADE.… ▽ More Neural Autoregressive Distribution Estimators (NADEs) have recently been shown as successful alternatives for modeling high dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for $P(\mathbf{x})$. This issue has been recently addressed by a variant of NADE called Orderless NADEs and its deeper version, Deep Orderless NADE. Orderless NADEs are trained based on a criterion that stochastically maximizes $P(\mathbf{x})$ with all possible orders of factorizations. Unfortunately, ancestral sampling from deep NADE is very expensive, corresponding to running through a neural net separately predicting each of the visible variables given some others. This work makes a connection between this criterion and the training criterion for Generative Stochastic Networks (GSNs). It shows that training NADEs in this way also trains a GSN, which defines a Markov chain associated with the NADE model. Based on this connection, we show an alternative way to sample from a trained Orderless NADE that allows to trade-off computing time and quality of the samples: a 3 to 10-fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved using a novel sampling procedure for GSNs called annealed GSN sampling, similar to tempering methods that combines fast mixing (obtained thanks to steps at high noise levels) with accurate samples (obtained thanks to steps at low noise levels). △ Less

Submitted 1 September, 2014; originally announced September 2014.

Comments: ECML/PKDD 2014

arXiv:1406.1485 [pdf, other]

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

Authors: Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Abstract: Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step. The proposed model is an unsupervi… ▽ More Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step. The proposed model is an unsupervised building block for deep learning that combines the desirable properties of NADE and multi-predictive training: (1) Its test likelihood can be computed analytically, (2) it is easy to generate independent samples from it, and (3) it uses an inference engine that is a superset of variational inference for Boltzmann machines. The proposed NADE-k is competitive with the state-of-the-art in density estimation on the two datasets tested. △ Less

Submitted 5 December, 2014; v1 submitted 5 June, 2014; originally announced June 2014.

Comments: Accepted at Neural Information Processing Systems (NIPS) 2014

arXiv:1312.5578 [pdf, other]

Multimodal Transitions for Generative Stochastic Networks

Authors: Sherjil Ozair, Li Yao, Yoshua Bengio

Abstract: Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution. The result of training is therefore a machine that generates samples through… ▽ More Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution. The result of training is therefore a machine that generates samples through this Markov chain. However, the previously introduced GSN consistency theorems suggest that in order to capture a wide class of distributions, the transition operator in general should be multimodal, something that has not been done before this paper. We introduce for the first time multimodal transition distributions for GSNs, in particular using models in the NADE family (Neural Autoregressive Density Estimator) as output distributions of the transition operator. A NADE model is related to an RBM (and can thus model multimodal distributions) but its likelihood (and likelihood gradient) can be computed easily. The parameters of the NADE are obtained as a learned function of the previous state of the learned Markov chain. Experiments clearly illustrate the advantage of such multimodal transition distributions over unimodal GSNs. △ Less

Submitted 24 January, 2014; v1 submitted 19 December, 2013; originally announced December 2013.

Comments: 7 figures, 9 pages, submitted to ICLR14

arXiv:1301.4293 [pdf, ps, other]

Latent Relation Representations for Universal Schemas

Authors: Sebastian Riedel, Limin Yao, Andrew McCallum

Abstract: Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing structured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all involved schemas (surface form predicates as in Ope… ▽ More Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing structured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all involved schemas (surface form predicates as in OpenIE, and relations in the schemas of pre-existing databases). This schema has an almost unlimited set of relations (due to surface forms), and supports integration with existing structured data (through the relation types of existing databases). To populate a database of such schema we present a family of matrix factorization models that predict affinity between database tuples and relations. We show that this achieves substantially higher accuracy than the traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs such as Freebase, we are able to reason about unstructured and structured data in mutually-supporting ways. By doing so our approach outperforms state-of-the-art distant supervision systems. △ Less

Submitted 28 January, 2013; v1 submitted 17 January, 2013; originally announced January 2013.

Comments: 4 pages, ICLR workshop

Showing 1–42 of 42 results for author: Yao, L