-
Rethinking Causal Relationships Learning in Graph Neural Networks
Authors:
Hang Gao,
Chengyu Yao,
Jiangmeng Li,
Lingyu Si,
Yifan **,
Fengge Wu,
Changwen Zheng,
Hua** Liu
Abstract:
Graph Neural Networks (GNNs) demonstrate their significance by effectively modeling complex interrelationships within graph-structured data. To enhance the credibility and robustness of GNNs, it becomes exceptionally crucial to bolster their ability to capture causal relationships. However, despite recent advancements that have indeed strengthened GNNs from a causal learning perspective, conducting an in-depth analysis specifically targeting the causal modeling prowess of GNNs remains an unresolved issue. In order to comprehensively analyze various GNN models from a causal learning perspective, we constructed an artificially synthesized dataset with known and controllable causal relationships between data and labels. The rationality of the generated data is further ensured through theoretical foundations. Drawing insights from analyses conducted using our dataset, we introduce a lightweight and highly adaptable GNN module designed to strengthen GNNs' causal learning capabilities across a diverse range of tasks. Through a series of experiments conducted on both synthetic datasets and other real-world datasets, we empirically validate the effectiveness of the proposed module.
Submitted 15 December, 2023;
originally announced December 2023.
-
Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression
Authors:
**g Xu,
Jiaye Teng,
Yang Yuan,
Andrew Chi-Chih Yao
Abstract:
One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent even for overparameterized linear regression. In many scenarios, this failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. This paper demonstrates that the generalization behavior of overparameterized models should be analyzed in a manner that is both data-relevant and algorithm-relevant. To make a formal characterization, we introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory, instead of traditional last-iterate analysis. We validate our claim by studying the setting of solving overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting. Our theoretical results demonstrate that if we take early stopping iterates into consideration, generalization can hold with significantly weaker restrictions on the problem instance than the previous last-iterate analysis.
Submitted 21 November, 2023; v1 submitted 12 February, 2022;
originally announced February 2022.
-
Automated Discovery of Adaptive Attacks on Adversarial Defenses
Authors:
Chengyuan Yao,
Pavol Bielik,
Petar Tsankov,
Martin Vechev
Abstract:
Reliable evaluation of adversarial defenses is a challenging task, currently limited to an expert who manually crafts attacks that exploit the defense's inner workings or approaches based on an ensemble of fixed attacks, none of which may be effective for the specific defense at hand. Our key observation is that adaptive attacks are composed of reusable building blocks that can be formalized in a search space and used to automatically discover attacks for unknown defenses. We evaluated our approach on 24 adversarial defenses and show that it outperforms AutoAttack, the current state-of-the-art tool for reliable evaluation of adversarial defenses: our tool discovered significantly stronger attacks by producing 3.0%-50.8% additional adversarial examples for 10 models, while obtaining attacks with slightly stronger or similar strength for the remaining models.
Submitted 27 October, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Deep Learning for Post-Processing Ensemble Weather Forecasts
Authors:
Peter Grönquist,
Chengyuan Yao,
Tal Ben-Nun,
Nikoli Dryden,
Peter Dueben,
Shigang Li,
Torsten Hoefler
Abstract:
Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble prediction systems, which consist of many perturbed numerical weather simulations, or trajectories, run in parallel. These systems are associated with a high computational cost and often involve statistical post-processing steps to inexpensively improve their raw prediction qualities. We propose a mixed model that uses only a subset of the original weather trajectories combined with a post-processing step using deep neural networks. These enable the model to account for non-linear relationships that are not captured by current numerical models or post-processing methods. Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 14%. Furthermore, we demonstrate that the improvement is larger for extreme weather events on select case studies. We also show that our post-processing can use fewer trajectories to achieve comparable results to the full ensemble. By using fewer trajectories, the computational costs of an ensemble prediction system can be reduced, allowing it to run at higher resolution and produce more accurate forecasts.
Submitted 21 September, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Adaptive-Step Graph Meta-Learner for Few-Shot Graph Classification
Authors:
Ning Ma,
Jiajun Bu,
Jieyu Yang,
Zhen Zhang,
Chengwei Yao,
Zhi Yu,
Sheng Zhou,
Xifeng Yan
Abstract:
Graph classification aims to extract accurate information from graph-structured data for classification and is becoming more and more important in the graph learning community. Although Graph Neural Networks (GNNs) have been successfully applied to graph classification tasks, most of them overlook the scarcity of labeled graph data in many applications. For example, in bioinformatics, obtaining protein graph labels usually needs laborious experiments. Recently, few-shot learning has been explored to alleviate this problem given only a few labeled graph samples of test classes. The shared sub-structures between training classes and test classes are essential in few-shot graph classification. Existing methods assume that the test classes belong to the same set of super-classes clustered from training classes. However, according to our observations, the label spaces of training classes and test classes usually do not overlap in real-world scenarios. As a result, the existing methods do not capture the local structures of unseen test classes well. To overcome this limitation, in this paper, we propose a direct method to capture the sub-structures with a well-initialized meta-learner within a few adaptation steps. More specifically, (1) we propose a novel framework consisting of a graph meta-learner, which uses GNN-based modules for fast adaptation on graph data, and a step controller for the robustness and generalization of the meta-learner; (2) we provide quantitative analysis for the framework and give a graph-dependent upper bound of the generalization error based on our framework; (3) extensive experiments on real-world datasets demonstrate that our framework gets state-of-the-art results on several few-shot graph classification tasks compared to baselines.
Submitted 23 June, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Hierarchical Graph Pooling with Structure Learning
Authors:
Zhen Zhang,
Jiajun Bu,
Martin Ester,
Jianfeng Zhang,
Chengwei Yao,
Zhi Yu,
Can Wang
Abstract:
Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph-related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, which play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of the graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining the HGP-SL operator with graph neural networks, we perform graph-level representation learning with a focus on the graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.
Submitted 25 December, 2019; v1 submitted 14 November, 2019;
originally announced November 2019.
-
On the penalized maximum likelihood estimation of high-dimensional approximate factor model
Authors:
Shaoxin Wang,
Hu Yang,
Chaoli Yao
Abstract:
In this paper, we mainly focus on the penalized maximum likelihood estimation (MLE) of the high-dimensional approximate factor model. Since the current estimation procedure cannot guarantee the positive definiteness of the error covariance matrix, we reformulate the estimation of the error covariance matrix and, based on Lagrangian duality, propose an accelerated proximal gradient (APG) algorithm that gives a positive definite estimate of the error covariance matrix. Combining the APG algorithm with the EM method, we propose a new estimation procedure for the high-dimensional approximate factor model. The new method not only gives a positive definite estimate of the error covariance matrix but also improves the efficiency of estimation for the high-dimensional approximate factor model. Although the proposed algorithm cannot guarantee a globally unique solution, it enjoys a desirable non-increasing property. The efficiency of the new algorithm on estimation and forecasting is also investigated via simulation and real data analysis.
Submitted 17 January, 2019; v1 submitted 22 August, 2016;
originally announced August 2016.