-
On the Learnability of Out-of-distribution Detection
Authors:
Zhen Fang,
Yixuan Li,
Feng Liu,
Bo Han,
Jie Lu
Abstract:
Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms, and the corresponding learning theory is still an open problem. To study the generalization of OOD detection, this paper investigates the probably approximately correct (PAC) learning theory of OOD detection that fits the commonly used evaluation metrics in the literature. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we offer theoretical support for representative OOD detection works based on our OOD theory.
Submitted 7 April, 2024;
originally announced April 2024.
-
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
Authors:
Xuefeng Du,
Zhen Fang,
Ilias Diakonikolas,
Yixuan Li
Abstract:
Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at https://github.com/deeplearning-wisc/sal.
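The abstract does not specify how SAL scores unlabeled samples in its separation step. As a rough illustration of the "separate" half only, the sketch below assumes a simple rule: center the unlabeled feature vectors on the labeled ID mean, find the dominant singular direction by power iteration, and flag samples whose squared projection onto it exceeds a threshold. The function names, the use of raw feature vectors, and the threshold are hypothetical; this is not SAL's exact filtering score, and the subsequent "learn" step (training a binary classifier on labeled ID data versus the flagged candidates) is not shown.

```python
import math
import random


def top_singular_vector(rows, iters=200, seed=0):
    """Approximate the top right singular vector of matrix `rows` by power iteration on M^T M."""
    dim = len(rows[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        mv = [sum(r[k] * v[k] for k in range(dim)) for r in rows]  # M v
        w = [sum(mv[i] * rows[i][k] for i in range(len(rows))) for k in range(dim)]  # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def separate_candidates(id_feats, wild_feats, threshold):
    """Hypothetical 'separate' step: flag unlabeled samples with a large squared projection."""
    dim = len(id_feats[0])
    # Center unlabeled features on the labeled ID mean (an assumption of this sketch).
    mean = [sum(f[k] for f in id_feats) / len(id_feats) for k in range(dim)]
    centered = [[f[k] - mean[k] for k in range(dim)] for f in wild_feats]
    v = top_singular_vector(centered)
    # Squared projection onto the dominant direction serves as the outlier score.
    scores = [sum(c[k] * v[k] for k in range(dim)) ** 2 for c in centered]
    candidates = [i for i, s in enumerate(scores) if s > threshold]
    return candidates, scores
```

On toy data where two unlabeled points sit far from a tight ID cluster, those two points dominate the singular direction and are the ones flagged.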
Submitted 5 February, 2024;
originally announced February 2024.
-
Learned Image Compression with Dual-Branch Encoder and Conditional Information Coding
Authors:
Haisheng Fu,
Feng Liang,
Jie Liang,
Zhenman Fang,
Guohe Zhang,
Jingning Han
Abstract:
Recent advancements in deep learning-based image compression are notable. However, prevalent schemes that employ a serial context-adaptive entropy model to enhance rate-distortion (R-D) performance are markedly slow. Furthermore, the complexities of the encoding and decoding networks are substantially high, rendering them unsuitable for some practical applications. In this paper, we propose two techniques to balance the trade-off between complexity and performance. First, we introduce two branching coding networks to independently learn a low-resolution latent representation and a high-resolution latent representation of the input image, discriminatively representing the global and local information therein. Second, we utilize the high-resolution latent representation as conditional information for the low-resolution latent representation, furnishing it with global information, thus aiding in the reduction of redundancy between low-resolution information. We do not utilize any serial entropy models. Instead, we employ a parallel channel-wise auto-regressive entropy model for encoding and decoding low-resolution and high-resolution latent representations. Experiments demonstrate that our method is approximately twice as fast in both encoding and decoding compared to the parallelizable checkerboard context model, and it also achieves a 1.2% improvement in R-D performance compared to state-of-the-art learned image compression schemes. Our method also outperforms classical image codecs including H.266/VVC-intra (4:4:4) and some recent learned methods in rate-distortion performance, as validated by both PSNR and MS-SSIM metrics on the Kodak dataset.
Submitted 21 March, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Effective filtering approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization
Authors:
Zhou Fang,
Ankit Gupta,
Mustafa Khammash
Abstract:
Stochastic filtering is a vibrant area of research in both control theory and statistics, with broad applications in many scientific fields. Despite its extensive historical development, an effective method for joint parameter-state estimation in SDEs is still lacking. State-of-the-art particle filtering methods suffer from either sample degeneracy or information loss, both of which stem from the dynamics of the particles generated to represent system parameters.
This paper provides a novel and effective approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization. Our method operates in two layers: the first layer estimates the system states using a bootstrap particle filter, and the second layer marginalizes out system parameters explicitly. This strategy circumvents the need to generate particles representing system parameters, thereby mitigating their associated problems of sample degeneracy and information loss. Moreover, our method employs a modularization approach when integrating out the parameters, which significantly reduces the computational complexity. All these designs ensure the superior performance of our method. Finally, a numerical example is presented to illustrate that our method outperforms existing approaches by a large margin.
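To make the two-layer idea concrete, here is a minimal Rao-Blackwellized bootstrap particle filter for a toy scalar SDE. It assumes an Euler discretization x_{k+1} = x_k + θ x_k Δt + σ√Δt ξ_k, Gaussian observations y_k = x_k + η_k with η_k ~ N(0, τ²), and a Gaussian prior on the single unknown parameter θ. Because θ enters the drift linearly, each particle can carry a closed-form Gaussian posterior over θ instead of a sampled θ value, so no particles are generated for the parameter. This conjugate special case and all names are assumptions chosen for illustration; the paper's modularized method targets more general SDEs and is not reproduced here.

```python
import math
import random


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def rbpf(observations, x0, dt, sigma, tau, m0, p0, n_particles=500, seed=1):
    """Rao-Blackwellized bootstrap PF for x' = x + theta*x*dt + sigma*sqrt(dt)*xi, y = x + N(0, tau^2).

    Each particle stores [x, m, p]: its state and the Gaussian posterior N(m, 1/p) of theta
    given that particle's own simulated path. Returns (filtered state mean, posterior mean of theta).
    """
    rng = random.Random(seed)
    noise_var = sigma ** 2 * dt
    particles = [[x0, m0, p0] for _ in range(n_particles)]
    for y in observations:
        log_w = []
        for part in particles:
            x, m, p = part
            h = x * dt  # coefficient multiplying theta in the Euler step
            # Propagate the state with theta integrated out analytically.
            pred_mean = x + m * h
            pred_var = noise_var + h * h / p
            x_new = rng.gauss(pred_mean, math.sqrt(pred_var))
            # Conjugate Gaussian update of theta from the transition x -> x_new.
            p_new = p + h * h / noise_var
            m_new = (p * m + h * (x_new - x) / noise_var) / p_new
            part[0], part[1], part[2] = x_new, m_new, p_new
            # Bootstrap proposal, so the weight is just the observation likelihood.
            log_w.append(normal_logpdf(y, x_new, tau ** 2))
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Multinomial resampling; copy each particle so duplicates evolve independently.
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        particles = [list(particles[i]) for i in idx]
    state_mean = sum(part[0] for part in particles) / n_particles
    theta_mean = sum(part[1] for part in particles) / n_particles
    return state_mean, theta_mean
```

The design choice to keep sufficient statistics (m, p) per particle is what removes parameter particles entirely; in non-conjugate models that closed form is unavailable, which is where more general marginalization schemes are needed.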
Submitted 1 November, 2023;
originally announced November 2023.
-
Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness
Authors:
Shide Du,
Zihan Fang,
Shiyang Lan,
Yanchao Tan,
Manuel Günther,
Shiping Wang,
Wenzhong Guo
Abstract:
As researchers strive to narrow the gap between machine intelligence and human intelligence through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in the open world, which has become ubiquitous in all aspects of everyday life. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) insufficient explanation of predictive results; 2) inadequate generalization of learning models; 3) poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) we then design environmental well-being task interfaces via flexible learning regularizers to improve the generalization of trustworthy learning; 3) we propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Overall, we enhance various trustworthy properties by establishing design-level explainability, environmental well-being task interfaces, and open-world recognition programs. The resulting open-world protocols are applicable across a wide range of settings, with significant performance improvements observed in open-world multimedia recognition scenarios.
Submitted 18 October, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
A Signed Subgraph Encoding Approach via Linear Optimization for Link Sign Prediction
Authors:
Zhihong Fang,
Shaolin Tan,
Yaonan Wang
Abstract:
In this paper, we consider the problem of inferring the sign of a link based on limited sign data in signed networks. For this link sign prediction problem, to the best of our knowledge, SDGNN (Signed Directed Graph Neural Networks) currently provides the best prediction performance. In this paper, we propose a different link sign prediction architecture called SELO (Subgraph Encoding via Linear Optimization), which achieves better overall prediction performance than the state-of-the-art algorithm SDGNN. The proposed model uses a subgraph encoding approach to learn edge embeddings for signed directed networks. In particular, a signed subgraph encoding approach is introduced to embed each subgraph into a likelihood matrix, instead of the adjacency matrix, through a linear optimization method. Comprehensive experiments are conducted on six real-world signed networks with AUC, F1, micro-F1, and macro-F1 as the evaluation metrics. The experimental results show that the proposed SELO model outperforms existing baseline feature-based and embedding-based methods on all six real-world networks and on all four evaluation metrics.
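The abstract reports AUC, F1, micro-F1, and macro-F1 but does not restate their definitions. A short sketch of the standard definitions for binary link signs follows; the +1/-1 label encoding and the function names are assumptions, and it says nothing about how SELO itself produces its scores.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random +1 link > score of a random -1 link); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_for_class(labels, preds, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
    fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
    fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(labels, preds):
    """Unweighted mean of the per-class F1 scores over the two signs."""
    return (f1_for_class(labels, preds, 1) + f1_for_class(labels, preds, -1)) / 2


def micro_f1(labels, preds):
    """For single-label binary prediction, micro-F1 equals accuracy."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
```

Macro-F1 weights the two signs equally, which matters because positive links usually dominate signed networks; micro-F1 is driven by the majority sign.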
Submitted 16 May, 2023;
originally announced May 2023.
-
Investigating the spatial heterogeneity of factors influencing speeding-related crash severities using correlated random parameter order models with heterogeneity-in-means
Authors:
Renteng Yuan,
Qiaojun Xiang,
Zhiheng Fang,
Xin Gu
Abstract:
Speeding has been acknowledged as a critical determinant in increasing the risk of crashes and their resulting injury severities. This paper demonstrates that severe speeding-related crashes within the state of Pennsylvania show a spatial clustering trend; accordingly, four crash datasets are extracted from four hotspot districts. Two log-likelihood ratio (LR) tests were conducted to determine whether speeding-related crashes classified by hotspot district should be modeled separately. The results suggest that separate modeling is necessary. To capture unobserved heterogeneity, four correlated random parameter order models with heterogeneity in means are employed to explore the factors contributing to the severity of crashes involving at least one speeding vehicle. Overall, the findings show that several indicators exhibit spatial instability, including hit pedestrian crashes, head-on crashes, speed limits, work zones, light conditions (dark), rural areas, older drivers, running stop signs, and running red lights. Moreover, drunk driving, exceeding the speed limit, and being unbelted show relative spatial stability across the four district models. This paper provides insights into preventing speeding-related crashes and may facilitate the development of corresponding crash injury mitigation policies.
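For readers unfamiliar with this model class, the sketch below computes severity-category probabilities for an ordered probit model with two correlated random parameters whose means shift with one observed variable z (heterogeneity in means), averaging over simulated draws. It assumes a probit link, three severity levels, a single heterogeneity variable, and a hand-supplied Cholesky factor; all names and numbers are hypothetical, and the paper's actual specification and estimation procedure may differ.

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, thresholds):
    """P(y = j), j = 0..J, for latent index xb and increasing cut-points."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb) for j in range(len(cuts) - 1)]


def simulated_probs(x, z, beta_mean, delta, chol, thresholds, n_draws=2000, seed=0):
    """Average ordered-probit probabilities over draws of correlated random parameters.

    beta_i = beta_mean + delta * z + L @ omega, omega ~ N(0, I), where L (chol) is a
    lower-triangular Cholesky factor; delta * z shifts the parameter means (heterogeneity in means).
    """
    rng = random.Random(seed)
    k = len(beta_mean)
    acc = [0.0] * (len(thresholds) + 1)
    for _ in range(n_draws):
        omega = [rng.gauss(0.0, 1.0) for _ in range(k)]
        beta_i = [
            beta_mean[a] + delta[a] * z + sum(chol[a][b] * omega[b] for b in range(a + 1))
            for a in range(k)
        ]
        xb = sum(x[a] * beta_i[a] for a in range(k))
        for j, prob in enumerate(ordered_probit_probs(xb, thresholds)):
            acc[j] += prob
    return [total / n_draws for total in acc]
```

The off-diagonal entry of the Cholesky factor is what makes the random parameters correlated; setting it to zero recovers independent random parameters.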
Submitted 5 July, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Longitudinal Canonical Correlation Analysis
Authors:
Seonjoo Lee,
Jongwoo Choi,
Zhiqian Fang,
F. DuBois Bowman
Abstract:
This paper considers canonical correlation analysis for two longitudinal variables that are possibly sampled at different time resolutions with irregular grids. We modeled trajectories of the multivariate variables using random effects and found the most correlated sets of linear combinations in the latent space. Our numerical simulations showed that the longitudinal canonical correlation analysis (LCCA) effectively recovers underlying correlation patterns between two high-dimensional longitudinal data sets. We applied the proposed LCCA to data from the Alzheimer's Disease Neuroimaging Initiative and identified the longitudinal profiles of morphological brain changes and amyloid accumulation.
Submitted 1 February, 2023;
originally announced February 2023.
-
Is Out-of-Distribution Detection Learnable?
Authors:
Zhen Fang,
Yixuan Li,
Jie Lu,
Jiahua Dong,
Bo Han,
Feng Liu
Abstract:
Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which researchers have posed as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical support for several representative OOD detection works based on our OOD theory.
Submitted 23 February, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
On the Representation of Causal Background Knowledge and its Applications in Causal Inference
Authors:
Zhuangyan Fang,
Ruiqi Zhao,
Yue Liu,
Yangbo He
Abstract:
Causal background knowledge about the existence or the absence of causal edges and paths is frequently encountered in observational studies. The shared directed edges and links of a subclass of Markov equivalent DAGs refined due to background knowledge can be represented by a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and give a minimal representation of a causal MPDAG. Then, we introduce a novel representation called direct causal clause (DCC) to represent all types of causal background knowledge in a unified form. Using DCCs, we study the consistency and equivalency of causal background knowledge and show that any causal background knowledge set can be equivalently decomposed into a causal MPDAG plus a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking the consistency, equivalency, and finding the decomposed MPDAG and residual DCCs. Finally, with causal background knowledge, we prove a sufficient and necessary condition to identify causal effects and surprisingly find that the identifiability of causal effects only depends on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that causal background knowledge can significantly improve the identifiability of causal effects.
Submitted 10 July, 2022;
originally announced July 2022.
-
Length L-function for Network-Constrained Point Data
Authors:
Zidong Fang,
Ci Song,
Hua Shu,
Jie Chen,
Tianyu Liu,
Xi Wang,
Xiao Chen,
Tao Pei
Abstract:
Network-constrained points are points restricted to road networks, such as taxi pick-up and drop-off locations. A significant pattern of network-constrained points is aggregation; e.g., an aggregation of pick-up points may indicate high taxi demand in a particular area. Although the network K-function using the shortest-path network distance has been proposed to detect point aggregation, its statistical unit is still radius-based. In particular, the r-neighborhood has an inconsistent network length owing to the complex configuration of road networks, which causes unfair counts and identification errors (e.g., the r-neighborhood of a point at an intersection is longer than that of a point on a straight road, and may therefore include more points). In this study, we derived the length L-function for network-constrained points to identify aggregation by designing a novel neighborhood as the statistical unit, whose total length is consistent throughout the network. Compared with the network K-function, our method can detect a true-to-life aggregation scale, identify aggregations with higher network density, and identify aggregations that the network K-function cannot. We validated our method using taxi pick-up location data within the Zhongguancun area in Beijing, analyzing differences in maximal aggregation between workdays and weekends to understand taxi demand in the morning and evening peaks.
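As background for the radius-based statistic critiqued above, here is a sketch of the classic network K-function estimate, K(r) = L_T / (n(n−1)) · Σ_{i≠j} 1[d(i, j) ≤ r], where L_T is the total network length and d is the shortest-path distance. For simplicity it assumes points sit on graph nodes and uses Dijkstra's algorithm for distances; it does not implement the paper's length L-function, whose constant-length neighborhood is exactly what this radius-based count lacks. Function names are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from `source`; graph maps node -> list of (neighbour, length)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def network_k(graph, points, r):
    """Network K estimate at radius r; points are assumed to lie on graph nodes."""
    # Undirected graph: each edge appears twice in the adjacency lists.
    total_length = sum(w for u in graph for _, w in graph[u]) / 2.0
    n = len(points)
    close_pairs = 0
    for i, p in enumerate(points):
        dist = dijkstra(graph, p)
        for j, q in enumerate(points):
            if i != j and dist.get(q, float("inf")) <= r:
                close_pairs += 1  # ordered pairs, so each close pair counts twice
    return total_length * close_pairs / (n * (n - 1))
```

On a three-edge path A–B–C–D with unit lengths and points at A, B, and D, K(1) = 1 and K(2) = 2.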
Submitted 29 March, 2022;
originally announced March 2022.
-
Identifying Aggregation Artery Architecture of constrained Origin-Destination flows using Manhattan L-function
Authors:
Zidong Fang,
Hua Shu,
Ci Song,
Jie Chen,
Tianyu Liu,
Xiaohan Liu,
Tao Pei
Abstract:
The movement of humans and goods in cities can be represented by constrained flows, defined as the movement of objects between origins and destinations in road networks. Flow aggregation, in which origins and destinations are aggregated simultaneously, is one of the most common patterns; for example, aggregated origin-to-destination flows between two transport hubs may indicate heavy traffic demand between the two sites. Developing a clustering method for constrained flows is crucial for determining urban flow aggregation. Among existing methods for identifying flow aggregation, the L-function of flows is the major one. However, this method depends on the aggregation scale, a key parameter detected by the Euclidean L-function, which does not adapt to road networks. As a result, the extracted aggregation may be overestimated and dispersed. Therefore, we propose a clustering method based on the L-function in Manhattan space, which consists of three major steps. The first is to detect aggregation scales using the Manhattan L-function. The second is to determine the core flows possessing the highest local L-function values at different scales. The final step is to take the intersection of the core flows' neighbourhoods, the extent of which depends on the corresponding scale. By setting the number of core flows, we can concentrate the aggregation and thus highlight the Aggregation Artery Architecture (AAA), which depicts road sections that contain the projection of key flow clusters onto the road network. An experiment using taxi flows showed that AAA can clarify the resident movement type of the identified aggregated flows. Our method also helps select locations for distribution sites, thereby supporting accurate analysis of urban interactions.
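A rough sketch of the second step at a single scale r: two flows are treated as neighbours when both their origins and their destinations lie within Manhattan distance r of each other, and the flows with the most neighbours are taken as core flows. It assumes planar coordinates with Manhattan distance as a road-network proxy, and it ranks by raw neighbour counts instead of local L-function values; at a fixed r these give the same ranking only if the local L-function is an increasing transform of the count, which is an assumption here. All names are hypothetical.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def local_flow_counts(flows, r):
    """flows: list of ((ox, oy), (dx, dy)). Count neighbours whose origin AND destination are within r."""
    counts = []
    for i, (o_i, d_i) in enumerate(flows):
        c = 0
        for j, (o_j, d_j) in enumerate(flows):
            if i != j and manhattan(o_i, o_j) <= r and manhattan(d_i, d_j) <= r:
                c += 1
        counts.append(c)
    return counts


def core_flows(flows, r, n_core):
    """Indices of the n_core flows with the most neighbours (stable order on ties)."""
    counts = local_flow_counts(flows, r)
    order = sorted(range(len(flows)), key=lambda i: counts[i], reverse=True)
    return order[:n_core], counts
```

Requiring both endpoints to be close is what distinguishes flow aggregation from separate origin and destination hotspots.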
Submitted 20 August, 2021;
originally announced August 2021.
-
Implementing an Improved Test of Matrix Rank in Stata
Authors:
Qihui Chen,
Zheng Fang,
Xun Huang
Abstract:
We develop a Stata command, bootranktest, for implementing the matrix rank test of Chen and Fang (2019) in linear instrumental variable regression models. Existing rank tests employ critical values that may be too small, and hence may not even be first-order valid in the sense that they may fail to control the Type I error. By appealing to the bootstrap, Chen and Fang devise a test that overcomes the deficiency of existing tests. The command bootranktest implements the two-step version of their test, and also the analytic version if chosen. The command also accommodates data with temporal and cluster dependence.
Submitted 1 August, 2021;
originally announced August 2021.
-
A Unifying Framework for Testing Shape Restrictions
Authors:
Zheng Fang
Abstract:
This paper makes the following original contributions. First, we develop a unifying framework for testing shape restrictions based on the Wald principle. The test has asymptotic uniform size control and is uniformly consistent. Second, we examine the applicability and usefulness of some prominent shape enforcing operators in implementing our framework. In particular, in stark contrast to its use in point and interval estimation, the rearrangement operator is inapplicable due to a lack of convexity. The greatest convex minorization and the least concave majorization are shown to enjoy the analytic properties required to employ our framework. Third, we show that, although the projection operator may not be well-defined or well-behaved in general parameter spaces such as those defined by uniform norms, one may nonetheless employ a powerful distance-based test by applying our framework. Monte Carlo simulations confirm that our test works well. We further showcase the empirical relevance by investigating the relationship between weekly working hours and annual wage growth in the high-end labor market.
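The greatest convex minorant (GCM) of a function is the largest convex function lying below it, and the least concave majorant (LCM) satisfies LCM(f) = −GCM(−f). On a finite grid, the GCM is the lower convex hull of the points, linearly interpolated; the sketch below computes it that way and reports sup(f − GCM(f)) as a simple measure of departure from convexity. That discretization, the gap statistic, and the function names are illustrative assumptions, not the Wald-type test statistic of the paper.

```python
def greatest_convex_minorant(xs, ys):
    """GCM of points (xs[i], ys[i]) evaluated at each xs[i]; xs must be strictly increasing."""
    hull = []  # indices of the lower convex hull, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # cross <= 0 means b lies on or above the chord from a to i, so b is not on the lower hull.
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    if len(hull) == 1:
        return [float(ys[hull[0]])] * len(xs)
    values = []
    k = 0
    for x in xs:
        # Move to the hull segment [hull[k], hull[k + 1]] that contains x.
        while k + 1 < len(hull) - 1 and xs[hull[k + 1]] < x:
            k += 1
        a, b = hull[k], hull[k + 1]
        t = (x - xs[a]) / (xs[b] - xs[a])
        values.append(ys[a] + t * (ys[b] - ys[a]))
    return values


def least_concave_majorant(xs, ys):
    """LCM(f) = -GCM(-f)."""
    return [-v for v in greatest_convex_minorant(xs, [-y for y in ys])]


def convexity_gap(xs, ys):
    """max over the grid of f - GCM(f); zero (up to rounding) when the values are already convex."""
    return max(y - g for y, g in zip(ys, greatest_convex_minorant(xs, ys)))
```

For the grid values 0, 2, 1, 3 at x = 0, 1, 2, 3, the GCM is 0, 0.5, 1, 3 and the LCM is 0, 2, 2.5, 3.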
Submitted 1 August, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
A Local Method for Identifying Causal Relations under Markov Equivalence
Authors:
Zhuangyan Fang,
Yue Liu,
Zhi Geng,
Shengyu Zhu,
Yangbo He
Abstract:
Causality is important for designing interpretable and robust methods in artificial intelligence research. We propose a local approach to identify whether a variable is a cause of a given target under the framework of causal graphical models of directed acyclic graphs (DAGs). In general, the causal relation between two variables may not be identifiable from observational data as many causal DAGs encoding different causal relations are Markov equivalent. In this paper, we first introduce a sufficient and necessary graphical condition to check the existence of a causal path from a variable to a target in every Markov equivalent DAG. Next, we provide local criteria for identifying whether a variable is a cause/non-cause of a target based only on the local structure instead of the entire graph. Finally, we propose a local learning algorithm for this causal query via learning the local structure of the variable and some additional statistical independence tests related to the target. Simulation studies show that our local algorithm is efficient and effective, compared with other state-of-the-art methods.
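A brute-force baseline helps show what the local criteria avoid: if every DAG in a Markov equivalence class is enumerated, a variable X is a definite cause of T when every DAG has a directed path from X to T, a definite non-cause when none does, and only a possible cause otherwise. The sketch below assumes the equivalence class is supplied by hand as a list of child-adjacency dictionaries; enumerating the class is exactly the global work a local method is meant to sidestep, and the names are hypothetical.

```python
def has_directed_path(dag, source, target):
    """Depth-first search; dag maps each node to an iterable of its children."""
    stack, seen = [source], {source}
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in dag.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def causal_status(equivalent_dags, x, t):
    """Classify x relative to t across all DAGs of one Markov equivalence class."""
    paths = [has_directed_path(g, x, t) for g in equivalent_dags]
    if all(paths):
        return "definite cause"
    if not any(paths):
        return "definite non-cause"
    return "possible cause"
```

For the skeleton X – Y – T without a v-structure, the class contains X→Y→T, X←Y←T, and X←Y→T, so X is only a possible cause of T; adding a collider X→Y←Z with Y→T leaves a single DAG in which X is a definite cause of T.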
Submitted 5 March, 2022; v1 submitted 25 February, 2021;
originally announced February 2021.
-
LCS Graph Kernel Based on Wasserstein Distance in Longest Common Subsequence Metric Space
Authors:
Jianming Huang,
Zhongxi Fang,
Hiroyuki Kasai
Abstract:
For graph learning tasks, many existing methods use a message-passing mechanism in which vertex features are updated iteratively by aggregating neighbor information. This strategy provides an efficient means of graph feature extraction, but the features obtained after many iterations might contain too much information from other vertices and tend to become similar to each other, which makes their representations less expressive. Learning graphs using paths, on the other hand, can be less adversely affected by this problem because it does not involve all vertex neighbors. However, most path-based methods can only compare paths of the same length, which might engender information loss. To resolve this difficulty, we propose a new Graph Kernel based on a Longest Common Subsequence (LCS) similarity. Moreover, we found that the widely used R-convolution framework is unsuitable for path-based Graph Kernels because a huge number of comparisons between dissimilar paths might deteriorate the graph distance calculation. Therefore, we propose a novel metric space by exploiting the proposed LCS-based similarity, and compute a new Wasserstein-based graph distance in this metric space, which places more emphasis on comparisons between similar paths. Furthermore, to reduce the computational cost, we propose an adjacent point merging operation to sparsify point clouds in the metric space.
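A small sketch of the two ingredients named above: an LCS length between the vertex-label sequences of two paths (by dynamic programming), turned into a distance, and a Wasserstein distance between two graphs represented as equally sized, uniformly weighted sets of paths. The normalization 1 − LCS/max-length, the equal-size restriction, the brute-force search over permutations (viable only for tiny sets), and the function names are assumptions for illustration; the paper's exact similarity, metric space, and solver may differ.

```python
from itertools import permutations


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b (O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def path_distance(a, b):
    """Hypothetical LCS-based distance between two label paths of possibly different lengths."""
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


def wasserstein_uniform(paths_g, paths_h):
    """W1 between two uniform point clouds of equal size, by exhaustive assignment."""
    n = len(paths_g)
    if n != len(paths_h):
        raise ValueError("this sketch only handles equally sized path sets")
    cost = [[path_distance(p, q) for q in paths_h] for p in paths_g]
    return min(sum(cost[i][perm[i]] for i in range(n)) / n for perm in permutations(range(n)))
```

For uniform weights on equally many points, an optimal transport plan can always be taken to be a permutation, which is why exhaustive assignment gives the exact value here; larger sets would need a proper assignment or optimal-transport solver.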
Submitted 29 October, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Data Driven Reaction Mechanism Estimation via Transient Kinetics and Machine Learning
Authors:
M. Ross Kunz,
Adam Yonge,
Zongtang Fang,
Andrew J. Medford,
Denis Constales,
Gregory Yablonsky,
Rebecca Fushimi
Abstract:
Understanding the set of elementary steps and kinetics in each reaction is extremely valuable for making informed decisions about creating the next generation of catalytic materials. Given the physical and mechanistic complexity of industrial catalysts, it is critical to obtain kinetic information through experimental methods. As such, this work details a methodology based on the combination of transient rate/concentration dependencies and machine learning to measure the number of active sites and the individual rate constants, and to gain insight into the mechanism under a complex set of elementary steps. This new methodology was applied to simulated transient responses to verify its ability to obtain correct estimates of the micro-kinetic coefficients. Furthermore, experimental CO oxidation data were analyzed to reveal the Langmuir-Hinshelwood mechanism driving the reaction. As oxygen accumulated on the catalyst, a transition in the mechanism was clearly defined in the machine learning analysis due to the large amount of kinetic information available from transient reaction techniques. This methodology is proposed as a new data-driven approach to characterize how materials control complex reaction mechanisms, relying exclusively on experimental data.
Submitted 21 April, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Stochastic filters based on hybrid approximations of multiscale stochastic reaction networks
Authors:
Zhou Fang,
Ankit Gupta,
Mustafa Khammash
Abstract:
We consider the problem of estimating the dynamic latent states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction is based on exploiting the time-scale separations in the original network, and it can greatly reduce the computational effort required to simulate the dynamics. This enables us to develop efficient particle filters to solve the filtering problem for the original model by applying particle filters to the reduced model. We illustrate the accuracy and the computational efficiency of our approach using a numerical example.
Submitted 8 September, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Learning from a Complementary-label Source Domain: Theory and Algorithms
Authors:
Yiyang Zhang,
Feng Liu,
Zhen Fang,
Bo Yuan,
Guangquan Zhang,
Jie Lu
Abstract:
In unsupervised domain adaptation (UDA), a classifier for the target domain is trained with a large amount of true-label data from the source domain and unlabeled data from the target domain. However, collecting fully true-label data in the source domain is costly and sometimes impossible. Compared to a true label, a complementary label specifies a class that a pattern does not belong to, so collecting complementary labels would be less laborious than collecting true labels. Thus, in this paper, we propose a novel setting in which the source domain is composed of complementary-label data, and we first prove a theoretical bound for it. We consider two cases of this setting: in one, the source domain contains only complementary-label data (completely complementary unsupervised domain adaptation, CC-UDA); in the other, the source domain has plenty of complementary-label data and a small amount of true-label data (partly complementary unsupervised domain adaptation, PC-UDA). To this end, a complementary label adversarial network (CLARINET) is proposed to solve the CC-UDA and PC-UDA problems. CLARINET maintains two deep networks simultaneously, where one focuses on classifying complementary-label source data and the other takes care of source-to-target distributional adaptation. Experiments show that CLARINET significantly outperforms a series of competent baselines on handwritten-digit recognition and object recognition tasks.
Submitted 4 August, 2020;
originally announced August 2020.
-
Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation
Authors:
Yiyang Zhang,
Feng Liu,
Zhen Fang,
Bo Yuan,
Guangquan Zhang,
Jie Lu
Abstract:
In unsupervised domain adaptation (UDA), classifiers for the target domain are trained with a large amount of true-label data from the source domain and unlabeled data from the target domain. However, it may be difficult to collect fully true-label data in a source domain given a limited budget. To mitigate this problem, we consider a novel problem setting, named budget-friendly UDA (BFUDA), in which the classifier for the target domain has to be trained with complementary-label data from the source domain and unlabeled data from the target domain. The key benefit is that collecting complementary-label source data (required by BFUDA) is much less costly than collecting true-label source data (required by ordinary UDA). To this end, the complementary label adversarial network (CLARINET) is proposed to solve the BFUDA problem. CLARINET maintains two deep networks simultaneously, where one focuses on classifying complementary-label source data and the other takes care of the source-to-target distributional adaptation. Experiments show that CLARINET significantly outperforms a series of competent baselines.
Submitted 4 March, 2021; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Theory of Deep Convolutional Neural Networks II: Spherical Analysis
Authors:
Zhiying Fang,
Han Feng,
Shuo Huang,
Ding-Xuan Zhou
Abstract:
Deep learning based on deep neural networks of various structures and architectures has been powerful in many practical applications, but it lacks sufficient theoretical verification. In this paper, we consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $\mathbb{S}^{d-1}$ of $\mathbb{R}^d$. Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $W^r_\infty (\mathbb{S}^{d-1})$ with $r>0$ or takes an additive ridge form. Our work theoretically verifies the modelling and approximation ability of deep convolutional neural networks followed by downsampling and one or two fully connected layers. The key idea of our spherical analysis is to use the inner product form of the reproducing kernels of the spaces of spherical harmonics and then to apply convolutional factorizations of filters to realize the generated linear features.
Submitted 28 July, 2020;
originally announced July 2020.
-
Bridging the Theoretical Bound and Deep Algorithms for Open Set Domain Adaptation
Authors:
Li Zhong,
Zhen Fang,
Feng Liu,
Bo Yuan,
Guangquan Zhang,
Jie Lu
Abstract:
In unsupervised open set domain adaptation (UOSDA), the target domain contains unknown classes that are not observed in the source domain. Researchers in this area aim to train a classifier to accurately 1) recognize unknown target data (data with unknown classes) and 2) classify other target data. To achieve this aim, a previous study proved an upper bound of the target-domain risk, in which the open set difference, an important term, is used to measure the risk on unknown target data. By minimizing the upper bound, a shallow classifier can be trained to achieve the aim. However, if the classifier is very flexible (e.g., deep neural networks (DNNs)), the open set difference will converge to a negative value when the upper bound is minimized, which causes most target data to be recognized as unknown. To address this issue, we propose a new upper bound of target-domain risk for UOSDA, which includes four terms: source-domain risk, $ε$-open set difference ($Δ_ε$), a distributional discrepancy between domains, and a constant. Compared to the open set difference, $Δ_ε$ is more robust against this issue when it is minimized, and thus we are able to use very flexible classifiers (e.g., DNNs). Then, we propose a new principle-guided deep UOSDA method that trains DNNs by minimizing the new upper bound. Specifically, source-domain risk and $Δ_ε$ are minimized by gradient descent, and the distributional discrepancy is minimized via a novel open-set conditional adversarial training strategy. Finally, compared to existing shallow and deep UOSDA methods, our method achieves state-of-the-art performance on several benchmark datasets, including digit recognition (MNIST, SVHN, USPS), object recognition (Office-31, Office-Home), and face recognition (PIE).
Submitted 23 June, 2020;
originally announced June 2020.
-
On Low Rank Directed Acyclic Graphs and Causal Structure Learning
Authors:
Zhuangyan Fang,
Shengyu Zhu,
Jiji Zhang,
Yue Liu,
Zhitang Chen,
Yangbo He
Abstract:
Despite several advances in recent years, learning causal structures represented by directed acyclic graphs (DAGs) remains a challenging task in high-dimensional settings when the graphs to be learned are not sparse. In this paper, we propose to exploit a low-rank assumption regarding the (weighted) adjacency matrix of a DAG causal model to help address this problem. We utilize existing low-rank techniques to adapt causal structure learning methods to take advantage of this assumption, and establish several useful results relating interpretable graphical conditions to the low-rank assumption. Specifically, we show that the maximum rank is closely related to hubs, suggesting that scale-free networks, which are frequently encountered in practice, tend to be low rank. Our experiments demonstrate the utility of the low-rank adaptations for a variety of data models, especially with relatively large and dense graphs. Moreover, with a validation procedure, the adaptations maintain superior or comparable performance even when graphs are not restricted to be low rank.
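A toy comparison can make the link between hubs and rank concrete. It illustrates linear algebra only and is neither the paper's theorem nor its method. If every edge leaves a single hub, the weighted adjacency matrix has one nonzero row, so its rank is at most 1 whatever the weights. A chain DAG with the same number of edges and nonzero weights has rank $d-1$. The helpers `star_dag` and `chain_dag` are hypothetical names.

```python
import numpy as np


def star_dag(d, rng):
    """Hub node 0 with a randomly weighted edge to every other node: one nonzero row."""
    W = np.zeros((d, d))
    W[0, 1:] = rng.uniform(0.5, 2.0, size=d - 1)
    return W


def chain_dag(d, rng):
    """Path 0 -> 1 -> ... -> d-1 with random nonzero weights on the superdiagonal."""
    return np.diag(rng.uniform(0.5, 2.0, size=d - 1), k=1)


rng = np.random.default_rng(0)
d = 8
# Both graphs have d - 1 edges, but very different ranks.
print(np.linalg.matrix_rank(star_dag(d, rng)), np.linalg.matrix_rank(chain_dag(d, rng)))  # 1 7
```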
Submitted 15 May, 2023; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Towards Understanding the Spectral Bias of Deep Learning
Authors:
Yuan Cao,
Zhiying Fang,
Yue Wu,
Ding-Xuan Zhou,
Quanquan Gu
Abstract:
An intriguing phenomenon observed when training neural networks is the spectral bias, which states that neural networks are biased towards learning less complex functions. The priority of learning functions with low complexity might be at the core of explaining the generalization ability of neural networks, and some efforts have been made to provide a theoretical explanation for spectral bias. However, there is still no satisfying theoretical result justifying the underlying mechanism of spectral bias. In this paper, we give a comprehensive and rigorous explanation for spectral bias and relate it to the neural tangent kernel function proposed in recent work. We prove that the training process of neural networks can be decomposed along different directions defined by the eigenfunctions of the neural tangent kernel, where each direction has its own convergence rate, determined by the corresponding eigenvalue. We then provide a case study in which the input data are uniformly distributed over the unit sphere, and show that lower-degree spherical harmonics are easier for over-parameterized neural networks to learn. Finally, we provide numerical experiments to demonstrate the correctness of our theory. Our experimental results also show that our theory can tolerate certain model misspecification in terms of the input data distribution.
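The idea of "each direction has its own convergence rate" can be sketched with linearized gradient descent on the squared loss. The residual evolves as $r_{t+1} = (I - \eta K) r_t$, so its component along an eigenvector with eigenvalue $\lambda_i$ shrinks by the factor $(1 - \eta\lambda_i)^t$. This is a minimal sketch under one assumption: `K` is a toy positive semidefinite matrix with hand-picked eigenvalues, not an actual neural tangent kernel. The function name is hypothetical.

```python
import numpy as np


def residual_in_eigenbasis(eigvals, eta, steps, seed=0):
    """Run linearized gradient descent and return the residual's coefficients in K's eigenbasis."""
    rng = np.random.default_rng(seed)
    n = len(eigvals)
    # Toy PSD "kernel" K = Q diag(eigvals) Q^T with a random orthonormal eigenbasis Q.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    K = Q @ np.diag(eigvals) @ Q.T
    r = Q @ np.ones(n)  # initial residual f_0 - y, with weight 1 on every eigendirection
    for _ in range(steps):
        r = r - eta * (K @ r)  # squared-loss GD in the linearized regime: r <- (I - eta K) r
    return Q.T @ r


eigvals = np.array([1.0, 0.1, 0.01])
coeffs = residual_in_eigenbasis(eigvals, eta=0.5, steps=20)
print(np.round(coeffs, 4))                     # simulated residual per eigendirection
print(np.round((1 - 0.5 * eigvals) ** 20, 4))  # closed form (1 - eta * lambda_i)^t
```

The two printed rows agree. The direction with the largest eigenvalue is fitted first. Per the abstract's case study, lower-degree spherical harmonics are the ones that are easier to learn.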
Submitted 5 October, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
-
A Graph Autoencoder Approach to Causal Structure Learning
Authors:
Ignavier Ng,
Shengyu Zhu,
Zhitang Chen,
Zhuangyan Fang
Abstract:
Causal structure learning has been a challenging task for decades, and several mainstream approaches, such as constraint- and score-based methods, have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following recent state-of-the-art methods, we propose a new gradient-based method to learn causal structures from observational data. The proposed method generalizes the recent gradient-based methods to a graph autoencoder framework that allows nonlinear structural equation models and is easily applicable to vector-valued variables. We demonstrate that on synthetic datasets, our proposed method significantly outperforms other gradient-based methods, especially on large causal graphs. We further investigate the scalability and efficiency of our method, and observe a near-linear training time when scaling up the graph size.
Submitted 17 November, 2019;
originally announced November 2019.
-
Masked Gradient-Based Causal Structure Learning
Authors:
Ignavier Ng,
Shengyu Zhu,
Zhuangyan Fang,
Haoyang Li,
Zhitang Chen,
Jun Wang
Abstract:
This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) with additive noise in a form parameterized by a binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to super-graphs of the true causal graph under mild conditions. We then use the reformulated SEM to develop a causal structure learning method that can be efficiently trained using gradient-based optimization, by leveraging a smooth characterization of acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. We find that the obtained entries are typically near zero or one and can be easily thresholded to identify the edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily incorporates different smooth model functions and achieves much improved performance on most datasets considered.
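The abstract names two ingredients: a smooth acyclicity characterization and a Gumbel-Softmax relaxation of the binary adjacency matrix. It does not say which characterization is used. This sketch therefore assumes the trace-exponential function $h(W) = \operatorname{tr}(e^{W \circ W}) - d$ popularized by NOTEARS. It also uses the two-class ("binary concrete") form of Gumbel-Softmax. Both functions are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def acyclicity(W, terms=30):
    """h(W) = tr(exp(W * W)) - d via a truncated power series (assumed NOTEARS-style penalty).

    h(W) is zero when the weighted graph W has no directed cycles and positive otherwise.
    In practice scipy.linalg.expm would replace the hand-written series.
    """
    M = W * W  # elementwise square: nonnegative, same support as W
    d = M.shape[0]
    term = np.eye(d)
    total = np.eye(d)
    for k in range(1, terms):
        term = term @ M / k  # accumulates M^k / k!
        total = total + term
    return float(np.trace(total) - d)


def binary_concrete(logits, tau, rng):
    """Relaxed 0/1 sample per entry: two-class Gumbel-Softmax with logits (l, 0), temperature tau."""
    u1 = rng.uniform(1e-10, 1.0, size=logits.shape)
    u2 = rng.uniform(1e-10, 1.0, size=logits.shape)
    g1, g2 = -np.log(-np.log(u1)), -np.log(-np.log(u2))  # two independent Gumbel(0, 1) draws
    return 1.0 / (1.0 + np.exp(-(logits + g1 - g2) / tau))


rng = np.random.default_rng(0)
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -2.0],
                [0.0, 0.0, 0.0]])
cyclic = dag.copy()
cyclic[2, 0] = 0.7  # closes the cycle 0 -> 1 -> 2 -> 0
print(acyclicity(dag), acyclicity(cyclic) > 0)  # 0.0 True

# Soft adjacency sample; the diagonal is masked because self-loops are never allowed.
A = binary_concrete(np.full((3, 3), 2.0), tau=0.5, rng=rng) * (1 - np.eye(3))
```

Lowering `tau` pushes the sampled entries towards 0 or 1. This is consistent with the abstract's observation that the learned entries end up near zero or one and can be thresholded.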
Submitted 10 January, 2022; v1 submitted 18 October, 2019;
originally announced October 2019.
-
A Projection Framework for Testing Shape Restrictions That Form Convex Cones
Authors:
Zheng Fang,
Juwon Seo
Abstract:
This paper develops a uniformly valid and asymptotically nonconservative test based on projection for a class of shape restrictions. The key insight we exploit is that these restrictions form convex cones, a simple yet elegant structure that has barely been harnessed in the literature. Based on a monotonicity property afforded by such a geometric structure, we construct a bootstrap procedure that, unlike many studies in nonstandard settings, dispenses with the estimation of local parameter spaces, and whose critical values are obtained as simply as the test statistic is computed. Moreover, by appealing to strong approximations, our framework accommodates nonparametric regression models as well as distributional/density-related and structural settings. Since the test entails a tuning parameter (due to the nonstandard nature of the problem), we propose a data-driven choice and prove its validity. Monte Carlo simulations confirm that our test works well.
Submitted 20 September, 2021; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Open Set Domain Adaptation: Theoretical Bound and Algorithm
Authors:
Zhen Fang,
Jie Lu,
Feng Liu,
Junyu Xuan,
Guangquan Zhang
Abstract:
The aim of unsupervised domain adaptation is to leverage the knowledge in a labeled (source) domain to improve a model's learning performance on an unlabeled (target) domain, the basic strategy being to mitigate the effects of discrepancies between the two distributions. Most existing algorithms can only handle unsupervised closed set domain adaptation (UCSDA), i.e., where the source and target domains are assumed to share the same label set. In this paper, we target a more challenging but realistic setting: unsupervised open set domain adaptation (UOSDA), where the target domain has unknown classes that are not found in the source domain. This is the first study to provide a learning bound for open set domain adaptation, which we do by theoretically investigating the risk of the target classifier on unknown classes. The proposed learning bound has a special term, namely open set difference, which reflects the risk of the target classifier on unknown classes. Further, we present a novel and theoretically guided unsupervised algorithm for open set domain adaptation, called distribution alignment with open difference (DAOD), which is based on regularizing this open set difference bound. Experiments on several benchmark datasets show the superior performance of the proposed UOSDA method compared with state-of-the-art methods in the literature.
Submitted 7 October, 2020; v1 submitted 19 July, 2019;
originally announced July 2019.
-
Refinements of the Kiefer-Wolfowitz Theorem and a Test of Concavity
Authors:
Zheng Fang
Abstract:
This paper studies estimation of and inference on a distribution function $F$ that is concave on the nonnegative half line and admits a density function $f$ with potentially unbounded support. When $F$ is strictly concave, we show that the supremum distance between the Grenander distribution estimator and the empirical distribution may still be of order $O(n^{-2/3}(\log n)^{2/3})$ almost surely, which reduces to an existing result of Kiefer and Wolfowitz when $f$ has bounded support. We further refine this result by allowing $F$ to be not strictly concave or even non-concave, requiring instead that it be "asymptotically" strictly concave. Building on these results, we then develop a test of concavity of $F$, or equivalently monotonicity of $f$, which is shown to have asymptotically pointwise level control under the entire null as well as consistency under any fixed alternative. In fact, we show that our test has local size control and nontrivial local power against any local alternatives that do not approach the null too fast, which may be of interest given the irregularity of the problem. Extensions to other settings involving tests of concavity, convexity, or monotonicity are discussed.
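The Grenander distribution estimator of a concave $F$ is the least concave majorant (LCM) of the empirical CDF $F_n$. Its slopes give the nonincreasing Grenander density estimate. The sketch below computes the LCM as an upper convex hull and evaluates the supremum distance $\sup_x |\hat F_n(x) - F_n(x)|$ that the abstract bounds. It assumes nonnegative data without ties. The helper names are hypothetical, and the sketch does not implement the paper's concavity test.

```python
import random


def least_concave_majorant(data):
    """Knots of the least concave majorant of the empirical CDF on [0, inf) (nonnegative data)."""
    xs = sorted(data)
    n = len(xs)
    # ECDF corner points, starting at the origin because F(0) = 0 on the half line.
    pts = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for p in pts:
        # Drop the last knot while it lies on or below the chord from hull[-2] to p,
        # which keeps only the upper (concave) hull.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def lcm_value(hull, x):
    """Evaluate the piecewise-linear majorant at x >= 0 (equal to 1 beyond the last knot)."""
    if x >= hull[-1][0]:
        return hull[-1][1]
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            if x2 == x1:  # vertical jump at 0 when some observations equal 0
                return y2
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x must be nonnegative")


def sup_distance(data):
    """sup_x |LCM(x) - F_n(x)|, assuming no ties in the data."""
    xs = sorted(data)
    n = len(xs)
    hull = least_concave_majorant(xs)
    # F_n is flat between order statistics while the LCM keeps rising, so the supremum
    # is approached just left of each order statistic, where F_n equals i / n.
    return max(lcm_value(hull, x) - i / n for i, x in enumerate(xs))


rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(2000)]  # Exp(1): concave CDF, unbounded support
print(sup_distance(sample))
```

The Exp(1) draw matches the abstract's setting of a concave $F$ whose density has unbounded support. Repeating the call for growing sample sizes is one informal way to observe how fast the distance shrinks.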
Submitted 9 November, 2019; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Precision Annealing Monte Carlo Methods for Statistical Data Assimilation: Metropolis-Hastings Procedures
Authors:
Adrian S. Wong,
Kangbo Hao,
Zheng Fang,
Henry D. I. Abarbanel
Abstract:
Statistical Data Assimilation (SDA) is the transfer of information from field or laboratory observations to a user-selected model of the dynamical system producing those observations. The data are noisy and the model has errors; the information transfer addresses properties of the conditional probability distribution of the states of the model conditioned on the observations. The quantities of interest in SDA are the conditional expected values of functions of the model state, and these require the approximate evaluation of high-dimensional integrals. We introduce a conditional probability distribution and use the Laplace method with annealing to identify the maxima of the conditional probability distribution. The annealing method slowly increases the precision term of the model as it enters the Laplace method. In this paper, we extend the idea of precision annealing (PA) to Monte Carlo calculations of conditional expected values using Metropolis-Hastings methods.
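The sketch below combines random-walk Metropolis-Hastings with a slowly increasing model-precision term. Each stage samples from $\exp(-A(x; R_f))$ and is warm-started where the previous stage ended. Several details are assumptions, not taken from the abstract:

- the geometric schedule $R_f = R_{f,0}\,\alpha^\beta$;
- the quadratic measurement-plus-model action;
- the toy linear model $x_{t+1} = 0.9\,x_t$;
- the observations `y`;
- all function names.

```python
import math
import random


def metropolis_stage(action, x, rf, n_steps, step, rng):
    """Random-walk Metropolis-Hastings targeting exp(-action(x, rf)); returns samples, acceptance rate."""
    a_x = action(x, rf)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        a_prop = action(proposal, rf)
        # Symmetric Gaussian proposal: accept with probability min(1, exp(a_x - a_prop)).
        if a_prop <= a_x or rng.random() < math.exp(a_x - a_prop):
            x, a_x = proposal, a_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def precision_annealing(action, x0, rf0=0.01, alpha=2.0, n_stages=10,
                        n_steps=2000, step=0.1, seed=0):
    """Anneal Rf = rf0 * alpha**beta for beta = 0, ..., n_stages - 1, warm-starting each stage."""
    rng = random.Random(seed)
    x, results = list(x0), []
    for beta in range(n_stages):
        rf = rf0 * alpha ** beta
        samples, acc = metropolis_stage(action, x, rf, n_steps, step, rng)
        x = samples[-1]  # start the next, more precise stage where this one ended
        # Sample mean approximates E[x | observations] at this Rf (no burn-in, for brevity).
        mean = [sum(s[t] for s in samples) / n_steps for t in range(len(x))]
        results.append((rf, mean, acc))
    return results


# Hypothetical toy problem: scalar model x[t+1] = 0.9 * x[t], every time step observed.
y = [1.0, 0.8, 0.85, 0.7, 0.6]


def toy_action(x, rf, rm=1.0):
    measurement = 0.5 * rm * sum((xt - yt) ** 2 for xt, yt in zip(x, y))
    model = 0.5 * rf * sum((x[t + 1] - 0.9 * x[t]) ** 2 for t in range(len(x) - 1))
    return measurement + model


results = precision_annealing(toy_action, x0=y)
for rf, mean, acc in results[::3]:
    print(f"Rf={rf:.2f}  acceptance={acc:.2f}  mean path={[round(m, 2) for m in mean]}")
```

At small $R_f$ the sampled paths stay close to the data. As $R_f$ grows, the model term increasingly constrains the estimated conditional mean.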
Submitted 14 January, 2019;
originally announced January 2019.
-
Improved Inference on the Rank of a Matrix
Authors:
Qihui Chen,
Zheng Fang
Abstract:
This paper develops a general framework for conducting inference on the rank of an unknown matrix $Π_0$. A defining feature of our setup is the null hypothesis of the form $\mathrm H_0: \mathrm{rank}(Π_0)\le r$. The problem is of first-order importance because the previous literature focuses on $\mathrm H_0': \mathrm{rank}(Π_0)= r$ by implicitly assuming away $\mathrm{rank}(Π_0)<r$, which may lead to invalid rank tests due to over-rejection. In particular, we show that limiting distributions of test statistics under $\mathrm H_0'$ may not stochastically dominate those under $\mathrm{rank}(Π_0)<r$. A multiple test on the nulls $\mathrm{rank}(Π_0)=0,\ldots,r$, though valid, may be substantially conservative. We employ a test statistic whose limiting distributions under $\mathrm H_0$ are highly nonstandard due to the inherently irregular nature of the problem, and then construct bootstrap critical values that deliver size control and improved power. Since our procedure relies on a tuning parameter, a two-step procedure is designed to mitigate concerns about this nuisance. We additionally argue that our setup is also important for estimation. We illustrate the empirical relevance of our results through testing identification in linear IV models that allow for clustered data and inference on sorting dimensions in a two-sided matching model with transferable utility.
Submitted 25 March, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Deep Learning Super-Resolution Enables Rapid Simultaneous Morphological and Quantitative Magnetic Resonance Imaging
Authors:
Akshay Chaudhari,
Zhongnan Fang,
** Hyung Lee,
Garry Gold,
Brian Hargreaves
Abstract:
Obtaining magnetic resonance images (MRI) with high resolution and generating quantitative image-based biomarkers for assessing tissue biochemistry is crucial in clinical and research applications. However, acquiring quantitative biomarkers requires high signal-to-noise ratio (SNR), which is at odds with high resolution in MRI, especially in a single rapid sequence. In this paper, we demonstrate how super-resolution (SR) can be utilized to maintain adequate SNR for accurate quantification of the T2 relaxation time biomarker, while simultaneously generating high-resolution images. We compare the efficacy of resolution enhancement using metrics such as peak SNR and structural similarity. We assess the accuracy of cartilage T2 relaxation times by comparing against a standard reference method. Our evaluation suggests that SR can successfully maintain high resolution and generate accurate biomarkers for accelerating MRI scans and enhancing the value of clinical and research MRI.
Submitted 7 August, 2018;
originally announced August 2018.
-
Efficacy of regularized multi-task learning based on SVM models
Authors:
Shaohan Chen,
Zhou Fang,
Sijie Lu,
Chuanhou Gao
Abstract:
This paper investigates the efficacy of a regularized multi-task learning (MTL) framework based on SVM (M-SVM) to answer whether MTL always provides reliable results and how MTL outperforms independent learning. We first find that M-SVM is Bayes risk consistent in the limit of large sample size. This implies that despite task dissimilarities, M-SVM always produces a reliable decision rule for each task in terms of misclassification error when the data size is large enough. Furthermore, we find that the task interaction vanishes as the data size goes to infinity, and that the convergence rates of M-SVM and its single-task counterpart have the same upper bound. The former suggests that M-SVM cannot improve the limit classifier's performance; based on the latter, we conjecture that the optimal convergence rate is not improved when the number of tasks is fixed. As a novel insight into MTL, our theoretical and experimental results agree closely that the benefit of MTL methods lies in the improvement of the pre-convergence-rate (PCR) factor, defined in Section III, rather than in the convergence rate. Moreover, this improvement of the PCR factor is more significant when the data size is small.
Submitted 20 February, 2022; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Semiparametric Mixed Model for Evaluating Pathway-Environment Interaction
Authors:
Zaili Fang,
Inyoung Kim,
Jeesun Jung
Abstract:
A biological pathway represents a set of genes that serves a particular cellular or physiological function. The genes within the same pathway are expected to function together and hence may interact with each other. It is also known that many genes, and therefore pathways, interact with environmental variables. However, no formal procedure has yet been developed to evaluate the pathway-environment interaction. In this article, we propose a semiparametric method to model the pathway-environment interaction. The method connects a least squares kernel machine and a semiparametric mixed effects model. We model the environmental effect nonparametrically via a natural cubic spline. Both the pathway effect and the interaction between the pathway and the environmental effect are modeled nonparametrically via a kernel machine, and we estimate the variance component representing the interaction effect under a semiparametric mixed effects model. We then employ a restricted likelihood ratio test and a score test to evaluate the main pathway effect and the pathway-environment interaction. The approach was applied to genetic pathway data for Type II diabetes, and pathways with a significant main pathway effect, a significant interaction effect, or both were identified. Previously developed methods identified many of these pathways as having only a significant main pathway effect. Furthermore, among those significant pathways, we discovered some having a significant pathway-environment interaction effect, a result that other methods would not be able to detect.
Submitted 13 June, 2012;
originally announced June 2012.
-
A Graphical View of Bayesian Variable Selection
Authors:
Zaili Fang,
Inyoung Kim
Abstract:
In recent years, Ising priors that incorporate network information for the "in" or "out" binary random variables in Bayesian variable selection have received increasing attention. In this paper, we discover that even without an informative prior, a Bayesian variable selection problem can itself be considered as a complete graph and described by an Ising model with random interactions. Treating variable selection as a graphical model has many advantages: it makes it easy to employ single-site as well as cluster updating algorithms; it suits problems with a small sample size and a large number of variables; and it extends easily to nonparametric regression models and to incorporating graphical prior information. In a Bayesian variable selection Ising model the interactions are determined by the linear model coefficients, so we systematically study the performance of different scale normal mixture priors for the model coefficients by adopting the global-local shrinkage strategy. Our results prove that the best prior for the model coefficients in terms of variable selection should maintain substantial weight on small shrinkage instead of large shrinkage. We also discuss the connection between tempering algorithms for Ising models and the global-local shrinkage approach, showing that the shrinkage parameter plays a tempering role. The methods are illustrated with simulated and real data.
Submitted 13 June, 2012;
originally announced June 2012.
-
Flexible Variable Selection for Recovering Sparsity in Nonadditive Nonparametric Models
Authors:
Zaili Fang,
Inyoung Kim,
Patrick Schaumont
Abstract:
Variable selection for recovering sparsity in nonadditive nonparametric models has been challenging. The problem becomes even more difficult because of the complications of modeling unknown interaction terms among high-dimensional variables. There is currently no variable selection method that overcomes these limitations. Hence, in this paper we propose a variable selection approach developed by connecting a kernel machine with the nonparametric multiple regression model. The advantages of our approach are that it can: (1) recover the sparsity, (2) automatically model unknown and complicated interactions, (3) connect with several existing approaches, including the linear nonnegative garrote, kernel learning and automatic relevance determination (ARD), and (4) provide flexibility for both additive and nonadditive nonparametric models. Our approach may be viewed as a nonlinear version of the nonnegative garrote method. We model the smoothing function by a least squares kernel machine and construct the nonnegative garrote objective function as a function of the similarity matrix. Since the multiple regression similarity matrix can be written as an additive form of univariate similarity matrices corresponding to the input variables, applying a sparse scale parameter to each univariate similarity matrix can reveal its relevance to the response variable. We also derive the asymptotic properties of our approach and show that it provides a square root consistent estimator of the scale parameters. Furthermore, we prove that, under certain conditions, sparsistency holds given consistent initial kernel function coefficients, and we give necessary and sufficient conditions for sparsistency. An efficient coordinate descent/backfitting algorithm is developed. A resampling procedure for our variable selection methodology is also proposed to improve power.
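The key structural idea, a multivariate similarity matrix assembled from univariate pieces that each carry a nonnegative scale parameter, can be illustrated with a small sketch. It assumes a Gaussian (ARD-style) kernel, in which the additive form appears in the exponent, and uses a kernel-ridge fit as the least squares kernel machine. The function names are hypothetical, and the garrote objective, the estimation of `theta`, and the coordinate descent/backfitting algorithm are not reproduced.

```python
import math


def univariate_sq_dists(X, j):
    """D_j[a][b] = (x_aj - x_bj)^2 for the j-th input variable."""
    n = len(X)
    return [[(X[a][j] - X[b][j]) ** 2 for b in range(n)] for a in range(n)]


def scaled_kernel(X, theta):
    """Assumed Gaussian kernel with an additive, theta-weighted exponent:
    K[a][b] = exp(-sum_j theta[j] * D_j[a][b]), theta[j] >= 0.
    theta[j] == 0 removes variable j from the similarity entirely."""
    n = len(X)
    total = [[0.0] * n for _ in range(n)]
    for j, t in enumerate(theta):
        if t < 0:
            raise ValueError("scale parameters must be nonnegative")
        if t == 0:
            continue  # a zero scale means variable j is irrelevant
        D = univariate_sq_dists(X, j)
        for a in range(n):
            for b in range(n):
                total[a][b] += t * D[a][b]
    return [[math.exp(-v) for v in row] for row in total]


def solve(A, y):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(A[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def ls_kernel_machine_fit(K, y, lam):
    """Kernel ridge fit: minimise ||y - K a||^2 + lam * a' K a,
    whose solution is a = (K + lam*I)^{-1} y; returns (a, K a)."""
    n = len(K)
    A = [[K[a][b] + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
    alpha = solve(A, y)
    fitted = [sum(K[a][b] * alpha[b] for b in range(n)) for a in range(n)]
    return alpha, fitted
```

Because each variable enters only through its own `theta[j] * D_j` term, shrinking a scale parameter exactly to zero makes the fitted function invariant to that variable, which is the sense in which a sparse scale vector encodes variable selection.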
Submitted 12 June, 2012;
originally announced June 2012.
-
Sparse Group Selection Through Co-Adaptive Penalties
Authors:
Zhou Fang
Abstract:
Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. A useful tool in this setting is to assume that the model can be well approximated by a fit involving only a small number of covariates, a so-called sparsity assumption, which leads to the Lasso and other methods. In many situations, however, the covariates can be considered to be structured, in that the selection of some variables favours the selection of others, with variables organised into groups that enter or leave the model simultaneously as a special case. This structure creates a different form of sparsity. In this paper, we suggest the Co-adaptive Lasso to fit models accommodating this form of 'group sparsity'. The Co-adaptive Lasso is fast and simple to calculate, and we show that it holds theoretical advantages over the Lasso, performs well under a broad set of conditions, and is very competitive in empirical simulations in comparison with previously suggested algorithms such as the Group Lasso and the Adaptive Lasso.
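The abstract does not give enough detail to reproduce the Co-adaptive Lasso itself. As a hedged reference point for the baselines it is compared against, the sketch below fits the plain Lasso, (1/2n)||y - X beta||^2 + lam * sum_j w_j |beta_j|, by cyclic coordinate descent with soft-thresholding; optional per-coefficient weights `w_j` give an Adaptive-Lasso-style penalty. The names `lasso_cd` and `soft_threshold` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t (the proximal map of t*|.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, weights=None, n_iter=200):
    """Weighted Lasso by cyclic coordinate descent.

    Objective (assumed scaling): (1/(2n)) * ||y - X beta||^2
                                 + lam * sum_j weights[j] * |beta[j]|.
    weights=None gives the plain Lasso; data-driven weights give an
    Adaptive-Lasso-style fit. X is a list of n rows with p entries each.
    """
    n, p = len(X), len(X[0])
    w = [1.0] * p if weights is None else list(weights)
    beta = [0.0] * p
    resid = list(y)  # y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual excluding j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * w[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
    return beta
```

Coefficients whose correlation with the partial residual falls below `lam * w_j` are set exactly to zero, which is how the sparsity assumption described above is enforced; group-structured penalties such as the Group Lasso replace the scalar soft-threshold with a block-wise one.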
Submitted 18 November, 2011;
originally announced November 2011.
-
LASSO ISOtone for High Dimensional Additive Isotonic Regression
Authors:
Zhou Fang,
Nicolai Meinshausen
Abstract:
Additive isotonic regression attempts to determine the relationship between a multi-dimensional observation variable and a response, under the constraint that the estimate is the additive sum of univariate component effects that are monotonically increasing. In this article, we present a new method for such regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear modelling to additive isotonic regression. Thus, it is viable in many situations with high-dimensional predictor variables, where selection of significant versus insignificant variables is required. We suggest an algorithm involving a modification of the backfitting algorithm CPAV. We give a numerical convergence result and examine some of its properties through simulations. We also suggest some possible extensions that improve performance and allow calculation to be carried out when the direction of monotonicity is unknown.
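LISO's sparsity-inducing modification is not reproduced here. As a hedged illustration of the unpenalised building blocks, the sketch below implements the pool-adjacent-violators (PAV) fit for one monotone component and a cyclic backfitting loop that applies PAV to each component's partial residuals, assuming that is what CPAV (cyclic PAV) denotes. The function names are hypothetical, and each column of `X` is assumed to contain distinct values (ties are not pooled).

```python
def pav(y, w=None):
    """Pool Adjacent Violators: least squares nondecreasing fit to y,
    taken in the given order, with optional positive weights w."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit


def additive_isotonic_backfit(X, y, n_cycles=50):
    """Cyclic backfitting for y ~ intercept + sum_j f_j(x_j), each f_j
    nondecreasing. Returns (intercept, f) with f[j][i] = f_j(X[i][j])."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_cycles):
        for j in range(p):
            # Partial residual: remove the intercept and all other components.
            r = [y[i] - intercept - sum(f[k][i] for k in range(p) if k != j)
                 for i in range(n)]
            order = sorted(range(n), key=lambda i: X[i][j])
            fitted_sorted = pav([r[i] for i in order])
            fj = [0.0] * n
            for pos, i in enumerate(order):
                fj[i] = fitted_sorted[pos]
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre for identifiability
    return intercept, f
```

For instance, `pav([1.0, 3.0, 2.0, 4.0])` pools the violating pair into `[1.0, 2.5, 2.5, 4.0]`. A sparse variant would additionally shrink each component towards zero so that irrelevant predictors drop out, which is the role the LASSO idea plays in LISO.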
Submitted 15 June, 2010;
originally announced June 2010.