
On the Learnability of Out-of-distribution Detection

Authors: Zhen Fang, Yixuan Li, Feng Liu, Bo Han, Jie Lu

Abstract: Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms, and corresponding learning theory is still an open problem. To study the generalization of OOD detection, this paper investigates the probably approximately correct (PAC) learning theory of OOD detection that fits the commonly used evaluation metrics in the literature. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we offer theoretical support for representative OOD detection works based on our OOD theory.

Submitted 7 April, 2024; originally announced April 2024.

Comments: Accepted by JMLR on 7 April 2024. This is a journal extension of the previous NeurIPS 2022 Outstanding Paper "Is Out-of-distribution Detection Learnable?" [arXiv:2210.14707]

arXiv:2402.03502 [pdf, other]

How Does Unlabeled Data Provably Help Out-of-Distribution Detection?

Authors: Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li

Abstract: Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at https://github.com/deeplearning-wisc/sal.

Submitted 5 February, 2024; originally announced February 2024.

Comments: ICLR 2024

arXiv:2401.11093 [pdf, other]

Learned Image Compression with Dual-Branch Encoder and Conditional Information Coding

Authors: Haisheng Fu, Feng Liang, Jie Liang, Zhenman Fang, Guohe Zhang, Jingning Han

Abstract: Recent advancements in deep learning-based image compression are notable. However, prevalent schemes that employ a serial context-adaptive entropy model to enhance rate-distortion (R-D) performance are markedly slow. Furthermore, the complexities of the encoding and decoding networks are substantially high, rendering them unsuitable for some practical applications. In this paper, we propose two techniques to balance the trade-off between complexity and performance. First, we introduce two branching coding networks to independently learn a low-resolution latent representation and a high-resolution latent representation of the input image, discriminatively representing the global and local information therein. Second, we utilize the high-resolution latent representation as conditional information for the low-resolution latent representation, furnishing it with global information, thus aiding in the reduction of redundancy between low-resolution information. We do not utilize any serial entropy models. Instead, we employ a parallel channel-wise auto-regressive entropy model for encoding and decoding low-resolution and high-resolution latent representations. Experiments demonstrate that our method is approximately twice as fast in both encoding and decoding compared to the parallelizable checkerboard context model, and it also achieves a 1.2% improvement in R-D performance compared to state-of-the-art learned image compression schemes. Our method also outperforms classical image codecs including H.266/VVC-intra (4:4:4) and some recent learned methods in rate-distortion performance, as validated by both PSNR and MS-SSIM metrics on the Kodak dataset.

Submitted 21 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted by DCC2024

arXiv:2311.00836 [pdf, ps, other]

Effective filtering approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization

Authors: Zhou Fang, Ankit Gupta, Mustafa Khammash

Abstract: Stochastic filtering is a vibrant area of research in both control theory and statistics, with broad applications in many scientific fields. Despite its extensive historical development, an effective method for joint parameter-state estimation in SDEs is still lacking. The state-of-the-art particle filtering methods suffer from either sample degeneracy or information loss, with both issues stemming from the dynamics of the particles generated to represent system parameters. This paper provides a novel and effective approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization. Our method operates in two layers: the first layer estimates the system states using a bootstrap particle filter, and the second layer marginalizes out system parameters explicitly. This strategy circumvents the need to generate particles representing system parameters, thereby mitigating their associated problems of sample degeneracy and information loss. Moreover, our method employs a modularization approach when integrating out the parameters, which significantly reduces the computational complexity. All these designs ensure the superior performance of our method. Finally, a numerical example is presented to illustrate that our method outperforms existing approaches by a large margin.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 figures

MSC Class: 62M20; 62F15; 65C05; 92-08; 93E11

arXiv:2308.03666 [pdf, other]

Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

Authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo

Abstract: As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.

Submitted 18 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2305.09869 [pdf, ps, other]

A Signed Subgraph Encoding Approach via Linear Optimization for Link Sign Prediction

Authors: Zhihong Fang, Shaolin Tan, Yaonan Wang

Abstract: In this paper, we consider the problem of inferring the sign of a link based on limited sign data in signed networks. Regarding this link sign prediction problem, SDGNN (Signed Directed Graph Neural Networks) provides the best prediction performance currently to the best of our knowledge. In this paper, we propose a different link sign prediction architecture called SELO (Subgraph Encoding via Linear Optimization), which obtains overall leading prediction performances compared with the state-of-the-art algorithm SDGNN. The proposed model utilizes a subgraph encoding approach to learn edge embeddings for signed directed networks. In particular, a signed subgraph encoding approach is introduced to embed each subgraph into a likelihood matrix instead of the adjacency matrix through a linear optimization method. Comprehensive experiments are conducted on six real-world signed networks with AUC, F1, micro-F1, and Macro-F1 as the evaluation metrics. The experiment results show that the proposed SELO model outperforms existing baseline feature-based methods and embedding-based methods on all the six real-world networks and in all the four evaluation metrics.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2303.12160 [pdf]

doi 10.1080/19427867.2023.2262201

Investigating the spatial heterogeneity of factors influencing speeding-related crash severities using correlated random parameter order models with heterogeneity-in-means

Authors: Renteng Yuan, Qiaojun Xiang, Zhiheng Fang, Xin Gu

Abstract: Speeding has been acknowledged as a critical determinant in increasing the risk of crashes and their resulting injury severities. This paper demonstrates that severe speeding-related crashes within the state of Pennsylvania have a spatial clustering trend, where four crash datasets are extracted from four hotspot districts. Two log-likelihood ratio (LR) tests were conducted to determine whether speeding-related crashes classified by hotspot districts should be modeled separately. The results suggest that separate modeling is necessary. To capture the unobserved heterogeneity, four correlated random parameter order models with heterogeneity in means are employed to explore the factors contributing to crash severity involving at least one vehicle speeding. Overall, the findings show that some indicators exhibit spatial instability, including hit pedestrian crashes, head-on crashes, speed limits, work zones, light conditions (dark), rural areas, older drivers, running stop signs, and running red lights. Moreover, drunk driving, exceeding the speed limit, and being unbelted present relative spatial stability in four district models. This paper provides insights into preventing speeding-related crashes and potentially facilitating the development of corresponding crash injury mitigation policies.

Submitted 5 July, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.00841 [pdf, other]

Longitudinal Canonical Correlation Analysis

Authors: Seonjoo Lee, Jongwoo Choi, Zhiqian Fang, F. DuBois Bowman

Abstract: This paper considers canonical correlation analysis for two longitudinal variables that are possibly sampled at different time resolutions with irregular grids. We modeled trajectories of the multivariate variables using random effects and found the most correlated sets of linear combinations in the latent space. Our numerical simulations showed that the longitudinal canonical correlation analysis effectively recovers underlying correlation patterns between two high-dimensional longitudinal data sets. We applied the proposed LCCA to data from the Alzheimer's Disease Neuroimaging Initiative and identified the longitudinal profiles of morphological brain changes and amyloid cumulation.

Submitted 1 February, 2023; originally announced February 2023.

Comments: 24 pages, 16 figures

MSC Class: 62H20

arXiv:2210.14707 [pdf, other]

Is Out-of-Distribution Detection Learnable?

Authors: Zhen Fang, Yixuan Li, Jie Lu, Jiahua Dong, Bo Han, Feng Liu

Abstract: Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which is proposed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical supports for several representative OOD detection works based on our OOD theory.

Submitted 23 February, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022 Outstanding Paper

arXiv:2207.05067 [pdf, other]

On the Representation of Causal Background Knowledge and its Applications in Causal Inference

Authors: Zhuangyan Fang, Ruiqi Zhao, Yue Liu, Yangbo He

Abstract: Causal background knowledge about the existence or the absence of causal edges and paths is frequently encountered in observational studies. The shared directed edges and links of a subclass of Markov equivalent DAGs refined due to background knowledge can be represented by a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and give a minimal representation of a causal MPDAG. Then, we introduce a novel representation called direct causal clause (DCC) to represent all types of causal background knowledge in a unified form. Using DCCs, we study the consistency and equivalency of causal background knowledge and show that any causal background knowledge set can be equivalently decomposed into a causal MPDAG plus a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking the consistency, equivalency, and finding the decomposed MPDAG and residual DCCs. Finally, with causal background knowledge, we prove a sufficient and necessary condition to identify causal effects and surprisingly find that the identifiability of causal effects only depends on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that causal background knowledge can significantly improve the identifiability of causal effects.

Submitted 10 July, 2022; originally announced July 2022.

arXiv:2203.17262 [pdf]

Length L-function for Network-Constrained Point Data

Authors: Zidong Fang, Ci Song, Hua Shu, Jie Chen, Tianyu Liu, Xi Wang, Xiao Chen, Tao Pei

Abstract: Network constrained points are referred to as points restricted to road networks, such as taxi pick up and drop off locations. A significant pattern of network constrained points is referred to as an aggregation; e.g., the aggregation of pick up points may indicate a high taxi demand in a particular area. Although the network K function using the shortest path network distance has been proposed to detect point aggregation, its statistical unit is still radius based. R neighborhood, in particular, has inconsistent network length owing to the complex configuration of road networks which cause unfair counts and identification errors in networks (e.g., the length of the r neighborhood located at an intersection is longer than that on straight roads, which may include more points). In this study, we derived the length L function for network constrained points to identify the aggregation by designing a novel neighborhood as the statistical unit; the total length of this is consistent throughout the network. Compared to the network K function, our method can detect a true to life aggregation scale, identify the aggregation with higher network density, as well as identify the aggregations that the network K function cannot. We validated our method using taxi trips pick up location data within Zhongguancun Area in Beijing, analyzing differences in maximal aggregation between workdays and weekends to understand taxi demand in the morning and evening peak.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2108.09042 [pdf]

Identifying Aggregation Artery Architecture of constrained Origin-Destination flows using Manhattan L-function

Authors: Zidong Fang, Hua Shu, Ci Song, Jie Chen, Tianyu Liu, Xiaohan Liu, Tao Pei

Abstract: The movement of humans and goods in cities can be represented by constrained flow, which is defined as the movement of objects between origin and destination in road networks. Flow aggregation, namely origins and destinations aggregated simultaneously, is one of the most common patterns; for example, the aggregated origin-to-destination flows between two transport hubs may indicate the great traffic demand between two sites. Developing a clustering method for constrained flows is crucial for determining urban flow aggregation. Among existing methods for identifying flow aggregation, the L-function of flows is the major one. Nevertheless, this method depends on the aggregation scale, the key parameter detected by the Euclidean L-function, which does not adapt to road networks. The extracted aggregation may be overestimated and dispersed. Therefore, we propose a clustering method based on L-function of Manhattan space, which consists of three major steps. The first is to detect aggregation scales by Manhattan L-function. The second is to determine core flows possessing highest local L-function values at different scales. The final step is to take the intersection of core flows neighbourhoods, the extent of which depends on corresponding scale. By setting the number of core flows, we could concentrate the aggregation and thus highlight Aggregation Artery Architecture (AAA), which depicts road sections that contain the projection of key flow cluster on the road networks. Experiment using taxi flows showed that AAA could clarify resident movement type of identified aggregated flows. Our method also helps selecting locations for distribution sites, thereby supporting accurate analysis of urban interactions.

Submitted 20 August, 2021; originally announced August 2021.

Comments: 29 pages, 12 figures

arXiv:2108.00511 [pdf, ps, other]

Implementing an Improved Test of Matrix Rank in Stata

Authors: Qihui Chen, Zheng Fang, Xun Huang

Abstract: We develop a Stata command, bootranktest, for implementing the matrix rank test of Chen and Fang (2019) in linear instrumental variable regression models. Existing rank tests employ critical values that may be too small, and hence may not even be first order valid in the sense that they may fail to control the Type I error. By appealing to the bootstrap, they devise a test that overcomes the deficiency of existing tests. The command bootranktest implements the two-step version of their test, and also the analytic version if chosen. The command also accommodates data with temporal and cluster dependence.

Submitted 1 August, 2021; originally announced August 2021.

arXiv:2107.12494 [pdf, ps, other]

A Unifying Framework for Testing Shape Restrictions

Authors: Zheng Fang

Abstract: This paper makes the following original contributions. First, we develop a unifying framework for testing shape restrictions based on the Wald principle. The test has asymptotic uniform size control and is uniformly consistent. Second, we examine the applicability and usefulness of some prominent shape enforcing operators in implementing our framework. In particular, in stark contrast to its use in point and interval estimation, the rearrangement operator is inapplicable due to a lack of convexity. The greatest convex minorization and the least concave majorization are shown to enjoy the analytic properties required to employ our framework. Third, we show that, despite that the projection operator may not be well-defined/behaved in general parameter spaces such as those defined by uniform norms, one may nonetheless employ a powerful distance-based test by applying our framework. Monte Carlo simulations confirm that our test works well. We further showcase the empirical relevance by investigating the relationship between weekly working hours and the annual wage growth in the high-end labor market.

Submitted 1 August, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

arXiv:2102.12685 [pdf, other]

doi 10.1016/j.artint.2022.103669

A Local Method for Identifying Causal Relations under Markov Equivalence

Authors: Zhuangyan Fang, Yue Liu, Zhi Geng, Shengyu Zhu, Yangbo He

Abstract: Causality is important for designing interpretable and robust methods in artificial intelligence research. We propose a local approach to identify whether a variable is a cause of a given target under the framework of causal graphical models of directed acyclic graphs (DAGs). In general, the causal relation between two variables may not be identifiable from observational data as many causal DAGs encoding different causal relations are Markov equivalent. In this paper, we first introduce a sufficient and necessary graphical condition to check the existence of a causal path from a variable to a target in every Markov equivalent DAG. Next, we provide local criteria for identifying whether a variable is a cause/non-cause of a target based only on the local structure instead of the entire graph. Finally, we propose a local learning algorithm for this causal query via learning the local structure of the variable and some additional statistical independence tests related to the target. Simulation studies show that our local algorithm is efficient and effective, compared with other state-of-the-art methods.

Submitted 5 March, 2022; v1 submitted 25 February, 2021; originally announced February 2021.

arXiv:2012.03612 [pdf, ps, other]

doi 10.1016/j.sigpro.2021.108281

LCS Graph Kernel Based on Wasserstein Distance in Longest Common Subsequence Metric Space

Authors: Jianming Huang, Zhongxi Fang, Hiroyuki Kasai

Abstract: For graph learning tasks, many existing methods utilize a message-passing mechanism where vertex features are updated iteratively by aggregation of neighbor information. This strategy provides an efficient means for graph features extraction, but obtained features after many iterations might contain too much information from other vertices, and tend to be similar to each other. This makes their representations less expressive. Learning graphs using paths, on the other hand, can be less adversely affected by this problem because it does not involve all vertex neighbors. However, most of them can only compare paths with the same length, which might engender information loss. To resolve this difficulty, we propose a new Graph Kernel based on a Longest Common Subsequence (LCS) similarity. Moreover, we found that the widely-used R-convolution framework is unsuitable for path-based Graph Kernel because a huge number of comparisons between dissimilar paths might deteriorate graph distances calculation. Therefore, we propose a novel metric space by exploiting the proposed LCS-based similarity, and compute a new Wasserstein-based graph distance in this metric space, which emphasizes more the comparison between similar paths. Furthermore, to reduce the computational cost, we propose an adjacent point merging operation to sparsify point clouds in the metric space.

Submitted 29 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

Journal ref: Signal Processing, Vol.189, 2021

arXiv:2011.08810 [pdf, other]

Data Driven Reaction Mechanism Estimation via Transient Kinetics and Machine Learning

Authors: M. Ross Kunz, Adam Yonge, Zongtang Fang, Andrew J. Medford, Denis Constales, Gregory Yablonsky, Rebecca Fushimi

Abstract: Understanding the set of elementary steps and kinetics in each reaction is extremely valuable to make informed decisions about creating the next generation of catalytic materials. With physical and mechanistic complexity of industrial catalysts, it is critical to obtain kinetic information through experimental methods. As such, this work details a methodology based on the combination of transient rate/concentration dependencies and machine learning to measure the number of active sites, the individual rate constants, and gain insight into the mechanism under a complex set of elementary steps. This new methodology was applied to simulated transient responses to verify its ability to obtain correct estimates of the micro-kinetic coefficients. Furthermore, experimental CO oxidation data was analyzed to reveal the Langmuir-Hinshelwood mechanism driving the reaction. As oxygen accumulated on the catalyst, a transition in the mechanism was clearly defined in the machine learning analysis due to the large amount of kinetic information available from transient reaction techniques. This methodology is proposed as a new data driven approach to characterize how materials control complex reaction mechanisms relying exclusively on experimental data.

Submitted 21 April, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

arXiv:2008.11682 [pdf, other]

Stochastic filters based on hybrid approximations of multiscale stochastic reaction networks

Authors: Zhou Fang, Ankit Gupta, Mustafa Khammash

Abstract: We consider the problem of estimating the dynamic latent states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction is based on exploiting the time-scale separations in the original network, and it can greatly reduce the computational effort required to simulate the dynamics. This enables us to develop efficient particle filters to solve the filtering problem for the original model by applying particle filters to the reduced model. We illustrate the accuracy and the computational efficiency of our approach using a numerical example.

Submitted 8 September, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Comments: 6 pages, 1 figure. Accepted to CDC 2020

MSC Class: 60J22; 62M20; 65C05; 92-08; 93E11

arXiv:2008.01454 [pdf, other]

Learning from a Complementary-label Source Domain: Theory and Algorithms

Authors: Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu

Abstract: In unsupervised domain adaptation (UDA), a classifier for the target domain is trained with massive true-label data from the source domain and unlabeled data from the target domain. However, collecting fully-true-label data in the source domain is high-cost and sometimes impossible. Compared to the true labels, a complementary label specifies a class that a pattern does not belong to, hence collecting complementary labels would be less laborious than collecting true labels. Thus, in this paper, we propose a novel setting that the source domain is composed of complementary-label data, and a theoretical bound for it is first proved. We consider two cases of this setting, one is that the source domain only contains complementary-label data (completely complementary unsupervised domain adaptation, CC-UDA), and the other is that the source domain has plenty of complementary-label data and a small amount of true-label data (partly complementary unsupervised domain adaptation, PC-UDA). To this end, a complementary label adversarial network (CLARINET) is proposed to solve CC-UDA and PC-UDA problems. CLARINET maintains two deep networks simultaneously, where one focuses on classifying complementary-label source data and the other takes care of source-to-target distributional adaptation. Experiments show that CLARINET significantly outperforms a series of competent baselines on handwritten-digits-recognition and objects-recognition tasks.

Submitted 4 August, 2020; originally announced August 2020.

Comments: arXiv admin note: text overlap with arXiv:2007.14612

arXiv:2007.14612 [pdf, other]

Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation

Authors: Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu

Abstract: In unsupervised domain adaptation (UDA), classifiers for the target domain are trained with massive true-label data from the source domain and unlabeled data from the target domain. However, it may be difficult to collect fully-true-label data in a source domain given a limited budget. To mitigate this problem, we consider a novel problem setting where the classifier for the target domain has to be trained with complementary-label data from the source domain and unlabeled data from the target domain named budget-friendly UDA (BFUDA). The key benefit is that it is much less costly to collect complementary-label source data (required by BFUDA) than collecting the true-label source data (required by ordinary UDA). To this end, the complementary label adversarial network (CLARINET) is proposed to solve the BFUDA problem. CLARINET maintains two deep networks simultaneously, where one focuses on classifying complementary-label source data and the other takes care of the source-to-target distributional adaptation. Experiments show that CLARINET significantly outperforms a series of competent baselines.

Submitted 4 March, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: This paper has been accepted by IJCAI-PRICAI 2020. Yiyang Zhang, Feng Liu and Zhen Fang equally contribute to this paper

arXiv:2007.14285 [pdf, ps, other]

Theory of Deep Convolutional Neural Networks II: Spherical Analysis

Authors: Zhiying Fang, Han Feng, Shuo Huang, Ding-Xuan Zhou

Abstract: Deep learning based on deep neural networks of various structures and architectures has been powerful in many practical applications, but it lacks enough theoretical verifications. In this paper, we consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $\mathbb{S}^{d-1}$ of $\mathbb{R}^d$. Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $W^r_\infty (\mathbb{S}^{d-1})$ with $r>0$ or takes an additive ridge form. Our work verifies theoretically the modelling and approximation ability of deep convolutional neural networks followed by downsampling and one fully connected layer or two. The key idea of our spherical analysis is to use the inner product form of the reproducing kernels of the spaces of spherical harmonics and then to apply convolutional factorizations of filters to realize the generated linear features.

Submitted 28 July, 2020; originally announced July 2020.

arXiv:2006.13022 [pdf, other]

Bridging the Theoretical Bound and Deep Algorithms for Open Set Domain Adaptation

Authors: Li Zhong, Zhen Fang, Feng Liu, Bo Yuan, Guangquan Zhang, Jie Lu

Abstract: In the unsupervised open set domain adaptation (UOSDA), the target domain contains unknown classes that are not observed in the source domain. Researchers in this area aim to train a classifier to accurately: 1) recognize unknown target data (data with unknown classes) and, 2) classify other target data. To achieve this aim, a previous study has proven an upper bound of the target-domain risk, and the open set difference, as an important term in the upper bound, is used to measure the risk on unknown target data. By minimizing the upper bound, a shallow classifier can be trained to achieve the aim. However, if the classifier is very flexible (e.g., deep neural networks (DNNs)), the open set difference will converge to a negative value when minimizing the upper bound, which causes an issue where most target data are recognized as unknown data. To address this issue, we propose a new upper bound of target-domain risk for UOSDA, which includes four terms: source-domain risk, $ε$-open set difference ($Δ_ε$), a distributional discrepancy between domains, and a constant. Compared to the open set difference, $Δ_ε$ is more robust against the issue when it is being minimized, and thus we are able to use very flexible classifiers (i.e., DNNs). Then, we propose a new principle-guided deep UOSDA method that trains DNNs via minimizing the new upper bound. Specifically, source-domain risk and $Δ_ε$ are minimized by gradient descent, and the distributional discrepancy is minimized via a novel open-set conditional adversarial training strategy. Finally, compared to existing shallow and deep UOSDA methods, our method shows the state-of-the-art performance on several benchmark datasets, including digit recognition (MNIST, SVHN, USPS), object recognition (Office-31, Office-Home), and face recognition (PIE).

Submitted 23 June, 2020; originally announced June 2020.

arXiv:2006.05691 [pdf, other]

On Low Rank Directed Acyclic Graphs and Causal Structure Learning

Authors: Zhuangyan Fang, Shengyu Zhu, Jiji Zhang, Yue Liu, Zhitang Chen, Yangbo He

Abstract: Despite several advances in recent years, learning causal structures represented by directed acyclic graphs (DAGs) remains a challenging task in high dimensional settings when the graphs to be learned are not sparse. In this paper, we propose to exploit a low rank assumption regarding the (weighted) adjacency matrix of a DAG causal model to help address this problem. We utilize existing low rank techniques to adapt causal structure learning methods to take advantage of this assumption and establish several useful results relating interpretable graphical conditions to the low rank assumption. Specifically, we show that the maximum rank is highly related to hubs, suggesting that scale-free networks, which are frequently encountered in practice, tend to be low rank. Our experiments demonstrate the utility of the low rank adaptations for a variety of data models, especially with relatively large and dense graphs. Moreover, with a validation procedure, the adaptations maintain a superior or comparable performance even when graphs are not restricted to be low rank.

Submitted 15 May, 2023; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: This paper has been accepted by the IEEE Transactions on Neural Networks and Learning Systems

arXiv:1912.01198 [pdf, other]

Towards Understanding the Spectral Bias of Deep Learning

Authors: Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu

Abstract: An intriguing phenomenon observed during training neural networks is the spectral bias, which states that neural networks are biased towards learning less complex functions. The priority of learning functions with low complexity might be at the core of explaining generalization ability of neural network, and certain efforts have been made to provide theoretical explanation for spectral bias. However, there is still no satisfying theoretical result justifying the underlying mechanism of spectral bias. In this paper, we give a comprehensive and rigorous explanation for spectral bias and relate it with the neural tangent kernel function proposed in recent work. We prove that the training process of neural networks can be decomposed along different directions defined by the eigenfunctions of the neural tangent kernel, where each direction has its own convergence rate and the rate is determined by the corresponding eigenvalue. We then provide a case study when the input data is uniformly distributed over the unit sphere, and show that lower degree spherical harmonics are easier to be learned by over-parameterized neural networks. Finally, we provide numerical experiments to demonstrate the correctness of our theory. Our experimental results also show that our theory can tolerate certain model misspecification in terms of the input data distribution.

Submitted 5 October, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: 29 pages, 7 figures. This version adds more experimental results

arXiv:1911.07420 [pdf, other]

A Graph Autoencoder Approach to Causal Structure Learning

Authors: Ignavier Ng, Shengyu Zhu, Zhitang Chen, Zhuangyan Fang

Abstract: Causal structure learning has been a challenging task in the past decades and several mainstream approaches such as constraint- and score-based methods have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following the recent state-of-the-arts, we propose a new gradient-based method to learn causal structures from observational data. The proposed method generalizes the recent gradient-based methods to a graph autoencoder framework that allows nonlinear structural equation models and is easily applicable to vector-valued variables. We demonstrate that on synthetic datasets, our proposed method outperforms other gradient-based methods significantly, especially on large causal graphs. We further investigate the scalability and efficiency of our method, and observe a near linear training time when scaling up the graph size.

Submitted 17 November, 2019; originally announced November 2019.

Comments: NeurIPS 2019 Workshop "Do the right thing": machine learning and causal inference for improved decision making

arXiv:1910.08527 [pdf, other]

Masked Gradient-Based Causal Structure Learning

Authors: Ignavier Ng, Shengyu Zhu, Zhuangyan Fang, Haoyang Li, Zhitang Chen, Jun Wang

Abstract: This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) with additive noises in a form parameterized by binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to super-graphs of the true causal graph under mild conditions. We then utilize the reformulated SEM to develop a causal structure learning method that can be efficiently trained using gradient-based optimization, by leveraging a smooth characterization on acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. It is found that the obtained entries are typically near zero or one and can be easily thresholded to identify the edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily includes different smooth model functions and achieves a much improved performance on most datasets considered.

Submitted 10 January, 2022; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: Accepted to SDM 2022

arXiv:1910.07689 [pdf, ps, other]

doi 10.3982/ECTA17764

A Projection Framework for Testing Shape Restrictions That Form Convex Cones

Authors: Zheng Fang, Juwon Seo

Abstract: This paper develops a uniformly valid and asymptotically nonconservative test based on projection for a class of shape restrictions. The key insight we exploit is that these restrictions form convex cones, a simple and yet elegant structure that has been barely harnessed in the literature. Based on a monotonicity property afforded by such a geometric structure, we construct a bootstrap procedure that, unlike many studies in nonstandard settings, dispenses with estimation of local parameter spaces, and the critical values are obtained in a way as simple as computing the test statistic. Moreover, by appealing to strong approximations, our framework accommodates nonparametric regression models as well as distributional/density-related and structural settings. Since the test entails a tuning parameter (due to the nonstandard nature of the problem), we propose a data-driven choice and prove its validity. Monte Carlo simulations confirm that our test works well.

Submitted 20 September, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: This version contains the following sections omitted from the published version: i) discussions of the examples in the main text, ii) proofs for Appendix C (in the online appendix), and iii) the complete set of simulation results. A previous version of this paper was circulated under the title "A General Framework for Inference on Shape Restrictions."

arXiv:1907.08375 [pdf, other]

Open Set Domain Adaptation: Theoretical Bound and Algorithm

Authors: Zhen Fang, Jie Lu, Feng Liu, Junyu Xuan, Guangquan Zhang

Abstract: The aim of unsupervised domain adaptation is to leverage the knowledge in a labeled (source) domain to improve a model's learning performance with an unlabeled (target) domain -- the basic strategy being to mitigate the effects of discrepancies between the two distributions. Most existing algorithms can only handle unsupervised closed set domain adaptation (UCSDA), i.e., where the source and target domains are assumed to share the same label set. In this paper, we target a more challenging but realistic setting: unsupervised open set domain adaptation (UOSDA), where the target domain has unknown classes that are not found in the source domain. This is the first study to provide a learning bound for open set domain adaptation, which we do by theoretically investigating the risk of the target classifier on unknown classes. The proposed learning bound has a special term, namely open set difference, which reflects the risk of the target classifier on unknown classes. Further, we present a novel and theoretically guided unsupervised algorithm for open set domain adaptation, called distribution alignment with open difference (DAOD), which is based on regularizing this open set difference bound. The experiments on several benchmark datasets show the superior performance of the proposed UOSDA method compared with the state-of-the-art methods in the literature.

Submitted 7 October, 2020; v1 submitted 19 July, 2019; originally announced July 2019.

Comments: This paper has been accepted by IEEE-TNNLS

arXiv:1906.10305 [pdf, other]

Refinements of the Kiefer-Wolfowitz Theorem and a Test of Concavity

Authors: Zheng Fang

Abstract: This paper studies estimation of and inference on a distribution function $F$ that is concave on the nonnegative half line and admits a density function $f$ with potentially unbounded support. When $F$ is strictly concave, we show that the supremum distance between the Grenander distribution estimator and the empirical distribution may still be of order $O(n^{-2/3}(\log n)^{2/3})$ almost surely, which reduces to an existing result of Kiefer and Wolfowitz when $f$ has bounded support. We further refine this result by allowing $F$ to be not strictly concave or even non-concave and instead requiring it be "asymptotically" strictly concave. Building on these results, we then develop a test of concavity of $F$ or equivalently monotonicity of $f$, which is shown to have asymptotically pointwise level control under the entire null as well as consistency under any fixed alternative. In fact, we show that our test has local size control and nontrivial local power against any local alternatives that do not approach the null too fast, which may be of interest given the irregularity of the problem. Extensions to settings involving testing concavity/convexity/monotonicity are discussed.

Submitted 9 November, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: Forthcoming in Electronic Journal of Statistics. Compared to the journal version, the difference is that this version contains additional simulation results, collected in Appendix C

arXiv:1901.04598 [pdf, other]

Precision Annealing Monte Carlo Methods for Statistical Data Assimilation: Metropolis-Hastings Procedures

Authors: Adrian S. Wong, Kangbo Hao, Zheng Fang, Henry D. I. Abarbanel

Abstract: Statistical Data Assimilation (SDA) is the transfer of information from field or laboratory observations to a user selected model of the dynamical system producing those observations. The data is noisy and the model has errors; the information transfer addresses properties of the conditional probability distribution of the states of the model conditioned on the observations. The quantities of interest in SDA are the conditional expected values of functions of the model state, and these require the approximate evaluation of high dimensional integrals. We introduce a conditional probability distribution and use the Laplace method with annealing to identify the maxima of the conditional probability distribution. The annealing method slowly increases the precision term of the model as it enters the Laplace method. In this paper, we extend the idea of precision annealing (PA) to Monte Carlo calculations of conditional expected values using Metropolis-Hastings methods.

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1812.02337 [pdf, ps, other]

Improved Inference on the Rank of a Matrix

Authors: Qihui Chen, Zheng Fang

Abstract: This paper develops a general framework for conducting inference on the rank of an unknown matrix $Π_0$. A defining feature of our setup is the null hypothesis of the form $\mathrm H_0: \mathrm{rank}(Π_0)\le r$. The problem is of first order importance because the previous literature focuses on $\mathrm H_0': \mathrm{rank}(Π_0)= r$ by implicitly assuming away $\mathrm{rank}(Π_0)<r$, which may lead to invalid rank tests due to over-rejections. In particular, we show that limiting distributions of test statistics under $\mathrm H_0'$ may not stochastically dominate those under $\mathrm{rank}(Π_0)<r$. A multiple test on the nulls $\mathrm{rank}(Π_0)=0,\ldots,r$, though valid, may be substantially conservative. We employ a testing statistic whose limiting distributions under $\mathrm H_0$ are highly nonstandard due to the inherent irregular natures of the problem, and then construct bootstrap critical values that deliver size control and improved power. Since our procedure relies on a tuning parameter, a two-step procedure is designed to mitigate concerns on this nuisance. We additionally argue that our setup is also important for estimation. We illustrate the empirical relevance of our results through testing identification in linear IV models that allows for clustered data and inference on sorting dimensions in a two-sided matching model with transferrable utility.

Submitted 25 March, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

arXiv:1808.04447 [pdf, other]

Deep Learning Super-Resolution Enables Rapid Simultaneous Morphological and Quantitative Magnetic Resonance Imaging

Authors: Akshay Chaudhari, Zhongnan Fang, ** Hyung Lee, Garry Gold, Brian Hargreaves

Abstract: Obtaining magnetic resonance images (MRI) with high resolution and generating quantitative image-based biomarkers for assessing tissue biochemistry is crucial in clinical and research applications. However, acquiring quantitative biomarkers requires high signal-to-noise ratio (SNR), which is at odds with high-resolution in MRI, especially in a single rapid sequence. In this paper, we demonstrate how super-resolution can be utilized to maintain adequate SNR for accurate quantification of the T2 relaxation time biomarker, while simultaneously generating high-resolution images. We compare the efficacy of resolution enhancement using metrics such as peak SNR and structural similarity. We assess accuracy of cartilage T2 relaxation times by comparing against a standard reference method. Our evaluation suggests that SR can successfully maintain high-resolution and generate accurate biomarkers for accelerating MRI scans and enhancing the value of clinical and research MRI.

Submitted 7 August, 2018; originally announced August 2018.

Comments: Accepted for the Machine Learning for Medical Image Reconstruction Workshop at MICCAI 2018

arXiv:1805.12507 [pdf, other]

Efficacy of regularized multi-task learning based on SVM models

Authors: Shaohan Chen, Zhou Fang, Sijie Lu, Chuanhou Gao

Abstract: This paper investigates the efficacy of a regularized multi-task learning (MTL) framework based on SVM (M-SVM) to answer whether MTL always provides reliable results and how MTL outperforms independent learning. We first find that M-SVM is Bayes risk consistent in the limit of large sample size. This implies that despite the task dissimilarities, M-SVM always produces a reliable decision rule for each task in terms of misclassification error when the data size is large enough. Furthermore, we find that the task-interaction vanishes as the data size goes to infinity, and the convergence rates of M-SVM and its single-task counterpart have the same upper bound. The former suggests that M-SVM cannot improve the limit classifier's performance; based on the latter, we conjecture that the optimal convergence rate is not improved when the task number is fixed. As a novel insight of MTL, our theoretical and experimental results achieved an excellent agreement that the benefit of the MTL methods lies in the improvement of the pre-convergence-rate factor (PCR, to be denoted in Section III) rather than the convergence rate. Moreover, this improvement of PCR factors is more significant when the data size is small.

Submitted 20 February, 2022; v1 submitted 31 May, 2018; originally announced May 2018.

Comments: 12 pages, 4 figures

arXiv:1206.2716 [pdf, other]

Semiparametric Mixed Model for Evaluating Pathway-Environment Interaction

Authors: Zaili Fang, Inyoung Kim, Jeesun Jung

Abstract: A biological pathway represents a set of genes that serves a particular cellular or a physiological function. The genes within the same pathway are expected to function together and hence may interact with each other. It is also known that many genes, and so pathways, interact with other environmental variables. However, no formal procedure has yet been developed to evaluate the pathway-environment interaction. In this article, we propose a semiparametric method to model the pathway-environment interaction. The method connects a least square kernel machine and a semiparametric mixed effects model. We model nonparametrically the environmental effect via a natural cubic spline. Both a pathway effect and an interaction between a pathway and an environmental effect are modeled nonparametrically via a kernel machine, and we estimate variance component representing an interaction effect under a semiparametric mixed effects model. We then employ a restricted likelihood ratio test and a score test to evaluate the main pathway effect and the pathway-environment interaction. The approach was applied to a genetic pathway data of Type II diabetes, and pathways with either a significant main pathway effect, an interaction effect or both were identified. Other methods previously developed determined many as having a significant main pathway effect only. Furthermore, among those significant pathways, we discovered some pathways having a significant pathway-environment interaction effect, a result that other methods would not be able to detect.

Submitted 13 June, 2012; originally announced June 2012.

arXiv:1206.2715 [pdf, other]

A Graphical View of Bayesian Variable Selection

Authors: Zaili Fang, Inyoung Kim

Abstract: In recent years, Ising prior with the network information for the "in" or "out" binary random variable in Bayesian variable selections has received more and more attentions. In this paper, we discover that even without the informative prior a Bayesian variable selection problem itself can be considered as a complete graph and described by a Ising model with random interactions. There are many advantages of treating variable selection as a graphical model, such as it is easy to employ the single site updating as well as the cluster updating algorithm, suitable for problems with small sample size and larger variable number, easy to extend to nonparametric regression models and incorporate graphical prior information and so on. In a Bayesian variable selection Ising model the interactions are determined by the linear model coefficients, so we systematically study the performance of different scale normal mixture priors for the model coefficients by adopting the global-local shrinkage strategy. Our results prove that the best prior of the model coefficients in terms of variable selection should maintain substantial weight on small shrinkage instead of large shrinkage. We also discuss the connection between the tempering algorithms for Ising models and the global-local shrinkage approach, showing that the shrinkage parameter plays a tempering role. The methods are illustrated with simulated and real data.

Submitted 13 June, 2012; originally announced June 2012.

arXiv:1206.2696 [pdf, other]

Flexible Variable Selection for Recovering Sparsity in Nonadditive Nonparametric Models

Authors: Zaili Fang, Inyoung Kim, Patrick Schaumont

Abstract: Variable selection for recovering sparsity in nonadditive nonparametric models has been challenging. This problem becomes even more difficult due to complications in modeling unknown interaction terms among high dimensional variables. There is currently no variable selection method to overcome these limitations. Hence, in this paper we propose a variable selection approach that is developed by connecting a kernel machine with the nonparametric multiple regression model. The advantages of our approach are that it can: (1) recover the sparsity, (2) automatically model unknown and complicated interactions, (3) connect with several existing approaches including linear nonnegative garrote, kernel learning and automatic relevant determinants (ARD), and (4) provide flexibility for both additive and nonadditive nonparametric models. Our approach may be viewed as a nonlinear version of a nonnegative garrote method. We model the smoothing function by a least squares kernel machine and construct the nonnegative garrote objective function as the function of the similarity matrix. Since the multiple regression similarity matrix can be written as an additive form of univariate similarity matrices corresponding to input variables, applying a sparse scale parameter on each univariate similarity matrix can reveal its relevance to the response variable. We also derive the asymptotic properties of our approach, and show that it provides a square root consistent estimator of the scale parameters.
Furthermore, we prove that sparsistency is satisfied with consistent initial kernel function coefficients under certain conditions and give the necessary and sufficient conditions for sparsistency. An efficient coordinate descent/backfitting algorithm is developed. A resampling procedure for our variable selection methodology is also proposed to improve power.

Submitted 12 June, 2012; originally announced June 2012.

arXiv:1111.4416 [pdf, other]

Sparse Group Selection Through Co-Adaptive Penalties

Authors: Zhou Fang

Abstract: Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. To facilitate this, one useful tool is to assume that the model can be well approximated by a fit involving only a small number of covariates -- a so called sparsity assumption, which leads to the Lasso and other methods. In many situations, however, the covariates can be considered to be structured, in that the selection of some variables favours the selection of others -- with variables organised into groups entering or leaving the model simultaneously as a special case. This structure creates a different form of sparsity. In this paper, we suggest the Co-adaptive Lasso to fit models accommodating this form of `group sparsity'. The Co-adaptive Lasso is fast and simple to calculate, and we show that it holds theoretical advantages over the Lasso, performs well under a broad set of conclusions, and is very competitive in empirical simulations in comparison with previously suggested algorithms like the Group Lasso and the Adaptive Lasso.

Submitted 18 November, 2011; originally announced November 2011.

arXiv:1006.2940 [pdf, other]

LASSO ISOtone for High Dimensional Additive Isotonic Regression

Authors: Zhou Fang, Nicolai Meinshausen

Abstract: Additive isotonic regression attempts to determine the relationship between a multi-dimensional observation variable and a response, under the constraint that the estimate is the additive sum of univariate component effects that are monotonically increasing. In this article, we present a new method for such regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear modelling to additive isotonic regression. Thus, it is viable in many situations with high dimensional predictor variables, where selection of significant versus insignificant variables are required. We suggest an algorithm involving a modification of the backfitting algorithm CPAV. We give a numerical convergence result, and finally examine some of its properties through simulations. We also suggest some possible extensions that improve performance, and allow calculation to be carried out when the direction of the monotonicity is unknown.

Submitted 15 June, 2010; originally announced June 2010.

Showing 1–38 of 38 results for author: Fang, Z