-
Online robust estimation and bootstrap inference for function-on-scalar regression
Authors:
Guanghui Cheng,
Wenjuan Hu,
Ruitao Lin,
Chen Wang
Abstract:
We propose a novel and robust online function-on-scalar regression technique via geometric median to learn associations between functional responses and scalar covariates based on massive or streaming datasets. The online estimation procedure, developed using the average stochastic gradient descent algorithm, offers an efficient and cost-effective method for analyzing sequentially augmented datasets, eliminating the need to store large volumes of data in memory. We establish the almost sure consistency, $L_p$ convergence, and asymptotic normality of the online estimator. To enable efficient and fast inference of the parameters of interest, including the derivation of confidence intervals, we also develop an innovative two-step online bootstrap procedure to approximate the limiting error distribution of the robust online estimator. Numerical studies under a variety of scenarios demonstrate the effectiveness and efficiency of the proposed online learning method. A real application analyzing PM$_{2.5}$ air-quality data is also included to exemplify the proposed online approach.
Submitted 23 May, 2024;
originally announced May 2024.
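A minimal sketch of the averaged stochastic gradient idea behind online geometric-median estimation, shown here for plain vectors rather than the functional regression setting of the abstract. The function names, the step-size schedule `c / n**alpha`, and the simulated stream are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def averaged_sgd_geometric_median(stream, dim, c=1.0, alpha=0.75):
    """Streaming estimate of the geometric median of vectors.

    Assumed (hypothetical) schedule: step size c / n**alpha.
    Each observation is used once and then discarded.
    """
    m = [0.0] * dim       # current SGD iterate
    m_bar = [0.0] * dim   # running average of the iterates
    for n, x in enumerate(stream, start=1):
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            # Gradient of ||x - m|| with respect to m is -(x - m) / ||x - m||,
            # so the descent step moves m a bounded distance toward x.
            step = c / n ** alpha
            m = [mi + step * d / norm for mi, d in zip(m, diff)]
        # Average the iterates so far (the "averaged" part of the method).
        m_bar = [b + (mi - b) / n for b, mi in zip(m_bar, m)]
    return m_bar


def noisy_stream(n, seed=0):
    """Hypothetical data: points near (2, -1) with 10% gross outliers."""
    rng = random.Random(seed)
    for _ in range(n):
        if rng.random() < 0.1:
            yield (100.0, 100.0)
        else:
            yield (2.0 + rng.gauss(0, 1), -1.0 + rng.gauss(0, 1))


estimate = averaged_sgd_geometric_median(noisy_stream(20000), dim=2)
```

Because every update moves the iterate by at most the step size, a single extreme observation cannot drag the estimate far, which is the source of the robustness; the sample mean of the same stream would sit far from (2, -1).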
-
DEMO: Dose Exploration, Monitoring, and Optimization Using a Biological Mediator for Clinical Outcomes
Authors:
Cheng-Han Yang,
Peter F. Thall,
Ruitao Lin
Abstract:
Phase 1-2 designs provide a methodological advance over phase 1 designs for dose finding by using both clinical response and toxicity. A phase 1-2 trial still may fail to select a truly optimal dose because early response is not a perfect surrogate for long-term therapeutic success. To address this problem, a generalized phase 1-2 design first uses a phase 1-2 design's components to identify a set of candidate doses, adaptively randomizes patients among the candidates, and after longer follow-up selects a dose to maximize long-term success rate. In this paper, we extend this paradigm by proposing a design that exploits an early treatment-related, real-valued biological outcome, such as pharmacodynamic activity or an immunological effect, that may act as a mediator between dose and clinical outcomes, including tumor response, toxicity, and survival time. We assume multivariate dose-outcome models that include effects appearing in causal pathways from dose to the clinical outcomes. Bayesian model selection is used to identify and eliminate biologically inactive doses. At the end of the trial, a therapeutically optimal dose is chosen from the set of doses that are acceptably safe, clinically effective, and biologically active to maximize restricted mean survival time. Results of a simulation study show that the proposed design may provide substantial improvements over designs that ignore the biological variable.
Submitted 2 April, 2024;
originally announced April 2024.
-
Cluster-based Method for Eavesdropping Identification and Localization in Optical Links
Authors:
Haokun Song,
Rui Lin,
Andrea Sgambelluri,
Filippo Cugini,
Yajie Li,
Jie Zhang,
Paolo Monti
Abstract:
We propose a cluster-based method to detect and locate eavesdropping events in optical line systems characterized by small power losses. Our findings indicate that detecting such subtle losses from eavesdropping can be accomplished solely through optical performance monitoring (OPM) data collected at the receiver. On the other hand, the localization of such events can be effectively achieved by leveraging in-line OPM data.
Submitted 25 September, 2023;
originally announced September 2023.
-
Fixed-Point Algorithms for Solving the Critical Value and Upper Tail Quantile of Kuiper's Statistics
Authors:
Hong-Yan Zhang,
Wei Sun,
Xiao Chen,
Rui-Jia Lin,
Yu Zhou
Abstract:
Kuiper's statistic is a good measure of the difference between the ideal distribution and the empirical distribution in the goodness-of-fit test. However, it is a challenging problem to solve for the critical value and upper tail quantile, or simply the Kuiper pair, of Kuiper's statistic due to the difficulties of solving the nonlinear equation and reasonably approximating the infinite series. In this work, the contributions are threefold: firstly, the second-order approximation for the infinite series of the cumulative distribution of the critical value is used to achieve higher precision; secondly, the principles and fixed-point algorithms for solving the Kuiper pair are presented in detail; finally, a mistake in the critical value $c^α_n$ for $(α, n)=(0.01,30)$ in Kuiper's distribution table has been identified and corrected, where $n$ is the sample size and $α$ is the upper tail quantile. The algorithms are verified and validated by comparing with the table provided by Kuiper. The methods and algorithms proposed are instructive and worth introducing to college students, computer programmers, engineers, experimental psychologists and so on.
Submitted 23 March, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Design and Sample Size Determination for Multiple-dose Randomized Phase II Trials for Dose Optimization
Authors:
Peng Yang,
Daniel Li,
Ruitao Lin,
Bo Huang,
Ying Yuan
Abstract:
The conventional more-is-better dose selection paradigm, which targets the maximum tolerated dose (MTD), is not suitable for the development of targeted therapies and immunotherapies as the efficacy of these novel therapies may not increase with the dose. The U.S. Food and Drug Administration (FDA) has launched Project Optimus "to reform the dose optimization and dose selection paradigm in oncology drug development", and recently published a draft guidance on dose optimization, which outlines various approaches to achieve this goal. One highlighted approach involves conducting a randomized phase II trial following the completion of a phase I trial, where multiple doses (typically including the MTD and one or two doses lower than the MTD) are compared to identify the optimal dose that maximizes the benefit-risk tradeoff. This paper focuses on the design of such a multiple-dose randomized trial, specifically the determination of the sample size. We propose a MERIT (Multiple-dosE RandomIzed Trial design for dose optimization based on toxicity and efficacy) design that can be easily implemented with pre-calculated decision boundaries included in the protocol. We generalized the standard definitions of type I error and power to accommodate the unique characteristics of dose optimization and derived a decision rule along with an algorithm to determine the optimal sample size. Simulation studies demonstrate that the resulting MERIT design has desirable operating characteristics. To facilitate the implementation of the MERIT design, we provide software, available at www.trialdesign.org.
Submitted 29 August, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Scalable Model-based Policy Optimization for Decentralized Networked Systems
Authors:
Yali Du,
Chengdong Ma,
Yuchen Liu,
Runji Lin,
Hao Dong,
Jun Wang,
Yaodong Yang
Abstract:
Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communications or shifting of resources. This work aims to improve data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamic model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts. To alleviate the bias of model-generated data, we restrict the model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems, which are connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
Submitted 1 September, 2022; v1 submitted 13 July, 2022;
originally announced July 2022.
-
On singular values of large dimensional lag-tau sample autocorrelation matrices
Authors:
Zhanting Long,
Zeng Li,
Ruitao Lin
Abstract:
We study the limiting behavior of singular values of a lag-$τ$ sample auto-correlation matrix $\bf{R}_τ^ε$ of error term $ε$ in the high-dimensional factor model. We establish the limiting spectral distribution (LSD) which characterizes the global spectrum of $\bf{R}_τ^ε$, and derive the limit of its largest singular value. All the asymptotic results are derived under the high-dimensional asymptotic regime where the data dimension and sample size go to infinity proportionally. Under mild assumptions, we show that the LSD of $\bf{R}_τ^ε$ is the same as that of the lag-$τ$ sample auto-covariance matrix. Based on this asymptotic equivalence, we additionally show that the largest singular value of $\bf{R}_τ^ε$ converges almost surely to the right end point of the support of its LSD. Our results take the first step to identify the number of factors in factor analysis using lag-$τ$ sample auto-correlation matrices. Our theoretical results are fully supported by numerical experiments as well.
Submitted 25 February, 2022;
originally announced February 2022.
-
A meta-analytic framework to adjust for bias in external control studies
Authors:
Devin Incerti,
Michael T Bretscher,
Ray Lin,
Chris Harbron
Abstract:
While randomized controlled trials (RCTs) are the gold standard for estimating treatment effects in medical research, there is increasing use of and interest in using real-world data for drug development. One such use case is the construction of external control arms for evaluation of efficacy in single-arm trials, particularly in cases where randomization is either infeasible or unethical. However, it is well known that treated patients in non-randomized studies may not be comparable to control patients -- on either measured or unmeasured variables -- and that the underlying population differences between the two groups may result in biased treatment effect estimates as well as increased variability in estimation. To address these challenges for analyses of time-to-event outcomes, we developed a meta-analytic framework that uses historical reference studies to adjust a log hazard ratio estimate in a new external control study for its additional bias and variability. The set of historical studies is formed by constructing external control arms for historical RCTs, and a meta-analysis compares the trial controls to the external control arms. Importantly, a prospective external control study can be performed independently of the meta-analysis using standard causal inference techniques for observational data. We illustrate our approach with a simulation study and an empirical example based on reference studies for advanced non-small cell lung cancer. In our empirical analysis, external control patients had lower survival than trial controls (hazard ratio: 0.907), but our methodology is able to correct for this bias. An implementation of our approach is available in the R package ecmeta.
Submitted 7 October, 2021;
originally announced October 2021.
-
From Two-Class Linear Discriminant Analysis to Interpretable Multilayer Perceptron Design
Authors:
Ruiyuan Lin,
Zhiruo Zhou,
Suya You,
Raghuveer Rao,
C. -C. Jay Kuo
Abstract:
A closed-form solution exists in two-class linear discriminant analysis (LDA), which discriminates two Gaussian-distributed classes in a multi-dimensional feature space. In this work, we interpret the multilayer perceptron (MLP) as a generalization of a two-class LDA system so that it can handle an input composed of multiple Gaussian modalities belonging to multiple classes. Besides input layer $l_{in}$ and output layer $l_{out}$, the MLP of interest consists of two intermediate layers, $l_1$ and $l_2$. We propose a feedforward design that has three stages: 1) from $l_{in}$ to $l_1$: half-space partitionings accomplished by multiple parallel LDAs, 2) from $l_1$ to $l_2$: subspace isolation where one Gaussian modality is represented by one neuron, 3) from $l_2$ to $l_{out}$: class-wise subspace mergence, where each Gaussian modality is connected to its target class. Through this process, we present an automatic MLP design that can specify the network architecture (i.e., the layer number and the neuron number at a layer) and all filter weights in a feedforward one-pass fashion. This design can be generalized to an arbitrary distribution by leveraging the Gaussian mixture model (GMM). Experiments are conducted to compare the performance of the traditional backpropagation-based MLP (BP-MLP) and the new feedforward MLP (FF-MLP).
Submitted 9 September, 2020;
originally announced September 2020.
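As a rough illustration of the closed-form two-class LDA mentioned in the abstract above, the sketch below computes the textbook discriminant $w = Σ^{-1}(μ_1 - μ_0)$ with a pooled covariance. It assumes 2-D features and equal class priors, and the function names are hypothetical; it does not reproduce the FF-MLP construction itself.

```python
def fit_two_class_lda(class0, class1):
    """Closed-form two-class LDA for 2-D points (equal priors assumed).

    Returns (w, b) such that w . x + b > 0 predicts class 1.
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    mu0, mu1 = mean(class0), mean(class1)

    # Pooled within-class covariance [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, mu in ((class0, mu0), (class1, mu1)):
        for x, y in points:
            dx, dy = x - mu[0], y - mu[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(class0) + len(class1) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof

    # w = inverse(covariance) @ (mu1 - mu0), using the explicit 2x2 inverse.
    det = sxx * syy - sxy * sxy
    d0, d1 = mu1[0] - mu0[0], mu1[1] - mu0[1]
    w = ((syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det)

    # With equal priors the boundary passes through the midpoint of the means.
    b = -(w[0] * (mu0[0] + mu1[0]) + w[1] * (mu0[1] + mu1[1])) / 2.0
    return w, b


def predict(w, b, point):
    """Return 1 if the point lies on the class-1 side of the hyperplane."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0
```

Each such hyperplane splits the feature space into two half-spaces, which is the kind of partition the abstract's first stage attributes to its parallel LDA units.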
-
Orthogonal Over-Parameterized Training
Authors:
Weiyang Liu,
Rongmei Lin,
Zhen Liu,
James M. Rehg,
Liam Paull,
Li Xiong,
Le Song,
Adrian Weller
Abstract:
The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve the empirical generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We consider multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing orthogonality-preserving gradient descent. For better scalability, we propose the stochastic OPT which performs orthogonal transformation stochastically for partial dimensions of neurons. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization. We provide some insights on why OPT yields better generalization. Extensive experiments validate the superiority of OPT over the standard training.
Submitted 4 June, 2021; v1 submitted 9 April, 2020;
originally announced April 2020.
-
HOTCAKE: Higher Order Tucker Articulated Kernels for Deeper CNN Compression
Authors:
Rui Lin,
Ching-Yun Ko,
Zhuolun He,
Cong Chen,
Yuan Cheng,
Hao Yu,
Graziano Chesi,
Ngai Wong
Abstract:
The emergence of edge computing has spurred immense interest in compacting neural networks without sacrificing much accuracy. In this regard, low-rank tensor decomposition constitutes a powerful tool to compress convolutional neural networks (CNNs) by decomposing the 4-way kernel tensor into multi-stage smaller ones. Building on top of Tucker-2 decomposition, we propose a generalized Higher Order Tucker Articulated Kernels (HOTCAKE) scheme comprising four steps: input channel decomposition, guided Tucker rank selection, higher order Tucker decomposition and fine-tuning. By subjecting each CONV layer to HOTCAKE, a highly compressed CNN model with graceful accuracy trade-off is obtained. Experiments show HOTCAKE can compress even pre-compressed models and produce state-of-the-art lightweight networks.
Submitted 28 February, 2020;
originally announced February 2020.
-
Alternative Analysis Methods for Time to Event Endpoints under Non-proportional Hazards: A Comparative Analysis
Authors:
Ray S. Lin,
Ji Lin,
Satrajit Roychoudhury,
Keaven M. Anderson,
Tianle Hu,
Bo Huang,
Larry F Leon,
Jason JZ Liao,
Rong Liu,
Xiaodong Luo,
Pralay Mukhopadhyay,
Rui Qin,
Kay Tatsuoka,
Xuejing Wang,
Yang Wang,
Jian Zhu,
Tai-Tsang Chen,
Renee Iacona,
Cross-Pharma Non-proportional Hazards Working Group
Abstract:
The log-rank test is most powerful under proportional hazards (PH). In practice, non-PH patterns are often observed in clinical trials, such as in immuno-oncology; therefore, alternative methods are needed to restore the efficiency of statistical testing. Three categories of testing methods were evaluated, including weighted log-rank tests, Kaplan-Meier curve-based tests (including weighted Kaplan-Meier and Restricted Mean Survival Time, RMST), and combination tests (including Breslow test, Lee's combo test, and MaxCombo test). Nine scenarios representing the PH and various non-PH patterns were simulated. The power, type I error, and effect estimates of each method were compared. In general, all tests control type I error well. There is not a single most powerful test across all scenarios. In the absence of prior knowledge regarding the PH or non-PH patterns, the MaxCombo test is relatively robust across patterns. Since the treatment effect changes over time under non-PH, the overall profile of the treatment effect may not be represented comprehensively based on a single measure. Thus, multiple measures of the treatment effect should be pre-specified as sensitivity analyses to evaluate the totality of the data.
Submitted 20 September, 2019;
originally announced September 2019.
-
On Reproducing Kernel Banach Spaces: Generic Definitions and Unified Framework of Constructions
Authors:
Rongrong Lin,
Haizhang Zhang,
Jun Zhang
Abstract:
Recently, there has been emerging interest in constructing reproducing kernel Banach spaces (RKBS) for applied and theoretical purposes such as machine learning, sampling reconstruction, sparse approximation and functional analysis. Existing constructions include the reflexive RKBS via a bilinear form, the semi-inner-product RKBS, the RKBS with $\ell^1$ norm, the $p$-norm RKBS via generalized Mercer kernels, etc. The definitions of RKBS and the associated reproducing kernel in those references are dependent on the construction. Moreover, relations among those constructions are unclear. We explore a generic definition of RKBS and the reproducing kernel for RKBS that is independent of construction. Furthermore, we propose a framework of constructing RKBSs that unifies existing constructions mentioned above via a continuous bilinear form and a pair of feature maps. A new class of Orlicz RKBSs is proposed. Finally, we develop representer theorems for machine learning in RKBSs constructed in our framework, which also unifies representer theorems in existing RKBSs.
Submitted 8 December, 2021; v1 submitted 4 January, 2019;
originally announced January 2019.
-
TOP: Time-to-Event Bayesian Optimal Phase II Trial Design for Cancer Immunotherapy
Authors:
Ruitao Lin,
Robert L Coleman,
Ying Yuan
Abstract:
Immunotherapies have revolutionized cancer treatment. Unlike chemotherapies, immune agents often take longer time to show benefit, and the complex and unique mechanism of action of these agents renders the use of multiple endpoints more appropriate in some trials. These new features of immunotherapy make conventional phase II trial designs, which assume a single binary endpoint that is quickly ascertainable, inefficient and dysfunctional. We propose a flexible and efficient time-to-event Bayesian optimal phase II (TOP) design. The TOP design is efficient in that it allows real-time "go/no-go" interim decision making in the presence of late-onset responses by using all available data, and maximizes the statistical power for detecting effective treatments. TOP is flexible in the number of interim looks and capable of handling simple and complicated endpoints under a unified framework. We conduct simulation studies to evaluate the operating characteristics of the TOP design. Compared to some existing designs, the TOP design shortens the trial duration and has higher power to detect effective treatment with well controlled type I errors. The TOP design allows for making real-time "go/no-go" interim decisions in the presence of late-onset responses, and is capable of handling various types of endpoints under a unified framework. It is transparent and easy to implement as its decision rules can be tabulated and included in the protocol prior to the conduct of the trial. The TOP design provides a flexible, efficient and easy-to-implement method to accelerate and improve the development of immunotherapies.
Submitted 1 October, 2018;
originally announced October 2018.
-
Time-to-Event Model-Assisted Designs to Accelerate Phase I Clinical Trials
Authors:
Ruitao Lin,
Ying Yuan
Abstract:
Two useful strategies to speed up drug development are to increase the patient accrual rate and use novel adaptive designs. Unfortunately, these two strategies often conflict when the evaluation of the outcome cannot keep pace with the patient accrual rate and thus the interim data cannot be observed in time to make adaptive decisions. A similar logistical difficulty arises when the outcome is of late onset. Based on a novel formulation and approximation of the likelihood of the observed data, we propose a general methodology for model-assisted designs to handle toxicity data that are pending due to fast accrual or late-onset toxicity, and facilitate seamless decision making in phase I dose-finding trials. The dose escalation/de-escalation rules of the proposed time-to-event model-assisted designs can be tabulated before the trial begins, which greatly simplifies trial conduct in practice compared to that under existing methods. We show that the proposed designs have desirable finite and large-sample properties and yield performance that is superior to that of more complicated model-based designs. We provide user-friendly software for implementing the designs.
Submitted 22 July, 2018;
originally announced July 2018.
-
Learning towards Minimum Hyperspherical Energy
Authors:
Weiyang Liu,
Rongmei Lin,
Zhen Liu,
Lixin Liu,
Zhiding Yu,
Bo Dai,
Le Song
Abstract:
Neural networks are a powerful class of nonlinear functions that can be trained end-to-end on various applications. While the over-parametrized nature of many neural networks gives them the ability to fit complex functions and the strong representation power to handle challenging tasks, it also leads to highly correlated neurons that can hurt the generalization ability and incur unnecessary computation cost. As a result, how to regularize the network to avoid undesired representation redundancy becomes an important issue. To this end, we draw inspiration from a well-known problem in physics, the Thomson problem, where one seeks to find a state that distributes N electrons on a unit sphere as evenly as possible with minimum potential energy. In light of this intuition, we reduce the redundancy regularization problem to generic energy minimization, and propose a minimum hyperspherical energy (MHE) objective as generic regularization for neural networks. We also propose a few novel variants of MHE, and provide some insights from a theoretical point of view. Finally, we apply neural networks with MHE regularization to several challenging tasks. Extensive experiments demonstrate the effectiveness of our intuition, by showing the superior performance with MHE regularization.
Submitted 22 July, 2020; v1 submitted 23 May, 2018;
originally announced May 2018.
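To make the Thomson-problem analogy in the abstract above concrete, here is a minimal sketch of a hyperspherical energy for a set of neuron weight vectors. It assumes the energy is a sum over distinct pairs of the inverse power `s` of the distance between the normalized vectors (a logarithmic energy when `s` is 0); the function name and the pair-counting convention are illustrative assumptions, not the paper's exact objective.

```python
import math


def hyperspherical_energy(weights, s=1.0):
    """Energy of weight vectors after projection onto the unit sphere.

    Assumed form: sum over pairs i < j of dist**(-s), or log(1 / dist) when s == 0.
    Exactly coincident directions give dist == 0 and raise ZeroDivisionError.
    """
    # Project every weight vector onto the unit hypersphere.
    unit = []
    for w in weights:
        norm = math.sqrt(sum(v * v for v in w))
        unit.append([v / norm for v in w])

    energy = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            dist = math.dist(unit[i], unit[j])
            energy += dist ** (-s) if s > 0 else math.log(1.0 / dist)
    return energy


# Evenly spread directions versus nearly redundant ones (hypothetical values).
spread = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
clustered = [(1.0, 0.0), (1.0, 0.1), (1.0, 0.2), (1.0, 0.3)]
energy_spread = hyperspherical_energy(spread)
energy_clustered = hyperspherical_energy(clustered)
```

Nearly parallel neurons sit close together on the sphere and contribute large pairwise terms, so adding such an energy to a training loss as a penalty would push the directions apart, which is the redundancy-reducing effect the abstract describes.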
-
Deformable Part Networks
Authors:
Ziming Zhang,
Rongmei Lin,
Alan Sullivan
Abstract:
In this paper we propose novel Deformable Part Networks (DPNs) to learn {\em pose-invariant} representations for 2D object recognition. In contrast to the state-of-the-art pose-aware networks such as CapsNet \cite{sabour2017dynamic} and STN \cite{jaderberg2015spatial}, DPNs can be naturally {\em interpreted} as an efficient solver for a challenging detection problem, namely Localized Deformable Part Models (LDPMs) where localization is introduced to DPMs as another latent variable for searching for the best poses of objects over all pixels and (predefined) scales. In particular we construct DPNs as sequences of such LDPM units to model the semantic and spatial relations among the deformable parts as hierarchical composition and spatial parsing trees. Empirically our 17-layer DPN can outperform both CapsNets and STNs significantly on affNIST \cite{sabour2017dynamic}, for instance, by 19.19\% and 12.75\%, respectively, with better generalization and better tolerance to affine transformations.
Submitted 22 May, 2018;
originally announced May 2018.
-
Decoupled Networks
Authors:
Weiyang Liu,
Zhen Liu,
Zhiding Yu,
Bo Dai,
Rongmei Lin,
Yisen Wang,
James M. Rehg,
Le Song
Abstract:
Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness.
Submitted 22 April, 2018;
originally announced April 2018.
-
Deep learning based supervised semantic segmentation of Electron Cryo-Subtomograms
Authors:
Chang Liu,
Xiangrui Zeng,
Ruogu Lin,
Xiaodan Liang,
Zachary Freyberg,
Eric Xing,
Min Xu
Abstract:
Cellular Electron Cryo-Tomography (CECT) is a powerful imaging technique for the 3D visualization of cellular structure and organization at submolecular resolution. It enables analyzing the native structures of macromolecular complexes and their spatial organization inside single cells. However, due to the high degree of structural complexity and practical imaging limitations, systematic macromolecular structural recovery inside CECT images remains challenging. In particular, the recovery of a macromolecule is likely to be biased by its neighboring structures due to the high molecular crowding. To reduce the bias, here we introduce a novel 3D convolutional neural network inspired by Fully Convolutional Network and Encoder-Decoder Architecture for the supervised segmentation of macromolecules of interest in subtomograms. Tests of our models on realistically simulated CECT data demonstrate that our new approach significantly improves segmentation performance compared to our baseline approach. We also demonstrate that the proposed model can generalize to segment new structures that do not exist in the training data.
Submitted 12 February, 2018;
originally announced February 2018.
-
Statistical Properties of the Keyboard Design with Extension to Drug-Combination Trials
Authors:
Haitao Pan,
Ruitao Lin,
Ying Yuan
Abstract:
The keyboard design is a novel phase I dose-finding method that is simple and has good operating characteristics. This paper studies theoretical properties of the keyboard design, including the optimality of its decision rules, coherence in dose transition, and convergence to the target dose. Establishing these theoretical properties explains the mechanism of the design and provides assurance to practitioners regarding the behavior of the keyboard design. We further extend the keyboard design to dual-agent dose-finding trials, which inherit the same statistical properties and simplicity as the single-agent keyboard design. Extensive simulations are conducted to evaluate the performance of the proposed keyboard drug-combination design using a novel, random two-dimensional dose--toxicity scenario generating algorithm. The simulation results confirm the desirable and competitive operating characteristics of the keyboard design as established by the theoretical study. An R Shiny application is developed to facilitate implementing the keyboard combination design in practice.
Submitted 18 December, 2017;
originally announced December 2017.
-
Robust Elastic Net Regression
Authors:
Weiyang Liu,
Rongmei Lin,
Meng Yang
Abstract:
We propose a robust elastic net (REN) model for high-dimensional sparse regression and give its performance guarantees (both the statistical error bound and the optimization bound). A simple idea of trimming the inner product is applied to the elastic net model. Specifically, we robustify the covariance matrix by trimming the inner product based on the intuition that the trimmed inner product cannot be significantly affected by a bounded number of arbitrarily corrupted points (outliers). The REN model also yields two interesting special cases: robust Lasso and robust soft thresholding. Comprehensive experimental results show that the proposed model is consistently more robust than the original elastic net and matches the performance guarantees nicely.
Submitted 1 May, 2016; v1 submitted 15 November, 2015;
originally announced November 2015.