Search | arXiv e-print repository

arXiv:2406.02028 [pdf]

How should parallel cluster randomized trials with a baseline period be analyzed? A survey of estimands and common estimators

Authors: Kenneth Menglin Lee, Fan Li

Abstract: The parallel cluster randomized trial with baseline (PB-CRT) is a common variant of the standard parallel cluster randomized trial (P-CRT) that maintains parallel randomization but additionally allows for both within and between-cluster comparisons. We define two estimands of interest in the context of PB-CRTs, the participant-average treatment effect (pATE) and cluster-average treatment effect (cATE), to address participant and cluster-level hypotheses. Previous work has indicated that under informative cluster sizes, commonly used mixed-effects models may yield inconsistent estimators for the estimands of interest. In this work, we theoretically derive the convergence of the unweighted and inverse cluster-period size weighted (i.) independence estimating equation, (ii.) fixed-effects model, (iii.) exchangeable mixed-effects model, and (iv.) nested-exchangeable mixed-effects model treatment effect estimators in a PB-CRT with continuous outcomes. We report a simulation study to evaluate the bias and inference with these different treatment effect estimators and their corresponding model-based or jackknife variance estimators. We then re-analyze a PB-CRT examining the effects of community youth teams on improving mental health among adolescent girls in rural eastern India. We demonstrate that the unweighted and weighted independence estimating equation and fixed-effects model regularly yield consistent estimators for the pATE and cATE estimands, whereas the mixed-effects models yield inconsistent estimators under informative cluster sizes. However, we demonstrate that unlike the nested-exchangeable mixed-effects model and corresponding analyses in P-CRTs, the exchangeable mixed-effects model is surprisingly robust to bias in many PB-CRT scenarios.
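
To make the distinction between the two estimands concrete, the following minimal sketch contrasts a size-weighted average of cluster-level effects (participant-average) with an unweighted one (cluster-average). The cluster sizes and effects are made-up illustrative numbers, and the simplification to known cluster-level effects is an assumption for illustration, not the estimators studied in the paper.

```python
import numpy as np

# Hypothetical cluster-level summaries (illustrative numbers, not from the paper).
cluster_sizes = np.array([10, 20, 70])        # participants per cluster
cluster_effects = np.array([1.0, 2.0, 4.0])   # assumed treatment effect in each cluster

# Participant-average effect: every participant counts equally,
# so each cluster is weighted by its size.
pate = np.average(cluster_effects, weights=cluster_sizes)

# Cluster-average effect: every cluster counts equally.
cate = cluster_effects.mean()

print(f"pATE = {pate:.3f}, cATE = {cate:.3f}")
```

Because the effect grows with cluster size in this toy example (an informative cluster size), the two averages differ; with equal sizes or a constant effect they would coincide.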

Submitted 4 June, 2024; originally announced June 2024.

Comments: 77 pages, 16 figures

arXiv:2405.15053 [pdf, other]

A Latent Variable Approach to Learning High-dimensional Multivariate longitudinal Data

Authors: Sze Ming Lee, Yunxiao Chen, Tony Sit

Abstract: High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in social, behavioral and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on high-dimensional multivariate longitudinal data. This model introduces unobserved factors to account for the between-variable and across-time dependence and assist the prediction. Statistical inference and prediction tools are developed under a general setting that allows outcome variables to be of mixed types and possibly unobserved for certain time points, for example, due to right censoring. A central limit theorem is established for drawing statistical inferences on regression coefficients. Additionally, an information criterion is introduced to choose the number of factors. The proposed model is applied to customer grocery shopping records to predict and understand shopping behavior.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.10221 [pdf, other]

Scalarisation-based risk concepts for robust multi-objective optimisation

Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

Abstract: Robust optimisation is a well-established framework for optimising functions in the presence of uncertainty. The inherent goal of this problem is to identify a collection of inputs whose outputs are both desirable for the decision maker, whilst also being robust to the underlying uncertainties in the problem. In this work, we study the multi-objective extension of this problem from a computational standpoint. We identify that the majority of all robust multi-objective algorithms rely on two key operations: robustification and scalarisation. Robustification refers to the strategy that is used to marginalise over the uncertainty in the problem, whilst scalarisation refers to the procedure that is used to encode the relative importance of each objective. As these operations are not necessarily commutative, the order that they are performed in has an impact on the resulting solutions that are identified and the final decisions that are made. This work aims to give an exposition on the philosophical differences between these two operations and highlight when one should opt for one ordering over the other. As part of our analysis, we showcase how many existing risk concepts can be easily integrated into the specification and solution of a robust multi-objective optimisation problem. Besides this, we also demonstrate how one can principally define the notion of a robust Pareto front and a robust performance metric based on our robustify and scalarise methodology. To illustrate the efficacy of these new ideas, we present two insightful numerical case studies which are based on real-world data sets.
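
The claim that robustification and scalarisation need not commute can be checked on a toy example. The sketch below assumes a worst-case (minimum over scenarios) robustification and an equal-weight linear scalarisation; both choices, and the numbers, are illustrative assumptions rather than the risk concepts developed in the paper.

```python
import numpy as np

# Hypothetical objective values (to be maximised) for a single input:
# rows are uncertainty scenarios, columns are objectives.
outputs = np.array([[1.0, 4.0],
                    [3.0, 2.0]])
weights = np.array([0.5, 0.5])  # assumed linear scalarisation weights

# Scalarise first, then robustify: worst-case weighted sum over scenarios.
scalarise_then_robustify = (outputs @ weights).min()

# Robustify first, then scalarise: weighted sum of per-objective worst cases.
robustify_then_scalarise = outputs.min(axis=0) @ weights

print(scalarise_then_robustify, robustify_then_scalarise)
```

Here the two orderings give 2.5 and 1.5 respectively, because the worst case of each objective occurs in a different scenario.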

Submitted 16 May, 2024; originally announced May 2024.

Comments: The code is available at: https://github.com/benmltu/scalarize

arXiv:2405.01404 [pdf, other]

Random Pareto front surfaces

Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

Abstract: The goal of multi-objective optimisation is to identify the Pareto front surface which is the set obtained by connecting the best trade-off points. Typically this surface is computed by evaluating the objectives at different points and then interpolating between the subset of the best evaluated trade-off points. In this work, we propose to parameterise the Pareto front surface using polar coordinates. More precisely, we show that any Pareto front surface can be equivalently represented using a scalar-valued length function which returns the projected length along any positive radial direction. We then use this representation in order to rigorously develop the theory and applications of stochastic Pareto front surfaces. In particular, we derive many Pareto front surface statistics of interest such as the expectation, covariance and quantiles. We then discuss how these can be used in practice within a design of experiments setting, where the goal is to both infer and use the Pareto front surface distribution in order to make effective decisions. Our framework allows for clear uncertainty quantification and we also develop advanced visualisation techniques for this purpose. Finally we discuss the applicability of our ideas within multivariate extreme value theory and illustrate our methodology in a variety of numerical examples, including a case study with a real-world air pollution data set.

Submitted 21 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: The code is available at: https://github.com/benmltu/scalarize

arXiv:2404.17709 [pdf, other]

Low-rank Matrix Bandits with Heavy-tailed Rewards

Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

Abstract: In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $Θ^*$ with rank $r \ll d_1\wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of \underline{low}-rank matrix bandit with \underline{h}eavy-\underline{t}ailed \underline{r}ewards (LowHTR), where the rewards only have finite $(1+δ)$ moment for some $δ\in (0,1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde O(d^\frac{3}{2}r^\frac{1}{2}T^\frac{1}{1+δ}/\tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian noises~\citep{lu2021low,kang2022efficient} with $δ= 1$. Moreover, we establish a lower bound of the order $Ω(d^\fracδ{1+δ} r^\fracδ{1+δ} T^\frac{1}{1+δ}) = Ω(T^\frac{1}{1+δ})$ for LowHTR, which indicates our LOTUS is nearly optimal in the order of $T$. In addition, we improve LOTUS so that it does not require knowledge of the rank $r$ with $\tilde O(dr^\frac{3}{2}T^\frac{1+δ}{1+2δ})$ regret bound, and it is efficient under the high-dimensional scenario. We also conduct simulations to demonstrate the practical superiority of our algorithm.
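
As a rough illustration of what truncating observed payoffs means for heavy-tailed rewards, the sketch below zeroes out observations whose magnitude exceeds a threshold before averaging. This is a generic truncated-mean estimator, not the LOTUS algorithm; the function name `truncated_mean` and the fixed threshold are hypothetical choices made for the example.

```python
import numpy as np

def truncated_mean(payoffs, threshold):
    """Average the payoffs after replacing any value with |value| > threshold by 0.

    Illustrative only: in a bandit algorithm the threshold would typically
    depend on the round and on the assumed moment bound.
    """
    payoffs = np.asarray(payoffs, dtype=float)
    clipped = np.where(np.abs(payoffs) <= threshold, payoffs, 0.0)
    return clipped.mean()

# A single extreme observation no longer dominates the estimate.
observed = [1.0, 2.0, 100.0]
print(truncated_mean(observed, threshold=10.0))
```

The plain sample mean of the three values above is about 34.3, whereas the truncated version ignores the outlier's magnitude and stays near the bulk of the data.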

Submitted 26 April, 2024; originally announced April 2024.

Comments: The 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

arXiv:2404.08169 [pdf, other]

AutoGFI: Streamlined Generalized Fiducial Inference for Modern Inference Problems

Authors: Wei Du, Jan Hannig, Thomas C. M. Lee, Yi Su, Chunzhe Zhang

Abstract: The origins of fiducial inference trace back to the 1930s when R. A. Fisher first introduced the concept as a response to what he perceived as a limitation of Bayesian inference - the requirement for a subjective prior distribution on model parameters in cases where no prior information was available. However, Fisher's initial fiducial approach fell out of favor as complications arose, particularly in multi-parameter problems. In the wake of 2000, amidst a renewed interest in contemporary adaptations of fiducial inference, generalized fiducial inference (GFI) emerged to extend Fisher's fiducial argument, providing a promising avenue for addressing numerous crucial and practical inference challenges. Nevertheless, the adoption of GFI has been limited due to its often demanding mathematical derivations and the necessity for implementing complex Markov Chain Monte Carlo algorithms. This complexity has impeded its widespread utilization and practical applicability. This paper presents a significant advancement by introducing an innovative variant of GFI designed to alleviate these challenges. Specifically, this paper proposes AutoGFI, an easily implementable algorithm that streamlines the application of GFI to a broad spectrum of inference problems involving additive noise. AutoGFI can be readily implemented as long as a fitting routine is available, making it accessible to a broader audience of researchers and practitioners. To demonstrate its effectiveness, AutoGFI is applied to three contemporary and challenging problems: tensor regression, matrix completion, and regression with network cohesion. These case studies highlight the immense potential of GFI and illustrate AutoGFI's promising performance when compared to specialized solutions for these problems. Overall, this research paves the way for a more accessible and powerful application of GFI in a range of practical domains.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2402.17834 [pdf, other]

Stable LM 2 1.6B Technical Report

Authors: Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme

Abstract: We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealing small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 23 pages, 6 figures

arXiv:2401.07298 [pdf, other]

Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

Abstract: In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown $d_1$ by $d_2$ matrix $Θ^*$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which has been recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restraints of existing algorithms on this problem, we first propose the G-ESTT framework that modifies the idea from \cite{jun2019bilinear} by using Stein's method on the subspace estimation and then leverage the estimated subspaces via a regularization idea. Furthermore, we remarkably improve the efficiency of G-ESTT by using a novel exclusion idea on the estimated subspace instead, and propose the G-ESTS framework. We also show that G-ESTT can achieve the $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ bound of regret while G-ESTS can achieve the $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ bound of regret under mild assumptions up to logarithm terms, where $M$ is some problem dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods based on a suite of simulations.

Submitted 14 January, 2024; originally announced January 2024.

Comments: Revision of the paper accepted by NeurIPS 2022

arXiv:2312.00622 [pdf, other]

Practical Path-based Bayesian Optimization

Authors: Jose Pablo Folch, James Odgers, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

Abstract: There has been a surge in interest in data-driven experimental design with applications to chemical engineering and drug manufacturing. Bayesian optimization (BO) has proven to be adaptable to such cases, since we can model the reactions of interest as expensive black-box functions. Sometimes, the cost of these black-box functions can be separated into two parts: (a) the cost of the experiment itself, and (b) the cost of changing the input parameters. In this short paper, we extend the SnAKe algorithm to deal with both types of costs simultaneously. We further propose extensions to the case of a maximum allowable input change, as well as to the multi-objective setting.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 6 main pages, 12 with references and appendix. 4 figures, 2 tables. To appear in NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

Journal ref: NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

arXiv:2308.11013 [pdf, other]

doi 10.1016/j.artmed.2023.102620

Personalized Event Prediction for Electronic Health Records

Authors: Jeong Min Lee, Milos Hauskrecht

Abstract: Clinical event sequences consist of hundreds of clinical events that represent records of patient care in time. Developing accurate predictive models of such sequences is of great importance for supporting a variety of models for interpreting/classifying the current patient condition, or predicting adverse clinical events and outcomes, all aimed to improve patient care. One important challenge of learning predictive models of clinical sequences is their patient-specific variability. Based on underlying clinical conditions, each patient's sequence may consist of different sets of clinical events (observations, lab results, medications, procedures). Hence, simple population-wide models learned from event sequences for many different patients may not accurately predict patient-specific dynamics of event sequences and their differences. To address the problem, we propose and investigate multiple new event sequence prediction models and methods that let us better adjust the prediction for individual patients and their specific conditions. The methods developed in this work pursue refinement of population-wide models to subpopulations, self-adaptation, and a meta-level model switching that is able to adaptively select the model with the best chance to support the immediate prediction. We analyze and test the performance of these models on clinical event sequences of patients in the MIMIC-III database.

Submitted 21 August, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2104.01787

Journal ref: Artificial Intelligence in Medicine, Volume 143, 2023, 102620, ISSN 0933-3657

arXiv:2308.07076 [pdf, ps, other]

Subsample Least Squares Estimator for Heterogeneous Effects of Multiple Treatments with Any Outcome Variable

Authors: Myoungjae Lee

Abstract: For multiple treatments D=0,1,...,J, covariates X and outcome Y, the ordinary least squares estimator (OLS) of Y on (D1,...,DJ,X) is widely applied to a constant-effect linear model, where Dj is the dummy variable for D=j. However, the treatment effects are almost always X-heterogeneous in reality, or Y is noncontinuous, to invalidate such a linear model. The blind hope of practitioners is that the OLS "somehow" estimates a sensible average of the unknown X-heterogeneous effects. This paper shows that, unfortunately, the OLS is inconsistent unless all treatment effects are constant, because the estimand of the Dd-slope involves the X-heterogeneous effects of all treatments, not just Dd. One way to overcome this "contamination" problem is the OLS of Y on Dd-E(Dd|X, D=0,d) using only the subsample D=0,d, and this paper proposes a modified version of the subsample OLS that is robust to misspecifications of E(Dd|X, D=0,d). The robustified subsample OLS is proven to be consistent for an "overlap weight" average of the X-heterogeneous effect of Dd for any form of Y (continuous, binary, count, ...).

Submitted 13 September, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2305.18543 [pdf, other]

Robust Lipschitz Bandits to Adversarial Corruptions

Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

Abstract: Lipschitz bandit is a variant of stochastic bandits that deals with a continuous arm set defined on a metric space, where the reward function is subject to a Lipschitz constraint. In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions where an adaptive adversary corrupts the stochastic rewards up to a total budget $C$. The budget is measured by the sum of corruption levels across the time horizon $T$. We consider both weak and strong adversaries, where the weak adversary is unaware of the current action before the attack, while the strong one can observe it. Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary, even when the total budget of corruption $C$ is unrevealed to the agent. We provide a lower bound under each type of adversary, and show that our algorithm is optimal under the strong case. Finally, we conduct experiments to illustrate the effectiveness of our algorithms against two classic kinds of attacks.

Submitted 8 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2305.11774 [pdf, other]

Multi-objective optimisation via the R2 utilities

Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

Abstract: The goal of multi-objective optimisation is to identify a collection of points which describe the best possible trade-offs between the multiple objectives. In order to solve this vector-valued optimisation problem, practitioners often appeal to the use of scalarisation functions in order to transform the multi-objective problem into a collection of single-objective problems. This set of scalarised problems can then be solved using traditional single-objective optimisation techniques. In this work, we formalise this convention into a general mathematical framework. We show how this strategy effectively recasts the original multi-objective optimisation problem into a single-objective optimisation problem defined over sets. An appropriate class of objective functions for this new problem are the R2 utilities, which are utility functions that are defined as a weighted integral over the scalarised optimisation problems. As part of our work, we show that these utilities are monotone and submodular set functions which can be optimised effectively using greedy optimisation algorithms. We then analyse the performance of these greedy algorithms both theoretically and empirically. Our analysis largely focusses on Bayesian optimisation, which is a popular probabilistic framework for black-box optimisation.

Submitted 1 May, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: The code is available at: https://github.com/benmltu/scalarize

arXiv:2305.05099 [pdf, ps, other]

Dirichlet process mixture models for the Analysis of Repeated Attempt Designs

Authors: Michael J. Daniels, Minji Lee, Wei Feng

Abstract: In longitudinal studies, it is not uncommon to make multiple attempts to collect a measurement after baseline. Recording whether these attempts are successful provides useful information for the purposes of assessing missing data assumptions. This is because measurements from subjects who provide the data after numerous failed attempts may differ from those who provide the measurement after fewer attempts. Previous models for these designs were parametric and/or did not allow sensitivity analysis. For the former, there are always concerns about model misspecification and for the latter, sensitivity analysis is essential when conducting inference in the presence of missing data. Here, we propose a new approach which minimizes issues with model misspecification by using Bayesian nonparametrics for the observed data distribution. We also introduce a novel approach for identification and sensitivity analysis. We re-analyze the repeated attempts data from a clinical trial involving patients with severe mental illness and conduct simulations to better understand the properties of our approach.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 24 pages, additional 16 pages of supplementary material

arXiv:2304.00149 [pdf]

Using online student focus groups in the development of new educational resources

Authors: Gian Carlo Diluvi, Sonja Isberg, Bruce Dunham, Nancy Heckman, Melissa Lee

Abstract: Educational resources, such as web apps and self-directed tutorials, have become popular tools for teaching and active learning. Ideally, students - the intended users of these resources - should be involved in the resource development stage. However, in practice students often only interact with fully developed resources, when it might be too late to incorporate changes. Previous work has addressed this by involving students in the development of new resources via in-person focus groups and interviews. In these, the resource developers observe students interacting with the resource. This allows developers to incorporate their observations and students' direct feedback into further development of the resource. However, as a result of the COVID-19 pandemic, carrying out in-person focus groups became infeasible due to social distancing restrictions. Instead, online meetings and classes became ubiquitous. In this work, we describe a fully-online methodology to evaluate new resources in development. Specifically, our methodology consists of carrying out student focus groups via online video conferencing software. We assessed two educational resources for introductory statistics using our methodology and found that the online setting allowed us to obtain rich, detailed information from the students. We also found online focus groups to be more efficient: students and researchers did not need to travel and scheduling was not restricted by the availability of physical space. Our findings suggest that online focus groups are an attractive alternative to in-person focus groups for student assessment of resources in development, even now that pandemic restrictions are being eased.

Submitted 31 March, 2023; originally announced April 2023.

arXiv:2303.17705 [pdf]

Incorporating patient-reported outcomes in dose-finding clinical trials with continuous patient enrollment

Authors: Anaïs Andrillon, Lucie Biard, Shing M. Lee

Abstract: Dose-finding clinical trials in oncology aim to estimate the maximum tolerated dose (MTD), based on safety traditionally obtained from the clinician's perspective. While the collection of patient-reported outcomes (PROs) has been advocated to better inform treatment tolerability, there is a lack of guidance and methods on how to use PROs for dose assignments and recommendations. The PRO continual reassessment method (PRO-CRM) has been proposed to formally incorporate PROs to estimate the MTD, requiring complete follow-up of both clinician and patient toxicity information per dose cohort to assign the next cohort of patients. In this paper, we propose two extensions of the PRO-CRM, allowing continuous enrollment of patients and handling longer toxicity observation windows to capture late-onset or cumulative toxicities. The first method, the TITE-PRO-CRM, uses a weighted likelihood to include the partial follow-up information from PRO in estimating the MTD during and at the end of the trial. The second method, the TITE-CRM+PRO, uses clinician's information solely to inform dose assignments during the trial and incorporates PRO at the end of the trial for dose recommendation. Simulation studies show that the TITE-PRO-CRM performs similarly to the PRO-CRM in terms of dose recommendation and assignments during the trial while reducing trial duration. The TITE-CRM+PRO slightly underperforms compared to the TITE-PRO-CRM, but similar performance can be attained by requiring larger sample sizes. We also show that the proposed methods have similar performance under higher accrual rates, different toxicity hazards, and correlated time-to-clinician toxicity and time-to-patient toxicity data.

Submitted 30 March, 2023; originally announced March 2023.

Comments: 23 pages, 1 figure, 4 tables

arXiv:2302.09440 [pdf, other]

Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits

Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

Abstract: In stochastic contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandits heavily depends on the values of hyperparameters, and theoretically derived parameter values may lead to unsatisfactory results in practice. Moreover, it is infeasible to use offline tuning methods like cross-validation to choose hyperparameters under the bandit environment, as the decisions should be made in real-time. To address this challenge, we propose the first online continuous hyperparameter tuning framework for contextual bandits to learn the optimal parameter configuration in practice within a search space on the fly. Specifically, we use a double-layer bandit framework named CDT (Continuous Dynamic Tuning) and formulate the hyperparameter optimization as a non-stationary continuum-armed bandit, where each arm represents a combination of hyperparameters, and the corresponding reward is the algorithmic result. For the top layer, we propose the Zooming TS algorithm that utilizes Thompson Sampling (TS) for exploration and a restart technique to get around the switching environment. The proposed CDT framework can be easily utilized to tune contextual bandit algorithms without any pre-specified candidate set for multiple hyperparameters. We further show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.

Submitted 8 April, 2024; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: Published in Transactions on Machine Learning Research (TMLR)

arXiv:2301.10419 [pdf, other]

Deconstructing Pedestrian Crossing Decision-making in Interactions with Continuous Traffic: an Anthropomorphic Model

Authors: Kai Tian, Gustav Markkula, Chongfeng Wei, Yee Mun Lee, Ruth Madigan, Toshiya Hirose, Natasha Merat, Richard Romano

Abstract: As safe and comfortable interactions with pedestrians could contribute to automated vehicles' (AVs) social acceptance and scale, increasing attention has been drawn to computational pedestrian behavior models. However, very limited studies characterize pedestrian crossing behavior based on specific behavioral mechanisms, as those mechanisms underpinning pedestrian road behavior are not yet clear. Here, we reinterpret pedestrian crossing behavior based on a deconstructed crossing decision process at uncontrolled intersections with continuous traffic. Notably, we explain and model pedestrian crossing behavior as they wait for crossing opportunities, optimizing crossing decisions by comparing the visual collision risk of approaching vehicles around them. A collision risk-based crossing initiation model is proposed to characterize the time-dynamic nature of pedestrian crossing decisions. A simulation tool is established to reproduce pedestrian behavior by employing the proposed model and a social force model. Two datasets collected in a CAVE-based immersive pedestrian simulator are applied to calibrate and validate the model. The model predicts pedestrian crossing decisions across all traffic scenarios well. In particular, by considering the decision strategy that pedestrians compare the collision risk of surrounding traffic gaps, model performance is significantly improved. Moreover, the collision risk-based crossing initiation model accurately captures the timing of pedestrian crossing initiations within each gap. This work concisely demonstrates how pedestrians dynamically adapt their crossings in continuous traffic based on perceived collision risk, potentially providing insights into modeling coupled human-AV interactions or serving as a tool to realize human-like pedestrian road behavior in virtual AVs test platforms.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.04578 [pdf, other]

Precision Dose-finding Cancer Clinical Trials in the Setting of Broadened Eligibility

Authors: Rebecca B. Silva, Bin Cheng, Richard D. Carvajal, Shing M. Lee

Abstract: Broadening eligibility criteria in cancer trials has been advocated to represent the true patient population more accurately. While the advantages are clear in terms of generalizability and recruitment, novel dose-finding designs are needed to ensure patient safety. These designs should be able to recommend precise doses for subpopulations if such subpopulations with different toxicity profiles exist. While dose-finding designs accounting for patient heterogeneity have been proposed, all existing methods assume the source of heterogeneity is known and thus pre-specify the subpopulations or only allow inclusion of a few patient characteristics. We propose a precision dose-finding design to address the setting of unknown patient heterogeneity in phase I cancer clinical trials where eligibility is expanded, and multiple eligibility criteria could potentially lead to different optimal doses for patient subgroups. The design offers a two-in-one approach to dose-finding by simultaneously selecting patient criteria that differentiate the maximum tolerated dose (MTD) and recommending the subpopulation-specific MTD if needed, using marginal models to sequentially incorporate patient covariates. Our simulation study compares the proposed design to the naive approach of assuming patient homogeneity and our design recommends multiple doses when heterogeneity exists and a single dose when no heterogeneity exists. The proposed dose-finding design addresses the challenges of broadening eligibility criteria in cancer trials and the desire for a more precise dose in the context of early phase clinical trials.

Submitted 11 January, 2023; originally announced January 2023.

arXiv:2211.06149 [pdf, other]

Combining Multi-Fidelity Modelling and Asynchronous Batch Bayesian Optimization

Authors: Jose Pablo Folch, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

Abstract: Bayesian Optimization is a useful tool for experiment design. Unfortunately, the classical, sequential setting of Bayesian Optimization does not translate well into laboratory experiments, for instance battery design, where measurements may come from different sources and their evaluations may require significant waiting times. Multi-fidelity Bayesian Optimization addresses the setting with measurements from different sources. Asynchronous batch Bayesian Optimization provides a framework to select new experiments before the results of the prior experiments are revealed. This paper proposes an algorithm combining multi-fidelity and asynchronous batch methods. We empirically study the algorithm behavior, and show it can outperform single-fidelity batch methods and multi-fidelity sequential methods. As an application, we consider designing electrode materials for optimal performance in pouch cells using experiments with coin cells to approximate battery performance.

Submitted 23 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 19 pages in main paper / 28 with references and appendix, 7 figures, 2 tables, accepted into Computers and Chemical Engineering

arXiv:2210.13358 [pdf, ps, other]

Novelty Detection in Time Series via Weak Innovations Representation: A Deep Learning Approach

Authors: Xinyi Wang, Mei-jen Lee, Qing Zhao, Lang Tong

Abstract: We consider novelty detection in time series with unknown and nonparametric probability structures. A deep learning approach is proposed to causally extract an innovations sequence consisting of novelty samples statistically independent of all past samples of the time series. A novelty detection algorithm is developed for the online detection of novel changes in the probability structure in the innovations sequence. A minimax optimality under a Bayes risk measure is established for the proposed novelty detection method, and its robustness and efficacy are demonstrated in experiments using real and synthetic datasets.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2209.10105 [pdf, ps, other]

Distributed Online Non-convex Optimization with Composite Regret

Authors: Zhanhong Jiang, Aditya Balu, Xian Yeow Lee, Young M. Lee, Chinmay Hegde, Soumik Sarkar

Abstract: Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and requires consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex losses, and very few results have been obtained regarding regret bounds in distributed online optimization for general non-convex losses. To address these two issues, we propose a novel composite regret with a new network regret-based metric to evaluate distributed online optimization algorithms. We concretely define static and dynamic forms of the composite regret. By leveraging the dynamic form of our composite regret, we develop a consensus-based online normalized gradient (CONGD) approach for pseudo-convex losses, and it provably shows a sublinear behavior relating to a regularity term for the path variation of the optimizer. For general non-convex losses, we first shed light on the regret for the setting of distributed online non-convex learning based on recent advances such that no deterministic algorithm can achieve the sublinear regret. We then develop the distributed online non-convex optimization with composite regret (DINOCO) without access to the gradients, depending on an offline optimization oracle. DINOCO is shown to achieve sublinear regret; to our knowledge, this is the first regret bound for general distributed online non-convex learning.

Submitted 21 September, 2022; originally announced September 2022.

Comments: 41 pages, presented at the Allerton Conference 2022

arXiv:2208.06970 [pdf, other]

Level Set Restricted Voronoi Tessellation for Large scale Spatial Statistical Analysis

Authors: Tyson Neuroth, Martin Rieth, Konduri Aditya, Myoungkyu Lee, Jacqueline H Chen, Kwan-Liu Ma

Abstract: Spatial statistical analysis of multivariate volumetric data can be challenging due to scale, complexity, and occlusion. Advances in topological segmentation, feature extraction, and statistical summarization have helped overcome the challenges. This work introduces a new spatial statistical decomposition method based on level sets, connected components, and a novel variation of the restricted centroidal Voronoi tessellation that is better suited for spatial statistical decomposition and parallel efficiency. The resulting data structures organize features into a coherent nested hierarchy to support flexible and efficient out-of-core region-of-interest extraction. Next, we provide an efficient parallel implementation. Finally, an interactive visualization system based on this approach is designed and then applied to turbulent combustion data. The combined approach enables an interactive spatial statistical analysis workflow for large-scale data with a top-down approach through multiple-levels-of-detail that links phase space statistics with spatial features.

Submitted 14 August, 2022; originally announced August 2022.

arXiv:2207.14727 [pdf, other]

Tangential Wasserstein Projections

Authors: Florian Gunsilius, Meng Hsuan Hsieh, Myung Jin Lee

Abstract: We develop a notion of projections between sets of probability measures using the geometric properties of the 2-Wasserstein space. It is designed for general multivariate probability measures, is computationally efficient to implement, and provides a unique solution in regular settings. The idea is to work on regular tangent cones of the Wasserstein space using generalized geodesics. Its structure and computational properties make the method applicable in a variety of settings, from causal inference to the analysis of object data. An application to estimating causal effects yields a generalization of the notion of synthetic controls to multivariate data with individual-level heterogeneity, as well as a way to estimate optimal weights jointly over all time periods.

Submitted 2 August, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

arXiv:2207.00879 [pdf, other]

Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces

Authors: Alexander Thebelt, Calvin Tsay, Robert M. Lee, Nathan Sudermann-Merx, David Walz, Behrang Shafei, Ruth Misener

Abstract: Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search, as they achieve good predictive performance with little or no manual tuning, naturally handle discrete feature spaces, and are relatively insensitive to outliers in the training data. Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function. To address both points simultaneously, we propose using the kernel interpretation of tree ensembles as a Gaussian Process prior to obtain model variance estimates, and we develop a compatible optimization formulation for the acquisition function. The latter further allows us to seamlessly integrate known constraints to improve sampling efficiency by considering domain-knowledge in engineering settings and modeling search space symmetries, e.g., hierarchical relationships in neural architecture search. Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.

Submitted 30 December, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: 27 pages, 9 figures, 4 tables

arXiv:2205.03820 [pdf, other]

doi 10.1371/journal.pone.0274272

Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

Authors: Xijin Chen, Kim May Lee, Sofia S. Villar, David S. Robertson

Abstract: When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, it also affects their implementation where the simplest approach to overcome this is to continue to sample according to the original bandit algorithm, ignoring missing outcomes. We investigate the impact on performance of this approach to deal with missing data for several bandit algorithms through an extensive simulation study assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward. Different probabilities of missingness in both arms are considered. The key finding of our work is that when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses (which being perceived as the arm with less observed information is deemed more appealing by the algorithm than it would otherwise be). In contrast, algorithms that are geared towards exploitation would rapidly assign a high value to samples from the arms with a current high mean irrespective of the level observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.

Submitted 7 July, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

Comments: 30 pages, 6 figures
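
The "ignore missing outcomes" strategy described in the abstract above can be sketched for a two-armed bandit with binary rewards. This is a minimal illustration and not the authors' simulation code: the choice of Thompson sampling with Beta(1, 1) priors, the function name `thompson_ignore_missing`, and all parameter values are assumptions made for this example.

```python
import random


def thompson_ignore_missing(p_success, p_missing, n_patients=200, seed=0):
    """Two-armed Bernoulli Thompson sampling that skips the posterior
    update whenever a patient's outcome is missing (hypothetical sketch)."""
    rng = random.Random(seed)
    alpha = [1, 1]      # Beta posterior: 1 + observed successes per arm
    beta = [1, 1]       # Beta posterior: 1 + observed failures per arm
    assigned = [0, 0]   # patients allocated to each arm
    observed = [0, 0]   # outcomes actually observed on each arm

    for _ in range(n_patients):
        # Draw one sample per arm from its posterior and play the larger draw.
        draws = [rng.betavariate(alpha[k], beta[k]) for k in range(2)]
        arm = 0 if draws[0] >= draws[1] else 1
        assigned[arm] += 1

        # Missing outcome: the allocation still counts, but nothing is learned.
        if rng.random() < p_missing[arm]:
            continue

        observed[arm] += 1
        if rng.random() < p_success[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1

    return assigned, observed, alpha, beta


if __name__ == "__main__":
    assigned, observed, alpha, beta = thompson_ignore_missing(
        p_success=[0.3, 0.5], p_missing=[0.0, 0.4]
    )
    print("assigned:", assigned)
    print("observed:", observed)
```

Comparing `assigned` across different `p_missing` settings is one way to explore how missingness shifts allocation for a given algorithm; the sketch makes no claim about which direction the shift takes.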

arXiv:2204.10909 [pdf, other]

Error-in-variables modelling for operator learning

Authors: Ravi G. Patel, Indu Manickam, Myoungkyu Lee, Mamikon Gulian

Abstract: Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning. We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.

Submitted 19 July, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: 23 pages, 10 figures
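
The scalar attenuation bias mentioned in the abstract above can be reproduced with a small simulation. This is a minimal sketch of the classical scalar case only, not the authors' operator-learning models; the function names, sample size, true slope, and noise levels are assumptions chosen for illustration. With unit-variance inputs and unit-variance input noise, the classical result is that the OLS slope shrinks by the factor var(x) / (var(x) + var(noise)) = 0.5.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope for a one-variable regression with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, noise_sd_x=1.0, seed=0):
    """Return (slope fitted on clean x, slope fitted on noisy x)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The response depends on the true input, plus small output noise.
    y = [true_slope * x + rng.gauss(0.0, 0.1) for x in x_true]
    # Only a noise-corrupted version of the input is observed.
    x_obs = [x + rng.gauss(0.0, noise_sd_x) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)


if __name__ == "__main__":
    clean, noisy = simulate()
    # The clean fit should land near 2.0 and the noisy fit near 2.0 * 0.5.
    print(f"slope with clean inputs: {clean:.3f}")
    print(f"slope with noisy inputs: {noisy:.3f}")
```

Raising `noise_sd_x` shrinks the fitted slope further toward zero, which is the underestimation that error-in-variables models are meant to correct.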

arXiv:2112.15326 [pdf, other]

doi 10.1101/2021.07.08.451684

An empirical Bayes approach to estimating dynamic models of co-regulated gene expression

Authors: Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Delbare, Myung Hee Lee, Martin T. Wells

Abstract: Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag $R^2$ (LLR2). Importantly, the calculation of the LLR2 leverages biological databases that document known interactions amongst genes; this information is automatically used to define informative prior distributions on the ODE model's parameters. As a result, the LLR2 is a biologically-informed metric that can be used to identify clusters or networks of functionally-related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein's unbiased risk estimate that optimally balance the ODE model's fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. These results reveal new insights about the dynamics of biological systems.

Submitted 31 December, 2021; originally announced December 2021.

arXiv:2111.03140 [pdf, other]

doi 10.1016/j.apenergy.2021.118061

Multi-Objective Constrained Optimization for Energy Applications via Tree Ensembles

Authors: Alexander Thebelt, Calvin Tsay, Robert M. Lee, Nathan Sudermann-Merx, David Walz, Tom Tranter, Ruth Misener

Abstract: Energy systems optimization problems are complex due to strongly non-linear system behavior and multiple competing objectives, e.g. economic gain vs. environmental impact. Moreover, a large number of input variables and different variable types, e.g. continuous and categorical, are challenges commonly present in real-world applications. In some cases, proposed optimal solutions need to obey explicit input constraints related to physical properties or safety-critical operating conditions. This paper proposes a novel data-driven strategy using tree ensembles for constrained multi-objective optimization of black-box problems with heterogeneous variable spaces for which underlying system dynamics are either too complex to model or unknown. In an extensive case study comprised of synthetic benchmarks and relevant energy applications we demonstrate the competitive performance and sampling efficiency of the proposed algorithm compared to other state-of-the-art tools, making it a useful all-in-one solution for real-world applications with limited evaluation budgets.

Submitted 4 November, 2021; originally announced November 2021.

Comments: 36 pages, 8 figures, 5 tables

arXiv:2110.06504 [pdf, ps, other]

Path-Free Decomposition for Direct, Indirect and Interaction Effects in Mediation Analysis

Authors: Myoung-jae Lee

Abstract: Given a binary treatment and a binary mediator, mediation analysis decomposes the total effect of the treatment on an outcome variable into direct and indirect effects. However, the existing decompositions are "path-dependent", and consequently, there appeared different versions of direct and indirect effects. Differently from these, this paper proposes a "path-free" decomposition of the total effect into three sub-effects: direct, indirect, and treatment-mediator interaction effects. Whereas the interaction effect has been part of the indirect effect in the existing two-effect decompositions, it is separately identified in our three-effect decomposition. All effects are found using conditional means, but not conditional densities, and are estimated with ordinary least squares estimators. Simulation and empirical analyses are provided as well.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2106.02979 [pdf, other]

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

Authors: Qin Ding, Yue Kang, Yi-Wei Liu, Thomas C. M. Lee, Cho-Jui Hsieh, James Sharpnack

Abstract: The stochastic contextual bandit problem, which models the trade-off between exploration and exploitation, has many real applications, including recommender systems, online advertising and clinical trials. As many other machine learning algorithms, contextual bandit algorithms often have one or more hyper-parameters. As an example, in most optimal stochastic contextual bandit algorithms, there is an unknown exploration parameter which controls the trade-off between exploration and exploitation. A proper choice of the hyper-parameters is essential for contextual bandit algorithms to perform well. However, it is infeasible to use offline tuning methods to select hyper-parameters in contextual bandit environment since there is no pre-collected dataset and the decisions have to be made in real time. To tackle this problem, we first propose a two-layer bandit structure for auto tuning the exploration parameter and further generalize it to the Syndicated Bandits framework which can learn multiple hyper-parameters dynamically in contextual bandit environment. We derive the regret bounds of our proposed Syndicated Bandits framework and show it can avoid its regret dependent exponentially in the number of hyper-parameters to be tuned. Moreover, it achieves optimal regret bounds under certain scenarios. Syndicated Bandits framework is general enough to handle the tuning tasks in many popular contextual bandit algorithms, such as LinUCB, LinTS, UCB-GLM, etc. Experiments on both synthetic and real datasets validate the effectiveness of our proposed framework.

Submitted 11 June, 2022; v1 submitted 5 June, 2021; originally announced June 2021.

arXiv:2105.08620 [pdf, other]

Adversarial Examples Detection with Bayesian Neural Network

Authors: Yao Li, Tongyi Tang, Cho-Jui Hsieh, Thomas C. M. Lee

Abstract: In this paper, we propose a new framework to detect adversarial examples motivated by the observations that random components can improve the smoothness of predictors and make it easier to simulate the output distribution of a deep neural network. With these observations, we propose a novel Bayesian adversarial example detector, short for BATer, to improve the performance of adversarial example detection. Specifically, we study the distributional difference of hidden layer output between natural and adversarial examples, and propose to use the randomness of the Bayesian neural network to simulate hidden layer output distribution and leverage the distribution dispersion to detect adversarial examples. The advantage of a Bayesian neural network is that the output is stochastic while a deep neural network without random components does not have such characteristics. Empirical results on several benchmark datasets against popular attacks show that the proposed BATer outperforms the state-of-the-art detectors in adversarial example detection.

Submitted 22 February, 2024; v1 submitted 18 May, 2021; originally announced May 2021.

arXiv:2101.11202 [pdf, other]

doi 10.3847/1538-3881/abe0b6

Change point detection and image segmentation for time series of astrophysical images

Authors: Cong Xu, Hans Moritz Günther, Vinay L. Kashyap, Thomas C. M. Lee, Andreas Zezas

Abstract: Many astrophysical phenomena are time-varying, in the sense that their intensity, energy spectrum, and/or the spatial distribution of the emission suddenly change. This paper develops a method for modeling a time series of images. Under the assumption that the arrival times of the photons follow a Poisson process, the data are binned into 4D grids of voxels (time, energy band, and x-y coordinates), and viewed as a time series of non-homogeneous Poisson images. The method assumes that at each time point, the corresponding multi-band image stack is an unknown 3D piecewise constant function including Poisson noise. It also assumes that all image stacks between any two adjacent change points (in time domain) share the same unknown piecewise constant function. The proposed method is designed to estimate the number and the locations of all the change points (in time domain), as well as all the unknown piecewise constant functions between any pairs of the change points. The method applies the minimum description length (MDL) principle to perform this task. A practical algorithm is also developed to solve the corresponding complicated optimization problem. Simulation experiments and applications to real datasets show that the proposed method enjoys very promising empirical properties. Applications to two real datasets, the XMM observation of a flaring star and an emerging solar coronal loop, illustrate the usage of the proposed method and the scientific insight gained from it.

Submitted 26 January, 2021; originally announced January 2021.

Comments: 22 pages, 10 figures

arXiv:2010.11166 [pdf, other]

Decentralized Deep Learning using Momentum-Accelerated Consensus

Authors: Aditya Balu, Zhanhong Jiang, Sin Yong Tan, Chinmay Hedge, Young M Lee, Soumik Sarkar

Abstract: We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset. While there exist several decentralized deep learning approaches, the majority consider a central parameter-server topology for aggregating the model parameters from the agents. However, such a topology may be inapplicable in networked systems such as ad-hoc mobile networks, field robotics, and power network systems where direct communication with the central parameter server may be inefficient. In this context, we propose and analyze a novel decentralized deep learning algorithm where the agents interact over a fixed communication topology (without a central server). Our algorithm is based on the heavy-ball acceleration method used in gradient-based optimization. We propose a novel consensus protocol where each agent shares with its neighbors its model parameters as well as gradient-momentum values during the optimization process. We consider both strongly convex and non-convex objective functions and theoretically analyze our algorithm's performance. We present several empirical comparisons with competing decentralized learning methods to demonstrate the efficacy of our approach under different communication topologies.

Submitted 28 November, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

arXiv:2010.06567 [pdf, other]

Conditional Power and Friends: The Why and How of (Un)planned, Unblinded Sample Size Recalculations in Confirmatory Trials

Authors: Kevin Kunzmann, Michael J. Grayling, Kim M. Lee, David S. Robertson, Kaspar Rufibach, James M. S. Wason

Abstract: Adapting the final sample size of a trial to the evidence accruing during the trial is a natural way to address planning uncertainty. Designs with adaptive sample size need to account for their optional stopping to guarantee strict type-I error-rate control. A variety of different methods to maintain type-I error-rate control after unplanned changes of the initial sample size have been proposed in the literature. This makes interim analyses for the purpose of sample size recalculation feasible in a regulatory context. Since the sample size is usually determined via an argument based on the power of the trial, an interim analysis raises the question of how the final sample size should be determined conditional on the accrued information. Conditional power is a concept often put forward in this context. Since it depends on the unknown effect size, we take a strict estimation perspective and compare assumed conditional power, observed conditional power, and predictive power with respect to their properties as estimators of the unknown conditional power. We then demonstrate that pre-planning an interim analysis using methodology for unplanned interim analyses is ineffective and naturally leads to the concept of optimal two-stage designs. We conclude that unplanned design adaptations should only be conducted as reaction to trial-external new evidence, operational needs to violate the originally chosen design, or post hoc changes in the objective criterion. Finally, we show that commonly discussed sample size recalculation rules can lead to paradoxical outcomes and propose two alternative ways of reacting to newly emerging trial-external evidence.

Submitted 13 October, 2020; originally announced October 2020.

arXiv:2009.13697 [pdf, ps, other]

A Fast Graph Neural Network-Based Method for Winner Determination in Multi-Unit Combinatorial Auctions

Authors: Mengyuan Lee, Seyyedali Hosseinalipour, Christopher G. Brinton, Guanding Yu, Huaiyu Dai

Abstract: The combinatorial auction (CA) is an efficient mechanism for resource allocation in different fields, including cloud computing. It can obtain high economic efficiency and user flexibility by allowing bidders to submit bids for combinations of different items instead of only for individual items. However, the problem of allocating items among the bidders to maximize the auctioneers' revenue, i.e., the winner determination problem (WDP), is NP-complete to solve and inapproximable. Existing works for WDPs are generally based on mathematical optimization techniques and most of them focus on the single-unit WDP, where each item only has one unit. On the contrary, few works consider the multi-unit WDP in which each item may have multiple units. Given that the multi-unit WDP is more complicated but prevalent in cloud computing, we propose leveraging machine learning (ML) techniques to develop a novel low-complexity algorithm for solving this problem with negligible revenue loss. Specifically, we model the multi-unit WDP as an augmented bipartite bid-item graph and use a graph neural network (GNN) with half-convolution operations to learn the probability of each bid belonging to the optimal allocation. To improve the sample generation efficiency and decrease the number of needed labeled instances, we propose two different sample generation processes. We also develop two novel graph-based post-processing algorithms to transform the outputs of the GNN into feasible solutions. Through simulations on both synthetic instances and a specific virtual machine (VM) allocation problem in a cloud computing platform, we validate that our proposed method can approach optimal performance with low complexity and has good generalization ability in terms of problem size and user-type distribution.

Submitted 21 December, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Accepted by Transactions on Cloud Computing

arXiv:2008.03226

Data-Driven Discovery of Molecular Photoswitches with Multioutput Gaussian Processes

Authors: Ryan-Rhys Griffiths, Jake L. Greenfield, Aditya R. Thawani, Arian R. Jamasb, Henry B. Moss, Anthony Bourached, Penelope Jones, William McCorkindale, Alexander A. Aldrick, Matthew J. Fuchter, Alpha A. Lee

Abstract: Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states whilst overall red-shifting the absorption bands serves to limit material damage due to UV-exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset

Submitted 7 August, 2022; v1 submitted 28 June, 2020; originally announced August 2020.

Comments: Authors still in discussion about authorship ordering

arXiv:2007.10637 [pdf, other]

doi 10.1016/j.neunet.2021.07.030

Distributed Associative Memory Network with Memory Refreshing Loss

Authors: Taewon Park, Inchul Choi, Minho Lee

Abstract: Despite recent progress in memory augmented neural network (MANN) research, associative memory networks with a single external memory still show limited performance on complex relational reasoning tasks. Especially the content-based addressable memory networks often fail to encode input data into rich enough representation for relational reasoning and this limits the relation modeling performance of MANN for long temporal sequence data. To address these problems, here we introduce a novel Distributed Associative Memory architecture (DAM) with Memory Refreshing Loss (MRL) which enhances the relation reasoning performance of MANN. Inspired by how the human brain works, our framework encodes data with distributed representation across multiple memory blocks and repeatedly refreshes the contents for enhanced memorization similar to the rehearsal process of the brain. For this procedure, we replace a single external memory with a set of multiple smaller associative memory blocks and update these sub-memory blocks simultaneously and independently for the distributed representation of input data. Moreover, we propose MRL which assists a task's target objective while learning relational information existing in data. MRL enables MANN to reinforce an association between input data and task objective by reproducing stochastically sampled input data from stored memory contents. With this procedure, MANN further enriches the stored representations with relational information. In experiments, we apply our approaches to Differential Neural Computer (DNC), which is one of the representative content-based addressing memory models and achieves the state-of-the-art performance on both memorization and relational reasoning tasks.

Submitted 27 August, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: Published (https://www.sciencedirect.com/science/article/pii/S0893608021003014?via%3Dihub), Code (https://github.com/taewonpark/DAM)

Journal ref: Neural Networks 144 (2021) 33-48

arXiv:2007.00334 [pdf, other]

Estimation with Uncertainty via Conditional Generative Adversarial Networks

Authors: Minhyeok Lee, Junhee Seok

Abstract: Conventional predictive Artificial Neural Networks (ANNs) commonly employ deterministic weight matrices; therefore, their prediction is a point estimate. Such a deterministic nature in ANNs causes the limitations of using ANNs for medical diagnosis, law problems, and portfolio management, in which discovering not only the prediction but also the uncertainty of the prediction is essentially required. To address such a problem, we propose a predictive probabilistic neural network model, which corresponds to a different manner of using the generator in conditional Generative Adversarial Network (cGAN) that has been routinely used for conditional sample generation. By reversing the input and output of ordinary cGAN, the model can be successfully used as a predictive model; besides, the model is robust against noises since adversarial training is employed. In addition, to measure the uncertainty of predictions, we introduce the entropy and relative entropy for regression problems and classification problems, respectively. The proposed framework is applied to stock market data and an image classification task. As a result, the proposed framework shows superior estimation performance, especially on noisy data; moreover, it is demonstrated that the proposed framework can properly estimate the uncertainty of predictions.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.15715 [pdf, other]

doi 10.1080/00031305.2021.1901782

A review of Bayesian perspectives on sample size derivation for confirmatory trials

Authors: Kevin Kunzmann, Michael J. Grayling, Kim May Lee, David S. Robertson, Kaspar Rufibach, James M. S. Wason

Abstract: Sample size derivation is a crucial element of the planning phase of any confirmatory trial. A sample size is typically derived based on constraints on the maximal acceptable type I error rate and a minimal desired power. Here, power depends on the unknown true effect size. In practice, power is typically calculated either for the smallest relevant effect size or a likely point alternative. The former might be problematic if the minimal relevant effect is close to the null, thus requiring an excessively large sample size. The latter is dubious since it does not account for the a priori uncertainty about the likely alternative effect size. A Bayesian perspective on the sample size derivation for a frequentist trial naturally emerges as a way of reconciling arguments about the relative a priori plausibility of alternative effect sizes with ideas based on the relevance of effect sizes. Many suggestions as to how such 'hybrid' approaches could be implemented in practice have been put forward in the literature. However, key quantities such as assurance, probability of success, or expected power are often defined in subtly different ways in the literature. Starting from the traditional and entirely frequentist approach to sample size derivation, we derive consistent definitions for the most commonly used 'hybrid' quantities and highlight connections, before discussing and demonstrating their use in the context of sample size derivation for clinical trials.

Submitted 28 June, 2020; originally announced June 2020.

Journal ref: Am. Stat., 2021, 75(4), 424--432

arXiv:2006.12246 [pdf, other]

Pain Intensity Estimation from Mobile Video Using 2D and 3D Facial Keypoints

Authors: Matthew Lee, Lyndon Kennedy, Andreas Girgensohn, Lynn Wilcox, John Song En Lee, Chin Wen Tan, Ban Leong Sng

Abstract: Managing post-surgical pain is critical for successful surgical outcomes. One of the challenges of pain management is accurately assessing the pain level of patients. Self-reported numeric pain ratings are limited because they are subjective, can be affected by mood, and can influence the patient's perception of pain when making comparisons. In this paper, we introduce an approach that analyzes 2D and 3D facial keypoints of post-surgical patients to estimate their pain intensity level. Our approach leverages the previously unexplored capabilities of a smartphone to capture a dense 3D representation of a person's face as input for pain intensity level estimation. Our contributions are a data collection study with post-surgical patients to collect ground-truth labeled sequences of 2D and 3D facial keypoints for developing a pain estimation algorithm, a pain estimation model that uses multiple instance learning to overcome inherent limitations in facial keypoint sequences, and the preliminary results of the pain estimation model using 2D and 3D features with comparisons of alternate approaches.

Submitted 16 June, 2020; originally announced June 2020.

arXiv:2005.04788 [pdf]

Distributed Fine-Grained Traffic Speed Prediction for Large-Scale Transportation Networks based on Automatic LSTM Customization and Sharing

Authors: Ming-Chang Lee, Jia-Chun Lin, Ernst Gunnar Gran

Abstract: Short-term traffic speed prediction has been an important research topic in the past decade, and many approaches have been introduced. However, providing fine-grained, accurate, and efficient traffic-speed prediction for large-scale transportation networks where numerous traffic detectors are deployed has not been well studied. In this paper, we propose DistPre, which is a distributed fine-grained traffic speed prediction scheme for large-scale transportation networks. To achieve fine-grained and accurate traffic-speed prediction, DistPre customizes a Long Short-Term Memory (LSTM) model with an appropriate hyperparameter configuration for a detector. To make such customization process efficient and applicable for large-scale transportation networks, DistPre conducts LSTM customization on a cluster of computation nodes and allows any trained LSTM model to be shared between different detectors. If a detector observes a similar traffic pattern to another one, DistPre directly shares the existing LSTM model between the two detectors rather than customizing an LSTM model per detector. Experiments based on traffic data collected from freeway I5-N in California are conducted to evaluate the performance of DistPre. The results demonstrate that DistPre provides time-efficient LSTM customization and accurate fine-grained traffic-speed prediction for large-scale transportation networks.

Submitted 3 June, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

Comments: 14 pages, 7 figures, 2 tables, Euro-par 2020 conference

arXiv:2005.00564 [pdf, other]

Response-adaptive randomization in clinical trials: from myths to practical considerations

Authors: David S. Robertson, Kim May Lee, Boryana C. Lopez-Kolkovska, Sofia S. Villar

Abstract: Response-Adaptive Randomization (RAR) is part of a wider class of data-dependent sampling algorithms, for which clinical trials are typically used as a motivating application. In that context, patient allocation to treatments is determined by randomization probabilities that change based on the accrued response data in order to achieve experimental goals. RAR has received abundant theoretical attention from the biostatistical literature since the 1930's and has been the subject of numerous debates. In the last decade, it has received renewed consideration from the applied and methodological communities, driven by well-known practical examples and its widespread use in machine learning. Papers on the subject present different views on its usefulness, and these are not easy to reconcile. This work aims to address this gap by providing a unified, broad and fresh review of methodological and practical issues to consider when debating the use of RAR in clinical trials.

Submitted 7 June, 2022; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: Update in response to editor comments

MSC Class: 62-02

arXiv:2004.04258 [pdf, other]

Estimating Fiber Orientation Distribution through Blockwise Adaptive Thresholding with Application to HCP Young Adults Data

Authors: Seungyong Hwang, Thomas C. M. Lee, Debashis Paul, Jie Peng

Abstract: Due to recent technological advances, large brain imaging data sets can now be collected. Such data are highly complex so extraction of meaningful information from them remains challenging. Thus, there is an urgent need for statistical procedures that are computationally scalable and can provide accurate estimates that capture the neuronal structures and their functionalities. We propose a fast method for estimating the fiber orientation distribution (FOD) based on diffusion MRI data. This method models the observed dMRI signal at any voxel as a convolved and noisy version of the underlying FOD, and utilizes the spherical harmonics basis for representing the FOD, where the spherical harmonic coefficients are adaptively and nonlinearly shrunk by using a James-Stein type estimator. To further improve the estimation accuracy by enhancing the localized peaks of the FOD, as a second step a super-resolution sharpening process is then applied. The resulting estimated FODs can be fed to a fiber tracking algorithm to reconstruct the white matter fiber tracts. We illustrate the overall methodology using both synthetic data and data from the Human Connectome Project.

Submitted 28 June, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

arXiv:2004.02401 [pdf, other]

Applying Cyclical Learning Rate to Neural Machine Translation

Authors: Choon Meng Lee, Jianfeng Liu, Wei Peng

Abstract: In training deep learning networks, the optimizer and related learning rate are often used without much thought or with minimal tuning, even though it is crucial in ensuring a fast convergence to a good quality minimum of the loss function that can also generalize well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policy for computer vision related convolutional networks and datasets, we explore how cyclical learning rate can be applied to train transformer-based neural networks for neural machine translation. From our carefully designed experiments, we show that the choice of optimizers and the associated cyclical learning rate policy can have a significant impact on the performance. In addition, we establish guidelines when applying cyclical learning rates to neural machine translation tasks. Thus with our work, we hope to raise awareness of the importance of selecting the right optimizers and the accompanying learning rate policy, at the same time, encourage further research into easy-to-use learning rate policies.

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2004.02319 [pdf]

doi 10.1109/COMPSAC48688.2020.0-226

ReRe: A Lightweight Real-time Ready-to-Go Anomaly Detection Approach for Time Series

Authors: Ming-Chang Lee, Jia-Chun Lin, Ernst Gunnar Gran

Abstract: Anomaly detection is an active research topic in many different fields such as intrusion detection, network monitoring, system health monitoring, IoT healthcare, etc. However, many existing anomaly detection approaches require either human intervention or domain knowledge, and may suffer from high computation complexity, consequently hindering their applicability in real-world scenarios. Therefore, a lightweight and ready-to-go approach that is able to detect anomalies in real-time is highly sought-after. Such an approach could be easily and immediately applied to perform time series anomaly detection on any commodity machine. The approach could provide timely anomaly alerts and by that enable appropriate countermeasures to be undertaken as early as possible. With these goals in mind, this paper introduces ReRe, which is a Real-time Ready-to-go proactive Anomaly Detection algorithm for streaming time series. ReRe employs two lightweight Long Short-Term Memory (LSTM) models to predict and jointly determine whether or not an upcoming data point is anomalous based on short-term historical data points and two long-term self-adaptive thresholds. Experiments based on real-world time-series datasets demonstrate the good performance of ReRe in real-time anomaly detection without requiring human intervention or domain knowledge.

Submitted 4 December, 2022; v1 submitted 5 April, 2020; originally announced April 2020.

Comments: 10 pages, 9 figures, COMPSAC 2020

arXiv:2004.02113 [pdf]

doi 10.1155/2020/8478527

Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System

Authors: Gwenaelle Cunha Sergio, Minho Lee

Abstract: Generating music with emotion similar to that of an input video is a very relevant issue nowadays. Video content creators and automatic movie directors benefit from maintaining their viewers engaged, which can be facilitated by producing novel material eliciting stronger emotions in them. Moreover, there's currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually and/or hearing impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, only consider static images instead of videos, are unable to generate novel music, and require a high level of human effort and skills. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System to predict a video's emotion from its visual features and a deep Long Short-Term Memory Recurrent Neural Network to generate its corresponding audio signals with similar emotional inkling. The former is able to appropriately model emotions due to its fuzzy properties, and the latter is able to model data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 in the Lindsey and DEAP datasets respectively, and similar global features in the spectrograms. This indicates that our model is able to appropriately perform domain transformation between visual and audio features. Based on experimental results, our model can effectively generate audio that matches the scene eliciting a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often.

Submitted 5 April, 2020; originally announced April 2020.

Comments: Published (https://www.hindawi.com/journals/mpe/2020/8478527/)

Journal ref: Mathematical Problems in Engineering 2020 (2020) 1-15

arXiv:2004.00407 [pdf, other]

Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data

Authors: Heeyoung Kwak, Minwoo Lee, Seunghyun Yoon, Jooyoung Chang, Sangmin Park, Kyomin Jung

Abstract: Adverse Drug Reaction (ADR) is a significant public health concern world-wide. Numerous graph-based methods have been applied to biomedical graphs for predicting ADRs in pre-marketing phases. ADR detection in post-market surveillance is no less important than pre-marketing assessment, and ADR detection with large-scale clinical data have attracted much attention in recent years. However, there are not many studies considering graph structures from clinical data for detecting an ADR signal, which is a pair of a prescription and a diagnosis that might be a potential ADR. In this study, we develop a novel graph-based framework for ADR signal detection using healthcare claims data. We construct a Drug-disease graph with nodes representing the medical codes. The edges are given as the relationships between two codes, computed using the data. We apply Graph Neural Network to predict ADR signals, using labels from the Side Effect Resource database. The model shows improved AUROC and AUPRC performance of 0.795 and 0.775, compared to other algorithms, showing that it successfully learns node representations expressive of those relationships. Furthermore, our model predicts ADR pairs that do not exist in the established ADR database, showing its capability to supplement the ADR database.

Submitted 1 April, 2020; originally announced April 2020.

Comments: To appear at PAKDD 2020

arXiv:2003.04774 [pdf, other]

doi 10.1016/j.compchemeng.2021.107343

ENTMOOT: A Framework for Optimization over Ensemble Tree Models

Authors: Alexander Thebelt, Jan Kronqvist, Miten Mistry, Robert M. Lee, Nathan Sudermann-Merx, Ruth Misener

Abstract: Gradient boosted trees and other regression tree models perform well in a wide range of real-world, industrial applications. These tree models (i) offer insight into important prediction features, (ii) effectively manage sparse data, and (iii) have excellent prediction capabilities. Despite their advantages, they are generally unpopular for decision-making tasks and black-box optimization, which is due to their difficult-to optimize structure and the lack of a reliable uncertainty measure. ENTMOOT is our new framework for integrating (already trained) tree models into larger optimization problems. The contributions of ENTMOOT include: (i) explicitly introducing a reliable uncertainty measure that is compatible with tree models, (ii) solving the larger optimization problems that incorporate these uncertainty aware tree models, (iii) proving that the solutions are globally optimal, i.e. no better solution exists. In particular, we show how the ENTMOOT approach allows a simple integration of tree models into decision-making and black-box optimization, where it proves as a strong competitor to commonly-used frameworks.

Submitted 18 May, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

Comments: 33 pages, 10 figures, 2 tables

arXiv:2002.03808 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207653

Vocoder-free End-to-End Voice Conversion with Transformer Network

Authors: June-Woo Kim, Ho-Young Jung, Minho Lee

Abstract: Mel-frequency filter bank (MFB) based approaches have the advantage of learning speech compared to raw spectrum since MFB has less feature size. However, speech generator with MFB approaches require additional vocoder that needs a huge amount of computation expense for training process. The additional pre/post processing such as MFB and vocoder is not essential to convert real human speech to others. It is possible to only use the raw spectrum along with the phase to generate different style of voices with clear pronunciation. In this regard, we propose a fast and effective approach to convert realistic voices using raw spectrum in a parallel manner. Our transformer-based model architecture which does not have any CNN or RNN layers has shown the advantage of learning fast and solved the limitation of sequential computation of conventional RNN. In this paper, we introduce a vocoder-free end-to-end voice conversion method using transformer network. The presented conversion model can also be used in speaker adaptation for speech recognition. Our approach can convert the source voice to a target voice without using MFB and vocoder. We can get an adapted MFB for speech recognition by multiplying the converted magnitude with phase. We perform our voice conversion experiments on TIDIGITS dataset using the metrics such as naturalness, similarity, and clarity with mean opinion score, respectively.

Submitted 5 February, 2020; originally announced February 2020.

Comments: Work in progress

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN)

Showing 1–50 of 117 results for author: Lee, M