Showing 1–26 of 26 results for author: Tang, T

Searching in archive stat.
  1. arXiv:2403.08971  [pdf, other]

    stat.CO

    Designing a Data Science simulation with MERITS: A Primer

    Authors: Corrine F Elliott, James Duncan, Tiffany M Tang, Merle Behr, Karl Kumbier, Bin Yu

    Abstract: Simulations play a crucial role in the modern scientific process. Yet despite (or due to) their ubiquity, the Data Science community shares neither a comprehensive definition for a "high-quality" study nor a consolidated guide to designing one. Inspired by the Predictability-Computability-Stability (PCS) framework for 'veridical' Data Science, we propose six MERITS that a Data Science simulation s…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 26 pages (main text); 1 figure; 2 tables; *Authors contributed equally to this manuscript; **Authors contributed equally to this manuscript

  2. arXiv:2401.15076  [pdf, other]

    stat.ME q-bio.PE q-bio.QM

    Comparative Analysis of Practical Identifiability Methods for an SEIR Model

    Authors: Omar Saucedo, Amanda Laubmeier, Tingting Tang, Benjamin Levy, Lale Asik, Tim Pollington, Olivia Prosper

    Abstract: Identifiability of a mathematical model plays a crucial role in parameterization of the model. In this study, we establish the structural identifiability of a Susceptible-Exposed-Infected-Recovered (SEIR) model given different combinations of input data and investigate practical identifiability with respect to different observable data, data frequency, and noise distributions. The practical identi…

    Submitted 24 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Minor changes to clarify why structural identifiability with respect to incidence data was not performed

  3. arXiv:2311.13036  [pdf, other]

    cs.LG stat.ML

    Favour: FAst Variance Operator for Uncertainty Rating

    Authors: Thomas D. Ahle, Sahar Karimi, Peter Tak Peter Tang

    Abstract: Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions. By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference. Unfortunately, many inference samples are often needed, the overhead of which greatly hinders BNN's wide adoption. To mitigate this, previous work proposed propagating the first and second moments…

    Submitted 21 November, 2023; originally announced November 2023.

  4. arXiv:2307.13628  [pdf]

    physics.soc-ph stat.AP

    Developing a Comprehensive Model for Feasibility Analysis of Separated Bike Lanes and Electric Bike Lanes: A Case Study in Shanghai, China

    Authors: Lu Ling, Yuntao Guo, Xiongfei Lai, Tianpei Tang, Xinghua Li

    Abstract: Electric bikes (e-bikes), including lightweight e-bikes with pedals and e-bikes in scooter form, are gaining popularity around the world because of their convenience and affordability. At the same time, e-bike-related accidents are also on the rise and many policymakers and practitioners are debating the feasibility of building e-bike lanes in their communities. By collecting e-bikes and bikes dat…

    Submitted 25 July, 2023; originally announced July 2023.

  5. arXiv:2307.01932  [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    MDI+: A Flexible Random Forest-Based Feature Importance Framework

    Authors: Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu

    Abstract: Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Speci…

    Submitted 4 July, 2023; originally announced July 2023.

  6. arXiv:2306.07456  [pdf]

    stat.AP stat.ME

    On the Temporal-spatial Analysis of Estimating Urban Traffic Patterns Via GPS Trace Data of Car-hailing Vehicles

    Authors: Jiannan Mao, Lan Liu, Hao Huang, Weike Lu, Kaiyu Yang, Tianli Tang, Haotian Shi

    Abstract: Car-hailing services have become a prominent data source for urban traffic studies. Extracting useful information from car-hailing trace data is essential for effective traffic management, while discrepancies between car-hailing vehicles and urban traffic should be considered. This paper proposes a generic framework for estimating and analyzing urban traffic patterns using car-hailing trace data.…

    Submitted 12 June, 2023; originally announced June 2023.

  7. arXiv:2303.16299  [pdf, other]

    stat.ME stat.ML

    Comparison of Methods that Combine Multiple Randomized Trials to Estimate Heterogeneous Treatment Effects

    Authors: Carly Lupton Brantner, Trang Quynh Nguyen, Tengjie Tang, Congwen Zhao, Hwanhee Hong, Elizabeth A. Stuart

    Abstract: Individualized treatment decisions can improve health outcomes, but using data to make these decisions in a reliable, precise, and generalizable way is challenging with a single dataset. Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment to better estimate heterogeneous treatment effects. This paper discusses several non-p…

    Submitted 15 November, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  8. arXiv:2303.05659  [pdf, other]

    stat.ME

    A marginal structural model for normal tissue complication probability

    Authors: Thai-Son Tang, Zhihui Liu, Ali Hosni, John Kim, Olli Saarela

    Abstract: The goal of radiation therapy for cancer is to deliver prescribed radiation dose to the tumor while minimizing dose to the surrounding healthy tissues. To evaluate treatment plans, the dose distribution to healthy organs is commonly summarized as dose-volume histograms (DVHs). Normal tissue complication probability (NTCP) modelling has centered around making patient-level risk predictions with fea…

    Submitted 23 May, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

  9. arXiv:2302.01529  [pdf, other]

    math.NA stat.ML

    Failure-informed adaptive sampling for PINNs, Part II: combining with re-sampling and subset simulation

    Authors: Zhiwei Gao, Tao Tang, Liang Yan, Tao Zhou

    Abstract: This is the second part of our series of works on failure-informed adaptive sampling for physics-informed neural networks (FI-PINNs). In our previous work \cite{gao2022failure}, we have presented an adaptive sampling framework by using the failure probability as the posterior error indicator, where the truncated Gaussian model has been adopted for estimating the indicator. In this work, we present two…

    Submitted 28 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  10. arXiv:2302.00755  [pdf, other]

    stat.ML cs.LG stat.ME

    Hierarchical shrinkage Gaussian processes: applications to computer code emulation and dynamical system recovery

    Authors: Tao Tang, Simon Mak, David Dunson

    Abstract: In many areas of science and engineering, computer simulations are widely used as proxies for physical experiments, which can be infeasible or unethical. Such simulations can often be computationally expensive, and an emulator can be trained to efficiently predict the desired response surface. A widely-used emulator is the Gaussian process (GP), which provides a flexible framework for efficient pr…

    Submitted 1 February, 2023; originally announced February 2023.

  11. arXiv:2211.00268  [pdf, other]

    stat.ME stat.AP

    Stacking designs: designing multi-fidelity computer experiments with target predictive accuracy

    Authors: Chih-Li Sung, Yi Ji, Simon Mak, Wenjia Wang, Tao Tang

    Abstract: In an era where scientific experiments can be very costly, multi-fidelity emulators provide a useful tool for cost-efficient predictive scientific computing. For scientific applications, the experimenter is often limited by a tight computational budget, and thus wishes to (i) maximize predictive power of the multi-fidelity emulator via a careful design of experiments, and (ii) ensure this model ac…

    Submitted 27 October, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  12. arXiv:2112.07039  [pdf, other]

    stat.AP q-bio.PE

    Limits of epidemic prediction using SIR models

    Authors: Omar Melikechi, Alexander L. Young, Tao Tang, Trevor Bowman, David Dunson, James Johndrow

    Abstract: The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of th…

    Submitted 20 August, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  13. arXiv:2107.08787  [pdf]

    stat.AP cs.LG

    The Future will be Different than Today: Model Evaluation Considerations when Developing Translational Clinical Biomarker

    Authors: Yichen Lu, Jane Fridlyand, Tiffany Tang, Ting Qi, Noah Simon, Ning Leng

    Abstract: Finding translational biomarkers stands center stage of the future of personalized medicine in healthcare. We observed notable challenges in identifying robust biomarkers as some with great performance in one scenario often fail to perform well in new trials (e.g. different population, indications). With rapid development in the clinical trial world (e.g. assay, disease definition), new trials ver…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: Paper has 4 pages, 2 figures. Appendix are supplementary at the end

  14. arXiv:2105.08620  [pdf, other]

    stat.ML cs.CV cs.LG

    Adversarial Examples Detection with Bayesian Neural Network

    Authors: Yao Li, Tongyi Tang, Cho-Jui Hsieh, Thomas C. M. Lee

    Abstract: In this paper, we propose a new framework to detect adversarial examples motivated by the observations that random components can improve the smoothness of predictors and make it easier to simulate the output distribution of a deep neural network. With these observations, we propose a novel Bayesian adversarial example detector, short for BATer, to improve the performance of adversarial example de…

    Submitted 22 February, 2024; v1 submitted 18 May, 2021; originally announced May 2021.

  15. arXiv:2011.06593  [pdf, other]

    q-bio.QM stat.AP

    A stability-driven protocol for drug response interpretable prediction (staDRIP)

    Authors: Xiao Li, Tiffany M. Tang, Xuewei Wang, Jean-Pierre A. Kocher, Bin Yu

    Abstract: Modern cancer -omics and pharmacological data hold great promise in precision cancer medicine for developing individualized patient treatments. However, high heterogeneity and noise in such data pose challenges for predicting the response of cancer cell lines to therapeutic drugs accurately. As a result, arbitrary human judgment calls are rampant throughout the predictive modeling pipeline. In thi…

    Submitted 16 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

  16. arXiv:2010.08899  [pdf, other]

    cs.LG cs.DC stat.ML

    Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism

    Authors: Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, Michael W. Mahoney

    Abstract: In this paper, we consider hybrid parallelism -- a paradigm that employs both Data Parallelism (DP) and Model Parallelism (MP) -- to scale distributed training of large recommendation models. We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training. DCT filters the entities to be communicated across the network through a simple…

    Submitted 21 May, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

    Comments: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)

  17. arXiv:2010.07468  [pdf, other]

    cs.LG cs.CV stat.ML

    AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

    Authors: Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan

    Abstract: Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptiv…

    Submitted 20 December, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

    Journal ref: NeurIPS 2020

  18. arXiv:2008.06656  [pdf]

    stat.AP

    An Augmented Regression Model for Tensors with Missing Values

    Authors: Feng Wang, Mostafa Reisi Gahrooei, Zhen Zhong, Tao Tang, Jianjun Shi

    Abstract: Heterogeneous but complementary sources of data provide an unprecedented opportunity for developing accurate statistical models of systems. Although the existing methods have shown promising results, they are mostly applicable to situations where the system output is measured in its complete form. In reality, however, it may not be feasible to obtain the complete output measurement of a system, wh…

    Submitted 15 August, 2020; originally announced August 2020.

  19. arXiv:2006.14078  [pdf, other]

    stat.ML cs.LG cs.SC math.AG stat.AP

    Machine learning the real discriminant locus

    Authors: Edgar A. Bernal, Jonathan D. Hauenstein, Dhagash Mehta, Margaret H. Regan, Tingting Tang

    Abstract: Parameterized systems of polynomial equations arise in many applications in science and engineering with the real solutions describing, for example, equilibria of a dynamical system, linkages satisfying design constraints, and scene reconstruction in computer vision. Since different parameter values can have a different number of real solutions, the parameter space is decomposed into regions whose…

    Submitted 8 August, 2022; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: 22 pages, 14 figures

  20. Curating a COVID-19 data repository and forecasting county-level death counts in the United States

    Authors: Nick Altieri, Rebecca L. Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Chao Zhang, Bin Yu

    Abstract: As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative de…

    Submitted 9 August, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: Authors ordered alphabetically. All authors contributed significantly to this work. All collected data, modeling code, forecasts, and visualizations are updated daily and available at https://github.com/Yu-Group/covid19-severity-prediction

    Journal ref: Published in Harvard Data Science Review, 2020

  21. arXiv:1903.11232  [pdf, other]

    stat.ME stat.AP stat.ML

    Feature Selection for Data Integration with Mixed Multi-view Data

    Authors: Yulia Baker, Tiffany M. Tang, Genevera I. Allen

    Abstract: Data integration methods that analyze multiple sources of data simultaneously can often provide more holistic insights than can separate inquiries of each data source. Motivated by the advantages of data integration in the era of "big data", we investigate feature selection for high-dimensional multi-view data with mixed data types (e.g. continuous, binary, count-valued). This heterogeneity of mul…

    Submitted 10 January, 2020; v1 submitted 26 March, 2019; originally announced March 2019.

  22. arXiv:1812.11212  [pdf]

    cond-mat.soft cond-mat.mtrl-sci cs.LG physics.comp-ph stat.ML

    Machine learning enables polymer cloud-point engineering via inverse design

    Authors: Jatin N. Kumar, Qianxiao Li, Karen Y. T. Tang, Tonio Buonassisi, Anibal L. Gonzalez-Oyarce, Jun Ye

    Abstract: Inverse design is an outstanding challenge in disordered systems with multiple length scales such as polymers, particularly when designing polymers with desired phase behavior. We demonstrate high-accuracy tuning of poly(2-oxazoline) cloud point via machine learning. With a design space of four repeating units and a range of molecular masses, we achieve an accuracy of 4 °C root mean squared error…

    Submitted 21 November, 2018; originally announced December 2018.

    Comments: 27 pages made up of main article and electronic supplementary information

  23. arXiv:1810.07716  [pdf, other]

    stat.ML cs.LG math.AG

    The loss surface of deep linear networks viewed through the algebraic geometry lens

    Authors: Dhagash Mehta, Tianran Chen, Tingting Tang, Jonathan D. Hauenstein

    Abstract: By using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of the deep linear neural network models. After clarifying on the various definitions of "flat" minima, we show that the geometrically flat minima, which are merely artifacts of residual continuous symmetries of the deep linear networks, can be straightforwardly removed by a gene…

    Submitted 17 October, 2018; originally announced October 2018.

    Comments: 16 pages (2-columns), 5 figures

  24. arXiv:1810.00832  [pdf, other]

    stat.ME

    Integrated Principal Components Analysis

    Authors: Tiffany M. Tang, Genevera I. Allen

    Abstract: Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that may be hidden in individualistic analyses of a single data source. We develop a new unsupervised data integration method named Integrated Principal Components Analysis (iPCA), which is a model-based generalization of PCA and serves as a practical tool to find and visualize com…

    Submitted 3 April, 2021; v1 submitted 1 October, 2018; originally announced October 2018.

  25. arXiv:1805.08952  [pdf, other]

    cs.LG stat.ML

    Dictionary Learning by Dynamical Neural Networks

    Authors: Tsung-Han Lin, Ping Tak Peter Tang

    Abstract: A dynamical neural network consists of a set of interconnected neurons that interact over time continuously. It can exhibit computational properties in the sense that the dynamical system's evolution and/or limit points in the associated state space can correspond to numerical solutions to certain mathematical optimization or learning problems. Such a computational system is particularly attractiv…

    Submitted 22 May, 2018; originally announced May 2018.

  26. arXiv:1802.05374  [pdf, other]

    math.OC cs.LG stat.ML

    A Progressive Batching L-BFGS Method for Machine Learning

    Authors: Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang

    Abstract: The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization pr…

    Submitted 30 May, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: ICML 2018. 25 pages, 17 figures, 2 tables