Showing 1–29 of 29 results for author: Ho, T

Searching in archive stat.
  1. arXiv:2312.17480  [pdf, other]

    q-bio.PE stat.ME

    Detection of evolutionary shifts in variance under an Ornstein-Uhlenbeck model

    Authors: Wensha Zhang, Lam Si Tung Ho, Toby Kenney

    Abstract: 1. Abrupt environmental changes can lead to evolutionary shifts in not only mean (optimal value), but also variance of descendants in trait evolution. There are some methods to detect shifts in optimal value but few studies consider shifts in variance. 2. We use a multi-optima and multi-variance OU process model to describe the trait evolution process with shifts in both optimal value and variance…

    Submitted 29 December, 2023; originally announced December 2023.

  2. arXiv:2312.00656  [pdf, other]

    cs.LG cs.AI stat.ML

    Simple Transferability Estimation for Regression Tasks

    Authors: Cuong N. Nguyen, Phong Tran, Lam Si Tung Ho, Vu Dinh, Anh T. Tran, Tal Hassner, Cuong V. Nguyen

    Abstract: We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel…

    Submitted 3 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Paper published at The 39th Conference on Uncertainty in Artificial Intelligence (UAI) 2023

  3. arXiv:2310.05892  [pdf, ps, other]

    stat.ML cs.LG

    A Generalization Bound of Deep Neural Networks for Dependent Data

    Authors: Quan Huu Do, Binh T. Nguyen, Lam Si Tung Ho

    Abstract: Existing generalization bounds for deep neural networks require data to be independent and identically distributed (iid). This assumption may not hold in real-life applications such as evolutionary biology, infectious disease epidemiology, and stock price prediction. This work establishes a generalization bound of feed-forward neural networks for non-stationary $φ$-mixing data.

    Submitted 9 October, 2023; originally announced October 2023.

  4. arXiv:2205.06622  [pdf]

    stat.AP

    What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions

    Authors: Ling Jin, Alina Lazar, Caitlin Brown, Bingrong Sun, Venu Garikapati, Srinath Ravulaparthy, Qianmiao Chen, Alexander Sim, Kesheng Wu, Tin Ho, Thomas Wenzel, C. Anna Spurlock

    Abstract: What makes you hold on to that old car? While the vast majority of the household vehicles are still powered by conventional internal combustion engines, the progress of adopting emerging vehicle technologies will critically depend on how soon the existing vehicles are transacted out of the household fleet. Leveraging a nationally representative longitudinal data set, the Panel Study of Income Dynamic…

    Submitted 13 May, 2022; originally announced May 2022.

  5. arXiv:2205.05240  [pdf]

    stat.AP econ.GN stat.CO

    Using Open Data and Open-Source Software to Develop Spatial Indicators of Urban Design and Transport Features for Achieving Healthy and Sustainable Cities

    Authors: Geoff Boeing, Carl Higgs, Shiqin Liu, Billie Giles-Corti, James F Sallis, Ester Cerin, Melanie Lowe, Deepti Adlakha, Erica Hinckson, Anne Vernez Moudon, Deborah Salvo, Marc A Adams, Ligia Vizeu Barrozo, Tamara Bozovic, Xavier Delclòs-Alió, Jan Dygrýn, Sara Ferguson, Klaus Gebel, Thanh Phuong Ho, Poh-Chin Lai, Joan Carles Martori, Kornsupha Nitvimol, Ana Queralt, Jennifer D Roberts, Garba H Sambo, et al. (5 additional authors not shown)

    Abstract: Benchmarking and monitoring urban design and transport features is critical to achieving local and international health and sustainability goals. However, most urban indicator frameworks use coarse spatial scales that only allow between-city comparisons or require expensive, technical, local spatial analyses for within-city comparisons. This study developed a reusable open-source urban indicator c…

    Submitted 10 May, 2022; originally announced May 2022.

    Journal ref: The Lancet Global Health 10 (6), 907-918 (2022)

  6. arXiv:2204.06032  [pdf, other]

    q-bio.PE stat.ME

    Evolutionary shift detection with ensemble variable selection

    Authors: Wensha Zhang, Toby Kenney, Lam Si Tung Ho

    Abstract: 1. Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes. 2. We propose an ensemble variable selection method (R package ELPASO) for the evolutionary shift detection task and compare it with existing methods (R packages l1ou and PhylogeneticEM) under several scenarios.…

    Submitted 12 April, 2022; originally announced April 2022.

  7. arXiv:2109.13061  [pdf, other]

    cs.LG stat.ML

    Searching for Minimal Optimal Neural Networks

    Authors: Lam Si Tung Ho, Vu Dinh

    Abstract: Large neural network models have high predictive power but may suffer from overfitting if the training set is not large enough. Therefore, it is desirable to select an appropriate size for neural networks. The destructive approach, which starts with a large architecture and then reduces the size using a Lasso-type penalty, has been used extensively for this task. Despite its popularity, there is n…

    Submitted 27 September, 2021; originally announced September 2021.

  8. arXiv:2104.00151  [pdf, other]

    stat.ME q-bio.PE

    Ancestral state reconstruction with large numbers of sequences and edge-length estimation

    Authors: Lam Si Tung Ho, Edward Susko

    Abstract: Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the…

    Submitted 31 March, 2021; originally announced April 2021.

  9. arXiv:2010.08097  [pdf, other]

    cs.LG math.ST stat.ML

    Consistent Feature Selection for Analytic Deep Neural Networks

    Authors: Vu Dinh, Lam Si Tung Ho

    Abstract: One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the prediction aspect of the problem with virtually no work on feature selection consistency for deep neural networks due to the model's severe nonlinearity and unide…

    Submitted 15 October, 2020; originally announced October 2020.

  10. arXiv:2007.08714  [pdf, other]

    cs.CV cs.LG stat.ML

    Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources

    Authors: Yun-Yun Tsai, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Current transfer learning methods are mainly based on finetuning a pretrained model with target-domain data. Motivated by the techniques from adversarial machine learning (ML) that are capable of manipulating the model prediction via data perturbations, in this paper we propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box ML model (e.g., a p…

    Submitted 29 July, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

  11. arXiv:2006.16679  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

    Authors: Zhongxiang Dai, Yizhou Chen, Kian Hsiang Low, Patrick Jaillet, Teck-Hua Ho

    Abstract: This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship amon…

    Submitted 30 June, 2020; originally announced June 2020.

    Comments: Accepted to 37th International Conference on Machine Learning (ICML 2020), Extended version with proofs and additional experimental details and results, 27 pages

  12. arXiv:2006.00334  [pdf, other]

    stat.ML cs.LG math.ST

    Consistent feature selection for neural networks via Adaptive Group Lasso

    Authors: Vu Dinh, Lam Si Tung Ho

    Abstract: One main obstacle for the wide use of deep learning in medical and engineering sciences is its interpretability. While neural network models are strong tools for making predictions, they often provide little information about which features play significant roles in influencing the prediction accuracy. To overcome this issue, many regularization procedures for learning with neural networks have be…

    Submitted 2 December, 2021; v1 submitted 30 May, 2020; originally announced June 2020.

  13. arXiv:2003.10336  [pdf, other]

    stat.AP q-bio.PE

    Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees

    Authors: Paul Bastide, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A Suchard

    Abstract: Phylogenetic comparative methods correct for shared evolutionary history among a set of non-independent organisms by modeling sample traits as arising from a diffusion process along the branches of a possibly unknown history. To incorporate such uncertainty, we present a scalable Bayesian inference framework under a general Gaussian trait evolution model that exploits Hamiltonian Monte Carlo (H…

    Submitted 29 September, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

  14. arXiv:1906.03222  [pdf, other]

    stat.ME stat.CO

    Inferring phenotypic trait evolution on large trees with many incomplete measurements

    Authors: Gabriel Hassler, Max R. Tolkoff, William L. Allen, Lam Si Tung Ho, Philippe Lemey, Marc A. Suchard

    Abstract: Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. Existing control techniques almost universally scale poorly as the number of taxa increases. An additional challenge arises as…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: 29 pages, 7 figures, 2 tables, 3 supplementary sections

  15. arXiv:1906.02179  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Bayesian Active Learning With Abstention Feedbacks

    Authors: Cuong V. Nguyen, Lam Si Tung Ho, Huan Xu, Vu Dinh, Binh Nguyen

    Abstract: We study pool-based active learning with abstention feedbacks where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These are…

    Submitted 30 December, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: Poster presented at 2019 ICML Workshop on Human in the Loop Learning 2019 (non-archival). arXiv admin note: substantial text overlap with arXiv:1705.08481

  16. arXiv:1903.12258  [pdf, other]

    q-fin.GN cs.LG q-fin.ST stat.ML

    Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market

    Authors: Rosdyana Mangir Irawan Kusuma, Trang-Thi Ho, Wei-Chun Kao, Yu-Yen Ou, Kai-Lung Hua

    Abstract: Stock market prediction is still a challenging problem because many factors affect the stock market price, such as company news and performance, industry performance, investor sentiment, social media sentiment and economic factors. This work explores the predictability in the stock market using Deep Convolutional Network and candlestick charts. The outcome is utilized to design a decis…

    Submitted 25 February, 2019; originally announced March 2019.

    Comments: conference, 13 pages, 3 figures

  17. arXiv:1811.10115  [pdf, other]

    cs.IT cs.LG stat.ML

    Recovery guarantees for polynomial approximation from dependent data with outliers

    Authors: Lam Si Tung Ho, Hayden Schaeffer, Giang Tran, Rachel Ward

    Abstract: Learning non-linear systems from noisy, limited, and/or dependent data is an important task across various scientific fields including statistics, engineering, computer science, mathematics, and many more. In general, this learning task is ill-posed; however, additional information about the data's structure or on the behavior of the unknown function can make the task well-posed. In this work, we…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: 17 pages, 1 figure

    MSC Class: 68T05; 41A10; 60F05; 68Q32; 62G08; 94A15; 65K10

  18. arXiv:1808.03591  [pdf, other]

    cs.LG stat.ML

    How Complex is your classification problem? A survey on measuring classification complexity

    Authors: Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, Tin K. Ho

    Abstract: Characteristics extracted from the training datasets of classification problems have proven to be effective predictors in a number of meta-analyses. Among them, measures of classification complexity can be used to estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision…

    Submitted 30 December, 2020; v1 submitted 10 August, 2018; originally announced August 2018.

    Comments: Survey paper

  19. arXiv:1705.08481   

    stat.ML cs.LG

    Bayesian Pool-based Active Learning With Abstention Feedbacks

    Authors: Cuong V. Nguyen, Lam Si Tung Ho, Huan Xu, Vu Dinh, Binh Nguyen

    Abstract: We study pool-based active learning with abstention feedbacks, where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These ar…

    Submitted 2 January, 2021; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: There is a new version at arXiv:1906.02179

  20. Summertime, and the livin is easy: Winter and summer pseudoseasonal life expectancy in the United States

    Authors: Tina Ho, Andrew Noymer

    Abstract: In temperate climates, mortality is seasonal with a winter-dominant pattern, due in part to pneumonia and influenza. Cardiac causes, which are the leading cause of death in the United States, are also winter-seasonal although it is not clear why. Interactions between circulating respiratory viruses (e.g., influenza) and cardiac conditions have been suggested as a cause of winter-dominant mortality…

    Submitted 10 March, 2017; originally announced March 2017.

    Journal ref: Demographic Research 2017; vol. 37, art.45, pp: 1445-1476

  21. arXiv:1609.09481  [pdf, ps, other]

    stat.ML cs.LG

    Fast learning rates with heavy-tailed losses

    Authors: Vu Dinh, Lam Si Tung Ho, Duy Nguyen, Binh T. Nguyen

    Abstract: We study fast learning rates when the losses are not necessarily bounded and may have a distribution with heavy tails. To enable such analyses, we introduce two new conditions: (i) the envelope function $\sup_{f \in \mathcal{F}}|\ell \circ f|$, where $\ell$ is the loss function and $\mathcal{F}$ is the hypothesis class, exists and is $L^r$-integrable, and (ii) $\ell$ satisfies the multi-scale Bern…

    Submitted 29 September, 2016; originally announced September 2016.

    Comments: Advances in Neural Information Processing Systems (NIPS 2016): 11 pages

  22. arXiv:1608.06769  [pdf, other]

    stat.CO q-bio.PE

    Direct likelihood-based inference for discretely observed stochastic compartmental models of infectious disease

    Authors: Lam Si Tung Ho, Forrest W. Crawford, Marc A. Suchard

    Abstract: Stochastic compartmental models are important tools for understanding the course of infectious diseases epidemics in populations and in prospective evaluation of intervention policies. However, calculating the likelihood for discretely observed data from even simple models -- such as the ubiquitous susceptible-infectious-removed (SIR) model -- has been considered computationally intractable, since…

    Submitted 25 July, 2018; v1 submitted 24 August, 2016; originally announced August 2016.

  23. arXiv:1603.03819  [pdf, other]

    stat.CO

    Birth/birth-death processes and their computable transition probabilities with biological applications

    Authors: Lam Si Tung Ho, Jason Xu, Forrest W. Crawford, Vladimir N. Minin, Marc A. Suchard

    Abstract: Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationall…

    Submitted 7 August, 2017; v1 submitted 11 March, 2016; originally announced March 2016.

  24. arXiv:1512.07948  [pdf, other]

    q-bio.PE stat.ME

    A Relaxed Drift Diffusion Model for Phylogenetic Trait Evolution

    Authors: Mandev S. Gill, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A. Suchard

    Abstract: Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits…

    Submitted 29 December, 2015; v1 submitted 24 December, 2015; originally announced December 2015.

    Comments: 35 pages, 3 figures, 5 tables. Changed from double-spaced to single-spaced

  25. arXiv:1512.03300  [pdf, other]

    stat.ML

    Inference in topic models: sparsity and trade-off

    Authors: Khoat Than, Tu Bao Ho

    Abstract: Topic models are popular for modeling discrete data (e.g., texts, images, videos, links), and provide an efficient way to discover hidden structures/semantics in massive data. One of the core problems in this field is the posterior inference for individual data instances. This problem is particularly important in streaming environments, but is often intractable. In this paper, we investigate the u…

    Submitted 10 December, 2015; originally announced December 2015.

  26. arXiv:1408.2714  [pdf, ps, other]

    stat.ML

    Learning From Non-iid Data: Fast Rates for the One-vs-All Multiclass Plug-in Classifiers

    Authors: Vu Dinh, Lam Si Tung Ho, Nguyen Viet Cuong, Duy Nguyen, Binh T. Nguyen

    Abstract: We prove new fast learning rates for the one-vs-all multiclass plug-in classifiers trained either from exponentially strongly mixing data or from data generated by a converging drifting distribution. These are two typical scenarios where training data are not iid. The learning rates are obtained under a multiclass version of Tsybakov's margin assumption, a type of low-noise assumption, and do not…

    Submitted 24 January, 2015; v1 submitted 12 August, 2014; originally announced August 2014.

    Comments: 12th Annual Conference on Theory and Applications of Models of Computation (TAMC 2015)

  27. arXiv:1406.3166  [pdf, ps, other]

    stat.ML

    Generalization and Robustness of Batched Weighted Average Algorithm with V-geometrically Ergodic Markov Data

    Authors: Nguyen Viet Cuong, Lam Si Tung Ho, Vu Dinh

    Abstract: We analyze the generalization and robustness of the batched weighted average algorithm for V-geometrically ergodic Markov data. This algorithm is a good alternative to the empirical risk minimization algorithm when the latter suffers from overfitting or when optimizing the empirical risk is hard. For the generalization of the algorithm, we prove a PAC-style bound on the training sample size for th…

    Submitted 12 August, 2014; v1 submitted 12 June, 2014; originally announced June 2014.

    Comments: This article was published in Proceedings of the 24th International Conference on Algorithmic Learning Theory (ALT 2013). This is the accepted version. The final publication is available at link.springer.com

  28. arXiv:1312.4527  [pdf, ps, other]

    cs.LG stat.ML

    Probable convexity and its application to Correlated Topic Models

    Authors: Khoat Than, Tu Bao Ho

    Abstract: Non-convex optimization problems often arise from probabilistic modeling, such as estimation of posterior distributions. Non-convexity makes the problems intractable, and poses various obstacles for us to design efficient algorithms. In this work, we attack non-convexity by first introducing the concept of \emph{probable convexity} for analyzing convexity of real functions in practice. We then use…

    Submitted 16 December, 2013; originally announced December 2013.

    Comments: 22 pages

  29. arXiv:1210.7053  [pdf, other]

    stat.ML cs.AI cs.CV stat.ME

    Managing sparsity, time, and quality of inference in topic models

    Authors: Khoat Than, Tu Bao Ho

    Abstract: Inference is an integral part of probabilistic topic models, but is often non-trivial to derive an efficient algorithm for a specific model. It is even much more challenging when we want to find a fast inference algorithm which always yields sparse latent representations of documents. In this article, we introduce a simple framework for inference in probabilistic topic models, denoted by FW. This…

    Submitted 14 April, 2013; v1 submitted 26 October, 2012; originally announced October 2012.