Search | arXiv e-print repository

Dendrogram of mixing measures: Hierarchical clustering and model selection for finite mixture models

Authors: Dat Do, Linh Do, Scott A. McKinley, Jonathan Terhorst, XuanLong Nguyen

Abstract: We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) constructed from an overfitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering and mixture modeling. The dendrogram's construction is derived from the theory of convergence of the mixing measures, and as a result, we can both consistently select the true number of mixing components and obtain the pointwise optimal convergence rate for parameter estimation from the tree, even when the model parameters are only weakly identifiable. In theory, it explicates the choice of the optimal number of clusters in hierarchical clustering. In practice, the dendrogram reveals more information on the hierarchy of subpopulations compared to traditional ways of summarizing mixture models. Several simulation studies are carried out to support our theory. We also illustrate the methodology with an application to single-cell RNA sequence analysis.

Submitted 8 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 53 pages, 11 figures

arXiv:2311.04880 [pdf, other]

doi 10.1007/s11538-024-01301-4

Inferring stochastic rates from heterogeneous snapshots of particle positions

Authors: Christopher E. Miles, Scott A. McKinley, Fangyuan Ding, Richard B. Lehoucq

Abstract: Many imaging techniques for biological systems -- like fixation of cells coupled with fluorescence microscopy -- provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 33 pages, 6 figures

Journal ref: Bulletin of Mathematical Biology 86, 74 (2024)

arXiv:2310.13666 [pdf, other]

Minimal Mechanisms of Microtubule Length Regulation in Living Cells

Authors: Anna C Nelson, Melissa M Rolls, Maria-Veronica Ciocanel, Scott A McKinley

Abstract: The microtubule cytoskeleton is responsible for sustained, long-range intracellular transport of mRNAs, proteins, and organelles in neurons. Neuronal microtubules must be stable enough to ensure reliable transport, but they also undergo dynamic instability, as their plus and minus ends continuously switch between growth and shrinking. This process allows for continuous rebuilding of the cytoskeleton and for flexibility in injury settings. Motivated by \textit{in vivo} experimental data on microtubule behavior in \textit{Drosophila} neurons, we propose a mathematical model of dendritic microtubule dynamics, with a focus on understanding microtubule length, velocity, and state-duration distributions. We find that limitations on microtubule growth phases are needed for realistic dynamics, but the type of limiting mechanism leads to qualitatively different responses to plausible experimental perturbations. We therefore propose and investigate two minimally-complex length-limiting factors: limitation due to resource (tubulin) constraints and limitation due to catastrophe of large-length microtubules. We combine simulations of a detailed stochastic model with steady-state analysis of a mean-field ordinary differential equations model to map out qualitatively distinct parameter regimes. This provides a basis for predicting changes in microtubule dynamics, tubulin allocation, and the turnover rate of tubulin within microtubules in different experimental environments.

Submitted 4 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 34 pages, 10 figures

arXiv:2210.17175 [pdf, other]

doi 10.1145/3519939.3523440

Low-Latency, High-Throughput Garbage Collection (Extended Version)

Authors: Wenyu Zhao, Stephen M. Blackburn, Kathryn S. McKinley

Abstract: Production garbage collectors make substantial compromises in pursuit of reduced pause times. They require far more CPU cycles and memory than prior simpler collectors. Concurrent copying collectors (C4, ZGC, and Shenandoah) suffer from the following design limitations. 1) Concurrent copying. They only reclaim memory by copying, which is inherently expensive with high memory bandwidth demands. Concurrent copying also requires expensive read and write barriers. 2) Scalability. They depend on tracing, which in the limit and in practice does not scale. 3) Immediacy. They do not reclaim older objects promptly, incurring high memory overheads. We present LXR, which takes a very different approach to optimizing responsiveness and throughput by minimizing concurrent collection work and overheads. 1) LXR reclaims most memory without any copying by using the Immix heap structure. It then combats fragmentation with limited judicious stop-the-world copying. 2) LXR uses reference counting to achieve both scalability and immediacy, promptly reclaiming young and old objects. It uses concurrent tracing as needed for identifying cyclic garbage. 3) To minimize pause times while allowing judicious copying of mature objects, LXR introduces remembered sets for reference counting and concurrent decrement processing. 4) LXR introduces a novel low-overhead write barrier that combines coalescing reference counting, concurrent tracing, and remembered set maintenance. The result is a collector with excellent responsiveness and throughput. On the widely-used Lucene search engine with a generously sized heap, LXR has 6x higher throughput while delivering 30x lower 99.9 percentile tail latency than the popular Shenandoah production collector in its default configuration.

Submitted 31 October, 2022; originally announced October 2022.

Comments: 17 pages, 7 Figures. This extends the original publication with an LBO analysis (Section 5.5)

ACM Class: D.3.4

Journal ref: pp. 76-91, PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13 - 17, 2022

arXiv:2201.01278 [pdf, other]

Understanding Power and Energy Utilization in Large Scale Production Physics Simulation Codes

Authors: Brian S. Ryu, Arturo Vargas, Ian Karlin, Shawn A. Dawson, Kenneth Weiss, Adam Bertsch, M. Scott McKinley, Michael R. Collette, Si D. Hammond, Kevin Pedretti, Robert N. Rieben

Abstract: Power is an often-cited reason for moving to advanced architectures on the path to Exascale computing. This is due to the practical concern of delivering enough power to successfully site and operate these machines, as well as concerns over energy usage while running large simulations. Since accurate power measurements can be difficult to obtain, processor thermal design power (TDP) is a possible surrogate due to its simplicity and availability. However, TDP is not indicative of typical power usage while running simulations. Using commodity and advanced technology systems at Lawrence Livermore National Laboratory (LLNL) and Sandia National Laboratories, we performed a series of experiments to measure power and energy usage in running simulation codes. These experiments indicate that large scale LLNL simulation codes are significantly more efficient than a simple processor TDP model might suggest.

Submitted 4 January, 2022; originally announced January 2022.

Comments: 13 pages

arXiv:2011.10092 [pdf, ps, other]

On the Hölder regularity of a linear stochastic partial-integro-differential equation with memory

Authors: Scott A. McKinley, Hung D. Nguyen

Abstract: In light of recent work on particles fluctuating in linear viscoelastic fluids, we study a linear stochastic partial-integro-differential equation with memory that is driven by a stationary noise on a bounded, smooth domain. Using the framework of generalized stationary solutions introduced in \cite{mckinley2018anomalous}, we provide sufficient conditions on the differential operator and the noise to obtain the existence as well as Hölder regularity of the stationary solutions for the concerned equation. As an application of the regularity results, we compare to analogous classical results for the stochastic heat equation. When the 1d stochastic heat equation is driven by white noise, solutions are continuous with space and time regularity that is Hölder $(1/2-\epsilon)$ and $(1/4-\epsilon)$ respectively. When driven by colored-in-space noise, solutions can have a range of regularity properties depending on the structure of the noise. Here, we show that the particular form of colored-in-time memory that arises in viscoelastic diffusion applications, satisfying what is called the Fluctuation--Dissipation relationship, yields sample paths that are Hölder $(1/2-\epsilon)$ and $(1/2-\epsilon)$ in space and time.

Submitted 31 October, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

MSC Class: 60G10

arXiv:2004.12939 [pdf]

Workshop on Quantification, Communication, and Interpretation of Uncertainty in Simulation and Data Science

Authors: Ross Whitaker, William Thompson, James Berger, Baruch Fischhof, Michael Goodchild, Mary Hegarty, Christopher Jermaine, Kathryn S. McKinley, Alex Pang, Joanne Wendelberger

Abstract: Modern science, technology, and politics are all permeated by data that comes from people, measurements, or computational processes. While this data is often incomplete, corrupt, or lacking in sufficient accuracy and precision, explicit consideration of uncertainty is rarely part of the computational and decision making pipeline. The CCC Workshop on Quantification, Communication, and Interpretation of Uncertainty in Simulation and Data Science explored this problem, identifying significant shortcomings in the ways we currently process, present, and interpret uncertain data. Specific recommendations on a research agenda for the future were made in four areas: uncertainty quantification in large-scale computational simulations, uncertainty quantification in data science, software support for uncertainty computation, and better integration of uncertainty quantification and communication to stakeholders.

Submitted 27 April, 2020; originally announced April 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 28 pages

Report number: ccc2014report_4

arXiv:1911.07746 [pdf, other]

doi 10.1007/s11538-020-00797-w

Renewal reward perspective on linear switching diffusion systems

Authors: Maria-Veronica Ciocanel, John Fricks, Peter R. Kramer, Scott A. McKinley

Abstract: In many biological systems, the movement of individual agents is commonly characterized as having multiple qualitatively distinct behaviors that arise from various biophysical states. This is true for vesicles in intracellular transport, micro-organisms like bacteria, or animals moving within and responding to their environment. For example, in cells the movement of vesicles, organelles and other cargo is affected by their binding to and unbinding from cytoskeletal filaments such as microtubules through molecular motor proteins. A typical goal of theoretical or numerical analysis of models of such systems is to investigate the effective transport properties and their dependence on model parameters. While the effective velocity of particles undergoing switching diffusion is often easily characterized in terms of the long-time fraction of time that particles spend in each state, the calculation of the effective diffusivity is more complicated because it cannot be expressed simply in terms of a statistical average of the particle transport state at one moment of time. However, it is common that these systems are regenerative, in the sense that they can be decomposed into independent cycles marked by returns to a base state. Using decompositions of this kind, we calculate effective transport properties by computing the moments of the dynamics within each cycle and then applying renewal-reward theory. This method provides a useful alternative large-time analysis to direct homogenization for linear advection-reaction-diffusion partial differential equation models. Moreover, it applies to a general class of semi-Markov processes and certain stochastic differential equations that arise in models of intracellular transport. Applications of the proposed framework are illustrated for case studies such as mRNA transport in developing oocytes and processive cargo movement by teams of motor proteins.

Submitted 18 November, 2019; originally announced November 2019.

Comments: 35 pages, 6 figures
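
Illustrative note on the entry above: the abstract's statement that the effective velocity follows from the long-time fraction of time spent in each state can be checked against a renewal-reward calculation in a toy setting. The sketch below is not the paper's method; it assumes a hypothetical process that visits a fixed list of states cyclically, each with a constant velocity and a mean holding time of 1 / exit_rate. The function names and parameters are made up for illustration.

```python
from typing import Sequence


def effective_velocity_renewal(velocities: Sequence[float],
                               exit_rates: Sequence[float]) -> float:
    """Renewal-reward ratio: E[displacement per cycle] / E[cycle duration].

    Assumption (illustrative only): one cycle visits every state exactly
    once, and state i has mean holding time 1 / exit_rates[i].
    """
    mean_times = [1.0 / rate for rate in exit_rates]
    mean_displacement = sum(v * t for v, t in zip(velocities, mean_times))
    mean_cycle_time = sum(mean_times)
    return mean_displacement / mean_cycle_time


def effective_velocity_occupancy(velocities: Sequence[float],
                                 exit_rates: Sequence[float]) -> float:
    """Average of state velocities weighted by long-time occupancy fractions."""
    mean_times = [1.0 / rate for rate in exit_rates]
    total = sum(mean_times)
    fractions = [t / total for t in mean_times]
    return sum(p * v for p, v in zip(fractions, velocities))


if __name__ == "__main__":
    # Hypothetical two-state example: a moving state and a paused state.
    v = [1.0, 0.0]
    rates = [2.0, 1.0]
    print(effective_velocity_renewal(v, rates))
    print(effective_velocity_occupancy(v, rates))
```

In this toy setting the two functions agree, which is the easy case the abstract mentions. The effective diffusivity needs second moments of each cycle and is not covered by this sketch.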

arXiv:1910.05850 [pdf, other]

doi 10.1007/s11538-020-00847-3

Topological data analysis approaches to uncovering the timing of ring structure onset in filamentous networks

Authors: Maria-Veronica Ciocanel, Riley Juenemann, Adriana T. Dawes, Scott A. McKinley

Abstract: Improvements in experimental and computational technologies have led to significant increases in data available for analysis. Topological data analysis (TDA) is an emerging area of mathematical research that can identify structures in these data sets. Here we develop a TDA method to detect physical structures in a cell that persist over time. In most cells, protein filaments (actin) interact with motor proteins (myosins) and organize into polymer networks and higher-order structures. Examples of these structures are ring channels that maintain constant diameters over time and play key roles in processes such as cell division, development, and wound healing. The interactions of actin with myosin can be challenging to investigate experimentally in living systems, given limitations in filament visualization \textit{in vivo}. We therefore use complex agent-based models that simulate mechanical and chemical interactions of polymer proteins in cells. To understand how filaments organize into structures, we propose a TDA method that assesses effective ring generation in data consisting of simulated actin filament positions through time. We analyze the topological structure of point clouds sampled along these actin filaments and propose an algorithm for connecting significant topological features in time. We introduce visualization tools that allow the detection of dynamic ring structure formation. This method provides a rigorous way to investigate how specific interactions and parameters may impact the timing of filamentous network organization.

Submitted 16 November, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

Comments: 20 pages, 9 figures

MSC Class: 55-04; 92-00; 92C37

arXiv:1909.02212 [pdf, other]

Author Growth Outstrips Publication Growth in Computer Science and Publication Quality Correlates with Collaboration

Authors: Stephen M. Blackburn, Kathryn S. McKinley, Lexing Xie

Abstract: Although the computer science community successfully harnessed exponential increases in computer performance to drive societal and economic change, the exponential growth in publications is proving harder to accommodate. To gain a deeper understanding of publication growth and inform how the computer science community should handle this growth, we analyzed publication practices from several perspectives: ACM-sponsored publications in the ACM Digital Library as a whole; subdisciplines captured by ACM's Special Interest Groups (SIGs); ten top conferences; institutions; four top U.S. departments; authors; faculty; and PhDs between 1990 and 2012. ACM publishes a large fraction of all computer science research. We first summarize how we believe our main findings inform (1) expectations on publication growth, (2) how to distinguish research quality from output quantity, and (3) the evaluation of individual researchers. We then further motivate the study of computer science publication practices and describe our methodology and results in detail.

Submitted 5 September, 2019; originally announced September 2019.

arXiv:1904.03815 [pdf, other]

Quasi-Direct Drive for Low-Cost Compliant Robotic Manipulation

Authors: David V. Gealy, Stephen McKinley, Brent Yi, Philipp Wu, Phillip R. Downey, Greg Balke, Allan Zhao, Menglong Guo, Rachel Thomasson, Anthony Sinclair, Peter Cuellar, Zoe McCarthy, Pieter Abbeel

Abstract: Robots must cost less and be force-controlled to enable widespread, safe deployment in unconstrained human environments. We propose Quasi-Direct Drive actuation as a capable paradigm for robotic force-controlled manipulation in human environments at low cost. Our prototype - Blue - is a human-scale 7 Degree of Freedom arm with a 2kg payload. Blue can cost less than $5000. We show that Blue has dynamic properties that meet or exceed the needs of human operators: the robot has a nominal position-control bandwidth of 7.5Hz and repeatability within 4mm. We demonstrate a Virtual Reality based interface that can be used as a method for telepresence and collecting robot training demonstrations. Manufacturability, scaling, and potential use-cases for the Blue system are also addressed. Videos and additional information can be found online at berkeleyopenarms.github.io

Submitted 11 April, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Comments: This is our long version - 8 pages. Our 6 page version without a discussion of thermal limits was accepted to ICRA 2019. 11 Figures

arXiv:1901.01328 [pdf, other]

doi 10.1145/3297858.3304031

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory

Authors: Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin

Abstract: Stream analytics have an insatiable demand for memory and performance. Emerging hybrid memories combine commodity DDR4 DRAM with 3D-stacked High Bandwidth Memory (HBM) DRAM to meet such demands. However, achieving this promise is challenging because (1) HBM is capacity-limited and (2) HBM boosts performance best for sequential access and high parallelism workloads. At first glance, stream analytics appear a particularly poor match for HBM because they have high capacity demands and data grouping operations, their most demanding computations, use random access. This paper presents the design and implementation of StreamBox-HBM, a stream analytics engine that exploits hybrid memories to achieve scalable high performance. StreamBox-HBM performs data grouping with sequential access sorting algorithms in HBM, in contrast to random access hashing algorithms commonly used in DRAM. StreamBox-HBM solely uses HBM to store Key Pointer Array (KPA) data structures that contain only partial records (keys and pointers to full records) for grouping operations. It dynamically creates and manages prodigious data and pipeline parallelism, choosing when to allocate KPAs in HBM. It dynamically optimizes for both the high bandwidth and limited capacity of HBM, and the limited bandwidth and high capacity of standard DRAM. StreamBox-HBM achieves 110 million records per second and 238 GB/s memory bandwidth while effectively utilizing all 64 cores of Intel's Knights Landing, a commercial server with hybrid memory. It outperforms stream engines with sequential access algorithms without KPAs by 7x and stream engines with random access algorithms by an order of magnitude in throughput. To the best of our knowledge, StreamBox-HBM is the first stream engine optimized for hybrid memories.

Submitted 28 January, 2019; v1 submitted 4 January, 2019; originally announced January 2019.
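
Illustrative note on the entry above: the abstract contrasts sort-based grouping over key-pointer pairs with hash-based grouping. The sketch below shows the general idea only and is not the StreamBox-HBM implementation. The function name, the record layout (tuples with the key at a given index), and the use of list positions as "pointers" are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_by_sorted_kpa(records, key_index=0):
    """Group records by key using a sorted key-pointer array.

    Only (key, position) pairs are sorted; the full records stay where they
    are and are reached through their position, standing in for a pointer.
    """
    # Build the key-pointer array: one small (key, position) pair per record.
    kpa = [(rec[key_index], pos) for pos, rec in enumerate(records)]
    # Sorting puts equal keys next to each other, so grouping becomes a
    # single sequential pass over the array instead of random hash probes.
    kpa.sort(key=itemgetter(0))
    groups = {}
    for key, run in groupby(kpa, key=itemgetter(0)):
        groups[key] = [records[pos] for _, pos in run]
    return groups


if __name__ == "__main__":
    # Hypothetical stream records: (key, payload).
    stream = [("a", 1), ("b", 2), ("a", 3)]
    print(group_by_sorted_kpa(stream))
```

Because Python's sort is stable, records within each group keep their arrival order. A real engine would sort compact arrays in place rather than lists of tuples.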

arXiv:1808.00064 [pdf, other]

Emulating Hybrid Memory on NUMA Hardware

Authors: Shoaib Akram, Jennifer B. Sartor, Kathryn S. McKinley, Lieven Eeckhout

Abstract: Non-volatile memory (NVM) has the potential to disrupt the boundary between memory and storage, including the abstractions that manage this boundary. Researchers comparing the speed, durability, and abstractions of hybrid systems with DRAM, NVM, and disk to traditional systems typically use simulation, which makes it easy to evaluate different hardware technologies and parameters. Unfortunately, simulation is extremely slow, limiting the number of applications and dataset sizes in the evaluation. Simulation typically precludes realistic multiprogram workloads and considering runtime and operating system design alternatives. Good methodology embraces a variety of techniques for validation, expanding the experimental scope, and uncovering new insights. This paper introduces an emulation platform for hybrid memory that uses commodity NUMA servers. Emulation complements simulation well, offering speed and accuracy for realistic workloads, and richer software experimentation. We use a thread-local socket to emulate DRAM and the remote socket to emulate NVM. We use standard C library routines to allocate heap memory in the DRAM or NVM socket for use with explicit memory management or garbage collection. We evaluate the emulator using various configurations of write-rationing garbage collectors that improve NVM lifetimes by limiting writes to NVM, and use 15 applications from three benchmark suites with various datasets and workload configurations. We show emulation enhances simulation results. The two systems confirm most trends, such as NVM write and read rates of different software configurations, increasing our confidence for predicting future system effects. Emulation adds novel insights, such as the non-linear effects of multi-program workloads on write rates.

Submitted 31 July, 2018; originally announced August 2018.

arXiv:1807.00071 [pdf]

GOTO Rankings Considered Helpful

Authors: Emery Berger, Stephen M. Blackburn, Carla Brodley, H. V. Jagadish, Kathryn S. McKinley, Mario A. Nascimento, Minjeong Shin, Lexing Xie

Abstract: Rankings are a fact of life. Whether or not one likes them, they exist and are influential. Within academia, and in computer science in particular, rankings not only capture our attention but also widely influence people who have a limited understanding of computing science research, including prospective students, university administrators, and policy-makers. In short, rankings matter. This position paper advocates for the adoption of "GOTO rankings": rankings that use Good data, are Open, Transparent, and Objective, and the rejection of rankings that do not meet these criteria.

Submitted 24 April, 2019; v1 submitted 29 June, 2018; originally announced July 2018.

Comments: Accepted, to appear in Communications of the ACM

arXiv:1805.00036 [pdf, other]

doi 10.3847/1538-4357/aabfea

Searching for Inflow Towards Massive Starless Clump Candidates Identified in the Bolocam Galactic Plane Survey

Authors: Jenny Calahan, Yancy Shirley, Brian Svoboda, Elizabeth Ivanov, Jonathan Schmid, Anna Pulley, Jennifier Lautenbach, Nicole Zawadzki, Christopher Bullivant, Claire Cook, Laurin Gray, Andrew Henrici, Massimo Pascale, Carter Bosse, Quadry Chance, Sarah Choi, Marina Dunn, Ramon Jame-Frias, Ian Kearsley, Joseph Kelledy, Collin Lewin, Qasim Mahmood, Scott McKinley, Adriana Mitchell, Daniel Robinson

Abstract: Recent Galactic plane surveys of dust continuum emission at long wavelengths have identified a population of dense, massive clumps with no evidence for on-going star formation. These massive starless clump candidates are excellent sites to search for the initial phases of massive star formation before the feedback from massive star formation affects the clump. In this study, we search for the spectroscopic signature of inflowing gas toward starless clumps, some of which are massive enough to form a massive star. We observed 101 starless clump candidates identified in the Bolocam Galactic Plane Survey (BGPS) in HCO$^+$ $J = 1-0$ using the 12m Arizona Radio Observatory telescope. We find a small blue excess of $E = (N_{\rm blue} - N_{\rm red})/N_{\rm total} = 0.03$ for the complete survey. We identified 6 clumps that are good candidates for inflow motion and used a radiative transfer model to calculate mass inflow rates that range from 500 - 2000 $M_\odot$/Myr. If the observed line profiles are indeed due to large-scale inflow motions, then these clumps will typically double their mass on a free fall time. Our survey finds that massive BGPS starless clump candidates with inflow signatures in HCO$^+$ $J = 1-0$ are rare throughout our Galaxy.

Submitted 30 April, 2018; originally announced May 2018.

Comments: 14 pages, 9 figures

arXiv:1804.00202 [pdf, ps, other]

The Generalized Langevin Equation with a power-law memory in a nonlinear potential well

Authors: Nathan Glatt-Holtz, David Herzog, Scott McKinley, Hung Nguyen

Abstract: The generalized Langevin equation (GLE) is a stochastic integro-differential equation that has been used to describe the velocity of microparticles in viscoelastic fluids. In this work, we consider the large-time asymptotic properties of a Markovian approximation to the GLE in the presence of a wide class of external potential wells. The qualitative behavior of the GLE is largely determined by its memory kernel $K$, which summarizes the delayed response of the fluid medium on the particle's past movement. When $K$ can be expressed as a finite sum of exponentials, it has been shown that long-term time-averaged properties of the position and velocity do not depend on $K$ at all. In certain applications, however, it is important to consider the GLE with a power law memory kernel. Using the fact that infinite sums of exponentials can have power law tails, we study the infinite-dimensional version of the Markovian GLE in a potential well. In the case where the memory kernel $K$ is integrable (i.e. in the asymptotically diffusive regime), we are able to extend previous results and show that there is a unique stationary distribution for the GLE system and that the long-term statistics of the position and velocity do not depend on $K$. However, when $K$ is not integrable (i.e. in the asymptotically subdiffusive regime), we are able to show the existence of an invariant probability measure but uniqueness remains an open question. In particular, the method of asymptotic coupling used in the integrable case to show uniqueness does not apply when $K$ fails to be integrable.

Submitted 28 January, 2020; v1 submitted 31 March, 2018; originally announced April 2018.

MSC Class: 60H10

arXiv:1711.00560 [pdf, ps, other]

Anomalous Diffusion and the Generalized Langevin Equation

Authors: Scott A McKinley, Hung D Nguyen

Abstract: The Generalized Langevin Equation (GLE) is a Stochastic Integro-Differential Equation that is commonly used to describe the velocity of microparticles that move randomly in viscoelastic fluids. Such particles commonly exhibit what is known as anomalous subdiffusion, which is to say that their position Mean-Squared Displacement (MSD) scales sublinearly with time. While it is common in the literature to observe that there is a relationship between the MSD and the memory structure of the GLE, and there exist special cases where explicit solutions exist, this connection has never been fully characterized. Here, we establish a class of memory kernels for which the GLE is well-defined; we investigate the associated regularity properties of solutions; and we prove that large-time asymptotic behavior of the particle MSD is entirely determined by the tail behavior of the GLE's memory kernel.

Submitted 1 November, 2017; originally announced November 2017.

arXiv:1710.09441 [pdf, other]

High Five: Improving Gesture Recognition by Embracing Uncertainty

Authors: Diman Zad Tootaghaj, Adrian Sampson, Todd Mytkowicz, Kathryn S McKinley

Abstract: Sensors on mobile devices (accelerometers, gyroscopes, pressure meters, and GPS) invite new applications in gesture recognition, gaming, and fitness tracking. However, programming them remains challenging because human gestures captured by sensors are noisy. This paper illustrates that noisy gestures degrade training and classification accuracy for gesture recognition in state-of-the-art deterministic Hidden Markov Models (HMM). We introduce a new statistical quantization approach that mitigates these problems by (1) during training, producing gesture-specific codebooks, HMMs, and error models for gesture sequences; and (2) during classification, exploiting the error model to explore multiple feasible HMM state sequences. We implement classification in Uncertain<T>, a probabilistic programming system that encapsulates HMMs and error models and then automates sampling and inference in the runtime. Uncertain<T> developers directly express a choice of application-specific trade-off between recall and precision at gesture recognition time, rather than at training time. We demonstrate benefits in configurability, precision, recall, and recognition on two data sets with 25 gestures from 28 people and 4200 total gestures. Incorporating gesture error more accurately in modeling improves the average recognition rate of 20 gestures from 34% in prior work to 62%. Incorporating the error model during classification further improves the average gesture recognition rate to 71%. As far as we are aware, no prior work shows how to generate an HMM error model during training and use it to improve classification rates.

Submitted 25 October, 2017; originally announced October 2017.

arXiv:1709.06668 [pdf, other]

Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure

Authors: Daniel Seita, Sanjay Krishnan, Roy Fox, Stephen McKinley, John Canny, Ken Goldberg

Abstract: Automating precision subtasks such as debridement (removing dead or diseased tissue fragments) with Robotic Surgical Assistants (RSAs) such as the da Vinci Research Kit (dVRK) is challenging due to inherent non-linearities in cable-driven systems. We propose and evaluate a novel two-phase coarse-to-fine calibration method. In Phase I (coarse), we place a red calibration marker on the end effector and let it randomly move through a set of open-loop trajectories to obtain a large sample set of camera pixels and internal robot end-effector configurations. This coarse data is then used to train a Deep Neural Network (DNN) to learn the coarse transformation bias. In Phase II (fine), the bias from Phase I is applied to move the end-effector toward a small set of specific target points on a printed sheet. For each target, a human operator manually adjusts the end-effector position by direct contact (not through teleoperation) and the residual compensation bias is recorded. This fine data is then used to train a Random Forest (RF) to learn the fine transformation bias. Subsequent experiments suggest that without calibration, position errors average 4.55mm. Phase I can reduce average error to 2.14mm and the combination of Phase I and Phase II can reduce average error to 1.08mm. We apply these results to debridement of raisins and pumpkin seeds as fragment phantoms. Using an endoscopic stereo camera with standard edge detection, experiments with 120 trials achieved average success rates of 94.5%, exceeding prior results with much larger fragments (89.4%) and achieving a speedup of 2.1x, decreasing time per fragment from 15.8 seconds to 7.3 seconds. Source code, data, and videos are available at https://sites.google.com/view/calib-icra/.

Submitted 24 February, 2018; v1 submitted 19 September, 2017; originally announced September 2017.

Comments: Code, data, and videos are available at https://sites.google.com/view/calib-icra/. Final version for ICRA 2018

arXiv:1509.03261 [pdf]

doi 10.1122/1.4943988

Maximum Likelihood Estimation for Single Particle, Passive Microrheology Data with Drift

Authors: John W. R. Mellnik, Martin Lysy, Paula A. Vasquez, Natesh S. Pillai, David B. Hill, Jeremy Crib, Scott A. McKinley, M. Gregory Forest

Abstract: Volume limitations and low yield thresholds of biological fluids have led to widespread use of passive microparticle rheology. The mean-squared-displacement (MSD) statistics of bead position time series (bead paths) are either applied directly to determine the creep compliance [Xu et al (1998)] or transformed to determine dynamic storage and loss moduli [Mason & Weitz (1995)]. A prevalent hurdle arises when there is a non-diffusive experimental drift in the data. Commensurate with the magnitude of drift relative to diffusive mobility, quantified by a Péclet number, the MSD statistics are distorted, and thus the path data must be "corrected" for drift. The standard approach is to estimate and subtract the drift from particle paths, and then calculate MSD statistics. We present an alternative, parametric approach using maximum likelihood estimation that simultaneously fits drift and diffusive model parameters from the path data; the MSD statistics (and consequently the compliance and dynamic moduli) then follow directly from the best-fit model. We illustrate and compare both methods on simulated path data over a range of Péclet numbers, where exact answers are known. We choose fractional Brownian motion as the numerical model because it affords tunable, sub-diffusive MSD statistics consistent with typical 30 second long, experimental observations of microbeads in several biological fluids. Finally, we apply and compare both methods on data from human bronchial epithelial cell culture mucus.

Submitted 21 February, 2016; v1 submitted 10 September, 2015; originally announced September 2015.

Comments: 29 pages, 12 figures

arXiv:1407.5962 [pdf, other]

Model comparison and assessment for single particle tracking in biological fluids

Authors: Martin Lysy, Natesh S. Pillai, David B. Hill, M. Gregory Forest, John Mellnik, Paula Vasquez, Scott A. McKinley

Abstract: State-of-the-art techniques in passive particle-tracking microscopy provide high-resolution path trajectories of diverse foreign particles in biological fluids. For particles on the order of 1 micron diameter, these paths are generally inconsistent with simple Brownian motion. Yet, despite an abundance of data confirming these findings and their wide-ranging scientific implications, stochastic modeling of the complex particle motion has received comparatively little attention. Even among posited models, there is virtually no literature on likelihood-based inference, model comparisons, and other quantitative assessments. In this article, we develop a rigorous and computationally efficient Bayesian methodology to address this gap. We analyze two of the most prevalent candidate models for 30 second paths of 1 micron diameter tracer particles in human lung mucus: fractional Brownian motion (fBM) and a Generalized Langevin Equation (GLE) consistent with viscoelastic theory. Our model comparisons distinctly favor GLE over fBM, with the former describing the data remarkably well up to the timescales for which we have reliable information.

Submitted 29 November, 2015; v1 submitted 22 July, 2014; originally announced July 2014.

Comments: 24 pages, 10 figures + supplementary material

MSC Class: 62P10 (Primary)

arXiv:1202.3384 [pdf, other]

doi 10.1073/pnas.1202686109

Sensing and decision-making in random search

Authors: Andrew M. Hein, Scott A. McKinley

Abstract: While microscopic organisms can use gradient-based search to locate resources, this strategy can be poorly suited to the sensory signals available to macroscopic organisms. We propose a framework that models search-decision making in cases where sensory signals are infrequent, subject to large fluctuations, and contain little directional information. Our approach simultaneously models an organism's intrinsic movement behavior (e.g. Levy walk) while allowing this behavior to be adjusted based on sensory data. We find that including even a simple model for signal response can dominate other features of random search and greatly improve search performance. In particular, we show that a lack of signal is not a lack of information. Searchers that receive no signal can quickly abandon target-poor regions. Such phenomena naturally give rise to the area-restricted search behavior exhibited by many searching organisms.

Submitted 15 February, 2012; originally announced February 2012.

MSC Class: 92D50; 60K40

arXiv:1201.5984 [pdf, ps, other]

Statistical Challenges in Microrheology

Authors: Gustavo Didier, Scott McKinley, David B. Hill, John Fricks

Abstract: Microrheology is the study of the properties of a complex fluid through the diffusion dynamics of small particles, typically latex beads, moving through that material. Currently, it is the dominant technique in the study of the physical properties of biological fluids, of the material properties of membranes or the cytoplasm of cells, or of the entire cell. The theoretical underpinning of microrheology was given in Mason and Weitz (Physical Review Letters; 1995), who introduced a framework for the use of path data of diffusing particles to infer viscoelastic properties of its fluid environment. The multi-particle tracking techniques that were subsequently developed have presented numerous challenges for experimentalists and theoreticians. This paper describes some specific challenges that await the attention of statisticians and applied probabilists. We describe relevant aspects of the physical theory, current inferential efforts and simulation aspects of a central model for the dynamics of nano-scale particles in viscoelastic fluids, the generalized Langevin equation.

Submitted 9 February, 2012; v1 submitted 28 January, 2012; originally announced January 2012.

arXiv:1111.0684 [pdf, other]

Asymptotic Analysis of Microtubule-Based Transport by Multiple Identical Molecular Motors

Authors: Scott A. McKinley, Avanti Athreya, John Fricks, Peter R. Kramer

Abstract: We describe a system of stochastic differential equations (SDEs) which model the interaction between processive molecular motors, such as kinesin and dynein, and the biomolecular cargo they tow as part of microtubule-based intracellular transport. We show that the classical experimental environment fits within a parameter regime which is qualitatively distinct from conditions one expects to find in living cells. Through an asymptotic analysis of our system of SDEs, we develop a means for applying in vitro observations of the nonlinear response by motors to forces induced on the attached cargo to make analytical predictions for two parameter regimes that have thus far eluded direct experimental observation: 1) highly viscous in vivo transport and 2) dynamics when multiple identical motors are attached to the cargo and microtubule.

Submitted 2 November, 2011; originally announced November 2011.

MSC Class: 92B05 (Primary) 92C05; 60K40 (Secondary)

arXiv:1104.3842 [pdf, other]

Geometric Ergodicity of Two-dimensional Hamiltonian systems with a Lennard-Jones-like Repulsive Potential

Authors: Ben Cooke, David P. Herzog, Jonathan C. Mattingly, Scott A. McKinley, Scott C. Schmidler

Abstract: In this paper we establish the ergodicity of Langevin dynamics for a simple two-particle system involving a Lennard-Jones type potential. To the best of our knowledge, this is the first such result for a system operating under this type of potential. Moreover we show that the dynamics are geometrically ergodic (have a spectral gap) and converge at a geometric rate. Methods from stochastic averaging are used to establish the existence of a Lyapunov function. The existence of a Lyapunov function in this setting seems resistant to more traditional approaches. This is a corrected version of the article.

Submitted 5 July, 2017; v1 submitted 19 April, 2011; originally announced April 2011.

Comments: 20 Pages, 3 Figures. Fixed some typos and improved some explanations. Added some important references which were missing

MSC Class: 60H10; 37A25; 37N05; 74A25

arXiv:0911.4293 [pdf, ps, other]

Anomalous diffusion of distinguished particles in bead-spring networks

Authors: Scott A McKinley

Abstract: We consider the anomalous sub-diffusion of a class of Gaussian processes that can be expressed in terms of sums of Ornstein-Uhlenbeck processes. As a generic class of processes, we introduce a single parameter such that for any $\nu \in (0,1)$ the process can be tuned to produce a mean-squared displacement with $\mathbb{E}[x^2(t)] \sim t^{\nu}$ for large $t$. The motivation for the specific structure of these sums of OU processes comes from the Rouse chain model from polymer kinetic theory. We generalize the model by studying the general dynamics of individual particles in networks of thermally fluctuating beads connected by Hookean springs. Such a set-up is similar to the study of Kac-Zwanzig heat bath models. Whereas the existing heat bath literature places its assumptions on the spectrum of the Laplacian matrix associated to the spring connection graph, we study explicit graph structures. In this setting we prove a notion of universality for the Rouse chain's well-known $\mathbb{E}[x^2(t)] \sim t^{1/2}$ scaling behavior. Subsequently we demonstrate the existence of other anomalous behavior by changing the dimension of the connection graph or by allowing repulsive forces among the beads.

Submitted 22 November, 2009; originally announced November 2009.

MSC Class: 60K35 (Primary); 60G22

arXiv:0911.2722 [pdf, ps, other]

A Stochastic Compartmental Model for Fast Axonal Transport

Authors: Lea Popovic, Scott A. McKinley, Michael C. Reed

Abstract: In this paper we develop a probabilistic micro-scale compartmental model and use it to study macro-scale properties of axonal transport, the process by which intracellular cargo is moved in the axons of neurons. By directly modeling the smallest scale interactions, we can use recent microscopic experimental observations to infer all the parameters of the model. Then, using techniques from probability theory, we compute asymptotic limits of the stochastic behavior of individual motor-cargo complexes, while also characterizing both equilibrium and non-equilibrium ensemble behavior. We use these results in order to investigate three important biological questions: (1) How homogeneous are axons at stochastic equilibrium? (2) How quickly can axons return to stochastic equilibrium after large local perturbations? (3) How is our understanding of delivery time to a depleted target region changed by taking the whole cell point-of-view?

Submitted 20 May, 2011; v1 submitted 13 November, 2009; originally announced November 2009.

MSC Class: 60J28; 92C05

arXiv:0902.4496 [pdf, ps, other]

Geometric ergodicity of a bead-spring pair with stochastic Stokes forcing

Authors: Jonathan C. Mattingly, Scott A. McKinley, Natesh S. Pillai

Abstract: We consider a simple model for the fluctuating hydrodynamics of a flexible polymer in dilute solution, demonstrating geometric ergodicity for a pair of particles that interact with each other through a nonlinear spring potential while being advected by a stochastic Stokes fluid velocity field. This is a generalization of previous models which have used linear spring forces as well as white-in-time fluid velocity fields. We follow previous work combining control theoretic arguments, Lyapunov functions, and hypo-elliptic diffusion theory to prove exponential convergence via a Harris chain argument. In addition we allow the possibility of excluding certain "bad" sets in phase space in which the assumptions are violated but from which the system leaves with a controllable probability. This allows for the treatment of singular drifts, such as those derived from the Lennard-Jones potential, which is a novel feature of this work.

Submitted 23 July, 2012; v1 submitted 25 February, 2009; originally announced February 2009.

Comments: A number of corrections and improvements. We thank the careful referee for useful suggestions and corrections

MSC Class: 37A30; 76D07; 60H10

Showing 1–28 of 28 results for author: McKinley, S