-
Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part II -- Clustering Extremely High-Dimensional Grid-Based Data
Authors:
Chandrika Kamath,
Juliette S. Franzman
Abstract:
Building an accurate surrogate model for the spatio-temporal outputs of a computer simulation is a challenging task. A simple approach to improve the accuracy of the surrogate is to cluster the outputs based on similarity and build a separate surrogate model for each cluster. This clustering is relatively straightforward when the output at each time step is of moderate size. However, when the spatial domain is represented by a large number of grid points, numbering in the millions, the clustering of the data becomes more challenging. In this report, we consider output data from simulations of a jet interacting with high explosives. These data are available on spatial domains of different sizes, at grid points that vary in their spatial coordinates, and in a format that distributes the output across multiple files at each time step of the simulation. We first describe how we bring these data into a consistent format prior to clustering. Borrowing the idea of random projections from data mining, we reduce the dimension of our data by a factor of a thousand, making it possible to use the iterative k-means method for clustering. We show how we can use the randomness of both the random projections and the choice of initial centroids in k-means clustering to determine the number of clusters in our data set. Our approach makes clustering of extremely high-dimensional data tractable, generating meaningful cluster assignments for our problem, despite the approximation introduced in the random projections.
Submitted 3 July, 2023;
originally announced July 2023.
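The dimension-reduction step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a dense Gaussian random matrix, which is one common form of random projection; the report's actual projection scheme, data sizes, and k-means settings are not given in this listing, and the names `random_projection` and `distance` are hypothetical.

```python
import math
import random


def random_projection(rows, target_dim, seed=0):
    """Project each row (a list of floats) down to target_dim dimensions.

    Uses a Gaussian random matrix scaled by 1/sqrt(target_dim), an
    assumption for illustration, so that pairwise distances are
    approximately preserved.
    """
    rng = random.Random(seed)
    source_dim = len(rows[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One random row per output dimension.
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(m * x for m, x in zip(matrix_row, row)) for matrix_row in matrix]
        for row in rows
    ]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.random() for _ in range(1000)]
    b = [rng.random() for _ in range(1000)]
    pa, pb = random_projection([a, b], 100, seed=42)
    # The ratio should be close to 1 if distances are roughly preserved.
    print(distance(pa, pb) / distance(a, b))
```

Because a distance-based method such as k-means depends only on distances between points, it can be run on the projected vectors at a fraction of the cost. Repeating the projection with different seeds gives a different approximation each time.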
-
Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part I -- Analysis with a Small Sample Size
Authors:
Chandrika Kamath,
Juliette S. Franzman,
Brian H. Daub
Abstract:
Computer simulations, especially of complex phenomena, can be expensive, requiring high-performance computing resources. Often, to understand a phenomenon, multiple simulations are run, each with a different set of simulation input parameters. These data are then used to create an interpolant, or surrogate, relating the simulation outputs to the corresponding inputs. When the inputs and outputs are scalars, a simple machine learning model can suffice. However, when the simulation outputs are vector valued, available at locations in two or three spatial dimensions, often with a temporal component, creating a surrogate is more challenging. In this report, we use a two-dimensional problem of a jet interacting with high explosives to understand how we can build high-quality surrogates. The characteristics of our data set are unique: the vector-valued outputs from each simulation are available at over two million spatial locations; each simulation is run for a relatively small number of time steps; the size of the computational domain varies with each simulation; and resource constraints limit the number of simulations we can run. We show how we analyze these extremely large data sets, set the parameters for the algorithms used in the analysis, and use simple ways to improve the accuracy of the spatio-temporal surrogates without substantially increasing the number of simulations required.
Submitted 3 July, 2023;
originally announced July 2023.
-
Data Mining for Faster, Interpretable Solutions to Inverse Problems: A Case Study Using Additive Manufacturing
Authors:
Chandrika Kamath,
Juliette Franzman,
Ravi Ponmalai
Abstract:
Solving inverse problems, where we find the input values that result in desired values of outputs, can be challenging. The solution process is often computationally expensive and it can be difficult to interpret the solution in high-dimensional input spaces. In this paper, we use a problem from additive manufacturing to address these two issues with the intent of making it easier to solve inverse problems and exploit their results. First, focusing on Gaussian process surrogates that are used to solve inverse problems, we describe how a simple modification to the idea of tapering can substantially speed up the surrogate without losing accuracy in prediction. Second, we demonstrate that Kohonen self-organizing maps can be used to visualize and interpret the solution to the inverse problem in the high-dimensional input space. For our data set, as not all input dimensions are equally important, we show that using weighted distances results in a better organized map that makes the relationships among the inputs obvious.
Submitted 7 June, 2023;
originally announced June 2023.
-
Intelligent sampling for surrogate modeling, hyperparameter optimization, and data analysis
Authors:
Chandrika Kamath
Abstract:
Sampling techniques are used in many fields, including design of experiments, image processing, and graphics. The techniques in each field are designed to meet the constraints specific to that field such as uniform coverage of the range of each dimension or random samples that are at least a certain distance apart from each other. When an application imposes new constraints, for example, by requiring samples in a non-rectangular domain or the addition of new samples to an existing set, a common solution is to modify the algorithm currently in use, often with less than satisfactory results. As an alternative, we propose the concept of intelligent sampling, where we devise algorithms specifically tailored to meet our sampling needs, either by creating new algorithms or by modifying suitable algorithms from other fields. Surprisingly, both qualitative and quantitative comparisons indicate that some relatively simple algorithms can be easily modified to meet the many sampling requirements of surrogate modeling, hyperparameter optimization, and data analysis; these algorithms outperform their more sophisticated counterparts currently in use, resulting in better use of time and computer resources.
Submitted 6 June, 2023;
originally announced June 2023.
-
Classification of Orbits in Poincaré Maps using Machine Learning
Authors:
Chandrika Kamath
Abstract:
Poincaré plots, also called Poincaré maps, are used by plasma physicists to understand the behavior of magnetically confined plasma in numerical simulations of a tokamak. These plots are created by the intersection of field lines with a two-dimensional poloidal plane that is perpendicular to the axis of the torus representing the tokamak. A plot is composed of multiple orbits, each created by a different field line as it goes around the torus. Each orbit can have one of four distinct shapes, or classes, that indicate changes in the topology of the magnetic fields confining the plasma. Given the (x,y) coordinates of the points that form an orbit, the analysis task is to assign a class to the orbit, a task that appears ideally suited for a machine learning approach. In this paper, we describe how we overcame two major challenges in solving this problem - creating a high-quality training set, with few mislabeled orbits, and converting the coordinates of the points into features that are discriminating, despite the variation within the orbits of a class and the apparent similarities between orbits of different classes. Our automated approach is not only more objective and accurate than visual classification, but is also less tedious, making it easier for plasma physicists to analyze the topology of magnetic fields from numerical simulations of the tokamak.
Submitted 17 May, 2023;
originally announced May 2023.
-
A Video-based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants
Authors:
Shaotong Zhu,
Michael Wan,
Elaheh Hatamimajoumerd,
Kashish Jain,
Samuel Zlota,
Cholpady Vikram Kamath,
Cassandra B. Rowan,
Emma C. Grace,
Matthew S. Goodwin,
Marie J. Hayes,
Rebecca A. Schwartz-Mette,
Emily Zimmerman,
Sarah Ostadabbas
Abstract:
We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS) -- an infant sucking pattern with no nutrition delivered -- as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity. Our NNS activity segmentation algorithm solves this problem by identifying periods of NNS with high certainty -- up to 94.0% average precision and 84.9% average recall across 30 heterogeneous 60 s clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants. Our method is based on an underlying NNS action recognition algorithm, which uses spatiotemporal deep learning networks and infant-specific pose estimation, achieving 94.9% accuracy in binary classification of 960 2.5 s balanced NNS vs. non-NNS clips. Tested on our second, independent, and public NNS in-the-wild dataset, NNS recognition classification reaches 92.3% accuracy, and NNS segmentation achieves 90.8% precision and 84.2% recall.
Submitted 29 March, 2023;
originally announced March 2023.
-
Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
Authors:
Jai Gupta,
Yi Tay,
Chaitanya Kamath,
Vinh Q. Tran,
Donald Metzler,
Shailesh Bavadekar,
Mimi Sun,
Evgeniy Gabrilovich
Abstract:
With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic. Given the protection they provide, vaccines are becoming mandatory in certain social and professional settings. This paper presents a classification model for detecting COVID-19 vaccination related search queries, a machine learning model that is used to generate search insights for COVID-19 vaccinations. The proposed method combines and leverages advancements from modern state-of-the-art (SOTA) natural language understanding (NLU) techniques such as pretrained Transformers with traditional dense features. We propose a novel approach of considering dense features as memory tokens that the model can attend to. We show that this new modeling approach enables a significant improvement to the Vaccine Search Insights (VSI) task, improving on a strong, well-established gradient-boosting baseline with relative gains of +15% in F1 score and +14% in precision.
Submitted 16 December, 2022;
originally announced December 2022.
-
Vaccine Search Patterns Provide Insights into Vaccination Intent
Authors:
Sean Malahy,
Mimi Sun,
Keith Spangler,
Jessica Leibler,
Kevin Lane,
Shailesh Bavadekar,
Chaitanya Kamath,
Akim Kumok,
Yuantong Sun,
Jai Gupta,
Tague Griffith,
Adam Boulanger,
Mark Young,
Charlotte Stanton,
Yael Mayer,
Karen Smith,
Tomer Shekel,
Katherine Chou,
Greg Corrado,
Jonathan Levy,
Adam Szpiro,
Evgeniy Gabrilovich,
Gregory A Wellenius
Abstract:
Despite ample supply of COVID-19 vaccines, the proportion of fully vaccinated individuals remains suboptimal across much of the US. Rapid vaccination of additional people will prevent new infections among both the unvaccinated and the vaccinated, thus saving lives. With the rapid rollout of vaccination efforts this year, the internet has become a dominant source of information about COVID-19 vaccines, their safety and efficacy, and their availability. We sought to evaluate whether trends in internet searches related to COVID-19 vaccination - as reflected by Google's Vaccine Search Insights (VSI) index - could be used as a marker of population-level interest in receiving a vaccination. We found that between January and August of 2021: 1) Google's weekly VSI index was associated with the number of new vaccinations administered in the subsequent three weeks, and 2) the average VSI index in earlier months was strongly correlated (up to r = 0.89) with vaccination rates many months later. Given these results, we illustrate an approach by which data on search interest may be combined with other available data to inform local public health outreach and vaccination efforts. These results suggest that the VSI index may be useful as a leading indicator of population-level interest in or intent to obtain a COVID-19 vaccine, especially early in the vaccine deployment efforts. These results may be relevant to current efforts to administer COVID-19 vaccines to unvaccinated individuals, to newly eligible children, and to those eligible to receive a booster shot. More broadly, these results highlight the opportunities for anonymized and aggregated internet search data, available in near real-time, to inform the response to public health emergencies.
Submitted 22 November, 2021;
originally announced November 2021.
-
Google COVID-19 Vaccination Search Insights: Anonymization Process Description
Authors:
Shailesh Bavadekar,
Adam Boulanger,
John Davis,
Damien Desfontaines,
Evgeniy Gabrilovich,
Krishna Gadepalli,
Badih Ghazi,
Tague Griffith,
Jai Gupta,
Chaitanya Kamath,
Dennis Kraft,
Ravi Kumar,
Akim Kumok,
Yael Mayer,
Pasin Manurangsi,
Arti Patankar,
Irippuge Milinda Perera,
Chris Scott,
Tomer Shekel,
Benjamin Miller,
Karen Smith,
Charlotte Stanton,
Mimi Sun,
Mark Young,
Gregory Wellenius
Abstract:
This report describes the aggregation and anonymization process applied to the COVID-19 Vaccination Search Insights (published at http://goo.gle/covid19vaccinationinsights), a publicly available dataset showing aggregated and anonymized trends in Google searches related to COVID-19 vaccination. The applied anonymization techniques protect every user's daily search activity related to COVID-19 vaccinations with $(\varepsilon, \delta)$-differential privacy for $\varepsilon = 2.19$ and $\delta = 10^{-5}$.
Submitted 7 July, 2021; v1 submitted 2 July, 2021;
originally announced July 2021.
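For reference, the guarantee named in the abstract above is usually stated as follows. This is the standard textbook definition, not a description of the report's mechanism; the symbols $M$ (the randomized release mechanism), $D$ and $D'$ (two datasets differing in one user's daily contribution), and $S$ (any set of possible outputs) are notation introduced here for illustration.

$$\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta$$

With the reported values $\varepsilon = 2.19$ and $\delta = 10^{-5}$, the multiplicative factor is $e^{2.19} \approx 8.9$.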
-
Randomized Algorithms for Scientific Computing (RASC)
Authors:
Aydin Buluc,
Tamara G. Kolda,
Stefan M. Wild,
Mihai Anitescu,
Anthony DeGennaro,
John Jakeman,
Chandrika Kamath,
Ramakrishnan Kannan,
Miles E. Lopes,
Per-Gunnar Martinsson,
Kary Myers,
Jelani Nelson,
Juan M. Restrepo,
C. Seshadhri,
Draguna Vrabie,
Brendt Wohlberg,
Stephen J. Wright,
Chao Yang,
Peter Zwart
Abstract:
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
Submitted 21 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (version 1.0)
Authors:
Shailesh Bavadekar,
Andrew Dai,
John Davis,
Damien Desfontaines,
Ilya Eckstein,
Katie Everett,
Alex Fabrikant,
Gerardo Flores,
Evgeniy Gabrilovich,
Krishna Gadepalli,
Shane Glass,
Rayman Huang,
Chaitanya Kamath,
Dennis Kraft,
Akim Kumok,
Hinali Marfatia,
Yael Mayer,
Benjamin Miller,
Adam Pearce,
Irippuge Milinda Perera,
Venky Ramachandran,
Karthik Raman,
Thomas Roessler,
Izhak Shafran,
Tomer Shekel
, et al. (5 additional authors not shown)
Abstract:
This report describes the aggregation and anonymization process applied to the initial version of the COVID-19 Search Trends symptoms dataset (published at https://goo.gle/covid19symptomdataset on September 2, 2020), a publicly available dataset that shows aggregated, anonymized trends in Google searches for symptoms (and some related topics). The anonymization process is designed to protect the daily symptom search activity of every user with $\varepsilon$-differential privacy for $\varepsilon = 1.68$.
Submitted 2 September, 2020;
originally announced September 2020.
-
Impacts of Social Distancing Policies on Mobility and COVID-19 Case Growth in the US
Authors:
Gregory A. Wellenius,
Swapnil Vispute,
Valeria Espinosa,
Alex Fabrikant,
Thomas C. Tsai,
Jonathan Hennessy,
Andrew Dai,
Brian Williams,
Krishna Gadepalli,
Adam Boulanger,
Adam Pearce,
Chaitanya Kamath,
Arran Schlosberg,
Catherine Bendebury,
Chinmoy Mandayam,
Charlotte Stanton,
Shailesh Bavadekar,
Christopher Pluntke,
Damien Desfontaines,
Benjamin Jacobson,
Zan Armstrong,
Bryant Gipson,
Royce Wilson,
Andrew Widdowson,
Katherine Chou
, et al. (4 additional authors not shown)
Abstract:
Social distancing remains an important strategy to combat the COVID-19 pandemic in the United States. However, the impacts of specific state-level policies on mobility and subsequent COVID-19 case trajectories have not been completely quantified. Using anonymized and aggregated mobility data from opted-in Google users, we found that state-level emergency declarations resulted in a 9.9% reduction in time spent away from places of residence. Implementation of one or more social distancing policies resulted in an additional 24.5% reduction in mobility the following week, and subsequent shelter-in-place mandates yielded an additional 29.0% reduction. Decreases in mobility were associated with substantial reductions in case growth 2 to 4 weeks later. For example, a 10% reduction in mobility was associated with a 17.5% reduction in case growth 2 weeks later. Given the continued reliance on social distancing policies to limit the spread of COVID-19, these results may be helpful to public health officials trying to balance infection control with the economic and social consequences of these policies.
Submitted 27 May, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Google COVID-19 Community Mobility Reports: Anonymization Process Description (version 1.1)
Authors:
Ahmet Aktay,
Shailesh Bavadekar,
Gwen Cossoul,
John Davis,
Damien Desfontaines,
Alex Fabrikant,
Evgeniy Gabrilovich,
Krishna Gadepalli,
Bryant Gipson,
Miguel Guevara,
Chaitanya Kamath,
Mansi Kansal,
Ali Lange,
Chinmoy Mandayam,
Andrew Oplinger,
Christopher Pluntke,
Thomas Roessler,
Arran Schlosberg,
Tomer Shekel,
Swapnil Vispute,
Mia Vu,
Gregory Wellenius,
Brian Williams,
Royce J Wilson
Abstract:
This document describes the aggregation and anonymization process applied to the initial version of Google COVID-19 Community Mobility Reports (published at http://google.com/covid19/mobility on April 2, 2020), a publicly available resource intended to help public health authorities understand what has changed in response to work-from-home, shelter-in-place, and other recommended policies aimed at flattening the curve of the COVID-19 pandemic. Our anonymization process is designed to ensure that no personal data, including an individual's location, movement, or contacts, can be derived from the resulting metrics.
The high-level description of the procedure is as follows: we first generate a set of anonymized metrics from the data of Google users who opted in to Location History. Then, we compute percentage changes of these metrics from a baseline based on the historical part of the anonymized metrics. We then discard a subset which does not meet our bar for statistical reliability, and release the rest publicly in a format that compares the result to the private baseline.
Submitted 3 November, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
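The "percentage change from a baseline, then discard unreliable values" steps in the abstract above can be sketched as follows. This is a minimal illustration only: the anonymization step is omitted entirely, and the reliability rule (a minimum baseline value, `min_baseline`) and the function name `percent_changes` are hypothetical stand-ins, since the listing does not state the actual criterion.

```python
def percent_changes(current, baseline, min_baseline=1.0):
    """Return {category: percent change from baseline}.

    Categories whose baseline falls below min_baseline (a hypothetical
    reliability bar) or that are missing from `current` are discarded.
    """
    changes = {}
    for key, base in baseline.items():
        if key not in current or base < min_baseline:
            continue  # discarded: does not meet the assumed reliability bar
        changes[key] = 100.0 * (current[key] - base) / base
    return changes


if __name__ == "__main__":
    current = {"parks": 80.0, "transit": 50.0, "rare": 1.0}
    baseline = {"parks": 100.0, "transit": 40.0, "rare": 0.5}
    print(percent_changes(current, baseline))
```

Only the relative changes are returned, which mirrors the abstract's point that results are released relative to a baseline that itself stays private.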
-
Blind Detection of Ultra-faint Streaks with a Maximum Likelihood Method
Authors:
William A. Dawson,
Michael D. Schneider,
Chandrika Kamath
Abstract:
We have developed a maximum likelihood source detection method capable of detecting ultra-faint streaks with surface brightnesses approximately an order of magnitude fainter than the pixel-level noise. Our maximum likelihood detection method is a model-based approach that requires no a priori knowledge about the streak location, orientation, length, or surface brightness. This method enables discovery of typically undiscovered objects, and enables the utilization of low-cost sensors (i.e., higher-noise data). The method also easily facilitates multi-epoch co-addition. We will present the results from the application of this method to simulations, as well as real low Earth orbit observations.
Submitted 22 September, 2016;
originally announced September 2016.