Showing 1–50 of 65 results for author: Miller, R

Searching in archive cs.
  1. arXiv:2405.15805  [pdf, other]

    q-bio.NC cs.AI cs.LG

    DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks

    Authors: Bishal Thapaliya, Robyn Miller, Jiayu Chen, Yu-Ping Wang, Esra Akbas, Ram Sapkota, Bhaskar Ray, Pranav Suresh, Santosh Ghimire, Vince Calhoun, Jingyu Liu

    Abstract: Resting-state functional magnetic resonance imaging (rs-fMRI) is a noninvasive technique pivotal for understanding human neural mechanisms of intricate cognitive processes. Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest, or dynamic functional connectivity matrices with a sliding window approach. These approaches are at risk of oversimpl…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 18 Pages, 4 figures

  2. arXiv:2405.08784  [pdf, other]

    cs.CL cs.SI

    Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram

    Authors: Aehong Min, Xuan Wang, Rion Brattig Correia, Jordan Rozum, Wendy R. Miller, Luis M. Rocha

    Abstract: We used a dictionary built from biomedical terminology extracted from various sources such as DrugBank, MedDRA, MedlinePlus, TCMGeneDIT, to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false-positives. Open…

    Submitted 14 May, 2024; originally announced May 2024.

  3. arXiv:2403.14128  [pdf, other]

    cs.DB

    Gen-T: Table Reclamation in Data Lakes

    Authors: Grace Fan, Roee Shraga, Renée J. Miller

    Abstract: We introduce the problem of Table Reclamation. Given a Source Table and a large table repository, reclamation finds a set of tables that, when integrated, reproduce the source table as closely as possible. Unlike query discovery problems like Query-by-Example or by-Target, Table Reclamation focuses on reclaiming the data in the Source Table as fully as possible using real tables that may be incomp…

    Submitted 22 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: to appear at ICDE 2024

  4. arXiv:2403.02327  [pdf, other]

    cs.DB cs.AI

    Model Lakes

    Authors: Koyena Pal, David Bau, Renée J. Miller

    Abstract: Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners rely on manually-written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of machine learning models increases, this issue o…

    Submitted 4 March, 2024; originally announced March 2024.

  5. arXiv:2402.06751  [pdf, other]

    cs.LG

    Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse

    Authors: Bradley T. Baker, Barak A. Pearlmutter, Robyn Miller, Vince D. Calhoun, Sergey M. Plis

    Abstract: Our understanding of learning dynamics of deep neural networks (DNNs) remains incomplete. Recent research has begun to uncover the mathematical principles underlying these networks, including the phenomenon of "Neural Collapse", where linear classifiers within DNNs converge to specific geometrical structures during late-stage training. However, the role of geometric constraints in learning extends…

    Submitted 9 February, 2024; originally announced February 2024.

  6. arXiv:2401.12088  [pdf, other]

    cs.CL

    Unsupervised Learning of Graph from Recipes

    Authors: Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

    Abstract: Cooking recipes are one of the most readily available kinds of procedural text. They consist of natural language instructions that can be challenging to interpret. In this paper, we propose a model to identify relevant information from recipes and generate a graph to represent the sequence of actions in the recipe. In contrast with other approaches, we use an unsupervised approach. We iteratively…

    Submitted 22 January, 2024; originally announced January 2024.

  7. arXiv:2401.06930  [pdf, other]

    cs.CL

    PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes

    Authors: Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

    Abstract: Decoding the core of procedural texts, exemplified by cooking recipes, is crucial for intelligent reasoning and instruction automation. Procedural texts can be comprehensively defined as a sequential chain of steps to accomplish a task employing resources. From a cooking perspective, these instructions can be interpreted as a series of modifications to a food preparation, which initially comprises…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: The data is available at: https://github.com/adiallo07/PizzaCommonsense

  8. arXiv:2401.06148  [pdf, other]

    eess.IV cs.AI cs.CV q-bio.QM

    Artificial Intelligence for Digital and Computational Pathology

    Authors: Andrew H. Song, Guillaume Jaume, Drew F. K. Williamson, Ming Y. Lu, Anurag Vaidya, Tiffany R. Miller, Faisal Mahmood

    Abstract: Advances in digitizing tissue slides and the fast-paced progress in artificial intelligence, including deep learning, have boosted the field of computational pathology. This field holds tremendous potential to automate clinical diagnosis, predict patient prognosis and response to therapy, and discover new morphological biomarkers from tissue images. Some of these artificial intelligence-based syst…

    Submitted 12 December, 2023; originally announced January 2024.

    Journal ref: Nature Reviews Bioengineering 2023

  9. arXiv:2312.08383  [pdf]

    cs.LG cs.AI

    Improving age prediction: Utilizing LSTM-based dynamic forecasting for data augmentation in multivariate time series analysis

    Authors: Yutong Gao, Charles A. Ellis, Vince D. Calhoun, Robyn L. Miller

    Abstract: The high dimensionality and complexity of neuroimaging data necessitate large datasets to develop robust and high-performing deep learning models. However, the neuroimaging field is notably hampered by the scarcity of such datasets. In this work, we proposed a data augmentation and validation framework that utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to enrich datasets…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, conference

  10. arXiv:2310.16354  [pdf]

    cs.AR

    RAMPART: RowHammer Mitigation and Repair for Server Memory Systems

    Authors: Steven C. Woo, Wendy Elsasser, Mike Hamburg, Eric Linstadt, Michael R. Miller, Taeksang Song, James Tringali

    Abstract: RowHammer attacks are a growing security and reliability concern for DRAMs and computer systems as they can induce many bit errors that overwhelm error detection and correction capabilities. System-level solutions are needed as process technology and circuit improvements alone are unlikely to provide complete protection against RowHammer attacks in the future. This paper introduces RAMPART, a nove…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 16 pages, 13 figures. A version of this paper will appear in the Proceedings of MEMSYS23

    ACM Class: B.3.1; B.3.4

  11. arXiv:2310.02656  [pdf, other]

    cs.DB

    Blend: A Unified Data Discovery System

    Authors: Mahdi Esmailoghli, Christoph Schnell, Renée J. Miller, Ziawasch Abedjan

    Abstract: Data discovery is an iterative and incremental process that necessitates the execution of multiple data discovery queries to identify the desired tables from large and diverse data lakes. Current methodologies concentrate on single discovery tasks such as join, correlation, or union discovery. However, in practice, a series of these approaches and their corresponding index structures are necessary…

    Submitted 4 October, 2023; originally announced October 2023.

  12. arXiv:2308.03883  [pdf, other]

    cs.DB cs.CL cs.LG

    Generative Benchmark Creation for Table Union Search

    Authors: Koyena Pal, Aamod Khatiwada, Roee Shraga, Renée J. Miller

    Abstract: Data management has traditionally relied on synthetic data generators to generate structured benchmarks, like the TPC suite, where we can control important parameters like data size and its distribution precisely. These benchmarks were central to the success and adoption of database management systems. But more and more, data management problems are of a semantic nature. An important example is fi…

    Submitted 7 August, 2023; originally announced August 2023.

  13. arXiv:2306.09042  [pdf, ps, other]

    cs.AI

    A Graphical Formalism for Commonsense Reasoning with Recipes

    Authors: Antonis Bikakis, Aissatou Diallo, Luke Dickens, Anthony Hunter, Rob Miller

    Abstract: Whilst cooking is a very important human activity, there has been little consideration given to how we can formalize recipes for use in a reasoning framework. We address this need by proposing a graphical formalization that captures the comestibles (ingredients, intermediate food items, and final products), and the actions on comestibles in the form of a labelled bipartite graph. We then propose f…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: 10 pages

  14. DIALITE: Discover, Align and Integrate Open Data Tables

    Authors: Aamod Khatiwada, Roee Shraga, Renée J. Miller

    Abstract: We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using state-of-the-art table discovery techniques. Second, DIALITE integrates the discovered tables to produce an integrated table. Finally, it allows users to analyze…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: SIGMOD 2023

  15. Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes

    Authors: William F. Godoy, Pedro Valero-Lara, T. Elise Dettling, Christian Trefftz, Ian Jorquera, Thomas Sheehy, Ross G. Miller, Marc Gonzalez-Tallada, Jeffrey S. Vetter, Valentin Churavy

    Abstract: We explore the performance and portability of the high-level programming models: the LLVM-based Julia and Python/Numba, and Kokkos on high-performance computing (HPC) nodes: AMD Epyc CPUs and MI250X graphical processing units (GPUs) on Frontier's test bed Crusher system and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facilities. We comp…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at the 28th HIPS workshop, held in conjunction with IPDPS 2023. 10 pages, 9 figures

  16. arXiv:2303.01430  [pdf, other]

    cs.CR

    A Large-Scale Study of Personal Identifiability of Virtual Reality Motion Over Time

    Authors: Mark Roman Miller, Eugy Han, Cyan DeVeaux, Eliot Jones, Ryan Chen, Jeremy N. Bailenson

    Abstract: In recent years, social virtual reality (VR), sometimes described as the "metaverse," has become widely available. With its potential comes risks, including risks to privacy. To understand these risks, we study the identifiability of participants' motion in VR in a dataset of 232 VR users with eight weekly sessions of about thirty minutes each, totaling 764 hours of social interaction. The sample…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 15 pages, 5 figures

  17. arXiv:2301.13095  [pdf, other]

    cs.DB

    Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V (Technical Report)

    Authors: Roee Shraga, Renée J. Miller

    Abstract: In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce Explain-Da-V, a framework aiming to explain changes between two give…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: To appear in VLDB 2023

  18. arXiv:2211.10335  [pdf, other]

    eess.SP cs.LG

    Large Scale Radio Frequency Wideband Signal Detection & Recognition

    Authors: Luke Boegner, Garrett Vanhoy, Phillip Vallance, Manbir Gulati, Dresden Feitzinger, Bradley Comar, Robert D. Miller

    Abstract: Applications of deep learning to the radio frequency (RF) domain have largely concentrated on the task of narrowband signal classification after the signals of interest have already been detected and extracted from a wideband capture. To encourage broader research with wideband operations, we introduce the WidebandSig53 (WBSig53) dataset which consists of 550 thousand synthetically-generated sampl…

    Submitted 4 November, 2022; originally announced November 2022.

  19. arXiv:2210.03667  [pdf, other]

    q-bio.NC cs.LG cs.SI

    CommsVAE: Learning the brain's macroscale communication dynamics using coupled sequential VAEs

    Authors: Eloy Geenjaar, Noah Lewis, Amrit Kashyap, Robyn Miller, Vince Calhoun

    Abstract: Communication within or between complex systems is commonplace in the natural sciences and fields such as graph neural networks. The brain is a perfect example of such a complex system, where communication between brain regions is constantly being orchestrated. To analyze communication, the brain is often split up into anatomical regions that each perform certain computations. These regions must i…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 14 pages, 8 figures

  20. arXiv:2210.01922  [pdf, other]

    cs.DB

    Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning

    Authors: Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée Miller

    Abstract: Dataset discovery from data lakes is essential in many real application scenarios. In this paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes (with table union search as the main use case). Our proposed framework features a contrastive learning method to train column encoders from pre-trained language models in a fully unsupervised manner. The column encoder o…

    Submitted 15 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

  21. arXiv:2209.13589  [pdf, other]

    cs.DB

    SANTOS: Relationship-based Semantic Table Union Search

    Authors: Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

    Abstract: Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. Consequently, we introduce a new n…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 15 pages, 10 figures, to appear at SIGMOD 2023

  22. arXiv:2209.09731  [pdf]

    cs.DC cs.AR

    Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

    Authors: Wael Elwasif, William Godoy, Nick Hagerty, J. Austin Harris, Oscar Hernandez, Balint Joo, Paul Kent, Damien Lebrun-Grandie, Elijah Maccarthy, Veronica G. Melesse Vergara, Bronson Messer, Ross Miller, Sarp Opal, Sergei Bastrakov, Michael Bussmann, Alexander Debus, Klaus Steinger, Jan Stephan, Rene Widera, Spencer H. Bryngelson, Henry Le Berre, Anand Radhakrishnan, Jefferey Young, Sunita Chandrasekaran, Florina Ciorba, et al. (6 additional authors not shown)

    Abstract: This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and A100 data center GPU from NVIDIA Corp. The syst…

    Submitted 19 December, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

  23. arXiv:2207.09918  [pdf, other]

    cs.LG eess.SP

    Large Scale Radio Frequency Signal Classification

    Authors: Luke Boegner, Manbir Gulati, Garrett Vanhoy, Phillip Vallance, Bradley Comar, Silvija Kokalj-Filipovic, Craig Lennon, Robert D. Miller

    Abstract: Existing datasets used to train deep learning models for narrowband radio frequency (RF) signal classification lack enough diversity in signal types and channel impairments to sufficiently assess model performance in the real world. We introduce the Sig53 dataset consisting of 5 million synthetically-generated samples from 53 different signal classes and expertly chosen impairments. We also introd…

    Submitted 20 July, 2022; originally announced July 2022.

  24. arXiv:2205.13640  [pdf, other]

    cs.CV

    Spatio-temporally separable non-linear latent factor learning: an application to somatomotor cortex fMRI data

    Authors: Eloy Geenjaar, Amrit Kashyap, Noah Lewis, Robyn Miller, Vince Calhoun

    Abstract: Functional magnetic resonance imaging (fMRI) data contain complex spatiotemporal dynamics, thus researchers have developed approaches that reduce the dimensionality of the signal while extracting relevant and interpretable dynamics. Models of fMRI data that can perform whole-brain discovery of dynamical latent factors are understudied. The benefits of approaches such as linear independent componen…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: 12 pages, 3 figures

  25. arXiv:2201.07552  [pdf, other]

    q-bio.QM cs.CY cs.SI stat.CO

    Small Cohort of Epilepsy Patients Showed Increased Activity on Facebook before Sudden Unexpected Death

    Authors: Ian B. Wood, Rion Brattig Correia, Wendy R. Miller, Luis M. Rocha

    Abstract: Sudden Unexpected Death in Epilepsy (SUDEP) remains a leading cause of death in people with epilepsy. Despite the constant risk for patients and bereavement to family members, to date the physiological mechanisms of SUDEP remain unknown. Here we explore the potential to identify putative predictive signals of SUDEP from online digital behavioral data using text and sentiment analysis. Specifically…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: Submitted to Epilepsy & Behavior

    MSC Class: 62P10 (Primary) 92D50; 68U15; 92D30 (Secondary) ACM Class: J.3; I.5.4

  26. arXiv:2110.10780  [pdf]

    cs.CL cs.IR

    An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)

    Authors: Sijia Liu, Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Robert Miller, Andrew Williams, Daniel Harris, Ramakanth Kavuluru, Mei Liu, Noor Abu-el-rub, Dalton Schutte, Rui Zhang, Masoud Rouhizadeh, John D. Osborne, Yongqun He, Umit Topaloglu, Stephanie S Hong, Joel H Saltz, Thomas Schaffter, Emily Pfaff, Christopher G. Chute, Tim Duong, Melissa A. Haendel, Rafael Fuentes, et al. (7 additional authors not shown)

    Abstract: While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopt NLP models due to limited transparency, interpretability, and usability. In this study, we proposed an open natural language processing development framework. We evaluated it through the implementation of NLP algori…

    Submitted 21 March, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: update on contents

  27. arXiv:2109.08425  [pdf, ps, other]

    cs.AI

    Repurposing of Resources: from Everyday Problem Solving through to Crisis Management

    Authors: Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

    Abstract: The human ability to repurpose objects and processes is universal, but it is not a well-understood aspect of human intelligence. Repurposing arises in everyday situations such as finding substitutes for missing ingredients when cooking, or for unavailable tools when doing DIY. It also arises in critical, unprecedented situations needing crisis management. After natural disasters and during wartime…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: 16 pages

    ACM Class: I.2.4; I.2.6; I.2.7

  28. arXiv:2106.16087  [pdf, other]

    eess.SP cs.LG

    Reservoir Based Edge Training on RF Data To Deliver Intelligent and Efficient IoT Spectrum Sensors

    Authors: Silvija Kokalj-Filipovic, Paul Toliver, William Johnson, Rob Miller

    Abstract: Current radio frequency (RF) sensors at the Edge lack the computational resources to support practical, in-situ training for intelligent spectrum monitoring, and sensor data classification in general. We propose a solution via Deep Delay Loop Reservoir Computing (DLR), a processing architecture that supports general machine learning algorithms on compact mobile devices by leveraging delay-loop res…

    Submitted 1 April, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:2104.00751

  29. arXiv:2105.08053  [pdf, other]

    cs.LG

    Algorithm-Agnostic Explainability for Unsupervised Clustering

    Authors: Charles A. Ellis, Mohammad S. E. Sendi, Eloy P. T. Geenjaar, Sergey M. Plis, Robyn L. Miller, Vince D. Calhoun

    Abstract: Supervised machine learning explainability has developed rapidly in recent years. However, clustering explainability has lagged behind. Here, we demonstrate the first adaptation of model-agnostic explainability methods to explain unsupervised clustering. We present two novel "algorithm-agnostic" explainability methods - global permutation percent change (G2PC) and local perturbation percent change…

    Submitted 28 August, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: 22 pages, 6 figures

  30. arXiv:2104.00751  [pdf, other]

    cs.LG cs.NE eess.SP

    Reservoir-Based Distributed Machine Learning for Edge Operation

    Authors: Silvija Kokalj-Filipovic, Paul Toliver, William Johnson, Rob Miller

    Abstract: We introduce a novel design for in-situ training of machine learning algorithms built into smart sensors, and illustrate distributed training scenarios using radio frequency (RF) spectrum sensors. Current RF sensors at the Edge lack the computational resources to support practical, in-situ training for intelligent signal classification. We propose a solution using Deep Delay Loop Reservoir Computin…

    Submitted 1 April, 2021; originally announced April 2021.

  31. arXiv:2103.09940  [pdf, other]

    cs.DB

    DomainNet: Homograph Detection for Data Lake Disambiguation

    Authors: Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

    Abstract: Modern data lakes are deeply heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: how can we determine if a data value occurring more than once in the lake has different meanings and is therefore a homograph? While word and entity disambiguation have been well studied in computational linguistics, data management and data science, we sh…

    Submitted 22 March, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

    Comments: Full version of paper appearing in EDBT 2021

  32. arXiv:2010.12446  [pdf, other]

    eess.IV cs.CV cs.LG

    Estimation of Cardiac Valve Annuli Motion with Deep Learning

    Authors: Eric Kerfoot, Carlos Escudero King, Tefvik Ismail, David Nordsletten, Renee Miller

    Abstract: Valve annuli motion and morphology, measured from non-invasive imaging, can be used to gain a better understanding of healthy and pathological heart function. Measurements such as long-axis strain as well as peak strain rates provide markers of systolic function. Likewise, early and late-diastolic filling velocities are used as indicators of diastolic function. Quantifying global strains, however,…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: 10 pages, STACOM abstract

  33. arXiv:2010.01745  [pdf, other]

    cs.CL cs.LG

    On the Effects of Knowledge-Augmented Data in Word Embeddings

    Authors: Diego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens, Rob Miller, Andreas Vlachidis

    Abstract: This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. W…

    Submitted 4 October, 2020; originally announced October 2020.

    Comments: 10 pages, 5 figures, submitted to ACL 2020

    ACM Class: I.2.7

  34. arXiv:2008.02167  [pdf, other]

    cs.DS

    GeoTree: a data structure for constant time geospatial search enabling a real-time mix-adjusted median property price index

    Authors: Robert Miller, Phil Maguire

    Abstract: A common problem appearing across the field of data science is $k$-NN ($k$-nearest neighbours), particularly within the context of Geographic Information Systems. In this article, we present a novel data structure, the GeoTree, which holds a collection of geohashes (string encodings of GPS co-ordinates). This enables a constant $O(1)$ time search algorithm that returns a set of geohashe…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: 7 pages, 7 figures, 2 tables

  35. arXiv:2008.01208  [pdf, other]

    cs.DB

    Knowledge Translation: Extended Technical Report

    Authors: Bahar Ghadiri Bashardoost, Renée J. Miller, Kelly Lyons, Fatemeh Nargesian

    Abstract: We introduce Kensho, a tool for generating mapping rules between two Knowledge Bases (KBs). To create the mapping rules, Kensho starts with a set of correspondences and enriches them with additional semantic information automatically identified from the structure and constraints of the KBs. Our approach works in two phases. In the first phase, semantic associations between resources of each KB are…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Extended technical report of "Knowledge Translation" paper, accepted in VLDB 2020

  36. arXiv:1910.12603  [pdf, other]

    cs.CY cs.CR cs.LG

    A blockchain-orchestrated Federated Learning architecture for healthcare consortia

    Authors: Jonathan Passerat-Palmbach, Tyler Farnan, Robert Miller, Marielle S. Gross, Heather Leigh Flannery, Bill Gleim

    Abstract: We propose a novel architecture for federated learning within healthcare consortia. At the heart of the solution is a unique integration of privacy preserving technologies, built upon native enterprise blockchain components available in the Ethereum ecosystem. We show how the specific characteristics and challenges of healthcare consortia informed our design choices, notably the conception of a ne…

    Submitted 12 October, 2019; originally announced October 2019.

  37. arXiv:1908.10993  [pdf, other]

    cs.CL cs.AI cs.DL

    Scientific Statement Classification over arXiv.org

    Authors: Deyan Ginev, Bruce R. Miller

    Abstract: We introduce a new classification task for scientific statements and release a large-scale dataset for supervised learning. Our resource is derived from a machine-readable representation of the ar** 10.5 million annotated paragraphs into thirteen classes. We demonst…

    Submitted 28 August, 2019; originally announced August 2019.

  38. arXiv:1907.09052  [pdf, ps, other]

    cs.RO math.OC

    Hardware-In-the-Loop for Connected Automated Vehicles Testing in Real Traffic

    Authors: Yeojun Kim, Samuel Tay, Jacopo Guanetti, Francesco Borrelli, Ryan Miller

    Abstract: We present a hardware-in-the-loop (HIL) simulation setup for repeatable testing of Connected Automated Vehicles (CAVs) in dynamic, real-world scenarios. Our goal is to test control and planning algorithms and their distributed implementation on the vehicle hardware and, possibly, in the cloud. The HIL setup combines PreScan for perception sensors, road topography, and signalized intersections; Vis…

    Submitted 21 July, 2019; originally announced July 2019.

    Comments: This work was presented at the 14th International Symposium in Advanced Vehicle Control (AVEC '18)

  39. arXiv:1904.11874  [pdf, other]

    cs.NI cs.LG eess.SP stat.ML

    AutoEncoders for Training Compact Deep Learning RF Classifiers for Wireless Protocols

    Authors: Silvija Kokalj-Filipovic, Rob Miller, Joshua Morman

    Abstract: We show that compact fully connected (FC) deep learning networks trained to classify wireless protocols using a hierarchy of multiple denoising autoencoders (AEs) outperform reference FC networks trained in a typical way, i.e., with a stochastic gradient based optimization of a given FC architecture. Not only is the complexity of such FC network, measured in number of trainable parameters and scal…

    Submitted 12 April, 2019; originally announced April 2019.

  40. arXiv:1903.02407  [pdf, other]

    cs.LG cs.GT stat.ML

    Explaining Anomalies Detected by Autoencoders Using SHAP

    Authors: Liat Antwarg, Ronnie Mindlin Miller, Bracha Shapira, Lior Rokach

    Abstract: Anomaly detection algorithms are often thought to be limited because they don't facilitate the process of validating results performed by domain experts. In contrast, deep learning algorithms for anomaly detection, such as autoencoders, point out the outliers, saving experts the time-consuming task of examining normal cases in order to find anomalies. Most outlier detection algorithms output a sco…

    Submitted 30 June, 2020; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: Added more evaluation

  41. arXiv:1902.08034  [pdf, other]

    eess.SP cs.LG stat.ML

    Mitigation of Adversarial Examples in RF Deep Classifiers Utilizing AutoEncoder Pre-training

    Authors: Silvija Kokalj-Filipovic, Rob Miller, Nicholas Chang, Chi Leung Lau

    Abstract: Adversarial examples in machine learning for images are widely publicized and explored. Illustrations of misclassifications caused by slightly perturbed inputs are abundant and commonly known (e.g., a picture of panda imperceptibly perturbed to fool the classifier into incorrectly labeling it as a gibbon). Similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation…

    Submitted 16 February, 2019; originally announced February 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1902.06044

  42. arXiv:1902.06044  [pdf, other]

    cs.LG stat.ML

    Adversarial Examples in RF Deep Learning: Detection of the Attack and its Physical Robustness

    Authors: Silvija Kokalj-Filipovic, Rob Miller

    Abstract: While research on adversarial examples in machine learning for images has been prolific, similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation strategies are scarcely addressed in the published work, with only one recent publication in the RF domain [1]. RF adversarial examples (AdExs) can cause drastic, targeted misclassification results mostly in spectrum se…

    Submitted 16 February, 2019; originally announced February 2019.

  43. arXiv:1812.07024  [pdf, other]

    cs.DB

    Data Lake Organization

    Authors: Fatemeh Nargesian, Ken Q. Pu, Bahar Ghadiri Bashardoost, Erkang Zhu, Renée J. Miller

    Abstract: We consider the problem of creating a navigation structure that allows a user to most effectively navigate a data lake. We define an organization as a graph that contains nodes representing sets of attributes within a data lake and edges indicating subset relationships among nodes. We present a new probabilistic model of how users interact with an organization and define the likelihood of a user f…

    Submitted 2 March, 2020; v1 submitted 17 December, 2018; originally announced December 2018.

  44. arXiv:1811.02629  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

    Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, et al. (402 additional authors not shown)

    Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…

    Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

  45. arXiv:1806.01949  [pdf, ps, other]

    cs.CE math.NA physics.comp-ph stat.ML

    Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications

    Authors: A. Hunter, B. A. Moore, M. K. Mudunuru, V. T. Chau, R. L. Miller, R. B. Tchoua, C. Nyshadham, S. Karra, D. O. Malley, E. Rougier, H. S. Viswanathan, G. Srinivasan

    Abstract: In this paper, five different approaches for reduced-order modeling of brittle fracture in geomaterials, specifically concrete, are presented and compared. Four of the five methods rely on machine learning (ML) algorithms to approximate important aspects of the brittle fracture problem. In addition to the ML algorithms, each method incorporates different physics-based assumptions in order to reduc…

    Submitted 5 June, 2018; originally announced June 2018.

    Comments: 25 pages, 8 figures

  46. arXiv:1710.02599  [pdf, other]

    cs.HC

    Rotation Blurring: Use of Artificial Blurring to Reduce Cybersickness in Virtual Reality First Person Shooters

    Authors: Pulkit Budhiraja, Mark Roman Miller, Abhishek K Modi, David Forsyth

    Abstract: Users of Virtual Reality (VR) systems often experience vection, the perception of self-motion in the absence of any physical movement. While vection helps to improve presence in VR, it often leads to a form of motion sickness called cybersickness. Cybersickness is a major deterrent to large scale adoption of VR. Prior work has discovered that changing vection (changing the perceived speed or mov…

    Submitted 6 October, 2017; originally announced October 2017.

  47. arXiv:1703.06815  [pdf, ps, other]

    cs.AI

    Foundations for a Probabilistic Event Calculus

    Authors: Fabio Aurelio D'Asaro, Antonis Bikakis, Luke Dickens, Rob Miller

    Abstract: We present PEC, an Event Calculus (EC) style action language for reasoning about probabilistic causal and narrative information. It has an action language style syntax similar to that of the EC variant Modular-E. Its semantics is given in terms of possible worlds which constitute possible evolutions of the domain, and builds on that of EFEC, an epistemic extension of EC. We also describe an ASP im…

    Submitted 30 June, 2017; v1 submitted 20 March, 2017; originally announced March 2017.

    Comments: Technical report

  48. arXiv:1702.03447  [pdf, ps, other]

    cs.DB cs.LG

    A Collective, Probabilistic Approach to Schema Mapping: Appendix

    Authors: Angelika Kimmig, Alex Memory, Renee J. Miller, Lise Getoor

    Abstract: In this appendix we provide additional supplementary material to "A Collective, Probabilistic Approach to Schema Mapping." We include an additional extended example, supplementary experiment details, and proof for the complexity result stated in the main paper.

    Submitted 11 February, 2017; originally announced February 2017.

    Comments: This is the appendix to the paper "A Collective, Probabilistic Approach to Schema Mapping" accepted to ICDE 2017

  49. arXiv:1603.07410  [pdf, other]

    cs.DB

    LSH Ensemble: Internet-Scale Domain Search

    Authors: Erkang Zhu, Fatemeh Nargesian, Ken Q. Pu, Renée J. Miller

    Abstract: We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment, defined as $|Q \cap X|/|Q|$, as the relevance measure of a domain $X$ to a query domain $Q$. Our choice of Jaccard set containment over Jaccard similarity makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarit…

    Submitted 23 July, 2016; v1 submitted 23 March, 2016; originally announced March 2016.

    Comments: To appear in VLDB 2016

    ACM Class: H.2.5; H.3.3; H.3.1

  50. arXiv:1507.00524  [pdf, ps, other]

    cs.DL

    Strategies for Parallel Markup

    Authors: Bruce R. Miller

    Abstract: Cross-referenced parallel markup for mathematics allows the combination of both presentation and content representations while associating the components of each. Interesting applications are enabled by such an arrangement, such as interaction with parts of the presentation to manipulate and querying the corresponding content, and enhanced search indexing. Although the idea of such markup is hardl…

    Submitted 2 July, 2015; originally announced July 2015.