Showing 1–35 of 35 results for author: Grossman, R

  1. arXiv:2404.15475  [pdf, ps, other]

    cs.IR

    An Annotated Glossary for Data Commons, Data Meshes, and Other Data Platforms

    Authors: Robert L. Grossman

    Abstract: Cloud-based data commons, data meshes, data hubs, and other data platforms are important ways to manage, analyze and share data to accelerate research and to support reproducible research. This is an annotated glossary of some of the more common terms used in articles and discussions about these platforms.

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 6 pages

  2. arXiv:2311.05659  [pdf, other]

    cs.LG cs.AI

    Enhancing Instance-Level Image Classification with Set-Level Labels

    Authors: Renyu Zhang, Aly A. Khan, Yuxin Chen, Robert L. Grossman

    Abstract: Instance-level image classification tasks have traditionally relied on single-instance labels to train models, e.g., few-shot learning and transfer learning. However, set-level coarse-grained labels that capture relationships among instances can provide richer information in real-world scenarios. In this paper, we present a novel approach to enhance instance-level image classification by leveragin…

    Submitted 17 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  3. arXiv:2306.09276  [pdf, other]

    math.GT

    Knot Mosaics with Corner Connection Tiles

    Authors: Aaron Heap, Una Donovan, Riley Grossman, Nickolas Laine, Connor McDermott, Marcus Paone, Drew Southcott

    Abstract: A knot mosaic is a representation of a knot or link on a square grid using a collection of tiles that are either blank or contain a portion of the knot diagram. Traditionally, a piece of the knot on one tile connects to a piece of the knot on an adjacent tile at a connection point that is located at the midpoint of a tile edge. We introduce a new set of tiles in which the connection points are loc…

    Submitted 2 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    MSC Class: 57K10

    Journal ref: Pi Mu Epsilon J. 15, No. 9, 553-568 (2023)

  4. arXiv:2302.02425  [pdf, ps, other]

    q-bio.OT

    Principles and Guidelines for Sharing Biomedical Data for Secondary Use: The University of Chicago Perspective

    Authors: Robert L. Grossman, Maryellen L. Giger, Julie A. Johnson, Jeremy D. Marks, Jessica P. Ridgway, Julian Solway, Walter M. Stadler

    Abstract: Academic medical centers are generating an increasing amount of biomedical data and there is an increasing demand for biomedical data for research purposes by research projects, research consortia, companies, and other third parties. At the same time, as the number of patients grows and the amount of data per patient grows, there is an increasing possibility that some information about some patien…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: 6 pages

  5. arXiv:2211.06522  [pdf]

    eess.IV cs.CV q-bio.QM

    Deep Learning Generates Synthetic Cancer Histology for Explainability and Education

    Authors: James M. Dolezal, Rachelle Wolk, Hanna M. Hieromnimon, Frederick M. Howard, Andrew Srisuwananukorn, Dmitry Karpeyev, Siddhi Ramesh, Sara Kochanny, Jung Woo Kwon, Meghana Agni, Richard C. Simon, Chandni Desai, Raghad Kherallah, Tung D. Nguyen, Jefree J. Schulte, Kimberly Cole, Galina Khramtsova, Marina Chiara Garassino, Aliya N. Husain, Huihua Li, Robert Grossman, Nicole A. Cipriani, Alexander T. Pearson

    Abstract: Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic fea…

    Submitted 9 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

  6. arXiv:2207.11167  [pdf, ps, other]

    cs.DC

    Ten Lessons for Data Sharing With a Data Commons

    Authors: Robert L. Grossman

    Abstract: A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using the elastic scalability provided by cloud computing and to share data securely and compliantly, and, in this way, accelerate the pace of research. Over the past…

    Submitted 22 July, 2022; originally announced July 2022.

  7. arXiv:2203.05097  [pdf]

    cs.DC

    A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments

    Authors: Robert L. Grossman, Rebecca R. Boyles, Brandi N. Davis-Dusenbery, Amanda Haddock, Allison P. Heath, Brian D. O'Connor, Adam C. Resnick, Deanne M. Taylor, Stan Ahalt

    Abstract: As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms, as a growing amount of data is being hosted in cloud-based platforms. A well accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies…

    Submitted 15 February, 2024; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 16 pages with 2 figures

    ACM Class: D.2.11; D.2.12; E.0

  8. arXiv:2202.04204  [pdf]

    stat.AP q-bio.PE

    The Absurdity of Death Estimates Based on the Vaccine Adverse Event Reporting System

    Authors: Gordon V Cormack, Maura R Grossman

    Abstract: We demonstrate from first principles a core fallacy employed by a coterie of authors who claim that data from the Vaccine Adverse Event Reporting System (VAERS) show that hundreds of thousands of U.S. deaths are attributable to COVID vaccination.

    Submitted 8 February, 2022; originally announced February 2022.

  9. arXiv:2112.13737  [pdf, other]

    cs.LG cs.AI

    Scalable Batch-Mode Deep Bayesian Active Learning via Equivalence Class Annealing

    Authors: Renyu Zhang, Aly A. Khan, Robert L. Grossman, Yuxin Chen

    Abstract: Active learning has demonstrated data efficiency in many fields. Existing active learning algorithms, especially in the context of batch-mode deep Bayesian active models, rely heavily on the quality of uncertainty estimations of the model, and are often challenging to scale to large batches. In this paper, we propose Batch-BALanCe, a scalable batch-mode active learning algorithm, which combines in…

    Submitted 20 February, 2023; v1 submitted 27 December, 2021; originally announced December 2021.

  10. arXiv:2109.13908  [pdf]

    cs.IR

    The eDiscovery Medicine Show

    Authors: Maura R. Grossman, Gordon V. Cormack

    Abstract: The practice of bloodletting gradually fell into disfavor as a growing body of scientific evidence showed its ineffectiveness and demonstrated the effectiveness of various pharmaceuticals for the prevention and treatment of certain diseases. At the same time, the patent medicine industry promoted ineffective remedies at medicine shows featuring entertainment, testimonials, and pseudo-scientific cl…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: To appear in Ohio State Technology Law Journal 18:1 (2021)

  11. arXiv:2011.01453  [pdf, other]

    cs.IR

    Participation in TREC 2020 COVID Track Using Continuous Active Learning

    Authors: Xue Jun Wang, Maura R. Grossman, Seung Gyu Hyun

    Abstract: We describe our participation in all five rounds of the TREC 2020 COVID Track (TREC-COVID). The goal of TREC-COVID is to contribute to the response to the COVID-19 pandemic by identifying answers to many pressing questions and building infrastructure to improve search systems [8]. All five rounds of this Track challenged participants to perform a classic ad-hoc search task on the new data collecti…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: 7 pages, 5 figures

  12. arXiv:2007.09526  [pdf, ps, other]

    math.RA math.DS

    The realization of input-output maps using bialgebras

    Authors: Robert L. Grossman, Richard G. Larson

    Abstract: We use the theory of bialgebras to provide the algebraic background for state space realization theorems for input-output maps of control systems. This allows us to consider from a common viewpoint classical results about formal state space realizations of nonlinear systems and more recent results involving analysis related to families of trees. If $H$ is a bialgebra, we say that $p \in H^*$ is di…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: 16 pages

    MSC Class: 93B15 (Primary) 16T10; 93C10; 05C05 (Secondary)

  13. arXiv:1809.01699  [pdf]

    q-bio.GN cs.CY

    Data Lakes, Clouds and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data

    Authors: Robert L. Grossman

    Abstract: Data commons collate data with cloud computing infrastructure and commonly used software services, tools and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize and share large scale genomics datasets. Data ecosystems can be built by interopera…

    Submitted 24 December, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: 28 pages, 4 figures

  14. arXiv:1803.08988  [pdf, other]

    cs.IR

    Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval

    Authors: Haotian Zhang, Gordon V. Cormack, Maura R. Grossman, Mark D. Smucker

    Abstract: This study uses a novel simulation framework to evaluate whether the time and effort necessary to achieve high recall using active learning is reduced by presenting the reviewer with isolated sentences, as opposed to full documents, for relevance feedback. Under the weak assumption that more time and effort is required to review an entire document than a single sentence, simulation results indicat…

    Submitted 27 March, 2019; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: 25 pages

  15. arXiv:1708.08123  [pdf, other]

    cs.IR cs.CL

    Impact of Feature Selection on Micro-Text Classification

    Authors: Ankit Vadehra, Maura R. Grossman, Gordon V. Cormack

    Abstract: Social media datasets, especially Twitter tweets, are popular in the field of text classification. Tweets are a valuable source of micro-text (sometimes referred to as "micro-blogs"), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others. Tweets often include keywords referred to as "Hashtags" that can be used as labels for th…

    Submitted 27 August, 2017; originally announced August 2017.

    Comments: 4 pages, 6 figures

  16. arXiv:1703.01692  [pdf, other]

    stat.ME

    Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping (NB2)

    Authors: Maria T Patterson, Robert L Grossman

    Abstract: We introduce a method called neighbor-based bootstrapping (NB2) that can be used to quantify the geospatial variation of a variable. We applied this method to an analysis of the incidence rates of disease from electronic medical record data (ICD-9 codes) for approximately 100 million individuals in the US over a period of 8 years. We considered the incidence rate of disease in each county and its…

    Submitted 5 March, 2017; originally announced March 2017.

  17. arXiv:1604.02608  [pdf, other]

    cs.CY cs.DC

    A Case for Data Commons: Towards Data Science as a Service

    Authors: Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, Walt Wells

    Abstract: As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis, as new data are added and as scienti…

    Submitted 9 April, 2016; originally announced April 2016.

  18. The Matsu Wheel: A Cloud-based Framework for Efficient Analysis and Reanalysis of Earth Satellite Imagery

    Authors: Maria T Patterson, Nikolas Anderson, Collin Bennett, Jacob Bruggemann, Robert Grossman, Matthew Handy, Vuong Ly, Dan Mandl, Shane Pederson, Jim Pivarski, Ray Powell, Jonathan Spring, Walt Wells

    Abstract: Project Matsu is a collaboration between the Open Commons Consortium and NASA focused on developing open source technology for the cloud-based processing of Earth satellite imagery. A particular focus is the development of applications for detecting fires and floods to help support natural disaster detection and relief. Project Matsu has developed an open source cloud-based infrastructure to proce…

    Submitted 22 February, 2016; originally announced February 2016.

    Comments: 10 pages, accepted for presentation to IEEE BigDataService 2016

  19. arXiv:1601.00323  [pdf, other]

    cs.CE

    The Design of a Community Science Cloud: The Open Science Data Cloud Perspective

    Authors: Robert L. Grossman, Matthew Greenway, Allison P. Heath, Ray Powell, Rafael D. Suarez, Walt Wells, Kevin White, Malcolm Atkinson, Iraklis Klampanos, Heidi L. Alvarez, Christine Harvey, Joe J. Mambretti

    Abstract: In this paper we describe the design and implementation of the Open Science Data Cloud, or OSDC. The goal of the OSDC is to provide petabyte-scale data cloud infrastructure and related services for scientists working with large quantities of data. Currently, the OSDC consists of more than 2000 cores and 2 PB of storage distributed across four data centers connected by 10G networks. We discuss som…

    Submitted 3 January, 2016; originally announced January 2016.

    Comments: 12 pages, 3 figures

  20. arXiv:1504.06868  [pdf, other]

    cs.IR cs.LG

    Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review

    Authors: Gordon V. Cormack, Maura R. Grossman

    Abstract: We enhance the autonomy of the continuous active learning method shown by Cormack and Grossman (SIGIR 2014) to be effective for technology-assisted review, in which documents from a collection are retrieved and reviewed, using relevance feedback, until substantially all of the relevant documents have been reviewed. Autonomy is enhanced through the elimination of topic-specific and dataset-specific…

    Submitted 26 April, 2015; originally announced April 2015.

  21. arXiv:1108.1819  [pdf, other]

    gr-qc

    Faster computation of adiabatic EMRIs using resonances

    Authors: Rebecca Grossman, Janna Levin, Gabe Perez-Giz

    Abstract: Motivated by the prohibitive computational cost of producing adiabatic extreme mass ratio inspirals, we explain how a judicious use of resonant orbits can dramatically expedite both that calculation and the generation of snapshot gravitational waves from geodesic sources. In the course of our argument, we clarify the resolution of a lingering debate on the appropriate adiabatic averaging prescript…

    Submitted 8 August, 2011; originally announced August 2011.

    Comments: 30 pages, 7 figures. Submitted to Phys. Rev. D

  22. The harmonic structure of generic Kerr orbits

    Authors: Rebecca Grossman, Janna Levin, Gabe Perez-Giz

    Abstract: Generic Kerr orbits exhibit intricate three-dimensional motion. We offer a classification scheme for these intricate orbits in terms of periodic orbits. The crucial insight is that for a given effective angular momentum $L$ and angle of inclination $ι$, there exists a discrete set of orbits that are geometrically $n$-leaf clovers in a precessing {\it orbital plane}. When viewed in the full three d…

    Submitted 29 May, 2011; originally announced May 2011.

    Comments: 14 pages, 8 figures. Submitted to Phys. Rev. D

  23. arXiv:1007.1261  [pdf, other]

    cs.DC

    MalStone: Towards A Benchmark for Analytics on Large Data Clouds

    Authors: Collin Bennett, Robert L. Grossman, David Locke, Jonathan Seidman, Steve Vejcik

    Abstract: Developing data mining algorithms that are suitable for cloud computing platforms is currently an active area of research, as is developing cloud computing platforms appropriate for data mining. Currently, the most common benchmark for cloud computing is the Terasort (and related) benchmarks. Although the Terasort Benchmark is quite useful, it was not designed for data mining per se. In this paper…

    Submitted 7 July, 2010; originally announced July 2010.

  24. arXiv:0907.4810  [pdf]

    cs.DC

    The Open Cloud Testbed: A Wide Area Testbed for Cloud Computing Utilizing High Performance Network Services

    Authors: Robert Grossman, Yunhong Gu, Michal Sabala, Collin Bennet, Jonathan Seidman, Joe Mambratti

    Abstract: Recently, a number of cloud platforms and services have been developed for data intensive computing, including Hadoop, Sector, CloudStore (formerly KFS), HBase, and Thrift. In order to benchmark the performance of these systems, to investigate their interoperability, and to experiment with new services based on flexible compute node and network provisioning capabilities, we have designed and imp…

    Submitted 27 July, 2009; originally announced July 2009.

  25. arXiv:0901.2735  [pdf, ps, other]

    stat.ML math.RA math.ST

    State Space Realization Theorems For Data Mining

    Authors: Robert L Grossman, Richard G Larson

    Abstract: In this paper, we consider formal series associated with events, profiles derived from events, and statistical models that make predictions about events. We prove theorems about realizations for these formal series using the language and tools of Hopf algebras.

    Submitted 18 January, 2009; originally announced January 2009.

    MSC Class: 62A01; 16W30

  26. Dynamics of Black Hole Pairs II: Spherical Orbits and the Homoclinic Limit of Zoom-Whirliness

    Authors: Rebecca Grossman, Janna Levin

    Abstract: Spinning black hole pairs exhibit a range of complicated dynamical behaviors. An interest in eccentric and zoom-whirl orbits has ironically inspired the focus of this paper: the constant radius orbits. When black hole spins are misaligned, the constant radius orbits are not circles but rather lie on the surface of a sphere and have acquired the name "spherical orbits". The spherical orbits are s…

    Submitted 23 November, 2008; originally announced November 2008.

    Comments: 16 pages, several figures

    Journal ref: Phys.Rev.D79:043017,2009

  27. arXiv:0809.1181  [pdf]

    cs.DC

    Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data

    Authors: Yunhong Gu, Robert L Grossman

    Abstract: Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. In contrast to existing storage and compute clouds, Sector can manage data not only within a data center, but also ac…

    Submitted 16 January, 2009; v1 submitted 6 September, 2008; originally announced September 2008.

  28. arXiv:0808.3019  [pdf, other]

    cs.DC

    Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere

    Authors: Robert L Grossman, Yunhong Gu

    Abstract: We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it p…

    Submitted 21 August, 2008; originally announced August 2008.

  29. arXiv:0808.1802  [pdf, other]

    cs.DC

    Compute and Storage Clouds Using Wide Area High Performance Networks

    Authors: Robert L. Grossman, Yunhong Gu, Michael Sabala, Wanzhi Zhang

    Abstract: We describe a cloud based infrastructure that we have developed that is optimized for wide area, high performance networks and designed to support data mining applications. The infrastructure consists of a storage cloud called Sector and a compute cloud called Sphere. We describe two applications that we have built using the cloud and some experimental studies.

    Submitted 13 August, 2008; originally announced August 2008.

  30. arXiv:0711.3877  [pdf, ps, other]

    math.RA math.CO

    Hopf-algebraic structures of families of trees

    Authors: R. L. Grossman, R. G. Larson

    Abstract: Description of cocommutative Hopf algebras associated with families of trees. Applications include Cayley's theorem on the number of rooted trees with n nodes, and Catalan's theorem on the number of rooted ordered trees with n nodes.

    Submitted 24 November, 2007; originally announced November 2007.

    Comments: 29 pages

    MSC Class: 16W30

    Journal ref: J. Algebra, 126 (1989), 184-210

  31. arXiv:0711.3875  [pdf, ps, other]

    math.RA math.CO

    An Overview of Hopf Algebras of Trees and Their Actions on Functions

    Authors: Robert L. Grossman, Richard G. Larson

    Abstract: We provide an expository account of some of the Hopf algebras that can be defined using trees, labeled trees, ordered trees and heap ordered trees. We also describe some actions of these Hopf algebras on algebra of functions.

    Submitted 24 November, 2007; originally announced November 2007.

    MSC Class: 16W30; 05C05

  32. arXiv:0706.1327  [pdf, ps, other]

    math.RA

    Hopf Algebras of Heap Ordered Trees and Permutations

    Authors: R. L. Grossman, R. G. Larson

    Abstract: It is known that there is a Hopf algebra structure on the vector space with basis all heap-ordered trees. We give a new bialgebra structure on the space with basis all permutations and show that there is a direct bialgebra isomorphism between the Hopf algebra of heap-ordered trees and the bialgebra of permutations.

    Submitted 14 November, 2007; v1 submitted 9 June, 2007; originally announced June 2007.

    Comments: 10 pages LaTeX, minor revision

    MSC Class: 16W30

  33. Benefits of Artificially Generated Gravity Gradients for Interferometric Gravitational-Wave Detectors

    Authors: L. Matone, P. Raffai, S. Marka, R. Grossman, P. Kalmus, Z. Marka, J. Rollins, V. Sannibale

    Abstract: We present an approach to experimentally evaluate gravity gradient noise, a potentially limiting noise source in advanced interferometric gravitational wave (GW) detectors. In addition, the method can be used to provide sub-percent calibration in phase and amplitude of modern interferometric GW detectors. Knowledge of calibration to such certainties shall enhance the scientific output of the ins…

    Submitted 24 January, 2007; originally announced January 2007.

    Comments: 16 pages, 4 figures

    Report number: LIGO-P060056-00-Z

    Journal ref: Class.Quant.Grav.24:2217-2230,2007

  34. arXiv:math/0409006  [pdf, ps, other]

    math.QA

    Differential Algebra Structures on Families of Trees

    Authors: Robert L Grossman, Richard G Larson

    Abstract: It is known that the vector space spanned by labeled rooted trees forms a Hopf algebra. Let k be a field and let R be a commutative k-algebra. Let H denote the Hopf algebra of rooted trees labeled using derivations D in Der(R). In this paper, we introduce a construction which gives R a H-module algebra structure and show this induces a differential algebra structure of H acting on R. The work he…

    Submitted 31 August, 2004; originally announced September 2004.

    Comments: 31 pages, 8 figures

  35. Thermal and Non-thermal Plasmas in the Galaxy Cluster 3C 129

    Authors: H. Krawczynski, D. E. Harris, R. Grossman, W. Lane, N. Kassim, A. G. Willis

    Abstract: We describe new Chandra spectroscopy data of the cluster which harbors the prototypical "head tail" radio galaxy 3C 129 and the weaker radio galaxy 3C 129.1. We combined the Chandra data with Very Large Array (VLA) radio data taken at 0.33, 5, and 8 GHz (archival data) and 1.4 GHz (new data). We also obtained new HI observations at the Dominion Radio Astrophysical Observatory (DRAO) to measure t…

    Submitted 23 July, 2003; v1 submitted 3 February, 2003; originally announced February 2003.

    Comments: Accepted for publication in MNRAS. Refereed manuscript. 14 pages, 8 figures, additional panel of Fig. 3 shows asymmetric ICM distribution

    Journal ref: Mon.Not.Roy.Astron.Soc. 345 (2003) 1255