Data Commons
Authors:
Ramanathan V. Guha,
Prashanth Radhakrishnan,
Bo Xu,
Wei Sun,
Carolyn Au,
Ajai Tirumali,
Muhammad J. Amjad,
Samantha Piekos,
Natalie Diaz,
Jennifer Chen,
Julia Wu,
Prem Ramaswami,
James Manyika
Abstract:
Publicly available data from open sources (e.g., United States Census Bureau (Census), World Health Organization (WHO), Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile the differences in schemas, formats, assumptions, and more. This da…
▽ More
Publicly available data from open sources (e.g., United States Census Bureau (Census), World Health Organization (WHO), Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile the differences in schemas, formats, assumptions, and more. This data wrangling is time consuming, tedious and needs to be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand this data and use it to solve societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be joined easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph. This Knowledge Graph can then be searched over using Natural Language questions utilizing advances in Large Language Models. This paper describes the architecture of Data Commons, some of the major deployments and highlights directions for future work.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
Machine-learned epidemiology: real-time detection of foodborne illness at scale
Authors:
Adam Sadilek,
Stephanie Caty,
Lauren DiPrete,
Raed Mansour,
Tom Schenk Jr,
Mark Bergtholdt,
Ashish Jha,
Prem Ramaswami,
Evgeniy Gabrilovich
Abstract:
Machine learning has become an increasingly powerful tool for solving complex problems, and its application in public health has been underutilized. The objective of this study is to test the efficacy of a machine-learned model of foodborne illness detection in a real-world setting. To this end, we built FINDER, a machine-learned model for real-time detection of foodborne illness using anonymous a…
▽ More
Machine learning has become an increasingly powerful tool for solving complex problems, and its application in public health has been underutilized. The objective of this study is to test the efficacy of a machine-learned model of foodborne illness detection in a real-world setting. To this end, we built FINDER, a machine-learned model for real-time detection of foodborne illness using anonymous and aggregated web search and location data. We computed the fraction of people who visited a particular restaurant and later searched for terms indicative of food poisoning to identify potentially unsafe restaurants. We used this information to focus restaurant inspections in two cities and demonstrated that FINDER improves the accuracy of health inspections; restaurants identified by FINDER are 3.1 times as likely to be deemed unsafe during the inspection as restaurants identified by existing methods. Additionally, FINDER enables us to ascertain previously intractable epidemiological information, for example, in 38% of cases the restaurant potentially causing food poisoning was not the last one visited, which may explain the lower precision of complaint-based inspections. We found that FINDER is able to reliably identify restaurants that have an active lapse in food safety, allowing for implementation of corrective actions that would prevent the potential spread of foodborne illness.
△ Less
Submitted 4 December, 2018;
originally announced December 2018.