-
Satyrn: A Platform for Analytics Augmented Generation
Authors:
Marko Sterbentz,
Cameron Barrie,
Shubham Shahi,
Abhratanu Dutta,
Donna Hooshmand,
Harper Pack,
Kristian J. Hammond
Abstract:
Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large-scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter, in which just 57% of claims are accurate.
Submitted 17 June, 2024;
originally announced June 2024.
-
Lightweight Knowledge Representations for Automating Data Analysis
Authors:
Marko Sterbentz,
Cameron Barrie,
Donna Hooshmand,
Shubham Shahi,
Abhratanu Dutta,
Harper Pack,
Andong Li Zhao,
Andrew Paley,
Alexander Einarsson,
Kristian Hammond
Abstract:
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.
Submitted 15 October, 2023;
originally announced November 2023.
-
Requirements for Open Political Information: Transparency Beyond Open Data
Authors:
Andong Luis Li Zhao,
Andrew Paley,
Rachel Adler,
Harper Pack,
Sergio Servantez,
Alexander Einarsson,
Cameron Barrie,
Marko Sterbentz,
Kristian Hammond
Abstract:
A politically informed citizenry is imperative for a well-developed democracy. While the US government has pursued policies for open data, these efforts have been insufficient in achieving an open government because only people with technical and domain knowledge can access information in the data. In this work, we conduct user interviews to identify wants and needs among stakeholders. We further use this information to sketch out the foundational requirements for a functional political information technical system.
Submitted 6 December, 2021;
originally announced December 2021.
-
Hands-On Universe: A Global Program for Education and Public Outreach in Astronomy
Authors:
M. Boer,
H. Pack,
C. Pennypacker,
A. L. Melchior,
S. Faye,
T. Ebisuzaki
Abstract:
Hands-On Universe (HOU) is an educational program that enables students to investigate the Universe while applying tools and concepts from science, math, and technology. Using the Internet, HOU participants around the world request observations from an automated telescope, download images from a large image archive, and analyze them with the aid of user-friendly image processing software. This program is now developing in many countries, including the USA, France, Germany, Sweden, Japan, Australia, and others. A network of telescopes has been established among these countries, many of them remotely operated, as shown in the accompanying demo. Using this feature, students in the classroom are able to make night observations during the day, using a telescope placed in another country. An archive of images taken on large telescopes is also accessible, as well as resources for teachers. Students also work on real research projects, e.g. the search for asteroids, which resulted in the discovery of a Kuiper Belt object by high-school students. Not only does Hands-On Universe give the general public access to professional astronomy, but it is also a more general tool for demonstrating the use of a complex automated system and the techniques of data processing and automation. Last but not least, through the use of telescopes located in many countries around the globe, a form of powerful and genuine cooperation between teachers and children from various countries is promoted, with a clear educational goal.
Submitted 21 September, 2001;
originally announced September 2001.