-
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
Authors:
Subhabrata Mukherjee,
Paul Gamble,
Markel Sanz Ausin,
Neel Kant,
Kriti Aggarwal,
Neha Manjunath,
Debajyoti Datta,
Zhengliang Liu,
Jiayuan Ding,
Sophia Busacca,
Cezanne Bianco,
Swapnil Sharma,
Rae Lasko,
Michelle Voisard,
Sanchay Harneja,
Darya Filippova,
Gerry Meixiong,
Kevin Cha,
Amir Youssefi,
Meyhaa Buvanesh,
Howard Weingram,
Sebastian Bierman-Lytle,
Harpreet Singh Mangat,
Kim Parikh,
Saad Godil
, et al. (1 additional authors not shown)
Abstract:
We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful pr…
▽ More
We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful primary agent that focuses on driving an engaging conversation and several specialist support agents focused on healthcare tasks performed by nurses to increase safety and reduce hallucinations. We develop a sophisticated training protocol for iterative co-training of the agents that optimize for diverse objectives. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents. We align our models to speak like medical professionals, using organic healthcare conversations and simulated ones between patient actors and experienced nurses. This allows our system to express unique capabilities such as rapport building, trust building, empathy and bedside manner. Finally, we present the first comprehensive clinician evaluation of an LLM system for healthcare. We recruited over 1100 U.S. licensed nurses and over 130 U.S. licensed physicians to perform end-to-end conversational evaluations of our system by posing as patients and rating the system on several measures. We demonstrate Polaris performs on par with human nurses on aggregate across dimensions such as medical safety, clinical readiness, conversational quality, and bedside manner. Additionally, we conduct a challenging task-based evaluation of the individual specialist support agents, where we demonstrate our LLM agents significantly outperform a much larger general-purpose LLM (GPT-4) as well as from its own medium-size class (LLaMA-2 70B).
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
X-COBOL: A Dataset of COBOL Repositories
Authors:
Mir Sameed Ali,
Nikhil Manjunath,
Sridhar Chimalakonda
Abstract:
Despite being proposed as early as 1959, COBOL (Common Business-Oriented Language) still predominantly acts as an integral part of the majority of operations of several financial, banking, and governmental organizations. To support the inevitable modernization and maintenance of legacy systems written in COBOL, it is essential for organizations, researchers, and developers to understand the nature…
▽ More
Despite being proposed as early as 1959, COBOL (Common Business-Oriented Language) still predominantly acts as an integral part of the majority of operations of several financial, banking, and governmental organizations. To support the inevitable modernization and maintenance of legacy systems written in COBOL, it is essential for organizations, researchers, and developers to understand the nature and source code of COBOL programs. However, to the best of our knowledge, we are unaware of any dataset that provides data on COBOL software projects, motivating the need for the dataset. Thus, to aid empirical research on comprehending COBOL in open-source repositories, we constructed a dataset of 84 COBOL repositories mined from GitHub, containing rich metadata on the development cycle of the projects. We envision that researchers can utilize our dataset to study COBOL projects' evolution, code properties and develop tools to support their development. Our dataset also provides 1255 COBOL files present inside the mined repositories. The dataset and artifacts are available at https://doi.org/10.5281/zenodo.7968845.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
Adaptive Testing for Specification Coverage
Authors:
Ezio Bartocci,
Roderick Bloem,
Benedikt Maderbacher,
Niveditha Manjunath,
Dejan Ničković
Abstract:
Ensuring correctness of cyber-physical systems (CPS) is an extremely challenging task that is in practice often addressed with simulation based testing. Formal specification languages, such as Signal Temporal Logic (STL), are used to mathematically express CPS requirements and thus render the simulation activity more systematic and principled. We propose a novel method for adaptive generation of t…
▽ More
Ensuring correctness of cyber-physical systems (CPS) is an extremely challenging task that is in practice often addressed with simulation based testing. Formal specification languages, such as Signal Temporal Logic (STL), are used to mathematically express CPS requirements and thus render the simulation activity more systematic and principled. We propose a novel method for adaptive generation of tests with specification coverage for STL. To achieve this goal, we devise cooperative reachability games that we combine with numerical optimization to create tests that explore the system in a way that exercise various parts of the specification. To the best of our knowledge our approach is the first adaptive testing approach that can be applied directly to MATLAB\texttrademark\; Simulink/Stateflow models. We implemented our approach in a prototype tool and evaluated it on several illustrating examples and a case study from the avionics domain, demonstrating the effectiveness of adaptive testing to (1) incrementally build a test case that reaches a test objective, (2) generate a test suite that increases the specification coverage, and (3) infer what part of the specification is actually implemented.
△ Less
Submitted 26 January, 2021; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Automatic Failure Explanation in CPS Models
Authors:
Ezio Bartocci,
Niveditha Manjunath,
Leonardo Mariani,
Cristinel Mateis,
Dejan Ničković
Abstract:
Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, only the detection of a failure is insuffcient to know how to correct a faulty model. Faults can propagate in time and in space producing observable misbehaviours in locations completely different from the location of the fault. Understanding the reason of an observed failure is typically a challenging and laborious tas…
▽ More
Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, only the detection of a failure is insuffcient to know how to correct a faulty model. Faults can propagate in time and in space producing observable misbehaviours in locations completely different from the location of the fault. Understanding the reason of an observed failure is typically a challenging and laborious task left to the experience and domain knowledge of the designer. \n In this paper, we propose CPSDebug, a novel approach that by combining testing, specification mining, and failure analysis, can automatically explain failures in Simulink/Stateflow models. We evaluate CPSDebug on two case studies, involving two use scenarios and several classes of faults, demonstrating the potential value of our approach.
△ Less
Submitted 29 March, 2019;
originally announced March 2019.