Search | arXiv e-print repository

Polaris: A Safety-focused LLM Constellation Architecture for Healthcare

Authors: Subhabrata Mukherjee, Paul Gamble, Markel Sanz Ausin, Neel Kant, Kriti Aggarwal, Neha Manjunath, Debajyoti Datta, Zhengliang Liu, Jiayuan Ding, Sophia Busacca, Cezanne Bianco, Swapnil Sharma, Rae Lasko, Michelle Voisard, Sanchay Harneja, Darya Filippova, Gerry Meixiong, Kevin Cha, Amir Youssefi, Meyhaa Buvanesh, Howard Weingram, Sebastian Bierman-Lytle, Harpreet Singh Mangat, Kim Parikh, Saad Godil, et al. (1 additional author not shown)

Abstract: We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM work in healthcare that focuses on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion-parameter constellation system is composed of several multibillion-parameter LLMs acting as cooperative agents: a stateful primary agent that focuses on driving an engaging conversation, and several specialist support agents, focused on healthcare tasks performed by nurses, that increase safety and reduce hallucinations. We develop a sophisticated training protocol for the iterative co-training of agents that optimize for diverse objectives. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents. We align our models to speak like medical professionals, using organic healthcare conversations and simulated ones between patient actors and experienced nurses. This allows our system to express unique capabilities such as rapport building, trust building, empathy, and bedside manner. Finally, we present the first comprehensive clinician evaluation of an LLM system for healthcare. We recruited over 1100 U.S.-licensed nurses and over 130 U.S.-licensed physicians to perform end-to-end conversational evaluations of our system by posing as patients and rating the system on several measures.
We demonstrate that Polaris performs on par with human nurses in aggregate across dimensions such as medical safety, clinical readiness, conversational quality, and bedside manner. Additionally, we conduct a challenging task-based evaluation of the individual specialist support agents, where we demonstrate that our LLM agents significantly outperform both a much larger general-purpose LLM (GPT-4) and a model from their own medium-size class (LLaMA-2 70B).

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2311.09528

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

Authors: Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev

Abstract: Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g., preferring longer but unhelpful responses solely because of their length). To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various aspects that make responses helpful. Specifically, our 37k-sample dataset has annotations for correctness, coherence, complexity, and verbosity in addition to the overall helpfulness of responses. Training Llama 2 70B on the HelpSteer dataset with the SteerLM technique produces a model that scores 7.54 on MT Bench, which is currently the highest score for open models that do not require training data from more powerful models (e.g., GPT-4). We release this dataset under a CC-BY-4.0 license at https://huggingface.co/datasets/nvidia/HelpSteer

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2210.14767

Stabilization of Energy-Conserving Gaits for Point-Foot Planar Bipeds

Authors: Aakash Khandelwal, Nilay Kant, Ranjan Mukherjee

Abstract: The problem of designing and stabilizing impact-free, energy-conserving gaits is considered for underactuated, point-foot planar bipeds. Virtual holonomic constraints are used to design energy-conserving gaits. A desired gait corresponds to a periodic hybrid orbit and is stabilized using the Impulse Controlled Poincaré Map approach. Numerical simulations for the case of a five-link biped demonstrate convergence to a desired gait from arbitrary initial conditions.

Submitted 26 October, 2022; originally announced October 2022.

Comments: 6 pages, 6 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2205.07000

doi 10.1109/DAC18074.2021.9586094

PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning

Authors: Rajarshi Roy, Jonathan Raiman, Neel Kant, Ilyas Elkin, Robert Kirby, Michael Siu, Stuart Oberman, Saad Godil, Bryan Catanzaro

Abstract: In this work, we present a reinforcement learning (RL)-based approach to designing parallel prefix circuits, such as adders or priority encoders, that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa, purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines, with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings, respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.

Submitted 14 May, 2022; originally announced May 2022.

Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: ACM/IEEE Design Automation Conference (DAC), 2021, pp. 853-858
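
The PrefixRL abstract concerns parallel prefix adders. As background only, the following Python sketch builds a textbook Kogge-Stone prefix adder (not a circuit designed by PrefixRL) by merging per-bit (generate, propagate) pairs with the standard associative carry operator. It also counts prefix nodes and levels as crude stand-ins for area and delay; the function names are illustrative assumptions, and real area and delay come from synthesis, as the abstract stresses.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) bits for a span of input positions


def combine(hi: GP, lo: GP) -> GP:
    """Standard associative carry operator; `hi` spans more significant bits than `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def kogge_stone_add(a: int, b: int, width: int) -> int:
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network."""
    # Bit-level generate (a_i AND b_i) and propagate (a_i XOR b_i).
    bits: List[GP] = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1)
                      for i in range(width)]
    prefix = list(bits)
    dist = 1
    while dist < width:
        # One network level: node i >= dist merges with node i - dist.
        # The comprehension reads only the previous level, like parallel hardware.
        prefix = [combine(prefix[i], prefix[i - dist]) if i >= dist else prefix[i]
                  for i in range(width)]
        dist *= 2
    # prefix[i][0] is now the carry out of bit i (carry-in assumed to be 0).
    total = 0
    for i in range(width):
        carry_in = prefix[i - 1][0] if i > 0 else 0
        total |= (bits[i][1] ^ carry_in) << i
    return total | (prefix[width - 1][0] << width)


def kogge_stone_cost(width: int) -> Tuple[int, int]:
    """(prefix nodes, levels): rough proxies for area and delay."""
    nodes, levels, dist = 0, 0, 1
    while dist < width:
        nodes += width - dist
        levels += 1
        dist *= 2
    return nodes, levels


if __name__ == "__main__":
    print(kogge_stone_add(200, 100, 8))  # 300
    print(kogge_stone_cost(8))           # (17, 3)
```

For comparison, a ripple-carry chain uses only width - 1 nodes but also width - 1 levels, while Kogge-Stone spends extra nodes to reach log2(width) levels. Structures like these sit at different points of the area/delay trade-off in which the abstract reports Pareto-dominating designs.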

arXiv:2202.05819

Nonprehensile Manipulation of a Stick Using Impulsive Forces

Authors: Aakash Khandelwal, Nilay Kant, Ranjan Mukherjee

Abstract: The problem of nonprehensile manipulation of a stick in three-dimensional space using intermittent impulsive forces is considered. The objective is to juggle the stick between a sequence of configurations that are rotationally symmetric about the vertical axis. The dynamics of the stick is described by five generalized coordinates and three control inputs. Between two consecutive configurations where impulsive inputs are applied, the dynamics is conveniently represented by a Poincaré map in the reference frame of the juggler. Stabilization of the orbit associated with a desired juggling motion is accomplished by stabilizing a fixed point on the Poincaré map. The Impulse Controlled Poincaré Map approach is used to stabilize the orbit, and numerical simulations are used to demonstrate convergence to the desired juggling motion from an arbitrary initial configuration. In the limiting case, where consecutive rotationally symmetric configurations are chosen arbitrarily close, it is shown that the dynamics reduces to that of steady precession of the stick on a hoop.

Submitted 6 July, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

Comments: This work has been submitted for possible publication. This version submitted to Nonlinear Dynamics on 28 Jun 2022
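
The stick-juggling abstract stabilizes an orbit by stabilizing a fixed point of a Poincaré map. The sketch below is a generic discrete-time illustration of that idea, not the Impulse Controlled Poincaré Map derivation itself. It assumes a hypothetical 2x2 linearization A of the map about its fixed point and a single impulsive input that enters through B = (0, 1)^T once per crossing, and it picks feedback gains by pole placement so that the closed-loop eigenvalues lie inside the unit circle. All names and numbers are illustrative assumptions.

```python
import cmath
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Vec2, Vec2]


def eig2(m: Mat2) -> Tuple[complex, complex]:
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def place_gains(a_mat: Mat2, p1: float, p2: float) -> Vec2:
    """Gains K for the impulse u = -K x, entering through B = (0, 1)^T, so that
    A - B K has eigenvalues p1 and p2. Requires A[0][1] != 0 (controllability)."""
    (a, b), (c, d) = a_mat
    k2 = a + d - (p1 + p2)                        # match the trace
    k1 = (p1 * p2 - a * d + a * k2 + b * c) / b   # match the determinant
    return k1, k2


def closed_loop(a_mat: Mat2, k: Vec2) -> Mat2:
    """A - B K for B = (0, 1)^T: only the second row changes."""
    (a, b), (c, d) = a_mat
    return (a, b), (c - k[0], d - k[1])


def iterate(m: Mat2, x: Vec2, n: int) -> Vec2:
    """Apply the linearized (closed-loop) map n times to a deviation x."""
    for _ in range(n):
        x = (m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1])
    return x


if __name__ == "__main__":
    A = ((1.5, 1.0), (0.0, 1.2))  # hypothetical unstable linearization
    print("open-loop spectral radius:", max(abs(l) for l in eig2(A)))
    K = place_gains(A, 0.5, 0.2)
    M = closed_loop(A, K)
    print("closed-loop eigenvalues:", eig2(M))
    print("deviation after 60 crossings:", iterate(M, (1.0, -1.0), 60))
```

Placing both poles at zero gives deadbeat control, in which the deviation vanishes after two crossings of the section; milder poles trade slower convergence for smaller impulses.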

arXiv:2101.00408

End-to-End Training of Neural Retrievers for Open-Domain Question Answering

Authors: Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei **, William L Hamilton, Bryan Catanzaro

Abstract: Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse Cloze Task and masked salient spans, followed by supervised finetuning using question-context pairs. This approach leads to absolute gains of 2+ points over the previous best result in top-20 retrieval accuracy on the Natural Questions and TriviaQA datasets. We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models. In the first approach, the reader considers each retrieved document separately, while in the second approach, the reader considers all the retrieved documents together. Our experiments demonstrate the effectiveness of these approaches as we obtain new state-of-the-art results. On the Natural Questions dataset, we obtain a top-20 retrieval accuracy of 84%, an improvement of 5 points over the recent DPR model. In addition, we achieve good results on answer extraction, outperforming recent models like REALM and RAG by 3+ points. We further scale up end-to-end training to large models and show consistent gains in performance over smaller models.

Submitted 1 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

Comments: ACL 2021
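
The retriever abstract reports top-20 retrieval accuracy. In DPR-style evaluations this metric is commonly taken as the percentage of questions for which at least one of the top-k retrieved passages contains a gold answer string. The sketch below computes it under a simplifying assumption: a lowercase, whitespace-collapsed substring match stands in for the token-level answer matching that real evaluation scripts use. The function names are hypothetical.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a simplification of real answer normalization)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def top_k_accuracy(retrieved: Sequence[Sequence[str]],
                   answers: Sequence[Sequence[str]], k: int = 20) -> float:
    """Percentage of questions whose top-k passages contain at least one gold answer."""
    if not retrieved or len(retrieved) != len(answers):
        raise ValueError("retrieved and answers must be non-empty and aligned")
    hits = 0
    for passages, golds in zip(retrieved, answers):
        top = [normalize(p) for p in passages[:k]]  # passages assumed ranked best-first
        if any(normalize(g) in p for p in top for g in golds):
            hits += 1
    return 100.0 * hits / len(retrieved)


if __name__ == "__main__":
    retrieved = [["Lyon is in France.", "Paris is the capital of France."],
                 ["The Nile flows north.", "Cairo sits on the Nile."]]
    answers = [["Paris"], ["Khartoum"]]
    print(top_k_accuracy(retrieved, answers, k=20))  # 50.0
    print(top_k_accuracy(retrieved, answers, k=1))   # 0.0
```

Because the metric only asks whether an answer appears somewhere in the top k, it measures the retriever in isolation; answer extraction, the other result in the abstract, is scored separately.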

arXiv:1912.12345

Synthetic Datasets for Neural Program Synthesis

Authors: Richard Shin, Neel Kant, Kavi Gupta, Christopher Bender, Brandon Trabucco, Rishabh Singh, Dawn Song

Abstract: The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input space causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.

Submitted 27 December, 2019; originally announced December 2019.

Comments: ICLR 2019
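
The program synthesis abstract frames the task as producing a program consistent with input-output examples. To make that kind of specification concrete, the sketch below runs plain enumerative search (not a neural method, and not the paper's Calculator DSL) over a hypothetical toy arithmetic DSL with the variable x, the constants 1 and 2, and the operators +, -, and *. It returns the first enumerated program that matches every example.

```python
import operator
from itertools import product
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Program = Tuple[str, Callable[[int], int]]  # (source text, evaluator)

# Hypothetical toy DSL: binary operators over integer expressions in x.
OPS = [("+", operator.add), ("-", operator.sub), ("*", operator.mul)]


def enumerate_programs(max_depth: int) -> Iterator[Program]:
    """Yield programs level by level; each level combines everything built so far."""
    leaves: List[Program] = [("x", lambda x: x), ("1", lambda x: 1), ("2", lambda x: 2)]
    yield from leaves
    seen = list(leaves)
    for _ in range(max_depth):
        new: List[Program] = []
        for (src_a, fa), (src_b, fb) in product(seen, repeat=2):
            for sym, fn in OPS:
                # Default arguments bind the current fa, fb, fn (avoids late binding).
                new.append((f"({src_a} {sym} {src_b})",
                            lambda x, fa=fa, fb=fb, fn=fn: fn(fa(x), fb(x))))
        yield from new
        seen.extend(new)


def synthesize(examples: Sequence[Tuple[int, int]], max_depth: int = 2) -> Optional[str]:
    """Return the source of the first enumerated program consistent with all examples."""
    for src, fn in enumerate_programs(max_depth):
        if all(fn(x) == y for x, y in examples):
            return src
    return None


if __name__ == "__main__":
    print(synthesize([(0, 1), (1, 3), (2, 5)]))  # some program equal to 2x + 1 on these inputs
    print(synthesize([(0, 0), (1, 1)]))          # x
```

The program returned depends on the enumeration order, and a few examples can be matched by several non-equivalent programs, so the choice of examples shapes what any synthesizer learns or finds.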

arXiv:1905.10615

Adversarial Policies: Attacking Deep Reinforcement Learning

Authors: Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell

Abstract: Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent's observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at https://adversarialpolicies.github.io/.

Submitted 17 January, 2021; v1 submitted 25 May, 2019; originally announced May 2019.

Comments: Presented at ICLR 2020

ACM Class: I.2.6

arXiv:1812.01207

Practical Text Classification With Large Pre-Trained Language Models

Authors: Neel Kant, Raul Puri, Nikolai Yakovenko, Bryan Catanzaro

Abstract: Multi-emotion sentiment classification is a natural language processing (NLP) problem with valuable use cases on real-world data. We demonstrate that large-scale unsupervised language modeling combined with fine-tuning offers a practical solution to this task on difficult datasets, including those with label class imbalance and domain-specific context. By training an attention-based Transformer network (Vaswani et al. 2017) on 40GB of text (Amazon reviews) (McAuley et al. 2015) and fine-tuning on the training set, our model achieves a 0.69 F1 score on the SemEval Task 1:E-c multi-dimensional emotion classification problem (Mohammad et al. 2018), based on the Plutchik wheel of emotions (Plutchik 1979). These results are competitive with state-of-the-art models, including strong F1 scores on difficult (emotion) categories such as Fear (0.73), Disgust (0.77), and Anger (0.78), as well as competitive results on rare categories such as Anticipation (0.42) and Surprise (0.37). Furthermore, we demonstrate our application on a real-world text classification task. We create a narrowly collected text dataset of real tweets on several topics, and show that our fine-tuned model outperforms general-purpose, commercially available APIs for sentiment and multidimensional emotion classification on this dataset by a significant margin. We also perform a variety of additional studies, investigating properties of deep learning architectures, datasets, and algorithms for achieving practical multidimensional sentiment classification.
Overall, we find that unsupervised language modeling and fine-tuning is a simple framework for achieving high-quality results on real-world sentiment classification.

Submitted 3 December, 2018; originally announced December 2018.

Comments: 8 pages, submitted to AAAI 2019
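
The text classification abstract reports an overall F1 of 0.69 together with per-category F1 scores for multi-label emotion classification, but it does not state how the overall score is averaged. The sketch below shows the two usual options for multi-label data: per-category F1, micro-averaged F1 (counts pooled across categories), and macro-averaged F1 (mean of per-category scores). The label names and data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when all three counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]],
                  labels: Sequence[str]) -> Tuple[Dict[str, float], float, float]:
    """Per-label F1, micro-averaged F1, and macro-averaged F1 for 0/1 label vectors."""
    per_label: Dict[str, float] = {}
    total_tp = total_fp = total_fn = 0
    for j, name in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        per_label[name] = f1_from_counts(tp, fp, fn)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    micro = f1_from_counts(total_tp, total_fp, total_fn)  # pooled counts
    macro = sum(per_label.values()) / len(labels)         # unweighted mean
    return per_label, micro, macro


if __name__ == "__main__":
    labels = ["anger", "fear"]
    y_true = [[1, 0], [1, 1], [0, 1]]
    y_pred = [[1, 1], [0, 1], [0, 0]]
    print(multilabel_f1(y_true, y_pred, labels))
```

Micro-averaging weights frequent categories more heavily, while macro-averaging gives every category equal weight, so low scores on rare categories such as Anticipation and Surprise pull a macro average down more than a micro average.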

arXiv:1802.02353

Recent Advances in Neural Program Synthesis

Authors: Neel Kant

Abstract: In recent years, deep learning has made tremendous progress in a number of fields that were previously out of reach for artificial intelligence. These successes have led researchers to consider whether intelligent systems could tackle a problem that humans themselves have only recently considered: program synthesis. This challenge is unlike others such as object recognition and speech translation, since its abstract nature and demand for rigor make it difficult even for humans to attempt. While it is still far from being solved, or even competitive with most existing methods, neural program synthesis is a rapidly growing discipline that holds great promise if fully realized. In this paper, we start by exploring the problem statement and challenges of program synthesis. Then, we examine the fascinating evolution of program induction models, along with how they have succeeded, failed, and been reimagined since. Finally, we conclude with a contrastive look at program synthesis and future research recommendations for the field.

Submitted 7 February, 2018; originally announced February 2018.

Comments: 16 pages (without citations); Literature Review

Showing 1–10 of 10 results for author: Kant, N