-
Classification of Non-Degenerate Symmetric Bilinear and Quadratic Forms in the Verlinde Category $\mathrm{Ver}_4^+$
Authors:
Iz Chen,
Arun S. Kannan,
Krishna Pothapragada
Abstract:
Although Deligne's theorem classifies all symmetric tensor categories (STCs) with moderate growth over algebraically closed fields of characteristic zero, the classification does not extend to positive characteristic. At the forefront of the study of STCs is the search for an analog to Deligne's theorem in positive characteristic, and it has become increasingly apparent that the Verlinde categories are to play a significant role. Moreover, these categories are largely unstudied, but have already shown very interesting phenomena as both a generalization of and a departure from superalgebra and supergeometry. In this paper, we study $\mathrm{Ver}_4^+$, the simplest non-trivial Verlinde category in characteristic $2$. In particular, we classify all isomorphism classes of non-degenerate symmetric bilinear forms and non-degenerate quadratic forms and study the associated Witt semi-ring that arises from the addition and multiplication operations on bilinear forms.
Submitted 10 June, 2024;
originally announced June 2024.
-
MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning
Authors:
Shay Deutsch,
Lionel Yelibi,
Alex Tong Lin,
Arjun Ravi Kannan
Abstract:
Deriving meaningful representations from complex, high-dimensional data in unsupervised settings is crucial across diverse machine learning applications. This paper introduces a framework for multi-scale graph network embedding based on spectral graph wavelets that employs a contrastive learning approach. A significant feature of the proposed embedding is its capacity to establish a correspondence between the embedding space and the input feature space which aids in deriving feature importance of the original features. We theoretically justify our approach and demonstrate that, in Paley-Wiener spaces on combinatorial graphs, the spectral graph wavelets operator offers greater flexibility and better control over smoothness properties compared to the Laplacian operator. We validate the effectiveness of our proposed graph embedding on a variety of public datasets through a range of downstream tasks, including clustering and unsupervised feature importance.
Submitted 5 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
The Steinberg Tensor Product Theorem for General Linear Group Schemes in the Verlinde Category
Authors:
Arun S. Kannan
Abstract:
The Steinberg tensor product theorem is a fundamental result in the modular representation theory of reductive algebraic groups. It describes any finite-dimensional simple module of highest weight $λ$ over such a group as the tensor product of Frobenius twists of simple modules with highest weights the weights appearing in a $p$-adic decomposition of $λ$, thereby reducing the character problem to a finite collection of weights. In recent years this theorem has been extended to various quasi-reductive supergroup schemes. In this paper, we prove the analogous result for the general linear group scheme $GL(X)$ for any object $X$ in the Verlinde category $\mathrm{Ver}_p$.
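To make the $p$-adic decomposition concrete, here is a minimal Python sketch for the classical group $SL_2$ in characteristic $p$, chosen as an assumption for illustration only (the paper itself treats $GL(X)$ in $\mathrm{Ver}_p$). For $SL_2$ a dominant weight is an integer $λ \ge 0$; writing $λ = \sum_i λ_i p^i$ with restricted digits $0 \le λ_i \le p-1$, Steinberg's theorem gives $L(λ) \cong L(λ_0) \otimes L(λ_1)^{[1]} \otimes \cdots$, and since $\dim L(a) = a + 1$ for restricted $a$, the dimension is $\prod_i (λ_i + 1)$. The function names are hypothetical.

```python
def p_adic_digits(lam: int, p: int) -> list[int]:
    """Return [lam_0, lam_1, ...] with lam = sum(lam_i * p**i) and 0 <= lam_i < p."""
    if lam < 0 or p < 2:
        raise ValueError("need lam >= 0 and p >= 2")
    digits = []
    while lam > 0:
        lam, r = divmod(lam, p)
        digits.append(r)
    return digits or [0]


def sl2_simple_dim(lam: int, p: int) -> int:
    """dim L(lam) for SL_2 in characteristic p, via the Steinberg tensor product theorem."""
    dim = 1
    for d in p_adic_digits(lam, p):
        # dim L(d) = d + 1 for a restricted weight d; a Frobenius twist keeps the dimension.
        dim *= d + 1
    return dim


if __name__ == "__main__":
    # 10 = 1 + 0*3 + 1*9, so L(10) = L(1) (x) L(0)^[1] (x) L(1)^[2] in characteristic 3.
    print(p_adic_digits(10, 3), sl2_simple_dim(10, 3))
```

Here the generic-looking weight $10$ collapses to a tensor product of two 2-dimensional pieces, which is the "finite collection of weights" reduction the abstract refers to.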
Submitted 3 April, 2024;
originally announced April 2024.
-
From the Albert algebra to Kac's ten-dimensional Jordan superalgebra via tensor categories in characteristic 5
Authors:
Alberto Elduque,
Pavel Etingof,
Arun S. Kannan
Abstract:
Kac's ten-dimensional simple Jordan superalgebra over a field of characteristic 5 is obtained from a process of semisimplification, via tensor categories, from the exceptional simple Jordan algebra (or Albert algebra), together with a suitable order 5 automorphism. This explains McCrimmon's 'bizarre result' asserting that, in characteristic 5, Kac's superalgebra is a sort of 'degree 3 Jordan superalgebra'. As an outcome, the exceptional simple Lie superalgebra el(5;5), specific of characteristic 5, is obtained from the simple Lie algebra of type $E_8$ and an order 5 automorphism. In the process, precise recipes to obtain superalgebras from algebras in the category of representations of the cyclic group $C_p$, over a field of characteristic $p>2$, are given.
Submitted 2 April, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gaussian Harmony: Attaining Fairness in Diffusion-based Face Generation Models
Authors:
Basudha Pal,
Arunkumar Kannan,
Ram Prabhakar Kathirvel,
Alice J. O'Toole,
Rama Chellappa
Abstract:
Diffusion models have achieved great progress in face generation. However, these models amplify the bias in the generation process, leading to an imbalance in the distribution of sensitive attributes such as age, gender and race. This paper proposes a novel solution to this problem by balancing the facial attributes of the generated images. We mitigate the bias by localizing the means of the facial attributes in the latent space of the diffusion model using Gaussian mixture models (GMMs). Our motivation for choosing GMMs over other clustering frameworks comes from the flexible latent structure of diffusion models. Since each sampling step in diffusion models follows a Gaussian distribution, we show that fitting a GMM helps us localize the subspace responsible for generating a specific attribute. Furthermore, our method does not require retraining; we instead localize the subspace on the fly and mitigate the bias to generate a fair dataset. We evaluate our approach on multiple face attribute datasets to demonstrate its effectiveness. Our results demonstrate that our approach leads to fairer data generation in terms of representational fairness while preserving the quality of generated samples.
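The GMM-fitting step can be illustrated with a toy sketch. The following is a minimal 1-D expectation-maximization (EM) fit in Python on synthetic data, assuming each fitted component mean stands in for one localized attribute mean; the actual method works in the high-dimensional latent space of a diffusion model, and `fit_gmm_1d` and the data are hypothetical.

```python
import math
import random


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data: list[float], k: int = 2, iters: int = 100):
    """Fit a 1-D Gaussian mixture with k components by expectation-maximization."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Initialize means evenly across the data range and variances to the overall variance.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            probs = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(probs) or 1e-300
            resp.append([q / total for q in probs])
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsing component
    return means, variances, weights


if __name__ == "__main__":
    random.seed(0)
    # Two synthetic "attribute" clusters along a 1-D stand-in for a latent coordinate.
    data = [random.gauss(-2.0, 0.5) for _ in range(200)] + [random.gauss(3.0, 0.5) for _ in range(200)]
    means, variances, weights = fit_gmm_1d(data)
    print(sorted(round(m, 2) for m in means))
```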
Submitted 21 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Turn Down the Noise: Leveraging Diffusion Models for Test-time Adaptation via Pseudo-label Ensembling
Authors:
Mrigank Raman,
Rohan Shah,
Akash Kannan,
Pranit Chawla
Abstract:
The goal of test-time adaptation is to adapt a source-pretrained model to a continuously changing target domain without relying on any source data. Typically, this is either done by updating the parameters of the model (model adaptation) using inputs from the target domain or by modifying the inputs themselves (input adaptation). However, methods that modify the model suffer from the issue of compounding noisy updates whereas methods that modify the input need to adapt to every new data point from scratch while also struggling with certain domain shifts. We introduce an approach that leverages a pre-trained diffusion model to project the target domain images closer to the source domain and iteratively updates the model via pseudo-label ensembling. Our method combines the advantages of model and input adaptations while mitigating their shortcomings. Our experiments on CIFAR-10C demonstrate the superiority of our approach, outperforming the strongest baseline by an average of 1.7% across 15 diverse corruptions and surpassing the strongest input adaptation baseline by an average of 18%.
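The pseudo-label ensembling idea can be sketched as follows, under stated assumptions: class probabilities from several "views" of each target sample (for example the original image and its diffusion-projected version) are averaged, and only confident averaged predictions are kept as pseudo-labels. The abstract does not specify the exact ensembling or filtering rule, so `ensemble_pseudo_labels` and its `threshold` are hypothetical.

```python
def ensemble_pseudo_labels(view_probs, threshold=0.5):
    """Average class probabilities across views and keep confident argmax labels.

    view_probs has shape [views][samples][classes].
    Returns a list of (sample_index, label, confidence).
    """
    n_views = len(view_probs)
    n_samples = len(view_probs[0])
    pseudo = []
    for s in range(n_samples):
        n_classes = len(view_probs[0][s])
        # Mean probability of each class over all views of sample s.
        avg = [sum(view_probs[v][s][c] for v in range(n_views)) / n_views for c in range(n_classes)]
        label = max(range(n_classes), key=lambda c: avg[c])
        if avg[label] >= threshold:
            pseudo.append((s, label, avg[label]))
    return pseudo


if __name__ == "__main__":
    original = [[0.6, 0.4], [0.9, 0.1]]   # model outputs on the raw target images
    projected = [[0.2, 0.8], [0.7, 0.3]]  # model outputs on the diffusion-projected images
    print(ensemble_pseudo_labels([original, projected], threshold=0.7))
```

The retained pseudo-labels would then drive the model update; averaging across views is one simple way to let the input-adapted images temper noisy predictions on the raw inputs.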
Submitted 29 November, 2023;
originally announced November 2023.
-
Extrinsically-Focused Evaluation of Omissions in Medical Summarization
Authors:
Elliot Schumacher,
Daniel Rosenthal,
Varun Nair,
Luladay Price,
Geoffrey Tso,
Anitha Kannan
Abstract:
The goal of automated summarization techniques (Paice, 1990; Kupiec et al, 1995) is to condense text by focusing on the most critical information. Generative large language models (LLMs) have been shown to be robust summarizers, yet traditional metrics struggle to capture resulting performance (Goyal et al, 2022) in more powerful LLMs. In safety-critical domains such as medicine, more rigorous evaluation is required, especially given the potential for LLMs to omit important information in the resulting summary. We propose MED-OMIT, a new omission benchmark for medical summarization. Given a doctor-patient conversation and a generated summary, MED-OMIT categorizes the chat into a set of facts and identifies which are omitted from the summary. We further propose to determine fact importance by simulating the impact of each fact on a downstream clinical task: differential diagnosis (DDx) generation. MED-OMIT leverages LLM prompt-based approaches which categorize the importance of facts and cluster them as supporting or negating evidence to the diagnosis. We evaluate MED-OMIT on a publicly-released dataset of patient-doctor conversations and find that MED-OMIT captures omissions better than alternative metrics.
Submitted 14 November, 2023;
originally announced November 2023.
-
DEFT: Dexterous Fine-Tuning for Real-World Hand Policies
Authors:
Aditya Kannan,
Kenneth Shaw,
Shikhar Bahl,
Pragna Mannam,
Deepak Pathak
Abstract:
Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks. However, learning such behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. In order to improve upon these priors, DEFT involves an efficient online optimization procedure. With the integration of human-based learning and online fine-tuning, coupled with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.
Submitted 12 December, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Injecting knowledge into language generation: a case study in auto-charting after-visit care instructions from medical dialogue
Authors:
Maksim Eremeev,
Ilya Valmianski,
Xavier Amatriain,
Anitha Kannan
Abstract:
Factual correctness is often the limiting factor in practical applications of natural language generation in high-stakes domains such as healthcare. An essential requirement for maintaining factuality is the ability to deal with rare tokens. This paper focuses on rare tokens that appear in both the source and the reference sequences, and which, when missed during generation, decrease the factual correctness of the output text. For high-stakes domains that are also knowledge-rich, we show how to use knowledge to (a) identify which rare tokens that appear in both source and reference are important and (b) uplift their conditional probability. We introduce the "utilization rate" that encodes knowledge and serves as a regularizer by maximizing the marginal probability of selected tokens. We present a study in a knowledge-rich domain of healthcare, where we tackle the problem of generating after-visit care instructions based on patient-doctor dialogues. We verify that, in our dataset, specific medical concepts with high utilization rates are underestimated by conventionally trained sequence-to-sequence models. We observe that correcting this with our approach to knowledge injection reduces the uncertainty of the model as well as improves factuality and coherence without negatively impacting fluency.
Submitted 6 June, 2023;
originally announced June 2023.
-
Unsupervised Spiking Neural Network Model of Prefrontal Cortex to study Task Switching with Synaptic deficiency
Authors:
Ashwin Viswanathan Kannan,
Goutam Mylavarapu,
Johnson P Thomas
Abstract:
In this study, we build a computational model of Prefrontal Cortex (PFC) using Spiking Neural Networks (SNN) to understand how neurons adapt and respond to tasks switched under short and longer duration of stimulus changes. We also explore behavioral deficits arising out of the PFC lesions by simulating lesioned states in our Spiking architecture model. Although there are some computational models of the PFC, SNNs have not been used to model them. In this study, we use SNNs having parameters close to biologically plausible values and train the model using the unsupervised Spike Timing Dependent Plasticity (STDP) learning rule. Our model is based on connectionist architectures and exhibits neural phenomena like sustained activity, which helps in generating short-term or working memory. We use these features to simulate lesions by deactivating synaptic pathways, record the weight adjustments of learned patterns, and capture the accuracy of learning tasks in such conditions. All our experiments are trained and recorded using the real-world Fashion MNIST (FMNIST) dataset, and through this work, we bridge the gap between bio-realistic models and those that perform well in pattern recognition tasks.
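For readers unfamiliar with STDP, a common pair-based form of the rule is sketched below: a synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise, with exponentially decaying magnitude. This is the textbook exponential-window variant, not necessarily the exact rule or parameters used in the paper; the amplitudes, time constants, and weight bounds are illustrative assumptions.

```python
import math


def stdp_delta_w(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms).

    Amplitudes and time constants are illustrative, not taken from the paper.
    """
    if delta_t > 0:   # pre fires before post: potentiation
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:   # post fires before pre: depression
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0, **params):
    """Sum all-to-all pairwise updates for one synapse and clip to [w_min, w_max]."""
    dw = sum(stdp_delta_w(t_post - t_pre, **params)
             for t_pre in pre_spikes for t_post in post_spikes)
    return min(w_max, max(w_min, w + dw))


if __name__ == "__main__":
    print(apply_stdp(0.5, pre_spikes=[10.0], post_spikes=[15.0]))  # causal pair: weight grows
    print(apply_stdp(0.5, pre_spikes=[15.0], post_spikes=[10.0]))  # anti-causal pair: weight shrinks
```

Simulating a lesion in this picture could amount to zeroing and freezing selected weights so that no STDP update reaches them, though the abstract does not say how the deactivation is implemented.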
Submitted 23 May, 2023;
originally announced May 2023.
-
Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models
Authors:
Varun Nair,
Elliot Schumacher,
Anitha Kannan
Abstract:
A medical provider's summary of a patient visit serves several critical purposes, including clinical decision-making, facilitating hand-offs between providers, and as a reference for the patient. An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue, despite the complexity of patient-generated language. Even minor inaccuracies in visit summaries (for example, summarizing "patient does not have a fever" when a fever is present) can be detrimental to the outcome of care for the patient.
This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks that are sequentially built upon. First, we identify medical entities and their affirmations within the conversation to serve as building blocks. We study dynamically constructing few-shot prompts for tasks by conditioning on relevant patient information and use GPT-3 as the backbone for our experiments. We also develop GPT-derived summarization metrics to measure performance against reference summaries quantitatively. Both our human evaluation study and metrics for medical correctness show that summaries generated using this approach are clinically accurate and outperform the baseline approach of summarizing the dialog in a zero-shot, single-prompt setting.
Submitted 10 May, 2023;
originally announced May 2023.
-
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
Authors:
Albert Yu Sun,
Varun Nair,
Elliot Schumacher,
Anitha Kannan
Abstract:
A wave of new task-based virtual assistants has been fueled by increasingly powerful large language models (LLMs), such as GPT-4 (OpenAI, 2023). A major challenge in deploying LLM-based virtual conversational assistants in real world settings is ensuring they operate within what is admissible for the task. To overcome this challenge, the designers of these virtual assistants rely on an independent guardrail system that verifies the virtual assistant's output aligns with the constraints required for the task. However, relying on commonly used, prompt-based guardrails can be difficult to engineer correctly and comprehensively. To address these challenges, we propose CONSCENDI. We use CONSCENDI to exhaustively generate training data with two key LLM-powered components: scenario-augmented generation and contrastive training examples. When generating conversational data, we generate a set of rule-breaking scenarios, which enumerate a diverse set of high-level ways a rule can be violated. This scenario-guided approach produces a diverse training set and provides chatbot designers greater control. To generate contrastive examples, we prompt the LLM to alter conversations with violations into acceptable conversations to enable fine-grained distinctions. We then use this data, generated by CONSCENDI, to train a smaller model. We find that CONSCENDI results in guardrail models that improve over baselines in multiple dialogue domains.
Submitted 3 April, 2024; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Dialogue-Contextualized Re-ranking for Medical History-Taking
Authors:
Jian Zhu,
Ilya Valmianski,
Anitha Kannan
Abstract:
AI-driven medical history-taking is an important component in symptom checking, automated patient intake, triage, and other AI virtual care applications. As history-taking is extremely varied, machine learning models require a significant amount of data to train. To overcome this challenge, existing systems are developed using indirect data or expert knowledge. This leads to a training-inference gap as models are trained on different kinds of data than what they observe at inference time. In this work, we present a two-stage re-ranking approach that helps close the training-inference gap by re-ranking the first-stage question candidates using a dialogue-contextualized model. For this, we propose a new model, global re-ranker, which cross-encodes the dialogue with all questions simultaneously, and compare it with several existing neural baselines. We test both transformer and S4-based language model backbones. We find that relative to the expert system, the best performance is achieved by our proposed global re-ranker with a transformer backbone, resulting in a 30% higher normalized discounted cumulative gain (nDCG) and a 77% higher mean average precision (mAP).
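The two reported ranking metrics can be computed as in the following sketch of their standard definitions. It uses the linear-gain variant of DCG, $\sum_i \mathrm{rel}_i / \log_2(i+1)$ over ranks $i \ge 1$, and binary relevance for average precision, assuming every relevant question appears in the ranked list; the abstract does not state which variant, cutoff, or relevance labels the paper used.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), ranks from 1."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """AP for binary labels in ranked order; assumes all relevant items appear in the list."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """Mean of per-query average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # 1 marks a candidate question judged relevant to the dialogue, in ranked order.
    print(ndcg([1, 0, 1]), average_precision([1, 0, 1]))
```

A re-ranker improves both numbers by moving relevant questions toward the top of the list, since both metrics reward early hits more than late ones.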
Submitted 4 April, 2023;
originally announced April 2023.
-
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
Authors:
Varun Nair,
Elliot Schumacher,
Geoffrey Tso,
Anitha Kannan
Abstract:
Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate outputs that are factually accurate and complete. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs, namely GPT-4. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types - a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher's information and makes judgments on the final output.
We test DERA against three clinically-focused tasks. For medical conversation summarization and care plan generation, DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics. In a new finding, we also show that GPT-4's performance (70%) on an open-ended version of the MedQA question-answering (QA) dataset (** et al. 2021, USMLE) is well above the passing level (60%), with DERA showing similar performance. We release the open-ended MEDQA dataset at https://github.com/curai/curai-research/tree/main/DERA.
Submitted 29 March, 2023;
originally announced March 2023.
-
Approximation of group explainers with coalition structure using Monte Carlo sampling on the product space of coalitions and features
Authors:
Konstandinos Kotsiopoulos,
Alexey Miroshnikov,
Khashayar Filom,
Arjun Ravi Kannan
Abstract:
In recent years, many Machine Learning (ML) explanation techniques have been designed using ideas from cooperative game theory. These game-theoretic explainers suffer from high complexity, hindering their exact computation in practical settings. In our work, we focus on a wide class of linear game values, as well as coalitional values, for the marginal game based on a given ML model and predictor vector. By viewing these explainers as expectations over appropriate sample spaces, we design a novel Monte Carlo sampling algorithm that estimates them at a reduced complexity that depends linearly on the size of the background dataset. We set up a rigorous framework for the statistical analysis and obtain error bounds for our sampling methods. The advantage of this approach is that it is fast, easily implementable, and model-agnostic. Furthermore, it has similar statistical accuracy as other known estimation techniques that are more complex and model-specific. We provide rigorous proofs of statistical convergence, as well as numerical experiments whose results agree with our theoretical findings.
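To illustrate what "viewing these explainers as expectations" means, the sketch below compares exact marginal (interventional) Shapley values, computed by enumerating all coalitions, with a Monte Carlo estimate that jointly samples a random feature ordering and one background point. This is plain permutation sampling, a simpler stand-in rather than the paper's estimator over the product space of coalitions and features; the function names and toy model are hypothetical.

```python
import itertools
import math
import random


def marginal_value(model, x, background, coalition):
    """Marginal game v(S): mean over background points z of model(x_S, z_{-S})."""
    total = 0.0
    for z in background:
        point = [x[j] if j in coalition else z[j] for j in range(len(x))]
        total += model(point)
    return total / len(background)


def exact_shapley(model, x, background):
    """Exact marginal Shapley values by enumerating coalitions (exponential in len(x))."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (marginal_value(model, x, background, s | {i})
                                    - marginal_value(model, x, background, s))
    return phi


def monte_carlo_shapley(model, x, background, n_samples=2000, seed=0):
    """Unbiased estimate: each sample draws one feature ordering and one background point."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = rng.sample(range(n), n)
        point = list(rng.choice(background))  # empty coalition: all features from z
        prev = model(point)
        for i in order:                       # add features of x one at a time
            point[i] = x[i]
            cur = model(point)
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]


if __name__ == "__main__":
    model = lambda v: 2 * v[0] + 3 * v[1]
    x, background = [1.0, 2.0], [[0.0, 0.0], [1.0, 1.0]]
    print(exact_shapley(model, x, background), monte_carlo_shapley(model, x, background))
```

Each Monte Carlo sample costs `len(x) + 1` model evaluations regardless of background size, whereas the exact version evaluates the model on every background point for every coalition.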
Submitted 18 April, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
FLINT: A Platform for Federated Learning Integration
Authors:
Ewen Wang,
Ajay Kannan,
Yuefeng Liang,
Boyi Chen,
Mosharaf Chowdhury
Abstract:
Cross-device federated learning (FL) has been well-studied from algorithmic, system scalability, and training speed perspectives. Nonetheless, moving from centralized training to cross-device FL for millions or billions of devices presents many risks, including performance loss, developer inertia, poor user experience, and unexpected application failures. In addition, the corresponding infrastructure, development costs, and return on investment are difficult to estimate. In this paper, we present a device-cloud collaborative FL platform that integrates with an existing machine learning platform, providing tools to measure real-world constraints, assess infrastructure capabilities, evaluate model training performance, and estimate system resource requirements to responsibly bring FL into production. We also present a decision workflow that leverages the FL-integrated platform to comprehensively evaluate the trade-offs of cross-device FL and share our empirical evaluations of business-critical machine learning applications that impact hundreds of millions of users.
Submitted 10 March, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
On marginal feature attributions of tree-based models
Authors:
Khashayar Filom,
Alexey Miroshnikov,
Konstandinos Kotsiopoulos,
Arjun Ravi Kannan
Abstract:
Due to their power and ease of use, tree-based machine learning models, such as random forests and gradient-boosted tree ensembles, have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g. marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such methods are true to the model and implementation invariant, i.e. dependent only on the input-output function of the model. We contrast this with the popular TreeSHAP algorithm by presenting two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features, whereas the marginal Shapley values coincide. Furthermore, we discuss how the internal structure of tree-based models may be leveraged to help with computing their marginal feature attributions according to a linear game value. One important observation is that these are simple (piecewise-constant) functions with respect to a certain grid partition of the input space determined by the trained model. Another crucial observation, showcased by experiments with XGBoost, LightGBM and CatBoost libraries, is that only a portion of all features appears in a tree from the ensemble. Thus, the complexity of computing marginal Shapley (or Owen or Banzhaf) feature attributions may be reduced. This remains valid for a broader class of game values which we shall axiomatically characterize. A prime example is the case of CatBoost models where the trees are oblivious (symmetric) and the number of features in each of them is no larger than the depth. We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models. This results in a fast, accurate algorithm for estimating these feature attributions.
Submitted 5 May, 2024; v1 submitted 16 February, 2023;
originally announced February 2023.
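For small numbers of features, marginal Shapley values can be computed exactly by enumerating coalitions, which makes two properties mentioned in the abstract easy to check: the result depends only on the input-output function, and a feature the model never uses receives zero attribution. The sketch below is a brute-force reference, assuming the marginal game v(S) = average of f(x_S, z_{-S}) over background rows z; it is not the paper's tree-based algorithm, and the helper names are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def marginal_game(f, x, background, subset):
    """v(S): average prediction when features in S come from x, the rest from background rows."""
    hybrid = np.array(background, dtype=float)  # a copy, so the input is not modified
    idx = list(subset)
    if idx:
        hybrid[:, idx] = np.asarray(x, dtype=float)[idx]
    return float(np.mean(f(hybrid)))


def exact_marginal_shapley(f, x, background):
    """Exact marginal Shapley values by enumerating all coalitions (O(2^d) game evaluations)."""
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! for a coalition S not containing i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                phi[i] += weight * (marginal_game(f, x, background, S + (i,))
                                    - marginal_game(f, x, background, S))
    return phi
```

For a linear model w . x this reduces to w_i (x_i - mean of background column i), and any feature that does not appear in the model (for example, one absent from every tree of an ensemble) gets exactly zero.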
-
Benefits of Multiobjective Learning in Solar Energy Prediction
Authors:
Aswin Kannan
Abstract:
While renewable energy forecasting has received significant attention in the last decade, the literature has primarily focused on machine learning models that train on only one objective at a time. Many classification (and regression) tasks in energy markets involve highly imbalanced training data. For example, to balance reserves, market regulators may naturally wish to be more or less averse to false negatives (which can lead to poor operating efficiency and higher costs) than to false positives (which can lead to market shortfall). Besides accuracy, other metrics such as algorithmic bias, RMBE (in regression problems), inference time, and model sparsity are also crucial. This paper is among the first in renewable energy forecasting to present a Pareto frontier of solutions (trade-offs), addressing the question of how to handle multiple objectives with the XGBoost model (gradient-boosted trees). Our proposed algorithm relies on a sequence of weighted single-objective model training routines, with the weights taken from uniform meshes. Real-world data from the Amherst (Massachusetts, United States) solar energy prediction panels are considered, with both tri-objective (focus on accuracy) and bi-objective (focus on fairness/bias) classification instances. The numerical experiments are promising: the spread and variety of the resulting solutions (model configurations) show clear advantages over single-objective methods.
Submitted 28 January, 2023;
originally announced January 2023.
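A minimal sketch of a weighted-scalarization loop of the kind the abstract describes, under stated assumptions: weight vectors come from a uniform mesh on the probability simplex, each weight vector would be used to train one single-objective model (for example, an XGBoost model whose loss or sample weights mix the objectives), and the resulting objective values are filtered to the non-dominated set. The helper names are hypothetical, and the model-training step is deliberately omitted.

```python
from itertools import product

import numpy as np


def simplex_mesh(n_objectives, steps):
    """Uniform mesh of weight vectors with entries in {0, 1/steps, ..., 1} summing to 1."""
    rows = []
    for head in product(range(steps + 1), repeat=n_objectives - 1):
        if sum(head) <= steps:
            rows.append(head + (steps - sum(head),))
    return np.array(rows, dtype=float) / steps


def pareto_front(objectives):
    """Indices of non-dominated rows, assuming every objective is minimized."""
    obj = np.asarray(objectives, dtype=float)
    front = []
    for i, p in enumerate(obj):
        # Row q dominates p if it is no worse everywhere and strictly better somewhere.
        dominated = np.any(np.all(obj <= p, axis=1) & np.any(obj < p, axis=1))
        if not dominated:
            front.append(i)
    return front
```

For three objectives and `steps=10` the mesh has 66 weight vectors, i.e., 66 training runs. Note that plain weighted-sum scalarization can only reach points on the convex hull of the Pareto front, which is one reason to inspect the spread of solutions it produces.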
-
Beyond Codebook-Based Analog Beamforming at mmWave: Compressed Sensing and Machine Learning Methods
Authors:
Hamed Pezeshki,
Fabio Valerio Massoli,
Arash Behboodi,
Taesang Yoo,
Arumugam Kannan,
Mahmoud Taherzadeh Boroujeni,
Qiaoyu Li,
Tao Luo,
Joseph B. Soriaga
Abstract:
Analog beamforming is the predominant approach for millimeter wave (mmWave) communication given its favorable characteristics for limited-resource devices. In this work, we aim to reduce the spectral efficiency gap between analog and digital beamforming methods. We propose a method for refined beam selection based on the estimated raw channel. The channel estimation, an underdetermined problem, is solved using compressed sensing (CS) methods that leverage the angular-domain sparsity of the channel. To reduce the complexity of CS methods, we propose a dictionary learning iterative soft-thresholding algorithm, which jointly learns the sparsifying dictionary and the signal reconstruction. We evaluate the proposed method on a realistic mmWave setup and show considerable performance improvement over codebook-based analog beamforming approaches.
Submitted 3 November, 2022;
originally announced November 2022.
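The abstract builds on iterative soft-thresholding. As a hedged reference point, the sketch below is the classical ISTA iteration for the sparse recovery problem min_x 0.5 ||y - A x||^2 + lam ||x||_1 with a fixed dictionary `A`; it does not learn the dictionary as the paper's method does, and it uses real-valued data for simplicity, whereas mmWave channels are complex-valued.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by iterative soft-thresholding."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient (largest singular value squared)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)  # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

With step size 1/L each iteration does not increase the objective. Learned variants unroll a fixed number of these iterations and train the matrices and thresholds, which is the general idea behind reducing the iteration count.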
-
Learning functional sections in medical conversations: iterative pseudo-labeling and human-in-the-loop approach
Authors:
Mengqian Wang,
Ilya Valmianski,
Xavier Amatriain,
Anitha Kannan
Abstract:
Medical conversations between patients and medical professionals have implicit functional sections, such as "history taking", "summarization", "education", and "care plan." In this work, we are interested in learning to automatically extract these sections. A direct approach would require collecting large amounts of expert annotations for this task, which is inherently costly due to the contextual variability both within and between these sections. This paper presents an approach that tackles the problem of learning to classify medical dialogue into functional sections without requiring a large number of annotations. Our approach combines pseudo-labeling and human-in-the-loop annotation. First, we bootstrap using weak supervision with pseudo-labeling to generate dialogue turn-level pseudo-labels and train a transformer-based model, which is then applied to individual sentences to create noisy sentence-level labels. Second, we iteratively refine sentence-level labels using a cluster-based human-in-the-loop approach. Each iteration requires only a few dozen annotator decisions. We evaluate the results on an expert-annotated dataset of 100 dialogues and find that while our models start with 69.5% accuracy, we can iteratively improve it to 82.5%. The code used to perform all experiments described in this paper can be found here: https://github.com/curai/curai-research/tree/main/functional-sections.
Submitted 7 October, 2022; v1 submitted 5 October, 2022;
originally announced October 2022.
-
OSLAT: Open Set Label Attention Transformer for Medical Entity Retrieval and Span Extraction
Authors:
Raymond Li,
Ilya Valmianski,
Li Deng,
Xavier Amatriain,
Anitha Kannan
Abstract:
Medical entity span extraction and linking are critical steps for many healthcare NLP tasks. Most existing entity extraction methods either have a fixed vocabulary of medical entities or require span annotations. In this paper, we propose a method for linking an open set of entities that does not require any span annotations. Our method, Open Set Label Attention Transformer (OSLAT), uses the label-attention mechanism to learn candidate-entity contextualized text representations. We find that OSLAT can not only link entities but is also able to implicitly learn spans associated with entities. We evaluate OSLAT on two tasks: (1) span extraction trained without explicit span annotations, and (2) entity linking trained without span-level annotation. We test the generalizability of our method by training two separate models on two datasets with low entity overlap and comparing cross-dataset performance.
Submitted 20 November, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.
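The abstract relies on a label-attention mechanism, where an entity (label) representation attends over the tokens of the text. A minimal sketch of that single attention step, assuming scaled dot-product scoring between one label embedding and the token embeddings (OSLAT's actual encoder, scoring function, and training objective are not reproduced; `label_attention` is a hypothetical helper): the attention weights are the quantity one would inspect to see which tokens, and hence which implicit span, the model associates with the entity.

```python
import numpy as np


def label_attention(label_emb, token_embs):
    """Attend from one label embedding over token embeddings.

    Returns the attention weights (one per token) and the label-specific
    contextualized representation (attention-weighted sum of token embeddings).
    """
    label_emb = np.asarray(label_emb, dtype=float)
    token_embs = np.asarray(token_embs, dtype=float)
    scores = token_embs @ label_emb / np.sqrt(label_emb.size)  # scaled dot-product scores
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    weights = exp / exp.sum()
    return weights, weights @ token_embs
```

Tokens with high weight for a given entity are natural candidates for its span, which is why a model trained only at the entity level can still expose span information.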
-
Differentially Private Estimation of Heterogeneous Causal Effects
Authors:
Fengshi Niu,
Harsha Nori,
Brian Quistorff,
Rich Caruana,
Donald Ngwe,
Aadharsh Kannan
Abstract:
Estimating heterogeneous treatment effects in domains such as healthcare or social science often involves sensitive data where protecting privacy is important. We introduce a general meta-algorithm for estimating conditional average treatment effects (CATE) with differential privacy (DP) guarantees. Our meta-algorithm can work with simple, single-stage CATE estimators such as S-learner and more complex multi-stage estimators such as DR and R-learner. We perform a tight privacy analysis by taking advantage of sample splitting in our meta-algorithm and the parallel composition property of differential privacy. In this paper, we implement our approach using DP-EBMs as the base learner. DP-EBMs are interpretable, high-accuracy models with privacy guarantees, which allow us to directly observe the impact of DP noise on the learned causal model. Our experiments show that multi-stage CATE estimators incur larger accuracy loss than single-stage CATE or ATE estimators and that most of the accuracy loss from differential privacy is due to an increase in variance, not biased estimates of treatment effects.
Submitted 22 February, 2022;
originally announced February 2022.
-
Representation Stability and Finite Orthogonal Groups
Authors:
Zifan Wang,
Arun S. Kannan
Abstract:
In this paper, we prove stability results about orthogonal groups over finite commutative rings where 2 is a unit. Inspired by Putman and Sam (2017), we construct a category $\mathbf{OrI}(R)$ and prove a Noetherianity theorem for the category of $\mathbf{OrI}(R)$-modules. This implies an asymptotic structure theorem for orthogonal groups. In addition, we show general homological stability theorems for orthogonal groups, with both untwisted and twisted coefficients, partially generalizing a result of Charney (1987).
Submitted 20 February, 2022;
originally announced February 2022.
-
Stable Centres II: Finite Classical Groups
Authors:
Arun S. Kannan,
Christopher Ryba
Abstract:
Farahat and Higman constructed an algebra $\mathrm{FH}$ interpolating the centres of symmetric group algebras $Z(\mathbb{Z}S_n)$ by proving that the structure constants in these rings are "polynomial in $n$". Inspired by a construction of $\mathrm{FH}$ due to Ivanov and Kerov, we prove for $G_n = GL_n, U_n, Sp_{2n}, O_n$, that the structure constants of $Z(\mathbb{Z}G_n(\mathbb{F}_q))$ are "polynomial in $q^n$", allowing us to construct an equivalent of the Farahat-Higman algebra in each case.
Submitted 2 December, 2021;
originally announced December 2021.
-
Model-agnostic bias mitigation methods with regressor distribution control for Wasserstein-based fairness metrics
Authors:
Alexey Miroshnikov,
Konstandinos Kotsiopoulos,
Ryan Franks,
Arjun Ravi Kannan
Abstract:
This article is a companion paper to our earlier work Miroshnikov et al. (2021) on fairness interpretability, which introduces bias explanations. In the current work, we propose a bias mitigation methodology based upon the construction of post-processed models with fairer regressor distributions for Wasserstein-based fairness metrics. By identifying the list of predictors contributing the most to the bias, we reduce the dimensionality of the problem by mitigating the bias originating from those predictors. The post-processing methodology involves reshaping the predictor distributions by balancing the positive and negative bias explanations and allows for the regressor bias to decrease. We design an algorithm that uses Bayesian optimization to construct the bias-performance efficient frontier over the family of post-processed models, from which an optimal model is selected. Our novel methodology performs optimization in low-dimensional spaces and avoids expensive model retraining.
Submitted 19 November, 2021;
originally announced November 2021.
-
MEDCOD: A Medically-Accurate, Emotive, Diverse, and Controllable Dialog System
Authors:
Rhys Compton,
Ilya Valmianski,
Li Deng,
Costa Huang,
Namit Katariya,
Xavier Amatriain,
Anitha Kannan
Abstract:
We present MEDCOD, a Medically-Accurate, Emotive, Diverse, and Controllable Dialog system with a unique approach to the natural language generator module. MEDCOD has been developed and evaluated specifically for the history taking task. It combines the advantage of a traditional modular approach, which incorporates (medical) domain knowledge, with modern deep learning techniques that generate flexible, human-like natural language expressions. Two key aspects of MEDCOD's natural language output are described in detail. First, the generated sentences are emotive and empathetic, similar to how a doctor would communicate with the patient. Second, the generated sentence structures and phrasings are varied and diverse while maintaining medical consistency with the desired medical concept (provided by the dialogue manager module of MEDCOD). Experimental results demonstrate the effectiveness of our approach in creating a human-like medical dialogue system. Relevant code is available at https://github.com/curai/curai-research/tree/main/MEDCOD
Submitted 17 November, 2021;
originally announced November 2021.
-
Adding more data does not always help: A study in medical conversation summarization with PEGASUS
Authors:
Varun Nair,
Namit Katariya,
Xavier Amatriain,
Ilya Valmianski,
Anitha Kannan
Abstract:
Medical conversation summarization is integral in capturing information gathered during interactions between patients and physicians. Summarized conversations are used to facilitate patient hand-offs between physicians, and as part of providing care in the future. Summaries, however, can be time-consuming to produce and require domain expertise. Modern pre-trained NLP models such as PEGASUS have emerged as capable alternatives to human summarization, reaching state-of-the-art performance on many summarization benchmarks. However, many downstream tasks still require at least moderately sized datasets to achieve satisfactory performance. In this work we (1) explore the effect of dataset size on transfer learning for medical conversation summarization using PEGASUS and (2) evaluate various iterative labeling strategies in the low-data regime, following their success in the classification setting. We find that model performance saturates as dataset size increases and that the various active-learning strategies evaluated all show equivalent performance, consistent with a simple increase in dataset size. We also find that naive iterative pseudo-labeling is on par with or slightly worse than no pseudo-labeling. Our work sheds light on the successes and challenges of translating low-data regime techniques in classification to medical conversation summarization and helps guide future work in this space. Relevant code is available at https://github.com/curai/curai-research/tree/main/medical-summarization-ML4H-2021
Submitted 28 November, 2021; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
Authors:
Bharath Chintagunta,
Namit Katariya,
Xavier Amatriain,
Anitha Kannan
Abstract:
In medical dialogue summarization, summaries must be coherent and must capture all the medically relevant information in the dialogue. However, learning effective models for summarization requires large amounts of labeled data, which are especially hard to obtain. We present an algorithm to create synthetic training data with an explicit focus on capturing medically relevant information. We utilize GPT-3 as the backbone of our algorithm and scale 210 human-labeled examples to yield results comparable to using 6400 human-labeled examples (~30x), leveraging low-shot learning and an ensemble method. In detailed experiments, we show that this approach produces high-quality training data that can further be combined with human-labeled data to get summaries that are strongly preferred over those produced by models trained on human data alone, in terms of both medical accuracy and coherency.
Submitted 9 September, 2021;
originally announced October 2021.
-
New Constructions of Exceptional Simple Lie Superalgebras with Integer Cartan Matrix in Characteristics 3 and 5 via Tensor Categories
Authors:
Arun S. Kannan
Abstract:
Using tensor categories, we present new constructions of several of the exceptional simple Lie superalgebras with integer Cartan matrix in characteristic $p = 3$ and $p = 5$ from the complete classification of modular Lie superalgebras with indecomposable Cartan matrix and their simple subquotients over algebraically closed fields by Bouarroudj, Grozman, and Leites in 2009. Specifically, let $\boldsymbol{\alpha}_p$ denote the kernel of the Frobenius endomorphism on the additive group scheme $\mathbb{G}_a$ over an algebraically closed field of characteristic $p$. The Verlinde category $\mathrm{Ver}_p$ is the semisimplification of the representation category $\mathrm{Rep}\,\boldsymbol{\alpha}_p$, and $\mathrm{Ver}_p$ contains the category of super vector spaces as a full subcategory. Each exceptional Lie superalgebra we construct is realized as the image of an exceptional Lie algebra equipped with a nilpotent derivation of order at most $p$ under the semisimplification functor from $\mathrm{Rep}\,\boldsymbol{\alpha}_p$ to $\mathrm{Ver}_p$.
Submitted 16 May, 2022; v1 submitted 12 August, 2021;
originally announced August 2021.
-
Document Structure aware Relational Graph Convolutional Networks for Ontology Population
Authors:
Abhay M Shalghar,
Ayush Kumar,
Balaji Ganesan,
Aswin Kannan,
Akshay Parekh,
Shobha G
Abstract:
Ontologies comprising concepts, their attributes, and relationships are used in many knowledge-based AI systems. While there have been efforts towards populating domain-specific ontologies, we examine the role of document structure in learning ontological relationships between concepts in any document corpus. Inspired by ideas from hypernym discovery and explainability, our method is about 15 points more accurate than a stand-alone R-GCN model on this task.
Submitted 12 April, 2022; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Language model fusion for streaming end to end speech recognition
Authors:
Rodrigo Cabrera,
Xiaofeng Liu,
Mohammadreza Ghodsi,
Zebulun Matteson,
Eugene Weinstein,
Anjuli Kannan
Abstract:
Streaming processing of speech audio is required for many contemporary practical speech recognition tasks. Even with the large corpora of manually transcribed speech data available today, it is impossible for such corpora to adequately cover the long tail of linguistic content that is important for tasks such as open-ended dictation and voice search. We seek to address both the streaming and the tail recognition challenges by using a language model (LM) trained on unpaired text data to enhance the end-to-end (E2E) model. We extend shallow fusion and cold fusion approaches to the streaming Recurrent Neural Network Transducer (RNNT), and also propose two new competitive fusion approaches that further enhance the RNNT architecture. Our results on multiple languages with varying training set sizes show that these fusion methods improve streaming RNNT performance by introducing extra linguistic features. Cold fusion works consistently better on streaming RNNT, with up to an 8.5% WER improvement.
Submitted 9 April, 2021;
originally announced April 2021.
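Shallow fusion, one of the approaches extended in the abstract above, combines the scores of the E2E model and an external LM only at decoding time: candidates are ranked by log p_E2E + lambda * log p_LM. A minimal sketch of that ranking for a single decoding step, assuming per-token log-probabilities are already available (`shallow_fusion_topk` and the default weight are hypothetical; cold fusion, which integrates the LM during training, is not shown, and in RNNT decoders the LM term is typically applied only to non-blank labels, which this sketch ignores).

```python
import numpy as np


def shallow_fusion_topk(e2e_log_probs, lm_log_probs, lm_weight=0.3, k=2):
    """Rank next-token candidates by log p_E2E + lm_weight * log p_LM; return top-k indices and scores."""
    fused = np.asarray(e2e_log_probs, dtype=float) + lm_weight * np.asarray(lm_log_probs, dtype=float)
    order = np.argsort(-fused)[:k]  # highest fused score first
    return order, fused[order]
```

With `lm_weight=0` this reduces to the E2E model's own ranking; increasing the weight lets an LM trained on unpaired text promote rare words that the acoustic training data covers poorly. The weight is a tuning parameter, usually chosen on held-out data.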
-
Lectures on Symmetric Tensor Categories
Authors:
Pavel Etingof,
Arun S. Kannan
Abstract:
This is an expanded version of the notes by the second author of the lectures on symmetric tensor categories given by the first author at Ohio State University in March 2019 and later at ICRA-2020 in November 2020. We review some aspects of the current state of the theory of symmetric tensor categories and discuss their applications, including ones unavailable in the literature.
Submitted 10 November, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Stability theory of game-theoretic group feature explanations for machine learning models
Authors:
Alexey Miroshnikov,
Konstandinos Kotsiopoulos,
Khashayar Filom,
Arjun Ravi Kannan
Abstract:
In this article, we study feature attributions of Machine Learning (ML) models originating from linear game values and coalitional values defined as operators on appropriate functional spaces. The main focus is on random games based on the conditional and marginal expectations. The first part of our work formulates a stability theory for these explanation operators by establishing certain bounds for both marginal and conditional explanations. The differences between the two games are then elucidated, such as showing that the marginal explanations can become discontinuous on some naturally-designed domains, while the conditional explanations remain stable. In the second part of our work, group explanation methodologies are devised based on game values with coalition structure, where the features are grouped based on dependencies. We show analytically that grouping features this way has a stabilizing effect on the marginal operator on both group and individual levels, and allows for the unification of marginal and conditional explanations. Our results are verified in a number of numerical experiments where an information-theoretic measure of dependence is used for grouping.
Submitted 3 April, 2024; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions
Authors:
Ali Mottaghi,
Prathusha K Sarma,
Xavier Amatriain,
Serena Yeung,
Anitha Kannan
Abstract:
We study the problem of medical symptom recognition from patient text, for the purpose of gathering pertinent information from the patient (known as history-taking). A typical patient text describes the symptoms the patient is experiencing, and a single instance of such text can be "labeled" with multiple symptoms. This makes learning a medical symptom recognizer challenging on account of (i) the lack of large volumes of annotated data and (ii) the large, unknown universe of multiple symptoms that a single text can map to. Furthermore, patient text is often characterized by a long tail in the label distribution (i.e., some labels/symptoms occur far more frequently than others, e.g., "fever" vs. "hematochezia"). In this paper, we introduce an active learning method that leverages the underlying structure of a continually refined, learned latent space to select the most informative examples to label. This enables the selection of the most informative examples, which progressively increase the coverage of the universe of symptoms via the learned model, despite the long tail in the data distribution.
Submitted 28 March, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
Wasserstein-based fairness interpretability framework for machine learning models
Authors:
Alexey Miroshnikov,
Konstandinos Kotsiopoulos,
Ryan Franks,
Arjun Ravi Kannan
Abstract:
The objective of this article is to introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models at the level of a distribution. In our work, we measure the model bias across sub-population distributions in the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we take into account the favorability of both the model and predictors with respect to the non-protected class. The quantification is accomplished through transport theory, which gives rise to the decomposition of the model bias and bias explanations into positive and negative contributions. To gain more insight into the role of favorability and allow for additivity of bias explanations, we adapt techniques from cooperative game theory.
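To make the distribution-level measurement concrete, here is a minimal sketch, assuming model scores for two sub-populations are available as plain lists. In one dimension the Wasserstein-1 distance equals the area between the two empirical CDFs, and splitting that area by the sign of the CDF difference gives two one-sided parts. That split is an illustration of a sign-based decomposition, not necessarily the paper's exact definition of positive and negative bias; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def wasserstein_1d_split(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: W1 distance between two empirical 1-D distributions,
    computed as the integral of |F_x(t) - F_y(t)| dt, plus its two one-sided parts."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    x_above = 0.0  # area where F_x > F_y (x has more mass below t)
    y_above = 0.0  # area where F_y > F_x
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        diff = bisect_right(xs, left) / len(xs) - bisect_right(ys, left) / len(ys)
        if diff > 0:
            x_above += diff * (right - left)
        else:
            y_above += -diff * (right - left)
    return x_above + y_above, x_above, y_above
```

Shifting every score of one group by a constant $c$ yields a distance of exactly $c$, all of it on one side, which is the intuition behind reading the sign of the transport as the direction of the bias.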
Submitted 8 March, 2022; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Simulations of Argon Plasma Decay in a Thermionic Converter
Authors:
R. E. Groenewald,
S. Clark,
A. Kannan,
P. Scherpelz
Abstract:
The dynamics of an argon plasma in the gap of a thermionic diode are investigated using particle-in-cell (PIC) simulations. The time-averaged diode current, as a function of the relative electrical potential between the electrodes, is studied while the plasma density depletes due to recombination on the electrode surfaces. Simulations were performed in both 1D and 2D, and significant differences were observed in the plasma decay between the two cases. Specifically, in 2D it was found that the electrostatic potential gradually changes as the plasma decays, while in 1D fluctuations in the plasma led to large potential fluctuations, which changed the plasma decay characteristics relative to the 2D case. This creates significant differences in the time-averaged diode current. Furthermore, it was found that the maximum time-averaged current is collected when the diode voltage is set to the flat-band condition, where the cathode and anode vacuum biases are equal. This suggests a novel technique of measuring the difference in work functions between the cathode and anode in a thermionic converter.
Submitted 31 January, 2021; v1 submitted 7 October, 2020;
originally announced October 2020.
-
Dr. Summarize: Global Summarization of Medical Dialogue by Exploiting Local Structures
Authors:
Anirudh Joshi,
Namit Katariya,
Xavier Amatriain,
Anitha Kannan
Abstract:
Understanding a medical conversation between a patient and a physician poses a unique natural language understanding challenge, since it combines elements of standard open-ended conversation with highly domain-specific elements that require expertise and medical knowledge. Summarization of medical conversations is a particularly important aspect of medical conversation understanding, since it addresses a very real need in medical practice: capturing the most important aspects of a medical encounter so that they can be used for medical decision making and subsequent follow-ups.
In this paper we present a novel approach to medical conversation summarization that leverages the unique and independent local structures created when gathering a patient's medical history. Our approach is a variation of the pointer-generator network in which we introduce a penalty on the generator distribution and explicitly model negations. The model also captures important properties of medical conversations, such as medical knowledge coming from standardized medical ontologies, better than when those concepts are introduced explicitly. Through evaluation by doctors, we show that our approach is preferred on twice as many summaries as the baseline pointer-generator model and captures most or all of the information in 80% of the conversations, making it a realistic alternative to costly manual summarization by medical experts.
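For readers unfamiliar with the baseline, the sketch below shows the standard pointer-generator output distribution, which mixes a vocabulary distribution with a copy distribution given by attention over source tokens. The abstract does not give the form of the authors' penalty on the generator distribution or their negation modeling, so both are omitted; the function name and toy values are hypothetical.

```python
from typing import Dict, List


def pointer_generator_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Hypothetical helper: standard pointer-generator mixture
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)."""
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        # Copying lets tokens absent from the vocabulary (e.g. "no" here) be emitted.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final
```

With `p_gen = 0.5`, a vocabulary distribution `{"fever": 0.6, "cough": 0.4}`, and attention `[0.7, 0.3]` over the source `["no", "fever"]`, the word "fever" receives probability mass from both the generator and the copy path. A penalty on the generator term would push the model toward copying from the dialogue.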
Submitted 18 September, 2020;
originally announced September 2020.
-
Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs
Authors:
Clara H. McCreery,
Namit Katariya,
Anitha Kannan,
Manish Chablani,
Xavier Amatriain
Abstract:
People increasingly search online for answers to their medical questions but the rate at which medical questions are asked online significantly exceeds the capacity of qualified people to answer them. This leaves many questions unanswered or inadequately answered. Many of these questions are not unique, and reliable identification of similar questions would enable more efficient and effective question answering schema. COVID-19 has only exacerbated this problem. Almost every government agency and healthcare organization has tried to meet the informational need of users by building online FAQs, but there is no way for people to ask their question and know if it is answered on one of these pages. While many research efforts have focused on the problem of general question similarity, these approaches do not generalize well to domains that require expert knowledge to determine semantic similarity, such as the medical domain. In this paper, we show how a double fine-tuning approach of pretraining a neural network on medical question-answer pairs followed by fine-tuning on medical question-question pairs is a particularly useful intermediate task for the ultimate goal of determining medical question similarity. While other pretraining tasks yield an accuracy below 78.7% on this task, our model achieves an accuracy of 82.6% with the same number of training examples, an accuracy of 80.0% with a much smaller training set, and an accuracy of 84.5% when the full corpus of medical question-answer data is used. We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.
Submitted 4 August, 2020;
originally announced August 2020.
-
COVID-19 in differential diagnosis of online symptom assessments
Authors:
Anitha Kannan,
Richard Chen,
Vignesh Venkataraman,
Geoffrey J. Tso,
Xavier Amatriain
Abstract:
The COVID-19 pandemic has magnified an already existing trend of people looking for healthcare solutions online. One class of solutions consists of symptom checkers, which have become very popular in the context of COVID-19. Traditional symptom checkers, however, are based on manually curated expert systems that are inflexible and hard to modify, especially in a quickly changing situation like the one we are facing today. That is why all existing COVID-19 solutions are manual symptom checkers that can only estimate the probability of this disease and cannot contemplate alternative hypotheses or come up with a differential diagnosis. While machine learning offers an alternative, the lack of reliable data does not make it easy to apply to COVID-19 either. In this paper we present an approach that combines the strengths of traditional AI expert systems and novel deep learning models. In doing so we can leverage prior knowledge as well as any amount of existing data to quickly derive models that best adapt to the current state of the world and latest scientific knowledge. We use the approach to train a COVID-19-aware differential diagnosis model that can be used for medical decision support for both doctors and patients. We show that our approach is able to accurately model new incoming data about COVID-19 while still preserving accuracy on conditions that had been modeled in the past. While our approach shows clear advantages for an extreme situation like the one we are currently facing, we also show that its flexibility generalizes beyond this concrete, but very important, example.
Submitted 30 November, 2020; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Bubble coalescence in worm-like micellar solutions
Authors:
Vineeth Chandran Suja,
Aadithya Kannan,
Bruce Kubicka,
Alex Hadidi,
Gerald G. Fuller
Abstract:
Surfactants in aqueous solutions self-assemble in the presence of salt to form long, flexible worm-like micelles (WLM). WLM solutions exhibit viscoelastic properties and are used in many applications, such as cosmetic products, drag reduction and hydraulic fracturing. The dynamics of bubbles in WLM solutions are important considerations for the stability of many of these products. In this manuscript, we investigate the thin film drainage dynamics leading up to the coalescence of bubbles in WLM solutions. The salts and surfactant type and concentrations were chosen so that the viscoelastic properties of the tested WLM solutions span two orders of magnitude in moduli and relaxation times. The various stages in drainage and coalescence (the formation of a thick region at the apex, called a dimple; the thinning and washout of this dimple; and the final stages of drainage before rupture) are modified by the viscoelasticity of the worm-like micellar solutions. As a result of the unique viscoelastic properties of the WLM solutions, we also observe a number of interesting fluid dynamic phenomena during the drainage process, including elastic recoil, thin film ripping and single-step terminal drainage.
Submitted 21 June, 2020;
originally announced June 2020.
-
Characters for Projective Modules in the BGG Category $\mathcal{O}$ for the Orthosymplectic Lie Superalgebra $\mathfrak{osp}(3|4)$
Authors:
Arun S. Kannan,
Honglin Zhu
Abstract:
We determine the Verma multiplicities of standard filtrations of projective modules for integral atypical blocks in the BGG category $\mathcal{O}$ for the orthosymplectic Lie superalgebra $\mathfrak{osp}(3|4)$ by way of translation functors. We then explicitly determine the composition factor multiplicities of Verma modules using BGG reciprocity.
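BGG reciprocity is the step that converts the first computation into the second. In the standard notation (assumed here rather than quoted from the paper), with $P(\lambda)$ the projective cover of the simple module $L(\lambda)$ and $M(\mu)$ the Verma module of highest weight $\mu$, it reads

$$(P(\lambda) : M(\mu)) = [M(\mu) : L(\lambda)],$$

where the left side counts occurrences of $M(\mu)$ in a standard filtration of $P(\lambda)$ and the right side is the multiplicity of $L(\lambda)$ as a composition factor of $M(\mu)$. Knowing every $(P(\lambda) : M(\mu))$ in a block therefore determines every composition factor multiplicity of the Verma modules in that block.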
Submitted 20 November, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Symmetry breaking and chaos in evaporation driven Marangoni flows over bubbles
Authors:
Vineeth Chandran Suja,
Alex Hadidi,
Aadithya Kannan,
Gerald G Fuller
Abstract:
Understanding the dynamics of liquid films that make up bubbles is of practical and fundamental importance. Practically, this understanding is crucial for tuning bubble stability, while fundamentally, thin films are an excellent platform to study 2D flows. Here we study the spatiotemporal film thickness dynamics of bubbles subjected to evaporation driven Marangoni flows. Initially, we demonstrate how bubble stability can be dramatically tuned with the help of evaporation driven flows. Subsequently, we reveal that the spatial symmetry of thickness profiles evolves non-monotonically with the volatile species concentration, with profiles being axisymmetric at the two extremes in concentration. At $50\%$ concentration, spatial symmetry breaks down and thickness fluctuations are chaotic everywhere in space, with the fluctuation statistics becoming spatially invariant and ergodic. For these cases, the power spectrum of thickness fluctuations follows the Kolmogorov $-5/3$ scaling, a first such demonstration for forcing by evaporation. These results along with the reported setup provide an excellent framework to further investigate 2D chaotic flows.
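A spectral scaling such as Kolmogorov's $-5/3$ is typically checked by fitting a straight line to the spectrum on log-log axes. The sketch below is a minimal version of that fit, assuming the power spectrum has already been estimated at a set of positive frequencies; the abstract does not describe the authors' estimation procedure, and the function name and synthetic data are hypothetical.

```python
import math
from typing import Sequence


def loglog_slope(freqs: Sequence[float], power: Sequence[float]) -> float:
    """Hypothetical helper: least-squares slope of log(power) against log(frequency).
    A Kolmogorov-like spectrum E(k) ~ k^(-5/3) gives a slope near -5/3."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in power]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

In practice the fit is restricted to the frequency band where the spectrum actually looks like a straight line; fitting the full range would mix in forcing and dissipation effects.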
Submitted 21 April, 2020;
originally announced April 2020.
-
Language-agnostic Multilingual Modeling
Authors:
Arindrima Datta,
Bhuvana Ramabhadran,
Jesse Emond,
Anjuli Kannan,
Brian Roark
Abstract:
Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarce languages. However, most state-of-the-art multilingual models require the encoding of language information and therefore are not as flexible or scalable when expanding to newer languages. Language-independent multilingual models help to address this issue, and are also better suited for multicultural societies where several languages are frequently used together (but often rendered with different writing systems). In this paper, we propose a new approach to building a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer. Thus, similar sounding acoustics are mapped to a single, canonical target sequence of graphemes, effectively separating the modeling and rendering problems. We show with four Indic languages, namely, Hindi, Bengali, Tamil and Kannada, that the language-agnostic multilingual model achieves up to 10% relative reduction in Word Error Rate (WER) over a language-dependent multilingual model.
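Word Error Rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words, and "relative reduction" compares two WERs as a fraction of the baseline. The sketch below illustrates both definitions; the function names and example strings are hypothetical and not taken from the paper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction; 0.10 means a 10% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer
```

For instance, going from a hypothetical 20.0% WER to 18.0% WER is a 10% relative reduction even though the absolute change is only 2 points.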
Submitted 20 April, 2020;
originally announced April 2020.
-
A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency
Authors:
Tara N. Sainath,
Yanzhang He,
Bo Li,
Arun Narayanan,
Ruoming Pang,
Antoine Bruguier,
Shuo-yiin Chang,
Wei Li,
Raziel Alvarez,
Zhifeng Chen,
Chung-Cheng Chiu,
David Garcia,
Alex Gruenstein,
Ke Hu,
Minho **,
Anjuli Kannan,
Qiao Liang,
Ian McGraw,
Cal Peyser,
Rohit Prabhavalkar,
Golan Pundak,
David Rybach,
Yuan Shangguan,
Yash Sheth,
Trevor Strohman
, et al. (4 additional authors not shown)
Abstract:
Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking. In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. On the quality side, we incorporate a large number of utterances across varied domains to increase acoustic diversity and the vocabulary seen by the model. We also train with accented English speech to make the model more robust to different pronunciations. In addition, given the increased amount of training data, we explore a varied learning rate schedule. On the latency front, we explore using the end-of-sentence decision emitted by the RNN-T model to close the microphone, and also introduce various optimizations to improve the speed of LAS rescoring. Overall, we find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model. For example, for the same latency, RNN-T+LAS obtains an 8% relative improvement in WER, while being more than 400 times smaller in model size.
Submitted 1 May, 2020; v1 submitted 28 March, 2020;
originally announced March 2020.
-
The accuracy vs. coverage trade-off in patient-facing diagnosis models
Authors:
Anitha Kannan,
Jason Alan Fries,
Eric Kramer,
Jen Jen Chen,
Nigam Shah,
Xavier Amatriain
Abstract:
A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, these models must have high accuracy while covering a meaningful space of symptoms and diagnoses. To the best of our knowledge, this paper is the first to study the trade-off between the coverage of the model and its performance for diagnosis. To this end, we learn diagnosis models with different coverage from EHR data. We find a 1% drop in top-3 accuracy for every 10 diseases added to the coverage. We also observe that model complexity does not affect performance, with linear models performing as well as neural networks.
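Top-3 accuracy counts a case as correct when the true diagnosis is among the three highest-scoring diagnoses, which is why adding diagnoses to a model's coverage can lower it: each new diagnosis is another competitor for those three slots. The sketch below illustrates the metric, assuming each case's model output is a mapping from diagnosis to score; the function name, diagnoses, and scores are hypothetical.

```python
from typing import Dict, List


def top_k_accuracy(predictions: List[Dict[str, float]], labels: List[str], k: int = 3) -> float:
    """Hypothetical helper: fraction of cases whose true diagnosis is among
    the k highest-scoring diagnoses."""
    hits = 0
    for scores, label in zip(predictions, labels):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Two hypothetical cases scored over the same set of diagnoses.
predictions = [
    {"flu": 0.5, "cold": 0.3, "covid": 0.15, "strep": 0.05},
    {"flu": 0.1, "cold": 0.2, "covid": 0.6, "strep": 0.07, "mono": 0.03},
]
labels = ["strep", "covid"]
```

Here the first case misses at `k=3` because "strep" ranks fourth, so `top_k_accuracy(predictions, labels)` gives 0.5, while `k=4` gives 1.0.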
Submitted 11 December, 2019;
originally announced December 2019.
-
Classification as Decoder: Trading Flexibility for Control in Medical Dialogue
Authors:
Sam Shleifer,
Manish Chablani,
Anitha Kannan,
Namit Katariya,
Xavier Amatriain
Abstract:
Generative seq2seq dialogue systems are trained to predict the next word in dialogues that have already occurred. They can learn from large unlabeled conversation datasets, build a deeper understanding of conversational context, and generate a wide variety of responses. This flexibility comes at the cost of control, a concerning tradeoff in doctor/patient interactions. Inaccuracies, typos, or undesirable content in the training data will be reproduced by the model at inference time. We trade a small amount of labeling effort and some loss of response variety in exchange for quality control. More specifically, a pretrained language model encodes the conversational context, and we finetune a classification head to map an encoded conversational context to a response class, where each class is a noisily labeled group of interchangeable responses. Experts can update these exemplar responses over time as best practices change without retraining the classifier or invalidating old training data. Expert evaluation of 775 unseen doctor/patient conversations shows that only 12% of the discriminative model's responses are worse than what the doctor ended up writing, compared to 18% for the generative model.
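The key design choice is that the classifier predicts a response class rather than response text, so the text lives in a separately editable table. The sketch below shows only that lookup step, assuming the class logits have already been produced by the fine-tuned classification head (which is not reproduced here); the function name, class ids, and response texts are hypothetical.

```python
from typing import Dict, Sequence


def respond(class_logits: Sequence[float], response_bank: Dict[int, str]) -> str:
    """Hypothetical helper: return the expert-curated exemplar for the highest-scoring response class."""
    best_class = max(range(len(class_logits)), key=lambda c: class_logits[c])
    return response_bank[best_class]


# Hypothetical response bank; experts may edit these texts without retraining the classifier.
bank = {
    0: "How long have you had these symptoms?",
    1: "Do you have a fever?",
    2: "Are you taking any medications?",
}
```

Editing `bank[1]` changes what the system says for every context mapped to class 1, while the classifier weights and the labeled training data stay untouched.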
Submitted 15 November, 2019;
originally announced November 2019.
-
Accurate Protein Structure Prediction by Embeddings and Deep Learning Representations
Authors:
Iddo Drori,
Darshan Thaker,
Arjun Srivatsa,
Daniel Jeong,
Yueqi Wang,
Linyong Nan,
Fan Wu,
Dimitri Leggas,
**hao Lei,
Weiyi Lu,
Weilong Fu,
Yuan Gao,
Sashank Karri,
Anand Kannan,
Antonio Moretti,
Mohammed AlQuraishi,
Chen Keasar,
Itsik Pe'er
Abstract:
Proteins are the major building blocks of life, and actuators of almost all chemical and biophysical events in living organisms. Their native structures in turn enable their biological functions, which have a fundamental role in drug design. This motivates predicting the structure of a protein from its sequence of amino acids, a fundamental problem in computational biology. In this work, we demonstrate state-of-the-art protein structure prediction (PSP) results using embeddings and deep learning models for prediction of backbone atom distance matrices and torsion angles. We recover 3D coordinates of backbone atoms and reconstruct the full-atom protein structure by optimization. We create a new gold standard dataset of proteins which is comprehensive and easy to use. Our dataset consists of amino acid sequences, Q8 secondary structures, position specific scoring matrices, multiple sequence alignment co-evolutionary features, backbone atom distance matrices, torsion angles, and 3D coordinates. We evaluate the quality of our structure prediction by RMSD on the latest Critical Assessment of Techniques for Protein Structure Prediction (CASP) test data and demonstrate results competitive with the winning teams and AlphaFold in CASP13, and supersede the results of the winning teams in CASP12. We make our data, models, and code publicly available.
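The abstract says backbone coordinates are recovered "by optimization" but does not name the procedure. As a stand-in, the sketch below uses SMACOF (stress majorization), a standard algorithm that finds coordinates whose pairwise distances match a target distance matrix and never increases the squared-error "stress" from one iteration to the next. The function names, the four-point toy "structure", and the starting guess are hypothetical; a real pipeline would also need to resolve the mirror-image ambiguity that distances alone cannot.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def stress(X: Sequence[Sequence[float]], D: Matrix) -> float:
    """Raw stress: sum over pairs of (embedded distance - target distance)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def smacof(D: Matrix, X: Sequence[Sequence[float]], iterations: int = 300) -> Matrix:
    """Hypothetical helper: unweighted SMACOF via repeated Guttman transforms X <- B(X) X / n."""
    n, dim = len(X), len(X[0])
    X = [list(row) for row in X]
    for _ in range(iterations):
        # B(X): off-diagonal -D_ij / d_ij(X), diagonal makes each row sum to zero.
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    d = math.dist(X[i], X[j])
                    B[i][j] = -D[i][j] / d if d > 0 else 0.0
            B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)] for i in range(n)]
    return X
```

Starting from a rough guess, each iteration moves every point toward positions consistent with the target distances; the result is centered at the origin and defined only up to rotation and reflection.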
Submitted 8 November, 2019;
originally announced November 2019.
-
A comparison of end-to-end models for long-form speech recognition
Authors:
Chung-Cheng Chiu,
Wei Han,
Yu Zhang,
Ruoming Pang,
Sergey Kishchenko,
Patrick Nguyen,
Arun Narayanan,
Hank Liao,
Shuyuan Zhang,
Anjuli Kannan,
Rohit Prabhavalkar,
Zhifeng Chen,
Tara Sainath,
Yonghui Wu
Abstract:
End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused primarily on short utterances that typically last for just a few seconds or, at most, a few tens of seconds. Whether such architectures are practical on long utterances that last from minutes to hours remains an open question. In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription. We first present an empirical comparison of different end-to-end models on a real world long-form task and demonstrate that the RNN-T model is much more robust than attention-based systems in this regime. We next explore two improvements to attention-based systems that significantly improve their performance: restricting the attention to be monotonic, and applying a novel decoding algorithm that breaks long utterances into shorter overlapping segments. Combining these two improvements, we show that attention-based end-to-end models can be very competitive with RNN-T on long-form speech recognition.
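The abstract does not describe how the overlapping segments are decoded or how their hypotheses are merged, so the sketch below covers only the segmentation step: fixed-length windows over a sequence of acoustic frames, with consecutive windows sharing `overlap` frames so that words near a boundary appear whole in at least one window. The function name and frame counts are hypothetical.

```python
from typing import List, Tuple


def overlapping_segments(num_frames: int, segment_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split [0, num_frames) into half-open windows of up to
    `segment_len` frames whose starts are `segment_len - overlap` frames apart."""
    if segment_len <= 0 or not 0 <= overlap < segment_len:
        raise ValueError("need segment_len > 0 and 0 <= overlap < segment_len")
    if num_frames <= 0:
        return []
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break  # the last window reaches the end of the utterance
        start += step
    return segments
```

For example, `overlapping_segments(10, 4, 2)` yields `[(0, 4), (2, 6), (4, 8), (6, 10)]`. Larger overlaps give the merging step more shared context at each boundary at the cost of decoding more frames overall.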
Submitted 6 November, 2019;
originally announced November 2019.