
arXiv:2405.19479 [pdf, other]

doi 10.1145/3630106.3658992

Participation in the age of foundation models

Authors: Harini Suresh, Emily Tseng, Meg Young, Mary L. Gray, Emma Pierson, Karen Levy

Abstract: Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question. First, we examine existing attempts at incorporating participation into foundation models. We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model that is intended to be universally applicable. In response, we develop a blueprint for participatory foundation models that identifies more local, application-oriented opportunities for meaningful participation. In addition to the "foundation" layer, our framework proposes the "subfloor" layer, in which stakeholders develop shared technical infrastructure, norms and governance for a grounded domain, and the "surface" layer, in which affected communities shape the use of a foundation model for a specific downstream task.
The intermediate "subfloor" layer scopes the range of potential harms to consider, and affords communities more concrete avenues for deliberation and intervention. At the same time, it avoids duplicative effort by scaling input across relevant use cases. Through three case studies in clinical care, financial services, and journalism, we illustrate how this multi-layer model can create more meaningful opportunities for participation than solely intervening at the foundation layer.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 13 pages, 2 figures. Appeared at FAccT '24

Journal ref: In The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24), June 3-6, 2024, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 13 pages

arXiv:2403.12046 [pdf, other]

GPT-4V(ision) Unsuitable for Clinical Care and Education: A Clinician-Evaluated Assessment

Authors: Senthujan Senkaiahliyan, Augustin Toma, Jun Ma, An-Wen Chan, Andrew Ha, Kevin R. An, Hrishikesh Suresh, Barry Rubin, Bo Wang

Abstract: OpenAI's large multimodal model, GPT-4V(ision), was recently developed for general image interpretation. However, less is known about its capabilities with medical image interpretation and diagnosis. Board-certified physicians and senior residents assessed GPT-4V's proficiency across a range of medical conditions using imaging modalities such as CT scans, MRIs, ECGs, and clinical photographs. Although GPT-4V is able to identify and explain medical images, its diagnostic accuracy and clinical decision-making abilities are poor, posing risks to patient safety. Despite the potential that large language models may have in enhancing medical education and delivery, the current limitations of GPT-4V in interpreting medical images reinforce the importance of appropriate caution when using it for clinical decision-making.

Submitted 14 November, 2023; originally announced March 2024.

arXiv:2312.14804 [pdf, other]

Use large language models to promote equity

Authors: Emma Pierson, Divya Shanmugam, Rajiv Movva, Jon Kleinberg, Monica Agrawal, Mark Dredze, Kadija Ferryman, Judy Wawira Gichoya, Dan Jurafsky, Pang Wei Koh, Karen Levy, Sendhil Mullainathan, Ziad Obermeyer, Harini Suresh, Keyon Vafa

Abstract: Advances in large language models (LLMs) have driven an explosion of interest about their societal impacts. Much of the discourse around how they will impact social equity has been cautionary or negative, focusing on questions like "how might LLMs be biased and how would we mitigate those biases?" This is a vital discussion: the ways in which AI generally, and LLMs specifically, can entrench biases have been well-documented. But equally vital, and much less discussed, is the more opportunity-focused counterpoint: "what promising applications do LLMs enable that could promote equity?" If LLMs are to enable a more equitable world, it is not enough just to play defense against their biases and failure modes. We must also go on offense, applying them positively to equity-enhancing use cases to increase opportunities for underserved groups and reduce societal discrimination. There are many choices which determine the impact of AI, and a fundamental choice very early in the pipeline is the problems we choose to apply it to. If we focus only later in the pipeline -- making LLMs marginally more fair as they facilitate use cases which intrinsically entrench power -- we will miss an important opportunity to guide them to equitable impacts. Here, we highlight the emerging potential of LLMs to promote equity by presenting four newly possible, promising research directions, while keeping risks and cautionary points in clear view.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2206.13607 [pdf, other]

Improved Text Classification via Test-Time Augmentation

Authors: Helen Lu, Divya Shanmugam, Harini Suresh, John Guttag

Abstract: Test-time augmentation -- the aggregation of predictions across transformed examples of test inputs -- is an established technique to improve the performance of image classification models. Importantly, TTA can be used to improve model performance post-hoc, without additional training. Although test-time augmentation (TTA) can be applied to any data modality, it has seen limited adoption in NLP due in part to the difficulty of identifying label-preserving transformations. In this paper, we present augmentation policies that yield significant accuracy improvements with language models. A key finding is that augmentation policy design -- for instance, the number of samples generated from a single, non-deterministic augmentation -- has a considerable impact on the benefit of TTA. Experiments across a binary classification task and dataset show that test-time augmentation can deliver consistent improvements over current state-of-the-art approaches.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.02958 [pdf, other]

doi 10.1145/3593013.3593997

Saliency Cards: A Framework to Characterize and Compare Saliency Methods

Authors: Angie Boggust, Harini Suresh, Hendrik Strobelt, John V. Guttag, Arvind Satyanarayan

Abstract: Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a model's output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics. Through a review of 25 saliency method papers and 33 method evaluations, we identify 10 attributes that users should account for when choosing a method. We group these attributes into three categories that span the process of computing and interpreting saliency: methodology, or how the saliency is calculated; sensitivity, or the relationship between the saliency and the underlying model and data; and, perceptibility, or how an end user ultimately interprets the result. By collating this information, saliency cards allow users to more holistically assess and compare the implications of different methods.
Through nine semi-structured interviews with users from various backgrounds, including researchers, radiologists, and computational biologists, we find that saliency cards provide a detailed vocabulary for discussing individual methods and allow for a more systematic selection of task-appropriate methods. Moreover, with saliency cards, we are able to analyze the research landscape in a more structured fashion to identify opportunities for new methods and evaluation metrics for unmet user needs.

Submitted 30 May, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Published at FAccT 2023, 19 pages, 8 figures, 2 tables

arXiv:2111.03816 [pdf]

doi 10.14445/22315381/IJETT-V69I10P211

Design, Modelling, and Simulation analysis of a Single Axis MEMS-based Capacitive Accelerometer

Authors: Veena. S, Newton Rai, H. L. Suresh, Veda Sandeep Nagaraja

Abstract: This paper presents the design, simulation, and analytical modeling of the proposed single-axis MEMS-based capacitive accelerometer. Analytical modeling has been done for frequency and displacement sensitivity. The performance of the accelerometer was tested for both static and dynamic conditions, and the corresponding static capacitance value was calculated and was found to be C0=0.730455pF, a response time of 95.17μs, and settling time of 7.261ms and the displacement sensitivity Sd= 3.5362* m/g. It was observed that the sensitivity of the accelerometer depends on its design parameters like beam length, overlap area of comb, sensing mass, and the number of interdigital fingers. A novel capacitive accelerometer has been designed for an operating frequency of 2.1kHz. The accelerometer was designed using COMSOL Multiphysics and analyzed using the MATLAB simulator tool. The proposed single-axis MEMS-based capacitive accelerometer is suitable for automobile applications such as airbag deployment and navigation.

Submitted 6 November, 2021; originally announced November 2021.

Comments: 7 pages, 14 figures, Published with International Journal of Engineering Trends and Technology (IJETT)

Journal ref: International Journal of Engineering Trends and Technology 69.10(2021):82-88

arXiv:2102.08540 [pdf, other]

Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs

Authors: Harini Suresh, Kathleen M. Lewis, John V. Guttag, Arvind Satyanarayan

Abstract: Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two visual analytics modules that facilitate an intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors. Using an interactive editor, users can manipulate this input in semantically-meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 14 physicians are better able to align the model's uncertainty with domain-relevant factors and build intuition about its capabilities and limitations.

Submitted 9 July, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2101.09824 [pdf, other]

doi 10.1145/3411764.3445088

Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs

Authors: Harini Suresh, Steven R. Gomez, Kevin K. Nam, Arvind Satyanarayan

Abstract: To ensure accountability and mitigate harm, it is critical that diverse stakeholders can interrogate black-box automated systems and find information that is understandable, relevant, and useful to them. In this paper, we eschew prior expertise- and role-based categorizations of interpretability stakeholders in favor of a more granular framework that decouples stakeholders' knowledge from their interpretability needs. We characterize stakeholders by their formal, instrumental, and personal knowledge and how it manifests in the contexts of machine learning, the data domain, and the general milieu. We additionally distill a hierarchical typology of stakeholder needs that distinguishes higher-level domain goals from lower-level interpretability tasks. In assessing the descriptive, evaluative, and generative powers of our framework, we find our more nuanced treatment of stakeholders reveals gaps and opportunities in the interpretability literature, adds precision to the design and comparison of user studies, and facilitates a more reflexive approach to conducting this research.

Submitted 24 January, 2021; originally announced January 2021.

Comments: In CHI Conference on Human Factors in Computing Systems (CHI '21)

arXiv:2011.03395 [pdf, other]

Underspecification Presents Challenges for Credibility in Modern Machine Learning

Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, et al. (15 additional authors not shown)

Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.

Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: Updates: Updated statistical analysis in Section 6; Additional citations

arXiv:2005.10960 [pdf, other]

doi 10.1145/3394231.3397922

Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making

Authors: Harini Suresh, Natalie Lao, Ilaria Liccardi

Abstract: ML decision-aid systems are increasingly common on the web, but their successful integration relies on people trusting them appropriately: they should use the system to fill in gaps in their ability, but recognize signals that the system might be incorrect. We measured how people's trust in ML recommendations differs by expertise and with more system information through a task-based study of 175 adults. We used two tasks that are difficult for humans: comparing large crowd sizes and identifying similar-looking animals. Our results provide three key insights: (1) People trust incorrect ML recommendations for tasks that they perform correctly the majority of the time, even if they have high prior knowledge about ML or are given information indicating the system is not confident in its prediction; (2) Four different types of system information all increased people's trust in recommendations; and (3) Math and logic skills may be as important as ML for decision-makers working with ML recommendations.

Submitted 21 May, 2020; originally announced May 2020.

Comments: 10 pages

Journal ref: 12th ACM Conference on Web Science, July 6-10, 2020, Southampton, United Kingdom

arXiv:1912.00262 [pdf, other]

Image segmentation of liver stage malaria infection with spatial uncertainty sampling

Authors: Ava P. Soleimany, Harini Suresh, Jose Javier Gonzalez Ortiz, Divya Shanmugam, Nil Gural, John Guttag, Sangeeta N. Bhatia

Abstract: Global eradication of malaria depends on the development of drugs effective against the silent, yet obligate liver stage of the disease. The gold standard in drug development remains microscopic imaging of liver stage parasites in in vitro cell culture models. Image analysis presents a major bottleneck in this pipeline since the parasite has significant variability in size, shape, and density in these models. As with other highly variable datasets, traditional segmentation models have poor generalizability as they rely on hand-crafted features; thus, manual annotation of liver stage malaria images remains standard. To address this need, we develop a convolutional neural network architecture that utilizes spatial dropout sampling for parasite segmentation and epistemic uncertainty estimation in images of liver stage malaria. Our pipeline produces high-precision segmentations nearly identical to expert annotations, generalizes well on a diverse dataset of liver stage malaria parasites, and promotes independence between learned feature maps to model the uncertainty of generated predictions.

Submitted 30 November, 2019; originally announced December 2019.

arXiv:1901.10002 [pdf, other]

doi 10.1145/3465416.3483305

A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle

Authors: Harini Suresh, John V. Guttag

Abstract: As machine learning (ML) increasingly affects people and society, awareness of its potential unwanted consequences has also grown. To anticipate, prevent, and mitigate undesirable downstream consequences, it is critical that we understand when and how harm might be introduced throughout the ML life cycle. In this paper, we provide a framework that identifies seven distinct potential sources of downstream harm in machine learning, spanning data collection, development, and deployment. In doing so, we aim to facilitate more productive and precise communication around these issues, as well as more direct, application-grounded ways to mitigate them.

Submitted 1 December, 2021; v1 submitted 28 January, 2019; originally announced January 2019.

Journal ref: EAAMO 2021: Equity and Access in Algorithms, Mechanisms, and Optimization

arXiv:1808.03827 [pdf, other]

Racial Disparities and Mistrust in End-of-Life Care

Authors: Willie Boag, Harini Suresh, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi

Abstract: There are established racial disparities in healthcare, including during end-of-life care, when poor communication and trust can lead to suboptimal outcomes for patients and their families. In this work, we find that racial disparities which have been reported in existing literature are also present in the MIMIC-III database. We hypothesize that one underlying cause of this disparity is due to mistrust between patient and caregivers, and we develop multiple possible trust metric proxies (using coded interpersonal variables and clinical notes) to measure this phenomenon more directly. These metrics show even stronger disparities in end-of-life care than race does, and they also tend to demonstrate statistically significant higher levels of mistrust for black patients than white ones. Finally, we demonstrate that these metrics improve performance on three clinical tasks: in-hospital mortality, discharge against medical advice (AMA) and modified care status (e.g., DNR, DNI, etc.).

Submitted 15 August, 2018; v1 submitted 11 August, 2018; originally announced August 2018.

arXiv:1807.00124 [pdf, other]

Modeling Mistrust in End-of-Life Care

Authors: Willie Boag, Harini Suresh, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi

Abstract: In this work, we characterize the doctor-patient relationship using a machine learning-derived trust score. We show that this score has statistically significant racial associations, and that by modeling trust directly we find stronger disparities in care than by stratifying on race. We further demonstrate that mistrust is indicative of worse outcomes, but is only weakly associated with physiologically-created severity scores. Finally, we describe sentiment analysis experiments indicating patients with higher levels of mistrust have worse experiences and interactions with their caregivers. This work is a step towards measuring fairer machine learning in the healthcare domain.

Submitted 2 July, 2019; v1 submitted 30 June, 2018; originally announced July 2018.

arXiv:1806.02878 [pdf, other]

doi 10.1145/3219819.3219930

Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU

Authors: Harini Suresh, Jen J. Gong, John Guttag

Abstract: Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations.

Submitted 7 June, 2018; originally announced June 2018.

Comments: KDD 2018

arXiv:1705.08498 [pdf, other]

Clinical Intervention Prediction and Understanding using Deep Networks

Authors: Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi

Abstract: Real-time prediction of clinical interventions remains a challenge within intensive care units (ICUs). This task is complicated by data sources that are noisy, sparse, heterogeneous and outcomes that are imbalanced. In this paper, we integrate data from all available ICU sources (vitals, labs, notes, demographics) and focus on learning rich representations of this data to predict onset and weaning of multiple invasive interventions. In particular, we compare both long short-term memory networks (LSTM) and convolutional neural networks (CNN) for prediction of five intervention tasks: invasive ventilation, non-invasive ventilation, vasopressors, colloid boluses, and crystalloid boluses. Our predictions are done in a forward-facing manner to enable "real-time" performance, and predictions are made with a six hour gap time to support clinically actionable planning. We achieve state-of-the-art results on our predictive tasks using deep architectures. We explore the use of feature occlusion to interpret LSTM models, and compare this to the interpretability gained from examining inputs that maximally activate CNN outputs. We show that our models are able to significantly outperform baselines in intervention prediction, and provide insight into model learning, which is crucial for the adoption of such models in practice.

Submitted 23 May, 2017; originally announced May 2017.

arXiv:1703.07004 [pdf, other]

The Use of Autoencoders for Discovering Patient Phenotypes

Authors: Harini Suresh, Peter Szolovits, Marzyeh Ghassemi

Abstract: We use autoencoders to create low-dimensional embeddings of underlying patient phenotypes that we hypothesize are a governing factor in determining how different patients will react to different interventions. We compare the performance of autoencoders that take fixed length sequences of concatenated timesteps as input with a recurrent sequence-to-sequence autoencoder. We evaluate our methods on around 35,500 patients from the latest MIMIC III dataset from Beth Israel Deaconess Hospital.

Submitted 20 March, 2017; originally announced March 2017.

Journal ref: NIPS Workshop on Machine Learning for Healthcare (NIPS ML4HC) 2016

arXiv:1512.05294

Feature Representation for ICU Mortality

Authors: Harini Suresh

Abstract: Good predictors of ICU Mortality have the potential to identify high-risk patients earlier, improve ICU resource allocation, or create more accurate population-level risk models. Machine learning practitioners typically make choices about how to represent features in a particular model, but these choices are seldom evaluated quantitatively. This study compares the performance of different representations of clinical event data from MIMIC II in a logistic regression model to predict 36-hour ICU mortality. The most common representations are linear (normalized counts) and binary (yes/no). These, along with a new representation termed "hill", are compared using both L1 and L2 regularization. Results indicate that the introduced "hill" representation outperforms both the binary and linear representations; the hill representation thus has the potential to improve existing models of ICU mortality.

Submitted 7 February, 2016; v1 submitted 16 December, 2015; originally announced December 2015.

Comments: This article has been withdrawn by the author due to the need for more testing to verify results

arXiv:1501.02527 [pdf, other]

Autodetection and Classification of Hidden Cultural City Districts from Yelp Reviews

Authors: Harini Suresh, Nicholas Locascio

Abstract: Topic models are a way to discover underlying themes in an otherwise unstructured collection of documents. In this study, we specifically used the Latent Dirichlet Allocation (LDA) topic model on a dataset of Yelp reviews to classify restaurants based off of their reviews. Furthermore, we hypothesize that within a city, restaurants can be grouped into similar "clusters" based on both location and similarity. We used several different clustering methods, including K-means Clustering and a Probabilistic Mixture Model, in order to uncover and classify districts, both well-known and hidden (i.e. cultural areas like Chinatown or hearsay like "the best street for Italian restaurants") within a city. We use these models to display and label different clusters on a map. We also introduce a topic similarity heatmap that displays the similarity distribution in a city to a new restaurant.

Submitted 11 January, 2015; originally announced January 2015.

Showing 1–19 of 19 results for author: Suresh, H