Showing 1–16 of 16 results for author: Banda, J M

  1. arXiv:2403.07911

    cs.CY cs.AI

    Standing on FURM ground -- A framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems

    Authors: Alison Callahan, Duncan McElfresh, Juan M. Banda, Gabrielle Bunney, Danton Char, Jonathan Chen, Conor K. Corbin, Debadutta Dash, Norman L. Downing, Sneha S. Jain, Nikesh Kotecha, Jonathan Masterson, Michelle M. Mello, Keith Morse, Srikar Nallan, Abby Pandya, Anurang Revri, Aditya Sharma, Christopher Sharp, Rahul Thapa, Michael Wornow, Alaa Youssef, Michael A. Pfeffer, Nigam H. Shah

    Abstract: The impact of using artificial intelligence (AI) to guide patient care or operational processes is an interplay of the AI model's output, the decision-making protocol based on that output, and the capacity of the stakeholders involved to take the necessary subsequent action. Estimating the effects of this interplay before deployment, and studying it in real time afterwards, are essential to bridge…

    Submitted 14 March, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

  2. arXiv:2309.06503

    cs.CL cs.LG cs.SI

    Leveraging Large Language Models and Weak Supervision for Social Media data annotation: an evaluation using COVID-19 self-reported vaccination tweets

    Authors: Ramya Tekumalla, Juan M. Banda

    Abstract: The COVID-19 pandemic has presented significant challenges to the healthcare industry and society as a whole. With the rapid development of COVID-19 vaccines, social media platforms have become a popular medium for discussions on vaccine-related topics. Identifying vaccine-related tweets and analyzing them can provide valuable insights for public health researchers and policymakers. However, manu…

    Submitted 12 September, 2023; originally announced September 2023.

  3. arXiv:2304.13714

    cs.AI cs.CL cs.IR

    Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery

    Authors: Debadutta Dash, Rahul Thapa, Juan M. Banda, Akshay Swaminathan, Morgan Cheatham, Mehr Kashyap, Nikesh Kotecha, Jonathan H. Chen, Saurabh Gombar, Lance Downing, Rachel Pedreira, Ethan Goh, Angel Arnaout, Garret Kenn Morris, Honor Magon, Matthew P Lungren, Eric Horvitz, Nigam H. Shah

    Abstract: Despite growing interest in using large language models (LLMs) in healthcare, current explorations do not assess the real-world utility and safety of LLMs in clinical settings. Our objective was to determine whether two LLMs can serve information needs submitted by physicians as questions to an informatics consultation service in a safe and concordant manner. Sixty-six questions from an informatic…

    Submitted 30 April, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: 27 pages including supplemental information

  4. arXiv:2209.12614

    cs.CL cs.LG

    Identifying epidemic related Tweets using noisy learning

    Authors: Ramya Tekumalla, Juan M. Banda

    Abstract: Supervised learning algorithms are heavily reliant on annotated datasets to train machine learning models. However, the curation of the annotated datasets is laborious and time consuming due to the manual effort involved and has become a huge bottleneck in supervised learning. In this work, we apply the theory of noisy learning to generate weak supervision signals instead of manual annotation. We…

    Submitted 10 September, 2022; originally announced September 2022.

    Comments: 3 pages

  5. arXiv:2209.04732

    cs.DB cs.AI

    Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

    Authors: Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner Jr., Richard D. Boyce, Elena Casiraghi, Ben D. Coleman, Janine H. Collins, Sara J. Deakyne-Davies, James A. Feinstein, Melissa A. Haendel, Asiyah Y. Lin, Blake Martin, Nicolas A. Matentzoglu, Daniella Meeker, Justin Reese, Jessica Sinclair, Sanya B. Taneja, Katy E. Trinkley, Nicole A. Vasilevsky, Andrew Williams, Xingman A. Zhang, et al. (7 additional authors not shown)

    Abstract: Background: Common data models solve many challenges of standardizing electronic health record (EHR) data, but are unable to semantically integrate all the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OB…

    Submitted 30 January, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

    Comments: Supplementary Material is included at the end of the manuscript

    ACM Class: J.3

  6. arXiv:2207.04947

    cs.CL cs.IR cs.LG

    TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak Supervision

    Authors: Ramya Tekumalla, Juan M. Banda

    Abstract: Social media is often utilized as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster and the filtered tweets are sent for human annotation. The process of human annotation to create labeled sets for machine learning models is laborious, time consuming, at times inaccurate, and mor…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: 12 pages

  7. arXiv:2107.12565

    cs.IR cs.SI

    A Biomedically oriented automatically annotated Twitter COVID-19 Dataset

    Authors: Luis Alberto Robles Hernandez, Tiffany J. Callahan, Juan M. Banda

    Abstract: The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present (Long-COVID). However, manu…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: 8 pages, 3 tables

  8. arXiv:2102.06836

    cs.SI cs.CY

    Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media

    Authors: Julia Wu, Venkatesh Sivaraman, Dheekshita Kumar, Juan M. Banda, David Sontag

    Abstract: The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically-releva…

    Submitted 28 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: 24 pages, 5 figures. To be published in the Journal of Biomedical Informatics

  9. arXiv:2007.10276

    cs.IR cs.CL

    Characterizing drug mentions in COVID-19 Twitter Chatter

    Authors: Ramya Tekumalla, Juan M. Banda

    Abstract: Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 424 million tweets of COVID-19 chatter to identify discourse around drug mentions. While…

    Submitted 9 October, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

    Comments: 7 pages, 2 figures and 5 tables

  10. arXiv:2005.09740

    cs.IR cs.CL

    GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction

    Authors: Javad Rafiei Asl, Juan M. Banda

    Abstract: Automated methods for granular categorization of large corpora of text documents have become increasingly important given the rate at which scientific, news, medical, and web documents have been growing in the last few years. Automatic keyphrase extraction (AKE) aims to automatically detect a small set of single or multi-words from within a single textual document that captures the main topics of the documen…

    Submitted 19 May, 2020; originally announced May 2020.

  11. A large-scale COVID-19 Twitter chatter dataset for open scientific research -- an international collaboration

    Authors: Juan M. Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, Katya Artemova, Elena Tutubalina, Gerardo Chowell

    Abstract: As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated in the front lines of the COV…

    Submitted 13 November, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 8 pages, 1 figure, 2 tables. Update: new version of paper with up-to-date statistics and new co-authors

  12. arXiv:2003.13900

    cs.IR cs.SI

    A large-scale Twitter dataset for drug safety applications mined from publicly existing resources

    Authors: Ramya Tekumalla, Juan M. Banda

    Abstract: With the increase in popularity of deep learning models for natural language processing (NLP) tasks, in the field of Pharmacovigilance, more specifically for the identification of Adverse Drug Reactions (ADRs), there is an inherent need for large-scale social-media datasets aimed at such tasks. With most researchers allocating large amounts of time to crawl Twitter or buying expensive pre-curated…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: 8 tables, 2 figures, 7 pages, accepted after peer review as a workshop paper in ACM Conference on Health, Inference, and Learning (CHIL) 2020 https://www.chilconference.org/agenda/

  13. Social Media Mining Toolkit (SMMT)

    Authors: Ramya Tekumalla, Juan M. Banda

    Abstract: There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code or data for replicating their studies. With minima…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: 3 figures, 8 pages double spaced, Under review in Genomics & Informatics journal

    Journal ref: Genomics & Informatics 2020; 18(2): e16

  14. Solar Event Tracking with Deep Regression Networks: A Proof of Concept Evaluation

    Authors: Toqi Tahamid Sarker, Juan M. Banda

    Abstract: With the advent of deep learning for computer vision tasks, the need for accurately labeled data in large volumes is vital for any application. The increasingly available large amounts of solar image data generated by the Solar Dynamics Observatory (SDO) mission make this domain particularly interesting for the development and testing of deep learning systems. The currently available labeled solar…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: 8 pages, 5 figures, this has been submitted and accepted for publication at IEEE Big Data 2019 - SABID Workshop

    Journal ref: 2019 IEEE International Conference on Big Data (Big Data)

  15. arXiv:1809.06532

    cs.DL

    Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data

    Authors: Tobias Kuhn, Albert Meroño-Peñuela, Alexander Malic, Jorrit H. Poelen, Allen H. Hurlbert, Emilio Centeno Ortiz, Laura I. Furlong, Núria Queralt-Rosinach, Christine Chichester, Juan M. Banda, Egon Willighagen, Friederike Ehrhart, Chris Evelo, Tareq B. Malas, Michel Dumontier

    Abstract: Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this atomic level. While the nanopublications format i…

    Submitted 18 September, 2018; originally announced September 2018.

    Journal ref: In Proceedings of IEEE eScience 2018

  16. arXiv:1507.05408

    cs.CY

    Provenance-Centered Dataset of Drug-Drug Interactions

    Authors: Juan M. Banda, Tobias Kuhn, Nigam H. Shah, Michel Dumontier

    Abstract: Over the years, several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (DrugBank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision making variables. In this pa…

    Submitted 20 July, 2015; originally announced July 2015.

    Comments: In Proceedings of the 14th International Semantic Web Conference (ISWC) 2015