Showing 1–20 of 20 results for author: Nourbakhsh, A

  1. arXiv:2405.01769

    cs.CL

    A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

    Authors: Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang

    Abstract: In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high-stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 35 pages, 6 figures

  2. arXiv:2404.04003

    cs.CL

    BuDDIE: A Business Document Dataset for Multi-task Information Extraction

    Authors: Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

    Abstract: The field of visually rich document understanding (VRDU) aims to solve a multitude of well-researched NLP tasks in a multi-modal domain. Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia. These datasets cover documents like invoices and receipts with sparse ann…

    Submitted 5 April, 2024; originally announced April 2024.

  3. arXiv:2403.09260

    cs.SI physics.soc-ph

    Belief and Persuasion in Scientific Discourse on Social Media: A Study of the COVID-19 Pandemic

    Authors: Salwa Alamir, Armineh Nourbakhsh, Cecilia Tilli, Sameena Shah, Manuela Veloso

    Abstract: Research into COVID-19 has been rapidly evolving since the onset of the pandemic. This occasionally results in contradictory recommendations by credible sources of scientific opinion, public health authorities, and medical professionals. In this study, we examine whether this has resulted in a lack of trust in scientific opinion, by examining the belief patterns of social media users and their rea…

    Submitted 14 March, 2024; originally announced March 2024.

  4. arXiv:2402.05282

    cs.CL

    TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

    Authors: Ran Zmigrod, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

    Abstract: Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents' highly structured nature and yet highly variable style and content. Current annotation schemes decompose form understanding and omit key hierarchical structure, making development and evaluation of end-to-end models difficult. In this paper, we propose a novel F1 metric to evaluate form parsers and descr…

    Submitted 7 February, 2024; originally announced February 2024.

  5. arXiv:2401.02823

    cs.CL cs.IR

    DocGraphLM: Documental Graph Language Model for Information Extraction

    Authors: Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

    Abstract: Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two tropes of architectures have emerged -- transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve t…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Published at SIGIR'23 (repost for easier access)

  6. arXiv:2401.00942

    cs.CE

    The Influence of Biomedical Research on Future Business Funding: Analyzing Scientific Impact and Content in Industrial Investments

    Authors: Reza Khanmohammadi, Simerjot Kaur, Charese H. Smiley, Tuka Alhanai, Ivan Brugere, Armineh Nourbakhsh, Mohammad M. Ghassemi

    Abstract: This paper investigates the relationship between scientific innovation in biomedical sciences and its impact on industrial activities, focusing on how the historical impact and content of scientific papers influenced future funding and innovation grant application content for small businesses. The research incorporates bibliometric analyses along with SBIR (Small Business Innovation Research) data…

    Submitted 1 January, 2024; originally announced January 2024.

  7. arXiv:2401.00908

    cs.CL

    DocLLM: A layout-aware generative language model for multimodal document understanding

    Authors: Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

    Abstract: Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: 16 pages, 4 figures

  8. The Dark Side of the Language: Pre-trained Transformers in the DarkNet

    Authors: Leonardo Ranaldi, Aria Nourbakhsh, Arianna Patrizi, Elena Sofia Ruzzetti, Dario Onorati, Francesca Fallucchi, Fabio Massimo Zanzotto

    Abstract: Pre-trained Transformers are challenging human performances in many NLP tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained Natural Language Understanding models perform on definitely unseen sentences provided by classification tasks over a DarkNet corpus. Surprisingly, results show that synta…

    Submitted 17 November, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

    Report number: 2023.ranlp-1.102

    Journal ref: 2023.ranlp-1.102

  9. arXiv:2201.02823

    quant-ph

    Quantum Computing: Fundamentals, Trends and Perspectives for Chemical and Biochemical Engineers

    Authors: Amirhossein Nourbakhsh, Mark Nicholas Jones, Kaur Kristjuhan, Deborah Carberry, Jay Karon, Christian Beenfeldt, Kyarash Shahriari, Martin P. Andersson, Mojgan A. Jadidi, Seyed Soheil Mansouri

    Abstract: We use the benefits and components of classical computers every day. However, there are many types of problems which, as they grow in size, their computational complexity grows larger than classical computers will ever be able to solve. Quantum computing (QC) is a computation model that uses quantum physical properties to solve such problems. QC is at the early stage of large-scale adoption in var…

    Submitted 8 January, 2022; originally announced January 2022.

  10. arXiv:2111.01911

    cs.IR cs.AI q-fin.CP

    Parameterized Explanations for Investor / Company Matching

    Authors: Simerjot Kaur, Ivan Brugere, Andrea Stefanucci, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

    Abstract: Matching companies and investors is usually considered a highly specialized decision making process. Building an AI agent that can automate such recommendation process can significantly help reduce costs, and eliminate human biases and errors. However, limited sample size of financial data-sets and the need for not only good recommendations, but also explaining why a particular recommendation is b…

    Submitted 27 October, 2021; originally announced November 2021.

    Comments: 8 pages, 7 figures, 4 tables, 2 algorithms

  11. arXiv:2109.09103

    cs.AI

    A Framework for Institutional Risk Identification using Knowledge Graphs and Automated News Profiling

    Authors: Mahmoud Mahfouz, Armineh Nourbakhsh, Sameena Shah

    Abstract: Organizations around the world face an array of risks impacting their operations globally. It is imperative to have a robust risk identification process to detect and evaluate the impact of potential risks before they materialize. Given the nature of the task and the current requirements of deep subject matter expertise, most organizations utilize a heavily manual process. In our work, we develop…

    Submitted 19 September, 2021; originally announced September 2021.

  12. arXiv:2010.12681

    cs.CL cs.AI cs.LG cs.NE

    Robust Document Representations using Latent Topics and Metadata

    Authors: Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

    Abstract: Task specific fine-tuning of a pre-trained neural language model using a custom softmax output layer is the de facto approach of late when dealing with document classification problems. This technique is not adequate when labeled examples are not available at training time and when the metadata artifacts in a document must be exploited. We address these challenges by generating document representa…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: 9 pages, 7 figures

    ACM Class: I.2.7; I.7.0

  13. arXiv:2010.01169

    cs.CL

    DocuBot : Generating financial reports using natural language interactions

    Authors: Vineeth Ravi, Selim Amrouni, Andrea Stefanucci, Armineh Nourbakhsh, Prashant Reddy, Manuela Veloso

    Abstract: The financial services industry perpetually processes an overwhelming amount of complex data. Digital reports are often created based on tedious manual analysis as well as visualization of the underlying trends and characteristics of data. Often, the accruing costs of human computation errors in creating these reports are very high. We present DocuBot, a novel AI-powered virtual assistant for crea…

    Submitted 1 February, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted at the AAAI 2021 Workshop on Content Authoring and Design (CAD21) and the NeurIPS 2019 Workshop on Robust AI in Financial Services: Data, Fairness, Explainability, Trustworthiness, and Privacy

  14. arXiv:2006.16517

    physics.app-ph cond-mat.mtrl-sci

    Impact of $Al_2O_3$ Passivation on the Photovoltaic Performance of Vertical $WSe_2$ Schottky Junction Solar Cells

    Authors: Elaine McVay, Ahmad Zubair, Yuxuan Lin, Amirhasan Nourbakhsh, Tomás Palacios

    Abstract: Transition metal dichalcogenide (TMD) materials have emerged as promising candidates for thin film solar cells due to their wide bandgap range across the visible wavelengths, high absorption coefficient and ease of integration with both arbitrary substrates as well as conventional semiconductor technologies. However, reported TMD-based solar cells suffer from relatively low external quantum effici…

    Submitted 30 June, 2020; originally announced June 2020.

  15. arXiv:2005.12966

    cs.IR cs.LG

    SPot: A tool for identifying operating segments in financial tables

    Authors: Zhiqiang Ma, Steven Pomerville, Mingyang Di, Armineh Nourbakhsh

    Abstract: In this paper we present SPot, an automated tool for detecting operating segments and their related performance indicators from earnings reports. Due to their company-specific nature, operating segments cannot be detected using taxonomy-based approaches. Instead, we train a Bidirectional RNN classifier that can distinguish between common metrics such as "revenue" and company-specific metrics that…

    Submitted 17 May, 2020; originally announced May 2020.

    Comments: This manuscript has been reviewed and accepted by SIGIR 2020

  16. arXiv:1908.09921

    cs.CL cs.AI

    Toward Dialogue Modeling: A Semantic Annotation Scheme for Questions and Answers

    Authors: Maria-Andrea Cruz-Blandón, Gosse Minnema, Aria Nourbakhsh, Maria Boritchev, Maxime Amblard

    Abstract: The present study proposes an annotation scheme for classifying the content and discourse contribution of question-answer pairs. We propose detailed guidelines for using the scheme and apply them to dialogues in English, Spanish, and Dutch. Finally, we report on initial machine learning experiments for automatic annotation.

    Submitted 23 August, 2019; originally announced August 2019.

    Journal ref: LAW XIII 2019 - Linguistic Annotation Workshop - ACL Workshop, Jul 2019, Florence, Italy

  17. arXiv:1908.09156

    cs.CL cs.AI

    A framework for anomaly detection using language modeling, and its applications to finance

    Authors: Armineh Nourbakhsh, Grace Bang

    Abstract: In the finance sector, studies focused on anomaly detection are often associated with time-series and transactional data analytics. In this paper, we lay out the opportunities for applying anomaly and deviation detection methods to text corpora and challenges associated with them. We argue that language models that use distributional semantics can play a significant role in advancing these studies…

    Submitted 24 August, 2019; originally announced August 2019.

    Comments: 5 pages, 2 figures, presented at the 2nd KDD Workshop on Anomaly Detection in Finance, 2019

  18. arXiv:1711.04068

    cs.SI

    Reuters Tracer: Toward Automated News Production Using Large Scale Social Media Data

    Authors: Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Sameena Shah, Robert Martin, John Duprey

    Abstract: To deal with the sheer volume of information and gain competitive advantage, the news industry has started to explore and invest in news automation. In this paper, we present Reuters Tracer, a system that automates end-to-end news production using Twitter data. It is capable of detecting, classifying, annotating, and disseminating news in real time for Reuters journalists without manual interventi…

    Submitted 10 November, 2017; originally announced November 2017.

    Comments: Accepted by IEEE Big Data 2017

  19. arXiv:1709.02510

    cs.SI physics.soc-ph

    "Breaking" Disasters: Predicting and Characterizing the Global News Value of Natural and Man-made Disasters

    Authors: Armineh Nourbakhsh, Quanzhi Li, Xiaomo Liu, Sameena Shah

    Abstract: Due to their often unexpected nature, natural and man-made disasters are difficult to monitor and detect for journalists and disaster management response teams. Journalists are increasingly relying on signals from social media to detect such stories in their early stage of development. Twitter, which features a vast network of local news outlets, is a major source of early signal for disaster dete…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: Accepted by KDD 2017 Data Science + Journalism workshop

  20. arXiv:1708.03994

    cs.CL cs.SI

    Data Sets: Word Embeddings Learned from Tweets and General Data

    Authors: Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh

    Abstract: A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore…

    Submitted 13 August, 2017; originally announced August 2017.