Showing 1–50 of 51 results for author: Glass, M

  1. arXiv:2308.13560  [pdf, ps, other]

    cs.DB

    Open Government Data Corpus for Table Search

    Authors: Michael Glass, Sugato Bagchi, Oktie Hassanzadeh, Gaetano Rossiello, Alfio Gliozzo

    Abstract: Increasing amounts of structured data can provide value for research and business if the relevant data can be located. Often the data is in a data lake without a consistent schema, making locating useful data challenging. Table search is a growing research area, but existing benchmarks have been limited to displayed tables. Tables sized and formatted for display in a Wikipedia page or ArXiv paper…

    Submitted 24 August, 2023; originally announced August 2023.

  2. arXiv:2306.11843  [pdf, other]

    cs.CL cs.AI cs.DB cs.IR

    Retrieval-Based Transformer for Table Augmentation

    Authors: Michael Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, Alfio Gliozzo

    Abstract: Data preparation, also called data wrangling, is considered one of the most expensive and time-consuming steps when performing analytics or building machine learning models. Preparing data typically involves collecting and merging data from complex heterogeneous, and often large-scale data sources, such as data lakes. In this paper, we introduce a novel approach toward automatic data wrangling in…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Findings of ACL 2023

  3. arXiv:2210.05572  [pdf, other]

    cs.LG cs.AI cs.IR

    Knowledge-Driven New Drug Recommendation

    Authors: Zhenbang Wu, Huaxiu Yao, Zhe Su, David M Liebovitz, Lucas M Glass, James Zou, Chelsea Finn, Jimeng Sun

    Abstract: Drug recommendation assists doctors in prescribing personalized medications to patients based on their health conditions. Existing drug recommendation solutions adopt the supervised multi-label classification setup and only work with existing drugs with sufficient prescription data from many patients. However, newly approved drugs do not have much historical prescription data and cannot leverage e…

    Submitted 11 October, 2022; originally announced October 2022.

  4. arXiv:2209.09023  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Artificial Intelligence for In Silico Clinical Trials: A Review

    Authors: Zifeng Wang, Chufan Gao, Lucas M. Glass, Jimeng Sun

    Abstract: A clinical trial is an essential step in drug development, which is often costly and time-consuming. In silico trials are clinical trials conducted digitally through simulation and modeling as an alternative to traditional clinical trials. AI-enabled in silico trials can increase the case group size by creating virtual cohorts as controls. In addition, it also enables automation and optimization o…

    Submitted 16 September, 2022; originally announced September 2022.

  5. arXiv:2207.06300  [pdf, other]

    cs.CL cs.AI cs.IR

    Re2G: Retrieve, Rerank, Generate

    Authors: Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, Pengshan Cai, Alfio Gliozzo

    Abstract: As demonstrated by GPT-3 and T5, transformers grow in capability as parameter spaces become larger and larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation. Th…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Accepted at NAACL 2022

  6. arXiv:2204.03985  [pdf, other]

    cs.CL cs.AI cs.LG

    KGI: An Integrated Framework for Knowledge Intensive Language Tasks

    Authors: Md Faisal Mahbub Chowdhury, Michael Glass, Gaetano Rossiello, Alfio Gliozzo, Nandana Mihindukulasooriya

    Abstract: In this paper, we present a system to showcase the capabilities of the latest state-of-the-art retrieval augmented generation models trained on knowledge-intensive language tasks, such as slot filling, open domain question answering, dialogue, and fact-checking. Moreover, given a user query, we show how the output from these different models can be combined to cross-examine the outputs of each oth…

    Submitted 21 September, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: EMNLP 2022 Demo Track

  7. arXiv:2203.16714  [pdf, other]

    cs.CL

    End-to-End Table Question Answering via Retrieval-Augmented Generation

    Authors: Feifei Pan, Mustafa Canim, Michael Glass, Alfio Gliozzo, James Hendler

    Abstract: Most existing end-to-end Table Question Answering (Table QA) models consist of a two-stage framework with a retriever to select relevant table candidates from a corpus and a reader to locate the correct answers from table candidates. Even though the accuracy of the reader models is significantly improved with the recent transformer-based approaches, the overall performance of such frameworks still…

    Submitted 30 March, 2022; originally announced March 2022.

  8. arXiv:2203.02446  [pdf, other]

    cs.AI cs.LG

    AutoMap: Automatic Medical Code Mapping for Clinical Prediction Model Deployment

    Authors: Zhenbang Wu, Cao Xiao, Lucas M Glass, David M Liebovitz, Jimeng Sun

    Abstract: Given a deep learning model trained on data from a source site, how to deploy the model to a target hospital automatically? How to accommodate heterogeneous medical coding systems across different hospitals? Standard approaches rely on existing medical code mapping tools, which have significant practical limitations. To tackle this problem, we propose AutoMap to automatically map the medical cod…

    Submitted 4 March, 2022; originally announced March 2022.

  9. Multi-Objective Design Space Exploration for the Optimization of the HEVC Mode Decision Process

    Authors: Christian Herglotz, Rafael Rosales, Michael Glass, Jürgen Teich, André Kaup

    Abstract: Finding the best possible encoding decisions for compressing a video sequence is a highly complex problem. In this work, we propose a multi-objective Design Space Exploration (DSE) method to automatically find HEVC encoder implementations that are optimized for several different criteria. The DSE shall optimize the coding mode evaluation order of the mode decision process and jointly explore early…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: 5 pages, 4 figures, 2016 Picture Coding Symposium (PCS)

  10. PopNet: Real-Time Population-Level Disease Prediction with Data Latency

    Authors: Junyi Gao, Cao Xiao, Lucas M. Glass, Jimeng Sun

    Abstract: Population-level disease prediction estimates the number of potential patients of particular diseases in some location at a future time based on (frequently updated) historical disease statistics. Existing approaches often assume the existing disease statistics are reliable and will not change. However, in practice, data collection is often time-consuming and has time delays, with both historical…

    Submitted 7 February, 2022; originally announced February 2022.

  11. arXiv:2201.05302  [pdf, other]

    cs.CL cs.AI

    Applying a Generic Sequence-to-Sequence Model for Simple and Effective Keyphrase Generation

    Authors: Md Faisal Mahbub Chowdhury, Gaetano Rossiello, Michael Glass, Nandana Mihindukulasooriya, Alfio Gliozzo

    Abstract: In recent years, a number of keyphrase generation (KPG) approaches were proposed consisting of complex model architectures, dedicated training paradigms and decoding strategies. In this work, we opt for simplicity and show how a commonly used seq2seq language model, BART, can be easily adapted to generate keyphrases from the text in a single batch computation using a simple training procedure. Emp…

    Submitted 13 January, 2022; originally announced January 2022.

  12. arXiv:2108.13934  [pdf, other]

    cs.CL cs.AI cs.IR

    Robust Retrieval Augmented Generation for Zero-shot Slot Filling

    Authors: Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Alfio Gliozzo

    Abstract: Automatically inducing high quality knowledge graphs from a given collection of documents still remains a challenging problem in AI. One way to make headway for this problem is through advancements in a related task known as slot filling. In this task, given an entity query in the form of [Entity, Slot, ?], a system is asked to fill the slot by generating or extracting the missing value exploiting evi…

    Submitted 13 September, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: Accepted at EMNLP 2021. arXiv admin note: substantial text overlap with arXiv:2104.08610

  13. arXiv:2106.12944  [pdf, other]

    cs.CL cs.AI

    AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry

    Authors: Yannis Katsis, Saneem Chemmengath, Vishwajeet Kumar, Samarth Bharadwaj, Mustafa Canim, Michael Glass, Alfio Gliozzo, Feifei Pan, Jaydeep Sen, Karthik Sankaranarayanan, Soumen Chakrabarti

    Abstract: Recent advances in transformers have enabled Table Question Answering (Table QA) systems to achieve high accuracy and SOTA results on open domain datasets like WikiTableQuestions and WikiSQL. Such transformers are frequently pre-trained on open-domain content such as Wikipedia, where they effectively encode questions and corresponding tables from Wikipedia as seen in Table QA dataset. However, web…

    Submitted 24 June, 2021; originally announced June 2021.

  14. arXiv:2106.04441  [pdf, other]

    cs.CL

    CLTR: An End-to-End, Transformer-Based System for Cell Level Table Retrieval and Table Question Answering

    Authors: Feifei Pan, Mustafa Canim, Michael Glass, Alfio Gliozzo, Peter Fox

    Abstract: We present the first end-to-end, transformer-based table question answering (QA) system that takes natural language questions and massive table corpus as inputs to retrieve the most relevant tables and locate the correct table cells to answer the question. Our system, CLTR, extends the current state-of-the-art QA over tables model to build an end-to-end table QA architecture. This system has succe…

    Submitted 9 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  15. arXiv:2105.01171  [pdf, other]

    cs.LG q-bio.GN q-bio.QM

    Machine Learning Applications for Therapeutic Tasks with Genomics Data

    Authors: Kexin Huang, Cao Xiao, Lucas M. Glass, Cathy W. Critchlow, Greg Gibson, Jimeng Sun

    Abstract: Thanks to the increasing availability of genomics and other biomedical data, many machine learning approaches have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electron…

    Submitted 3 May, 2021; originally announced May 2021.

  16. arXiv:2104.08610  [pdf, other]

    cs.AI cs.CL

    Zero-shot Slot Filling with DPR and RAG

    Authors: Michael Glass, Gaetano Rossiello, Alfio Gliozzo

    Abstract: The ability to automatically extract Knowledge Graphs (KG) from a given collection of documents is a long-standing problem in Artificial Intelligence. One way to assess this capability is through the task of slot filling. Given an entity query in the form of [Entity, Slot, ?], a system is asked to 'fill' the slot by generating or extracting the missing value from a relevant passage or passages. This c…

    Submitted 17 April, 2021; originally announced April 2021.

  17. arXiv:2104.08303  [pdf, other]

    cs.AI cs.CL

    Capturing Row and Column Semantics in Transformer Based Question Answering over Tables

    Authors: Michael Glass, Mustafa Canim, Alfio Gliozzo, Saneem Chemmengath, Vishwajeet Kumar, Rishav Chakravarti, Avi Sil, Feifei Pan, Samarth Bharadwaj, Nicolas Rodolfo Fauceglia

    Abstract: Transformer based architectures are recently used for the task of answering questions over tables. In order to improve the accuracy on this task, specialized pre-training techniques have been developed and applied on millions of open-domain web tables. In this paper, we propose two novel approaches demonstrating that one can achieve superior performance on table QA task without even using any of t…

    Submitted 26 April, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: To appear at NAACL 2021

  18. arXiv:2102.04252  [pdf, other]

    cs.CY cs.AI cs.LG

    HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data

    Authors: Tianfan Fu, Kexin Huang, Cao Xiao, Lucas M. Glass, Jimeng Sun

    Abstract: Clinical trials are crucial for drug development but are time consuming, expensive, and often burdensome on patients. More importantly, clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment. If we were better at predicting the results of clinical trials, we could avoid having to run trials that will inevitably fail; more resources could be…

    Submitted 12 March, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

  19. arXiv:2012.04747  [pdf, other]

    cs.LG q-bio.PE

    STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization

    Authors: Nikos Kargas, Cheng Qian, Nicholas D. Sidiropoulos, Cao Xiao, Lucas M. Glass, Jimeng Sun

    Abstract: Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent…

    Submitted 17 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: AAAI 2021

  20. arXiv:2010.16039  [pdf]

    eess.IV cs.CV cs.LG

    FLANNEL: Focal Loss Based Neural Network Ensemble for COVID-19 Detection

    Authors: Zhi Qiao, Austin Bae, Lucas M. Glass, Cao Xiao, Jimeng Sun

    Abstract: To test the possibility of differentiating chest x-ray images of COVID-19 against other pneumonia and healthy patients using deep neural networks. We construct the X-ray imaging data from two publicly available sources, which include 5508 chest x-ray images across 2874 patients with four classes: normal, bacterial pneumonia, non-COVID-19 viral pneumonia, and COVID-19. To identify COVID-19, we prop…

    Submitted 29 October, 2020; originally announced October 2020.

  21. arXiv:2010.11389  [pdf, other]

    cs.LG

    UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data

    Authors: Chacha Chen, Junjie Liang, Fenglong Ma, Lucas M. Glass, Jimeng Sun, Cao Xiao

    Abstract: Successful health risk prediction demands accuracy and reliability of the model. Existing predictive models mainly depend on mining electronic health records (EHR) with advanced deep learning techniques to improve model accuracy. However, they all ignore the importance of publicly available online health data, especially socioeconomic status, environmental factors, and detailed demographic informa…

    Submitted 25 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

  22. arXiv:2010.03951  [pdf, other]

    q-bio.QM cs.HC cs.LG

    MolDesigner: Interactive Design of Efficacious Drugs with Deep Learning

    Authors: Kexin Huang, Tianfan Fu, Dawood Khan, Ali Abid, Ali Abdalla, Abubakar Abid, Lucas M. Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun

    Abstract: The efficacy of a drug depends on its binding affinity to the therapeutic target and pharmacokinetics. Deep learning (DL) has demonstrated remarkable progress in predicting drug efficacy. We develop MolDesigner, a human-in-the-loop web user-interface (UI), to assist drug developers leverage DL predictions to design more effective drugs. A developer can draw a drug molecule in the interface. In the…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020 Demonstration Track

  23. arXiv:2010.02318  [pdf, other]

    cs.LG cs.AI

    MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

    Authors: Tianfan Fu, Cao Xiao, Xinhao Li, Lucas M. Glass, Jimeng Sun

    Abstract: Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties. To address suc…

    Submitted 30 June, 2024; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted by AAAI 2021

  24. arXiv:2010.01450  [pdf, other]

    cs.LG cs.CL cs.IR q-bio.QM

    SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization

    Authors: Yue Yu, Kexin Huang, Chao Zhang, Lucas M. Glass, Jimeng Sun, Cao Xiao

    Abstract: Thanks to the increasing availability of drug-drug interactions (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDI using machine learning models becomes possible. However, it remains largely an open problem how to effectively utilize large and noisy biomedical KG for DDI detection. Due to its sheer size and amount of noise in KGs, it is often less benefic…

    Submitted 6 May, 2021; v1 submitted 3 October, 2020; originally announced October 2020.

    Comments: Published in Bioinformatics 2021

  25. arXiv:2008.04215  [pdf]

    cs.SI physics.soc-ph q-bio.PE

    STAN: Spatio-Temporal Attention Network for Pandemic Prediction Using Real World Evidence

    Authors: Junyi Gao, Rakshith Sharma, Cheng Qian, Lucas M. Glass, Jeffrey Spaeder, Justin Romberg, Jimeng Sun, Cao Xiao

    Abstract: Objective: The COVID-19 pandemic has created many challenges that need immediate attention. Various epidemiological and deep learning models have been developed to predict the COVID-19 outbreak, but all have limitations that affect the accuracy and robustness of the predictions. Our method aims at addressing these limitations and making earlier and more accurate pandemic outbreak predictions by (1…

    Submitted 7 December, 2020; v1 submitted 23 July, 2020; originally announced August 2020.

  26. arXiv:2006.08765  [pdf, other]

    cs.LG cs.AI

    COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching

    Authors: Junyi Gao, Cao Xiao, Lucas M. Glass, Jimeng Sun

    Abstract: Clinical trials play important roles in drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. The availability of massive electronic health records (EHR) data and trial eligibility criteria (EC) bring a new opportunity to data driven patient recruitment. One key task named patient-trial matching is to find qualified patients for clinical trials given st…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted by KDD'20

  27. arXiv:2002.11701  [pdf, other]

    cs.LG cs.CV cs.HC stat.ML

    CLARA: Clinical Report Auto-completion

    Authors: Siddharth Biswal, Cao Xiao, Lucas M. Glass, M. Brandon Westover, Jimeng Sun

    Abstract: Generating clinical reports from raw recordings such as X-rays and electroencephalogram (EEG) is an essential and routine task for doctors. However, it is often time-consuming to write accurate and detailed reports. Most existing methods try to generate the whole reports from the raw input with limited success because 1) generated reports often contain errors that need manual review and correction…

    Submitted 4 March, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

  28. arXiv:2001.10054  [pdf, other]

    cs.LG cs.AI stat.ML

    StageNet: Stage-Aware Neural Networks for Health Risk Prediction

    Authors: Junyi Gao, Cao Xiao, Yasha Wang, Wen Tang, Lucas M. Glass, Jimeng Sun

    Abstract: Deep learning has demonstrated success in health risk prediction especially for patients with chronic and progressing conditions. Most existing works focus on learning disease Network (StageNet) model to extract disease stage information from patient data and integrate it into risk prediction. StageNet is enabled by (1) a stage-aware long short-term memory (LSTM) module that extracts health stage…

    Submitted 24 January, 2020; originally announced January 2020.

  29. arXiv:2001.08179  [pdf, other]

    cs.AI cs.LG

    DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction

    Authors: Xingyao Zhang, Cao Xiao, Lucas M. Glass, Jimeng Sun

    Abstract: Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. The core problem of patient-trial matching is to find qualified patients for a trial, where patient information is stored in electronic health records (EHR) while trial eligibility criteria (EC) are described in text documents available on the web. How to represent l…

    Submitted 22 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

    Comments: Accepted by The World Wide Web Conference 2020

  30. arXiv:2001.00076  [pdf, other]

    cs.LG cs.DS stat.ML

    Scalable Hierarchical Clustering with Tree Grafting

    Authors: Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael Glass, Andrew McCallum

    Abstract: We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notio…

    Submitted 31 December, 2019; originally announced January 2020.

    Comments: 23 pages (appendix included), published at KDD 2019

  31. arXiv:1911.13232  [pdf, other]

    cs.LG cs.CL

    CONAN: Complementary Pattern Augmentation for Rare Disease Detection

    Authors: Limeng Cui, Siddharth Biswal, Lucas M. Glass, Greg Lever, Jimeng Sun, Cao Xiao

    Abstract: Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How to further leverage patients with possibly uncertain diagnosis to improve detection? In this paper, we propose a…

    Submitted 26 November, 2019; originally announced November 2019.

  32. arXiv:1911.10395  [pdf, other]

    cs.LG cs.CY stat.ML

    Doctor2Vec: Dynamic Doctor Representation Learning for Clinical Trial Recruitment

    Authors: Siddharth Biswal, Cao Xiao, Lucas M. Glass, Elizabeth Milkovits, Jimeng Sun

    Abstract: Massive electronic health records (EHRs) enable the success of learning accurate patient representations to support various predictive health applications. In contrast, doctor representation was not well studied despite that doctors play pivotal roles in healthcare. How to construct the right doctor representations? How to use doctor representation to solve important health analytic problems? In t…

    Submitted 23 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020

  33. arXiv:1911.06446  [pdf, other]

    cs.LG q-bio.QM stat.ML

    CASTER: Predicting Drug Interactions with Chemical Substructure Representation

    Authors: Kexin Huang, Cao Xiao, Trong Nghia Hoang, Lucas M. Glass, Jimeng Sun

    Abstract: Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on li…

    Submitted 19 November, 2019; v1 submitted 14 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020

  34. arXiv:1909.05286  [pdf, other]

    cs.CL

    Frustratingly Easy Natural Question Answering

    Authors: Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil

    Abstract: Existing literature on Question Answering (QA) mostly focuses on algorithmic novelty, data augmentation, or increasingly large pre-trained language models like XLNet and RoBERTa. Additionally, a lot of systems on the QA leaderboards do not have associated research documentation in order to successfully replicate their experiments. In this paper, we outline these algorithmic components such as Atte…

    Submitted 11 September, 2019; originally announced September 2019.

  35. arXiv:1909.04120  [pdf, other]

    cs.CL cs.AI cs.LG

    Span Selection Pre-training for Question Answering

    Authors: Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avirup Sil

    Abstract: BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better…

    Submitted 18 June, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted at ACL 2020

  36. arXiv:1908.08104  [pdf, other]

    cs.CL

    Populating Web Scale Knowledge Graphs using Distantly Supervised Relation Extraction and Validation

    Authors: Sarthak Dash, Michael R. Glass, Alfio Gliozzo, Mustafa Canim

    Abstract: In this paper, we propose a fully automated system to extend knowledge graphs using external information from web-scale corpora. The designed system leverages a deep learning based technology for relation extraction that can be trained by a distantly supervised approach. In addition to that, the system uses a deep learning approach for knowledge base completion by utilizing the global structure in…

    Submitted 10 September, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: 11 pages, 6 figures

  37. arXiv:1908.07630  [pdf, other]

    cs.LG cs.AI cs.CV

    P2L: Predicting Transfer Learning for Images and Semantic Relations

    Authors: Bishwaranjan Bhattacharjee, John R. Kender, Matthew Hill, Parijat Dube, Siyu Huo, Michael R. Glass, Brian Belgodere, Sharath Pankanti, Noel Codella, Patrick Watson

    Abstract: Transfer learning enhances learning across tasks, by leveraging previously learned representations -- if they are properly chosen. We describe an efficient method to accurately estimate the appropriateness of a previously trained model for use in a new learning task. We use this measure, which we call "Predict To Learn" ("P2L"), in the two very different domains of images and semantic relations, w…

    Submitted 15 October, 2020; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: 10 pages, 8 figures, 4 tables

  38. CFO: A Framework for Building Production NLP Systems

    Authors: Rishav Chakravarti, Cezar Pendus, Andrzej Sakrajda, Anthony Ferritto, Lin Pan, Michael Glass, Vittorio Castelli, J. William Murdock, Radu Florian, Salim Roukos, Avirup Sil

    Abstract: This paper introduces a novel orchestration framework, called CFO (COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Readi…

    Submitted 19 June, 2020; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: http://ibm.biz/cfo_framework

    Report number: D19-3006

    Journal ref: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

  39. arXiv:1906.07100  [pdf, other]

    physics.optics physics.ins-det

    A hierarchical approach for modelling X-ray beamlines. Application to a coherent beamline

    Authors: Manuel Sanchez del Rio, Rafael Celestre, Mark Glass, Giovanni Pirro, Juan Reyes-Herrera, Ray Barrett, Julio Cesar da Silva, Peter Cloetens, Xianbo Shi, Luca Rebuffi

    Abstract: We consider different approaches to simulate a modern X-ray beamline. Several methodologies with increasing complexity are applied to discuss the relevant parameters that quantify the beamline performance. Parameters such as flux, dimensions and intensity distribution of the focused beam and coherence properties are obtained from simple analytical calculations to sophisticated computer simulations…

    Submitted 17 June, 2019; originally announced June 2019.

  40. arXiv:1711.05932  [pdf, other]

    cs.DC cs.MA cs.NI eess.SY

    A Design-Time/Run-Time Application Mapping Methodology for Predictable Execution Time in MPSoCs

    Authors: Andreas Weichslgartner, Stefan Wildermann, Deepak Gangadharan, Michael Glaß, Jürgen Teich

    Abstract: Executing multiple applications on a single MPSoC brings the major challenge of satisfying multiple quality requirements regarding real-time, energy, etc. Hybrid application mapping denotes the combination of design-time analysis with run-time application mapping. In this article, we present such a methodology, which comprises a design space exploration coupled with a formal performance analysis.…

    Submitted 16 November, 2017; originally announced November 2017.

  41. arXiv:1709.08074  [pdf, other]

    cs.CL

    Language Independent Acquisition of Abbreviations

    Authors: Michael R. Glass, Md Faisal Mahbub Chowdhury, Alfio M. Gliozzo

    Abstract: This paper addresses automatic extraction of abbreviations (encompassing acronyms and initialisms) and corresponding long-form expansions from plain unstructured text. We create and are going to release a multilingual resource for abbreviations and their corresponding expansions, built automatically by exploiting Wikipedia redirect and disambiguation pages, that can be used as a benchmark for eval…

    Submitted 23 September, 2017; originally announced September 2017.

    Comments: 9 pages, 7 figures, 2 tables

  42. Coherent modes of X-ray beams emitted by undulators in new storage rings

    Authors: Mark Glass, Manuel Sanchez del Rio

    Abstract: Synchrotron radiation emitted by electrons passing through an undulator placed in a storage ring is decomposed in coherent modes. The case of ultimate storage rings where the electron emittance is comparable to the emittance of the photon fan is analyzed by means of the cross spectral density and the coherent mode spectrum. The proposed method permits naturally the statistical analysis and propaga…

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: 5 pages, 6 Figures, 17 references

    MSC Class: 78A10; 78A40

    Journal ref: EPL, 119 3 (2017) 34004

  43. arXiv:1701.07235  [pdf, ps, other

    math.GR

    Recognizing the real line

    Authors: A. M. W. Glass, John S. Wilson

    Abstract: Let $(Ω, \leq)$ be a totally ordered set. We prove that if Aut$(Ω,\leq)$ is transitive and satisfies the same first-order sentences as the automorphism group of the real line (in the language of groups) then $Ω$ and the real line are isomorphic ordered sets. This improvement of a theorem of Gurevich and Holland is obtained as a consequence of a study of centralizers associated with certain tra… ▽ More

    Submitted 25 January, 2017; originally announced January 2017.

    Comments: 13 pages. arXiv admin note: substantial text overlap with arXiv:1606.00312

    MSC Class: 20B07; 06F15

  44. arXiv:1606.00312  [pdf, ps, other

    math.GR

    The first-order theory of $\ell$-permutation groups

    Authors: A. M. W. Glass, John S. Wilson

    Abstract: Let $(Ω, \leq)$ be a totally ordered set. We prove that if $\Aut(Ω,\leq)$ is transitive and satisfies the same first-order sentences as $\Aut(\RR,\leq)$ (in the language of lattice-ordered groups) then $Ω$ and $\RR$ are isomorphic ordered sets. This improvement of a theorem of Gurevich and Holland is obtained as one of many consequences of a study of centralizers and coloured chains associated wit… ▽ More

    Submitted 1 June, 2016; originally announced June 2016.

    Comments: 23 pages, 0 figures

  45. arXiv:1508.01585  [pdf, ps, other

    cs.CL cs.LG

    Applying Deep Learning to Answer Selection: A Study and An Open Task

    Authors: Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou

    Abstract: We apply a general deep learning framework to address the non-factoid question answering task. Our approach does not rely on any linguistic tools and can be applied to different languages or domains. Various architectures are presented and compared. We create and release a QA corpus and set up a new QA task in the insurance domain. Experimental results demonstrate superior performance compared to t… ▽ More

    Submitted 2 October, 2015; v1 submitted 6 August, 2015; originally announced August 2015.

    Comments: To appear in the proceedings of ASRU 2015

  46. arXiv:1405.2914  [pdf

    cs.DC

    Towards Cross-layer Reliability Analysis of Transient and Permanent Faults

    Authors: Hananeh Aliee, Liang Chen, Mojtaba Ebrahimi, Michael Glaß, Faramarz Khosravi, Mehdi B. Tahoori

    Abstract: Due to the increasing complexity of Multi-Processor Systems on Chip (MPSoCs), system-level design methodologies have got a lot of attention in recent years. However, the significant gap between the system-level reliability analysis and the level where the actual faults occur necessitates a cross-layer approach in which the sufficient data about the effects of faults at low levels are passed to the… ▽ More

    Submitted 12 May, 2014; originally announced May 2014.

    Comments: Presented at 1st Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014) (arXiv:1405.2281)

    Report number: Racing/2014/08

  47. Residual nilpotence and ordering in one-relator groups and knot groups

    Authors: I. M. Chiswell, A. M. W. Glass, John S. Wilson

    Abstract: Let $G=< x,t\mid w>$ be a one-relator group, where $w$ is a word in $x,t$. If $w$ is a product of conjugates of $x$ then, associated with $w$, there is a polynomial $A_w(X)$ over the integers, which in the case when $G$ is a knot group, is the Alexander polynomial of the knot. We prove, subject to certain restrictions on $w$, that if all roots of $A_w(X)$ are real and positive then $G$ is bi-order… ▽ More

    Submitted 2 November, 2014; v1 submitted 5 May, 2014; originally announced May 2014.

    Comments: Minor changes, references added

    Journal ref: Math. Proc. Camb. Phil. Soc. 158 (2015) 275-288

  48. A finitely presented orderable group with insoluble word problem

    Authors: V. V. Bludov, A. M. W. Glass

    Abstract: We construct a finitely presented (two-sided) totally orderable group with insoluble word problem.

    Submitted 3 August, 2010; originally announced August 2010.

    Comments: 17 pages

  49. arXiv:0908.1584  [pdf

    cs.HC cs.SE

    Reducing the Risk of Spreadsheet Usage - a Case Study

    Authors: Mel Glass, David Ford, Sebastian Dewhurst

    Abstract: The frequency with which spreadsheets are used and the associated risk are well known. Many tools and techniques have been developed which help reduce risks associated with creating and maintaining spreadsheets. However, little consideration has been given to reducing the risks of routine usage by the "consumers" - for example when entering and editing data. EASA's solution, available commercially,… ▽ More

    Submitted 11 August, 2009; originally announced August 2009.

    Comments: 10 Pages, 7 Figures

    Journal ref: Proc. European Spreadsheet Risks Int. Grp. (EuSpRIG) 2009 163-172 ISBN 978-1-905617-89-0

  50. arXiv:0906.2621  [pdf, ps, other

    math.GR

    Unsolved problems in ordered and orderable groups

    Authors: V. V. Bludov, A. M. W. Glass, V. M. Kopytov, N. Ya. Medvedev

    Abstract: We provide a list of (mainly unsolved) problems in ordered and orderable groups. These were originally compiled 10 years ago by the last two authors. New problems have been added to the list. Progress on some of these is noted and references provided. A few have been solved and their solutions are noted and referenced. We hope that this submission will act as a spur to mathematicians to solve so… ▽ More

    Submitted 15 June, 2009; originally announced June 2009.

    Comments: pdf file has 27 pages and 0 figures

    MSC Class: 06F15; 20F60; 20E06; 20B27; 20F10