Showing 1–9 of 9 results for author: Smith, D A

Searching in archive cs.
  1. arXiv:2407.00250

    cs.CV cs.CL

    Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription

    Authors: Jaydeep Borkar, David A. Smith

    Abstract: Historical documents frequently suffer from damage and inconsistencies, including missing or illegible text resulting from issues such as holes, ink problems, and storage damage. These missing portions or gaps are referred to as lacunae. In this study, we employ transformer-based optical character recognition (OCR) models trained on synthetic data containing lacunae in a supervised manner. We demo…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted to ICDAR 2024 Workshop on Computational Paleography

  2. arXiv:2306.03168

    cs.CL cs.CV

    Composition and Deformance: Measuring Imageability with a Text-to-Image Model

    Authors: Si Wu, David A. Smith

    Abstract: Although psycholinguists and psychologists have long studied the tendency of linguistic strings to evoke mental images in hearers or readers, most computational studies have applied this concept of imageability only to isolated words. Using recent developments in text-to-image generation models, such as DALLE mini, we propose computational methods that use generated images to measure the imageabil…

    Submitted 5 June, 2023; originally announced June 2023.

  3. arXiv:2305.03819

    cs.CL

    Adapting Transformer Language Models for Predictive Typing in Brain-Computer Interfaces

    Authors: Shijia Liu, David A. Smith

    Abstract: Brain-computer interfaces (BCI) are an important mode of alternative and augmentative communication for many people. Unlike keyboards, many BCI systems do not display even the 26 letters of English at one time, let alone all the symbols in more complex systems. Using language models to make character-level predictions, therefore, can greatly speed up BCI typing (Ghosh and Kristensson, 2017). While…

    Submitted 5 May, 2023; originally announced May 2023.

  4. Digital Editions as Distant Supervision for Layout Analysis of Printed Books

    Authors: Alejandro H. Toselli, Si Wu, David A. Smith

    Abstract: Archivists, textual scholars, and historians often produce digital editions of historical documents. Using markup schemes such as those of the Text Encoding Initiative and EpiDoc, these digital editions often record documents' semantic regions (such as notes and figures) and physical features (such as page and line breaks) as well as transcribing their textual content. We describe methods for expl…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: 15 pages, 2 figures. International Conference on Document Analysis and Recognition. Springer, Cham, 2021

  5. arXiv:1812.04677

    cs.SI cs.LG stat.ML

    Contrastive Training for Models of Information Cascades

    Authors: Shaobin Xu, David A. Smith

    Abstract: This paper proposes a model of information cascades as directed spanning trees (DSTs) over observed documents. In addition, we propose a contrastive training procedure that exploits partial temporal ordering of node infections in lieu of labeled training links. This combination of model and unsupervised training makes it possible to improve on models that use infection times alone and to exploit a…

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: Accepted in AAAI-18

  6. arXiv:1712.06704

    stat.ML cs.CL cs.IR

    Multilingual Topic Models

    Authors: Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

    Abstract: Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document…

    Submitted 18 December, 2017; originally announced December 2017.

    Comments: 18 pages, 9 figures

  7. arXiv:1601.01611

    cs.IR

    Automatic Construction of Evaluation Sets and Evaluation of Document Similarity Models in Large Scholarly Retrieval Systems

    Authors: Kriste Krstovski, David A. Smith, Michael J. Kurtz

    Abstract: Retrieval systems for scholarly literature offer the ability for the scientific community to search, explore and download scholarly articles across various scientific disciplines. Mostly used by experts in a particular field, these systems contain user community logs including information on user-specific downloaded articles. In this paper we present a novel approach for automatically evalua…

    Submitted 7 January, 2016; originally announced January 2016.

  8. arXiv:1410.0741

    cs.LG

    Generalized Laguerre Reduction of the Volterra Kernel for Practical Identification of Nonlinear Dynamic Systems

    Authors: Brett W. Israelsen, Dale A. Smith

    Abstract: The Volterra series can be used to model a large subset of nonlinear, dynamic systems. A major drawback is the number of coefficients required to model such systems. In order to reduce the number of required coefficients, Laguerre polynomials are used to estimate the Volterra kernels. Existing literature proposes algorithms for a fixed number of Volterra kernels, and Laguerre series. This paper prese…

    Submitted 2 October, 2014; originally announced October 2014.

    Comments: 16 pages

    Journal ref: AIChE Spring Meeting 2014, Paper 349438

  9. arXiv:1203.3511

    cs.LG cs.CL stat.ML

    Inference by Minimizing Size, Divergence, or their Sum

    Authors: Sebastian Riedel, David A. Smith, Andrew McCallum

    Abstract: We speed up marginal inference by ignoring factors that do not significantly contribute to overall accuracy. In order to pick a suitable subset of factors to ignore, we propose three schemes: minimizing the number of model factors under a bound on the KL divergence between pruned and full models; minimizing the KL divergence under a bound on factor count; and minimizing the weighted sum of KL dive…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-492-499