Showing 1–50 of 86 results for author: McCarthy, A

  1. arXiv:2407.09194  [pdf, other]

    astro-ph.SR astro-ph.EP

    The JWST Weather Report from the Nearest Brown Dwarfs I: multi-period JWST NIRSpec + MIRI monitoring of the benchmark binary brown dwarf WISE 1049AB

    Authors: Beth A. Biller, Johanna M. Vos, Yifan Zhou, Allison M. McCarthy, Xianyu Tan, Ian J. M. Crossfield, Niall Whiteford, Genaro Suarez, Jacqueline Faherty, Elena Manjavacas, Xueqing Chen, Pengyu Liu, Ben J. Sutlieff, Mary Anne Limbach, Paul Molliere, Trent J. Dupuy, Natalia Oliveros-Gomez, Philip S. Muirhead, Thomas Henning, Gregory Mace, Nicolas Crouzet, Theodora Karalidi, Caroline V. Morley, Pascal Tremblin, Tiffany Kataria

    Abstract: We report results from 8 hours of JWST/MIRI LRS spectroscopic monitoring directly followed by 7 hours of JWST/NIRSpec prism spectroscopic monitoring of the benchmark binary brown dwarf WISE 1049AB, the closest, brightest brown dwarfs known. We find water, methane, and CO absorption features in both components, including the 3.3 $μ$m methane absorption feature and a tentative detection of small gra…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 28 pages, 27 figures, accepted to MNRAS

  2. arXiv:2405.11309  [pdf, other]

    astro-ph.HE

    Klein-Nishina Corrections to the Spectra and Light Curves of Gamma-ray Burst Afterglows

    Authors: George A. McCarthy, Tanmoy Laskar

    Abstract: Multi-wavelength modeling of the synchrotron radiation from relativistic transients such as Gamma-ray Burst (GRB) afterglows is a powerful means of exploring the physics of relativistic shocks and of deriving properties of the explosion, such as the kinetic energy of the associated relativistic outflows. Capturing the location and evolution of the synchrotron cooling break is critical to break par…

    Submitted 18 May, 2024; originally announced May 2024.

  3. arXiv:2404.02127  [pdf, other]

    cs.CL cs.AI cs.LG

    FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

    Authors: Joel Niklaus, Lucia Zheng, Arya D. McCarthy, Christopher Hahn, Brian M. Rosen, Peter Henderson, Daniel E. Ho, Garrett Honke, Percy Liang, Christopher Manning

    Abstract: Instruction tuning is an important step in making language models useful for direct user interaction. However, many legal tasks remain out of reach for most open LLMs and there do not yet exist any large scale instruction datasets for the domain. This critically limits research in this application area. In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictio…

    Submitted 2 April, 2024; originally announced April 2024.

    MSC Class: 68T50 ACM Class: I.2

  4. arXiv:2402.15001  [pdf, other]

    astro-ph.EP astro-ph.SR

    Multiple Patchy Cloud Layers in the Planetary Mass Object SIMP0136+0933

    Authors: Allison M. McCarthy, Philip S. Muirhead, Patrick Tamburo, Johanna M. Vos, Caroline V. Morley, Jacqueline Faherty, Daniella C. Bardalez Gagliuffi, Eric Agol, Christopher Theissen

    Abstract: Multi-wavelength photometry of brown dwarfs and planetary-mass objects provides insight into their atmospheres and cloud layers. We present near-simultaneous $J-$ and $K_s-$band multi-wavelength observations of the highly variable T2.5 planetary-mass object, SIMP J013656.5+093347. We reanalyze observations acquired over a single night in 2015 using a recently developed data reduction pipeline. For…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 15 pages, 8 figures, Accepted for publication in ApJ

  5. arXiv:2310.13678  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

    Authors: Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu

    Abstract: One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs…

    Submitted 23 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: accepted to the Findings of EMNLP 2023. arXiv admin note: text overlap with arXiv:2212.09895

  6. arXiv:2302.07912  [pdf, other]

    cs.CL

    Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

    Authors: Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, Luis Chiruzzo, John E. Ortega, Gustavo A. Giménez-Lugo, Rolando Coto-Solano, Katharina Kann

    Abstract: Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribu…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: EACL 2023

  7. arXiv:2302.01528  [pdf]

    stat.ME q-bio.PE stat.AP

    Four principles for improved statistical ecology

    Authors: Gordana Popovic, Tanya J. Mason, Tiago A. Marques, Joanne Potts, Szymon M. Drobniak, Rocío Joo, Res Altwegg, Carolyn C. I. Burns, Michael A. McCarthy, Alison Johnston, Shinichi Nakagawa, Louise McMillan, Kadambari Devarajan, Patrick L. Taggart, Alison C. Wunderlich, Magdalena M. Mair, Juan Andrés Martínez-Lanfranco, Malgorzata Lagisz, Patrice P. Pottier

    Abstract: Increasing attention has been drawn to the misuse of statistical methods over recent years, with particular concern about the prevalence of practices such as poor experimental design, cherry-picking and inadequate reporting. These failures are largely unintentional and no more common in ecology than in other scientific disciplines, with many of them easily remedied given the right guidance. Orig…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 19 pages, 2 figures

  8. arXiv:2212.09895  [pdf, other]

    cs.CL

    Improved Long-Form Spoken Language Translation with Large Language Models

    Authors: Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng

    Abstract: A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmen…

    Submitted 19 December, 2022; originally announced December 2022.

  9. arXiv:2212.01233  [pdf, other]

    cs.LG cs.CR cs.IR

    Safe machine learning model release from Trusted Research Environments: The AI-SDC package

    Authors: Jim Smith, Richard J. Preen, Andrew McCarthy, Alba Crespi-Boixader, James Liley, Simon Rogers

    Abstract: We present AI-SDC, an integrated suite of open source Python tools to facilitate Statistical Disclosure Control (SDC) of Machine Learning (ML) models trained on confidential data prior to public release. AI-SDC combines (i) a SafeModel package that extends commonly used ML models to provide ante-hoc SDC by assessing the vulnerability of disclosure posed by the training regime; and (ii) an Attacks…

    Submitted 6 December, 2022; v1 submitted 2 December, 2022; originally announced December 2022.

  10. arXiv:2211.16858  [pdf, other]

    cs.CL

    A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

    Authors: Katharina Kann, Shiran Dudy, Arya D. McCarthy

    Abstract: The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible amount of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products. However, this paper argues that we have been less successful than we should have been and reflects on where and how the field fails to ta…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: To appear at EMNLP 2022

  11. arXiv:2211.01656  [pdf]

    cs.LG cs.AI cs.CR

    GRAIMATTER Green Paper: Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs)

    Authors: Emily Jefferson, James Liley, Maeve Malone, Smarti Reel, Alba Crespi-Boixader, Xaroula Kerasidou, Francesco Tava, Andrew McCarthy, Richard Preen, Alberto Blanco-Justicia, Esma Mansouri-Benssassi, Josep Domingo-Ferrer, Jillian Beggs, Antony Chuter, Christian Cole, Felix Ritchie, Angela Daly, Simon Rogers, Jim Smith

    Abstract: TREs are widely and increasingly used to support statistical analysis of sensitive data across a range of sectors (e.g., health, police, tax and education) as they enable secure and transparent research whilst protecting data confidentiality. There is an increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly with applications including spott…

    Submitted 3 November, 2022; originally announced November 2022.

  12. arXiv:2210.04462  [pdf, other]

    astro-ph.EP astro-ph.IM astro-ph.SR

    The Perkins INfrared Exosatellite Survey (PINES) II. Transit Candidates and Implications for Planet Occurrence around L and T Dwarfs

    Authors: Patrick Tamburo, Philip S. Muirhead, Allison M. McCarthy, Murdock Hart, Johanna M. Vos, Eric Agol, Christopher Theissen, David Gracia, Daniella C. Bardalez Gagliuffi, Jacqueline Faherty

    Abstract: We describe a new transit detection algorithm designed to detect single transit events in discontinuous Perkins INfrared Exosatellite Survey (PINES) observations of L and T dwarfs. We use this algorithm to search for transits in 131 PINES light curves and identify two transit candidates: 2MASS J18212815+1414010 (2MASS J1821+1414) and 2MASS J08350622+1953050 (2MASS J0835+1953). We disfavor 2MASS J1…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 23 pages, 15 figures, accepted to AJ

  13. arXiv:2205.03608  [pdf, other]

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  14. arXiv:2203.08909  [pdf, other]

    cs.CL

    Morphological Processing of Low-Resource Languages: Where We Are and What's Next

    Authors: Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

    Abstract: Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Findings of ACL 2022

  15. arXiv:2203.08850  [pdf, other]

    cs.CL

    Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?

    Authors: En-Shiun Annie Lee, Sarubi Thillainathan, Shravan Nayak, Surangika Ranathunga, David Ifeoluwa Adelani, Ruisi Su, Arya D. McCarthy

    Abstract: What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages? We conduct a thorough empirical experiment in 10 languages to ascertain this, considering five factors: (1) the amount of fine-tuning data, (2) the noise in the fine-tuning data, (3) the amount of pre-training data in the model, (4) the impact of domain mismatch, and (5) langu…

    Submitted 30 April, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to Findings of ACL 2022

  16. arXiv:2201.01794  [pdf, other]

    astro-ph.EP astro-ph.IM

    The Perkins INfrared Exosatellite Survey (PINES) I. Survey Overview, Reduction Pipeline, and Early Results

    Authors: Patrick Tamburo, Philip S. Muirhead, Allison M. McCarthy, Murdock Hart, David Gracia, Johanna M. Vos, Daniella C. Bardalez Gagliuffi, Jacqueline Faherty, Christopher Theissen, Eric Agol, Julie N. Skinner, Sheila Sagear

    Abstract: We describe the Perkins INfrared Exosatellite Survey (PINES), a near-infrared photometric search for short-period transiting planets and moons around a sample of 393 spectroscopically confirmed L- and T-type dwarfs. PINES is performed with Boston University's 1.8 m Perkins Telescope Observatory, located on Anderson Mesa, Arizona. We discuss the observational strategy of the survey, which was desig…

    Submitted 21 April, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: 25 pages, 15 figures, accepted to AJ

  17. arXiv:2101.10245  [pdf, other]

    cs.HC

    AirWare: Utilizing Embedded Audio and Infrared Signals for In-Air Hand-Gesture Recognition

    Authors: Nibhrat Lohia, Raunak Mundada, Arya D. McCarthy, Eric C. Larson

    Abstract: We introduce AirWare, an in-air hand-gesture recognition system that uses the already embedded speaker and microphone in most electronic devices, together with embedded infrared proximity sensors. Gestures identified by AirWare are performed in the air above a touchscreen or a mobile phone. AirWare utilizes convolutional neural networks to classify a large vocabulary of hand gestures using multi-m…

    Submitted 25 January, 2021; originally announced January 2021.

  18. arXiv:2012.10295  [pdf]

    q-bio.GN

    Twelve years of SAMtools and BCFtools

    Authors: Petr Danecek, James K. Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A. McCarthy, Robert M. Davies, Heng Li

    Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. Findings: The first version appeared online twelve years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used…

    Submitted 2 February, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

  19. Federated Learning for Breast Density Classification: A Real-World Implementation

    Authors: Holger R. Roth, Ken Chang, Praveer Singh, Nir Neumark, Wenqi Li, Vikash Gupta, Sharut Gupta, Liangqiong Qu, Alvin Ihsani, Bernardo C. Bizzo, Yuhong Wen, Varun Buch, Meesam Shah, Felipe Kitamura, Matheus Mendonça, Vitor Lavor, Ahmed Harouni, Colin Compas, Jesse Tetreault, Prerna Dogra, Yan Cheng, Selnur Erdal, Richard White, Behrooz Hashemian, Thomas Schultz, et al. (18 additional authors not shown)

    Abstract: Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Report…

    Submitted 20 October, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: Accepted at the 1st MICCAI Workshop on "Distributed And Collaborative Learning"; add citation to Fig. 1 & 2 and update Fig. 5; fix typo in affiliations

    Journal ref: In: Albarqouni S. et al. (eds) Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol 12444. Springer, Cham

  20. arXiv:2008.01019  [pdf, other]

    stat.AP stat.ME

    Combining Breast Cancer Risk Prediction Models

    Authors: Zoe Guan, Theodore Huang, Anne Marie McCarthy, Kevin S. Hughes, Alan Semine, Hajime Uno, Lorenzo Trippa, Giovanni Parmigiani, Danielle Braun

    Abstract: Accurate risk stratification is key to reducing cancer morbidity through targeted screening and preventative interventions. Numerous breast cancer risk prediction models have been developed, but they often give predictions with conflicting clinical implications. Integrating information from different models may improve the accuracy of risk predictions, which would be valuable for both clinicians a…

    Submitted 31 July, 2020; originally announced August 2020.

  21. arXiv:2005.13756  [pdf, other]

    cs.CL

    The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

    Authors: Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden

    Abstract: In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms, i.e., the entire morphological paradigm, of each lemma. In order to si…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: SIGMORPHON 2020

  22. arXiv:2005.00970  [pdf, other]

    cs.CL

    Unsupervised Morphological Paradigm Completion

    Authors: Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya D. McCarthy, Katharina Kann

    Abstract: We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language processing (NLP) perspective, this is a challenging unsupervised task, and high-performing systems have the potential to improve tools for low-resource languages or…

    Submitted 20 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted by ACL 2020

  23. arXiv:2005.00626  [pdf, other]

    cs.CL

    Predicting Declension Class from Form and Meaning

    Authors: Adina Williams, Tiago Pimentel, Arya D. McCarthy, Hagen Blix, Eleanor Chodroff, Ryan Cotterell

    Abstract: The noun lexica of many natural languages are divided into several declension classes with characteristic morphological properties. Class membership is far from deterministic, but the phonological form of a noun and/or its meaning can often provide imperfect clues. Here, we investigate the strength of those clues. More specifically, we operationalize this by measuring how much information, in bits…

    Submitted 28 May, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: 14 pages, 2 figures, this is the camera-ready version accepted at the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020)

  24. Robust 3D reconstruction of dynamic scenes from single-photon lidar using Beta-divergences

    Authors: Quentin Legros, Julian Tachella, Rachael Tobin, Aongus McCarthy, Sylvain Meignen, Gerald S. Buller, Yoann Altmann, Stephen McLaughlin, Michael E. Davies

    Abstract: In this paper, we present a new algorithm for fast, online 3D reconstruction of dynamic scenes using times of arrival of photons recorded by single-photon detector arrays. One of the main challenges in 3D imaging using single-photon lidar in practical applications is the presence of strong ambient illumination which corrupts the data and can jeopardize the detection of peaks/surface in the signals…

    Submitted 18 December, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: 12 pages

  25. arXiv:2002.12231  [pdf, other]

    eess.AS cs.CL cs.SD

    SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

    Authors: Arya D. McCarthy, Liezl Puzon, Juan Pino

    Abstract: We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method compares favorably to SpecAugment on English$\to$French and English$\to$Romanian automatic speech translation (AST) tasks as well as on a low-resource English a…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020

  26. The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

    Authors: Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

    Abstract: The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low…

    Submitted 25 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: Presented at SIGMORPHON 2019

    Journal ref: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology (2019) 229-244

  27. arXiv:1910.01531  [pdf, other]

    cs.CL

    Modeling Color Terminology Across Thousands of Languages

    Authors: Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

    Abstract: There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Co…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: Accepted for presentation at EMNLP-IJCNLP 2019

  28. arXiv:1909.09237  [pdf, other]

    cs.CL

    Improved Variational Neural Machine Translation by Promoting Mutual Information

    Authors: Arya D. McCarthy, Xian Li, Jiatao Gu, Ning Dong

    Abstract: Posterior collapse plagues VAEs for text, especially for conditional text generation with strong autoregressive decoders. In this work, we address this problem in variational neural machine translation by explicitly promoting mutual information between the latent variables and the data. Our model extends the conditional variational autoencoder (CVAE) with two new ingredients: first, we propose a m…

    Submitted 19 September, 2019; originally announced September 2019.

  29. arXiv:1909.06515  [pdf, other]

    cs.CL cs.SD eess.AS

    Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade

    Authors: Juan Pino, Liezl Puzon, Jiatao Gu, Xutai Ma, Arya D. McCarthy, Deepak Gopinath

    Abstract: For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models that transcribe with automatic speech recognition (ASR), then translate with machine translation (MT). A major cause of the performance gap is that, while existing AST corpora are small, massive datasets exist for both the ASR and MT subsystems. In this work, we evaluate several data augmentation and…

    Submitted 22 October, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: IWSLT 2019

  30. arXiv:1906.05906  [pdf, other]

    cs.CL

    Meaning to Form: Measuring Systematicity as Information

    Authors: Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

    Abstract: A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade? For instance, does the character bigram \textit{gl} have any systematic relationship to the meaning of words like \textit{glisten}, \textit{gleam} and \textit{gl…

    Submitted 26 July, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at ACL 2019

  31. arXiv:1905.06700  [pdf, other]

    eess.IV physics.optics

    Real-time 3D reconstruction from single-photon lidar data using plug-and-play point cloud denoisers

    Authors: Julián Tachella, Yoann Altmann, Nicolas Mellado, Aongus McCarthy, Rachael Tobin, Gerald S. Buller, Jean-Yves Tourneret, Stephen McLaughlin

    Abstract: Single-photon lidar has emerged as a prime candidate technology for depth imaging through challenging environments. Until now, a major limitation has been the significant amount of time required for the analysis of the recorded data. Here we show a new computational framework for real-time three-dimensional (3D) scene reconstruction from single-photon data. By combining statistical models with hig…

    Submitted 4 October, 2019; v1 submitted 16 May, 2019; originally announced May 2019.

  32. An Exact No Free Lunch Theorem for Community Detection

    Authors: Arya D. McCarthy, Tongfei Chen, Seth Ebner

    Abstract: A precondition for a No Free Lunch theorem is evaluation with a loss function which does not assume a priori superiority of some outputs over others. A previous result for community detection by Peel et al. (2017) relies on a mismatch between the loss function and the problem domain. The loss function computes an expectation over only a subset of the universe of possible outputs; thus, it is only…

    Submitted 24 March, 2019; originally announced March 2019.

    Journal ref: Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881

  33. arXiv:1901.01354  [pdf, other]

    cs.SI physics.soc-ph

    Metrics matter in community detection

    Authors: Arya D. McCarthy, Tongfei Chen, Rachel Rudinger, David W. Matula

    Abstract: We present a critical evaluation of normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method's performance on weak communities: Does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences u…

    Submitted 4 January, 2019; originally announced January 2019.

    Journal ref: Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881

  34. Bayesian 3D Reconstruction of Complex Scenes from Single-Photon Lidar Data

    Authors: Julián Tachella, Yoann Altmann, Ximing Ren, Aongus McCarthy, Gerald S. Buller, Jean-Yves Tourneret, Steve McLaughlin

    Abstract: Light detection and ranging (Lidar) data can be used to capture the depth and intensity profile of a 3D scene. This modality relies on constructing, for each pixel, a histogram of time delays between emitted light pulses and detected photon arrivals. In a general setting, more than one surface can be observed in a single pixel. The problem of estimating the number of surfaces, their reflectivity a…

    Submitted 27 October, 2018; originally announced October 2018.

    Journal ref: SIAM Journal on Imaging Sciences 2019 12:1, 521-550

  35. arXiv:1810.11101  [pdf, other]

    cs.CL

    UniMorph 2.0: Universal Morphology

    Authors: Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The Universal Morphology UniMorph project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema.…

    Submitted 25 February, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: LREC 2018

  36. arXiv:1810.07125  [pdf, other]

    cs.CL

    The CoNLL--SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

    Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The CoNLL--SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages. Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a…

    Submitted 25 February, 2020; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: CoNLL 2018. arXiv admin note: text overlap with arXiv:1706.09031

  37. Marrying Universal Dependencies and Universal Morphology

    Authors: Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky

    Abstract: The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. Wi…

    Submitted 15 October, 2018; originally announced October 2018.

    Comments: UDW18

    Journal ref: Proceedings of the Second Workshop on Universal Dependencies (2018) 91-101

  38. Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

    Authors: Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

    Abstract: To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surpri…

    Submitted 15 January, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

    Comments: presented at WMT 2018. Please cite using the bib entry from here: http://www.statmt.org/wmt18/bib/WMT013.bib

    Journal ref: Proceedings of the Third Conference on Machine Translation: Research Papers (2018) 124-132

  39. arXiv:1805.04755  [pdf, other]

    stat.ML cs.LG

    A Simple and Effective Model-Based Variable Importance Measure

    Authors: Brandon M. Greenwell, Bradley C. Boehmke, Andrew J. McCarthy

    Abstract: In the era of "big data", it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms---like random forests and gradient boosted…

    Submitted 12 May, 2018; originally announced May 2018.

  40. arXiv:1803.02903  [pdf, other]

    physics.ins-det astro-ph.IM hep-ex

    Nuclear-recoil energy scale in CDMS II silicon dark-matter detectors

    Authors: R. Agnese, A. J. Anderson, T. Aramaki, W. Baker, D. Balakishiyeva, S. Banik, D. Barker, R. Basu Thakur, D. A. Bauer, T. Binder, A. Borgland, M. A. Bowles, P. L. Brink, R. Bunker, B. Cabrera, D. O. Caldwell, R. Calkins, C. Cartaro, D. G. Cerdeno, H. Chagani, Y. -Y. Chang, Y. Chen, J. Cooley, B. Cornell, P. Cushman, et al. (84 additional authors not shown)

    Abstract: The Cryogenic Dark Matter Search (CDMS II) experiment aims to detect dark matter particles that elastically scatter from nuclei in semiconductor detectors. The resulting nuclear-recoil energy depositions are detected by ionization and phonon sensors. Neutrons produce a similar spectrum of low-energy nuclear recoils in such detectors, while most other backgrounds produce electron recoils. The absol…

    Submitted 27 July, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: 22 pages, 17 figures, 1 table, 1 appendix

  41. arXiv:1801.08428  [pdf, other]

    math.DG

    Discrete projective minimal surfaces

    Authors: A. McCarthy, W. K. Schief

    Abstract: We propose a natural discretisation scheme for classical projective minimal surfaces. We follow the classical geometric characterisation and classification of projective minimal surfaces and introduce at each step canonical discrete models of the associated geometric notions and objects. Thus, we introduce discrete analogues of classical Lie quadrics and their envelopes and classify discrete proje…

    Submitted 25 January, 2018; originally announced January 2018.

  42. arXiv:1712.03353  [pdf, other]

    stat.ML

    Variational Inference over Non-differentiable Cardiac Simulators using Bayesian Optimization

    Authors: Adam McCarthy, Blanca Rodriguez, Ana Minchole

    Abstract: Performing inference over simulators is generally intractable as their runtime means we cannot compute a marginal likelihood. We develop a likelihood-free inference method to infer parameters for a cardiac simulator, which replicates electrical flow through the heart to the body surface. We improve the fit of a state-of-the-art simulator to an electrocardiogram (ECG) recorded from a real patient.

    Submitted 9 December, 2017; originally announced December 2017.

    Comments: Workshops on Deep Learning for Physical Sciences and Machine Learning 4 Health, NIPS 2017

  43. arXiv:1612.00662  [pdf, other]

    stat.ML cs.LG

    Predicting Patient State-of-Health using Sliding Window and Recurrent Classifiers

    Authors: Adam McCarthy, Christopher K. I. Williams

    Abstract: Bedside monitors in Intensive Care Units (ICUs) frequently sound incorrectly, slowing response times and desensitising nurses to alarms (Chambrin, 2001), causing true alarms to be missed (Hug et al., 2011). We compare sliding window predictors with recurrent predictors to classify patient state-of-health from ICU multivariate time series; we report slightly improved performance for the RNN for thr…

    Submitted 2 December, 2016; originally announced December 2016.

    Comments: NIPS 2016 Workshop on Machine Learning for Health

  44. arXiv:1610.04107  [pdf, other]

    stat.AP

    Robust spectral unmixing of sparse multispectral Lidar waveforms using gamma Markov random fields

    Authors: Yoann Altmann, Aurora Maccarone, Aongus McCarthy, Gregory Newstadt, Gerald S. Buller, Steve McLaughlin, Alfred Hero

    Abstract: This paper presents a new Bayesian spectral unmixing algorithm to analyse remote scenes sensed via sparse multispectral Lidar measurements. To a first approximation, in the presence of a target, each Lidar waveform consists of a main peak, whose position depends on the target distance and whose amplitude depends on the wavelength of the laser source considered (i.e., on the target reflectivity). Be…

    Submitted 13 June, 2017; v1 submitted 13 October, 2016; originally announced October 2016.

  45. arXiv:1608.06143  [pdf, other]

    stat.CO stat.ME

    Object Depth Profile and Reflectivity Restoration from Sparse Single-Photon Data Acquired in Underwater Environments

    Authors: Abderrahim Halimi, Aurora Maccarone, Aongus McCarthy, Steve McLaughlin, Gerald S. Buller

    Abstract: This paper presents two new algorithms for the joint restoration of depth and reflectivity (DR) images constructed from time-correlated single-photon counting (TCSPC) measurements. Two extreme cases are considered: (i) a reduced acquisition time that leads to very low photon counts and (ii) a highly attenuating environment (such as a turbid medium) which makes the reflectivity estimation more diff…

    Submitted 22 August, 2016; originally announced August 2016.

  46. arXiv:1601.06149  [pdf, other]

    physics.ins-det stat.AP

    Robust Bayesian target detection algorithm for depth imaging from sparse single-photon data

    Authors: Yoann Altmann, Ximing Ren, Aongus McCarthy, Gerald S. Buller, Steve McLaughlin

    Abstract: This paper presents a new Bayesian model and associated algorithm for depth and intensity profiling using full waveforms from time-correlated single-photon counting (TCSPC) measurements in the limit of very low photon counts (i.e., typically less than 20 photons per pixel). The model represents each Lidar waveform as an unknown constant background level, which is combined in the presence of a targ…

    Submitted 13 October, 2016; v1 submitted 21 January, 2016; originally announced January 2016.

    Comments: arXiv admin note: text overlap with arXiv:1507.02511

  47. Lidar waveform based analysis of depth images constructed using sparse single-photon data

    Authors: Yoann Altmann, Ximing Ren, Aongus McCarthy, Gerald S. Buller, Steve McLaughlin

    Abstract: This paper presents a new Bayesian model and algorithm used for depth and intensity profiling using full waveforms from the time-correlated single photon counting (TCSPC) measurement in the limit of very low photon counts. The model proposed represents each Lidar waveform as a combination of a known impulse response, weighted by the target intensity, and an unknown constant background, corrupted b…

    Submitted 9 July, 2015; originally announced July 2015.

  48. arXiv:1504.05871  [pdf, other]

    hep-ex astro-ph.CO

    Improved WIMP-search reach of the CDMS II germanium data

    Authors: R. Agnese, A. J. Anderson, M. Asai, D. Balakishiyeva, D. Barker, R. Basu Thakur, D. A. Bauer, J. Billard, A. Borgland, M. A. Bowles, D. Brandt, P. L. Brink, R. Bunker, B. Cabrera, D. O. Caldwell, R. Calkins, D. G. Cerdeño, H. Chagani, Y. Chen, J. Cooley, B. Cornell, C. H. Crewdson, P. Cushman, M. Daal, P. C. F. Di Stefano, et al. (64 additional authors not shown)

    Abstract: CDMS II data from the 5-tower runs at the Soudan Underground Laboratory were reprocessed with an improved charge-pulse fitting algorithm. Two new analysis techniques to reject surface-event backgrounds were applied to the 612 kg days germanium-detector WIMP-search exposure. An extended analysis was also completed by decreasing the 10 keV analysis threshold to $\sim$5 keV, to increase sensitivity n…

    Submitted 13 October, 2015; v1 submitted 22 April, 2015; originally announced April 2015.

    Comments: 25 pages, 15 figures, slightly updated organization and text consistent with PRD referee process, Fig. 14 updated

    Report number: IPPP/15/24, DCTP/15/48

    Journal ref: Phys. Rev. D 92, 072003 (2015)

  49. Characterization of the infectious reservoir of malaria with an agent-based model calibrated to age-stratified parasite densities and infectiousness

    Authors: Jaline Gerardin, Andre Lin Ouedraogo, Kevin A. McCarthy, Bocar Kouyate, Philip A. Eckhoff, Edward A. Wenger

    Abstract: Background: Elimination of malaria can only be achieved through removal of all vectors or complete depletion of the infectious reservoir in humans. Mechanistic models can be built to synthesize diverse observations from the field collected under a variety of conditions and subsequently used to query the infectious reservoir in great detail. Methods: The EMOD model of malaria transmission was calibra…

    Submitted 31 March, 2015; originally announced March 2015.

    Comments: submitted to Malaria Journal on March 31, 2015

    Journal ref: Malaria Journal 2015, 14:231

  50. arXiv:1503.03379  [pdf, other]

    astro-ph.CO hep-ex hep-ph

    Dark matter effective field theory scattering in direct detection experiments

    Authors: K. Schneck, B. Cabrera, D. G. Cerdeno, V. Mandic, H. E. Rogers, R. Agnese, A. J. Anderson, M. Asai, D. Balakishiyeva, D. Barker, R. Basu Thakur, D. A. Bauer, J. Billard, A. Borgland, D. Brandt, P. L. Brink, R. Bunker, D. O. Caldwell, R. Calkins, H. Chagani, Y. Chen, J. Cooley, B. Cornell, C. H. Crewdson, P. Cushman, et al. (62 additional authors not shown)

    Abstract: We examine the consequences of the effective field theory (EFT) of dark matter-nucleon scattering for current and proposed direct detection experiments. Exclusion limits on EFT coupling constants computed using the optimum interval method are presented for SuperCDMS Soudan, CDMS II, and LUX, and the necessity of combining results from multiple experiments in order to determine dark matter paramete…

    Submitted 16 August, 2016; v1 submitted 11 March, 2015; originally announced March 2015.

    Comments: Newest version includes erratum

    Journal ref: Phys. Rev. D 91, 092004 (2015)