Search | arXiv e-print repository

Population-based wind farm monitoring based on a spatial autoregressive approach

Abstract: An important challenge faced by wind farm operators is to reduce operation and maintenance cost. Structural health monitoring provides a means of cost reduction through minimising unnecessary maintenance trips as well as prolonging turbine service life. Population-based structural health monitoring can further reduce the cost of health monitoring systems by implementing one system for multiple str… ▽ More An important challenge faced by wind farm operators is to reduce operation and maintenance cost. Structural health monitoring provides a means of cost reduction through minimising unnecessary maintenance trips as well as prolonging turbine service life. Population-based structural health monitoring can further reduce the cost of health monitoring systems by implementing one system for multiple structures (i.e.~turbines). At the same time, shared data within a population of structures may improve the predictions of structural behaviour. To monitor turbine performance at a population/farm level, an important initial step is to construct a model that describes the behaviour of all turbines under normal conditions. This paper proposes a population-level model that explicitly captures the spatial and temporal correlations (between turbines) induced by the wake effect. The proposed model is a Gaussian process-based spatial autoregressive model, named here a GP-SPARX model. This approach is developed since (a) it reflects our physical understanding of the wake effect, and (b) it benefits from a stochastic data-based learner. A case study is provided to demonstrate the capability of the GP-SPARX model in capturing spatial and temporal variations as well as its potential applicability in a health monitoring system. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures, submitted to the Modern Practice in Stress and Vibration Analysis (MPSVA) 2022 Conference Proceedings

arXiv:2310.05807 [pdf, other]

Sharing Information Between Machine Tools to Improve Surface Finish Forecasting

Authors: Daniel R. Clarkson, Lawrence A. Bull, Tina A. Dardeno, Chandula T. Wickramarachchi, Elizabeth J. Cross, Timothy J. Rogers, Keith Worden, Nikolaos Dervilis, Aidan J. Hughes

Abstract: At present, most surface-quality prediction methods can only perform single-task prediction which results in under-utilised datasets, repetitive work and increased experimental costs. To counter this, the authors propose a Bayesian hierarchical model to predict surface-roughness measurements for a turning machining process. The hierarchical model is compared to multiple independent Bayesian linear… ▽ More At present, most surface-quality prediction methods can only perform single-task prediction which results in under-utilised datasets, repetitive work and increased experimental costs. To counter this, the authors propose a Bayesian hierarchical model to predict surface-roughness measurements for a turning machining process. The hierarchical model is compared to multiple independent Bayesian linear regression models to showcase the benefits of partial pooling in a machining setting with respect to prediction accuracy and uncertainty quantification. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Submitted to International Workshop on Structural Health Monitoring 2023, Stanford University, California, USA

arXiv:2309.10656 [pdf, other]

A spectrum of physics-informed Gaussian processes for regression in engineering

Authors: Elizabeth J Cross, Timothy J Rogers, Daniel J Pitchforth, Samuel J Gibson, Matthew R Jones

Abstract: Despite the growing availability of sensing and data in general, we remain unable to fully characterise many in-service engineering systems and structures from a purely data-driven approach. The vast data and resources available to capture human activity are unmatched in our engineered world, and, even in cases where data could be referred to as ``big,'' they will rarely hold information across op… ▽ More Despite the growing availability of sensing and data in general, we remain unable to fully characterise many in-service engineering systems and structures from a purely data-driven approach. The vast data and resources available to capture human activity are unmatched in our engineered world, and, even in cases where data could be referred to as ``big,'' they will rarely hold information across operational windows or life spans. This paper pursues the combination of machine learning technology and physics-based reasoning to enhance our ability to make predictive models with limited data. By explicitly linking the physics-based view of stochastic processes with a data-based regression approach, a spectrum of possible Gaussian process models are introduced that enable the incorporation of different levels of expert knowledge of a system. Examples illustrate how these approaches can significantly reduce reliance on data collection whilst also increasing the interpretability of the model, another important consideration in this context. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2305.08657 [pdf, other]

Encoding Domain Expertise into Multilevel Models for Source Location

Authors: Lawrence A. Bull, Matthew R. Jones, Elizabeth J. Cross, Andrew Duncan, Mark Girolami

Abstract: Data from populations of systems are prevalent in many industrial applications. Machines and infrastructure are increasingly instrumented with sensing systems, emitting streams of telemetry data with complex interdependencies. In practice, data-centric monitoring procedures tend to consider these assets (and respective models) as distinct -- operating in isolation and associated with independent d… ▽ More Data from populations of systems are prevalent in many industrial applications. Machines and infrastructure are increasingly instrumented with sensing systems, emitting streams of telemetry data with complex interdependencies. In practice, data-centric monitoring procedures tend to consider these assets (and respective models) as distinct -- operating in isolation and associated with independent data. In contrast, this work captures the statistical correlations and interdependencies between models of a group of systems. Utilising a Bayesian multilevel approach, the value of data can be extended, since the population can be considered as a whole, rather than constituent parts. Most interestingly, domain expertise and knowledge of the underlying physics can be encoded in the model at the system, subgroup, or population level. We present an example of acoustic emission (time-of-arrival) map** for source location, to illustrate how multilevel models naturally lend themselves to representing aggregate systems in engineering. In particular, we focus on constraining the combined models with domain knowledge to enhance transfer learning and enable further insights at the population level. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.13385 [pdf, other]

doi 10.1016/j.media.2023.102807

Low-field magnetic resonance image enhancement via stochastic image quality transfer

Authors: Hongxiang Lin, Matteo Figini, Felice D'Arco, Godwin Ogbole, Ryutaro Tanno, Stefano B. Blumberg, Lisa Ronan, Biobele J. Brown, David W. Carmichael, Ikeoluwa Lagunju, Judith Helen Cross, Delmiro Fernandez-Reyes, Daniel C. Alexander

Abstract: Low-field (<1T) magnetic resonance imaging (MRI) scanners remain in widespread use in low- and middle-income countries (LMICs) and are commonly used for some applications in higher income countries e.g. for small child patients with obesity, claustrophobia, implants, or tattoos. However, low-field MR images commonly have lower resolution and poorer contrast than images from high field (1.5T, 3T, a… ▽ More Low-field (<1T) magnetic resonance imaging (MRI) scanners remain in widespread use in low- and middle-income countries (LMICs) and are commonly used for some applications in higher income countries e.g. for small child patients with obesity, claustrophobia, implants, or tattoos. However, low-field MR images commonly have lower resolution and poorer contrast than images from high field (1.5T, 3T, and above). Here, we present Image Quality Transfer (IQT) to enhance low-field structural MRI by estimating from a low-field image the image we would have obtained from the same subject at high field. Our approach uses (i) a stochastic low-field image simulator as the forward model to capture uncertainty and variation in the contrast of low-field images corresponding to a particular high-field image, and (ii) an anisotropic U-Net variant specifically designed for the IQT inverse problem. We evaluate the proposed algorithm both in simulation and using multi-contrast (T1-weighted, T2-weighted, and fluid attenuated inversion recovery (FLAIR)) clinical low-field MRI data from an LMIC hospital. We show the efficacy of IQT in improving contrast and resolution of low-field MR images. We demonstrate that IQT-enhanced images have potential for enhancing visualisation of anatomical structures and pathological lesions of clinical relevance from the perspective of radiologists. IQT is proved to have capability of boosting the diagnostic value of low-field MRI, especially in low-resource settings. △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted in Medical Image Analysis

arXiv:2302.03528 [pdf, other]

Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

Authors: Simeng Sun, Maha Elbayad, Anna Sun, James Cross

Abstract: With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making a… ▽ More With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making architectural changes to provide capacity for both old and new languages has also not been closely studied. In this work, we introduce three techniques that help speed up effective learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) performing data up-sampling, it is possible to exceed the performance of a same-sized baseline model with 30% computation and recover the performance of a larger model trained from scratch with over 50% reduction in computation. Furthermore, our analysis reveals that the introduced techniques help learn the new directions more effectively and alleviate catastrophic forgetting at the same time. We hope our work will guide research into more efficient approaches to growing languages for these MMT models and ultimately maximize the reuse of existing models. △ Less

Submitted 7 February, 2023; originally announced February 2023.

Comments: Accepted to EACL 2023 (Main)

arXiv:2207.04672 [pdf]

No Language Left Behind: Scaling Human-Centered Machine Translation

Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran , et al. (14 additional authors not shown)

Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality res… ▽ More Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while kee** ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open source all contributions described in this work, accessible at https://github.com/facebookresearch/fairseq/tree/nllb. △ Less

Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: 190 pages

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2206.15303 [pdf, other]

doi 10.1007/978-3-030-81716-9_17

Physics-informed machine learning for Structural Health Monitoring

Authors: Elizabeth J Cross, Samuel J Gibson, Matthew R Jones, Daniel J Pitchforth, Sikai Zhang, Timothy J Rogers

Abstract: The use of machine learning in Structural Health Monitoring is becoming more common, as many of the inherent tasks (such as regression and classification) in develo** condition-based assessment fall naturally into its remit. This chapter introduces the concept of physics-informed machine learning, where one adapts ML algorithms to account for the physical insight an engineer will often have of t… ▽ More The use of machine learning in Structural Health Monitoring is becoming more common, as many of the inherent tasks (such as regression and classification) in develo** condition-based assessment fall naturally into its remit. This chapter introduces the concept of physics-informed machine learning, where one adapts ML algorithms to account for the physical insight an engineer will often have of the structure they are attempting to model or assess. The chapter will demonstrate how grey-box models, that combine simple physics-based models with data-driven ones, can improve predictive capability in an SHM setting. A particular strength of the approach demonstrated here is the capacity of the models to generalise, with enhanced predictive capability in different regimes. This is a key issue when life-time assessment is a requirement, or when monitoring data do not span the operational conditions a structure will undergo. The chapter will provide an overview of physics-informed ML, introducing a number of new approaches for grey-box modelling in a Bayesian setting. The main ML tool discussed will be Gaussian process regression, we will demonstrate how physical assumptions/models can be incorporated through constraints, through the mean function and kernel design, and finally in a state-space setting. A range of SHM applications will be demonstrated, from loads monitoring tasks for off-shore and aerospace structures, through to performance monitoring for long-span bridges. △ Less

Submitted 30 June, 2022; originally announced June 2022.

arXiv:2206.02079 [pdf, other]

Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders

Authors: Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li

Abstract: Recent work in multilingual translation advances translation quality surpassing bilingual baselines using deep transformer models with increased capacity. However, the extra latency and memory costs introduced by this approach may make it unacceptable for efficiency-constrained applications. It has recently been shown for bilingual translation that using a deep encoder and shallow decoder (DESD) c… ▽ More Recent work in multilingual translation advances translation quality surpassing bilingual baselines using deep transformer models with increased capacity. However, the extra latency and memory costs introduced by this approach may make it unacceptable for efficiency-constrained applications. It has recently been shown for bilingual translation that using a deep encoder and shallow decoder (DESD) can reduce inference latency while maintaining translation quality, so we study similar speed-accuracy trade-offs for multilingual translation. We find that for many-to-one translation we can indeed increase decoder speed without sacrificing quality using this approach, but for one-to-many translation, shallow decoders cause a clear quality drop. To ameliorate this drop, we propose a deep encoder with multiple shallow decoders (DEMSD) where each shallow decoder is responsible for a disjoint subset of target languages. Specifically, the DEMSD model with 2-layer decoders is able to obtain a 1.8x speedup on average compared to a standard transformer model with no drop in translation quality. △ Less

Submitted 4 June, 2022; originally announced June 2022.

Comments: EACL 2021

arXiv:2206.01495 [pdf, other]

doi 10.1016/j.ymssp.2022.109984

Constraining Gaussian processes for physics-informed acoustic emission map**

Authors: Matthew R Jones, Timothy J Rogers, Elizabeth J Cross

Abstract: The automated localisation of damage in structures is a challenging but critical ingredient in the path towards predictive or condition-based maintenance of high value structures. The use of acoustic emission time of arrival map** is a promising approach to this challenge, but is severely hindered by the need to collect a dense set of artificial acoustic emission measurements across the structur… ▽ More The automated localisation of damage in structures is a challenging but critical ingredient in the path towards predictive or condition-based maintenance of high value structures. The use of acoustic emission time of arrival map** is a promising approach to this challenge, but is severely hindered by the need to collect a dense set of artificial acoustic emission measurements across the structure, resulting in a lengthy and often impractical data acquisition process. In this paper, we consider the use of physics-informed Gaussian processes for learning these maps to alleviate this problem. In the approach, the Gaussian process is constrained to the physical domain such that information relating to the geometry and boundary conditions of the structure are embedded directly into the learning process, returning a model that guarantees that any predictions made satisfy physically-consistent behaviour at the boundary. A number of scenarios that arise when training measurement acquisition is limited, including where training data are sparse, and also of limited coverage over the structure of interest. Using a complex plate-like structure as an experimental case study, we show that our approach significantly reduces the burden of data collection, where it is seen that incorporation of boundary condition knowledge significantly improves predictive accuracy as training observations are reduced, particularly when training measurements are not available across all parts of the structure. △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2205.10835 [pdf, other]

Multilingual Machine Translation with Hyper-Adapters

Authors: Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale

Abstract: Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules like adapters. However, adapters of related languages are unable to transfer information, and their total number of parameters becomes prohibitively expensive as the number of languages grows. In this work, we overcome these drawbacks… ▽ More Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules like adapters. However, adapters of related languages are unable to transfer information, and their total number of parameters becomes prohibitively expensive as the number of languages grows. In this work, we overcome these drawbacks using hyper-adapters -- hyper-networks that generate adapters from language and layer embeddings. While past work had poor results when scaling hyper-networks, we propose a rescaling fix that significantly improves convergence and enables training larger hyper-networks. We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times less parameters. When using the same number of parameters and FLOPS, our approach consistently outperforms regular adapters. Also, hyper-adapters converge faster than alternative approaches and scale better than regular dense networks. Our analysis shows that hyper-adapters learn to encode language relatedness, enabling positive transfer across languages. △ Less

Submitted 5 December, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

Comments: EMNLP 2022 camera-ready version. Code at github.com/cbaziotis/fairseq under the "hyperadapters" branch (see instructions at https://github.com/cbaziotis/fairseq/tree/hyperadapters/examples/adapters)

arXiv:2205.06266 [pdf, other]

Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Authors: Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

Abstract: Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while kee** the total number of trainable parameters per language constant. In contrast with prior work that learn… ▽ More Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while kee** the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: NAACL 2022

arXiv:2204.14268 [pdf, other]

How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

Authors: Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman

Abstract: A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In… ▽ More A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In this work, we analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus. We find that while relatively better performance is often observed when languages are more equally sampled, the downstream performance is more robust to language imbalance than we usually expected. Two features, UNK rate and closeness to the character level, can warn of poor downstream performance before performing the task. We also distinguish language sampling for tokenizer training from sampling for model training and show that the model is more sensitive to the latter. △ Less

Submitted 10 September, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: AMTA 2022

arXiv:2203.13867 [pdf, other]

Data Selection Curriculum for Neural Machine Translation

Authors: Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, Shafiq Joty

Abstract: Neural Machine Translation (NMT) models are typically trained on heterogeneous data that are concatenated and randomly shuffled. However, not all of the training data are equally useful to the model. Curriculum training aims to present the data to the NMT models in a meaningful order. In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model o… ▽ More Neural Machine Translation (NMT) models are typically trained on heterogeneous data that are concatenated and randomly shuffled. However, not all of the training data are equally useful to the model. Curriculum training aims to present the data to the NMT models in a meaningful order. In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers prediction scores of the emerging NMT model. Through comprehensive experiments on six language pairs comprising low- and high-resource languages from WMT'21, we have shown that our curriculum strategies consistently demonstrate better quality (up to +2.2 BLEU improvement) and faster convergence (approximately 50% fewer updates). △ Less

Submitted 25 March, 2022; originally announced March 2022.

arXiv:2111.15496 [pdf, other]

doi 10.1016/j.ymssp.2021.108530

Bayesian Modelling of Multivalued Power Curves from an Operational Wind Farm

Authors: L. A. Bull, P. A. Gardner, T. J. Rogers, N. Dervilis, E. J. Cross, E. Papatheou, A. E. Maguire, C. Campos, K. Worden

Abstract: Power curves capture the relationship between wind speed and output power for a specific wind turbine. Accurate regression models of this function prove useful in monitoring, maintenance, design, and planning. In practice, however, the measurements do not always correspond to the ideal curve: power curtailments will appear as (additional) functional components. Such multivalued relationships canno… ▽ More Power curves capture the relationship between wind speed and output power for a specific wind turbine. Accurate regression models of this function prove useful in monitoring, maintenance, design, and planning. In practice, however, the measurements do not always correspond to the ideal curve: power curtailments will appear as (additional) functional components. Such multivalued relationships cannot be modelled by conventional regression, and the associated data are usually removed during pre-processing. The current work suggests an alternative method to infer multivalued relationships in curtailed power data. Using a population-based approach, an overlap** mixture of probabilistic regression models is applied to signals recorded from turbines within an operational wind farm. The model is shown to provide an accurate representation of practical power data across the population. △ Less

Submitted 30 November, 2021; originally announced November 2021.

Journal ref: Mechanical Systems and Signal Processing (2021): 108530

arXiv:2110.08246 [pdf, ps, other]

Tricks for Training Sparse Translation Models

Authors: Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan

Abstract: Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks. Sparse scaling architectures, such as BASELayers, provide flexible mechanisms for different tasks to have a variable number of parameters, which can be useful to counterbalance skewed data distributions. We find that t… ▽ More Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks. Sparse scaling architectures, such as BASELayers, provide flexible mechanisms for different tasks to have a variable number of parameters, which can be useful to counterbalance skewed data distributions. We find that that sparse architectures for multilingual machine translation can perform poorly out of the box, and propose two straightforward techniques to mitigate this - a temperature heating mechanism and dense pre-training. Overall, these methods improve performance on two multilingual translation benchmarks compared to standard BASELayers and Dense scaling baselines, and in combination, more than 2x model convergence speed. △ Less

Submitted 15 October, 2021; originally announced October 2021.

arXiv:2110.07804 [pdf, other]

Alternative Input Signals Ease Transfer in Multilingual Machine Translation

Authors: Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzman

Abstract: Recent work in multilingual machine translation (MMT) has focused on the potential of positive transfer between languages, particularly cases where higher-resourced languages can benefit lower-resourced ones. While training an MMT model, the supervision signals learned from one language pair can be transferred to the other via the tokens shared by multiple source languages. However, the transfer i… ▽ More Recent work in multilingual machine translation (MMT) has focused on the potential of positive transfer between languages, particularly cases where higher-resourced languages can benefit lower-resourced ones. While training an MMT model, the supervision signals learned from one language pair can be transferred to the other via the tokens shared by multiple source languages. However, the transfer is inhibited when the token overlap among source languages is small, which manifests naturally when languages use different writing systems. In this paper, we tackle inhibited transfer by augmenting the training data with alternative signals that unify different writing systems, such as phonetic, romanized, and transliterated input. We test these signals on Indic and Turkic languages, two language families where the writing systems differ but languages still share common features. Our results indicate that a straightforward multi-source self-ensemble -- training a model on a mixture of various signals and ensembling the outputs of the same model fed with different signals during inference, outperforms strong ensemble baselines by 1.3 BLEU points on both language families. Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when training set is small, leading to +5 BLEU when only 5% of the total training data is accessible. Finally, our analysis demonstrates that including alternative signals yields more consistency and translates named entities more accurately, which is crucial for increased factuality of automated systems. △ Less

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2109.08627 [pdf, other]

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

Authors: Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia

Abstract: Sentence-level Quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels. Recent QE models have achieved previously-unseen levels of correlation with human judgments, but they rely on large multilingual contextualized language models that are computationally expens… ▽ More Sentence-level Quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels. Recent QE models have achieved previously-unseen levels of correlation with human judgments, but they rely on large multilingual contextualized language models that are computationally expensive and make them infeasible for real-world applications. In this work, we evaluate several model compression techniques for QE and find that, despite their popularity in other NLP tasks, they lead to poor performance in this regression setting. We observe that a full model parameterization is required to achieve SoTA results in a regression task. However, we argue that the level of expressiveness of a model in a continuous range is unnecessary given the downstream applications of QE, and show that reframing QE as a classification problem and evaluating QE models using classification metrics would better reflect their actual performance in real-world applications. △ Less

Submitted 17 September, 2021; originally announced September 2021.

Comments: EMNLP 2021

arXiv:2108.03265 [pdf, other]

Facebook AI WMT21 News Translation Task Submission

Authors: Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan

Abstract: We describe Facebook's multilingual model submission to the WMT2021 shared task on news translation. We participate in 14 language directions: English to and from Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese. To develop systems covering all these directions, we focus on multilingual models. We utilize data from all available sources --- WMT, large-scale data mining, and in-domai… ▽ More We describe Facebook's multilingual model submission to the WMT2021 shared task on news translation. We participate in 14 language directions: English to and from Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese. To develop systems covering all these directions, we focus on multilingual models. We utilize data from all available sources --- WMT, large-scale data mining, and in-domain backtranslation --- to create high quality bilingual and multilingual baselines. Subsequently, we investigate strategies for scaling multilingual model size, such that one system has sufficient capacity for high quality representations of all eight languages. Our final submission is an ensemble of dense and sparse Mixture-of-Expert multilingual translation models, followed by finetuning on in-domain news data and noisy channel reranking. Compared to previous year's winning submissions, our multilingual system improved the translation quality on all language directions, with an average improvement of 2.0 BLEU. In the WMT2021 task, our system ranks first in 10 directions based on automatic evaluation. △ Less

Submitted 6 August, 2021; originally announced August 2021.

arXiv:2106.11891 [pdf, other]

On the Evaluation of Machine Translation for Terminology Consistency

Authors: Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina

Abstract: As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies. In many scenarios and particularly in cases of domain adaptation, one expects the MT output to adhere to the constraints provided by a terminology. In this work, we propose metrics to measure the consistency of MT output with… ▽ More As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies. In many scenarios and particularly in cases of domain adaptation, one expects the MT output to adhere to the constraints provided by a terminology. In this work, we propose metrics to measure the consistency of MT output with regards to a domain terminology. We perform studies on the COVID-19 domain over 5 languages, also performing terminology-targeted human evaluation. We open-source the code for computing all proposed metrics: https://github.com/mahfuzibnalam/terminology_evaluation △ Less

Submitted 24 June, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: preprint

arXiv:2106.08247 [pdf, ps, other]

Canonical-Correlation-Based Fast Feature Selection

Authors: Sikai Zhang, Tingna Wang, Keith Worden, Elizabeth J. Cross

Abstract: This paper proposes a canonical-correlation-based filter method for feature selection. The sum of squared canonical correlation coefficients is adopted as the feature ranking criterion. The proposed method boosts the computational speed of the ranking criterion in greedy search. The supporting theorems developed for the feature selection method are fundamental to the understanding of the canonical… ▽ More This paper proposes a canonical-correlation-based filter method for feature selection. The sum of squared canonical correlation coefficients is adopted as the feature ranking criterion. The proposed method boosts the computational speed of the ranking criterion in greedy search. The supporting theorems developed for the feature selection method are fundamental to the understanding of the canonical correlation analysis. In empirical studies, a synthetic dataset is used to demonstrate the speed advantage of the proposed method, and eight real datasets are applied to show the effectiveness of the proposed feature ranking criterion in both classification and regression. The results show that the proposed method is considerably faster than the definition-based method, and the proposed ranking criterion is competitive compared with the seven mutual-information-based criteria. △ Less

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2105.13813 [pdf, other]

doi 10.1016/j.ymssp.2021.107741

Grey-box models for wave loading prediction

Authors: Daniel J Pitchforth, Timothy J Rogers, Ulf T Tygesen, Elizabeth J Cross

Abstract: The quantification of wave loading on offshore structures and components is a crucial element in the assessment of their useful remaining life. In many applications the well-known Morison's equation is employed to estimate the forcing from waves with assumed particle velocities and accelerations. This paper develops a grey-box modelling approach to improve the predictions of the force on structura… ▽ More The quantification of wave loading on offshore structures and components is a crucial element in the assessment of their useful remaining life. In many applications the well-known Morison's equation is employed to estimate the forcing from waves with assumed particle velocities and accelerations. This paper develops a grey-box modelling approach to improve the predictions of the force on structural members. A grey-box model intends to exploit the enhanced predictive capabilities of data-based modelling whilst retaining physical insight into the behaviour of the system; in the context of the work carried out here, this can be considered as physics-informed machine learning. There are a number of possible approaches to establish a grey-box model. This paper demonstrates two means of combining physics (white box) and data-based (black box) components; one where the model is a simple summation of the two components, the second where the white-box prediction is fed into the black box as an additional input. Here Morison's equation is used as the physics-based component in combination with a data-based Gaussian process NARX - a dynamic variant of the more well-known Gaussian process regression. Two key challenges with employing the GP-NARX formulation that are addressed here are the selection of appropriate lag terms and the proper treatment of uncertainty propagation within the dynamic GP. The best performing grey-box model, the residual modelling GP-NARX, was able to achieve a 29.13\% and 5.48\% relative reduction in NMSE over Morison's Equation and a black-box GP-NARX respectively, alongside significant benefits in extrapolative capabilities of the model, in circumstances of low dataset coverage. △ Less

Submitted 30 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Journal ref: Mechanical Systems and Signal Processing, Volume 159, 2021

arXiv:2104.08597 [pdf, other]

XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

Authors: Ahmed El-Kishky, Adithya Renduchintala, James Cross, Francisco Guzmán, Philipp Koehn

Abstract: Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align)… ▽ More Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data. We demonstrate LSP-Align outperforms baselines at extracting cross-lingual entity pairs and mine 164 million entity pairs from 120 different languages aligned with English. We release these cross-lingual entity pairs along with the massively multilingual tagged named entity corpus as a resource to the NLP community. △ Less

Submitted 10 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

arXiv:2103.01676 [pdf, other]

doi 10.1061/AJRUA6.0001106

Probabilistic Inference for Structural Health Monitoring: New Modes of Learning from Data

Authors: Lawrence A. Bull, Paul Gardner, Timothy J. Rogers, Elizabeth J. Cross, Nikolaos Dervilis, Keith Worden

Abstract: In data-driven SHM, the signals recorded from systems in operation can be noisy and incomplete. Data corresponding to each of the operational, environmental, and damage states are rarely available a priori; furthermore, labelling to describe the measurements is often unavailable. In consequence, the algorithms used to implement SHM should be robust and adaptive, while accommodating for missing inf… ▽ More In data-driven SHM, the signals recorded from systems in operation can be noisy and incomplete. Data corresponding to each of the operational, environmental, and damage states are rarely available a priori; furthermore, labelling to describe the measurements is often unavailable. In consequence, the algorithms used to implement SHM should be robust and adaptive, while accommodating for missing information in the training-data -- such that new information can be included if it becomes available. By reviewing novel techniques for statistical learning (introduced in previous work), it is argued that probabilistic algorithms offer a natural solution to the modelling of SHM data in practice. In three case-studies, probabilistic methods are adapted for applications to SHM signals -- including semi-supervised learning, active learning, and multi-task learning. △ Less

Submitted 2 March, 2021; originally announced March 2021.

Comments: This material may be downloaded for personal use only. Any other use requires prior permission of the American Society of Civil Engineers. This material may be found at https://doi.org/10.1061/AJRUA6.0001106

Journal ref: ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering 7.1 (2021): 03120003

arXiv:2101.11711 [pdf, other]

doi 10.1016/j.renene.2020.12.119

Damage detection in operational wind turbine blades using a new approach based on machine learning

Authors: Kartik Chandrasekhar, Nevena Stevanovic, Elizabeth J. Cross, Nikolaos Dervilis, Keith Worden

Abstract: The application of reliable structural health monitoring (SHM) technologies to operational wind turbine blades is a challenging task, due to the uncertain nature of the environments they operate in. In this paper, a novel SHM methodology, which uses Gaussian Processes (GPs) is proposed. The methodology takes advantage of the fact that the blades on a turbine are nominally identical in structural p… ▽ More The application of reliable structural health monitoring (SHM) technologies to operational wind turbine blades is a challenging task, due to the uncertain nature of the environments they operate in. In this paper, a novel SHM methodology, which uses Gaussian Processes (GPs) is proposed. The methodology takes advantage of the fact that the blades on a turbine are nominally identical in structural properties and encounter the same environmental and operational variables (EOVs). The properties of interest are the first edgewise frequencies of the blades. The GPs are used to predict the edge frequencies of one blade given that of another, after these relationships between the pairs of blades have been learned when the blades are in a healthy state. In using this approach, the proposed SHM methodology is able to identify when the blades start behaving differently from one another over time. To validate the concept, the proposed SHM system is applied to real onshore wind turbine blade data, where some form of damage was known to have taken place. X-bar control chart analysis of the residual errors between the GP predictions and actual frequencies show that the system successfully identified early onset of damage as early as six months before it was identified and remedied. △ Less

Submitted 25 January, 2021; originally announced January 2021.

Journal ref: This is an author produced version of a paper subsequently published in Renewable Energy, Elsevier, 2021. Uploaded in accordance with the publisher's self-archiving policy

arXiv:2101.01506 [pdf, other]

doi 10.1016/j.ymssp.2021.107628

Structured Machine Learning Tools for Modelling Characteristics of Guided Waves

Authors: Marcus Haywood-Alexander, Nikolaos Dervilis, Keith Worden, Elizabeth J. Cross, Robin S. Mills, Timothy J. Rogers

Abstract: The use of ultrasonic guided waves to probe the materials/structures for damage continues to increase in popularity for non-destructive evaluation (NDE) and structural health monitoring (SHM). The use of high-frequency waves such as these offers an advantage over low-frequency methods from their ability to detect damage on a smaller scale. However, in order to assess damage in a structure, and imp… ▽ More The use of ultrasonic guided waves to probe the materials/structures for damage continues to increase in popularity for non-destructive evaluation (NDE) and structural health monitoring (SHM). The use of high-frequency waves such as these offers an advantage over low-frequency methods from their ability to detect damage on a smaller scale. However, in order to assess damage in a structure, and implement any NDE or SHM tool, knowledge of the behaviour of a guided wave throughout the material/structure is important (especially when designing sensor placement for SHM systems). Determining this behaviour is extremely diffcult in complex materials, such as fibre-matrix composites, where unique phenomena such as continuous mode conversion takes place. This paper introduces a novel method for modelling the feature-space of guided waves in a composite material. This technique is based on a data-driven model, where prior physical knowledge can be used to create structured machine learning tools; where constraints are applied to provide said structure. The method shown makes use of Gaussian processes, a full Bayesian analysis tool, and in this paper it is shown how physical knowledge of the guided waves can be utilised in modelling using an ML tool. This paper shows that through careful consideration when applying machine learning techniques, more robust models can be generated which offer advantages such as extrapolation ability and physical interpretation. △ Less

Submitted 5 January, 2021; originally announced January 2021.

Comments: 33 pages, 11 figures

arXiv:2012.15127 [pdf, other]

Improving Zero-Shot Translation by Disentangling Positional Information

Authors: Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li

Abstract: Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. W… ▽ More Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. By thorough inspections of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations. △ Less

Submitted 30 June, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

Comments: ACL 2021

arXiv:2012.11058 [pdf, other]

A Bayesian methodology for localising acoustic emission sources in complex structures

Authors: Matthew R. Jones, Tim J. Rogers, Keith Worden, Elizabeth J. Cross

Abstract: In the field of structural health monitoring (SHM), the acquisition of acoustic emissions to localise damage sources has emerged as a popular approach. Despite recent advances, the task of locating damage within composite materials and structures that contain non-trivial geometrical features, still poses a significant challenge. Within this paper, a Bayesian source localisation strategy that is ro… ▽ More In the field of structural health monitoring (SHM), the acquisition of acoustic emissions to localise damage sources has emerged as a popular approach. Despite recent advances, the task of locating damage within composite materials and structures that contain non-trivial geometrical features, still poses a significant challenge. Within this paper, a Bayesian source localisation strategy that is robust to these complexities is presented. Under this new framework, a Gaussian process is first used to learn the relationship between source locations and the corresponding difference-in-time-of-arrival values for a number of sensor pairings. As an acoustic emission event with an unknown origin is observed, a map** is then generated that quantifies the likelihood of the emission location across the surface of the structure. The new probabilistic map** offers multiple benefits, leading to a localisation strategy that is more informative than deterministic predictions or single-point estimates with an associated confidence bound. The performance of the approach is investigated on a structure with numerous complex geometrical features and demonstrates a favourable performance in comparison to other similar localisation methods. △ Less

Submitted 20 December, 2020; originally announced December 2020.

Comments: 17 pages, 7 figures

arXiv:2008.10077 [pdf, ps, other]

Learn to Talk via Proactive Knowledge Transfer

Authors: Qing Sun, James Cross

Abstract: Knowledge Transfer has been applied in solving a wide variety of problems. For example, knowledge can be transferred between tasks (e.g., learning to handle novel situations by leveraging prior knowledge) or between agents (e.g., learning from others without direct experience). Without loss of generality, we relate knowledge transfer to KL-divergence minimization, i.e., matching the (belief) distr… ▽ More Knowledge Transfer has been applied in solving a wide variety of problems. For example, knowledge can be transferred between tasks (e.g., learning to handle novel situations by leveraging prior knowledge) or between agents (e.g., learning from others without direct experience). Without loss of generality, we relate knowledge transfer to KL-divergence minimization, i.e., matching the (belief) distributions of learners and teachers. The equivalence gives us a new perspective in understanding variants of the KL-divergence by looking at how learners structure their interaction with teachers in order to acquire knowledge. In this paper, we provide an in-depth analysis of KL-divergence minimization in Forward and Backward orders, which shows that learners are reinforced via on-policy learning in Backward. In contrast, learners are supervised in Forward. Moreover, our analysis is gradient-based, so it can be generalized to arbitrary tasks and help to decide which order to minimize given the property of the task. By replacing Forward with Backward in Knowledge Distillation, we observed +0.7-1.1 BLEU gains on the WMT'17 De-En and IWSLT'15 Th-En machine translation tasks. △ Less

Submitted 23 August, 2020; originally announced August 2020.

arXiv:2006.10369 [pdf, other]

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Authors: Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith

Abstract: Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where generation is sequential, the former allows generation to be parallelized across target token positions. Some of the latest non-autoregressive models have achieved… ▽ More Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where generation is sequential, the former allows generation to be parallelized across target token positions. Some of the latest non-autoregressive models have achieved impressive translation quality-speed tradeoffs compared to autoregressive baselines. In this work, we reexamine this tradeoff and argue that autoregressive baselines can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed. We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation. Our results establish a new protocol for future research toward fast, accurate machine translation. Our code is available at https://github.com/jungokasai/deep-shallow. △ Less

Submitted 24 June, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: ICLR 2021 Final Version

arXiv:2003.07216 [pdf, other]

Image Quality Transfer Enhances Contrast and Resolution of Low-Field Brain MRI in African Paediatric Epilepsy Patients

Authors: Matteo Figini, Hongxiang Lin, Godwin Ogbole, Felice D Arco, Stefano B. Blumberg, David W. Carmichael, Ryutaro Tanno, Enrico Kaden, Biobele J. Brown, Ikeoluwa Lagunju, Helen J. Cross, Delmiro Fernandez-Reyes, Daniel C. Alexander

Abstract: 1.5T or 3T scanners are the current standard for clinical MRI, but low-field (<1T) scanners are still common in many lower- and middle-income countries for reasons of cost and robustness to power failures. Compared to modern high-field scanners, low-field scanners provide images with lower signal-to-noise ratio at equivalent resolution, leaving practitioners to compensate by using large slice thic… ▽ More 1.5T or 3T scanners are the current standard for clinical MRI, but low-field (<1T) scanners are still common in many lower- and middle-income countries for reasons of cost and robustness to power failures. Compared to modern high-field scanners, low-field scanners provide images with lower signal-to-noise ratio at equivalent resolution, leaving practitioners to compensate by using large slice thickness and incomplete spatial coverage. Furthermore, the contrast between different types of brain tissue may be substantially reduced even at equal signal-to-noise ratio, which limits diagnostic value. Recently the paradigm of Image Quality Transfer has been applied to enhance 0.36T structural images aiming to approximate the resolution, spatial coverage, and contrast of typical 1.5T or 3T images. A variant of the neural network U-Net was trained using low-field images simulated from the publicly available 3T Human Connectome Project dataset. Here we present qualitative results from real and simulated clinical low-field brain images showing the potential value of IQT to enhance the clinical utility of readily accessible low-field MRIs in the management of epilepsy. △ Less

Submitted 18 March, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Comments: 6 pages, 3 figures, accepted at ICLR 2020 workshop on Artificial Intelligence for Affordable Healthcare

arXiv:2001.05136 [pdf, other]

Non-Autoregressive Machine Translation with Disentangled Context Transformer

Authors: Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu

Abstract: State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo)… ▽ More State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 translation directions with varying data sizes demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average. Our code is available at https://github.com/facebookresearch/DisCo. △ Less

Submitted 30 June, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: ICML 2020

arXiv:1912.11936 [pdf, other]

Smell Pittsburgh: Engaging Community Citizen Science for Air Quality

Authors: Yen-Chia Hsu, Jennifer Cross, Paul Dille, Michael Tasota, Beatrice Dias, Randy Sargent, Ting-Hao 'Kenneth' Huang, Illah Nourbakhsh

Abstract: Urban air pollution has been linked to various human health concerns, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently conc… ▽ More Urban air pollution has been linked to various human health concerns, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. We also applied regression analysis to identify statistically significant effects of push notifications on user engagement. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality. All citizen-contributed smell data are publicly accessible and can be downloaded from https://smellpgh.org. △ Less

Submitted 20 November, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

Comments: Accepted by ACM Transactions on Interactive Intelligent Systems on 2020. This is an extended version of the arXiv:1810.11143, which was accepted by the ACM IUI 2019 conference. arXiv admin note: substantial text overlap with arXiv:1810.11143

arXiv:1909.12406 [pdf, other]

Monotonic Multihead Attention

Authors: Xutai Ma, Juan Pino, James Cross, Liezl Puzon, Jiatao Gu

Abstract: Simultaneous machine translation models start generating a target sequence before they have encoded or read the source sequence. Recent approaches for this task either apply a fixed policy on a state-of-the art Transformer model, or a learnable monotonic attention on a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attentio… ▽ More Simultaneous machine translation models start generating a target sequence before they have encoded or read the source sequence. Recent approaches for this task either apply a fixed policy on a state-of-the art Transformer model, or a learnable monotonic attention on a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention. We also introduce two novel and interpretable approaches for latency control that are specifically designed for multiple attentions heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We also analyze how the latency controls affect the attention span and we motivate the introduction of our model by analyzing the effect of the number of decoder layers and heads on quality and latency. △ Less

Submitted 26 September, 2019; originally announced September 2019.

arXiv:1909.06763 [pdf, other]

Deep Learning for Low-Field to High-Field MR: Image Quality Transfer with Probabilistic Decimation Simulator

Authors: Hongxiang Lin, Matteo Figini, Ryutaro Tanno, Stefano B. Blumberg, Enrico Kaden, Godwin Ogbole, Biobele J. Brown, Felice D'Arco, David W. Carmichael, Ikeoluwa Lagunju, Helen J. Cross, Delmiro Fernandez-Reyes, Daniel C. Alexander

Abstract: MR images scanned at low magnetic field ($<1$T) have lower resolution in the slice direction and lower contrast, due to a relatively small signal-to-noise ratio (SNR) than those from high field (typically 1.5T and 3T). We adapt the recent idea of Image Quality Transfer (IQT) to enhance very low-field structural images aiming to estimate the resolution, spatial coverage, and contrast of high-field… ▽ More MR images scanned at low magnetic field ($<1$T) have lower resolution in the slice direction and lower contrast, due to a relatively small signal-to-noise ratio (SNR) than those from high field (typically 1.5T and 3T). We adapt the recent idea of Image Quality Transfer (IQT) to enhance very low-field structural images aiming to estimate the resolution, spatial coverage, and contrast of high-field images. Analogous to many learning-based image enhancement techniques, IQT generates training data from high-field scans alone by simulating low-field images through a pre-defined decimation model. However, the ground truth decimation model is not well-known in practice, and lack of its specification can bias the trained model, aggravating performance on the real low-field scans. In this paper we propose a probabilistic decimation simulator to improve robustness of model training. It is used to generate and augment various low-field images whose parameters are random variables and sampled from an empirical distribution related to tissue-specific SNR on a 0.36T scanner. The probabilistic decimation simulator is model-agnostic, that is, it can be used with any super-resolution networks. Furthermore we propose a variant of U-Net architecture to improve its learning performance. We show promising qualitative results from clinical low-field images confirming the strong efficacy of IQT in an important new application area: epilepsy diagnosis in sub-Saharan Africa where only low-field scanners are normally available. △ Less

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1810.11143 [pdf, other]

Smell Pittsburgh: Community-Empowered Mobile Smell Reporting System

Authors: Yen-Chia Hsu, Jennifer Cross, Paul Dille, Michael Tasota, Beatrice Dias, Randy Sargent, Ting-Hao 'Kenneth' Huang, Illah Nourbakhsh

Abstract: Urban air pollution has been linked to various human health considerations, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequentl… ▽ More Urban air pollution has been linked to various human health considerations, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality. △ Less

Submitted 1 July, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: Accepted by ACM IUI 2019 conference, with error corrections

arXiv:1809.00125 [pdf, other]

Simple Fusion: Return of the Language Model

Authors: Felix Stahlberg, James Cross, Veselin Stoyanov

Abstract: Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation. We investigate an alternative simple method to use monolingual data for NMT training: We combine the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch. To achieve that, we train the translation model to predi… ▽ More Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation. We investigate an alternative simple method to use monolingual data for NMT training: We combine the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch. To achieve that, we train the translation model to predict the residual probability of the training data added to the prediction of the LM. This enables the TM to focus its capacity on modeling the source sentence since it can rely on the LM for fluency. We show that our method outperforms previous approaches to integrate LMs into NMT while the architecture is simpler as it does not require gating networks to balance TM and LM. We observe gains of between +0.24 and +2.36 BLEU on all four test sets (English-Turkish, Turkish-English, Estonian-English, Xhosa-English) on top of ensembles without LM. We compare our method with alternative ways to utilize monolingual data such as backtranslation, shallow fusion, and cold fusion. △ Less

Submitted 24 January, 2019; v1 submitted 1 September, 2018; originally announced September 2018.

Comments: WMT18 paper

arXiv:1804.03293 [pdf, other]

Community-Empowered Air Quality Monitoring System

Authors: Yen-Chia Hsu, Paul Dille, Jennifer Cross, Beatrice Dias, Randy Sargent, Illah Nourbakhsh

Abstract: Develo** information technology to democratize scientific knowledge and support citizen empowerment is a challenging task. In our case, a local community suffered from air pollution caused by industrial activity. The residents lacked the technological fluency to gather and curate diverse scientific data to advocate for regulatory change. We collaborated with the community in develo** an air qu… ▽ More Develo** information technology to democratize scientific knowledge and support citizen empowerment is a challenging task. In our case, a local community suffered from air pollution caused by industrial activity. The residents lacked the technological fluency to gather and curate diverse scientific data to advocate for regulatory change. We collaborated with the community in develo** an air quality monitoring system which integrated heterogeneous data over a large spatial and temporal scale. The system afforded strong scientific evidence by using animated smoke images, air quality data, crowdsourced smell reports, and wind data. In our evaluation, we report patterns of sharing smoke images among stakeholders. Our survey study shows that the scientific knowledge provided by the system encourages agonistic discussions with regulators, empowers the community to support policy making, and rebalances the power relationship between stakeholders. △ Less

Submitted 9 April, 2018; originally announced April 2018.

Comments: Accepted by 2017 ACM Conference on Human Factors in Computing Systems (CHI 2017)

arXiv:1804.03263 [pdf, other]

Visualization Tool for Environmental Sensing and Public Health Data

Authors: Yen-Chia Hsu, Jennifer Cross, Paul Dille, Illah Nourbakhsh, Leann Leiter, Ryan Grode

Abstract: To assist residents affected by oil and gas development, public health professionals in a non-profit organization have collected community data, including symptoms, air quality, and personal stories. However, the organization was unable to aggregate and visualize these data computationally. We present the Environmental Health Channel, an interactive web-based tool for visualizing environmental sen… ▽ More To assist residents affected by oil and gas development, public health professionals in a non-profit organization have collected community data, including symptoms, air quality, and personal stories. However, the organization was unable to aggregate and visualize these data computationally. We present the Environmental Health Channel, an interactive web-based tool for visualizing environmental sensing and public health data. This tool enables discussing and disseminating scientific evidence to reveal local environmental and health impacts of industrial activities. △ Less

Submitted 9 April, 2018; originally announced April 2018.

Comments: Accepted by 2018 ACM Conference Companion Publication on Designing Interactive Systems (DIS 2018)

arXiv:1612.06475 [pdf, other]

Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles

Authors: James Cross, Liang Huang

Abstract: Parsing accuracy using efficient greedy transition systems has improved dramatically in recent years thanks to neural networks. Despite striking results in dependency parsing, however, neural models have not surpassed state-of-the-art approaches in constituency parsing. To remedy this, we introduce a new shift-reduce system whose stack contains merely sentence spans, represented by a bare minimum… ▽ More Parsing accuracy using efficient greedy transition systems has improved dramatically in recent years thanks to neural networks. Despite striking results in dependency parsing, however, neural models have not surpassed state-of-the-art approaches in constituency parsing. To remedy this, we introduce a new shift-reduce system whose stack contains merely sentence spans, represented by a bare minimum of LSTM features. We also design the first provably optimal dynamic oracle for constituency parsing, which runs in amortized O(1) time, compared to O(n^3) oracles for standard dependency parsing. Training with this oracle, we achieve the best F1 scores on both English and French of any parser that does not use reranking or external data. △ Less

Submitted 19 December, 2016; originally announced December 2016.

Comments: EMNLP 2016

arXiv:1608.06459 [pdf, other]

Tracking Amendments to Legislation and Other Political Texts with a Novel Minimum-Edit-Distance Algorithm: DocuToads

Authors: Henrik Hermansson, James P. Cross

Abstract: Political scientists often find themselves tracking amendments to political texts. As different actors weigh in, texts change as they are drafted and redrafted, reflecting political preferences and power. This study provides a novel solution to the prob- lem of detecting amendments to political text based upon minimum edit distances. We demonstrate the usefulness of two language-insensitive, trans… ▽ More Political scientists often find themselves tracking amendments to political texts. As different actors weigh in, texts change as they are drafted and redrafted, reflecting political preferences and power. This study provides a novel solution to the prob- lem of detecting amendments to political text based upon minimum edit distances. We demonstrate the usefulness of two language-insensitive, transparent, and efficient minimum-edit-distance algorithms suited for the task. These algorithms are capable of providing an account of the types (insertions, deletions, substitutions, and trans- positions) and substantive amount of amendments made between version of texts. To illustrate the usefulness and efficiency of the approach we replicate two existing stud- ies from the field of legislative studies. Our results demonstrate that minimum edit distance methods can produce superior measures of text amendments to hand-coded efforts in a fraction of the time and resource costs. △ Less

Submitted 23 August, 2016; originally announced August 2016.

arXiv:1607.03055 [pdf, other]

Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach

Authors: Derek Greene, James P. Cross

Abstract: This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP speech content is analyzed using a new dynamic topic… ▽ More This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP speech content is analyzed using a new dynamic topic modeling method based on two layers of Non-negative Matrix Factorization (NMF). This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that two-layer NMF is a valuable alternative to existing dynamic topic modeling approaches found in the literature, and can unveil niche topics and associated vocabularies not captured by existing methods. Substantively, our findings suggest that the political agenda of the EP evolves significantly over time and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis. MEP contributions to the plenary agenda are also found to be impacted upon by voting behaviour and the committee structure of the Parliament. △ Less

Submitted 11 July, 2016; originally announced July 2016.

Comments: Long version including appendix. arXiv admin note: substantial text overlap with arXiv:1505.07302

arXiv:1606.06406 [pdf, other]

Incremental Parsing with Minimal Features Using Bi-Directional LSTM

Authors: James Cross, Liang Huang

Abstract: Recently, neural network approaches for parsing have largely automated the combination of individual features, but still rely on (often a larger number of) atomic features created from human linguistic intuition, and potentially omitting important global context. To further reduce feature engineering to the bare minimum, we use bi-directional LSTM sentence representations to model a parser state w… ▽ More Recently, neural network approaches for parsing have largely automated the combination of individual features, but still rely on (often a larger number of) atomic features created from human linguistic intuition, and potentially omitting important global context. To further reduce feature engineering to the bare minimum, we use bi-directional LSTM sentence representations to model a parser state with only three sentence positions, which automatically identifies important aspects of the entire sentence. This model achieves state-of-the-art results among greedy dependency parsers for English. We also introduce a novel transition system for constituency parsing which does not require binarization, and together with the above architecture, achieves state-of-the-art results among greedy parsers for both English and Chinese. △ Less

Submitted 20 June, 2016; originally announced June 2016.

Comments: Pre-print of paper appearing in ACL 2016

arXiv:1511.06312 [pdf, ps, other]

Good, Better, Best: Choosing Word Embedding Context

Authors: James Cross, Bing Xiang, Bowen Zhou

Abstract: We propose two methods of learning vector representations of words and phrases that each combine sentence context with structural features extracted from dependency trees. Using several variations of neural network classifier, we show that these combined methods lead to improved performance when used as input features for supervised term-matching. We propose two methods of learning vector representations of words and phrases that each combine sentence context with structural features extracted from dependency trees. Using several variations of neural network classifier, we show that these combined methods lead to improved performance when used as input features for supervised term-matching. △ Less

Submitted 19 November, 2015; originally announced November 2015.

arXiv:1505.07302 [pdf, other]

Unveiling the Political Agenda of the European Parliament Plenary: A Topical Analysis

Authors: Derek Greene, James P. Cross

Abstract: This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those… ▽ More This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those speeches. To detect latent themes in legislative speeches over time, speech content is analyzed using a new dynamic topic modeling method, based on two layers of matrix factorization. This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that the political agenda of the EP has evolved significantly over time, is impacted upon by the committee structure of the Parliament, and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis have a significant impact on what is being discussed in Parliament. △ Less

Submitted 7 July, 2015; v1 submitted 27 May, 2015; originally announced May 2015.

Comments: Add link to implementation code on Github

Showing 1–45 of 45 results for author: Cross, J