Search | arXiv e-print repository

arXiv:2404.01981 [pdf, ps, other]

Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials

Authors: Ali Akram, Marija Stanojevic, Malikeh Ehghaghi, Jekaterina Novikova

Abstract: Due to the substantial number of clinicians, patients, and data collection environments involved in clinical trials, gathering data of superior quality poses a significant challenge. In clinical trials, patients are assessed based on their speech data to detect and monitor cognitive and mental health disorders. We propose using these speech recordings to verify the identities of enrolled patients and identify and exclude the individuals who try to enroll multiple times in the same trial. Since clinical studies are often conducted across different countries, creating a system that can perform speaker verification in diverse languages without additional development effort is imperative. We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing with speech-impaired patients speaking English, German, Danish, Spanish, and Arabic languages. Our results demonstrate that tested models can effectively generalize to clinical speakers, with less than 2.7% EER for European Languages and 8.26% EER for Arabic. This represents a significant step in developing more versatile and efficient speaker verification systems for cognitive and mental health clinical trials that can be used across a wide range of languages and dialects, substantially reducing the effort required to develop speaker verification systems for multiple languages. We also evaluate how speech tasks and number of speakers involved in the trial influence the performance and show that the type of speech tasks impacts the model performance.

Submitted 5 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2401.18046 [pdf, other]

Multipath parsing in the brain

Authors: Berta Franzluebbers, Donald Dunagan, Miloš Stanojević, Jan Buys, John T. Hale

Abstract: Humans understand sentences word-by-word, in the order that they hear them. This incrementality entails resolving temporary ambiguities about syntactic relationships. We investigate how humans process these syntactic ambiguities by correlating predictions from incremental generative dependency parsers with timecourse data from people undergoing functional neuroimaging while listening to an audiobook. In particular, we compare competing hypotheses regarding the number of developing syntactic analyses in play during word-by-word comprehension: one vs more than one. This comparison involves evaluating syntactic surprisal from a state-of-the-art dependency parser with LLM-adapted encodings against an existing fMRI dataset. In both English and Chinese data, we find evidence for multipath parsing. Brain regions associated with this multipath effect include bilateral superior temporal gyrus.

Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: Accepted at ACL2024, main conference. 15 pages

arXiv:2308.03291 [pdf, other]

SynJax: Structured Probability Distributions for JAX

Authors: Miloš Stanojević, Laurent Sartran

Abstract: The development of deep learning software libraries enabled significant progress in the field by allowing users to focus on modeling, while letting the library take care of the tedious and time-consuming task of optimizing execution for modern hardware accelerators. However, this has benefited only particular types of deep learning models, such as Transformers, whose primitives map easily to vectorized computation. The models that explicitly account for structured objects, such as trees and segmentations, did not benefit equally because they require custom algorithms that are difficult to implement in a vectorized form. SynJax directly addresses this problem by providing an efficient vectorized implementation of inference algorithms for structured distributions covering alignment, tagging, segmentation, constituency trees and spanning trees. This is done by exploiting the connection between algorithms for automatic differentiation and probabilistic inference. With SynJax we can build large-scale differentiable models that explicitly model structure in the data. The code is available at https://github.com/google-deepmind/synjax

Submitted 15 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2306.12444 [pdf, other]

Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials

Authors: Malikeh Ehghaghi, Marija Stanojevic, Ali Akram, Jekaterina Novikova

Abstract: Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of the trial's findings and result in significant health and financial risks. Developing accurate automated speaker verification (ASV) models is crucial to verify the identity of enrolled individuals and remove duplicates, but the size and quality of data influence ASV performance. However, there has been limited investigation into the factors that can affect ASV capabilities in clinical environments. In this paper, we bridge the gap by conducting analysis of how participant demographic characteristics, audio quality criteria, and severity level of Alzheimer's disease (AD) impact the performance of ASV utilizing a dataset of speech recordings from 659 participants with varying levels of AD, obtained through multiple speech tasks. Our results indicate that ASV performance: 1) is slightly better on male speakers than on female speakers; 2) degrades for individuals who are above 70 years old; 3) is comparatively better for non-native English speakers than for native English speakers; 4) is negatively affected by clinician interference, noisy background, and unclear participant speech; 5) tends to decrease with an increase in the severity level of AD. Our study finds that voice biometrics raise fairness concerns as certain subgroups exhibit different ASV performances owing to their inherent voice characteristics. Moreover, the performance of ASV is influenced by the quality of speech recordings, which underscores the importance of improving the data collection settings in clinical trials.

Submitted 20 June, 2023; originally announced June 2023.

Comments: Accepted to the 5th Clinical Natural Language Processing Workshop (ClinicalNLP) at ACL 2023

arXiv:2212.14490 [pdf, other]

Multi-modal deep learning system for depression and anxiety detection

Authors: Brian Diep, Marija Stanojevic, Jekaterina Novikova

Abstract: Traditional screening practices for anxiety and depression pose an impediment to monitoring and treating these conditions effectively. However, recent advances in NLP and speech modelling allow textual, acoustic, and hand-crafted language-based features to jointly form the basis of future mental health screening and condition detection. Speech is a rich and readily available source of insight into an individual's cognitive state and by leveraging different aspects of speech, we can develop new digital biomarkers for depression and anxiety. To this end, we propose a multi-modal system for the screening of depression and anxiety from self-administered speech tasks. The proposed model integrates deep-learned features from audio and text, as well as hand-crafted features that are informed by clinically-validated domain knowledge. We find that augmenting hand-crafted features with deep-learned features improves our overall classification F1 score, compared to a baseline of hand-crafted features alone, from 0.58 to 0.63 for depression and from 0.54 to 0.57 for anxiety. The findings of our work suggest that speech-based biomarkers for depression and anxiety hold significant promise in the future of digital health.

Submitted 29 December, 2022; originally announced December 2022.

Comments: accepted to the PAI4MH workshop at NeurIPS 2022

arXiv:2210.16147 [pdf, other]

Modeling structure-building in the brain with CCG parsing and large language models

Authors: Miloš Stanojević, Jonathan R. Brennan, Donald Dunagan, Mark Steedman, John T. Hale

Abstract: To model behavioral and neural correlates of language comprehension in naturalistic environments researchers have turned to broad-coverage tools from natural-language processing and machine learning. Where syntactic structure is explicitly modeled, prior work has relied predominantly on context-free grammars (CFG), yet such formalisms are not sufficiently expressive for human languages. Combinatory Categorial Grammars (CCGs) are sufficiently expressive directly compositional models of grammar with flexible constituency that affords incremental interpretation. In this work we evaluate whether a more expressive CCG provides a better model than a CFG for human neural signals collected with fMRI while participants listen to an audiobook story. We further test between variants of CCG that differ in how they handle optional adjuncts. These evaluations are carried out against a baseline that includes estimates of next-word predictability from a Transformer neural network language model. Such a comparison reveals unique contributions of CCG structure-building predominantly in the left posterior temporal lobe: CCG-derived measures offer a superior fit to neural signals compared to those derived from a CFG. These effects are spatially distinct from bilateral superior temporal effects that are unique to predictability. Neural effects for structure-building are thus separable from predictability during naturalistic listening, and those effects are best characterized by a grammar whose expressive power is motivated on independent linguistic grounds.

Submitted 16 April, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

arXiv:2205.12621 [pdf, other]

Unbiased and Efficient Sampling of Dependency Trees

Authors: Miloš Stanojević

Abstract: Most computational models of dependency syntax consist of distributions over spanning trees. However, the majority of dependency treebanks require that every valid dependency tree has a single edge coming out of the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason all standard inference algorithms for spanning trees are suboptimal for inference over dependency trees. Zmigrod et al. (2021b) proposed algorithms for sampling with and without replacement from the dependency tree distribution that incorporate the single-root constraint. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact producing biased samples and we provide two alternatives that are unbiased. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of the algorithm for sampling k trees without replacement to O(kn^3). These algorithms are both asymptotically and practically more efficient.

Submitted 28 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: 16 pages, 4 algorithms, 7 figures

arXiv:2203.00633 [pdf, other]

doi 10.1162/tacl_a_00526

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Authors: Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

Abstract: We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism -- one that is independent of composed syntactic representations -- plays an important role in current successful models of long text.

Submitted 6 December, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 17 pages, 5 figures, 2 tables and 1 algorithm. To appear in TACL, to be presented at EMNLP 2022

arXiv:2109.09393 [pdf, other]

doi 10.18653/v1/2021.case-1.6

Modality and Negation in Event Extraction

Authors: Sander Bijl de Vroe, Liane Guillou, Miloš Stanojević, Nick McKenna, Mark Steedman

Abstract: Language provides speakers with a rich system of modality for expressing thoughts about events, without being committed to their actual occurrence. Modality is commonly used in the political news domain, where both actual and possible courses of events are discussed. NLP systems struggle with these semantic phenomena, often incorrectly extracting events which did not happen, which can lead to issues in downstream applications. We present an open-domain, lexicon-based event extraction system that captures various types of modality. This information is valuable for Question Answering, Knowledge Graph construction and Fact-checking tasks, and our evaluation shows that the system is sufficiently strong to be used in downstream applications.

Submitted 20 September, 2021; originally announced September 2021.

Comments: S. Bijl de Vroe, L. Guillou, M. Stanojević, N. McKenna, and M. Steedman. 2021. Modality and Negation in Event Extraction. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), pages 31-42, online. Association for Computational Linguistics

Journal ref: In Proceedings of CASE 2021, pages 31-42, online. Association for Computational Linguistics

arXiv:2103.06130 [pdf, other]

Stay on Topic, Please: Aligning User Comments to the Content of a News Article

Authors: Jumanah Alshehri, Marija Stanojevic, Eduard Dragut, Zoran Obradovic

Abstract: Social scientists have shown that up to 50% of the content posted to a news article has no relation to its journalistic content. In this study we propose a classification algorithm to categorize user comments posted to a news article based on their alignment to its content. The alignment seeks to match user comments to an article based on similarity of content, entities in discussion, and topic. We propose BERTAC, a BERT-based approach that jointly learns article-comment embeddings and infers the relevance class of comments. We introduce an ordinal classification loss that penalizes the difference between the predicted and true label. We conduct a thorough study to show the influence of the proposed loss on the learning process. The results on five representative news outlets show that our approach can learn the comment class with up to 36% average accuracy improvement compared to the baselines, and up to 25% compared to the BA-BC model. BA-BC is our approach that consists of two models aimed to capture disjointly the formal language of news articles and the informal language of comments. We also conduct a user study to evaluate human labeling performance to understand the difficulty of the classification task. The user agreement on comment-article alignment is "moderate" per Krippendorff's alpha score, which suggests that the classification task is difficult.

Submitted 3 March, 2021; originally announced March 2021.

Comments: Accepted as a full paper at the 43rd European Conference on Information Retrieval

arXiv:2005.00950 [pdf]

Extracting Entities and Topics from News and Connecting Criminal Records

Authors: Quang Pham, Marija Stanojevic, Zoran Obradovic

Abstract: The goal of this paper is to summarize methodologies used in extracting entities and topics from a database of criminal records and from a database of newspapers. Statistical models had successfully been used in studying the topics of roughly 300,000 New York Times articles. In addition, these models had also been used to successfully analyze entities related to people, organizations, and places (D Newman, 2006). Additionally, analytical approaches, especially in hotspot mapping, were used in some research with an aim to predict crime locations and circumstances in the future, and those approaches had been tested quite successfully (S Chainey, 2008). Based on the two above notions, this research was performed with the intention to apply data science techniques in analyzing a large amount of data, selecting valuable intelligence, clustering violations depending on their types of crime, and creating a crime graph that changes through time. In this research, the task was to download criminal datasets from Kaggle and a collection of news articles from Kaggle and EAGER project databases, and then to merge these datasets into one general dataset. The most important goal of this project was applying statistical and natural language processing methods to extract entities and topics as well as to group similar data points into correct clusters, in order to better understand public data about U.S.-related crimes.

Submitted 2 May, 2020; originally announced May 2020.

Comments: This is a report submitted by an undergraduate student as preliminary work on this problem

arXiv:1508.02445 [pdf, ps, other]

Removing Biases from Trainable MT Metrics by Using Self-Training

Authors: Miloš Stanojević

Abstract: Most trainable machine translation (MT) metrics train their weights on human judgments of state-of-the-art MT systems' outputs. This makes trainable metrics biased in many ways. One of them is preferring longer translations. These biased metrics when used for tuning are evaluating different types of translations -- n-best lists of translations with very diverse quality. Systems tuned with these metrics tend to produce overly long translations that are preferred by the metric but not by humans. This is usually solved by manually tweaking the metric's weights to equally value recall and precision. Our solution is more general: (1) it does not address only the recall bias but also all other biases that might be present in the data and (2) it does not require any knowledge of the types of features used, which is useful in cases when manual tuning of the metric's weights is not possible. This is accomplished by self-training on unlabeled n-best lists by using a metric that was initially trained on standard human judgments. One way of looking at this is as domain adaptation from the domain of state-of-the-art MT translations to diverse n-best list translations.

Submitted 10 August, 2015; originally announced August 2015.

Showing 1–12 of 12 results for author: Stanojevic, M