Showing 1–21 of 21 results for author: Paraskevopoulos, G

  1. arXiv:2406.15284  [pdf, other]

    cs.CL cs.SD eess.AS

    The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data

    Authors: Georgios Paraskevopoulos, Chara Tsoukala, Athanasios Katsamanis, Vassilis Katsouros

    Abstract: The development of speech technologies for languages with limited digital representation poses significant challenges, primarily due to the scarcity of available data. This issue is exacerbated in the era of large, data-intensive models. Recent research has underscored the potential of leveraging weak supervision to augment the pool of available data. In this study, we compile an 800-hour corpus o…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: To be presented at Interspeech 2024

  2. arXiv:2402.02302  [pdf, other]

    eess.AS cs.CL

    Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

    Authors: Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

    Abstract: While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are under-represented in the pre-training data. Continued pre-training on 70-200 hours of untranscribed speech in these languages can help -- but what about languages without that much recorded…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted for SIGTYP2024

  3. arXiv:2309.11140  [pdf, other]

    cs.SD cs.LG eess.AS

    Investigating Personalization Methods in Text to Music Generation

    Authors: Manos Plitsis, Theodoros Kouzelis, Georgios Paraskevopoulos, Vassilis Katsouros, Yannis Panagakis

    Abstract: In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on the overall system performance and a…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024, Examples at https://zelaki.github.io/

  4. arXiv:2306.00996  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling

    Authors: Theodoros Kouzelis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros

    Abstract: The study of speech disorders can benefit greatly from time-aligned data. However, audio-text mismatches in disfluent speech cause rapid performance degradation for modern speech aligners, hindering the use of automatic approaches. In this work, we propose a simple and effective modification of alignment graph construction of CTC-based models using Weighted Finite State Transducers. The proposed w…

    Submitted 30 May, 2023; originally announced June 2023.

    Comments: Interspeech 2023

  5. arXiv:2303.14279  [pdf, other]

    cs.CL

    Depression detection in social media posts using affective and social norm features

    Authors: Ilias Triantafyllopoulos, Georgios Paraskevopoulos, Alexandros Potamianos

    Abstract: We propose a deep architecture for depression detection from social media posts. The proposed architecture builds upon BERT to extract language representations from social media posts and combines these representations using an attentive bidirectional GRU network. We incorporate affective information, by augmenting the text representations with features extracted from a pretrained emotion classifi…

    Submitted 24 March, 2023; originally announced March 2023.

  6. arXiv:2301.00304  [pdf, other]

    cs.CL cs.SD eess.AS

    Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A case study for Modern Greek

    Authors: Georgios Paraskevopoulos, Theodoros Kouzelis, Georgios Rouvalis, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos

    Abstract: Modern speech recognition systems exhibit rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where diversity of training data is limited. In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervis…

    Submitted 31 December, 2022; originally announced January 2023.

  7. arXiv:2212.00678  [pdf, other]

    cs.CL cs.CV cs.LG

    Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis

    Authors: Odysseas S. Chlapanis, Georgios Paraskevopoulos, Alexandros Potamianos

    Abstract: Multimodal learning pipelines have benefited from the success of pretrained language models. However, this comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. The adapter adjusts the pretrained language model for the task at…

    Submitted 1 December, 2022; originally announced December 2022.

  8. arXiv:2210.01191  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Extending Compositional Attention Networks for Social Reasoning in Videos

    Authors: Christina Sartzetaki, Georgios Paraskevopoulos, Alexandros Potamianos

    Abstract: We propose a novel deep architecture for the task of reasoning about social interactions in videos. We leverage the multi-step reasoning capabilities of Compositional Attention Networks (MAC), and propose a multimodal extension (MAC-X). MAC-X is based on a recurrent cell that performs iterative mid-level fusion of input modalities (visual, auditory, text) over multiple reasoning steps, by use of a…

    Submitted 3 October, 2022; originally announced October 2022.

    Journal ref: Proc. Interspeech 2022, 1116-1120

  9. arXiv:2204.13437  [pdf, other]

    cs.SD cs.LG eess.AS

    Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss

    Authors: Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos

    Abstract: Recent deep learning Text-to-Speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of Tacotron2 which aims to alleviate the training iss…

    Submitted 14 July, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

  10. arXiv:2201.09828  [pdf, other]

    cs.LG cs.CV

    MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis

    Authors: Georgios Paraskevopoulos, Efthymios Georgiou, Alexandros Potamianos

    Abstract: Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations (late/mid fusion) or low level sensory inputs (early fusion). Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived, i.e. cognition affects perception. These top-down inte…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Accepted, ICASSP 2022

  11. arXiv:2111.00310  [pdf, other]

    cs.CL cs.AI

    EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments

    Authors: Emmanouil Zaranis, Georgios Paraskevopoulos, Athanasios Katsamanis, Alexandros Potamianos

    Abstract: In this paper, we introduce EmpBot: an end-to-end empathetic chatbot. Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner and respond appropriately. To this end, we propose a method based on a transformer pretrained language model (T5). Specifically, during finetuning we propose to use three obje…

    Submitted 30 October, 2021; originally announced November 2021.

  12. arXiv:2110.06986  [pdf, other]

    cs.IT cs.CV cs.IR cs.LG

    ADMM-DAD net: a deep unfolding network for analysis compressed sensing

    Authors: Vasiliki Kouni, Georgios Paraskevopoulos, Holger Rauhut, George C. Alexandropoulos

    Abstract: In this paper, we propose a new deep unfolding neural network based on the ADMM algorithm for analysis Compressed Sensing. The proposed network jointly learns a redundant analysis operator for sparsification and reconstructs the signal of interest. We compare our proposed network with a state-of-the-art unfolded ISTA decoder, that also learns an orthogonal sparsifier. Moreover, we consider not onl…

    Submitted 2 May, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 1506-1510

  13. arXiv:2104.07078  [pdf, other]

    cs.CL

    UDALM: Unsupervised Domain Adaptation through Language Modeling

    Authors: Constantinos Karouzos, Georgios Paraskevopoulos, Alexandros Potamianos

    Abstract: In this work we explore Unsupervised Domain Adaptation (UDA) of pretrained language models for downstream tasks. We introduce UDALM, a fine-tuning procedure, using a mixed classification and Masked Language Model loss, that can adapt to the target domain distribution in a robust and sample efficient manner. Our experiments show that performance of models trained with the mixed loss scales with the…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: Accepted for publication in 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

  14. Unsupervised low-rank representations for speech emotion recognition

    Authors: Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos

    Abstract: We examine the use of linear and non-linear dimensionality reduction algorithms for extracting low-rank feature representations for speech emotion recognition. Two feature sets are used, one based on low-level descriptors and their aggregations (IS10) and one modeling recurrence dynamics of speech (RQA), as well as their fusion. We report speech emotion recognition (SER) results for learned repres…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: Published at Interspeech 2019 https://www.isca-speech.org/archive/Interspeech_2019/abstracts/2769.html

  15. arXiv:2006.08336  [pdf, other]

    cs.CL cs.LG stat.ML

    Affective Conditioning on Hierarchical Networks applied to Depression Detection from Transcribed Clinical Interviews

    Authors: D. Xezonaki, G. Paraskevopoulos, A. Potamianos, S. Narayanan

    Abstract: In this work we propose a machine learning model for depression detection from transcribed clinical interviews. Depression is a mental disorder that impacts not only the subject's mood but also the use of language. To this end we use a Hierarchical Attention Network to classify interviews of depressed subjects. We augment the attention layer of our model with a conditioning mechanism on linguistic…

    Submitted 4 June, 2020; originally announced June 2020.

  16. arXiv:2004.14840  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD stat.ML

    Multiresolution and Multimodal Speech Recognition with Transformers

    Authors: Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

    Abstract: This paper presents an audio visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR. We extract representations for audio features in the encoder layers of the transformer and fuse video features using an additional crossmodal multihead attention layer. Additionally…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted for ACL 2020

  17. arXiv:1811.04133  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Integrating Recurrence Dynamics for Speech Emotion Recognition

    Authors: Efthymios Tzinis, Georgios Paraskevopoulos, Christos Baziotis, Alexandros Potamianos

    Abstract: We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER). Reconstruction of the phase space of each speech frame and the computation of its respective Recurrence Plot (RP) reveals complex structures which can be measured by performing Recurrence Quantification Analysis (RQA). These measu…

    Submitted 9 November, 2018; originally announced November 2018.

    Journal ref: Proc. Interspeech 2018, pp. 927-931

  18. arXiv:1806.00416  [pdf, other]

    cs.LG stat.ML

    Pattern Search Multidimensional Scaling

    Authors: Georgios Paraskevopoulos, Efthymios Tzinis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Alexandros Potamianos

    Abstract: We present a novel view of nonlinear manifold learning using derivative-free optimization techniques. Specifically, we propose an extension of the classical multi-dimensional scaling (MDS) method, where instead of performing gradient descent, we sample and evaluate possible "moves" in a sphere of fixed radius for each point in the embedded space. A fixed-point convergence guarantee can be shown by…

    Submitted 30 October, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 36 pages, Under review for JMLR

  19. arXiv:1804.06659  [pdf, other]

    cs.CL

    NTUA-SLP at SemEval-2018 Task 3: Tracking Ironic Tweets using Ensembles of Word and Character Level Attentive RNNs

    Authors: Christos Baziotis, Nikos Athanasiou, Pinelopi Papalampidi, Athanasia Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Alexandros Potamianos

    Abstract: In this paper we present two deep-learning systems that competed at SemEval-2018 Task 3 "Irony detection in English tweets". We design and ensemble two independent models, based on recurrent neural networks (Bi-LSTM), which operate at the word and character level, in order to capture both the semantic and syntactic information in tweets. Our models are augmented with a self-attention mechanism, in…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: SemEval-2018, Task 3 "Irony detection in English tweets"

  20. arXiv:1804.06658  [pdf, other]

    cs.CL

    NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets with Deep Attentive RNNs and Transfer Learning

    Authors: Christos Baziotis, Nikos Athanasiou, Alexandra Chronopoulou, Athanasia Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Shrikanth Narayanan, Alexandros Potamianos

    Abstract: In this paper we present deep-learning models that were submitted to the SemEval-2018 Task 1 competition: "Affect in Tweets". We participated in all subtasks for English tweets. We propose a Bi-LSTM architecture equipped with a multi-layer self attention mechanism. The attention mechanism improves the model performance and allows us to identify salient words in tweets, as well as gain insight into the…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: SemEval-2018, Task 1 "Affect in Tweets"

  21. arXiv:1804.06657  [pdf, other]

    cs.CL

    NTUA-SLP at SemEval-2018 Task 2: Predicting Emojis using RNNs with Context-aware Attention

    Authors: Christos Baziotis, Nikos Athanasiou, Georgios Paraskevopoulos, Nikolaos Ellinas, Athanasia Kolovou, Alexandros Potamianos

    Abstract: In this paper we present a deep-learning model that competed at SemEval-2018 Task 2 "Multilingual Emoji Prediction". We participated in subtask A, in which we are called to predict the most likely associated emoji in English tweets. The proposed architecture relies on a Long Short-Term Memory network, augmented with an attention mechanism, that conditions the weight of each word, on a "context vec…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: SemEval-2018, Task 2 "Multilingual Emoji Prediction"