
Showing 1–27 of 27 results for author: Ristea, N

Searching in archive cs.
  1. arXiv:2401.14444  [pdf, other]

    cs.SD cs.AI cs.CV eess.AS

    ICASSP 2024 Speech Signal Improvement Challenge

    Authors: Nicolae Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets

    Abstract: The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. This marks our second challenge, building upon the success from the previous ICASSP 2023 Grand Challenge. We enhance the competition by introducing a dataset synthesizer, enabling all participating teams to start at a higher baseli…

    Submitted 25 January, 2024; originally announced January 2024.

  2. arXiv:2401.07575  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Cascaded Cross-Modal Transformer for Audio-Textual Classification

    Authors: Nicolae-Catalin Ristea, Andrei Anghel, Radu Tudor Ionescu

    Abstract: Speech classification tasks often require powerful language understanding models to grasp useful features, which becomes problematic when limited training data is available. To attain superior classification performance, we propose to harness the inherent value of multimodal representations by transcribing speech using automatic speech recognition (ASR) models and translating the transcripts into…

    Submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2309.12553  [pdf, other]

    eess.AS cs.SD

    ICASSP 2023 Acoustic Echo Cancellation Challenge

    Authors: Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Evgenii Indenbom, Nicolae-Catalin Ristea, Jegor Gužvin, Hannes Gamper, Sebastian Braun, Robert Aichner

    Abstract: The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and is still a top issue in audio communication. This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20ms, as…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2202.13290, arXiv:2009.04972

  4. arXiv:2309.07385  [pdf, other]

    eess.AS cs.SD

    Multi-dimensional Speech Quality Assessment in Crowdsourcing

    Authors: Babak Naderi, Ross Cutler, Nicolae-Catalin Ristea

    Abstract: Subjective speech quality assessment is the gold standard for evaluating speech enhancement processing and telecommunication systems. The commonly used standard ITU-T Rec. P.800 defines how to measure speech quality in lab environments, and ITU-T Rec. P.808 extended it for crowdsourcing. ITU-T Rec. P.835 extends P.800 to measure the quality of speech in the presence of noise. ITU-T Rec. P.804 targ…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.06566

  5. arXiv:2309.03378  [pdf, other]

    cs.CL cs.SD eess.AS

    RoDia: A New Dataset for Romanian Dialect Identification from Speech

    Authors: Codrut Rotaru, Nicolae-Catalin Ristea, Radu Tudor Ionescu

    Abstract: We introduce RoDia, the first dataset for Romanian dialect identification from speech. The RoDia dataset includes a varied compilation of speech samples from five distinct regions of Romania, covering both urban and rural environments, totaling 2 hours of manually annotated speech data. Along with our dataset, we introduce a set of competitive models to be used as baselines for future research. Th…

    Submitted 20 March, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted at NAACL 2024

  6. arXiv:2308.16572  [pdf, other]

    cs.CV cs.AI cs.LG

    CL-MAE: Curriculum-Learned Masked Autoencoders

    Authors: Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

    Abstract: Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the…

    Submitted 28 February, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted at WACV 2024

  7. arXiv:2307.15097  [pdf, other]

    cs.CL cs.LG cs.MM eess.AS

    Cascaded Cross-Modal Transformer for Request and Complaint Detection

    Authors: Nicolae-Catalin Ristea, Radu Tudor Ionescu

    Abstract: We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations. Our approach leverages a multimodal paradigm by transcribing the speech using automatic speech recognition (ASR) models and translating the transcripts into different languages. Subsequently, we combine language-specific BERT-based…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted at ACMMM 2023

  8. arXiv:2306.12041  [pdf, other]

    cs.CV cs.LG

    Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

    Authors: Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level. The novelty of the proposed model is threefold. First, we introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects. Second, we integrate a teacher decoder and a student de…

    Submitted 9 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Accepted at CVPR 2024

  9. arXiv:2306.07649  [pdf, other]

    cs.CV eess.IV

    Sea Ice Segmentation From SAR Data by Convolutional Transformer Networks

    Authors: Nicolae-Catalin Ristea, Andrei Anghel, Mihai Datcu

    Abstract: Sea ice is a crucial component of the Earth's climate system and is highly sensitive to changes in temperature and atmospheric conditions. Accurate and timely measurement of sea ice parameters is important for understanding and predicting the impacts of climate change. Nevertheless, the amount of satellite data acquired over ice areas is huge, making the subjective measurements ineffective. Theref…

    Submitted 13 June, 2023; originally announced June 2023.

  10. arXiv:2306.03177  [pdf, other]

    cs.SD cs.CV eess.AS

    DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation

    Authors: Evgenii Indenbom, Nicolae-Catalin Ristea, Ando Saabas, Tanel Parnamaa, Jegor Guzvin, Ross Cutler

    Abstract: Acoustic echo cancellation (AEC), noise suppression (NS) and dereverberation (DR) are an integral part of modern full-duplex communication systems. As the demand for teleconferencing systems increases, addressing these tasks is required for an effective and efficient online meeting experience. Most prior research proposes solutions for these tasks separately, combining them with digital signal pro…

    Submitted 5 June, 2023; originally announced June 2023.

  11. arXiv:2211.15597  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM stat.ML

    Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation

    Authors: Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each tea…

    Submitted 28 November, 2022; originally announced November 2022.

  12. Guided Unsupervised Learning by Subaperture Decomposition for Ocean SAR Image Retrieval

    Authors: Nicolae-Cătălin Ristea, Andrei Anghel, Mihai Datcu, Bertrand Chapron

    Abstract: Spaceborne synthetic aperture radar (SAR) can provide accurate images of the ocean surface roughness day-or-night in nearly all weather conditions, being a unique asset for many geophysical applications. Considering the huge amount of data daily acquired by satellites, automated techniques for physical features extraction are needed. Even if supervised deep learning methods attain state-of-the-ar…

    Submitted 29 September, 2022; originally announced September 2022.

  13. Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

    Authors: Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where t…

    Submitted 5 October, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

  14. arXiv:2208.11308  [pdf, other]

    cs.SD cs.CV eess.AS

    Deep model with built-in cross-attention alignment for acoustic echo cancellation

    Authors: Evgenii Indenbom, Nicolae-Cătălin Ristea, Ando Saabas, Tanel Pärnamaa, Jegor Gužvin

    Abstract: With recent research advances, deep learning models have become an attractive choice for acoustic echo cancellation (AEC) in real-time teleconferencing applications. Since acoustic echo is one of the major sources of poor audio quality, a wide variety of deep models have been proposed. However, an important but often omitted requirement for good echo cancellation quality is the synchronization of…

    Submitted 14 March, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

  15. arXiv:2205.09180  [pdf, other]

    cs.LG cs.CL cs.CV

    LeRaC: Learning Rate Curriculum

    Authors: Florinel-Alin Croitoru, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Nicu Sebe

    Abstract: Most curriculum learning methods require an approach to sort the data samples by difficulty, which is often cumbersome to perform. In this work, we propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC), which leverages the use of a different learning rate for each layer of a neural network to create a data-free curriculum during the initial training epochs. More spec…

    Submitted 19 November, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: Main paper + supplementary

  16. arXiv:2204.04438  [pdf, other]

    cs.CV

    Guided deep learning by subaperture decomposition: ocean patterns from SAR imagery

    Authors: Nicolae-Catalin Ristea, Andrei Anghel, Mihai Datcu, Bertrand Chapron

    Abstract: Spaceborne synthetic aperture radar can provide meter-scale images of the ocean surface roughness day or night in nearly all weather conditions. This makes it a unique asset for many geophysical applications. Sentinel 1 SAR wave mode vignettes have made it possible to capture many important oceanic and atmospheric phenomena since 2014. However, considering the amount of data provided, expanding appl…

    Submitted 9 April, 2022; originally announced April 2022.

  17. arXiv:2204.04218  [pdf, other]

    eess.IV cs.CV cs.LG

    Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution

    Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Andreea-Iuliana Miron, Olivian Savencu, Nicolae-Catalin Ristea, Nicolae Verga, Fahad Shahbaz Khan

    Abstract: Super-resolving medical images can help physicians in providing more accurate diagnostics. In many situations, computed tomography (CT) or magnetic resonance imaging (MRI) techniques capture several scans (modes) during a single investigation, which can jointly be used (in a multimodal fashion) to further boost the quality of super-resolution results. To this end, we propose a novel multimodal mul…

    Submitted 12 October, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at WACV 2023 (main paper + supplementary)

  18. arXiv:2203.09581  [pdf, other]

    cs.CV cs.LG

    SepTr: Separable Transformer for Audio Spectrogram Processing

    Authors: Nicolae-Catalin Ristea, Radu Tudor Ionescu, Fahad Shahbaz Khan

    Abstract: Following the successful application of vision transformers in multiple computer vision tasks, these models have drawn the attention of the signal processing community. This is because signals are often represented as spectrograms (e.g. through Discrete Fourier Transform) which can be directly provided as input to vision transformers. However, naively applying transformers to spectrograms is subop…

    Submitted 20 June, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted at INTERSPEECH 2022

  19. arXiv:2111.09099  [pdf, other]

    cs.CV cs.LG

    Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection

    Authors: Nicolae-Catalin Ristea, Neelu Madan, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: Anomaly detection is commonly pursued as a one-class classification problem, where models can only learn from normal training samples, while being evaluated on both normal and abnormal test samples. Among the successful approaches for anomaly detection, a distinguished category of methods relies on predicting masked information (e.g. patches, future frames, etc.) and leveraging the reconstruction…

    Submitted 14 March, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: Accepted at CVPR 2022. Paper + supplementary (14 pages, 9 figures)

  20. CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation

    Authors: Nicolae-Catalin Ristea, Andreea-Iuliana Miron, Olivian Savencu, Mariana-Iuliana Georgescu, Nicolae Verga, Fahad Shahbaz Khan, Radu Tudor Ionescu

    Abstract: We propose a novel approach to translate unpaired contrast computed tomography (CT) scans to non-contrast CT scans and the other way around. Solving this task has two important applications: (i) to automatically generate contrast CT scans for patients for whom injecting contrast substance is not an option, and (ii) to enhance the alignment between contrast and non-contrast CT by reducing the diffe…

    Submitted 5 April, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in Neurocomputing

  21. arXiv:2103.11988  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Self-paced ensemble learning for speech and audio classification

    Authors: Nicolae-Catalin Ristea, Radu Tudor Ionescu

    Abstract: Combining multiple machine learning models into an ensemble is known to provide superior performance levels compared to the individual components forming the ensemble. This is because models can complement each other in taking better decisions. Instead of just combining the models, we propose a self-paced ensemble learning scheme in which models learn from each other over several iterations. Durin…

    Submitted 8 June, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted at INTERSPEECH 2021

  22. arXiv:2010.10357  [pdf, other]

    eess.SP cs.CV cs.LG

    Automotive Radar Interference Mitigation with Unfolded Robust PCA based on Residual Overcomplete Auto-Encoder Blocks

    Authors: Nicolae-Cătălin Ristea, Andrei Anghel, Radu Tudor Ionescu, Yonina C. Eldar

    Abstract: In autonomous driving, radar systems play an important role in detecting targets such as other vehicles on the road. Radars mounted on different cars can interfere with each other, degrading the detection performance. Deep learning methods for automotive radar interference mitigation can successfully estimate the amplitude of targets, but fail to recover the phase of the respective targets. In this…

    Submitted 17 April, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted at the CVPR 2021 Embedded Vision Workshop

  23. arXiv:2008.05948  [pdf, other]

    eess.SP cs.CV cs.LG

    Estimating the Magnitude and Phase of Automotive Radar Signals under Multiple Interference Sources with Fully Convolutional Networks

    Authors: Nicolae-Cătălin Ristea, Andrei Anghel, Radu Tudor Ionescu

    Abstract: Radar sensors are gradually becoming a wide-spread equipment for road vehicles, playing a crucial role in autonomous driving and road safety. The broad adoption of radar sensors increases the chance of interference among sensors from different vehicles, generating corrupted range profiles and range-Doppler maps. In order to extract distance and velocity of multiple targets from range-Doppler maps,…

    Submitted 6 November, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Access

  24. arXiv:2007.04364  [pdf, other]

    cs.CV cs.AI

    Temporal aggregation of audio-visual modalities for emotion recognition

    Authors: Andreea Birhala, Catalin Nicolae Ristea, Anamaria Radoi, Liviu Cristian Dutu

    Abstract: Emotion recognition has a pivotal role in affective computing and in human-computer interaction. The current technological developments lead to increased possibilities of collecting data about the emotional state of a person. In general, human perception regarding the emotion transmitted by a subject is based on vocal and visual information collected in the first seconds of interaction with the su…

    Submitted 8 July, 2020; originally announced July 2020.

  25. arXiv:2006.10147  [pdf, other]

    eess.AS cs.CV cs.SD

    Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs

    Authors: Nicolae-Cătălin Ristea, Radu Tudor Ionescu

    Abstract: The task of detecting whether a person wears a face mask from speech is useful in modelling speech in forensic investigations, communication between surgeons or people protecting themselves against infectious diseases such as COVID-19. In this paper, we propose a novel data augmentation approach for mask detection from speech. Our approach is based on (i) training Generative Adversarial Networks (…

    Submitted 25 July, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted at INTERSPEECH 2020

  26. arXiv:2003.03229  [pdf, other]

    cs.NE cs.CV cs.LG stat.ML

    Non-linear Neurons with Human-like Apical Dendrite Activations

    Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Nicolae-Catalin Ristea, Nicu Sebe

    Abstract: In order to classify linearly non-separable data, neurons are typically organized into multi-layer neural networks that are equipped with at least one hidden layer. Inspired by some recent discoveries in neuroscience, we propose a new model of artificial neuron along with a novel activation function enabling the learning of nonlinear decision boundaries using a single neuron. We show that a standa…

    Submitted 10 August, 2023; v1 submitted 2 February, 2020; originally announced March 2020.

    Comments: Accepted for publication in Applied Intelligence

  27. arXiv:2003.00351  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

    Authors: Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi

    Abstract: Emotion recognition has become an important field of research in the human-computer interactions domain. The latest advancements in the field show that combining visual with audio information leads to better results compared to using a single source of information separately. From a visual point of view, a human emotion can be recognized by analyzing the facial expression of the pers…

    Submitted 29 February, 2020; originally announced March 2020.