Showing 1–48 of 48 results for author: Serra, J

  1. arXiv:2407.05782  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    Sequential Contrastive Audio-Visual Learning

    Authors: Ioannis Tsiamas, Santiago Pascual, Chunghsin Yeh, Joan Serrà

    Abstract: Contrastive learning has emerged as a powerful technique in audio-visual representation learning, leveraging the natural co-occurrence of audio and visual modalities in extensive web-scale video datasets to achieve significant advancements. However, conventional contrastive audio-visual learning methodologies often rely on aggregated representations derived through temporal aggregation, which negl…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2310.00140  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    GASS: Generalizing Audio Source Separation with Large-scale Data

    Authors: Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

    Abstract: Universal source separation targets at separating the audio sources of an arbitrary mix, removing the constraint to operate on a specific domain like speech or music. Yet, the potential of universal source separation is limited because most existing works focus on mixes with predominantly sound events, and small training datasets also limit its potential for supervised learning. Here, we study a s…

    Submitted 29 September, 2023; originally announced October 2023.

  3. arXiv:2306.14647  [pdf, other]

    cs.SD cs.LG eess.AS

    Mono-to-stereo through parametric stereo generation

    Authors: Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle

    Abstract: Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo by means of predicting parametric stereo (PS) parameters using both nearest neighbor and deep network approaches. In combination with PS, we als…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 7 pages, 1 figure; accepted for ISMIR23

  4. arXiv:2306.09635  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS eess.SP

    CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

    Authors: Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge…

    Submitted 23 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by WASPAA 2023. Demo: https://salu133445.github.io/clipsonic/

  5. arXiv:2210.14661  [pdf, other]

    cs.SD cs.LG eess.AS

    Full-band General Audio Synthesis with Score-based Diffusion

    Authors: Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà

    Abstract: Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds. Such models operate on band-limited signals and, as a result of an autoregressive approach, they are typically conformed by pre-trained latent encoders and/or several cascaded modules. In this work, we propose a d…

    Submitted 26 October, 2022; originally announced October 2022.

  6. arXiv:2210.12635  [pdf, other]

    cs.SD cs.AI eess.AS

    Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation

    Authors: Xiaoyu Liu, Xu Li, Joan Serrà

    Abstract: Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a mixture of multiple talkers given an enrollment utterance of that speaker. A typical deep learning TSS framework consists of an upstream model that obtains enrollment speaker embeddings and a downstream model that performs the separation conditioned on the embeddings. In this paper, we look into several impo…

    Submitted 26 October, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: Submitted version to ICASSP 2023

  7. arXiv:2210.12108  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial Permutation Invariant Training for Universal Sound Separation

    Authors: Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà

    Abstract: Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so. In this work, we complement PIT with adversarial losses but find it challenging with the standard formulation used in speech source separation. We overcome this challenge with a novel I-replacement context-bas…

    Submitted 6 March, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Demo page: http://jordipons.me/apps/adversarialPIT/, Accepted at ICASSP-23

  8. arXiv:2206.03065  [pdf, other]

    cs.SD cs.LG eess.AS

    Universal Speech Enhancement with Score-based Diffusion

    Authors: Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini

    Abstract: Removing background noise from speech audio has been the subject of considerable effort, especially in recent years due to the rise of virtual communication and amateur recordings. Yet background noise is not the only unpleasant disturbance that can prevent intelligibility: reverb, clipping, codec artifacts, problematic equalization, limited bandwidth, or inconsistent loudness are equally disturbi…

    Submitted 16 September, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: 24 pages, 6 figures; includes appendix; examples in https://serrjoa.github.io/projects/universe/

  9. arXiv:2202.07968  [pdf, other]

    cs.SD cs.LG eess.AS

    On loss functions and evaluation metrics for music source separation

    Authors: Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà

    Abstract: We investigate which loss functions provide better separations via benchmarking an extensive set of those for music source separation. To that end, we first survey the most representative audio source separation losses we identified, to later consistently benchmark them in a controlled experimental setup. We also explore using such losses as evaluation metrics, via cross-correlating them with the…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted to ICASSP 2022

  10. arXiv:2111.11773  [pdf, other]

    cs.SD cs.AI eess.AS

    Upsampling layers for music source separation

    Authors: Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini

    Abstract: Upsampling artifacts are caused by problematic upsampling layers and due to spectral replicas that emerge while upsampling. Also, depending on the used upsampling layer, such artifacts can either be tonal artifacts (additive high-frequency noise) or filtering artifacts (subtractive, attenuating some bands). In this work we investigate the practical implications of having upsampling artifacts in t…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: Demo page: http://www.jordipons.me/apps/upsamplers/

  11. arXiv:2109.15188  [pdf, other]

    cs.SD cs.IR eess.AS

    Assessing Algorithmic Biases for Musical Version Identification

    Authors: Furkan Yesiler, Marius Miron, Joan Serrà, Emilia Gómez

    Abstract: Version identification (VI) systems now offer accurate and scalable solutions for detecting different renditions of a musical composition, allowing the use of these systems in industrial applications and throughout the wider music ecosystem. Such use can have an important impact on various stakeholders regarding recognition and financial benefits, including how royalties are circulated for digital…

    Submitted 30 September, 2021; originally announced September 2021.

  12. Audio-based Musical Version Identification: Elements and Challenges

    Authors: Furkan Yesiler, Guillaume Doras, Rachel M. Bittner, Christopher J. Tralie, Joan Serrà

    Abstract: In this article, we aim to provide a review of the key ideas and approaches proposed in 20 years of scientific literature around musical version identification (VI) research and connect them to current practice. For more than a decade, VI systems suffered from the accuracy-scalability trade-off, with attempts to increase accuracy that typically resulted in cumbersome, non-scalable systems. Recent…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted to be published in IEEE Signal Processing Magazine

  13. arXiv:2107.03100  [pdf, other]

    cs.SD eess.AS

    Adversarial Auto-Encoding for Packet Loss Concealment

    Authors: Santiago Pascual, Joan Serrà, Jordi Pons

    Abstract: Communication technologies like voice over IP operate under constrained real-time conditions, with voice packets being subject to delays and losses from the network. In such cases, the packet loss concealment (PLC) algorithm reconstructs missing frames until a new real packet is received. Recently, autoregressive deep neural networks have been shown to surpass the quality of signal processing meth…

    Submitted 8 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

  14. arXiv:2104.04143  [pdf, other]

    cs.SD eess.AS physics.soc-ph

    Heaps' Law and Vocabulary Richness in the History of Classical Music Harmony

    Authors: Marc Serra-Peralta, Joan Serrà, Álvaro Corral

    Abstract: Music is a fundamental human construct, and harmony provides the building blocks of musical language. Using the Kunstderfuge corpus of classical music, we analyze the historical evolution of the richness of harmonic vocabulary of 76 classical composers, covering almost 6 centuries. Such corpus comprises about 9500 pieces, resulting in more than 5 million tokens of music codewords. The fulfilment o…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 12 pages

  15. arXiv:2104.03725  [pdf, other]

    cs.LG cs.AI cs.CV cs.SD eess.AS

    On tuning consistent annealed sampling for denoising score matching

    Authors: Joan Serrà, Santiago Pascual, Jordi Pons

    Abstract: Score-based generative models provide state-of-the-art quality for image and audio synthesis. Sampling from these models is performed iteratively, typically employing a discretized series of noise levels and a predefined scheme. In this note, we first overview three common sampling schemes for models trained with denoising score matching. Next, we focus on one of them, consistent annealed sampling…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 3 pages and 1 figure

  16. arXiv:2101.02098  [pdf, other]

    cs.SD cs.IR eess.AS

    Investigating the efficacy of music version retrieval systems for setlist identification

    Authors: Furkan Yesiler, Emilio Molina, Joan Serrà, Emilia Gómez

    Abstract: The setlist identification (SLI) task addresses a music recognition use case where the goal is to retrieve the metadata and timestamps for all the tracks played in live music events. Due to various musical and non-musical changes in live performances, developing automatic SLI systems is still a challenging task that, despite its industrial relevance, has been under-explored in the academic literat…

    Submitted 6 January, 2021; originally announced January 2021.

  17. arXiv:2010.14356  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Upsampling artifacts in neural audio synthesis

    Authors: Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà

    Abstract: A number of recent advances in neural audio synthesis rely on upsampling layers, which can introduce undesired artifacts. In computer vision, upsampling artifacts have been studied and are known as checkerboard artifacts (due to their characteristic visual pattern). However, their effect has been overlooked so far in audio processing. Here, we address this gap by studying this problem from the aud…

    Submitted 9 February, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: In proceedings of ICASSP2021. Code: https://github.com/DolbyLaboratories/neural-upsampling-artifacts-audio

  18. arXiv:2010.10291  [pdf, other]

    eess.AS cs.SD

    Automatic multitrack mixing with a differentiable mixing console of neural audio effects

    Authors: Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

    Abstract: Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weig…

    Submitted 20 October, 2020; originally announced October 2020.

  19. arXiv:2010.03284  [pdf, other]

    cs.SD cs.LG eess.AS

    Less is more: Faster and better music version identification with embedding distillation

    Authors: Furkan Yesiler, Joan Serrà, Emilia Gómez

    Abstract: Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs). By learning to encode entire recordings into plain vector embeddings, recent systems have made significant progress in bridging the gap between accuracy and scalability, which has been a key challenge for nearly two decades. In this work, we propose to further…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to the 21st International Society for Music Information Retrieval Conference (ISMIR 2020)

  20. arXiv:2010.00368  [pdf, other]

    eess.AS cs.LG cs.SD

    SESQA: semi-supervised learning for speech quality assessment

    Authors: Joan Serrà, Jordi Pons, Santiago Pascual

    Abstract: Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3…

    Submitted 8 February, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Long version (with appendix) of the paper with the same title accepted for ICASSP2021

  21. arXiv:1910.12551  [pdf, other]

    cs.SD cs.LG eess.AS

    Accurate and Scalable Version Identification Using Musically-Motivated Embeddings

    Authors: Furkan Yesiler, Joan Serrà, Emilia Gómez

    Abstract: The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE…

    Submitted 13 April, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

  22. arXiv:1909.11480  [pdf, other]

    cs.LG stat.ML

    Input complexity and out-of-distribution detection with likelihood-based generative models

    Authors: Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque

    Abstract: Likelihood-based generative models are a promising resource to detect out-of-distribution (OOD) inputs which could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we pose that this problem is due to…

    Submitted 17 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: Accepted for ICLR2020

  23. arXiv:1906.00794  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

    Authors: Joan Serrà, Santiago Pascual, Carlos Segura

    Abstract: End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations. Voice conversion, in which a model has to impersonate a speaker in a recording, is one of those situations. In this paper, we propose Blow, a single-scale normalizing flow using hypernetwork conditioning to perform many-to-many voice conv…

    Submitted 5 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: Includes appendix. Accepted for NeurIPS2019

  24. arXiv:1904.03418  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards Generalized Speech Enhancement with Generative Adversarial Networks

    Authors: Santiago Pascual, Joan Serrà, Antonio Bonafonte

    Abstract: The speech enhancement task usually consists of removing additive noise or reverberation that partially mask spoken utterances, affecting their intelligibility. However, little attention is drawn to other, perhaps more aggressive signal distortions like clipping, chunk elimination, or frequency-band removal. Such distortions can have a large impact not only on intelligibility, but also on naturaln…

    Submitted 6 April, 2019; originally announced April 2019.

  25. arXiv:1904.03416  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

    Authors: Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

    Abstract: Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This…

    Submitted 6 April, 2019; originally announced April 2019.

  26. arXiv:1811.12507  [pdf, other]

    stat.ML cs.LG

    Regression and Classification by Zonal Kriging

    Authors: Jean Serra, Jesus Angulo, B Ravi Kiran

    Abstract: Consider a family $Z=\{(\boldsymbol{x}_{i},y_{i}),\ 1\leq i\leq N\}$ of $N$ pairs of vectors $\boldsymbol{x}_{i} \in \mathbb{R}^d$ and scalars $y_{i}$ that we aim to predict for a new sample vector $\boldsymbol{x}_0$. Kriging models $y$ as a sum of a deterministic function $m$, a drift which depends on the point $\boldsymbol{x}$, and a random function $z$ with zero mean. The zonality hypothesis interpre…

    Submitted 11 December, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Technical Report

  27. arXiv:1810.10274  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Training neural audio classifiers with few data

    Authors: Jordi Pons, Joan Serrà, Xavier Serra

    Abstract: We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections. In particular, we study whether (i) a naive regularization of the solution space, (ii) prototypical networks, (iii) transfer learning, or (iv) their combination, can foster deep learning models to better leverage a small amount of training examples. To this en…

    Submitted 3 November, 2018; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: Code: https://github.com/jordipons/neural-classifiers-with-few-audio/

  28. arXiv:1808.10687  [pdf, other]

    cs.SD cs.LG eess.AS

    Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

    Authors: Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez

    Abstract: Most methods of voice restoration for patients suffering from aphonia either produce whispered or monotone speech. Apart from intelligibility, this type of speech lacks expressiveness and naturalness due to the absence of pitch (whispered speech) or artificial generation of it (monotone speech). Existing techniques to restore prosodic information typically combine a vocoder, which parameterises th…

    Submitted 5 November, 2018; v1 submitted 31 August, 2018; originally announced August 2018.

  29. arXiv:1808.10678  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Attention Linguistic-Acoustic Decoder

    Authors: Santiago Pascual, Antonio Bonafonte, Joan Serrà

    Abstract: The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure tends to make them slow to train and to sample from. In this work, w…

    Submitted 5 November, 2018; v1 submitted 31 August, 2018; originally announced August 2018.

  30. arXiv:1806.03192  [pdf]

    cs.AI cs.HC

    Assessing the impact of machine intelligence on human behaviour: an interdisciplinary endeavour

    Authors: Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, et al. (1 additional author not shown)

    Abstract: This document contains the outcome of the first Human behaviour and machine intelligence (HUMAINT) workshop that took place 5-6 March 2018 in Barcelona, Spain. The workshop was organized in the context of a new research programme at the Centre for Advanced Studies, Joint Research Centre of the European Commission, which focuses on studying the potential impact of artificial intelligence on human b…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: Proceedings of 1st HUMAINT (Human Behaviour and Machine Intelligence) workshop, Barcelona, Spain, March 5-6, 2018, edited by European Commission, Seville, 2018, JRC111773 https://ec.europa.eu/jrc/communities/community/humaint/document/assessing-impact-machine-intelligence-human-behaviour-interdisciplinary. arXiv admin note: text overlap with arXiv:1409.3097 by other authors

    Report number: JRC111773

  31. arXiv:1806.02701  [pdf, other]

    cs.CR

    There goes Wally: Anonymously sharing your location gives you away

    Authors: Apostolos Pyrgelis, Nicolas Kourtellis, Ilias Leontiadis, Joan Serrà, Claudio Soriente

    Abstract: With current technology, a number of entities have access to user mobility traces at different levels of spatio-temporal granularity. At the same time, users frequently reveal their location through different means, including geo-tagged social media posts and mobile app usage. Such leaks are often bound to a pseudonym or a fake identity in an attempt to preserve one's privacy. In this work, we inv…

    Submitted 15 November, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

    Comments: To appear in the 2018 IEEE International Conference on Big Data

  32. arXiv:1805.03908  [pdf, other]

    cs.LG cs.NE stat.ML

    Towards a universal neural network encoder for time series

    Authors: Joan Serrà, Santiago Pascual, Alexandros Karatzoglou

    Abstract: We study the use of a time series encoder to learn representations that are useful on data set types on which it has not been trained. The encoder is formed of a convolutional neural network whose temporal output is summarized by a convolutional attention mechanism. This way, we obtain a compact, fixed-length representation from longer, variable-length time series. We evaluate the performance…

    Submitted 10 May, 2018; originally announced May 2018.

    Comments: 10 pages, 2 figures

  33. The CTTC 5G end-to-end experimental platform: Integrating heterogeneous wireless/optical networks, distributed cloud, and IoT devices

    Authors: Raul Muñóz, Josep Mangues, Ricard Vilalta, Christos Verikoukis, Jesús Alonso-Zarate, Nikolaos Bartzoudis, Apostolos Georgiadis, Miquel Payaró, Ana Pérez-Neira, Ramon Casellas, Ricardo Martínez, José Núñez-Martínez, Manuel Requena-Esteso, David Pubill, Oriol Font-Bach, Pol Henarejos, Jordi Serra, Francisco Vazquez-Gallego

    Abstract: The Internet of Things (IoT) will facilitate a wide variety of applications in different domains, such as smart cities, smart grids, industrial automation (Industry 4.0), smart driving, assistance of the elderly, and home automation. Billions of heterogeneous smart devices with different application requirements will be connected to the networks and will generate huge aggregated volumes of data th…

    Submitted 20 March, 2018; originally announced March 2018.

  34. arXiv:1801.01423  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Overcoming catastrophic forgetting with hard attention to the task

    Authors: Joan Serrà, Dídac Surís, Marius Miron, Alexandros Karatzoglou

    Abstract: Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A h…

    Submitted 29 May, 2018; v1 submitted 4 January, 2018; originally announced January 2018.

    Comments: Includes appendix. Accepted for ICML 2018

  35. arXiv:1712.07120  [pdf, other]

    cs.HC

    Continual Prediction of Notification Attendance with Classical and Deep Network Approaches

    Authors: Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà

    Abstract: We investigate to what extent mobile use patterns can predict -- at the moment it is posted -- whether a notification will be clicked within the next 10 minutes. We use a data set containing the detailed mobile phone usage logs of 279 users, who over the course of 5 weeks received 446,268 notifications from a variety of apps. Besides using classical gradient-boosted trees, we demonstrate how to ma…

    Submitted 19 December, 2017; originally announced December 2017.

    Comments: 15 pages

  36. arXiv:1712.06340  [pdf, other]

    cs.SD cs.LG eess.AS

    Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

    Authors: Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn

    Abstract: Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems into new, low resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data. W…

    Submitted 18 December, 2017; originally announced December 2017.

  37. arXiv:1709.10299  [pdf, other]

    cs.HC

    MobInsight: A Framework Using Semantic Neighborhood Features for Localized Interpretations of Urban Mobility

    Authors: Souneil Park, Joan Serra, Enrique Frias Martinez, Nuria Oliver

    Abstract: Collective urban mobility embodies the residents' local insights on the city. Mobility practices of the residents are produced from their spatial choices, which involve various considerations such as the atmosphere of destinations, distance, past experiences, and preferences. The advances in mobile computing and the rise of geo-social platforms have provided the means for capturing the mobility pr…

    Submitted 29 September, 2017; originally announced September 2017.

  38. arXiv:1706.03993  [pdf, other]

    cs.LG cs.AI cs.IR cs.NE

    Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

    Authors: Joan Serrà, Alexandros Karatzoglou

    Abstract: Recommendation algorithms that incorporate techniques from deep learning are becoming increasingly popular. Due to the structure of the data coming from recommendation domains (i.e., one-hot-encoded vectors of item preferences), these algorithms tend to have large input and output dimensionalities that dominate their overall size. This makes them difficult to train, due to the limited memory of gr…

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

  39. Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions

    Authors: Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà

    Abstract: We present a practical approach for processing mobile sensor time series data for continual deep learning predictions. The approach comprises data cleaning, normalization, capping, time-based compression, and finally classification with a recurrent neural network. We demonstrate the effectiveness of the approach in a case study with 279 participants. On the basis of sparse sensor events, the netwo…

    Submitted 17 May, 2017; originally announced May 2017.

    Comments: 6 pages, 3 figures, 3 tables

    Journal ref: DeepMobile Workshop, MobileHCI 2017

  40. arXiv:1704.05249  [pdf, other]

    cs.LG cs.NI eess.SY

    Hot or not? Forecasting cellular network hot spots using sector performance indicators

    Authors: Joan Serrà, Ilias Leontiadis, Alexandros Karatzoglou, Konstantina Papagiannaki

    Abstract: To manage and maintain large-scale cellular networks, operators need to know which sectors underperform at any given time. For this purpose, they use the so-called hot spot score, which is the result of a combination of multiple network measurements and reflects the instantaneous overall performance of individual sectors. While operators have a good understanding of the current performance of a ne…

    Submitted 18 April, 2017; originally announced April 2017.

    Comments: Accepted for publication at ICDE 2017 - Industrial Track

  41. arXiv:1703.09452  [pdf, other]

    cs.LG cs.NE cs.SD

    SEGAN: Speech Enhancement Generative Adversarial Network

    Authors: Santiago Pascual, Antonio Bonafonte, Joan Serrà

    Abstract: Current speech enhancement techniques operate on the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of gener…

    Submitted 9 June, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: 5 pages, 4 figures, accepted in INTERSPEECH 2017

  42. arXiv:1703.05430  [pdf, other]

    stat.ML cs.LG

    Cost-complexity pruning of random forests

    Authors: Kiran Bangalore Ravi, Jean Serra

    Abstract: Random forests perform bootstrap-aggregation by sampling the training samples with replacement. This enables the evaluation of out-of-bag error which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training samples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error first…

    Submitted 19 July, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Previous version in proceedings of ISMM 2017

  43. arXiv:1511.04986  [pdf, other]

    cs.LG cs.NE

    A genetic algorithm to discover flexible motifs with support

    Authors: Joan Serrà, Aleksandar Matic, Josep Luis Arcos, Alexandros Karatzoglou

    Abstract: Finding repeated patterns or motifs in a time series is an important unsupervised task that has still a number of open issues, starting by the definition of motif. In this paper, we revise the notion of motif support, characterizing it as the number of patterns or repetitions that define a motif. We then propose GENMOTIF, a genetic algorithm to discover motifs with support which, at the same time,…

    Submitted 5 December, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: 9 pages, 8 figures, code available at https://github.com/joansj/genmotif

  44. Ranking and significance of variable-length similarity-based time series motifs

    Authors: Joan Serrà, Isabel Serra, Álvaro Corral, Josep Lluis Arcos

    Abstract: The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their…

    Submitted 6 March, 2015; originally announced March 2015.

    Comments: 20 pages, 10 figures

    Journal ref: Expert Systems with Applications 55: 452-460. Aug 2016

  45. Particle swarm optimization for time series motif discovery

    Authors: Joan Serrà, Josep Lluis Arcos

    Abstract: Efficiently finding similar segments or motifs in time series data is a fundamental task that, due to the ubiquity of these data, is present in a wide range of domains and situations. Because of this, countless solutions have been devised but, to date, none of them seems to be fully satisfactory and flexible. In this article, we propose an innovative standpoint and present a solution coming from i…

    Submitted 29 January, 2015; originally announced January 2015.

    Comments: 12 pages, 9 figures, 2 tables

    Journal ref: Knowledge-Based Systems 92: 127-137. Jan 2016

  46. arXiv:1401.3973  [pdf, other]

    cs.LG cs.CV stat.ML

    An Empirical Evaluation of Similarity Measures for Time Series Classification

    Authors: Joan Serrà, Josep Lluis Arcos

    Abstract: Time series are ubiquitous, and a measure to assess their similarity is a core part of many computational systems. In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems. Because of this importance, countless approaches to estimate time series similarity have been proposed. However, there is a lack of comparative studies using em…

    Submitted 16 January, 2014; originally announced January 2014.

    Comments: 28 pages, 5 figures, 3 tables

    Journal ref: Knowledge-Based Systems 67: 305-314, 2014

  47. arXiv:1205.5651  [pdf, other]

    cs.SD cs.IR cs.MM physics.soc-ph stat.AP

    Measuring the evolution of contemporary western popular music

    Authors: Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, Josep Lluis Arcos

    Abstract: Popular music is a key cultural expression that has captured listeners' attention for ages. Many of the structural regularities underlying musical discourse are yet to be discovered and, accordingly, their historical evolution remains formally unknown. Here we unveil a number of patterns and metrics characterizing the generic usage of primary musical facets such as pitch, timbre, and loudness in c…

    Submitted 25 May, 2012; originally announced May 2012.

    Comments: Supplementary materials not included. Please see the journal reference or contact the authors

    Journal ref: Scientific Reports 2, 521 (2012)

  48. arXiv:1108.6003  [pdf, ps, other]

    cs.IR cs.MM cs.SI physics.data-an stat.ML

    Characterization and exploitation of community structure in cover song networks

    Authors: Joan Serrà, Massimiliano Zanin, Perfecto Herrera, Xavier Serra

    Abstract: The use of community detection algorithms is explored within the framework of cover song identification, i.e. the automatic detection of different audio renditions of the same underlying musical piece. Until now, this task has been posed as a typical query-by-example task, where one submits a query song and the system retrieves a list of possible matches ranked by their similarity to the query. In…

    Submitted 12 September, 2011; v1 submitted 29 August, 2011; originally announced August 2011.

    Journal ref: Pattern Recognition Letters 33(9): 1032-1041, 2012