Showing 1–9 of 9 results for author: Durand, S

  1. arXiv:2310.07160

    cs.SD cs.LG eess.AS

    LLark: A Multimodal Instruction-Following Language Model for Music

    Authors: Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner

    Abstract: Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets…

    Submitted 2 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICML camera-ready version

  2. arXiv:2307.10515

    cs.IT q-bio.NC

    Gaussian Partial Information Decomposition: Bias Correction and Application to High-dimensional Data

    Authors: Praveen Venkatesh, Corbett Bennett, Sam Gale, Tamina K. Ramirez, Greggory Heller, Severine Durand, Shawn Olsen, Stefan Mihalas

    Abstract: Recent advances in neuroscientific experimental techniques have enabled us to simultaneously record the activity of thousands of neurons across multiple brain regions. This has led to a growing need for computational tools capable of analyzing how task-relevant information is represented and communicated between several brain regions. Partial information decompositions (PIDs) have emerged as one s…

    Submitted 19 July, 2023; originally announced July 2023.

  3. Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages

    Authors: Simon Durand, Daniel Stoller, Sebastian Ewert

    Abstract: Lyrics alignment gained considerable attention in recent years. State-of-the-art systems either re-use established speech recognition toolkits, or design end-to-end solutions involving a Connectionist Temporal Classification (CTC) loss. However, both approaches suffer from specific weaknesses: toolkits are known for their complexity, and CTC systems use a loss designed for transcription which can…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

  4. arXiv:2009.09946

    cs.GT cs.MA eess.SY math.OC

    Optimal Targeting in Super-Modular Games

    Authors: Giacomo Como, Stéphane Durand, Fabio Fagnani

    Abstract: We study an optimal targeting problem for super-modular games with binary actions and finitely many players. The considered problem consists in the selection of a subset of players of minimum size such that, when the actions of these players are forced to a controlled value while the others are left to repeatedly play a best response action, the system will converge to the greatest Nash equilibriu…

    Submitted 21 September, 2020; originally announced September 2020.

  5. arXiv:2008.02069

    cs.LG cs.IR cs.SD eess.AS stat.ML

    Data Cleansing with Contrastive Learning for Vocal Note Event Annotations

    Authors: Gabriel Meseguer-Brocal, Rachel Bittner, Simon Durand, Brian Brost

    Abstract: Data cleansing is a well studied strategy for cleaning erroneous labels in datasets, which has not yet been widely adopted in Music Information Retrieval. Previously proposed data cleansing models do not consider structured (e.g. time varying) labels, such as those common to music data. We propose a novel data cleansing model for time-varying, structured labels which exploits the local structure o…

    Submitted 27 April, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: 21st International Society for Music Information Retrieval Conference 11-15 October 2020, Montreal, Canada

  6. arXiv:2007.12581

    eess.AS cs.LG cs.SD

    Dereverberation using joint estimation of dry speech signal and acoustic system

    Authors: Sanna Wager, Keunwoo Choi, Simon Durand

    Abstract: The purpose of speech dereverberation is to remove quality-degrading effects of a time-invariant impulse response filter from the signal. In this report, we describe an approach to speech dereverberation that involves joint estimation of the dry speech signal and of the room impulse response. We explore deep learning models that apply to each task separately, and how these can be combined in a joi…

    Submitted 24 July, 2020; originally announced July 2020.

  7. arXiv:1912.07859

    cs.GT math.OC

    Controlling network coordination games

    Authors: Stephane Durand, Giacomo Como, Fabio Fagnani

    Abstract: We study a novel control problem in the context of network coordination games: the individuation of the smallest set of players capable of driving the system, globally, from one Nash equilibrium to another one. Our main contribution is the design of a randomized algorithm based on a time-reversible Markov chain with provable convergence guarantees.

    Submitted 17 December, 2019; originally announced December 2019.

    Comments: submitted to the conference IFAC

  8. arXiv:1902.06797

    cs.SD cs.LG eess.AS

    End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model

    Authors: Daniel Stoller, Simon Durand, Sebastian Ewert

    Abstract: Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval and intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules including vocal separation and detection in an effort to break down the problem. Furthermore, training…

    Submitted 18 February, 2019; originally announced February 2019.

    Comments: 5 pages (1 for references), 2 figures, 2 tables. Camera-ready version, accepted at the International Conference on Acoustics, Speech, and Signal Processing 2019 (ICASSP)

  9. arXiv:1605.08396

    cs.SD cs.NE

    Robust Downbeat Tracking Using an Ensemble of Convolutional Networks

    Authors: S. Durand, J. P. Bello, B. David, G. Richard

    Abstract: In this paper, we present a novel state-of-the-art system for automatic downbeat tracking from music signals. The audio signal is first segmented into frames which are synchronized at the tatum level of the music. We then extract different kinds of features based on harmony, melody, rhythm and bass content to feed convolutional neural networks that are adapted to take advantage of each feature charac…

    Submitted 26 May, 2016; originally announced May 2016.