Showing 1–8 of 8 results for author: Tzanetakis, G

  1. arXiv:2404.08813  [pdf, other]

    cs.HC cs.SD eess.AS

    Interactive Sonification for Health and Energy using ChucK and Unity

    Authors: Yichun Zhao, George Tzanetakis

    Abstract: Sonification can provide valuable insights about data but most existing approaches are not designed to be controlled by the user in an interactive fashion. Interactions enable the designer of the sonification to more rapidly experiment with sound design and allow the sonification to be modified in real-time by interacting with various control parameters. In this paper, we describe two case studies…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: In the Proceedings of the Conference on Sonification of Health and Environmental Data (SoniHED 2022). http://dx.doi.org/10.5281/zenodo.7243950

    Journal ref: Conference on Sonification of Health and Environmental Data (SoniHED 2022)

  2. arXiv:2208.02337  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Estimating Visual Information From Audio Through Manifold Learning

    Authors: Fabrizio Pedersoli, Dryden Wiebe, Amin Banitalebi, Yong Zhang, George Tzanetakis, Kwang Moo Yi

    Abstract: We propose a new framework for extracting visual information about a scene using only audio signals. Audio-based methods can overcome some of the limitations of vision-based methods, i.e., they do not require "line-of-sight", are robust to occlusions and changes in illumination, and can function as a backup in case vision/lidar sensors fail. Therefore, audio-based methods can be useful even for app…

    Submitted 13 September, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

  3. arXiv:2203.03022  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  4. arXiv:2104.12922  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    One Billion Audio Sounds from GPU-enabled Modular Synthesis

    Authors: Joseph Turian, Jordie Shier, George Tzanetakis, Kirk McNally, Max Henry

    Abstract: We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714MHz) on a single GPU.…

    Submitted 20 July, 2021; v1 submitted 26 April, 2021; originally announced April 2021.
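
As a rough sanity check on the throughput figure quoted in the abstract above, the sketch below converts "16200x faster than real-time" into samples per second and estimates the total duration of the corpus. It assumes a 44.1 kHz sample rate, which is not stated in this listing; all variable names are illustrative.

```python
# Assumption: audio is rendered at 44.1 kHz (not stated in the listing).
SAMPLE_RATE_HZ = 44_100
REALTIME_FACTOR = 16_200      # "16200x faster than real-time" (from the abstract)
NUM_SOUNDS = 1_000_000_000    # "1 billion" sounds (from the abstract)
SOUND_LENGTH_S = 4            # "4-second" sounds (from the abstract)

# Samples generated per wall-clock second at the stated speed-up.
samples_per_second = SAMPLE_RATE_HZ * REALTIME_FACTOR
throughput_mhz = samples_per_second / 1e6

# Total audio duration of the corpus, in 365.25-day years.
total_audio_years = NUM_SOUNDS * SOUND_LENGTH_S / (365.25 * 24 * 3600)

print(f"{throughput_mhz:.2f} MHz")
print(f"{total_audio_years:.1f} years of audio")
```

Under that sample-rate assumption, the product comes to roughly 714 MHz, which is consistent with the figure given in parentheses in the abstract.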

  5. arXiv:2002.05511  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Deep Autotuner: a Pitch Correcting Network for Singing Performances

    Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim

    Abstract: We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: arXiv admin note: text overlap with arXiv:1902.00956

    Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

  6. arXiv:1902.00956  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances

    Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Lijiang Guo, Aswin Sivaraman, Minje Kim

    Abstract: We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score exists for either the vocals or the accompaniment: it predicts the amount of correction from the relationship between the spectral contents of the vocal and accompani…

    Submitted 3 February, 2019; originally announced February 2019.

  7. arXiv:1705.07175  [pdf, ps, other]

    cs.DC cs.CV cs.LG cs.NE

    Espresso: Efficient Forward Propagation for BCNNs

    Authors: Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi

    Abstract: There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is…

    Submitted 7 March, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: 10 pages, 4 figures

    MSC Class: 62M45; ACM Class: I.2.6

  8. arXiv:1307.0589  [pdf, other]

    cs.LG cs.DB cs.SD

    The Orchive: Data mining a massive bioacoustic archive

    Authors: Steven Ness, Helena Symonds, Paul Spong, George Tzanetakis

    Abstract: The Orchive is a large collection of over 20,000 hours of audio recordings from the OrcaLab research facility located off the northern tip of Vancouver Island. It contains recorded orca vocalizations from 1980 to the present and is one of the largest resources of bioacoustic data in the world. We have developed a web-based interface that allows researchers to listen to these recordings, v…

    Submitted 2 July, 2013; originally announced July 2013.

    Comments: ICML 2013 Workshop on Machine Learning for Bioacoustics