Showing 1–20 of 20 results for author: Bryan, N J

Searching in archive eess.
  1. arXiv:2403.10493

    cs.SD eess.AS eess.SP

    MusicHiFi: Fast High-Fidelity Stereo Vocoding

    Authors: Ge Zhu, Juan-Pablo Caceres, Zhiyao Duan, Nicholas J. Bryan

    Abstract: Diffusion-based audio and music generation models commonly generate music by constructing an image representation of audio (e.g., a mel-spectrogram) and then converting it to audio using a phase reconstruction model or vocoder. Typical vocoders, however, produce monophonic audio at lower resolutions (e.g., 16-24 kHz), which limits their effectiveness. We propose MusicHiFi -- an efficient high-fide…

    Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  2. arXiv:2403.00977

    cs.SD eess.AS

    Scaling Up Adaptive Filter Optimizers

    Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: We introduce a new online adaptive filtering method called supervised multi-step adaptive filters (SMS-AF). Our method uses neural networks to control or optimize linear multi-delay or multi-channel frequency-domain filters and can flexibly scale up performance at the cost of increased compute -- a property rarely addressed in the AF literature, but critical for many applications. To do so, we ext…

    Submitted 1 March, 2024; originally announced March 2024.

  3. arXiv:2401.12179

    cs.SD cs.AI cs.LG eess.AS

    DITTO: Diffusion Inference-Time T-Optimization for Music Generation

    Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

    Abstract: We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework for controlling pre-trained text-to-music diffusion models at inference time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate…

    Submitted 3 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Oral at ICML 2024

  4. arXiv:2311.07069

    cs.SD eess.AS

    Music ControlNet: Multiple Time-varying Controls for Music Generation

    Authors: Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan

    Abstract: Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less suitable for precise control over time-varying attributes such as the positions of beats in time or the changing dynamics of the music. We propose Music ControlN…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 11 pages, 4 figures, 5 tables, Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  5. arXiv:2209.09955

    cs.SD eess.AS

    Meta-Learning for Adaptive Filters with Higher-Order Frequency Dependencies

    Authors: Junkai Wu, Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: Adaptive filters are applicable to many signal processing tasks including acoustic echo cancellation, beamforming, and more. Adaptive filters are typically controlled using algorithms such as least-mean squares (LMS), recursive least squares (RLS), or Kalman filter updates. Such models are often applied in the frequency domain, assume frequency-independent processing, and do not exploit higher-order…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Source code and audio examples: https://jmcasebeer.github.io/metaaf/higher-order

  6. arXiv:2207.08759

    cs.SD eess.AS

    Style Transfer of Audio Effects with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

    Abstract: We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effe…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Preprint. To appear in the Journal of the Audio Engineering Society

  7. arXiv:2204.11942

    cs.SD eess.AS eess.SP

    Meta-AF: Meta-Learning for Adaptive Filters

    Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: Adaptive filtering algorithms are pervasive throughout signal processing and have had a material impact on a wide variety of domains including audio processing, telecommunications, biomedical sensing, astrophysics and cosmology, seismology, and many more. Adaptive filters typically operate via specialized online, iterative optimization methods such as least-mean squares or recursive least squares…

    Submitted 21 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted to ACM/IEEE TASLP. Source code and audio examples: https://jmcasebeer.github.io/projects/metaaf

  8. arXiv:2110.09600

    cs.SD eess.AS

    Who calls the shots? Rethinking Few-Shot Learning for Audio

    Authors: Yu Wang, Nicholas J. Bryan, Justin Salamon, Mark Cartwright, Juan Pablo Bello

    Abstract: Few-shot learning aims to train models that can recognize novel classes given just a handful of labeled examples, known as the support set. While the field has seen notable advances in recent years, these advances have often focused on multi-class image classification. Audio, in contrast, is often multi-label due to overlapping sounds, resulting in unique properties such as polyphony and signal-to-noise rat…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: WASPAA 2021

  9. arXiv:2110.04284

    cs.SD eess.AS

    Auto-DSP: Learning to Optimize Acoustic Echo Cancellers

    Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: Adaptive filtering algorithms are commonplace in signal processing and have wide-ranging applications from single-channel denoising to multi-channel acoustic echo cancellation and adaptive beamforming. Such algorithms typically operate via specialized online, iterative optimization methods and have achieved tremendous success, but require expert knowledge, are slow to develop, and are difficult to…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted to the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Source code and audio examples: https://jmcasebeer.github.io/projects/auto-dsp/

  10. arXiv:2110.02360

    eess.AS cs.SD

    Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

    Authors: Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

    Abstract: Modifying the pitch and timing of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis. Thus far, methods for pitch-shifting and time-stretching that use digital signal processing (DSP) have been favored over deep learning approaches due to their speed and relatively higher quality.…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  11. arXiv:2107.13634

    eess.AS cs.SD

    Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization

    Authors: Haici Yang, Shivani Firodiya, Nicholas J. Bryan, Minje Kim

    Abstract: The task of manipulating the level and/or effects of individual instruments to recompose a mixture of recordings, or remixing, is common across a variety of applications such as music production, audio-visual post-production, podcasts, and more. This process, however, traditionally requires access to individual source recordings, restricting the creative process. To work around this, source separa…

    Submitted 22 October, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

  12. arXiv:2105.04752

    eess.AS cs.LG cs.SD eess.SP

    Differentiable Signal Processing With Black-Box Audio Effects

    Authors: Marco A. Martínez Ramírez, Oliver Wang, Paris Smaragdis, Nicholas J. Bryan

    Abstract: We present a data-driven approach to automate audio signal processing by incorporating stateful third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect parameters to perform the desired signal manipulation, requiring only input-target paired audio data as supervision. To train our network with non-differentiable blac…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: Presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), June 2021. Source code, demo and audio examples: https://mchijmma.github.io/DeepAFx/

  13. arXiv:2102.08328

    eess.AS cs.LG cs.SD

    Context-Aware Prosody Correction for Text-Based Speech Editing

    Authors: Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

    Abstract: Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript. A major drawback of current systems, however, is that edited recordings often sound unnatural because of prosody mismatches around edited regions. In our work, we propose a new context-aware method for more natural sounding text-bas…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: To appear in proceedings of ICASSP 2021

  14. arXiv:2008.03729

    cs.SD cs.IR cs.LG eess.AS

    Metric Learning vs Classification for Disentangled Music Representation Learning

    Authors: Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

    Abstract: Deep representation learning offers a powerful paradigm for mapping input data onto an organized embedding space and is useful for many music information retrieval tasks. Two central methods for representation learning include deep metric learning and classification, both having the same goal of learning a representation that can generalize well across tasks. Along with generalization, the emergin…

    Submitted 12 August, 2020; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 21st International Society for Music Information Retrieval Conference (ISMIR 2020)

  15. arXiv:2008.03720

    eess.AS cs.LG cs.SD

    Disentangled Multidimensional Metric Learning for Music Similarity

    Authors: Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

    Abstract: Music similarity search is useful for a variety of creative tasks such as replacing one music recording with another recording with a similar "feel", a common task in video editing. For this task, it is typically necessary to define a similarity metric to compare one recording to another. Music similarity, however, is hard to define and depends on multiple simultaneous notions of similarity (i.e.…

    Submitted 12 August, 2020; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  16. arXiv:2008.03388

    eess.AS cs.LG cs.SD

    Controllable Neural Prosody Synthesis

    Authors: Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore

    Abstract: Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators. However, these systems lack intuitive user controls over prosody, making them unable to rectify prosody errors (e.g., misplaced emphases and contextually inappropriate emotions) or generate prosodies with diverse speaker excitement levels and emotions. We…

    Submitted 11 August, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: To appear in proceedings of INTERSPEECH 2020

  17. arXiv:2008.02791

    cs.SD eess.AS

    Few-Shot Drum Transcription in Polyphonic Music

    Authors: Yu Wang, Justin Salamon, Mark Cartwright, Nicholas J. Bryan, Juan Pablo Bello

    Abstract: Data-driven approaches to automatic drum transcription (ADT) are often limited to a predefined, small vocabulary of percussion instrument classes. Such models cannot recognize out-of-vocabulary classes nor are they able to adapt to finer-grained vocabularies. In this work, we address open vocabulary ADT by introducing few-shot learning to the task. We train a Prototypical Network on a synthetic da…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: ISMIR 2020 camera-ready

  18. arXiv:2001.04460

    eess.AS cs.SD

    A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

    Authors: Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin

    Abstract: Many audio processing tasks require perceptual assessment. The "gold standard" of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a…

    Submitted 18 May, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: Dataset, code and sound examples can be found at https://pixl.cs.princeton.edu/pubs/Manocha_2020_ADP/

  19. arXiv:1911.06245

    cs.SD cs.GR cs.MM eess.AS

    Scene-Aware Audio Rendering via Deep Acoustic Analysis

    Authors: Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha

    Abstract: We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models. Given the captured audio and an approximate geometric model of a real-world room, we present a novel learning-based method to estimate its acoustic material properties. Our approach is based on de…

    Submitted 9 February, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

    Comments: Accepted to IEEE VR 2020 Journal Track (TVCG)

    Journal ref: IEEE Transactions on Visualization and Computer Graphics (Volume: 26, Issue: 5, May 2020)

  20. arXiv:1909.03642

    cs.SD eess.AS

    Impulse Response Data Augmentation and Deep Neural Networks for Blind Room Acoustic Parameter Estimation

    Authors: Nicholas J. Bryan

    Abstract: The reverberation time (T60) and the direct-to-reverberant ratio (DRR) are commonly used to characterize room acoustic environments. Both parameters can be measured from an acoustic impulse response (AIR) or using blind estimation methods that perform estimation directly from speech. When neural networks are used for blind estimation, however, a large realistic dataset is needed, which is expensiv…

    Submitted 21 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Under Review
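
Entry 20 notes that T60 and DRR can both be measured from an acoustic impulse response (AIR). The Python below is a minimal sketch of that non-blind measurement, not the DNN-based blind estimator the paper proposes. It assumes two common conventions: Schroeder backward integration with a line fit between -5 and -25 dB (a T20-style fit extrapolated to -60 dB) for T60, and a window of +/- 2.5 ms around the strongest peak as the direct path for DRR. The function names and default parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def schroeder_edc_db(h):
    """Energy decay curve in dB, normalized to 0 dB at the first sample."""
    energy = np.asarray(h, dtype=float) ** 2
    # Schroeder backward integration: EDC[n] = sum of h[k]**2 for k >= n
    edc = np.cumsum(energy[::-1])[::-1]
    # Tiny offset avoids log10(0) when the response ends in exact zeros
    return 10.0 * np.log10(edc / edc[0] + 1e-300)


def estimate_t60(h, fs, start_db=-5.0, stop_db=-25.0):
    """Fit a line to the EDC between start_db and stop_db, extrapolate to -60 dB."""
    edc_db = schroeder_edc_db(h)
    # The EDC is non-increasing, so these indices form one contiguous segment
    idx = np.nonzero((edc_db <= start_db) & (edc_db >= stop_db))[0]
    if idx.size < 2:
        raise ValueError("impulse response does not cover the fit range")
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # decay rate in dB/s (< 0)
    return -60.0 / slope


def estimate_drr(h, fs, direct_ms=2.5):
    """Direct-to-reverberant ratio in dB.

    Direct energy: samples within +/- direct_ms of the largest-magnitude peak.
    Reverberant energy: every sample after that window.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    half = int(round(direct_ms * 1e-3 * fs))
    lo, hi = max(peak - half, 0), min(peak + half + 1, h.size)
    direct = np.sum(h[lo:hi] ** 2)
    reverberant = np.sum(h[hi:] ** 2)
    return 10.0 * np.log10(direct / reverberant)
```

For example, a synthetic AIR built as white noise times an envelope whose energy falls 60 dB in 0.5 s should give an `estimate_t60` result close to 0.5 s. Measured AIRs usually need the noise floor truncated or compensated before the fit, which this sketch omits.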