Showing 1–50 of 62 results for author: Sandler, M

  1. arXiv:2402.14180  [pdf, other]

    cs.LG

    Linear Transformers are Versatile In-Context Learners

    Authors: Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge

    Abstract: Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in handling more complex problems remains unexplored. In this paper, we prove that any linear transformer maintains an implicit linear model and can be interpreted as…

    Submitted 21 February, 2024; originally announced February 2024.

  2. arXiv:2401.16587  [pdf, other]

    cs.CL cs.AI cs.CY

    A Linguistic Comparison between Human and ChatGPT-Generated Conversations

    Authors: Morgan Sandler, Hyesun Choung, Arun Ross, Prabu David

    Abstract: This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenti…

    Submitted 25 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Proceedings of the 4th International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), Jeju, Korea, 2024

  3. arXiv:2310.04811  [pdf, other]

    cs.SD cs.NE eess.AS eess.SY

    FM Tone Transfer with Envelope Learning

    Authors: Franco Caspe, Andrew McPherson, Mark Sandler

    Abstract: Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversi…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted to Audio Mostly 2023

  4. arXiv:2309.06649  [pdf, other]

    cs.SD eess.AS

    Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

    Authors: Jordie Shier, Franco Caspe, Andrew Robertson, Mark Sandler, Charalampos Saitis, Andrew McPherson

    Abstract: Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified sy…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: To be published in The Proceedings of Forum Acusticum, Sep 2023, Turin, Italy

  5. arXiv:2309.05858  [pdf, other]

    cs.LG cs.AI

    Uncovering mesa-optimization algorithms in Transformers

    Authors: Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

    Abstract: Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning…

    Submitted 11 September, 2023; originally announced September 2023.

  6. arXiv:2309.02404  [pdf, other]

    cs.SD cs.CV eess.AS

    Voice Morphing: Two Identities in One Voice

    Authors: Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross

    Abstract: In a biometric system, each biometric sample or template is typically associated with a single identity. However, recent research has demonstrated the possibility of generating "morph" biometric samples that can successfully match more than a single identity. Morph attacks are now recognized as a potential security threat to biometric systems. However, most morph attacks have been studied on biome…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted oral paper at BIOSIG 2023

  7. arXiv:2305.14867  [pdf, other]

    cs.SD cs.HC eess.AS

    Interactive Neural Resonators

    Authors: Rodrigo Diaz, Charalampos Saitis, Mark Sandler

    Abstract: In this work, we propose a method for the controllable synthesis of real-time contact sounds using neural resonators. Previous works have used physically inspired statistical methods and physical modelling for object materials and excitation signals. Our method incorporates differentiable second-order resonators and estimates their coefficients using a neural network that is conditioned on physica…

    Submitted 24 May, 2023; originally announced May 2023.

  8. arXiv:2305.07997  [pdf, other]

    eess.AS cs.SD

    Vocal Style Factorization for Effective Speaker Recognition in Affective Scenarios

    Authors: Morgan Sandler, Arun Ross

    Abstract: The accuracy of automated speaker recognition is negatively impacted by change in emotions in a person's speech. In this paper, we hypothesize that speaker identity is composed of various vocal style factors that may be learned from unlabeled data and re-combined using a neural network to generate a holistic speaker identity representation for affective scenarios. In this regard, we propose the E-…

    Submitted 3 August, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

    Comments: Proceedings of the IEEE 2023 International Joint Conference on Biometrics (IJCB)

  9. arXiv:2301.04584  [pdf, other]

    cs.LG cs.CV

    Continual Few-Shot Learning Using HyperTransformers

    Authors: Max Vladymyrov, Andrey Zhmoginov, Mark Sandler

    Abstract: We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from…

    Submitted 12 January, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  10. arXiv:2301.02312  [pdf, other]

    cs.LG

    Training trajectories, mini-batch losses and the curious role of the learning rate

    Authors: Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller

    Abstract: Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning. However its ability to converge to a global minimum remains shrouded in mystery. In this paper we propose to study the behavior of the loss function on fixed mini-batches along SGD trajectories. We show that the loss function on a fixed batch appears to be remarkably convex-like. In particular for Res…

    Submitted 1 February, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: 21 pages, 14 figures

  11. arXiv:2211.15774  [pdf, other]

    cs.LG cs.CV

    Decentralized Learning with Multi-Headed Distillation

    Authors: Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov

    Abstract: Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxilia…

    Submitted 28 November, 2022; originally announced November 2022.

  12. arXiv:2211.08213  [pdf, other]

    eess.AS cs.SD

    Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

    Authors: Morgan Sandler, Arun Ross

    Abstract: In this work, we study the hypothesis that speaker identity embeddings extracted from speech samples may be used for detection and classification of emotion. In particular, we show that emotions can be effectively identified by learning speaker identities by use of a 1-D Triplet Convolutional Neural Network (CNN) & Global Style Token (GST) scheme (e.g., DeepTalk Network) and reusing the trained sp…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  13. arXiv:2210.15306  [pdf, other]

    cs.SD cs.LG eess.AS

    Rigid-Body Sound Synthesis with Differentiable Modal Resonators

    Authors: Rodrigo Diaz, Ben Hayes, Charalampos Saitis, György Fazekas, Mark Sandler

    Abstract: Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work we present a novel end-to-end framework for training a deep neural networ…

    Submitted 28 October, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages

  14. arXiv:2208.06169  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

    Authors: Franco Caspe, Andrew McPherson, Mark Sandler

    Abstract: FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers…

    Submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted to ISMIR 2022. See online supplement at https://fcaspe.github.io/ddx7/

    ACM Class: H.5.5; I.2.6

  15. arXiv:2207.07769  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    Anomalous behaviour in loss-gradient based interpretability methods

    Authors: Vinod Subramanian, Siddharth Gururani, Emmanouil Benetos, Mark Sandler

    Abstract: Loss-gradients are used to interpret the decision making process of deep learning models. In this work, we evaluate loss-gradient based attribution methods by occluding parts of the input and comparing the performance of the occluded input to the original input. We observe that the occluded input has better performance than the original across the test dataset under certain conditions. Similar beh…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted at ICLR RobustML workshop 2021

  16. arXiv:2204.04651  [pdf, other]

    cs.SD cs.IR eess.AS

    Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

    Authors: Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler

    Abstract: Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is an increasing interest in building applications that allow artists to efficiently pick target samples from big sound libraries just by imitating them vocally. In this study, we investigated the…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022 (under review)

  17. arXiv:2204.04646  [pdf, other]

    cs.SD cs.IR eess.AS

    Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification

    Authors: Alejandro Delgado, Emir Demirel, Vinod Subramanian, Charalampos Saitis, Mark Sandler

    Abstract: Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly. Classifier algorithms in VPT systems learn best from small user-specific datasets, which usually restrict modelling to small input feature sets to avoid data overfitting. This study explores severa…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Accepted at Sound and Music Computing (SMC) conference 2022

  18. arXiv:2203.15243  [pdf, other]

    cs.CV

    Fine-tuning Image Transformers using Learnable Memory

    Authors: Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson

    Abstract: In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks, using few parameters, while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these "memory tokens"…

    Submitted 29 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: CVPR 2022, to appear

  19. arXiv:2201.04182  [pdf, other]

    cs.LG cs.CV

    HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

    Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov

    Abstract: In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space…

    Submitted 13 July, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

  20. arXiv:2110.09223  [pdf, other]

    cs.SD eess.AS

    Learning Models for Query by Vocal Percussion: A Comparative Study

    Authors: Alejandro Delgado, SkoT McDonald, Ning Xu, Charalampos Saitis, Mark Sandler

    Abstract: The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Thus, the automatic retrieval of drum sounds using vocal percussion can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. Here we explore different strategies to perform this type of query, making use of…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Published in proceedings of the International Computer Music Conference (ICMC) 2021

  21. arXiv:2107.10963  [pdf, other]

    cs.LG cs.CV

    Compositional Models: Multi-Task Learning and Knowledge Transfer with Modular Networks

    Authors: Andrey Zhmoginov, Dina Bashkirova, Mark Sandler

    Abstract: Conditional computation and modular networks have been recently proposed for multitask learning and other problems as a way to decompose problem solving into multiple reusable computational blocks. We propose a new approach for learning modular networks based on the isometric version of ResNet with all residual blocks having the same configuration and the same number of parameters. This architectu…

    Submitted 22 July, 2021; originally announced July 2021.

  22. arXiv:2104.04657  [pdf, other]

    cs.LG cs.NE

    Meta-Learning Bidirectional Update Rules

    Authors: Mark Sandler, Max Vladymyrov, Andrey Zhmoginov, Nolan Miller, Andrew Jackson, Tom Madams, Blaise Agüera y Arcas

    Abstract: In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks…

    Submitted 11 June, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: ICML 2021, 17 pages

  23. Transfer Learning for Piano Sustain-Pedal Detection

    Authors: Beici Liang, György Fazekas, Mark Sandler

    Abstract: Detecting piano pedalling techniques in polyphonic music remains a challenging task in music information retrieval. While other piano-related tasks, such as pitch estimation and onset detection, have seen improvement through applying deep learning methods, little work has been done to develop deep learning models to detect playing techniques. In this paper, we propose a transfer learning approach…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Published in 2019 International Joint Conference on Neural Networks (IJCNN)

  24. arXiv:2101.01260  [pdf, other]

    cs.CV

    SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object Detection

    Authors: Keren Ye, Adriana Kovashka, Mark Sandler, Menglong Zhu, Andrew Howard, Marco Fornoni

    Abstract: Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve one single specific task, and comes with a completely independent set of parameters. While this guarantees high performance, it is also highly inefficient, as each model has to be separately downloaded and stored. In this paper we…

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: Accepted by the ACCV2020 (Oral)

  25. arXiv:2012.05578  [pdf, other]

    cs.LG cs.CV

    Large-Scale Generative Data-Free Distillation

    Authors: Liangchen Luo, Mark Sandler, Zi Lin, Andrey Zhmoginov, Andrew Howard

    Abstract: Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require the access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this pr…

    Submitted 10 December, 2020; originally announced December 2020.

  26. arXiv:2008.04965  [pdf, other]

    cs.CV cs.LG

    Image segmentation via Cellular Automata

    Authors: Mark Sandler, Andrey Zhmoginov, Liangcheng Luo, Alexander Mordvintsev, Ettore Randazzo, Blaise Agüera y Arcas

    Abstract: In this paper, we propose a new approach for building cellular automata to solve real-world segmentation problems. We design and train a cellular automaton that can successfully segment high-resolution images. We consider a colony that densely inhabits the pixel grid, and all cells are governed by a randomized update that uses the current state, the color, and the state of the $3\times 3$ neighbor…

    Submitted 12 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

  27. arXiv:2001.06086  [pdf, other]

    cs.AI cs.IR cs.LG cs.SD eess.AS

    A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis

    Authors: Johan Pauwels, György Fazekas, Mark B. Sandler

    Abstract: In recent years, Markov logic networks (MLNs) have been proposed as a potentially useful paradigm for music signal analysis. Because all hidden Markov models can be reformulated as MLNs, the latter can provide an all-encompassing framework that reuses and extends previous work in the field. However, just because it is theoretically possible to reformulate previous work as MLNs, does not mean that…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020

  28. arXiv:1911.11177  [pdf, other]

    cs.LG cs.CV stat.ML

    Structured Multi-Hashing for Model Compression

    Authors: Elad Eban, Yair Movshovitz-Attias, Hao Wu, Mark Sandler, Andrew Poon, Yerlan Idelbayev, Miguel A. Carreira-Perpinan

    Abstract: Despite the success of deep neural networks (DNNs), state-of-the-art models are too large to deploy on low-resource devices or common server configurations in which multiple models are held in memory. Model compression methods address this limitation by reducing the memory footprint, latency, or energy consumption of a model with minimal impact on accuracy. We focus on the task of reducing the num…

    Submitted 25 November, 2019; originally announced November 2019.

    Comments: Elad and Yair contributed equally to the paper. They jointly proposed the idea of structured-multi-hashing. Elad: Wrote most of the code and ran most of the experiments Yair: Main contributor to the manuscript Hao: Coding and experiments Yerlan: Coding and experiments Miguel: advised Yerlan about optimization and model compression Mark:experiments Andrew: experiments

  29. arXiv:1909.03205  [pdf, other]

    cs.CV

    Non-discriminative data or weak model? On the relative importance of data and model resolution

    Authors: Mark Sandler, Jonathan Baccash, Andrey Zhmoginov, Andrew Howard

    Abstract: We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information…

    Submitted 17 October, 2019; v1 submitted 7 September, 2019; originally announced September 2019.

    Comments: ICCV 2019 Workshop on Real-World Recognition from Low-Quality Images and Videos

  30. arXiv:1907.09578  [pdf, other]

    cs.CV cs.IT cs.LG

    Information-Bottleneck Approach to Salient Region Discovery

    Authors: Andrey Zhmoginov, Ian Fischer, Mark Sandler

    Abstract: We propose a new method for learning image attention masks in a semi-supervised setting based on the Information Bottleneck principle. Provided with a set of labeled images, the mask generation model is minimizing mutual information between the input and the masked image while maximizing the mutual information between the same masked image and the image label. In contrast with other approaches, ou…

    Submitted 14 February, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

  31. arXiv:1907.02477  [pdf, other]

    cs.LG cs.CR cs.SD eess.AS

    Adversarial Attacks in Sound Event Classification

    Authors: Vinod Subramanian, Emmanouil Benetos, Ning Xu, SKoT McDonald, Mark Sandler

    Abstract: Adversarial attacks refer to a set of methods that perturb the input to a classification model in order to fool the classifier. In this paper we apply different gradient based adversarial attack algorithms on five deep learning models trained for sound event classification. Four of the models use mel-spectrogram input and one model uses raw audio input. The models represent standard architectures…

    Submitted 15 August, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

    Comments: Fixed Freesound data reference to FSDKaggle2018

  32. Efficient On-line Computation of Visibility Graphs

    Authors: Delia Fano Yela, Florian Thalmann, Vincenzo Nicosia, Dan Stowell, Mark Sandler

    Abstract: A visibility algorithm maps time series into complex networks following a simple criterion. The resulting visibility graph has recently proven to be a powerful tool for time series analysis. However its straightforward computation is time-consuming and rigid, motivating the development of more efficient algorithms. Here we present a highly efficient method to compute visibility graphs with the fur…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: code https://github.com/delialia/bst

    Journal ref: Phys. Rev. Research 2, 023069 (2020)

  33. arXiv:1905.02244  [pdf, other]

    cs.CV

    Searching for MobileNetV3

    Authors: Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam

    Abstract: We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration…

    Submitted 20 November, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: ICCV 2019

  34. arXiv:1903.01976  [pdf, other]

    cs.SD eess.AS

    Spectral Visibility Graphs: Application to Similarity of Harmonic Signals

    Authors: Delia Fano Yela, Dan Stowell, Mark Sandler

    Abstract: Graph theory is emerging as a new source of tools for time series analysis. One promising method is to transform a signal into its visibility graph, a representation which captures many interesting aspects of the signal. Here we introduce the visibility graph for audio spectra and propose a novel representation for audio analysis: the spectral visibility graph degree. Such representation inherentl…

    Submitted 20 June, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

    Comments: European Signal Processing Conference (EUSIPCO)

  35. arXiv:1810.10703  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

    Authors: Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard

    Abstract: We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network…

    Submitted 23 February, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: published at ICLR 2019

  36. arXiv:1807.11626  [pdf, other]

    cs.CV cs.LG

    MnasNet: Platform-Aware Neural Architecture Search for Mobile

    Authors: Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

    Abstract: Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose a…

    Submitted 28 May, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: Published in CVPR 2019

    Journal ref: CVPR 2019

  37. arXiv:1804.03230  [pdf, other]

    cs.CV

    NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications

    Authors: Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam

    Abstract: This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt in…

    Submitted 28 September, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Accepted by ECCV 2018

  38. arXiv:1804.02325  [pdf, other]

    cs.SD eess.AS

    Does k Matter? k-NN Hubness Analysis for Kernel Additive Modelling Vocal Separation

    Authors: Delia Fano Yela, Dan Stowell, Mark Sandler

    Abstract: Kernel Additive Modelling (KAM) is a framework for source separation aiming to explicitly model inherent properties of sound sources to help with their identification and separation. KAM separates a given source by applying robust statistics on the selection of time-frequency bins obtained through a source-specific kernel, typically the k-NN function. Even though the parameter k appears to be key…

    Submitted 6 April, 2018; originally announced April 2018.

    Comments: LVA-ICA 2018 - Feedback always welcome

  39. arXiv:1802.05178  [pdf, other]

    cs.MM cs.SD eess.AS

    Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders

    Authors: Adib Mehrabi, Keunwoo Choi, Simon Dixon, Mark Sandler

    Abstract: The expressive nature of the voice provides a powerful medium for communicating sonic ideas, motivating recent research on methods for query by vocalisation. Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal imitations to imitated sounds, yet little is known about how well learned features represent the perceptual similarity between vocalisations and qu…

    Submitted 14 February, 2018; originally announced February 2018.

    Comments: ICASSP 2018 camera-ready

  40. arXiv:1801.04381  [pdf, other]

    cs.CV

    MobileNetV2: Inverted Residuals and Linear Bottlenecks

    Authors: Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen

    Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic se…

    Submitted 21 March, 2019; v1 submitted 12 January, 2018; originally announced January 2018.

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510-4520

  41. arXiv:1712.02950  [pdf, other]

    cs.CV cs.LG stat.ML

    CycleGAN, a Master of Steganography

    Authors: Casey Chu, Andrey Zhmoginov, Mark Sandler

    Abstract: CycleGAN (Zhu et al. 2017) is one recent successful approach to learn a transformation between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image into the images it generates in a nearly imperceptible, high-frequency signal. This trick ensures that the generator can recover the original…

    Submitted 16 December, 2017; v1 submitted 8 December, 2017; originally announced December 2017.

    Comments: NIPS 2017, workshop on Machine Deception

  42. arXiv:1711.00351  [pdf, other]

    cs.SD eess.AS

    Shift-Invariant Kernel Additive Modelling for Audio Source Separation

    Authors: Delia Fano Yela, Sebastian Ewert, Ken O'Hanlon, Mark B. Sandler

    Abstract: A major goal in blind source separation to identify and separate sources is to model their inherent characteristics. While most state-of-the-art approaches are supervised methods trained on large datasets, interest in non-data-driven approaches such as Kernel Additive Modelling (KAM) remains high due to their interpretability and adaptability. KAM performs the separation of a given source applying…

    Submitted 16 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Feedback is welcome

    ACM Class: H.5.5; I.5.1; I.5.4

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 2018

  43. arXiv:1709.04396  [pdf, other]

    cs.CV cs.SD

    A Tutorial on Deep Learning for Music Information Retrieval

    Authors: Keunwoo Choi, György Fazekas, Kyunghyun Cho, Mark Sandler

    Abstract: Following their success in Computer Vision and other areas, deep learning techniques have recently become widely adopted in Music Information Retrieval (MIR) research. However, the majority of works aim to adopt and assess methods that have been shown to be effective in other domains, while there is still a great need for more original research focusing on music primarily and utilising musical kno…

    Submitted 3 May, 2018; v1 submitted 13 September, 2017; originally announced September 2017.

  44. arXiv:1709.01922  [pdf, other]

    cs.SD cs.CV cs.IR cs.LG

    A Comparison of Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging

    Authors: Keunwoo Choi, György Fazekas, Kyunghyun Cho, Mark Sandler

    Abstract: In this paper, we empirically investigate the effect of audio preprocessing on music tagging with deep neural networks. We perform comprehensive experiments involving audio preprocessing using different time-frequency representations, logarithmic magnitude compression, frequency weighting, and scaling. We show that many commonly used input preprocessing techniques are redundant except magnitude co…

    Submitted 22 February, 2021; v1 submitted 6 September, 2017; originally announced September 2017.

    Comments: 5 pages. EUSIPCO 2018 camera-ready. arXiv:1706.02361 does not have the overlapped part with this submission anymore

  45. arXiv:1707.00160  [pdf, other]

    cs.SD cs.NE

    An Augmented Lagrangian Method for Piano Transcription using Equal Loudness Thresholding and LSTM-based Decoding

    Authors: Sebastian Ewert, Mark B. Sandler

    Abstract: A central goal in automatic music transcription is to detect individual note events in music recordings. An important variant is instrument-dependent music transcription where methods can use calibration data for the instruments in use. However, despite the additional information, results rarely exceed an f-measure of 80%. As a potential explanation, the transcription problem can be shown to be ba…

    Submitted 30 July, 2017; v1 submitted 1 July, 2017; originally announced July 2017.

    ACM Class: H.5.5; I.2.6

    Journal ref: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, pp. 146-150, 2017

  46. arXiv:1706.02361  [pdf, other]

    cs.IR cs.LG cs.MM cs.SD

    The Effects of Noisy Labels on Deep Convolutional Neural Networks for Music Tagging

    Authors: Keunwoo Choi, George Fazekas, Kyunghyun Cho, Mark Sandler

    Abstract: Deep neural networks (DNN) have been successfully applied to music classification including music tagging. However, there are several open questions regarding the training, evaluation, and analysis of DNNs. In this article, we investigate specific aspects of neural networks, the effects of noisy labels, to deepen our understanding of their properties. We analyse and (re-)validate a large music tag…

    Submitted 14 November, 2017; v1 submitted 7 June, 2017; originally announced June 2017.

    Comments: The section that overlapped with arXiv:1709.01922 is completely removed since the earlier version. This is the camera-ready version

  47. arXiv:1703.09179  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD

    Transfer learning for music classification and regression tasks

    Authors: Keunwoo Choi, György Fazekas, Mark Sandler, Kyunghyun Cho

    Abstract: In this paper, we present a transfer learning approach for music classification and regression tasks. We propose to use a pre-trained convnet feature, a concatenated feature vector using the activations of feature maps of multiple layers in a trained convolutional network. We show how this convnet feature can serve as general-purpose music representation. In the experiments, a convnet is trained f…

    Submitted 13 September, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

    Comments: 18th International Society of Music Information Retrieval (ISMIR) Conference, Suzhou, China, 2017

  48. arXiv:1702.06257  [pdf, other]

    cs.CV

    The Power of Sparsity in Convolutional Neural Networks

    Authors: Soravit Changpinyo, Mark Sandler, Andrey Zhmoginov

    Abstract: Deep convolutional networks are well-known for their high computational and memory demands. Given limited resources, how does one design a network that balances its size, training time, and prediction accuracy? A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In…

    Submitted 20 February, 2017; originally announced February 2017.

  49. arXiv:1702.02130  [pdf, other]

    cs.SD

    On the Importance of Temporal Context in Proximity Kernels: A Vocal Separation Case Study

    Authors: Delia Fano Yela, Sebastian Ewert, Derry FitzGerald, Mark Sandler

    Abstract: Musical source separation methods exploit source-specific spectral characteristics to facilitate the decomposition process. Kernel Additive Modelling (KAM) models a source applying robust statistics to time-frequency bins as specified by a source-specific kernel, a function defining similarity between bins. Kernels in existing approaches are typically defined using metrics between single time fram…

    Submitted 11 April, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

    Comments: 2017 AES International Conference on Semantic Audio

    ACM Class: H.5.5

    Journal ref: Proceedings of the AES International Conference on Semantic Audio, Erlangen, Germany, pp. 13-20, 2017

  50. arXiv:1609.06210  [pdf, other]

    cs.SD

    Interference Reduction in Music Recordings Combining Kernel Additive Modelling and Non-Negative Matrix Factorization

    Authors: Delia Fano Yela, Sebastian Ewert, Derry FitzGerald, Mark Sandler

    Abstract: In live and studio recordings unexpected sound events often lead to interferences in the signal. For non-stationary interferences, sound source separation techniques can be used to reduce the interference level in the recording. In this context, we present a novel approach combining the strengths of two algorithmic families: NMF and KAM. The recent KAM approach applies robust statistics on frames…

    Submitted 8 February, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

    Comments: International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    ACM Class: H.5.5

    Journal ref: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, USA, pp. 51-55, 2017