Showing 1–5 of 5 results for author: Çakır, E

  1. arXiv:2007.04660  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Multi-task Regularization Based on Infrequent Classes for Audio Captioning

    Authors: Emre Çakır, Konstantinos Drossos, Tuomas Virtanen

    Abstract: Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio. Most audio captioning methods are based on deep neural networks, employing an encoder-decoder scheme and a dataset with audio clips and corresponding natural language descriptions (i.e. captions). A significant challenge for audio captioning is the distribution of words in the c…

    Submitted 9 July, 2020; originally announced July 2020.

  2. arXiv:1808.05777  [pdf, other]

    eess.AS cs.LG cs.SD

    Unsupervised adversarial domain adaptation for acoustic scene classification

    Authors: Shayan Gharib, Konstantinos Drossos, Emre Çakir, Dmitriy Serdyuk, Tuomas Virtanen

    Abstract: A general problem in acoustic scene classification task is the mismatched conditions between training and testing data, which significantly reduces the performance of the developed methods on classification accuracy. As a countermeasure, we present the first method of unsupervised adversarial domain adaptation for acoustic scene classification. We employ a model pre-trained on data from one set of…

    Submitted 17 August, 2018; originally announced August 2018.

  3. arXiv:1805.03647  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input

    Authors: Emre Çakır, Tuomas Virtanen

    Abstract: Sound event detection systems typically consist of two stages: extracting hand-crafted features from the raw audio waveform, and learning a mapping between these features and the target sound events using a classifier. Recently, the focus of sound event detection research has been mostly shifted to the latter stage using standard features such as mel spectrogram as the input for classifiers such a…

    Submitted 9 May, 2018; originally announced May 2018.

    Comments: accepted to IJCNN 2018

  4. arXiv:1706.02047  [pdf, other]

    cs.SD cs.LG

    Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection

    Authors: Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen

    Abstract: This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks. Data augmentation by blocks mixing and domain adaptation using a novel method of test mixing are proposed and evaluated in regard to making the method robust to unseen data. The contributions of two kinds of acoustic features (dominant frequency and log mel-band energy) and t…

    Submitted 7 June, 2017; originally announced June 2017.

    Comments: Accepted for European Signal Processing Conference 2017

  5. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

    Authors: Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen

    Abstract: Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs an…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysis