-
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Authors:
Daisuke Niizumi,
Daiki Takeuchi,
Yasunori Ohishi,
Noboru Harada,
Masahiro Yasuda,
Shunsuke Tsubaki,
Keisuke Imoto
Abstract:
Contrastive language-audio pre-training (CLAP) enables zero-shot (ZS) inference of audio and exhibits promising performance in several classification tasks. However, conventional audio representations are still crucial for many tasks where ZS is not applicable (e.g., regression problems). Here, we explore a new representation, a general-purpose audio-language representation, that performs well in…
▽ More
Contrastive language-audio pre-training (CLAP) enables zero-shot (ZS) inference of audio and exhibits promising performance in several classification tasks. However, conventional audio representations are still crucial for many tasks where ZS is not applicable (e.g., regression problems). Here, we explore a new representation, a general-purpose audio-language representation, that performs well in both ZS and transfer learning. To do so, we propose a new method, M2D-CLAP, which combines self-supervised learning Masked Modeling Duo (M2D) and CLAP. M2D learns an effective representation to model audio signals, and CLAP aligns the representation with text embedding. As a result, M2D-CLAP learns a versatile representation that allows for both ZS and transfer learning. Experiments show that M2D-CLAP performs well on linear evaluation, fine-tuning, and ZS classification with a GTZAN state-of-the-art of 75.17%, thus achieving a general-purpose audio-language representation.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Authors:
Masahiro Yasuda,
Noboru Harada,
Yasunori Ohishi,
Shoichiro Saito,
Akira Nakayama,
Nobutaka Ono
Abstract:
Observations with distributed sensors are essential in analyzing a series of human and machine activities (referred to as 'events' in this paper) in complex and extensive real-world environments. This is because the information obtained from a single sensor is often missing or fragmented in such an environment; observations from multiple locations and modalities should be integrated to analyze eve…
▽ More
Observations with distributed sensors are essential in analyzing a series of human and machine activities (referred to as 'events' in this paper) in complex and extensive real-world environments. This is because the information obtained from a single sensor is often missing or fragmented in such an environment; observations from multiple locations and modalities should be integrated to analyze events comprehensively. However, a learning method has yet to be established to extract joint representations that effectively combine such distributed observations. Therefore, we propose Guided Masked sELf-Distillation modeling (Guided-MELD) for inter-sensor relationship modeling. The basic idea of Guided-MELD is to learn to supplement the information from the masked sensor with information from other sensors needed to detect the event. Guided-MELD is expected to enable the system to effectively distill the fragmented or redundant target event information obtained by the sensors without being overly dependent on any specific sensors. To validate the effectiveness of the proposed method in novel tasks of distributed multimedia sensor event analysis, we recorded two new datasets that fit the problem setting: MM-Store and MM-Office. These datasets consist of human activities in a convenience store and an office, recorded using distributed cameras and microphones. Experimental results on these datasets show that the proposed Guided-MELD improves event tagging and detection performance and outperforms conventional inter-sensor relationship modeling methods. Furthermore, the proposed method performed robustly even when sensors were reduced.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Improving Interpretability of Scores in Anomaly Detection Based on Gaussian-Bernoulli Restricted Boltzmann Machine
Authors:
Kaiji Sekimoto,
Muneki Yasuda
Abstract:
Gaussian-Bernoulli restricted Boltzmann machines (GBRBMs) are often used for semi-supervised anomaly detection, where they are trained using only normal data points. In GBRBM-based anomaly detection, normal and anomalous data are classified based on a score that is identical to an energy function of the marginal GBRBM. However, the classification threshold is difficult to set to an appropriate val…
▽ More
Gaussian-Bernoulli restricted Boltzmann machines (GBRBMs) are often used for semi-supervised anomaly detection, where they are trained using only normal data points. In GBRBM-based anomaly detection, normal and anomalous data are classified based on a score that is identical to an energy function of the marginal GBRBM. However, the classification threshold is difficult to set to an appropriate value, as this score cannot be interpreted. In this study, we propose a measure that improves score's interpretability based on its cumulative distribution, and establish a guideline for setting the threshold using the interpretable measure. The results of numerical experiments show that the guideline is reasonable when setting the threshold solely using normal data points. Moreover, because identifying the measure involves computationally infeasible evaluation of the minimum score value, we also propose an evaluation method for the minimum score based on simulated annealing, which is widely used for optimization problems. The proposed evaluation method was also validated using numerical experiments.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human
Authors:
Masahiro Yasuda,
Shoichiro Saito,
Akira Nakayama,
Noboru Harada
Abstract:
We aim to perform sound event localization and detection (SELD) using wearable equipment for a moving human, such as a pedestrian. Conventional SELD tasks have dealt only with microphone arrays located in static positions. However, self-motion with three rotational and three translational degrees of freedom (6DoF) shall be considered for wearable microphone arrays. A system trained only with a dat…
▽ More
We aim to perform sound event localization and detection (SELD) using wearable equipment for a moving human, such as a pedestrian. Conventional SELD tasks have dealt only with microphone arrays located in static positions. However, self-motion with three rotational and three translational degrees of freedom (6DoF) shall be considered for wearable microphone arrays. A system trained only with a dataset using microphone arrays in a fixed position would be unable to adapt to the fast relative motion of sound events associated with self-motion, resulting in the degradation of SELD performance. To address this, we designed 6DoF SELD Dataset for wearable systems, the first SELD dataset considering the self-motion of microphones. Furthermore, we proposed a multi-modal SELD system that jointly utilizes audio and motion tracking sensor signals. These sensor signals are expected to help the system find useful acoustic cues for SELD on the basis of the current self-motion state. Experimental results on our dataset show that the proposed method effectively improves SELD performance with a mechanism to extract acoustic features conditioned by sensor signals.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Look Around! Unexpected gains from training on environments in the vicinity of the target
Authors:
Serena Bono,
Spandan Madan,
Ishaan Grover,
Mao Yasueda,
Cynthia Breazeal,
Hanspeter Pfister,
Gabriel Kreiman
Abstract:
Solutions to Markov Decision Processes (MDP) are often very sensitive to state transition probabilities. As the estimation of these probabilities is often inaccurate in practice, it is important to understand when and how Reinforcement Learning (RL) agents generalize when transition probabilities change. Here we present a new methodology to evaluate such generalization of RL agents under small shi…
▽ More
Solutions to Markov Decision Processes (MDP) are often very sensitive to state transition probabilities. As the estimation of these probabilities is often inaccurate in practice, it is important to understand when and how Reinforcement Learning (RL) agents generalize when transition probabilities change. Here we present a new methodology to evaluate such generalization of RL agents under small shifts in the transition probabilities. Specifically, we evaluate agents in new environments (MDPs) in the vicinity of the training MDP created by adding quantifiable, parametric noise into the transition function of the training MDP. We refer to this process as Noise Injection, and the resulting environments as $δ$-environments. This process allows us to create controlled variations of the same environment with the level of the noise serving as a metric of distance between environments. Conventional wisdom suggests that training and testing on the same MDP should yield the best results. However, we report several cases of the opposite -- when targeting a specific environment, training the agent in an alternative noise setting can yield superior outcomes. We showcase this phenomenon across $60$ different variations of ATARI games, including PacMan, Pong, and Breakout.
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline
Authors:
Noboru Harada,
Daisuke Niizumi,
Yasunori Ohishi,
Daiki Takeuchi,
Masahiro Yasuda
Abstract:
This paper provides a baseline system for First-shot-compliant unsupervised anomaly detection (ASD) for machine condition monitoring. First-shot ASD does not allow systems to do machine-type dependent hyperparameter tuning or tool ensembling based on the performance metric calculated with the grand truth. To show benchmark performance for First-shot ASD, this paper proposes an anomaly sound detect…
▽ More
This paper provides a baseline system for First-shot-compliant unsupervised anomaly detection (ASD) for machine condition monitoring. First-shot ASD does not allow systems to do machine-type dependent hyperparameter tuning or tool ensembling based on the performance metric calculated with the grand truth. To show benchmark performance for First-shot ASD, this paper proposes an anomaly sound detection system that works on the domain generalization task in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Technique" while complying with the First-shot requirements introduced in the DCASE 2023 Challenge Task 2 (DCASE2023T2). A simple autoencoder based implementation combined with selective Mahalanobis metric is implemented as a baseline system. The performance evaluation is conducted to set the target benchmark for the forthcoming DCASE2023T2. Source code of the baseline system will be available on GitHub: https://github.com/nttcslab/dcase2023_task2_baseline_ae .
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
Multi-layered Discriminative Restricted Boltzmann Machine with Untrained Probabilistic Layer
Authors:
Yuri Kanno,
Muneki Yasuda
Abstract:
An extreme learning machine (ELM) is a three-layered feed-forward neural network having untrained parameters, which are randomly determined before training. Inspired by the idea of ELM, a probabilistic untrained layer called a probabilistic-ELM (PELM) layer is proposed, and it is combined with a discriminative restricted Boltzmann machine (DRBM), which is a probabilistic three-layered neural netwo…
▽ More
An extreme learning machine (ELM) is a three-layered feed-forward neural network having untrained parameters, which are randomly determined before training. Inspired by the idea of ELM, a probabilistic untrained layer called a probabilistic-ELM (PELM) layer is proposed, and it is combined with a discriminative restricted Boltzmann machine (DRBM), which is a probabilistic three-layered neural network for solving classification problems. The proposed model is obtained by stacking DRBM on the PELM layer. The resultant model (i.e., multi-layered DRBM (MDRBM)) forms a probabilistic four-layered neural network. In MDRBM, the parameters in the PELM layer can be determined using Gaussian-Bernoulli restricted Boltzmann machine. Owing to the PELM layer, MDRBM obtains a strong immunity against noise in inputs, which is one of the most important advantages of MDRBM. Numerical experiments using some benchmark datasets, MNIST, Fashion-MNIST, Urban Land Cover, and CIFAR-10, demonstrate that MDRBM is superior to other existing models, particularly, in terms of the noise-robustness property (or, in other words, the generalization property).
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Free Energy Evaluation Using Marginalized Annealed Importance Sampling
Authors:
Muneki Yasuda,
Chako Takahashi
Abstract:
The evaluation of the free energy of a stochastic model is considered a significant issue in various fields of physics and machine learning. However, the exact free energy evaluation is computationally infeasible because the free energy expression includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo me…
▽ More
The evaluation of the free energy of a stochastic model is considered a significant issue in various fields of physics and machine learning. However, the exact free energy evaluation is computationally infeasible because the free energy expression includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method that is similar to a simulated annealing and can effectively approximate the free energy. This study proposes an AIS-based approach, which is referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail based on theoretical and numerical perspectives. Based on the investigation, it is proved that mAIS is more effective than AIS under a certain condition.
△ Less
Submitted 23 August, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Composite Spatial Monte Carlo Integration Based on Generalized Least Squares
Authors:
Kaiji Sekimoto,
Muneki Yasuda
Abstract:
Although evaluation of the expectations on the Ising model is essential in various applications, it is mostly infeasible because of intractable multiple summations. Spatial Monte Carlo integration (SMCI) is a sampling-based approximation. It can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called t…
▽ More
Although evaluation of the expectations on the Ising model is essential in various applications, it is mostly infeasible because of intractable multiple summations. Spatial Monte Carlo integration (SMCI) is a sampling-based approximation. It can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called target region), SMCI considers a larger region containing the target region (called sum region). In SMCI, the multiple summation for the variables in the sum region is precisely executed, and that in the outer region is evaluated by the sampling approximation such as the standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator improves monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such an expansion. In this paper, based on the theory of generalized least squares (GLS), a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).
△ Less
Submitted 17 October, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion
Authors:
Masahiro Yasuda,
Yasunori Ohishi,
Shoichiro Saito,
Noboru Harada
Abstract:
We tackle a challenging task: multi-view and multi-modal event detection that detects events in a wide-range real environment by utilizing data from distributed cameras and microphones and their weak labels. In this task, distributed sensors are utilized complementarily to capture events that are difficult to capture with a single sensor, such as a series of actions of people moving in an intricat…
▽ More
We tackle a challenging task: multi-view and multi-modal event detection that detects events in a wide-range real environment by utilizing data from distributed cameras and microphones and their weak labels. In this task, distributed sensors are utilized complementarily to capture events that are difficult to capture with a single sensor, such as a series of actions of people moving in an intricate room, or communication between people located far apart in a room. For sensors to cooperate effectively in such a situation, the system should be able to exchange information among sensors and combines information that is useful for identifying events in a complementary manner. For such a mechanism, we propose a Transformer-based multi-sensor fusion (MultiTrans) which combines multi-sensor data on the basis of the relationships between features of different viewpoints and modalities. In the experiments using a dataset newly collected for this task, our proposed method using MultiTrans improved the event detection performance and outperformed comparatives.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments
Authors:
Masahiro Yasuda,
Yasunori Ohishi,
Shoichiro Saito
Abstract:
Our goal is to develop a sound event localization and detection (SELD) system that works robustly in unknown environments. A SELD system trained on known environment data is degraded in an unknown environment due to environmental effects such as reverberation and noise not contained in the training data. Previous studies on related tasks have shown that domain adaptation methods are effective when…
▽ More
Our goal is to develop a sound event localization and detection (SELD) system that works robustly in unknown environments. A SELD system trained on known environment data is degraded in an unknown environment due to environmental effects such as reverberation and noise not contained in the training data. Previous studies on related tasks have shown that domain adaptation methods are effective when data on the environment in which the system will be used is available even without labels. However adaptation to unknown environments remains a difficult task. In this study, we propose echo-aware feature refinement (EAR) for SELD, which suppresses environmental effects at the feature level by using additional spatial cues of the unknown environment obtained through measuring acoustic echoes. FOA-MEIR, an impulse response dataset containing over 100 environments, was recorded to validate the proposed method. Experiments on FOA-MEIR show that the EAR effectively improves SELD performance in unknown environments.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head
Authors:
Kento Nagatomo,
Masahiro Yasuda,
Kohei Yatabe,
Shoichiro Saito,
Yasuhiro Oikawa
Abstract:
Sound event localization and detection (SELD) is a combined task of identifying the sound event and its direction. Deep neural networks (DNNs) are utilized to associate them with the sound signals observed by a microphone array. Although ambisonic microphones are popular in the literature of SELD, they might limit the range of applications due to their predetermined geometry. Some applications (in…
▽ More
Sound event localization and detection (SELD) is a combined task of identifying the sound event and its direction. Deep neural networks (DNNs) are utilized to associate them with the sound signals observed by a microphone array. Although ambisonic microphones are popular in the literature of SELD, they might limit the range of applications due to their predetermined geometry. Some applications (including those for pedestrians that perform SELD while walking) require a wearable microphone array whose geometry can be designed to suit the task. In this paper, for the development of such a wearable SELD, we propose a dataset named Wearable SELD dataset. It consists of data recorded by 24 microphones placed on a head and torso simulators (HATS) with some accessories mimicking wearable devices (glasses, earphones, and headphones). We also provide experimental results of SELD using the proposed dataset and SELDNet to investigate the effect of microphone configuration.
△ Less
Submitted 17 February, 2022;
originally announced February 2022.
-
APPLADE: Adjustable Plug-and-play Audio Declipper Combining DNN with Sparse Optimization
Authors:
Tomoro Tanaka,
Kohei Yatabe,
Masahiro Yasuda,
Yasuhiro Oikawa
Abstract:
In this paper, we propose an audio declip** method that takes advantages of both sparse optimization and deep learning. Since sparsity-based audio declip** methods have been developed upon constrained optimization, they are adjustable and well-studied in theory. However, they always uniformly promote sparsity and ignore the individual properties of a signal. Deep neural network (DNN)-based met…
▽ More
In this paper, we propose an audio declip** method that takes advantages of both sparse optimization and deep learning. Since sparsity-based audio declip** methods have been developed upon constrained optimization, they are adjustable and well-studied in theory. However, they always uniformly promote sparsity and ignore the individual properties of a signal. Deep neural network (DNN)-based methods can learn the properties of target signals and use them for audio declip**. Still, they cannot perform well if the training data have mismatches and/or constraints in the time domain are not imposed. In the proposed method, we use a DNN in an optimization algorithm. It is inspired by an idea called plug-and-play (PnP) and enables us to promote sparsity based on the learned information of data, considering constraints in the time domain. Our experiments confirmed that the proposed method is stable and robust to mismatches between training and test data.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
Spatial Monte Carlo Integration with Annealed Importance Sampling
Authors:
Muneki Yasuda,
Kaiji Sekimoto
Abstract:
Evaluating expectations on an Ising model (or Boltzmann machine) is essential for various applications, including statistical machine learning. However, in general, the evaluation is computationally difficult because it involves intractable multiple summations or integrations; therefore, it requires approximation. Monte Carlo integration (MCI) is a well-known approximation method; a more effective…
▽ More
Evaluating expectations on an Ising model (or Boltzmann machine) is essential for various applications, including statistical machine learning. However, in general, the evaluation is computationally difficult because it involves intractable multiple summations or integrations; therefore, it requires approximation. Monte Carlo integration (MCI) is a well-known approximation method; a more effective MCI-like approximation method was proposed recently, called spatial Monte Carlo integration (SMCI). However, the estimations obtained using SMCI (and MCI) exhibit a low accuracy in Ising models under a low temperature owing to degradation of the sampling quality. Annealed importance sampling (AIS) is a type of importance sampling based on Markov chain Monte Carlo methods that can suppress performance degradation in low-temperature regions with the force of importance weights. In this study, a new method is proposed to evaluate the expectations on Ising models combining AIS and SMCI. The proposed method performs efficiently in both high- and low-temperature regions, which is demonstrated theoretically and numerically.
△ Less
Submitted 12 April, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval
Authors:
Yuma Koizumi,
Yasunori Ohishi,
Daisuke Niizumi,
Daiki Takeuchi,
Masahiro Yasuda
Abstract:
The goal of audio captioning is to translate input audio into its description using natural language. One of the problems in audio captioning is the lack of training data due to the difficulty in collecting audio-caption pairs by crawling the web. In this study, to overcome this problem, we propose to use a pre-trained large-scale language model. Since an audio input cannot be directly inputted in…
▽ More
The goal of audio captioning is to translate input audio into its description using natural language. One of the problems in audio captioning is the lack of training data due to the difficulty in collecting audio-caption pairs by crawling the web. In this study, to overcome this problem, we propose to use a pre-trained large-scale language model. Since an audio input cannot be directly inputted into such a language model, we utilize guidance captions retrieved from a training dataset based on similarities that may exist in different audio. Then, the caption of the audio input is generated by using a pre-trained language model while referring to the guidance captions. Experimental results show that (i) the proposed method has succeeded to use a pre-trained language model for audio captioning, and (ii) the oracle performance of the pre-trained model-based caption generator was clearly better than that of the conventional method trained from scratch.
△ Less
Submitted 14 December, 2020;
originally announced December 2020.
-
A Generalization of Spatial Monte Carlo Integration
Authors:
Muneki Yasuda,
Kei Uchizawa
Abstract:
Spatial Monte Carlo integration (SMCI) is an extension of standard Monte Carlo integration and can approximate expectations on Markov random fields with high accuracy. SMCI was applied to pairwise Boltzmann machine (PBM) learning, with superior results to those from some existing methods. The approximation level of SMCI can be changed, and it was proved that a higher-order approximation of SMCI is…
▽ More
Spatial Monte Carlo integration (SMCI) is an extension of standard Monte Carlo integration and can approximate expectations on Markov random fields with high accuracy. SMCI was applied to pairwise Boltzmann machine (PBM) learning, with superior results to those from some existing methods. The approximation level of SMCI can be changed, and it was proved that a higher-order approximation of SMCI is statistically more accurate than a lower-order approximation. However, SMCI as proposed in the previous studies suffers from a limitation that prevents the application of a higher-order method to dense systems.
This study makes two different contributions as follows. A generalization of SMCI (called generalized SMCI (GSMCI)) is proposed, which allows relaxation of the above-mentioned limitation; moreover, a statistical accuracy bound of GSMCI is proved. This is the first contribution of this study. A new PBM learning method based on SMCI is proposed, which is obtained by combining SMCI and the persistent contrastive divergence. The proposed learning method greatly improves the accuracy of learning. This is the second contribution of this study.
△ Less
Submitted 16 September, 2020; v1 submitted 4 September, 2020;
originally announced September 2020.
-
A Transformer-based Audio Captioning Model with Keyword Estimation
Authors:
Yuma Koizumi,
Ryo Masumura,
Kyosuke Nishida,
Masahiro Yasuda,
Shoichiro Saito
Abstract:
One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Since one acoustic event/scene can be described with several words, it results in a combinatorial explosion of possible captions and difficulty in training. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation calle…
▽ More
One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Since one acoustic event/scene can be described with several words, it results in a combinatorial explosion of possible captions and difficulty in training. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation called TRACKE. It simultaneously solves the word-selection indeterminacy problem with the main task of AAC while executing the sub-task of acoustic event detection/acoustic scene classification (i.e., keyword estimation). TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy. Experimental results on a public AAC dataset indicate that TRACKE achieved state-of-the-art performance and successfully estimated both the caption and its keywords.
△ Less
Submitted 8 August, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
Authors:
Yuma Koizumi,
Yohei Kawaguchi,
Keisuke Imoto,
Toshiki Nakamura,
Yuki Nikaido,
Ryo Tanabe,
Harsh Purohit,
Kaori Suefusa,
Takashi Endo,
Masahiro Yasuda,
Noboru Harada
Abstract:
In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condi…
▽ More
In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data. We have designed this challenge as the first benchmark of ASD research, which includes a large-scale dataset, evaluation metrics, and a simple baseline system. We received 117 submissions from 40 teams, and several novel approaches have been developed as a result of this challenge. On the basis of the analysis of the evaluation results, we discuss two new approaches and their problems.
△ Less
Submitted 8 August, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Sound Event Localization based on Sound Intensity Vector Refined By DNN-Based Denoising and Source Separation
Authors:
Masahiro Yasuda,
Yuma Koizumi,
Shoichiro Saito,
Hisashi Uematsu,
Keisuke Imoto
Abstract:
We propose a direction-of-arrival (DOA) estimation method for Sound Event Localization and Detection (SELD). Direct estimation of DOA using a deep neural network (DNN), i.e. completely-datadriven approach, achieves high accuracy. However, there is a gap in the accuracy between DOA estimation for single and overlap** sources because they cannot incorporate physical knowledge. Meanwhile, although…
▽ More
We propose a direction-of-arrival (DOA) estimation method for Sound Event Localization and Detection (SELD). Direct estimation of DOA using a deep neural network (DNN), i.e. completely-datadriven approach, achieves high accuracy. However, there is a gap in the accuracy between DOA estimation for single and overlap** sources because they cannot incorporate physical knowledge. Meanwhile, although the accuracy of physics-based approaches is inferior to DNN-based approaches, it is robust for overlap** source. In this study, we consider a combination of physics-based and DNN-based approaches; the sound intensity vectors (IVs) for physics-based DOA estimation is refined based on DNN-based denoising and source separation. This method enables the accurate DOA estimation for both single and overlap** sources using a spherical microphone array. Experimental results show that the proposed method achieves state-of-the-art DOA estimation accuracy on an open dataset of the SELD.
△ Less
Submitted 14 February, 2020;
originally announced February 2020.
-
Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels
Authors:
Keisuke Imoto,
Noriyuki Tonami,
Yuma Koizumi,
Masahiro Yasuda,
Ryosuke Yamanishi,
Yoichi Yamashita
Abstract:
Sound event detection (SED) and acoustic scene classification (ASC) are major tasks in environmental sound analysis. Considering that sound events and scenes are closely related to each other, some works have addressed joint analyses of sound events and acoustic scenes based on multitask learning (MTL), in which the knowledge of sound events and scenes can help in estimating them mutually. The con…
▽ More
Sound event detection (SED) and acoustic scene classification (ASC) are major tasks in environmental sound analysis. Considering that sound events and scenes are closely related to each other, some works have addressed joint analyses of sound events and acoustic scenes based on multitask learning (MTL), in which the knowledge of sound events and scenes can help in estimating them mutually. The conventional MTL-based methods utilize one-hot scene labels to train the relationship between sound events and scenes; thus, the conventional methods cannot model the extent to which sound events and scenes are related. However, in the real environment, common sound events may occur in some acoustic scenes; on the other hand, some sound events occur only in a limited acoustic scene. In this paper, we thus propose a new method for SED based on MTL of SED and ASC using the soft labels of acoustic scenes, which enable us to model the extent to which sound events and scenes are related. Experiments conducted using TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets show that the proposed method improves the SED performance by 3.80% in F-score compared with conventional MTL-based SED.
△ Less
Submitted 13 February, 2020;
originally announced February 2020.
-
Consistent Batch Normalization for Weighted Loss in Imbalanced-Data Environment
Authors:
Muneki Yasuda,
Yeo Xian En,
Seishirou Ueno
Abstract:
In this study, classification problems based on feedforward neural networks in a data-imbalanced environment are considered. Learning from an imbalanced dataset is one of the most important practical problems in the field of machine learning. A weighted loss function (WLF) based on a cost-sensitive approach is a well-known and effective method for imbalanced datasets. A combination of WLF and batc…
▽ More
In this study, classification problems based on feedforward neural networks in a data-imbalanced environment are considered. Learning from an imbalanced dataset is one of the most important practical problems in the field of machine learning. A weighted loss function (WLF) based on a cost-sensitive approach is a well-known and effective method for imbalanced datasets. A combination of WLF and batch normalization (BN) is considered in this study. BN is considered as a powerful standard technique in the recent developments in deep learning. A simple combination of both methods leads to a size-inconsistency problem due to a mismatch between the interpretations of the effective size of the dataset in both methods. A simple modification to BN, called weighted BN (WBN), is proposed to correct the size mismatch. The idea of WBN is simple and natural. The proposed method in a data-imbalanced environment is validated using numerical experiments.
△ Less
Submitted 12 May, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.
-
Improvement of Batch Normalization in Imbalanced Data
Authors:
Muneki Yasuda,
Seishirou Ueno
Abstract:
In this study, we consider classification problems based on neural networks in data-imbalanced environment. Learning from an imbalanced data set is one of the most important and practical problems in the field of machine learning. A weighted loss function based on cost-sensitive approach is a well-known effective method for imbalanced data sets. We consider a combination of weighted loss function…
▽ More
In this study, we consider classification problems based on neural networks in data-imbalanced environment. Learning from an imbalanced data set is one of the most important and practical problems in the field of machine learning. A weighted loss function based on cost-sensitive approach is a well-known effective method for imbalanced data sets. We consider a combination of weighted loss function and batch normalization (BN) in this study. BN is a powerful standard technique in the recent developments in deep learning. A simple combination of both methods leads to a size-mismatch problem due to a mismatch between interpretations of effective size of data set in both methods. We propose a simple modification to BN to correct the size-mismatch and demonstrate that this modified BN is effective in data-imbalanced environment.
△ Less
Submitted 24 November, 2019;
originally announced November 2019.
-
DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector
Authors:
Masahiro Yasuda,
Yuma Koizumi,
Luca Mazzon,
Shoichiro Saito,
Hisashi Uematsu
Abstract:
We propose a direction of arrival (DOA) estimation method that combines sound-intensity vector (IV)-based DOA estimation and DNN-based denoising and dereverberation. Since the accuracy of IV-based DOA estimation degrades due to environmental noise and reverberation, two DNNs are used to remove such effects from the observed IVs. DOA is then estimated from the refined IVs based on the physics of wa…
▽ More
We propose a direction of arrival (DOA) estimation method that combines sound-intensity vector (IV)-based DOA estimation and DNN-based denoising and dereverberation. Since the accuracy of IV-based DOA estimation degrades due to environmental noise and reverberation, two DNNs are used to remove such effects from the observed IVs. DOA is then estimated from the refined IVs based on the physics of wave propagation. Experiments on an open dataset showed that the average DOA error of the proposed method was 0.528 degrees, and it outperformed a conventional IV-based and DNN-based DOA estimation method.
△ Less
Submitted 10 October, 2019;
originally announced October 2019.
-
First Order Ambisonics Domain Spatial Augmentation for DNN-based Direction of Arrival Estimation
Authors:
Luca Mazzon,
Yuma Koizumi,
Masahiro Yasuda,
Noboru Harada
Abstract:
In this paper, we propose a novel data augmentation method for training neural networks for Direction of Arrival (DOA) estimation. This method focuses on expanding the representation of the DOA subspace of a dataset. Given some input data, it applies a transformation to it in order to change its DOA information and simulate new potentially unseen one. Such transformation, in general, is a combinat…
▽ More
In this paper, we propose a novel data augmentation method for training neural networks for Direction of Arrival (DOA) estimation. This method focuses on expanding the representation of the DOA subspace of a dataset. Given some input data, it applies a transformation to it in order to change its DOA information and simulate new potentially unseen one. Such transformation, in general, is a combination of a rotation and a reflection. It is possible to apply such transformation due to a well-known property of First Order Ambisonics (FOA). The same transformation is applied also to the labels, in order to maintain consistency between input data and target labels. Three methods with different level of generality are proposed for applying this augmentation principle. Experiments are conducted on two different DOA networks. Results of both experiments demonstrate the effectiveness of the novel augmentation strategy by improving the DOA error by around 40%.
△ Less
Submitted 10 October, 2019;
originally announced October 2019.
-
Empirical Bayes Method for Boltzmann Machines
Authors:
Muneki Yasuda,
Tomoyuki Obuchi
Abstract:
In this study, we consider an empirical Bayes method for Boltzmann machines and propose an algorithm for it. The empirical Bayes method allows estimation of the values of the hyperparameters of the Boltzmann machine by maximizing a specific likelihood function referred to as the empirical Bayes likelihood function in this study. However, the maximization is computationally hard because the empiric…
▽ More
In this study, we consider an empirical Bayes method for Boltzmann machines and propose an algorithm for it. The empirical Bayes method allows estimation of the values of the hyperparameters of the Boltzmann machine by maximizing a specific likelihood function referred to as the empirical Bayes likelihood function in this study. However, the maximization is computationally hard because the empirical Bayes likelihood function involves intractable integrations of the partition function. The proposed algorithm avoids this computational problem by using the replica method and the Plefka expansion. Our method does not require any iterative procedures and is quite simple and fast, though it introduces a bias to the estimate, which exhibits an unnatural behavior with respect to the size of the dataset. This peculiar behavior is supposed to be due to the approximate treatment by the Plefka expansion. A possible extension to overcome this behavior is also discussed.
△ Less
Submitted 7 September, 2019; v1 submitted 13 June, 2019;
originally announced June 2019.
-
Restricted Boltzmann Machine with Multivalued Hidden Variables: a model suppressing over-fitting
Authors:
Yuuki Yokoyama,
Tomu Katsumata,
Muneki Yasuda
Abstract:
Generalization is one of the most important issues in machine learning problems. In this study, we consider generalization in restricted Boltzmann machines (RBMs). We propose an RBM with multivalued hidden variables, which is a simple extension of conventional RBMs. We demonstrate that the proposed model is better than the conventional model via numerical experiments for contrastive divergence lea…
▽ More
Generalization is one of the most important issues in machine learning problems. In this study, we consider generalization in restricted Boltzmann machines (RBMs). We propose an RBM with multivalued hidden variables, which is a simple extension of conventional RBMs. We demonstrate that the proposed model is better than the conventional model via numerical experiments for contrastive divergence learning with artificial data and a classification problem with MNIST.
△ Less
Submitted 8 January, 2020; v1 submitted 29 November, 2018;
originally announced November 2018.
-
Momentum-Space Renormalization Group Transformation in Bayesian Image Modeling by Gaussian Graphical Model
Authors:
Kazuyuki Tanaka,
Masamichi Nakamura,
Shun Kataoka,
Masayuki Ohzeki,
Muneki Yasuda
Abstract:
A new Bayesian modeling method is proposed by combining the maximization of the marginal likelihood with a momentum-space renormalization group transformation for Gaussian graphical models. Moreover, we present a scheme for computint the statistical averages of hyperparameters and mean square errors in our proposed method based on a momentumspace renormalization transformation.
A new Bayesian modeling method is proposed by combining the maximization of the marginal likelihood with a momentum-space renormalization group transformation for Gaussian graphical models. Moreover, we present a scheme for computint the statistical averages of hyperparameters and mean square errors in our proposed method based on a momentumspace renormalization transformation.
△ Less
Submitted 19 March, 2018;
originally announced April 2018.
-
Linear-Time Algorithm in Bayesian Image Denoising based on Gaussian Markov Random Field
Authors:
Muneki Yasuda,
Junpei Watanabe,
Shun Kataoka,
kazuyuki Tanaka
Abstract:
In this paper, we consider Bayesian image denoising based on a Gaussian Markov random field (GMRF) model, for which we propose an new algorithm. Our method can solve Bayesian image denoising problems, including hyperparameter estimation, in $O(n)$-time, where $n$ is the number of pixels in a given image. From the perspective of the order of the computational time, this is a state-of-the-art algori…
▽ More
In this paper, we consider Bayesian image denoising based on a Gaussian Markov random field (GMRF) model, for which we propose an new algorithm. Our method can solve Bayesian image denoising problems, including hyperparameter estimation, in $O(n)$-time, where $n$ is the number of pixels in a given image. From the perspective of the order of the computational time, this is a state-of-the-art algorithm for the present problem setting. Moreover, the results of our numerical experiments we show our method is in fact effective in practice.
△ Less
Submitted 3 March, 2020; v1 submitted 19 October, 2017;
originally announced October 2017.
-
Solving Non-parametric Inverse Problem in Continuous Markov Random Field using Loopy Belief Propagation
Authors:
Muneki Yasuda,
Shun Kataoka
Abstract:
In this paper, we address the inverse problem, or the statistical machine learning problem, in Markov random fields with a non-parametric pair-wise energy function with continuous variables. The inverse problem is formulated by maximum likelihood estimation. The exact treatment of maximum likelihood estimation is intractable because of two problems: (1) it includes the evaluation of the partition…
▽ More
In this paper, we address the inverse problem, or the statistical machine learning problem, in Markov random fields with a non-parametric pair-wise energy function with continuous variables. The inverse problem is formulated by maximum likelihood estimation. The exact treatment of maximum likelihood estimation is intractable because of two problems: (1) it includes the evaluation of the partition function and (2) it is formulated in the form of functional optimization. We avoid Problem (1) by using Bethe approximation. Bethe approximation is an approximation technique equivalent to the loopy belief propagation. Problem (2) can be solved by using orthonormal function expansion. Orthonormal function expansion can reduce a functional optimization problem to a function optimization problem. Our method can provide an analytic form of the solution of the inverse problem within the framework of Bethe approximation.
△ Less
Submitted 28 March, 2017;
originally announced March 2017.
-
Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering
Authors:
Shun Kataoka,
Takuto Kobayashi,
Muneki Yasuda,
Kazuyuki Tanaka
Abstract:
We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed this paper is the detection of the community structure fr…
▽ More
We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our approach is based on the Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.
△ Less
Submitted 21 July, 2016;
originally announced August 2016.
-
Statistical Analysis of Loopy Belief Propagation in Random Fields
Authors:
Muneki Yasuda,
Shun Kataoka,
Kazuyuki Tanaka
Abstract:
Loopy belief propagation (LBP), which is equivalent to the Bethe approximation in statistical mechanics, is a message-passing-type inference method that is widely used to analyze systems based on Markov random fields (MRFs). In this paper, we propose a message-passing-type method to analytically evaluate the quenched average of LBP in random fields by using the replica cluster variation method. Th…
▽ More
Loopy belief propagation (LBP), which is equivalent to the Bethe approximation in statistical mechanics, is a message-passing-type inference method that is widely used to analyze systems based on Markov random fields (MRFs). In this paper, we propose a message-passing-type method to analytically evaluate the quenched average of LBP in random fields by using the replica cluster variation method. The proposed analytical method is applicable to general pair-wise MRFs with random fields whose distributions differ from each other and can give the quenched averages of the Bethe free energies over random fields, which are consistent with numerical results. The order of its computational cost is equivalent to that of standard LBP. In the latter part of this paper, we describe the application of the proposed method to Bayesian image restoration, in which we observed that our theoretical results are in good agreement with the numerical results for natural images.
△ Less
Submitted 13 September, 2015; v1 submitted 16 March, 2015;
originally announced March 2015.
-
Inverse Renormalization Group Transformation in Bayesian Image Segmentations
Authors:
Kazuyuki Tanaka,
Shun Kataoka,
Muneki Yasuda,
Masayuki Ohzeki
Abstract:
A new Bayesian image segmentation algorithm is proposed by combining a loopy belief propagation with an inverse real space renormalization group transformation to reduce the computational time. In results of our experiment, we observe that the proposed method can reduce the computational time to less than one-tenth of that taken by conventional Bayesian approaches.
A new Bayesian image segmentation algorithm is proposed by combining a loopy belief propagation with an inverse real space renormalization group transformation to reduce the computational time. In results of our experiment, we observe that the proposed method can reduce the computational time to less than one-tenth of that taken by conventional Bayesian approaches.
△ Less
Submitted 5 January, 2015;
originally announced January 2015.
-
Boltzmann-Machine Learning of Prior Distributions of Binarized Natural Images
Authors:
Tomoyuki Obuchi,
Hirokazu Koma,
Muneki Yasuda
Abstract:
Prior distributions of binarized natural images are learned by using a Boltzmann machine. According the results of this study, there emerges a structure with two sublattices in the interactions, and the nearest-neighbor and next-nearest-neighbor interactions correspondingly take two discriminative values, which reflects the individual characteristics of the three sets of pictures that we process.…
▽ More
Prior distributions of binarized natural images are learned by using a Boltzmann machine. According the results of this study, there emerges a structure with two sublattices in the interactions, and the nearest-neighbor and next-nearest-neighbor interactions correspondingly take two discriminative values, which reflects the individual characteristics of the three sets of pictures that we process. Meanwhile, in a longer spatial scale, a longer-range, although still rapidly decaying, ferromagnetic interaction commonly appears in all cases. The characteristic length scale of the interactions is universally up to approximately four lattice spacings $ξ\approx 4$. These results are derived by using the mean-field method, which effectively reduces the computational time required in a Boltzmann machine. An improved mean-field method called the Bethe approximation also gives the same results, as well as the Monte Carlo method does for small size images. These reinforce the validity of our analysis and findings. Relations to criticality, frustration, and simple-cell receptive fields are also discussed.
△ Less
Submitted 23 October, 2016; v1 submitted 15 December, 2014;
originally announced December 2014.
-
Composite Likelihood Estimation for Restricted Boltzmann machines
Authors:
Muneki Yasuda,
Shun Kataoka,
Yuji Waizumi,
Kazuyuki Tanaka
Abstract:
Learning the parameters of graphical models using the maximum likelihood estimation is generally hard which requires an approximation. Maximum composite likelihood estimations are statistical approximations of the maximum likelihood estimation which are higher-order generalizations of the maximum pseudo-likelihood estimation. In this paper, we propose a composite likelihood method and investigate…
▽ More
Learning the parameters of graphical models using the maximum likelihood estimation is generally hard which requires an approximation. Maximum composite likelihood estimations are statistical approximations of the maximum likelihood estimation which are higher-order generalizations of the maximum pseudo-likelihood estimation. In this paper, we propose a composite likelihood method and investigate its property. Furthermore, we apply our composite likelihood method to restricted Boltzmann machines.
△ Less
Submitted 24 June, 2014;
originally announced June 2014.
-
Bayesian image segmentations by Potts prior and loopy belief propagation
Authors:
Kazuyuki Tanaka,
Shun Kataoka,
Muneki Yasuda,
Yuji Waizumi,
Chiou-Ting Hsu
Abstract:
This paper presents a Bayesian image segmentation model based on Potts prior and loopy belief propagation. The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models, and the average vectors and covariant matrices of Gauss distributions in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory.…
▽ More
This paper presents a Bayesian image segmentation model based on Potts prior and loopy belief propagation. The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models, and the average vectors and covariant matrices of Gauss distributions in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory. In order to determine these hyperparameters, we propose a new scheme for hyperparameter estimation based on conditional maximization of entropy in the Potts prior. The algorithm is given based on loopy belief propagation. In addition, we compare our conditional maximum entropy framework with the conventional maximum likelihood framework, and also clarify how the first order phase transitions in LBP's for Potts models influence our hyperparameter estimation procedures.
△ Less
Submitted 18 August, 2014; v1 submitted 11 April, 2014;
originally announced April 2014.
-
Traffic data reconstruction based on Markov random field modeling
Authors:
Shun Kataoka,
Muneki Yasuda,
Cyril Furtlehner,
Kazuyuki Tanaka
Abstract:
We consider the traffic data reconstruction problem. Suppose we have the traffic data of an entire city that are incomplete because some road data are unobserved. The problem is to reconstruct the unobserved parts of the data. In this paper, we propose a new method to reconstruct incomplete traffic data collected from various traffic sensors. Our approach is based on Markov random field modeling o…
▽ More
We consider the traffic data reconstruction problem. Suppose we have the traffic data of an entire city that are incomplete because some road data are unobserved. The problem is to reconstruct the unobserved parts of the data. In this paper, we propose a new method to reconstruct incomplete traffic data collected from various traffic sensors. Our approach is based on Markov random field modeling of road traffic. The reconstruction is achieved by using mean-field method and a machine learning method. We numerically verify the performance of our method using realistic simulated traffic data for the real road network of Sendai, Japan.
△ Less
Submitted 27 June, 2013;
originally announced June 2013.
-
Belief Propagation Algorithm for Portfolio Optimization Problems
Authors:
Takashi Shinzato,
Muneki Yasuda
Abstract:
The typical behavior of optimal solutions to portfolio optimization problems with absolute deviation and expected shortfall models using replica analysis was pioneeringly estimated by S. Ciliberti and M. Mézard [Eur. Phys. B. 57, 175 (2007)]; however, they have not yet developed an approximate derivation method for finding the optimal portfolio with respect to a given return set. In this study, an…
▽ More
The typical behavior of optimal solutions to portfolio optimization problems with absolute deviation and expected shortfall models using replica analysis was pioneeringly estimated by S. Ciliberti and M. Mézard [Eur. Phys. B. 57, 175 (2007)]; however, they have not yet developed an approximate derivation method for finding the optimal portfolio with respect to a given return set. In this study, an approximation algorithm based on belief propagation for the portfolio optimization problem is presented using the Bethe free energy formalism, and the consistency of the numerical experimental results of the proposed algorithm with those of replica analysis is confirmed. Furthermore, the conjecture of H. Konno and H. Yamazaki, that the optimal solutions with the absolute deviation model and with the mean-variance model have the same typical behavior, is verified using replica analysis and the belief propagation algorithm.
△ Less
Submitted 9 September, 2010; v1 submitted 23 August, 2010;
originally announced August 2010.