-
Deep convolutional demosaicking network for multispectral polarization filter array
Authors:
Tomoharu Ishiuchi,
Kazuma Shinoda
Abstract:
To address the demosaicking problem in multispectral polarization filter array (MSPFA) imaging, we propose a multispectral polarization demosaicking network (MSPDNet) that improves image reconstruction accuracy. Imaging with a multispectral polarization filter array acquires multispectral polarization information in a snapshot. The full-resolution multispectral polarization image must be reconstructed from a mosaic image. In the proposed method, a sparse image in which pixel values of the same channel are extracted from a mosaic image is used as input to MSPDNet. Missing pixels are interpolated by learning spatial and wavelength correlations from the observed pixels in the mosaic image. Moreover, by using 3D convolution, features are extracted at each convolution layer, and by deepening the network, even detailed features of the multispectral polarization image can be learned. Experimental results show that MSPDNet can reconstruct multi-wavelength and multi-polarization angle information with high accuracy in terms of peak signal-to-noise ratio (PSNR) evaluation and visual quality, indicating the effectiveness of the proposed method compared to other methods.
Submitted 7 June, 2024;
originally announced June 2024.
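The abstract above describes the network input as a sparse image in which pixel values of the same channel are extracted from the mosaic. A minimal sketch of that extraction step is shown below, assuming a periodically tiled filter pattern. The function name `split_mosaic`, the 2x2 `pattern`, and the toy `mosaic` are hypothetical illustrations, not the filter layout or code used in the paper.

```python
def split_mosaic(mosaic, pattern):
    """Split a mosaic image into one sparse image per filter channel.

    mosaic  : 2D list (H x W) of raw sensor values.
    pattern : 2D list (ph x pw) of channel indices, tiled periodically
              over the sensor (hypothetical layout, not the paper's MSPFA).
    Returns (sparse, masks): each is a list of C images of size H x W.
    """
    height, width = len(mosaic), len(mosaic[0])
    ph, pw = len(pattern), len(pattern[0])
    n_channels = max(max(row) for row in pattern) + 1

    # Unobserved positions stay at zero; masks mark the observed positions.
    sparse = [[[0.0] * width for _ in range(height)] for _ in range(n_channels)]
    masks = [[[0] * width for _ in range(height)] for _ in range(n_channels)]

    for y in range(height):
        for x in range(width):
            c = pattern[y % ph][x % pw]  # channel observed at this pixel
            sparse[c][y][x] = float(mosaic[y][x])
            masks[c][y][x] = 1
    return sparse, masks


# Toy example: four channels arranged in a 2x2 tile (an assumption).
pattern = [[0, 1],
           [2, 3]]
mosaic = [[10, 11, 12, 13],
          [14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]
sparse, masks = split_mosaic(mosaic, pattern)
```

Each pixel is observed in exactly one channel, so summing the sparse images over channels recovers the mosaic. A demosaicking network would then be trained to fill in the zero positions.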
-
Multimodal Emotion Recognition with High-level Speech and Text Features
Authors:
Mariana Rodrigues Makiuchi,
Kuniaki Uto,
Koichi Shinoda
Abstract:
Automatic emotion recognition is one of the central concerns of the Human-Computer Interaction field as it can bridge the gap between humans and machines. Current works train deep learning models on low-level data representations to solve the emotion recognition task. Since emotion datasets often have a limited amount of data, these approaches may suffer from overfitting, and they may learn based on superficial cues. To address these issues, we propose a novel cross-representation speech model, inspired by disentanglement representation learning, to perform emotion recognition on wav2vec 2.0 speech features. We also train a CNN-based model to recognize emotions from text features extracted with Transformer-based models. We further combine the speech-based and text-based results with a score fusion approach. Our method is evaluated on the IEMOCAP dataset in a 4-class classification problem, and it surpasses current works on speech-only, text-only, and multimodal emotion recognition.
Submitted 29 September, 2021;
originally announced November 2021.
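The abstract above combines speech-based and text-based results with a score fusion approach but does not state the fusion rule. The sketch below assumes one common choice, a weighted average of per-class scores followed by an argmax. The function name `fuse_scores`, the weight `speech_weight`, and the example scores are all hypothetical and are not taken from the paper.

```python
def fuse_scores(speech_scores, text_scores, speech_weight=0.5):
    """Late fusion by weighted averaging of per-class scores.

    speech_scores, text_scores : lists of per-class scores from each model.
    speech_weight              : weight on the speech model, in [0, 1]
                                 (hypothetical parameter, tuned on held-out data).
    Returns (fused_scores, predicted_class_index).
    """
    if len(speech_scores) != len(text_scores):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")

    fused = [speech_weight * s + (1.0 - speech_weight) * t
             for s, t in zip(speech_scores, text_scores)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Made-up scores for a 4-class problem.
speech = [0.10, 0.60, 0.20, 0.10]
text = [0.05, 0.25, 0.60, 0.10]

fused_equal, pred_equal = fuse_scores(speech, text, speech_weight=0.5)
fused_text_heavy, pred_text_heavy = fuse_scores(speech, text, speech_weight=0.3)
```

With equal weights the speech model's preferred class wins here. Shifting weight toward the text model flips the decision, which is why the weight would normally be chosen on validation data.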
-
Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network
Authors:
Mariana Rodrigues Makiuchi,
Tifani Warnita,
Nakamasa Inoue,
Koichi Shinoda,
Michitaka Yoshimura,
Momoko Kitazawa,
Kei Funaki,
Yoko Eguchi,
Taishiro Kishimoto
Abstract:
We propose a non-invasive and cost-effective method to automatically detect dementia using only speech audio data. We extract paralinguistic features from a short speech segment and use Gated Convolutional Neural Networks (GCNN) to classify it as dementia or healthy. We evaluate our method on the Pitt Corpus and on our own dataset, the PROMPT Database. Our method yields an accuracy of 73.1% on the Pitt Corpus using an average of 114 seconds of speech data. On the PROMPT Database, our method yields an accuracy of 74.7% using 4 seconds of speech data, which improves to 80.8% when we use all of the patient's speech data. Furthermore, we evaluate our method on a three-class classification problem that includes the Mild Cognitive Impairment (MCI) class and achieve an accuracy of 60.6% with 40 seconds of speech data.
Submitted 6 October, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
-
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
Authors:
Kong Aik Lee,
Ville Hautamaki,
Tomi Kinnunen,
Hitoshi Yamamoto,
Koji Okabe,
Ville Vestman,
Jing Huang,
Guohong Ding,
Hanwu Sun,
Anthony Larcher,
Rohan Kumar Das,
Haizhou Li,
Mickael Rouvier,
Pierre-Michel Bousquet,
Wei Rao,
Qing Wang,
Chunlei Zhang,
Fahimeh Bahmaninezhad,
Hector Delgado,
Jose Patino,
Qiongqiong Wang,
Ling Guo,
Takafumi Koshinaka,
Jiacen Zhang,
Koichi Shinoda
, et al. (21 additional authors not shown)
Abstract:
The I4U consortium was established to facilitate a joint entry to the NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned from the twelve sub-systems and their fusion submitted to SRE'18. We also intend to present a shared view of the advancements, progress, and major paradigm shifts that we have witnessed as an SRE participant over the past decade, from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a shift in research challenge from channel compensation to domain adaptation.
Submitted 15 April, 2019;
originally announced April 2019.
-
Snapshot multispectral imaging using a filter array
Authors:
Kazuma Shinoda
Abstract:
A multispectral filter array (MSFA) is one solution for capturing a multispectral image (MSI) in a single shot at low cost. We introduce our optimization method of the spectral sensitivity of the MSFAs and demosaicking, and show a new prototype filter array for snapshot imaging based on a photonic crystal.
Submitted 28 August, 2018;
originally announced August 2018.
-
Deep demosaicking for multispectral filter arrays
Authors:
Kazuma Shinoda,
Shoichiro Yoshiba,
Madoka Hasegawa
Abstract:
We propose a novel demosaicking method for multispectral filter arrays based on a deep convolutional neural network. The proposed method first interpolates mosaicked multispectral images utilizing a bilinear approach, then applies a residual network to initial demosaicked images. The residual network consists of various three-dimensional convolutional layers and a rectified linear unit for describing the features of a multispectral data cube. Experimental results reveal that the proposed method outperforms conventional demosaicking methods.
Submitted 21 October, 2018; v1 submitted 24 August, 2018;
originally announced August 2018.
-
Optimal Spectral Sensitivity of Multispectral Filter Array for Pathological Images
Authors:
Kazuma Shinoda,
Maru Kawase,
Madoka Hasegawa,
Masahiro Ishikawa,
Hideki Komagata,
Naoki Kobayashi
Abstract:
A capturing system with multispectral filter array (MSFA) technology has been researched to shorten the capturing time and reduce the cost. In this system, the mosaicked image captured by the MSFA is demosaicked to reconstruct multispectral images (MSIs). In this paper, we focus on the spectral sensitivity design of an MSFA and propose a pathology-specific MSFA. The proposed method optimizes the MSFA by minimizing the reconstruction error between training data of a pathological tissue and a demosaicked MSI under a cost function. First, the spectral sensitivities of the filter array are set randomly, and the mosaicked image is obtained from the training data and the filter array. Then, a reconstructed image is obtained by Wiener estimation. The spectral sensitivities of the filter array are optimized iteratively by an interior-point approach to minimize the reconstruction error. We show the effectiveness of the proposed MSFA by comparing the recovered spectrum and RGB image with those of a conventional method.
Submitted 3 July, 2018;
originally announced July 2018.
-
Joint optimization of multispectral filter arrays and demosaicking for pathological images
Authors:
Kazuma Shinoda,
Maru Kawase,
Madoka Hasegawa,
Masahiro Ishikawa,
Hideki Komagata,
Naoki Kobayashi
Abstract:
A capturing system with multispectral filter array (MSFA) technology is proposed for shortening the capture time and reducing costs. Therein, a mosaicked image captured using an MSFA is demosaicked to reconstruct multispectral images (MSIs). Joint optimization of the spectral sensitivity of the MSFAs and demosaicking is considered, and pathology-specific multispectral imaging is proposed. This optimizes the MSFA and the demosaicking matrix by minimizing the reconstruction error between the training data of a hematoxylin and eosin-stained pathological tissue and a demosaicked MSI using a cost function. Initially, the spectral sensitivity of the filter array is set randomly and the mosaicked image is obtained from the training data. Subsequently, a reconstructed image is obtained using Wiener estimation. To minimize the reconstruction error, the spectral sensitivity of the filter array and the Wiener estimation matrix are optimized iteratively through an interior-point approach. The effectiveness of the proposed MSFA and demosaicking is demonstrated by comparing the recovered spectrum and RGB image with those obtained using a conventional method.
Submitted 3 July, 2018;
originally announced July 2018.
-
I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Authors:
Jiacen Zhang,
Nakamasa Inoue,
Koichi Shinoda
Abstract:
I-vector-based text-independent speaker verification (SV) systems often perform poorly with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), in which the generator network is trained to generate a compensated i-vector from a short-utterance i-vector and the discriminator network is trained to determine whether an i-vector was produced by the generator or extracted from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments on the NIST SRE 2008 "10sec-10sec" condition show that our method reduced the equal error rate by 11.3% relative to the conventional i-vector and PLDA system.
Submitted 1 April, 2018;
originally announced April 2018.
-
Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data
Authors:
Tifani Warnita,
Nakamasa Inoue,
Koichi Shinoda
Abstract:
We propose a method for automatically detecting Alzheimer's disease from speech data using a gated convolutional neural network (GCNN). This GCNN can be trained with a relatively small amount of data and can capture the temporal information in audio paralinguistic features. Since it does not use any linguistic features, it can be easily applied to any language. We evaluated our method using the Pitt Corpus. The proposed method achieved an accuracy of 73.6%, which is 7.6 points better than the conventional sequential minimal optimization (SMO).
Submitted 30 March, 2018;
originally announced March 2018.
-
Attentive Statistics Pooling for Deep Speaker Embedding
Authors:
Koji Okabe,
Takafumi Koshinaka,
Koichi Shinoda
Abstract:
This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively. An evaluation on the NIST SRE 2012 and the VoxCeleb data sets shows that it reduces equal error rates (EERs) from the conventional method by 7.5% and 8.1%, respectively.
Submitted 24 February, 2019; v1 submitted 29 March, 2018;
originally announced March 2018.
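The pooling step described in the abstract above, attention weights over frames followed by a weighted mean and a weighted standard deviation, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-frame attention scores are passed in directly instead of being computed by a learned attention network, and the function name `attentive_stats_pooling` and the toy inputs are hypothetical.

```python
import math


def attentive_stats_pooling(frames, scores):
    """Pool frame-level features into one utterance-level vector.

    frames : list of T frame vectors, each a list of D floats.
    scores : list of T unnormalized attention scores, one per frame
             (assumed to come from some attention model, not shown here).
    Returns (pooled, weights): pooled is the weighted mean (D values)
    concatenated with the weighted standard deviation (D values).
    """
    if len(frames) != len(scores) or not frames:
        raise ValueError("need one score per frame and at least one frame")

    # Softmax over frames, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    second_moment = [sum(w * f[d] * f[d] for w, f in zip(weights, frames))
                     for d in range(dim)]
    # Clamp at zero to guard against tiny negative values from rounding.
    std = [math.sqrt(max(m2 - mu * mu, 0.0))
           for m2, mu in zip(second_moment, mean)]
    return mean + std, weights


frames = [[1.0, 2.0],
          [3.0, 6.0]]

# Equal scores reduce to plain (unweighted) mean and standard deviation.
pooled_uniform, w_uniform = attentive_stats_pooling(frames, [0.0, 0.0])

# A dominant score makes the pooled mean follow that frame.
pooled_peaked, w_peaked = attentive_stats_pooling(frames, [0.0, 100.0])
```

With equal scores the result matches conventional average pooling plus a standard deviation, which makes the uniform case a convenient sanity check for an implementation.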
-
Mosaicked multispectral image compression based on inter- and intra-band correlation
Authors:
Kazuma Shinoda,
Madoka Hasegawa,
Masahiro Yamaguchi,
Antonio Ortega
Abstract:
Multispectral imaging has been utilized in many fields, but the cost of capturing and storing image data is still high. Single-sensor cameras with multispectral filter arrays can reduce the cost of capturing images at the expense of slightly lower image quality. When multispectral filter arrays are used, conventional multispectral image compression methods can be applied after interpolation, but the compressed image data after interpolation contains some redundancy because the interpolated data are computed from the captured raw data. In this paper, we propose an efficient image compression method for single-sensor multispectral cameras. The proposed method encodes the captured multispectral data before interpolation. We also propose a new spectral transform method for the compression of mosaicked multispectral images. This transform is designed by considering the filter arrangement and the spectral sensitivities of a multispectral filter array. The experimental results show that the proposed method achieves a higher peak signal-to-noise ratio at higher bit rates than a conventional compression method that encodes a multispectral image after interpolation, e.g., a 3-dB gain over conventional compression when coding at rates of over 0.1 bit/pixel/band.
Submitted 10 January, 2018;
originally announced January 2018.