
Showing 1–12 of 12 results for author: Shinoda, K

Searching in archive eess.
  1. arXiv:2406.05312

    eess.IV

    Deep convolutional demosaicking network for multispectral polarization filter array

    Authors: Tomoharu Ishiuchi, Kazuma Shinoda

    Abstract: To address the demosaicking problem in multispectral polarization filter array (MSPFA) imaging, we propose a multispectral polarization demosaicking network (MSPDNet) that improves image reconstruction accuracy. Imaging with a multispectral polarization filter array acquires multispectral polarization information in a snapshot. The full-resolution multispectral polarization image must be reconstru…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2111.10202

    eess.AS cs.CL cs.SD

    Multimodal Emotion Recognition with High-level Speech and Text Features

    Authors: Mariana Rodrigues Makiuchi, Kuniaki Uto, Koichi Shinoda

    Abstract: Automatic emotion recognition is one of the central concerns of the Human-Computer Interaction field as it can bridge the gap between humans and machines. Current works train deep learning models on low-level data representations to solve the emotion recognition task. Since emotion datasets often have a limited amount of data, these approaches may suffer from overfitting, and they may learn based…

    Submitted 29 September, 2021; originally announced November 2021.

    Comments: Accepted at ASRU 2021. Code available at https://github.com/mmakiuchi/multimodal_emotion_recognition

  3. arXiv:2004.07992

    eess.AS cs.LG cs.SD q-bio.QM

    Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

    Authors: Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto

    Abstract: We propose a non-invasive and cost-effective method to automatically detect dementia by utilizing solely speech audio data. We extract paralinguistic features for a short speech segment and use Gated Convolutional Neural Networks (GCNN) to classify it into dementia or healthy. We evaluate our method on the Pitt Corpus and on our own dataset, the PROMPT Database. Our method yields the accuracy of 7…

    Submitted 6 October, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

  4. arXiv:1904.07386

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  5. arXiv:1808.09106

    eess.IV

    Snapshot multispectral imaging using a filter array

    Authors: Kazuma Shinoda

    Abstract: A multispectral filter array (MSFA) is one solution for capturing a multispectral image (MSI) in a single shot at low cost. We introduce our optimization method of the spectral sensitivity of the MSFAs and demosaicking, and show a new prototype filter array for snapshot imaging based on a photonic crystal.

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: This paper has been submitted to International Workshop on Image Sensors and Imaging Systems (IWISS2018) (Invited talk)

    Journal ref: International Workshop on Image Sensors and Imaging Systems (IWISS2018)

  6. arXiv:1808.08021

    eess.IV

    Deep demosaicking for multispectral filter arrays

    Authors: Kazuma Shinoda, Shoichiro Yoshiba, Madoka Hasegawa

    Abstract: We propose a novel demosaicking method for multispectral filter arrays based on a deep convolutional neural network. The proposed method first interpolates mosaicked multispectral images utilizing a bilinear approach, then applies a residual network to initial demosaicked images. The residual network consists of various three-dimensional convolutional layers and a rectified linear unit for describ…

    Submitted 21 October, 2018; v1 submitted 24 August, 2018; originally announced August 2018.

  7. arXiv:1807.01386

    eess.IV

    Optimal Spectral Sensitivity of Multispectral Filter Array for Pathological Images

    Authors: Kazuma Shinoda, Maru Kawase, Madoka Hasegawa, Masahiro Ishikawa, Hideki Komagata, Naoki Kobayashi

    Abstract: A capturing system with multispectral filter array (MSFA) technology has been researched to shorten the capturing time and reduce the cost. In this system, the mosaicked image captured by the MSFA is demosaicked to reconstruct multispectral images (MSIs). We focus on the spectral sensitivity design of a MSFA in this paper and propose a pathology-specific MSFA. The proposed method optimizes the MSF…

    Submitted 3 July, 2018; originally announced July 2018.

    Journal ref: Image Electronics and Visual Computing Workshop (IEVC), 1P-10, Mar. 2017

  8. arXiv:1807.01385

    eess.IV

    Joint optimization of multispectral filter arrays and demosaicking for pathological images

    Authors: Kazuma Shinoda, Maru Kawase, Madoka Hasegawa, Masahiro Ishikawa, Hideki Komagata, Naoki Kobayashi

    Abstract: A capturing system with multispectral filter array (MSFA) technology is proposed for shortening the capture time and reducing costs. Therein, a mosaicked image captured using an MSFA is demosaicked to reconstruct multispectral images (MSIs). Joint optimization of the spectral sensitivity of the MSFAs and demosaicking is considered, and pathology-specific multispectral imaging is proposed. This opt…

    Submitted 3 July, 2018; originally announced July 2018.

    Journal ref: IIEEJ Transactions on Image Electronics and Visual Computing, Vol. 6, No. 1, pp. 13-21, Jun. 2018

  9. arXiv:1804.00290

    eess.AS cs.LG cs.SD

    I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

    Authors: Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

    Abstract: I-vector based text-independent speaker verification (SV) systems often have poor performance with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where its generator network is trained to generate a compensated i-vector from a short-…

    Submitted 1 April, 2018; originally announced April 2018.

  10. arXiv:1803.11344

    eess.AS cs.SD

    Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data

    Authors: Tifani Warnita, Nakamasa Inoue, Koichi Shinoda

    Abstract: We propose an automatic detection method of Alzheimer's diseases using a gated convolutional neural network (GCNN) from speech data. This GCNN can be trained with a relatively small amount of data and can capture the temporal information in audio paralinguistic features. Since it does not utilize any linguistic features, it can be easily applied to any languages. We evaluated our method using Pitt…

    Submitted 30 March, 2018; originally announced March 2018.

    Comments: 5 pages, 3 figures, submitted to INTERSPEECH 2018

  11. Attentive Statistics Pooling for Deep Speaker Embedding

    Authors: Koji Okabe, Takafumi Koshinaka, Koichi Shinoda

    Abstract: This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also…

    Submitted 24 February, 2019; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: Proc. Interspeech 2018, pp. 2252-2256. arXiv admin note: text overlap with arXiv:1809.09311

  12. arXiv:1801.03577

    eess.IV

    Mosaicked multispectral image compression based on inter- and intra-band correlation

    Authors: Kazuma Shinoda, Madoka Hasegawa, Masahiro Yamaguchi, Antonio Ortega

    Abstract: Multispectral imaging has been utilized in many fields, but the cost of capturing and storing image data is still high. Single-sensor cameras with multispectral filter arrays can reduce the cost of capturing images at the expense of slightly lower image quality. When multispectral filter arrays are used, conventional multispectral image compression methods can be applied after interpolation, but t…

    Submitted 10 January, 2018; originally announced January 2018.