Showing 1–34 of 34 results for author: Horiguchi, S

Searching in archive cs.
  1. arXiv:2406.18910  [pdf, other]

    cs.CL cs.SD eess.AS

    Factor-Conditioned Speaking-Style Captioning

    Authors: Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura

    Abstract: This paper presents a novel speaking-style captioning method that generates diverse descriptions while accurately predicting speaking-style information. Conventional learning criteria directly use original captions that contain not only speaking-style factor terms but also syntax words, which disturbs learning speaking-style information. To solve this problem, we introduce factor-conditioned capti…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2402.08209  [pdf, other]

    cs.LG cs.AI

    Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits

    Authors: Hiroyuki Namba, Shota Horiguchi, Masaki Hamamoto, Masashi Egi

    Abstract: Data cleansing aims to improve model performance by removing a set of harmful instances from the training dataset. Data Shapley is a common theoretically guaranteed method to evaluate the contribution of each instance to model performance; however, it requires training on all subsets of the training data, which is computationally expensive. In this paper, we propose an iterative method to fast iden…

    Submitted 12 February, 2024; originally announced February 2024.
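
    The exact Data Shapley value referred to above averages an instance's marginal contribution over every subset of the remaining training data, which is why it needs on the order of 2^n utility evaluations. The sketch below shows only that textbook formula, not the bandit-based thresholding method proposed in the paper; `exact_data_shapley`, `toy_utility`, and the per-instance `contrib` values are hypothetical stand-ins for "train on subset S and score on validation data".

```python
from itertools import combinations
from math import factorial


def exact_data_shapley(n, utility):
    """Exact Shapley value of each of n training instances.

    utility(frozenset_of_indices) -> float is a stand-in for training a model
    on that subset and scoring it. With caching it is still evaluated on all
    2**n subsets, which is what makes exact Data Shapley intractable.
    """
    cache = {}

    def u(subset):
        key = frozenset(subset)
        if key not in cache:
            cache[key] = utility(key)
        return cache[key]

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # |S| ranges over 0..n-1
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi += weight * (u(set(s) | {i}) - u(s))  # marginal contribution
        values.append(phi)
    return values, len(cache)


# Hypothetical additive utility: instance 2 hurts performance.
contrib = [1.0, 0.5, -2.0]


def toy_utility(subset):
    return sum(contrib[j] for j in subset)


values, n_evals = exact_data_shapley(len(contrib), toy_utility)
harmful = [i for i, v in enumerate(values) if v < 0]  # candidates for removal
```

    For an additive utility the Shapley values equal the per-instance contributions, so instance 2 is flagged; for a real model each `utility` call is a full training run.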

  3. arXiv:2309.01013  [pdf, other]

    cs.LG

    Streaming Active Learning for Regression Problems Using Regression via Classification

    Authors: Shota Horiguchi, Kota Dohi, Yohei Kawaguchi

    Abstract: One of the challenges in deploying a machine learning model is that the model's performance degrades as the operating environment changes. To maintain the performance, streaming active learning is used, in which the model is retrained by adding a newly annotated sample to the training dataset if the prediction of the sample is not certain enough. Although many streaming active learning methods hav…

    Submitted 15 December, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024
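
    Regression via classification (RvC) discretizes a continuous target into bins so that a classifier's predicted bin distribution can serve as an uncertainty signal for deciding whether a streamed sample is "not certain enough" to skip annotation. A minimal sketch, assuming evenly spaced bins, normalized entropy as the uncertainty measure, and a hypothetical threshold; the paper's actual binning and query criterion are not given in the abstract and may differ. All function names are illustrative.

```python
import math


def make_bins(lo, hi, n_bins):
    """Evenly spaced bins over [lo, hi] (assumed scheme): width and centers."""
    width = (hi - lo) / n_bins
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    return width, centers


def target_to_class(y, lo, width, n_bins):
    """Map a continuous target to a bin index, clamped to the valid range."""
    k = int((y - lo) // width)
    return min(max(k, 0), n_bins - 1)


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_query(probs, threshold):
    """Request a label when normalized entropy (in [0, 1]) exceeds threshold."""
    return entropy(probs) / math.log(len(probs)) > threshold


def predict_value(probs, centers):
    """Point estimate as the expectation over bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


width, centers = make_bins(0.0, 10.0, 5)
probs = [0.2, 0.2, 0.2, 0.2, 0.2]  # flat prediction: maximally uncertain
if should_query(probs, threshold=0.8):
    print("request a label for this sample")
```

    Labeled samples are converted with `target_to_class` before retraining the classifier, while `predict_value` recovers a continuous estimate at inference time.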

  4. arXiv:2305.17758  [pdf, ps, other]

    cs.SD eess.AS

    CAPTDURE: Captioned Sound Dataset of Single Sources

    Authors: Yuki Okamoto, Kanta Shimonishi, Keisuke Imoto, Kota Dohi, Shota Horiguchi, Yohei Kawaguchi

    Abstract: In conventional studies on environmental sound separation and synthesis using captions, datasets consisting of multiple-source sounds with their captions were used for model training. However, when we collect the captions for multiple-source sound, it is not easy to collect detailed captions for each sound source, such as the number of sound occurrences and timbre. Therefore, it is difficult to ex…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH2023

  5. arXiv:2305.15518  [pdf, other]

    eess.AS cs.SD

    Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model

    Authors: Aoi Ito, Shota Horiguchi

    Abstract: Large-scale pretrained models using self-supervised learning have reportedly improved the performance of speech anti-spoofing. However, the attacker side may also make use of such models. Also, since it is very expensive to train such models from scratch, pretrained models on the Internet are often used, but the attacker and defender may possibly use the same pretrained model. This paper investiga…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  6. arXiv:2211.14455  [pdf, ps, other]

    cs.IT math.DG math.ST physics.chem-ph physics.data-an

    Information Geometry of Dynamics on Graphs and Hypergraphs

    Authors: Tetsuya J. Kobayashi, Dimitri Loutchko, Atsushi Kamimura, Shuhei A. Horiguchi, Yuki Sughiyama

    Abstract: We introduce a new information-geometric structure associated with the dynamics on discrete objects such as graphs and hypergraphs. The presented setup consists of two dually flat structures built on the vertex and edge spaces, respectively. The former is the conventional duality between density and potential, e.g., the probability density and its logarithmic form induced by a convex thermodynamic…

    Submitted 5 August, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 92 pages, 9 figures

  7. arXiv:2211.00947  [pdf, other]

    stat.ML cs.LG

    Linear Embedding-based High-dimensional Batch Bayesian Optimization without Reconstruction Mappings

    Authors: Shuhei A. Horiguchi, Tomoharu Iwata, Taku Tsuzuki, Yosuke Ozawa

    Abstract: The optimization of high-dimensional black-box functions is a challenging problem. When a low-dimensional linear embedding structure can be assumed, existing Bayesian optimization (BO) methods often transform the original problem into optimization in a low-dimensional space. They exploit the low-dimensional structure and reduce the computational burden. However, we reveal that this approach could…

    Submitted 2 November, 2022; originally announced November 2022.

  8. arXiv:2210.03459  [pdf, other]

    eess.AS cs.CL cs.SD

    Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

    Authors: Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola Garcia

    Abstract: Due to the high performance of multi-channel speech processing, we can use the outputs from a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. To the contrary, it is also known that single-channel speech data can benefit multi-channel models by mixing it with multi-channel speech data during training or by using it for model pretraining. This…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  9. arXiv:2206.02432  [pdf, other]

    eess.AS cs.CL cs.SD

    Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

    Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei Kawaguchi

    Abstract: A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of spea…

    Submitted 22 December, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted to IEEE/ACM TASLP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 706-720, 2023

  10. arXiv:2205.12683  [pdf, other]

    cs.LG cs.AI stat.ML

    Rethinking Fano's Inequality in Ensemble Learning

    Authors: Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga

    Abstract: We propose a fundamental theory on ensemble learning that answers the central question: what factors make an ensemble system good or bad? Previous studies used a variant of Fano's inequality of information theory and derived a lower bound of the classification error rate on the basis of the $\textit{accuracy}$ and $\textit{diversity}$ of models. We revisit the original Fano's inequality and argue…

    Submitted 16 November, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: ICML2022
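
    For reference, the original (textbook) form of Fano's inequality is shown below for a label Y taking values in a finite set and any estimate of it with error probability P_e; the second line is the usual weakened lower bound on P_e (logarithms in base 2), obtained from H_b(P_e) <= 1 and log(|Y| - 1) <= log|Y|. The ensemble-specific variant analyzed in the paper is not reproduced here.

```latex
H(Y \mid \hat{Y}) \le H_b(P_e) + P_e \log_2\bigl(|\mathcal{Y}| - 1\bigr),
\qquad P_e = \Pr(\hat{Y} \ne Y), \quad H_b(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

P_e \ge \frac{H(Y \mid \hat{Y}) - 1}{\log_2 |\mathcal{Y}|}
```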

  11. arXiv:2204.11232  [pdf, other]

    eess.AS cs.CL cs.SD

    Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

    Authors: Natsuo Yamashita, Shota Horiguchi, Takeshi Homma

    Abstract: This paper investigates a method for simulating natural conversation in the model training of end-to-end neural diarization (EEND). Due to the lack of any annotated real conversational dataset, EEND is usually pretrained on a large-scale simulated conversational dataset first and then adapted to the target real dataset. Simulated datasets play an essential role in the training of EEND, but as yet…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Accepted to Speaker Odyssey 2022

  12. arXiv:2112.00209  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Environmental Sound Extraction Using Onomatopoeic Words

    Authors: Yuki Okamoto, Shota Horiguchi, Masaaki Yamamoto, Keisuke Imoto, Yohei Kawaguchi

    Abstract: An onomatopoeic word, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeic words to specify the target sound to be extracted. By this method, we estimate a time-frequency mask from an input mixture spectrogram and an onomatopoe…

    Submitted 16 February, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: Accepted to ICASSP2022

  13. arXiv:2110.04694  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-Channel End-to-End Neural Diarization with Distributed Microphones

    Authors: Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

    Abstract: Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of t…

    Submitted 28 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  14. arXiv:2107.01545  [pdf, other]

    eess.AS cs.CL cs.SD

    Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

    Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, Yohei Kawaguchi

    Abstract: Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number of speakers is larger than the one observed during training. This is because its speaker counting relies on supervised learning. In this work, we introduce an un…

    Submitted 23 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted to ASRU 2021

  15. Encoder-Decoder Based Attractors for End-to-End Neural Diarization

    Authors: Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola Garcia

    Abstract: This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce encoder-decoder-based a…

    Submitted 28 March, 2022; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted to IEEE/ACM TASLP. This article is based on our previous conference paper arxiv:2005.09921

  16. arXiv:2106.04764  [pdf, other]

    eess.AS cs.SD

    Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

    Authors: Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu

    Abstract: In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. However, to get a well-tuned model, EEND requires labeled data for all the joint speech activities of every speaker at each tim…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted for Interspeech 2021

  17. arXiv:2106.04078  [pdf, other]

    eess.AS cs.SD

    End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

    Authors: Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu

    Abstract: In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted for SLT 2021

    Journal ref: IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 849-856

  18. arXiv:2102.01363  [pdf, other]

    eess.AS cs.CL cs.SD

    The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

    Authors: Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of the five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each system and all five subsystems become competitive and complementary. After…

    Submitted 2 February, 2021; originally announced February 2021.

  19. arXiv:2101.08473  [pdf, other]

    cs.SD eess.AS

    Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

    Authors: Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

    Abstract: We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech. In our previous study, the speaker-tracing buffer (STB) mechanism was proposed to achieve a chunk-wise streaming diarization using a pre-trained EEND model. STB traces the speaker information in previous chunks to map the speakers in a…

    Submitted 6 April, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

  20. arXiv:2012.10055  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Speaker Diarization as Post-Processing

    Authors: Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

    Abstract: This paper investigates the utilization of an end-to-end diarization model as post-processing of conventional clustering-based diarization. Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker. On the other hand, some end-to-end diarization methods can handl…

    Submitted 23 December, 2020; v1 submitted 18 December, 2020; originally announced December 2020.

  21. arXiv:2011.07791  [pdf, other]

    eess.AS cs.SD eess.SP

    Block-Online Guided Source Separation

    Authors: Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

    Abstract: We propose a block-online algorithm of guided source separation (GSS). GSS is a speech separation method that uses diarization information to update parameters of the generative model of observation signals. Previous studies have shown that GSS performs well in multi-talker scenarios. However, it requires a large amount of calculation time, which is an obstacle to the deployment of online applicat…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: Accepted to SLT 2021

  22. arXiv:2007.15868  [pdf, other]

    eess.AS cs.CL cs.SD

    Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

    Authors: Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

    Abstract: A novel framework for meeting transcription using asynchronous microphones is proposed in this paper. It consists of audio synchronization, speaker diarization, utterance-wise speech enhancement using guided source separation, automatic speech recognition, and duplication reduction. Doing speaker diarization before speech enhancement enables the system to deal with overlapped speech without consid…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: Accepted to INTERSPEECH 2020

  23. arXiv:2006.02616  [pdf, other]

    eess.AS cs.SD

    Online End-to-End Neural Diarization with Speaker-Tracing Buffer

    Authors: Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

    Abstract: This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a speaker's permutation problem due to the possibility to assign speaker regions incorrectly across the recording. To circumvent this inconsistency, we proposed a speaker-tracing buffer mechanism that selects several input frames re…

    Submitted 6 March, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: Accepted to SLT 2021

  24. arXiv:2006.01796  [pdf, other]

    eess.AS cs.CL cs.SD

    Neural Speaker Diarization with Speaker-Wise Chain Rule

    Authors: Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu

    Abstract: Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we solve this fixed number of speaker issue by a novel speaker-wise conditional inference method based on the probabilistic chain rule. In the proposed method, each spe…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Submitted to Interspeech 2020
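
    The probabilistic chain rule mentioned above factorizes the joint distribution of all speakers' activities into speaker-wise conditionals, so activities can be inferred one speaker at a time, each conditioned on those already estimated. In assumed notation (X the input features, y_s the frame-wise activity sequence of speaker s, S the number of speakers), the factorization is:

```latex
P(y_1, \dots, y_S \mid X) = \prod_{s=1}^{S} P(y_s \mid y_1, \dots, y_{s-1}, X)
```

    How the method decides when to stop adding speakers is cut off in the abstract above and is not reproduced here.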

  25. arXiv:2005.09921  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

    Authors: Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu

    Abstract: End-to-end speaker diarization for an unknown number of speakers is addressed in this paper. Recently proposed end-to-end speaker diarization outperformed conventional clustering-based speaker diarization, but it has one drawback: it is less flexible in terms of the number of speakers. This paper proposes a method for encoder-decoder based attractor calculation (EDA), which first generates a flexi…

    Submitted 5 October, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Accepted to INTERSPEECH 2020
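
    In the EDA approach summarized above, attractors are generated one at a time together with a probability that each attractor exists; the speaker count is taken as the number of attractors before the first one whose existence probability falls below a threshold, and each speaker's frame-wise activity is the sigmoid of the dot product between a frame embedding and that speaker's attractor. The sketch below covers only these two post-generation steps, assuming a 0.5 threshold and hand-picked, hypothetical embeddings and attractors; the LSTM encoder-decoder that actually produces the attractors is not reproduced.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def count_speakers(existence_probs, threshold=0.5):
    """Number of attractors before the first one judged not to exist."""
    count = 0
    for p in existence_probs:
        if p < threshold:
            break
        count += 1
    return count


def speech_activities(embeddings, attractors):
    """T x S posteriors: sigmoid of each frame-embedding/attractor dot product."""
    return [[sigmoid(sum(e_d * a_d for e_d, a_d in zip(e, a))) for a in attractors]
            for e in embeddings]


# Hypothetical values: 4 candidate attractors, 3 frames, 2-dim embeddings.
existence = [0.99, 0.97, 0.12, 0.03]
attractors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
embeddings = [[4.0, -4.0], [-4.0, 4.0], [4.0, 4.0]]

n = count_speakers(existence)
probs = speech_activities(embeddings, attractors[:n])
decisions = [[p > 0.5 for p in row] for row in probs]  # multi-label per frame
```

    Because each speaker gets an independent sigmoid, a frame can be active for several speakers at once (the third frame here), which is how this family of models represents overlapping speech.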

  26. arXiv:2004.09249  [pdf, other]

    cs.SD cs.CL eess.AS

    CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

    Authors: Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

    Abstract: Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous C…

    Submitted 2 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  27. arXiv:2003.02966  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

    Authors: Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu

    Abstract: The most common approach to speaker diarization is clustering of speaker embeddings. However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting their speaker embedding models to real audio recordings with speaker overlaps. To solve these p…

    Submitted 24 February, 2020; originally announced March 2020.

    Comments: Submission to IEEE TASLP. This article draws from our previous conference papers: arxiv:1909.06247 and arxiv:1909.05952

  28. arXiv:1909.08103  [pdf, other]

    cs.CL cs.SD eess.AS

    Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

    Authors: Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe

    Abstract: This paper investigates the use of target-speaker automatic speech recognition (TS-ASR) for simultaneous speech recognition and speaker diarization of single-channel dialogue recordings. TS-ASR is a technique to automatically extract and recognize only the speech of a target speaker given a short sample utterance of that speaker. One obvious drawback of TS-ASR is that it cannot be used when the sp…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: Accepted to ASRU 2019

  29. arXiv:1909.06247  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Neural Speaker Diarization with Self-attention

    Authors: Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe

    Abstract: Speaker diarization has been mainly developed based on the clustering of speaker embeddings. However, the clustering-based approach has two major problems; i.e., (i) it is not optimized to minimize diarization errors directly, and (ii) it cannot handle speaker overlaps correctly. To solve these problems, the End-to-End Neural Diarization (EEND), in which a bidirectional long short-term memory (BLS…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: Accepted for ASRU 2019

  30. arXiv:1909.05952  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Neural Speaker Diarization with Permutation-Free Objectives

    Authors: Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe

    Abstract: In this paper, we propose a novel end-to-end neural-network-based speaker diarization method. Unlike most existing methods, our proposed method does not have separate modules for extraction and clustering of speaker representations. Instead, our model has a single neural network that directly outputs speaker diarization results. To realize such a model, we formulate the speaker diarization problem…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: Accepted to INTERSPEECH 2019
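
    A permutation-free objective makes the loss independent of the order in which the network emits speakers: the binary cross-entropy against the reference labels is computed for every assignment of output channels to reference speakers, and the smallest value is used. A minimal pure-Python sketch under that reading; the mean-over-frames normalization, the clipping constant, and the example posteriors are assumptions, and brute-force enumeration of all S! permutations is only practical for a small number of speakers S.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one posterior p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def permutation_free_loss(pred, ref):
    """Minimum mean BCE over all speaker permutations of the reference.

    pred: T x S predicted posteriors; ref: T x S binary labels.
    Returns (loss, best_perm), where best_perm[s] is the reference
    speaker matched to output channel s.
    """
    T, S = len(pred), len(pred[0])
    best = None
    for perm in permutations(range(S)):
        loss = sum(bce(pred[t][s], ref[t][perm[s]])
                   for t in range(T) for s in range(S)) / (T * S)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


# Hypothetical example: the network emits the two speakers in swapped order.
pred = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.7]]
ref = [[0, 1], [1, 0], [1, 1]]  # frame 3 is overlapped speech
loss, best_perm = permutation_free_loss(pred, ref)
```

    The swapped assignment `(1, 0)` is selected, so the model is not penalized for producing the correct activities under a different speaker ordering.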

  31. arXiv:1906.10876  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

    Authors: Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe

    Abstract: In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes target speaker's utterances from a monaural mixture of multiple speakers' speech given a short sample of the target speaker. The proposed auxiliary loss function attempts to additionally maximize interference speaker ASR accuracy during t…

    Submitted 26 June, 2019; originally announced June 2019.

    Comments: Accepted to INTERSPEECH 2019

  32. arXiv:1905.12230  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

    Authors: Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach

    Abstract: In this paper, we present Hitachi and Paderborn University's joint effort for automatic speech recognition (ASR) in a dinner party scenario. The main challenges of ASR systems for dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As a…

    Submitted 26 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted to INTERSPEECH 2019

  33. Personalized Classifier for Food Image Recognition

    Authors: Shota Horiguchi, Sosuke Amano, Makoto Ogawa, Kiyoharu Aizawa

    Abstract: Currently, food image recognition tasks are evaluated against fixed datasets. However, in real-world conditions, there are cases in which the number of samples in each class continues to increase and samples from novel classes appear. In particular, dynamic datasets in which each individual user creates samples and continues the updating process often have content that varies considerably between…

    Submitted 8 April, 2018; originally announced April 2018.

    Comments: Accepted to IEEE Transactions on Multimedia. http://ieeexplore.ieee.org/document/8316919/

    Journal ref: IEEE Transactions on Multimedia 20.10 (2018): 2836-2848

  34. Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features

    Authors: Shota Horiguchi, Daiki Ikami, Kiyoharu Aizawa

    Abstract: The extraction of useful deep features is important for many computer vision tasks. Deep features extracted from classification networks have proved to perform well in those tasks. To obtain features of greater usefulness, end-to-end distance metric learning (DML) has been applied to train the feature extractor directly. However, in these DML studies, there were no equitable comparisons between fe…

    Submitted 13 April, 2019; v1 submitted 29 December, 2017; originally announced December 2017.

    Comments: 6 pages

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019