Showing 1–13 of 13 results for author: Higuchi, T

  1. arXiv:2404.04439  [pdf, other]

    eess.AS cs.LG cs.SD

    Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations

    Authors: Krishna Subramani, Paris Smaragdis, Takuya Higuchi, Mehrez Souden

    Abstract: Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short-Time Fourier Transform. However extending these applications to irregularly-spaced TF representations, like the Constant-Q transform, wavelets, or si…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE SPL, Code: https://github.com/SubramaniKrishna/in-nmf

  2. arXiv:2402.00340  [pdf, other]

    cs.SD eess.AS

    Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

    Authors: Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald

    Abstract: Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-bank features as inputs, and thus, training them on top of self-supervised features assumes that both feature types require the same amount of learning for the task. In this work, we observe that pre-trained self-supervised spe…

    Submitted 13 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  3. arXiv:2401.17230  [pdf, other]

    cs.SD cs.AI eess.AS

    ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

    Authors: Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe

    Abstract: This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. We provide several models, ranging from x-vector to recent SKA-TDNN. Through the modularized architecture design, variants can be developed easily. We also…

    Submitted 13 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, 7 tables, Interspeech 2024

  4. arXiv:2309.16060  [pdf, other]

    eess.AS cs.SD

    Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study

    Authors: Avamarie Brueggeman, Takuya Higuchi, Masood Delfarah, Stephen Shum, Vineet Garg

    Abstract: Noise robustness is a key aspect of successful speech applications. Speech enhancement (SE) has been investigated to improve automatic speech recognition accuracy; however, its effectiveness for keyword spotting (KWS) is still under-investigated. In this paper, we conduct a comprehensive study on single-channel speech enhancement for keyword spotting on the Google Speech Command (GSC) dataset. To…

    Submitted 21 February, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

  5. arXiv:2309.16036  [pdf, other]

    eess.AS cs.SD

    Multichannel Voice Trigger Detection Based on Transform-average-concatenate

    Authors: Takuya Higuchi, Avamarie Brueggeman, Masood Delfarah, Stephen Shum

    Abstract: Voice triggering (VT) enables users to activate their devices by just speaking a trigger phrase. A front-end system is typically used to perform speech enhancement and/or separation, and produces multiple enhanced and/or separated signals. Since conventional VT systems take only single-channel audio as input, channel selection is performed. A drawback of this approach is that unselected channels a…

    Submitted 13 February, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted at HSCMA 2024

  6. arXiv:2204.02455  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Voice Trigger Detection with Metric Learning

    Authors: Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

    Abstract: Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker independent voice trigger detector typically suffers from performance degradation on speech from underrepresented…

    Submitted 13 September, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted at InterSpeech 2022

  7. arXiv:2107.07634  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-task Learning with Cross Attention for Keyword Spotting

    Authors: Takuya Higuchi, Anmol Gupta, Chandra Dhir

    Abstract: Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-tas…

    Submitted 22 September, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

    Comments: Accepted at ASRU 2021

  8. arXiv:2102.09666  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Dynamic curriculum learning via data parameters for noise robust keyword spotting

    Authors: Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir

    Abstract: We propose dynamic curriculum learning via data parameters for noise robust keyword spotting. Data parameter learning has recently been introduced for image processing, where weight parameters, so-called data parameters, for target classes and instances are introduced and optimized along with model parameters. The data parameters scale logits and control importance over classes and instances durin…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: Accepted at ICASSP 2021

  9. arXiv:2010.05693  [pdf, ps, other]

    cs.DC eess.SY

    Hybrid Vehicular and Cloud Distributed Computing: A Case for Cooperative Perception

    Authors: Enes Krijestorac, Agon Memedi, Takamasa Higuchi, Seyhan Ucar, Onur Altintas, Danijela Cabric

    Abstract: In this work, we propose the use of hybrid offloading of computing tasks simultaneously to edge servers (vertical offloading) via LTE communication and to nearby cars (horizontal offloading) via V2V communication, in order to increase the rate at which tasks are processed compared to local processing. Our main contribution is an optimized resource assignment and scheduling framework for hybrid off…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: 6 pages

  10. arXiv:2008.03405  [pdf, other]

    eess.AS cs.SD

    Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection

    Authors: Takuya Higuchi, Mohammad Ghasemzadeh, Kisun You, Chandra Dhir

    Abstract: We propose a stacked 1D convolutional neural network (S1DCNN) for end-to-end small footprint voice trigger detection in a streaming scenario. Voice trigger detection is an important speech application, with which users can activate their devices by simply saying a keyword or phrase. Due to privacy and latency reasons, a voice trigger detection system should run on an always-on processor on device.…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020

  11. arXiv:2004.10927  [pdf, other]

    cs.AI cs.LG cs.RO eess.SP

    Cooperative Perception with Deep Reinforcement Learning for Connected Vehicles

    Authors: Shunsuke Aoki, Takamasa Higuchi, Onur Altintas

    Abstract: Sensor-based perception on vehicles are becoming prevalent and important to enhance the road safety. Autonomous driving systems use cameras, LiDAR, and radar to detect surrounding objects, while human-driven vehicles use them to assist the driver. However, the environmental perception by individual vehicles has the limitations on coverage and/or detection accuracy. For example, a vehicle cannot de…

    Submitted 22 April, 2020; originally announced April 2020.

  12. arXiv:1907.10126  [pdf, other]

    cs.NI eess.SP

    Path Loss Models for V2V mmWave Communication: Performance Evaluation and Open Challenges

    Authors: Marco Giordani, Takayuki Shimizu, Andrea Zanella, Takamasa Higuchi, Onur Altintas, Michele Zorzi

    Abstract: Recently, millimeter wave (mmWave) bands have been investigated as a means to enhance automated driving and address the challenging data rate and latency demands of emerging automotive applications. For the development of those systems to operate in bands above 6 GHz, there is a need to have accurate channel models able to predict the peculiarities of the vehicular propagation at these bands, espe…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: 5 pages, 3 figures, 4 tables. Accepted to the IEEE 2nd Connected and Automated Vehicles Symposium

  13. arXiv:1905.09015  [pdf, other]

    eess.SP cs.NI

    A Framework to Assess Value of Information in Future Vehicular Networks

    Authors: Marco Giordani, Takamasa Higuchi, Andrea Zanella, Onur Altintas, Michele Zorzi

    Abstract: Vehicles are becoming increasingly intelligent and connected, incorporating more and more sensors to support safer and more efficient driving. The large volume of data generated by such sensors, however, will likely saturate the capacity of vehicular communication technologies, making it challenging to guarantee the required quality of service. In this perspective, it is essential to assess the va…

    Submitted 22 May, 2019; originally announced May 2019.

    Comments: 6 pages, 6 figures, 2 tables, accepted for publication to the 2019 1st ACM Workshop on Technologies, mOdels, and Protocols for Cooperative Connected Cars (TOP-Cars)