Showing 1–30 of 30 results for author: Ono, N

Searching in archive cs.
  1. arXiv:2404.08264  [pdf, other]

    cs.MM cs.CV eess.AS

    Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

    Authors: Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono

    Abstract: Observations with distributed sensors are essential in analyzing a series of human and machine activities (referred to as 'events' in this paper) in complex and extensive real-world environments. This is because the information obtained from a single sensor is often missing or fragmented in such an environment; observations from multiple locations and modalities should be integrated to analyze eve…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, under review

  2. arXiv:2402.15044  [pdf, other]

    cs.CV cs.LG

    Fiducial Focus Augmentation for Facial Landmark Detection

    Authors: Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian

    Abstract: Deep learning methods have led to significant improvements in the performance on the facial landmark detection (FLD) task. However, detecting landmarks in challenging settings, such as head pose changes, exaggerated expressions, or uneven illumination, continue to remain a challenge due to high variability and insufficient samples. This inadequacy can be attributed to the model's inability to effe…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted to BMVC'23

  3. arXiv:2402.01238  [pdf, other]

    cs.LG cs.AI cs.IT

    Flexible Variational Information Bottleneck: Achieving Diverse Compression with a Single Training

    Authors: Sota Kudo, Naoaki Ono, Shigehiko Kanaya, Ming Huang

    Abstract: Information Bottleneck (IB) is a widely used framework that enables the extraction of information related to a target random variable from a source random variable. In the objective function, IB controls the trade-off between data compression and predictiveness through the Lagrange multiplier $β$. Traditionally, to find the trade-off to be learned, IB requires a search for $β$ through multiple tra…

    Submitted 2 February, 2024; originally announced February 2024.

  4. arXiv:2312.13110  [pdf]

    cs.LG physics.chem-ph q-bio.BM

    Pre-training of Molecular GNNs via Conditional Boltzmann Generator

    Authors: Daiki Koge, Naoaki Ono, Shigehiko Kanaya

    Abstract: Learning representations of molecular structures using deep learning is a fundamental problem in molecular property prediction tasks. Molecules inherently exist in the real world as three-dimensional structures; furthermore, they are not static but in continuous motion in the 3D Euclidean space, forming a potential energy surface. Therefore, it is desirable to generate multiple conformations in ad…

    Submitted 18 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 4 pages

  5. arXiv:2312.11573  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    Estimation of individual causal effects in network setup for multiple treatments

    Authors: Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar, Naoyuki Onoe

    Abstract: We study the problem of estimation of Individual Treatment Effects (ITE) in the context of multiple treatments and networked observational data. Leveraging the network information, we aim to utilize hidden confounders that may not be directly accessible in the observed data, thereby enhancing the practical applicability of the strong ignorability assumption. To achieve this, we first employ Graph…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 7 pages, accepted at AAAI-GCLR 2024

  6. arXiv:2310.20419  [pdf, other]

    cs.IR

    Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search

    Authors: Naoki Ono, Yusuke Matsui

    Abstract: Approximate Nearest Neighbor Search (ANNS) is the task of finding the database vector that is closest to a given query vector. Graph-based ANNS is the family of methods with the best balance of accuracy and speed for million-scale datasets. However, graph-based methods have the disadvantage of long index construction time. Recently, many researchers have improved the tradeoff between accuracy and…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted by ACMMM 2023

  7. arXiv:2308.02227  [pdf, ps, other]

    cs.CR

    Security Evaluation of Compressible and Learnable Image Encryption Against Jigsaw Puzzle Solver Attacks

    Authors: Tatsuya Chuman, Nobutaka Ono, Hitoshi Kiya

    Abstract: Several learnable image encryption schemes have been developed for privacy-preserving image classification. This paper focuses on the security block-based image encryption methods that are learnable and JPEG-friendly. Permuting divided blocks in an image is known to enhance robustness against ciphertext-only attacks (COAs), but recently jigsaw puzzle solver attacks have been demonstrated to be abl…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: To appear in 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023)

  8. arXiv:2307.12232  [pdf, other]

    cs.SD eess.AS eess.SP

    Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase

    Authors: Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

    Abstract: We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE WASPAA 2023

  9. arXiv:2307.12231  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

    Authors: Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

    Abstract: Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end. In detail, we explore multi-channel separation methods, mask-based beamforming and comp…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE WASPAA 2023

  10. arXiv:2307.00623  [pdf]

    cs.LG q-bio.BM stat.ML

    Variational Autoencoding Molecular Graphs with Denoising Diffusion Probabilistic Model

    Authors: Daiki Koge, Naoaki Ono, Shigehiko Kanaya

    Abstract: In data-driven drug discovery, designing molecular descriptors is a very important task. Deep generative models such as variational autoencoders (VAEs) offer a potential solution by designing descriptors as probabilistic latent vectors derived from molecular structures. These models can be trained on large datasets, which have only molecular structures, and applied to transfer learning. Neverthele…

    Submitted 22 August, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: 2 pages. Short paper submitted to IEEE CIBCB 2023

  11. arXiv:2306.02268  [pdf, other]

    cs.CV cs.AI cs.LG

    Revisiting Class Imbalance for End-to-end Semi-Supervised Object Detection

    Authors: Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik

    Abstract: Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods. However, many of these methods face challenges due to class imbalance, which hinders the effectiveness of the pseudo-label generator. Furthermore, in the literature, it has been observed that low-quality pseudo-labels severely limit the performance of SSOD. In this p…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted at the Efficient Deep Learning for Computer Vision Workshop, CVPR 2023

  12. arXiv:2302.10536  [pdf, other]

    cs.SD cs.AI eess.AS

    Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

    Authors: Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

    Abstract: Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emotion combinations only. In this paper, we tackle the problem of converting the emotion of speakers whose only neutral data ar…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Demo Samples at https://demosamplesites.github.io/EVCUP/

  13. arXiv:2302.07928  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

    Authors: Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono

    Abstract: This paper describes our submission to the Second Clarity Enhancement Challenge (CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy-reverberant environments with multiple interferers such as music and competing speakers. Our approach builds upon the powerful iterative neural/beamforming enhancement (iNeuBe) framework introduced in our recent work, and this p…

    Submitted 15 February, 2023; originally announced February 2023.

  14. arXiv:2210.10742  [pdf, other]

    cs.SD eess.AS

    End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation

    Authors: Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono

    Abstract: Self-supervised learning representation (SSLR) has demonstrated its significant effectiveness in automatic speech recognition (ASR), mainly with clean speech. Recent work pointed out the strength of integrating SSLR with single-channel speech enhancement for ASR in noisy environments. This paper further advances this integration by dealing with multi-channel input. We propose a novel end-to-end ar…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  15. arXiv:2207.04357  [pdf, ps, other]

    cs.SD eess.AS

    Joint Analysis of Acoustic Scenes and Sound Events with Weakly labeled Data

    Authors: Shunsuke Tsubaki, Keisuke Imoto, Nobutaka Ono

    Abstract: Considering that acoustic scenes and sound events are closely related to each other, in some previous papers, a joint analysis of acoustic scenes and sound events utilizing multitask learning (MTL)-based neural networks was proposed. In conventional methods, a strongly supervised scheme is applied to sound event detection in MTL models, which requires strong labels of sound events in model trainin…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: Accepted to IWAENC2022

  16. arXiv:2206.13014  [pdf, other]

    eess.AS cs.SD eess.SP

    Joint Optimization of Sampling Rate Offsets Based on Entire Signal Relationship Among Distributed Microphones

    Authors: Yoshiki Masuyama, Kouei Yamaoka, Nobutaka Ono

    Abstract: In this paper, we propose to simultaneously estimate all the sampling rate offsets (SROs) of multiple devices. In a distributed microphone array, the SRO is inevitable, which deteriorates the performance of array signal processing. Most of the existing SRO estimation methods focused on synchronizing two microphones. When synchronizing more than two microphones, we select one reference microphone a…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: 5 pages, 2 figures, accepted by Interspeech 2022

  17. arXiv:2206.02187  [pdf, other]

    cs.CV cs.SD eess.AS

    M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

    Authors: Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe

    Abstract: Emotion Recognition in Conversations (ERC) is crucial in developing sympathetic human-machine interaction. In conversational videos, emotion can be present in multiple modalities, i.e., audio, video, and transcript. However, due to the inherent characteristics of these modalities, multi-modal ERC has always been considered a challenging undertaking. Existing ERC research focuses mainly on using te…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in the 5th Multimodal Learning and Applications (MULA) Workshop at CVPR 2022

  18. arXiv:2204.03173  [pdf, other]

    cs.LG cs.AI eess.SP

    Automated Sleep Staging via Parallel Frequency-Cut Attention

    Authors: Zheng Chen, Ziwei Yang, Lingwei Zhu, Wei Chen, Toshiyo Tamura, Naoaki Ono, MD Altaf-Ul-Amin, Shigehiko Kanaya, Ming Huang

    Abstract: This paper proposes a novel framework for automatically capturing the time-frequency nature of electroencephalogram (EEG) signals of human sleep based on the authoritative sleep medicine guidance. The framework consists of two parts: the first part extracts informative features by partitioning the input EEG spectrograms into a sequence of time-frequency patches. The second part is constituted by a…

    Submitted 12 January, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: 10 pages, 9 figures

  19. arXiv:2204.02278  [pdf, other]

    cs.LG q-bio.GN

    Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data

    Authors: Ziwei Yang, Lingwei Zhu, Zheng Chen, Ming Huang, Naoaki Ono, MD Altaf-Ul-Amin, Shigehiko Kanaya

    Abstract: Cancer is one of the deadliest diseases worldwide. Accurate diagnosis and classification of cancer subtypes are indispensable for effective clinical treatment. Promising results on automatic cancer subtyping systems have been published recently with the emergence of various deep learning methods. However, such automatic systems often overfit the data due to the high dimensionality and scarcity. In…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: 4 pages, accepted for EMBC 2022

  20. arXiv:2104.03255  [pdf, other]

    cs.CV

    A Unified Model for Fingerprint Authentication and Presentation Attack Detection

    Authors: Additya Popli, Saraansh Tandon, Joshua J. Engelsma, Naoyuki Onoe, Atsushi Okubo, Anoop Namboodiri

    Abstract: Typical fingerprint recognition systems are comprised of a spoof detection module and a subsequent recognition module, running one after the other. In this paper, we reformulate the workings of a typical fingerprint recognition system. In particular, we posit that both spoof detection and fingerprint recognition are correlated tasks. Therefore, rather than performing the two tasks separately, we p…

    Submitted 23 July, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted at IJCB2021; 12 pages

  21. Joint Dereverberation and Separation with Iterative Source Steering

    Authors: Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono

    Abstract: We propose a new algorithm for joint dereverberation and blind source separation (DR-BSS). Our work builds upon the IRLMA-T framework that applies a unified filter combining dereverberation and separation. One drawback of this framework is that it requires several matrix inversions, an operation inherently costly and with potential stability issues. We leverage the recently introduced iterative so…

    Submitted 31 May, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, accepted at ICASSP 2021

  22. arXiv:2004.03926  [pdf, other]

    eess.SP cs.SD eess.AS

    MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: In this work, we propose efficient algorithms for joint independent subspace analysis (JISA), an extension of independent component analysis that deals with parallel mixtures, where not all the components are independent. We derive an algorithmic framework for JISA based on the majorization-minimization (MM) optimization technique (JISA-MM). We use a well-known inequality for super-Gaussian source…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 15 pages, 4 figures

  23. arXiv:1910.10654  [pdf, other]

    cs.SD eess.AS eess.SP

    Fast Independent Vector Extraction by Iterative SINR Maximization

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We propose fast independent vector extraction (FIVE), a new algorithm that blindly extracts a single non-Gaussian source from a Gaussian background. The algorithm iteratively computes beamforming weights maximizing the signal-to-interference-and-noise ratio for an approximate noise covariance matrix. We demonstrate that this procedure minimizes the negative log-likelihood of the input data accordi…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 5 pages, 4 figures, Submitted to ICASSP 2020

  24. arXiv:1905.07880  [pdf, other]

    cs.SD eess.AS

    Independent Vector Analysis with more Microphones than Sources

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal…

    Submitted 7 August, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: Accepted to WASPAA 2019, 5 pages, 3 figures

  25. Multi-modal Blind Source Separation with Microphones and Blinkies

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We propose a blind source separation algorithm that jointly exploits measurements by a conventional microphone array and an ad hoc array of low-rate sound power sensors called blinkies. While providing less information than microphones, blinkies circumvent some difficulties of microphone arrays in terms of manufacturing, synchronization, and deployment. The algorithm is derived from a joint probab…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figures

  26. arXiv:1806.10307  [pdf, other]

    eess.AS cs.SD

    Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation

    Authors: Shinichi Mogami, Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we address a multichannel audio source separation task and propose a new efficient method called independent deeply learned matrix analysis (IDLMA). IDLMA estimates the demixing matrix in a blind manner and updates the time-frequency structures of each source using a pretrained deep neural network (DNN). Also, we introduce a complex Student's t-distribution as a generalized source g…

    Submitted 27 June, 2018; originally announced June 2018.

    Comments: 5 pages, 4 figures, To appear in the Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018)

  27. arXiv:1708.04795  [pdf, ps, other]

    cs.SD

    Independent Low-Rank Matrix Analysis Based on Complex Student's $t$-Distribution for Blind Audio Source Separation

    Authors: Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we generalize a source generative model in a state-of-the-art blind source separation (BSS), independent low-rank matrix analysis (ILRMA). ILRMA is a unified method of frequency-domain independent component analysis and nonnegative matrix factorization and can provide better performance for audio BSS tasks. To further improve the performance and stability of the separation, we intro…

    Submitted 16 August, 2017; originally announced August 2017.

    Comments: Preprint manuscript of 2017 IEEE International Workshop on Machine Learning for Signal Processing

  28. A Stochastic Temporal Model of Polyphonic MIDI Performance with Ornaments

    Authors: Eita Nakamura, Nobutaka Ono, Shigeki Sagayama, Kenji Watanabe

    Abstract: We study indeterminacies in realization of ornaments and how they can be incorporated in a stochastic performance model applicable for music information processing such as score-performance matching. We point out the importance of temporal information, and propose a hidden Markov model which describes it explicitly and represents ornaments with several state types. Following a review of the indete…

    Submitted 2 August, 2016; v1 submitted 8 April, 2014; originally announced April 2014.

    Comments: 35 pages, 6 figures, some explanations and evaluation results added, version accepted to JNMR

    Journal ref: Journal of New Music Research, Vol. 44, No. 4 (2015) 287-304

  29. Outer-Product Hidden Markov Model and Polyphonic MIDI Score Following

    Authors: Eita Nakamura, Tomohiko Nakamura, Yasuyuki Saito, Nobutaka Ono, Shigeki Sagayama

    Abstract: We present a polyphonic MIDI score-following algorithm capable of following performances with arbitrary repeats and skips, based on a probabilistic model of musical performances. It is attractive in practical applications of score following to handle repeats and skips which may be made arbitrarily during performances, but the algorithms previously described in the literature cannot be applied to s…

    Submitted 8 April, 2014; originally announced April 2014.

    Comments: 42 pages, 8 figures, version submitted to JNMR. To appear in Journal of New Music Research (2014)

    Journal ref: Journal of New Music Research, Vol. 43, No. 2 (2014) 183-201

  30. arXiv:physics/0701168  [pdf, ps, other]

    physics.soc-ph cs.CY physics.data-an

    A Gap in the Community-Size Distribution of a Large-Scale Social Networking Site

    Authors: Kikuo Yuta, Naoaki Ono, Yoshi Fujiwara

    Abstract: Social networking sites (SNS) have recently used by millions of people all over the world. An SNS is a society on the Internet, where people communicate and foster friendship with each other. We examine a nation-wide SNS (more than six million users at present), mutually acknowledged friendship network with third million people and nearly two million links. By employing a community-extracting me…

    Submitted 19 March, 2007; v1 submitted 15 January, 2007; originally announced January 2007.

    Comments: 10 pages with 6 figures; method adequately referred