Showing 1–46 of 46 results for author: Kaneko, T

  1. arXiv:2406.04155  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization

    Authors: Takuhiro Kaneko

    Abstract: Geometry-agnostic system identification is a technique for identifying the geometry and physical properties of an object from video sequences without any geometric assumptions. Recently, physics-augmented continuum neural radiance fields (PAC-NeRF) has demonstrated promising results for this technique by utilizing a hybrid Eulerian-Lagrangian representation, in which the geometry is represented by…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/lpo/

  2. arXiv:2403.16464  [pdf, other]

    cs.SD cs.LG eess.AS

    Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka

    Abstract: A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because of its fast, lightweight, and high-quality characteristics. However, this data-driven model requires a large amount of training data incurring high data-collection costs. This fact motivates us to train a GAN-based vocoder on limited data. A promising solutio…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP 2024. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/augcondd/

  3. arXiv:2403.14089  [pdf, other]

    cs.CV

    Unsupervised Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training

    Authors: Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura

    Abstract: Unsupervised intrinsic image decomposition (IID) is the process of separating a natural image into albedo and shade without these ground truths. A recent model employing light detection and ranging (LiDAR) intensity demonstrated impressive performance, though the necessity of LiDAR intensity during inference restricts its practicality. Thus, IID models employing only a single image during inferenc…

    Submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2312.12808  [pdf, other]

    cs.CL

    Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario

    Authors: Hiroki Onozeki, Zhiyang Qi, Kazuma Akiyama, Ryutaro Asahara, Takumasa Kaneko, Michimasa Inaba

    Abstract: This paper describes our dialogue system submitted to Dialogue Robot Competition 2023. The system's task is to help a user at a travel agency decide on a plan for visiting two sightseeing spots in Kyoto City that satisfy the user. Our dialogue system is flexible and stable and responds to user requirements by controlling dialogue flow according to dialogue scenarios. We also improved user satisfac…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2023

  5. arXiv:2310.01821  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG eess.IV

    MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields

    Authors: Takuhiro Kaneko

    Abstract: Neural radiance fields (NeRFs) have shown impressive results for novel view synthesis. However, they depend on the repetitive use of a single-input single-output multilayer perceptron (SISO MLP) that maps 3D coordinates and view direction to the color and volume density in a sample-wise manner, which slows the rendering. We propose a multi-input multi-output NeRF (MIMO-NeRF) that reduces the numbe…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted to ICCV 2023. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/mimo-nerf/

  6. arXiv:2308.07117  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

    Abstract: The inverse short-time Fourier transform network (iSTFTNet) has garnered attention owing to its fast, lightweight, and high-fidelity speech synthesis. It obtains these characteristics using a fast and lightweight 1D CNN as the backbone and replacing some neural processes with iSTFT. Owing to the difficulty of a 1D CNN to model high-dimensional spectrograms, the frequency dimension is reduced via t…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted to Interspeech 2023. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/

  7. arXiv:2304.10770  [pdf, other]

    cs.LG cs.AI cs.IT

    DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

    Authors: Shanchuan Wan, Yujin Tang, Yingtao Tian, Tomoyuki Kaneko

    Abstract: Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations. However, there is a gap between the novelty of an observation and a…

    Submitted 18 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: Accepted as a conference paper to the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23)

  8. arXiv:2303.13909  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

    Abstract: In speech synthesis, a generative adversarial network (GAN), training a generator (speech synthesizer) and a discriminator in a min-max game, is widely used to improve speech quality. An ensemble of discriminators is commonly used in recent neural vocoders (e.g., HiFi-GAN) and end-to-end text-to-speech (TTS) systems (e.g., VITS) to scrutinize waveforms from multiple perspectives. Such discriminato…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/

  9. arXiv:2303.10820  [pdf, other]

    cs.CV

    Unsupervised Intrinsic Image Decomposition with LiDAR Intensity

    Authors: Shogo Sato, Yasuhiro Yao, Taiga Yoshida, Takuhiro Kaneko, Shingo Ando, Jun Shimamura

    Abstract: Intrinsic image decomposition (IID) is the task that decomposes a natural image into albedo and shade. While IID is typically solved through supervised learning methods, it is not ideal due to the difficulty in observing ground truth albedo and shade in general scenes. Conversely, unsupervised learning methods are currently underperforming supervised learning methods since there are no criteria fo…

    Submitted 28 March, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023, Dataset link : (https://github.com/ntthilab-cv/NTT-intrinsic-dataset)

  10. arXiv:2302.11135   

    cs.LG cs.AI

    Semi-Supervised Approach for Early Stuck Sign Detection in Drilling Operations

    Authors: Andres Hernandez-Matamoros, Kohei Sugawara, Tatsuya Kaneko, Ryota Wada, Masahiko Ozaki

    Abstract: A real-time stuck pipe prediction methodology is proposed in this paper. We assume early signs of stuck pipe to be apparent when the drilling data behavior deviates from that from normal drilling operations. The definition of normalcy changes with drill string configuration or geological conditions. Here, a depth-domain data representation is adopted to capture the localized normal behavior. Sever…

    Submitted 24 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: There is a conflict of interest between authors

  11. arXiv:2209.14397  [pdf, ps, other]

    eess.SP cs.CV cs.LG

    Variational Bayes for robust radar single object tracking

    Authors: Alp Sarı, Tak Kaneko, Lense H. M. Swaenen, Wouter M. Kouw

    Abstract: We address object tracking by radar and the robustness of the current state-of-the-art methods to process outliers. The standard tracking algorithms extract detections from radar image space to use it in the filtering stage. Filtering is performed by a Kalman filter, which assumes Gaussian distributed noise. However, this assumption does not account for large modeling errors and results in poor tr…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: 6 pages, 8 figures. Published as part of the proceedings of the IEEE International Workshop on Signal Processing Systems 2022

  12. arXiv:2206.06100  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG eess.IV

    AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields

    Authors: Takuhiro Kaneko

    Abstract: Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection. A successful approach involves a viewpoint-aware approach that learns an image distribution based on generative models (e.g., generative adversarial networks (GANs)) while generating various view images based on 3D-aware models (e.g., neural radiance fields (NeRFs)). However, they require…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR 2022. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/ar-nerf/

  13. arXiv:2203.02395  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform

    Authors: Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka, Shogo Seki

    Abstract: In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is commonly applied as an intermediate representation, and the necessity for a mel-spectrogram vocoder is increasing. A mel-spectrogram vocoder must solve three inverse problems: recovery of the original-scale magnitude spectrogram, phase reconstruction, and frequency-to-time conversion. A typical convolutional mel-…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted to ICASSP 2022. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet/

  14. Pixyz: a Python library for developing deep generative models

    Authors: Masahiro Suzuki, Takaaki Kaneko, Yutaka Matsuo

    Abstract: With the recent rapid progress in the study of deep generative models (DGMs), there is a need for a framework that can implement them in a simple and generic way. In this research, we focus on two features of DGMs: (1) deep neural networks are encapsulated by probability distributions, and (2) models are designed and learned based on an objective function. Taking these features into account, we pr…

    Submitted 21 September, 2023; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: Published in Advanced Robotics

    Journal ref: Advanced Robotics, 2023

  15. arXiv:2106.13041  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Unsupervised Learning of Depth and Depth-of-Field Effect from Natural Images with Aperture Rendering Generative Adversarial Networks

    Authors: Takuhiro Kaneko

    Abstract: Understanding the 3D world from 2D projected natural images is a fundamental challenge in computer vision and graphics. Recently, an unsupervised learning approach has garnered considerable attention owing to its advantages in data collection. However, to mitigate training limitations, typical methods need to impose assumptions for viewpoint distribution (e.g., a dataset containing various viewpoi…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2021 (Oral). Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/ar-gan/

  16. arXiv:2104.06900  [pdf, ps, other]

    cs.SD eess.AS

    FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion

    Authors: Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko

    Abstract: This paper proposes a non-autoregressive extension of our previously proposed sequence-to-sequence (S2S) model-based voice conversion (VC) methods. S2S model-based VC methods have attracted particular attention in recent years for their flexibility in converting not only the voice identity but also the pitch contour and local duration of input speech, thanks to the ability of the encoder-decoder a…

    Submitted 14 April, 2021; originally announced April 2021.

  17. arXiv:2102.12841  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

    Abstract: Non-parallel voice conversion (VC) is a technique for training voice converters without a parallel corpus. Cycle-consistent adversarial network-based VCs (CycleGAN-VC and CycleGAN-VC2) are widely accepted as benchmark methods. However, owing to their insufficient ability to grasp time-frequency structures, their application is limited to mel-cepstrum conversion and not mel-spectrogram conversion d…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021. Project page: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/maskcyclegan-vc/index.html

  18. arXiv:2010.11672  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

    Abstract: Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-sp…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted to Interspeech 2020. Project page: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/index.html

  19. arXiv:2010.02977  [pdf, ps, other]

    cs.SD eess.AS

    VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics

    Authors: Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Shogo Seki

    Abstract: In this paper, we propose a non-parallel any-to-many voice conversion (VC) method termed VoiceGrad. Inspired by WaveGrad, a recently introduced novel waveform generation method, VoiceGrad is based upon the concepts of score matching and Langevin dynamics. It uses weighted denoising score matching to train a score approximator, a fully convolutional network with a U-Net structure designed to predic…

    Submitted 9 March, 2024; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: For more details on the baseline method used for comparison, please refer to our article in arXiv:2008.12604

  20. arXiv:2010.02756  [pdf, other]

    cs.LG cs.AI

    Learning Diverse Options via InfoMax Termination Critic

    Authors: Yuji Kanagawa, Tomoyuki Kaneko

    Abstract: We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning. While options can speed up transfer learning by serving as reusable building blocks, learning reusable options for unknown task distribution remains challenging. Motivated by the recent success of mutual information (MI) based skill learning, we hypothesize that more divers…

    Submitted 31 May, 2023; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Rejected from ICLR 2022. See https://openreview.net/forum?id=UTTrevGchy for reviews

    ACM Class: I.2.6

  21. arXiv:2008.07079  [pdf, other]

    cs.LG cs.AI stat.ML

    Playing Catan with Cross-dimensional Neural Network

    Authors: Quentin Gendre, Tomoyuki Kaneko

    Abstract: Catan is a strategic board game having interesting properties, including multi-player, imperfect information, stochastic, complex state space structure (hexagonal board where each vertex, edge and face has its own features, cards for each player, etc), and a large action space (including negotiation). Therefore, it is challenging to build AI agents by Reinforcement Learning (RL for short), without…

    Submitted 17 August, 2020; originally announced August 2020.

    Comments: 12 pages, 5 tables and 10 figures; submitted to the ICONIP 2020

  22. arXiv:2006.05513  [pdf]

    physics.med-ph cs.CV eess.IV

    A Deep Learning-Based Method for Automatic Segmentation of Proximal Femur from Quantitative Computed Tomography Images

    Authors: Chen Zhao, Joyce H. Keyak, Jinshan Tang, Tadashi S. Kaneko, Sundeep Khosla, Shreyasee Amin, Elizabeth J. Atkinson, Lan-Juan Zhao, Michael J. Serou, Chaoyang Zhang, Hui Shen, Hong-Wen Deng, Weihua Zhou

    Abstract: Purpose: Proximal femur image analyses based on quantitative computed tomography (QCT) provide a method to quantify the bone density and evaluate osteoporosis and risk of fracture. We aim to develop a deep-learning-based method for automatic proximal femur segmentation. Methods and Materials: We developed a 3D image segmentation method based on V-Net, an end-to-end fully convolutional neural netwo…

    Submitted 1 July, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  23. arXiv:2005.08445  [pdf, ps, other]

    eess.AS cs.SD stat.ML

    Many-to-Many Voice Transformer Network

    Authors: Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda

    Abstract: This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a m…

    Submitted 6 November, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: submitted to IEEE/ACM Trans. ASLP. Please also refer to our related article: arXiv:1811.01609

  24. arXiv:2003.07849  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Blur, Noise, and Compression Robust Generative Adversarial Networks

    Authors: Takuhiro Kaneko, Tatsuya Harada

    Abstract: Generative adversarial networks (GANs) have gained considerable attention owing to their ability to reproduce images. However, they can recreate training images faithfully despite image degradation in the form of blur, noise, and compression, generating similarly degraded images. To solve this problem, the recently proposed noise robust GAN (NR-GAN) provides a partial solution by demonstrating the…

    Submitted 23 June, 2021; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: Accepted to CVPR 2021. Project page: https://takuhirok.github.io/BNCR-GAN/

  25. arXiv:1912.12927  [pdf, other]

    cs.LG stat.ML

    Learning with Multiple Complementary Labels

    Authors: Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, Masashi Sugiyama

    Abstract: A complementary label (CL) simply indicates an incorrect class of an example, but learning with CLs results in multi-class classifiers that can predict the correct class. Unfortunately, the problem setting only allows a single CL for each example, which notably limits its potential since our labelers may easily identify multiple CLs (MCLs) to one example. In this paper, we propose a novel problem…

    Submitted 6 August, 2022; v1 submitted 30 December, 2019; originally announced December 2019.

    Comments: Corrected typos in Lemma 2, accepted by ICML 2020

  26. arXiv:1911.11776  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Noise Robust Generative Adversarial Networks

    Authors: Takuhiro Kaneko, Tatsuya Harada

    Abstract: Generative adversarial networks (GANs) are neural networks that learn data distributions through adversarial training. In intensive studies, recent GANs have shown promising results for reproducing training images. However, in spite of noise, they reproduce images with fidelity. As an alternative, we propose a novel family of GANs called noise robust GANs (NR-GANs), which can learn a clean image g…

    Submitted 31 March, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: Accepted to CVPR 2020. Project page: https://takuhirok.github.io/NR-GAN/

  27. arXiv:1907.12279  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

    Abstract: Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings among multiple domains without relying on parallel data. This is important but challenging owing to the requirement of learning multiple mappings and the non-availability of explicit supervision. Recently, StarGAN-VC has garnered attention owing to its ability to solve this problem only using a single generator. H…

    Submitted 7 August, 2019; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: Accepted to Interspeech 2019. Project page: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/stargan-vc2/index.html

  28. arXiv:1905.02185  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Label-Noise Robust Multi-Domain Image-to-Image Translation

    Authors: Takuhiro Kaneko, Tatsuya Harada

    Abstract: Multi-domain image-to-image translation is a problem where the goal is to learn mappings among multiple domains. This problem is challenging in terms of scalability because it requires the learning of numerous mappings, the number of which increases proportional to the number of domains. However, generative adversarial networks (GANs) have emerged recently as a powerful framework for this problem.…

    Submitted 6 May, 2019; originally announced May 2019.

  29. arXiv:1904.08129  [pdf, other]

    cs.LG stat.ML

    Rogue-Gym: A New Challenge for Generalization in Reinforcement Learning

    Authors: Yuji Kanagawa, Tomoyuki Kaneko

    Abstract: In this paper, we propose Rogue-Gym, a simple and classic style roguelike game built for evaluating generalization in reinforcement learning (RL). Combined with the recent progress of deep neural networks, RL has successfully trained human-level agents without human knowledge in many games such as those for Atari 2600. However, it has been pointed out that agents trained with RL methods often over…

    Submitted 31 May, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

    Comments: 8 pages, 14 figures, 4 tables, accepted to IEEE COG 2019

  30. arXiv:1904.04631  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

    Abstract: Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to the disadvantages of the training conditions. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time ali…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: Accepted to ICASSP 2019. Project page: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/index.html

  31. arXiv:1904.04540  [pdf, ps, other]

    cs.SD stat.ML

    Crossmodal Voice Conversion

    Authors: Hirokazu Kameoka, Kou Tanaka, Aaron Valero Puche, Yasunori Ohishi, Takuhiro Kaneko

    Abstract: Humans are able to imagine a person's voice from the person's appearance and imagine the person's appearance from his/her voice. In this paper, we make the first attempt to develop a method that can convert speech into a voice that matches an input face image and generate a face image that matches the voice of the input speech by leveraging the correlation between faces and voices. We propose a mo…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: Submitted to Interspeech2019

  32. arXiv:1904.02892  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

    Authors: Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

    Abstract: WaveCycleGAN has recently been proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis and provides fast inference with a moving average model rather than an autoregressive model and high-quality speech synthesis with the adversarial training. However, the human ear can still distinguish the processed speech waveforms from natural ones…

    Submitted 8 April, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Submitted to INTERSPEECH2019

  33. arXiv:1902.01056  [pdf, other]

    cs.LG stat.ML

    Online Multiclass Classification Based on Prediction Margin for Partial Feedback

    Authors: Takuo Kaneko, Issei Sato, Masashi Sugiyama

    Abstract: We consider the problem of online multiclass classification with partial feedback, where an algorithm predicts a class for a new instance in each round and only receives its correctness. Although several methods have been developed for this problem, recent challenging real-world applications require further performance improvement. In this paper, we propose a novel online learning algorithm inspir…

    Submitted 4 February, 2019; originally announced February 2019.

  34. arXiv:1901.09777  [pdf, other]

    cs.DC

    SimBlock: A Blockchain Network Simulator

    Authors: Yusuke Aoki, Kai Otsuki, Takeshi Kaneko, Ryohei Banno, Kazuyuki Shudo

    Abstract: Blockchain, which is a technology for distributedly managing ledger information over multiple nodes without a centralized system, has elicited increasing attention. Performing experiments on actual blockchains are difficult because a large number of nodes in wide areas are necessary. In this study, we developed a blockchain network simulator SimBlock for such experiments. Unlike the existing simul…

    Submitted 19 March, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

    Comments: Proc. 2nd Workshop on Cryptocurrencies and Blockchains for Distributed Systems (CryBlock 2019) (in conj. with INFOCOM 2019)

  35. arXiv:1811.11165  [pdf, other]

    cs.CV cs.LG stat.ML

    Label-Noise Robust Generative Adversarial Networks

    Authors: Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Generative adversarial networks (GANs) are a framework that learns a generative distribution through adversarial training. Recently, their class-conditional extensions (e.g., conditional GAN (cGAN) and auxiliary classifier GAN (AC-GAN)) have attracted much attention owing to their ability to learn the disentangled representations and to improve the training stability. However, their training requi…

    Submitted 2 May, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted to CVPR 2019 (Oral). Project page: https://takuhirok.github.io/rGAN/

  36. arXiv:1811.11163  [pdf, other]

    cs.CV cs.LG stat.ML

    Class-Distinct and Class-Mutual Image Generation with GANs

    Authors: Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Class-conditional extensions of generative adversarial networks (GANs), such as auxiliary classifier GAN (AC-GAN) and conditional GAN (cGAN), have garnered attention owing to their ability to decompose representations into class labels and other factors and to boost the training stability. However, a limitation is that they assume that each class is separable and ignore the relationship between cl…

    Submitted 24 July, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted to BMVC 2019 (Spotlight). Project page: https://takuhirok.github.io/CP-GAN/

  37. arXiv:1811.04076  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms

    Authors: Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

    Abstract: This paper describes a method based on a sequence-to-sequence learning (Seq2Seq) with attention and context preservation mechanism for voice conversion (VC) tasks. Seq2Seq has been outstanding at numerous tasks involving sequence modeling such as speech synthesis and recognition, machine translation, and image captioning. In contrast to current VC techniques, our method 1) stabilizes and accelerat…

    Submitted 9 November, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP2019

  38. arXiv:1811.01609  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

    Authors: Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo

    Abstract: This paper proposes a voice conversion (VC) method using sequence-to-sequence (seq2seq or S2S) learning, which flexibly converts not only the voice characteristics but also the pitch contour and duration of input speech. The proposed method, called ConvS2S-VC, has three key features. First, it uses a model with a fully convolutional architecture. This is particularly advantageous in that it is sui…

    Submitted 6 October, 2020; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: Published in IEEE/ACM Trans. ASLP https://ieeexplore.ieee.org/document/9113442

  39. arXiv:1809.10288  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

    Authors: Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka

    Abstract: We propose a learning-based filter that allows us to directly modify a synthetic speech waveform into a natural speech waveform. Speech-processing systems using a vocoder framework such as statistical parametric speech synthesis and voice conversion are convenient especially for a limited number of data because it is possible to represent and process interpretable acoustic features over a compact…

    Submitted 28 September, 2018; v1 submitted 25 September, 2018; originally announced September 2018.

    Comments: SLT2018

  40. arXiv:1808.05092  [pdf, ps, other]

    stat.ML cs.LG cs.SD eess.AS

    ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

    Authors: Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

    Abstract: This paper proposes a non-parallel many-to-many voice conversion (VC) method using a variant of the conditional variational autoencoder (VAE) called an auxiliary classifier VAE (ACVAE). The proposed method has three key features. First, it adopts fully convolutional architectures to construct the encoder and decoder networks so that the networks can learn conversion rules that capture time depende…

    Submitted 10 October, 2020; v1 submitted 13 August, 2018; originally announced August 2018.

    Comments: Published in IEEE/ACM Trans. ASLP https://ieeexplore.ieee.org/abstract/document/8718381 Please also refer to our related articles: arXiv:1806.02169, arXiv:2008.12604

  41. arXiv:1806.02169  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

    Authors: Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

    Abstract: This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN. Our method, which we call StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across dif…

    Submitted 29 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

  42. arXiv:1805.10603  [pdf, other]

    cs.CV

    Generative Adversarial Image Synthesis with Decision Tree Latent Controller

    Authors: Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino

    Abstract: This paper proposes the decision tree latent controller generative adversarial network (DTLC-GAN), an extension of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision. To impose a hierarchical inclusion structure on latent variables, we incorporate a new architecture called the DTLC into the generator input. The DTLC has a multiple-layer tree s…

    Submitted 27 May, 2018; originally announced May 2018.

    Comments: CVPR 2018. Project page: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/dtlc-gan/

  43. arXiv:1804.02181  [pdf, ps, other]

    eess.SP cs.LG stat.ML

    Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms

    Authors: Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Hiroyasu Ando

    Abstract: In this paper, we address the problem of reconstructing a time-domain signal (or a phase spectrogram) solely from a magnitude spectrogram. Since magnitude spectrograms do not contain phase information, we must restore or infer phase information to reconstruct a time-domain signal. One widely used approach for dealing with the signal reconstruction problem was proposed by Griffin and Lim. This meth…

    Submitted 6 April, 2018; originally announced April 2018.

  44. arXiv:1711.11293  [pdf, ps, other]

    stat.ML cs.SD eess.AS

    Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

    Authors: Takuhiro Kaneko, Hirokazu Kameoka

    Abstract: We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general purpose, high quality, and parallel-data free and works without any extra data, modules, or alignment procedure. It also avoids over-smoothing, which occurs in many conventional statistical model-based VC methods. Our me…

    Submitted 20 December, 2017; v1 submitted 30 November, 2017; originally announced November 2017.

  45. arXiv:1401.1910  [pdf]

    cs.SE

    Aligning Software-related Strategies in Multi-Organizational Settings

    Authors: Martin Kowalczyk, Jürgen Münch, Masafumi Katahira, Tatsuya Kaneko, Yuko Miyamoto, Yumi Koishi

    Abstract: Aligning the activities of an organization with its business goals is a challenging task that is critical for success. Alignment in a multi-organizational setting requires the integration of different internal or external organizational units. The anticipated benefits of multi-organizational alignment consist of clarified contributions and increased transparency of the involved organizational unit…

    Submitted 9 January, 2014; originally announced January 2014.

    Comments: 14 pages. The final publication is available at http://www.shaker.de/de/content/catalogue/index.asp?lang=de&ID=8&ISBN=978-3-8322-9618-6

    Journal ref: Proceedings of the International Conference on Software Process and Product Measurement (IWSM/MetriKon/Mensura 2010), pages 261-274, Stuttgart, Germany, November 10-12 2010

  46. arXiv:1303.1748  [pdf, ps, other]

    math.NA cs.OH math.DG

    Mixed Maps for Kolmogoroff-Nagumo-Type Averaging on the Compact Stiefel Manifold

    Authors: Simone Fiori, Tetsuya Kaneko, Toshihisa Tanaka

    Abstract: The present research work proposes a new fast fixed-point averaging algorithm on the compact Stiefel manifold based on a mixed retraction/lifting pair. Numerical comparisons between fixed-point algorithms based on the proposed non-associated retraction/lifting map pair and two associated retraction/lifting pairs confirm that the averaging algorithm based on a combination of mixed maps is remarkabl…

    Submitted 7 March, 2013; originally announced March 2013.

    Comments: 10 pages