
Showing 1–8 of 8 results for author: Akama, T

  1. arXiv:2405.09062

    cs.SD cs.LG eess.AS

    Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

    Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama

    Abstract: In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effec…

    Submitted 17 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  2. arXiv:2404.02342

    cs.CL cs.SD eess.AS

    A Computational Analysis of Lyric Similarity Perception

    Authors: Haven Kim, Taketo Akama

    Abstract: In musical compositions that include vocals, lyrics significantly contribute to artistic expression. Consequently, previous studies have introduced the concept of a recommendation system that suggests lyrics similar to a user's favorites or personalized preferences, aiding in the discovery of lyrics among millions of tracks. However, many of these systems do not fully consider human perceptions of…

    Submitted 2 April, 2024; originally announced April 2024.

  3. arXiv:2401.04558

    cs.SD cs.LG eess.AS

    HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks

    Authors: Zhe Zhang, Taketo Akama

    Abstract: GANStrument, exploiting GANs with a pitch-invariant feature extractor and instance conditioning technique, has shown remarkable capabilities in synthesizing realistic instrument sounds. To further improve the reconstruction ability and pitch accuracy to enhance the editability of user-provided sound, we propose HyperGANStrument, which introduces a pitch-invariant hypernetwork to modulate the weigh…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, Accepted for 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Audio examples: https://noto.li/MLIuBC

  4. arXiv:2312.10402

    cs.SD cs.AI eess.AS

    Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion

    Authors: Gakusei Sato, Taketo Akama

    Abstract: Automatic Music Transcription (AMT) is a vital technology in the field of music information processing. Despite recent enhancements in performance due to machine learning techniques, current methods typically attain high accuracy in domains where abundant annotated data is available. Addressing domains with low or no resources continues to be an unresolved challenge. To tackle this issue, we propo…

    Submitted 30 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: 7 pages, 1 figure

  5. arXiv:2307.04305

    cs.SD cs.LG eess.AS

    Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

    Authors: Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

    Abstract: Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capability of self-attention mechanism in Transformers to capture these long-term dependencies in the frequency and time axes. In this…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 8 pages, 6 figures, to be published in ISMIR2023

  6. arXiv:2304.07449

    cs.SD cs.IR cs.LG eess.AS

    Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging

    Authors: Taketo Akama, Hiroaki Kitano, Katsuhiro Takematsu, Yasushi Miyajima, Natalia Polouliakh

    Abstract: In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from alternative sources to enhance their performance. Self-supervised learning, which exclusively relies on learning signals derived from music audio data, has demonstr…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: 11 pages

  7. arXiv:2211.05385

    cs.SD cs.LG eess.AS

    GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning

    Authors: Gaku Narita, Junichi Shimizu, Taketo Akama

    Abstract: We propose GANStrument, a generative adversarial model for instrument sound synthesis. Given a one-shot sound as input, it is able to generate pitched instrument sounds that reflect the timbre of the input within an interactive time. By exploiting instance conditioning, GANStrument achieves better fidelity and diversity of synthesized sounds and generalization ability to various inputs. In additio…

    Submitted 7 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures, Accepted to 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Audio examples: https://ganstrument.github.io/ganstrument-demo/

  8. arXiv:2111.11703

    cs.LG cs.AI cs.SD eess.AS stat.ML

    A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence

    Authors: Taketo Akama

    Abstract: Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences, which plays an important part in steering generation interactively. However, editing subsequences mainly involves randomly resampling subsequences from a possible generation space. We propose a contextual latent space model (CLSM) in order for users to be able to exp…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021; 8 pages