Showing 1–17 of 17 results for author: Mitsui, K

  1. arXiv:2406.12428

    cs.CL cs.AI cs.LG cs.SD eess.AS

    PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

    Authors: Kentaro Mitsui, Koh Mitsuda, Toshiaki Wakatsuki, Yukiya Hono, Kei Sawada

    Abstract: Multimodal language models that process both text and speech have a potential for applications in spoken dialogue systems. However, current models face two major challenges in response generation latency: (1) generating a spoken response requires the prior generation of a written response, and (2) speech sequences are significantly longer than text sequences. This study addresses these issues by e…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures, 4 tables, demo samples: https://rinnakk.github.io/research/publications/PSLM

  2. arXiv:2404.01657

    cs.CL cs.AI cs.CV cs.LG eess.AS

    Release of Pre-Trained Models for the Japanese Language

    Authors: Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda

    Abstract: AI democratization aims to create a world in which the average person can utilize AI techniques. To achieve this goal, numerous research institutes have attempted to make their results accessible to the public. In particular, large pre-trained models trained on large-scale data have shown unprecedented potential, and their release has had a significant impact. However, most of the released models…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 9 pages, 1 figure, 5 tables, accepted for LREC-COLING 2024. Models are publicly available at https://huggingface.co/rinna

  3. Development of a near-infrared wide-field integral field unit by ultra-precision diamond cutting

    Authors: Kosuke Kushibiki, Shinobu Ozaki, Masahiro Takeda, Takuya Hosobata, Yutaka Yamagata, Shinya Morita, Toshihiro Tsuzuki, Keiichi Nakagawa, Takao Saiki, Yutaka Ohtake, Kenji Mitsui, Hirofumi Okita, Yutaro Kitagawa, Yukihiro Kono, Kentaro Motohara, Hidenori Takahashi, Masahiro Konishi, Natsuko Kato, Shuhei Koyama, Nuo Chen

    Abstract: Integral Field Spectroscopy (IFS) is an observational method to obtain spatially resolved spectra over a specific field of view (FoV) in a single exposure. In recent years, near-infrared IFS has gained importance in observing objects with strong dust attenuation or at high redshift. One limitation of existing near-infrared IFS instruments is their relatively small FoV, less than 100 arcsec$^2$, co…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 24 pages, 18 figures, 7 tables. Accepted for publication in JATIS

    Journal ref: Journal of Astronomical Telescopes, Instruments, and Systems, Vol. 10, Issue 1, 015004 (March 2024)

  4. arXiv:2312.03668

    eess.AS cs.AI cs.CL cs.LG

    Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition

    Authors: Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

    Abstract: Advances in machine learning have made it possible to perform various text and speech processing tasks, such as automatic speech recognition (ASR), in an end-to-end (E2E) manner. E2E approaches utilizing pre-trained models are gaining attention for conserving training data and resources. However, most of their applications in ASR involve only one of either a pre-trained speech or a language model.…

    Submitted 6 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at https://huggingface.co/rinna/nue-asr

  5. arXiv:2310.01088

    cs.CL cs.LG cs.SD eess.AS

    Towards human-like spoken dialogue generation between AI agents from written dialogue

    Authors: Kentaro Mitsui, Yukiya Hono, Kei Sawada

    Abstract: The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents. However, generating human-like spoken dialogues from these written dialogues remains challenging. Spoken dialogues have several unique characteristics: they frequently include backchannels and laughter, and the smoothness of turn-taking significantly influences the fluidity of…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 18 pages, 8 figures, 9 tables, audio samples: https://rinnakk.github.io/research/publications/CHATS/

  6. arXiv:2302.14337

    cs.CV cs.CL cs.SD eess.AS eess.IV

    UniFLG: Unified Facial Landmark Generator from Text or Speech

    Authors: Kentaro Mitsui, Yukiya Hono, Kei Sawada

    Abstract: Talking face generation has been extensively investigated owing to its wide applicability. The two primary frameworks used for talking face generation comprise a text-driven framework, which generates synchronized speech and talking faces from text, and a speech-driven framework, which generates talking faces from speech. To integrate these frameworks, this paper proposes a unified facial landmark…

    Submitted 18 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 5 pages, 2 figures, 3 tables, accepted for INTERSPEECH 2023. Audio samples: https://rinnakk.github.io/research/publications/UniFLG

  7. arXiv:2302.06883

    cs.CV

    Text-Guided Scene Sketch-to-Photo Synthesis

    Authors: AprilPyone MaungMaung, Makoto Shing, Kentaro Mitsui, Kei Sawada, Fumio Okura

    Abstract: We propose a method for scene-level sketch-to-photo synthesis with text guidance. Although object-level sketch-to-photo synthesis has been widely studied, whole-scene synthesis is still challenging without reference photos that adequately reflect the target style. To this end, we leverage knowledge from recent large-scale pre-trained generative models, resulting in text-guided sketch-to-photo synt…

    Submitted 14 February, 2023; originally announced February 2023.

  8. arXiv:2206.12040

    eess.AS cs.CL cs.LG cs.SD

    End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue

    Authors: Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: The recent text-to-speech (TTS) has achieved quality comparable to that of humans; however, its application in spoken dialogue has not been widely studied. This study aims to realize a TTS that closely resembles human dialogue. First, we record and transcribe actual spontaneous dialogues. Then, the proposed dialogue TTS is trained in two stages: first stage, variational autoencoder (VAE)-VITS or G…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 5 pages, 3 figures, accepted for INTERSPEECH 2022. Audio samples: https://rinnakk.github.io/research/publications/DialogueTTS/

  9. A Super-Earth Orbiting Near the Inner Edge of the Habitable Zone around the M4.5-dwarf Ross 508

    Authors: Hiroki Harakawa, Takuya Takarada, Yui Kasagi, Teruyuki Hirano, Takayuki Kotani, Masayuki Kuzuhara, Masashi Omiya, Hajime Kawahara, Akihiko Fukui, Yasunori Hori, Hiroyuki Tako Ishikawa, Masahiro Ogihara, John Livingston, Timothy D. Brandt, Thayne Currie, Wako Aoki, Charles A. Beichman, Thomas Henning, Klaus Hodapp, Masato Ishizuka, Hideyuki Izumiura, Shane Jacobson, Markus Janson, Eiji Kambe, Takanori Kodama, et al. (24 additional authors not shown)

    Abstract: We report the near-infrared radial-velocity (RV) discovery of a super-Earth planet on a 10.77-day orbit around the M4.5 dwarf Ross 508 ($J_\mathrm{mag}=9.1$). Using precision RVs from the Subaru Telescope IRD (InfraRed Doppler) instrument, we derive a semi-amplitude of $3.92^{+0.60}_{-0.58}$ ${\rm m\,s}^{-1}$, corresponding to a planet with a minimum mass…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted for publication in PASJ (May 23, 2022)

  10. arXiv:2201.08113

    math.AG

    Relative compactifications of semiabelian Néron models, I

    Authors: Kentaro Mitsui, Iku Nakamura

    Abstract: Let $R$ be a complete discrete valuation ring, $k(η)$ its fraction field, $S:=\Spec R$, $(G_η,\cL_η)$ a polarized abelian variety over $k(η)$ with $\cL_η$ ample cubical and $\cG$ the Néron model of $G_η$ over $S$. Suppose that $\cG$ is totally degenerate semiabelian over $S$. Then there exists a (unique) relative compactification $(P,\cN)$ of $\cG$ such that ($α$) $P$ is Cohen-Macaulay with…

    Submitted 15 May, 2024; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: 77 pages, 2 figures

    MSC Class: Primary 14K05; Secondary 14J10; 14K99

  11. arXiv:2109.13714

    eess.AS cs.LG cs.SD

    MSR-NV: Neural Vocoder Using Multiple Sampling Rates

    Authors: Kentaro Mitsui, Kei Sawada

    Abstract: The development of neural vocoders (NVs) has resulted in the high-quality and fast generation of waveforms. However, conventional NVs target a single sampling rate and require re-training when applied to different sampling rates. A suitable sampling rate varies from application to application due to the trade-off between speech quality and generation speed. In this study, we propose a method to ha…

    Submitted 23 June, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: 6 pages including supplement, 3 figures, accepted for INTERSPEECH 2022. Audio samples: https://rinnakk.github.io/research/publications/MSR-NV/

  12. arXiv:2008.02950

    eess.AS cs.LG cs.SD

    Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

    Authors: Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari

    Abstract: Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a single model. Although many approaches using deep neural networks (DNNs) have been proposed, DNNs are prone to overfitting when the amount of training data is limited. We propose a framework for multi-speaker speech synthesis using deep Gaussian processes (DGPs); a DGP is a deep architecture of Bayesian ker…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 5 pages, accepted for INTERSPEECH 2020

  13. arXiv:1908.06248

    cs.SD eess.AS

    JVS corpus: free Japanese multi-speaker voice corpus

    Authors: Shinnosuke Takamichi, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari

    Abstract: Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we are developing Japanese voice corpora reasonably accessible from not only academic institutions but also commercial companies. In 2017, we released the JSUT corpus, which contains 10 hours of reading-style speech uttered b…

    Submitted 17 August, 2019; originally announced August 2019.

  14. arXiv:1711.11547

    math.AG

    Logarithmic good reduction and the index

    Authors: Kentaro Mitsui, Arne Smeets

    Abstract: Let $K$ be the fraction field of a complete discrete valuation ring, with algebraically closed residue field of characteristic $p > 0$. This paper studies the index of a smooth, proper $K$-variety $X$ with logarithmic good reduction. We prove that it is prime to $p$ in `most' cases, for example if the Euler number of $X$ does not vanish, but (perhaps surprisingly) not always. We also fully charact…

    Submitted 15 June, 2023; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: Corrected version

    MSC Class: 14D06; 14G20; 14H10; 14H25

  15. arXiv:1206.6232

    math.DS

    Growth of critical points in one-dimensional lattice systems

    Authors: Masayuki Asaoka, Tomohiro Fukaya, Kentaro Mitsui, Masaki Tsukamoto

    Abstract: We study the growth of the numbers of critical points in one-dimensional lattice systems by using (real) algebraic geometry and the theory of homoclinic tangency.

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: 19 pages

    MSC Class: 57R70; 37C35

  16. Neutrinos from extragalactic cosmic ray interactions in the far infrared background

    Authors: E. V. Bugaev, A. Misaki, K. Mitsui

    Abstract: Extragalactic background of high energy neutrinos arising from the interactions of cosmic ray protons with far-infrared extragalactic background radiation is calculated. The main assumption is that the cosmic ray spectrum at energies higher than $10^8$ GeV has extragalactic origin and, therefore, is proton dominated. All calculations are performed taking into account the possible cosmologic…

    Submitted 19 May, 2005; v1 submitted 6 May, 2004; originally announced May 2004.

    Comments: 8 pages (including 1 figure)

    Journal ref: Astropart.Phys. 24 (2005) 345-354

  17. Identification problems of muon and electron events in the Super-Kamiokande detector

    Authors: K Mitsui, T Kitamura, T Wada, K Okei

    Abstract: In the measurement of atmospheric $\nu_e$ and $\nu_\mu$ fluxes, the calculations of the Super Kamiokande group for the distinction between muon-like and electron-like events observed in the water Cerenkov detector have initially assumed a misidentification probability of less than 1% and later 2% for the sub-GeV range. In the multi-GeV range, they compared only the observed behaviors of ring patterns…

    Submitted 18 September, 2002; originally announced September 2002.

    Comments: 17 pages, 12 figures

    Journal ref: J.Phys.G29:2281-2290,2003