
Showing 1–17 of 17 results for author: Ando, A

  1. arXiv:2406.18910  [pdf, other]

    cs.CL cs.SD eess.AS

    Factor-Conditioned Speaking-Style Captioning

    Authors: Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura

    Abstract: This paper presents a novel speaking-style captioning method that generates diverse descriptions while accurately predicting speaking-style information. Conventional learning criteria directly use original captions that contain not only speaking-style factor terms but also syntax words, which disturbs learning speaking-style information. To solve this problem, we introduce factor-conditioned capti…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2402.07085  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

    Authors: Kenichi Fujita, Atsushi Ando, Yusuke Ijima

    Abstract: This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker. Speech rhythm is one of the essential factors among speaker characteristics, along with acoustic features such as F0, for reproducing individual utterances in speech synthesis. A novel feature of the proposed method is the rhythm-based embeddings extracted…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 11 pages, 9 figures, Accepted to IEICE TRANSACTIONS on Information and Systems

    Journal ref: IEICE TRANSACTIONS on Information and Systems 107.1 (2024): 93-104

  3. arXiv:2309.12656  [pdf, other]

    eess.AS cs.SD

    NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

    Authors: Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa

    Abstract: This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations. The proposed diarization pipeline uses weighted prediction error (WPE)-based dereverberation as a front end, then applies end-to-end neural diarization with vector clustering (EEND-VC) to each channel separately. It integrates the diarization result obtained from each channel using d…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 5 pages, 5 figures, Submitted to ICASSP 2024

  4. arXiv:2308.16454  [pdf, other]

    cs.CV cs.LG

    Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

    Authors: Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura

    Abstract: This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetu…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by International Conference on Computer Vision (ICCV) 2023

  5. Nonuniqueness phenomena in discontinuous dynamical systems and their regularizations

    Authors: Alessia Andò, Roderick Edwards, Nicola Guglielmi

    Abstract: In a recent paper by Guglielmi and Hairer (SIADS 2015), an analysis in the $\varepsilon\to 0$ limit was proposed of regularized discontinuous ODEs in codimension-2 switching domains; this was obtained by studying a certain 2-dimensional system describing the so-called hidden dynamics. In particular, the existence of a unique limit solution was not proved in all cases, a few of which were labeled a…

    Submitted 1 June, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 32 pages, 5 figures

    MSC Class: 34A09; 34A36; 34C23; 34E15; 65L11; 92C42

    Journal ref: SIAM J. Appl. Dyn. Syst. 23:2 (2024), pp. 1345-1371

  6. arXiv:2306.02273  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    End-to-End Joint Target and Non-Target Speakers ASR

    Authors: Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando

    Abstract: This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information. However, in conversational ASR applicatio…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted at Interspeech 2023

  7. Piecewise orthogonal collocation for computing periodic solutions of coupled delay equations

    Authors: Alessia Andò, Dimitri Breda

    Abstract: We extend the piecewise orthogonal collocation method to computing periodic solutions of coupled renewal and delay differential equations. Through a rigorous error analysis, we prove convergence of the relevant finite-element method and provide a theoretical estimate of the error. We conclude with some numerical experiments to further support the theoretical results.

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: 32 pages, 6 figures, to appear in Appl. Numer. Math. arXiv admin note: substantial text overlap with arXiv:2105.09199

    MSC Class: 65L03; 65L10; 65L20; 65L60; 92D25

  8. arXiv:2301.10222  [pdf, other]

    cs.CV cs.AI cs.LG

    RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

    Authors: Angelika Ando, Spyros Gidaris, Andrei Bursuc, Gilles Puy, Alexandre Boulch, Renaud Marlet

    Abstract: Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques which use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs but recent advances in c…

    Submitted 25 April, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: CVPR 2023. Code at https://github.com/valeoai/rangevit

  9. arXiv:2210.15937  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

    Authors: Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

    Abstract: This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA). Although the effectiveness of pre-trained encoders in various fields has been reported, conventional MSA methods employ them for only linguistic modality, and their application has not been investigated. This paper compares the features yielded…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to SLT 2022

  10. arXiv:2207.04659  [pdf, other]

    cs.SD eess.AS

    Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

    Authors: Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura

    Abstract: In this paper, we investigate the semi-supervised joint training of text to speech (TTS) and automatic speech recognition (ASR), where a small amount of paired data and a large amount of unpaired text data are available. Conventional studies form a cycle called the TTS-ASR pipeline, where the multispeaker TTS model synthesizes speech from text with a reference speech and the ASR model reconstructs…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted to INTERSPEECH 2022

  11. arXiv:2203.11998  [pdf, other]

    math.NA math.DS

    A pseudospectral method for investigating the stability of linear population models with two physiological structures

    Authors: Alessia Andò, Simone De Reggi, Davide Liessi, Francesca Scarabel

    Abstract: The asymptotic stability of the null equilibrium of a linear population model with two physiological structures formulated as a first-order hyperbolic PDE is determined by the spectrum of its infinitesimal generator. We propose an equivalent reformulation of the problem in the space of absolutely continuous functions in the sense of Carathéodory, so that the domain of the corresponding infinitesim…

    Submitted 22 March, 2022; originally announced March 2022.

    MSC Class: 37M99; 65L07; 65N25 (Primary) 35L04; 37N25; 47D06; 92D25 (Secondary)

    Journal ref: Mathematical Biosciences and Engineering 2023, Volume 20, Issue 3: 4493-4515

  12. Convergence analysis of collocation methods for computing periodic solutions of retarded functional differential equations

    Authors: Alessia Andò, Dimitri Breda

    Abstract: We analyze the convergence of piecewise collocation methods for computing periodic solutions of general retarded functional differential equations under the abstract framework recently developed in [S. Maset, Numer. Math. (2016) 133(3):525-555], [S. Maset, SIAM J. Numer. Anal. (2015) 53(6):2771--2793] and [S. Maset, SIAM J. Numer. Anal. (2015) 53(6):2794--2821]. We rigorously show that a reformula…

    Submitted 26 November, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: 52 pages, 0 figures, short version to appear on SIAM J. Numer. Anal

    MSC Class: 65L03; 65L10; 65L20; 65L60

    Journal ref: SIAM J. Numer. Anal., 58 (2020), pp. 3010-3039

  13. arXiv:1903.12316  [pdf, other]

    eess.AS cs.SD

    Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise -

    Authors: Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa

    Abstract: Speakers usually adjust their way of talking in noisy environments involuntarily for effective communication. This adaptation is known as the Lombard effect. Although speech accompanying the Lombard effect can improve the intelligibility of a speaker's voice, the changes in acoustic features (e.g. fundamental frequency, speech intensity, and spectral tilt) caused by the Lombard effect may also aff…

    Submitted 9 April, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: Submitted to Interspeech 2019, Graz, Austria

  14. arXiv:1807.10879  [pdf, ps, other]

    cond-mat.mtrl-sci

    Ultrafast Dynamics of Electron-phonon Coupling in Transition-metal Dichalcogenides

    Authors: Kotaro Makino, Yuta Saito, Shuuto Horii, Paul Fons, Alexander V. Kolobov, Atsushi Ando, Keiji Ueno, Richarj Mondal, Muneaki Hase

    Abstract: Time-domain femtosecond laser spectroscopic measurements of the ultrafast lattice dynamics in 2H-MoTe2 bulk crystals were carried out to understand the carrier-phonon interactions that govern electronic transport properties. An unusually long lifetime coherent A1g phonon mode was observed even in the presence of very large density of photo-excited carriers at room temperature. The decay rate was o…

    Submitted 27 July, 2018; originally announced July 2018.

  15. Measurement and comparison of individual external doses of high-school students living in Japan, France, Poland and Belarus -- the "D-shuttle" project --

    Authors: N. Adachi, V. Adamovitch, Y. Adjovi, K. Aida, H. Akamatsu, S. Akiyama, A. Akli, A. Ando, T. Andrault, H. Antonietti, S. Anzai, G. Arkoun, C. Avenoso, D. Ayrault, M. Banasiewicz, M. Banaśkiewicz, L. Bernandini, E. Bernard, E. Berthet, M. Blanchard, D. Boreyko, K. Boros, S. Charron, P. Cornette, K. Czerkas, et al. (208 additional authors not shown)

    Abstract: Twelve high schools in Japan (of which six are in Fukushima Prefecture), four in France, eight in Poland and two in Belarus cooperated in the measurement and comparison of individual external doses in 2014. In total 216 high-school students and teachers participated in the study. Each participant wore an electronic personal dosimeter "D-shuttle" for two weeks, and kept a journal of his/her whereab…

    Submitted 18 November, 2015; v1 submitted 21 June, 2015; originally announced June 2015.

  16. Enhancement of phonon effects in photoexcited states of one-dimensional Mott insulators

    Authors: Hiroaki Matsueda, Akihiro Ando, Takami Tohyama, Sadamichi Maekawa

    Abstract: We examine how the electron correlation affects the electron-phonon (EP) interaction in the linear optical absorption spectrum of the one-dimensional (1D) extended Hubbard-Holstein model. A density matrix renormalization group (DMRG) calculation shows that the effect of the EP interaction on an exciton is enhanced by increasing the on-site Coulomb repulsion. This enhancement is in contrast to th…

    Submitted 27 February, 2008; originally announced February 2008.

    Comments: 5 pages, 2 figures

  17. arXiv:physics/0410205  [pdf]

    physics.plasm-ph

    Development of supersonic plasma flows by use of a magnetic nozzle and an ICRF heating

    Authors: M. Inutake, A. Ando, K. Hattori, H. Tobari, Y. Hosokawa, R. Sato, M. Hatanaka, K. Harata

    Abstract: A high-beta, supersonic plasma flow plays a crucial role in MHD phenomena in space and fusion plasmas. There are a few experimental researches on production and control of a fast flowing plasma in spite of a growing significance in the magnetized-plasma flow dynamics. A magneto-plasma-dynamic arcjet (MPDA) is one of promising devices to produce a supersonic plasma flow and has been utilized as a…

    Submitted 22 October, 2004; originally announced October 2004.

    Comments: 12th International Congress on Plasma Physics, 25-29 October 2004, Nice (France)