NAIST Simultaneous Speech Translation System for IWSLT 2024

Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

Abstract: This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt. The submitted models employ the LA policy because it outperformed the AlignAtt policy in previous models. Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module that incorporates a phoneme estimation model, a parallel acoustic model, and a parallel WaveGAN vocoder. We improved our incremental TTS by applying the Transformer architecture with the AlignAtt policy for the estimation model. The results show that our upgraded TTS module contributed to improving the system performance.

Submitted 30 June, 2024; originally announced July 2024.

Comments: IWSLT 2024 system paper

arXiv:2311.14353 [pdf, other]

Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation

Authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

Abstract: Simultaneous translation is a task in which the translation begins before the end of an input speech segment. Its evaluation should be conducted based on latency in addition to quality, and for users, the smallest possible amount of latency is preferable. Most existing metrics measure latency based on the start timings of partial translations and ignore their duration. This means such metrics do not penalize the latency caused by long translation output, which delays the comprehension of users and subsequent translations. In this work, we propose a novel latency evaluation metric for simultaneous translation called Average Token Delay (ATD) that focuses on the duration of partial translations. We demonstrate its effectiveness through analyses simulating user-side latency based on Ear-Voice Span (EVS). In our experiment, ATD had the highest correlation with EVS among baseline latency metrics under most conditions.

Submitted 27 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: Extended version of the paper (doi: 10.21437/Interspeech.2023-933) which appeared in INTERSPEECH 2023

arXiv:2306.08582 [pdf, other]

Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data

Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

Abstract: Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. Although a monotonic correspondence between input and output is preferable for smaller latency, it does not hold for distant language pairs such as English and Japanese. A prospective approach to this problem is to mimic simultaneous interpretation (SI) using SI data to train a SimulST model. However, the size of such SI data is limited, so the SI data should be used together with ordinary bilingual data whose translations are produced offline. In this paper, we propose an effective way to train a SimulST model using mixed SI and offline data. The proposed method trains a single model using the mixed data with style tags that tell the model to generate SI- or offline-style outputs. Experimental results show improvements in BLEURT in different latency ranges, and our analyses revealed that the proposed model generates SI-style outputs more than the baseline.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted to IWSLT2023 scientific paper

arXiv:2211.13173 [pdf, other]

Average Token Delay: A Latency Metric for Simultaneous Translation

Authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

Abstract: Simultaneous translation is a task in which translation begins before the speaker has finished speaking. In its evaluation, we have to consider the latency of the translation in addition to the quality. The latency is preferably as small as possible for users to comprehend what the speaker says with a small delay. Existing latency metrics focus on when the translation starts but do not consider adequately when the translation ends. This means such metrics do not penalize the latency caused by a long translation output, which actually delays users' comprehension. In this work, we propose a novel latency evaluation metric called Average Token Delay (ATD) that focuses on the end timings of partial translations in simultaneous translation. We discuss the advantage of ATD using simulated examples and also investigate the differences between ATD and Average Lagging with simultaneous translation experiments.

Submitted 8 February, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

arXiv:2203.07622 [pdf, other]

The International Linear Collider: Report to Snowmass 2021

Authors: Alexander Aryshev, Ties Behnke, Mikael Berggren, James Brau, Nathaniel Craig, Ayres Freitas, Frank Gaede, Spencer Gessner, Stefania Gori, Christophe Grojean, Sven Heinemeyer, Daniel Jeans, Katja Kruger, Benno List, Jenny List, Zhen Liu, Shinichiro Michizono, David W. Miller, Ian Moult, Hitoshi Murayama, Tatsuya Nakada, Emilio Nanni, Mihoko Nojiri, Hasan Padamsee, Maxim Perelstein, et al. (487 additional authors not shown)

Abstract: The International Linear Collider (ILC) is on the table now as a new global energy-frontier accelerator laboratory taking data in the 2030s. The ILC addresses key questions for our current understanding of particle physics. It is based on a proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community.

Submitted 16 January, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 356 pages, Large pdf file (40 MB) submitted to Snowmass 2021; v2 references to Snowmass contributions added, additional authors; v3 references added, some updates, additional authors

Report number: DESY-22-045, IFT--UAM/CSIC--22-028, KEK Preprint 2021-61, PNNL-SA-160884, SLAC-PUB-17662

arXiv:2110.13480 [pdf, other]

Simultaneous Neural Machine Translation with Constituent Label Prediction

Authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

Abstract: Simultaneous translation is a task in which translation begins before the speaker has finished speaking, so it is important to decide when to start the translation process. However, deciding whether to read more input words or start to translate is difficult for language pairs with different word orders such as English and Japanese. Motivated by the concept of pre-reordering, we propose a couple of simple decision rules using the label of the next constituent predicted by incremental constituent label prediction. In experiments on English-to-Japanese simultaneous translation, the proposed method outperformed baselines in the quality-latency trade-off.

Submitted 26 October, 2021; originally announced October 2021.

Comments: WMT2021

arXiv:2106.01597 [pdf, other]

ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation

Authors: Kaushal Kumar Maurya, Maunendra Sankar Desarkar, Yoshinobu Kano, Kumari Deepshikha

Abstract: Despite the recent advancement in NLP research, cross-lingual transfer for natural language generation is relatively understudied. In this work, we transfer supervision from a high-resource language (HRL) to multiple low-resource languages (LRLs) for natural language generation (NLG). We consider four NLG tasks (text summarization, question generation, news headline generation, and distractor generation) and three syntactically diverse languages, i.e., English, Hindi, and Japanese. We propose an unsupervised cross-lingual language generation framework (called ZmBART) that does not use any parallel or pseudo-parallel/back-translated data. In this framework, we further pre-train the mBART sequence-to-sequence denoising auto-encoder model with an auxiliary task using monolingual data of three languages. The objective function of the auxiliary task is close to the target tasks, which enriches the multi-lingual latent representation of mBART and provides good initialization for target tasks. Then, this model is fine-tuned with task-specific supervised English data and directly evaluated with low-resource languages in the zero-shot setting. To overcome catastrophic forgetting and spurious correlation issues, we applied model-component freezing and data augmentation approaches, respectively. This simple modeling approach gave us promising results. We experimented with few-shot training (with 1000 supervised data points), which boosted the model performance further. We performed several ablations and cross-lingual transferability analyses to demonstrate the robustness of ZmBART.

Submitted 3 June, 2021; originally announced June 2021.

Comments: Accepted in Findings of ACL-IJCNLP 2021

arXiv:2101.05269 [pdf, other]

doi 10.3847/1538-4357/abf7c4

Supernova Model Discrimination with Hyper-Kamiokande

Authors: Hyper-Kamiokande Collaboration: K. Abe, P. Adrich, H. Aihara, R. Akutsu, I. Alekseev, A. Ali, F. Ameli, I. Anghel, L. H. V. Anthony, M. Antonova, A. Araya, Y. Asaoka, Y. Ashida, V. Aushev, F. Ballester, I. Bandac, M. Barbi, G. J. Barker, G. Barr, M. Batkiewicz-Kwasniak, M. Bellato, V. Berardi, M. Bergevin, et al. (478 additional authors not shown)

Abstract: Core-collapse supernovae are among the most magnificent events in the observable universe. They produce many of the chemical elements necessary for life to exist and their remnants -- neutron stars and black holes -- are interesting astrophysical objects in their own right. However, despite millennia of observations and almost a century of astrophysical study, the explosion mechanism of core-collapse supernovae is not yet well understood. Hyper-Kamiokande is a next-generation neutrino detector that will be able to observe the neutrino flux from the next galactic core-collapse supernova in unprecedented detail. We focus on the first 500 ms of the neutrino burst, corresponding to the accretion phase, and use a newly-developed, high-precision supernova event generator to simulate Hyper-Kamiokande's response to five different supernova models. We show that Hyper-Kamiokande will be able to distinguish between these models with high accuracy for a supernova at a distance of up to 100 kpc. Once the next galactic supernova happens, this ability will be a powerful tool for guiding simulations towards a precise reproduction of the explosion mechanism observed in nature.

Submitted 20 July, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: 21 pages, 7 figures. Article based on thesis published as arXiv:2002.01649. v2: added references and some explanations in response to reviewer comments

Journal ref: Astrophys.J. 916 (2021) 15

arXiv:2009.00794 [pdf, ps, other]

The Hyper-Kamiokande Experiment -- Snowmass LOI

Authors: Hyper-Kamiokande Collaboration: K. Abe, P. Adrich, H. Aihara, R. Akutsu, I. Alekseev, A. Ali, F. Ameli, L. H. V. Anthony, A. Araya, Y. Asaoka, V. Aushev, I. Bandac, M. Barbi, G. Barr, M. Batkiewicz-Kwasniak, M. Bellato, V. Berardi, L. Bernard, E. Bernardini, L. Berns, S. Bhadra, J. Bian, A. Blanchet, et al. (366 additional authors not shown)

Abstract: Hyper-Kamiokande is the next-generation underground water Cherenkov detector that builds on the highly successful Super-Kamiokande experiment. The detector, which has an 8.4 times larger effective volume than its predecessor, will be located along the T2K neutrino beamline and utilize an upgraded J-PARC beam with 2.6 times the beam power. Hyper-K's low energy threshold combined with the very large fiducial volume makes the detector unique; it is expected to acquire an unprecedented exposure of 3.8 Mton·year over a period of 20 years of operation. Hyper-Kamiokande combines an extremely diverse science program including nucleon decays, long-baseline neutrino oscillations, atmospheric neutrinos, and neutrinos from astrophysical origins. The scientific scope of this program is highly complementary to liquid-argon detectors, for example in sensitivity to nucleon decay channels or supernova detection modes. Hyper-Kamiokande construction started in early 2020 and the experiment is expected to start operations in 2027. The Hyper-Kamiokande collaboration is presently being formed amongst groups from 19 countries including the United States, whose community has a long history of making significant contributions to the neutrino physics program in Japan. US physicists have played leading roles in the Kamiokande, Super-Kamiokande, EGADS, K2K, and T2K programs.

Submitted 1 September, 2020; originally announced September 2020.

Comments: 6 pages, prepared as Snowmass2021 LOI

arXiv:1407.4971 [pdf, other]

Statistical Inference with Different Missing-data Mechanisms

Authors: Kosuke Morikawa, Yutaka Kano

Abstract: When data are missing due to at most one cause from one time point to the next, we can make sampling-distribution inferences about the parameter of the data by modeling the missing-data mechanism correctly. As is well known, if the mechanism is missing at random (MAR), it can be ignored, but if it is not missing at random (NMAR), it cannot. There are, however, no methods for analyzing data whose missingness can arise from several causes, even though such data are common in practice. The aim of this paper is therefore to propose how to make inferences on such data. Concretely, we extend the missing-data indicator from the usual binary random vectors to discrete random vectors, define a missing-data mechanism for every cause, and study the ignorability of mixtures of missing-data mechanisms such as "MAR & MAR" and "MAR & NMAR". In particular, when the combination of mechanisms is "MAR & NMAR", the MAR component generally cannot be ignored, but in a special case it can be.

Submitted 18 July, 2014; originally announced July 2014.

arXiv:1405.3380 [pdf, other]

Identification Problem for The Analysis of Binary Data with Non-ignorable Missing

Authors: Kosuke Morikawa, Yutaka Kano

Abstract: When a missing-data mechanism is NMAR or non-ignorable, missingness is itself vital information and must be incorporated into the likelihood, which, however, requires introducing additional parameters to be estimated. The incompleteness of the data and the introduction of more parameters can cause an identification problem. When the response variable is binary, the problem becomes more serious because binary data carry less information; however, there are no methods to quickly verify whether a model is identified or not. Therefore, we provide a new necessary and sufficient condition to easily check model identifiability when analyzing binary data with non-ignorable missingness by conditional models. This condition shows what is needed for a model to be identifiable and makes it easy to check the identifiability of a model.

Submitted 14 May, 2014; originally announced May 2014.

arXiv:1405.0654 [pdf, ps, other]

Reeb orbits trapped by Denjoy minimal sets

Authors: Takahiro Arai, Takashi Inaba, Yosuke Kano

Abstract: Let $\varphi$ be any flow on $T^n$ obtained as the suspension of a diffeomorphism of $T^{n-1}$ and let $\mathcal A$ be any compact invariant set of $\varphi$. We realize $(\mathcal A, \varphi|_{\mathcal A})$ up to reparametrization as an invariant set of the Reeb flow of a contact form on $\mathbb R^{2n+1}$ equal to the standard contact form outside a compact set and defining the standard contact structure on all of $\mathbb R^{2n+1}$. This generalizes the construction of Geiges, Röttgen and Zehmisch.

Submitted 1 July, 2014; v1 submitted 4 May, 2014; originally announced May 2014.

Comments: 7 pages. Two coauthors newly joined. Construction of the function improved

MSC Class: 37C27; 53D10; 57R30

arXiv:1312.5458 [pdf, ps, other]

Full information maximum likelihood estimation in factor analysis with a lot of missing values

Authors: Kei Hirose, Sunyong Kim, Yutaka Kano, Miyuki Imada, Manabu Yoshida, Masato Matsuo

Abstract: We consider the problem of full information maximum likelihood (FIML) estimation in a factor analysis model when a majority of the data values are missing. The expectation-maximization (EM) algorithm is often used to find the FIML estimates, in which the missing values on observed variables are included in complete data. However, the EM algorithm has an extremely high computational cost when the number of observations is large and/or plenty of missing values are involved. In this paper, we propose a new algorithm that is based on the EM algorithm but that efficiently computes the FIML estimates. A significant improvement in the computational speed is realized by not treating the missing values on observed variables as a part of complete data. Our algorithm is applied to a real data set collected from a Web questionnaire that asks about first impressions of humans; almost $90\%$ of the data values are missing. When there are many missing data values, it is not clear if the FIML procedure can achieve good estimation accuracy even if the number of observations is large. In order to investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.

Submitted 19 December, 2013; originally announced December 2013.

Comments: 19 pages, 2 figures

arXiv:1207.1413 [pdf]

Discovery of non-gaussian linear causal models using ICA

Authors: Shohei Shimizu, Aapo Hyvarinen, Yutaka Kano, Patrik O. Hoyer

Abstract: In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data (Spirtes et al. 2000; Pearl 2000). Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis (ICA), and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data.

Submitted 4 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

Report number: UAI-P-2005-PG-525-533

arXiv:1203.2413 [pdf, ps, other]

doi 10.2140/pjm.2014.267.399

Taut foliations and the actions of fundamental groups on leaf spaces and universal circles

Authors: Yosuke Kano

Abstract: Let $F$ be a leafwise hyperbolic taut foliation of a closed 3-manifold $M$ and let $L$ be the leaf space of the pullback of $F$ to the universal cover of $M$. We show that if $F$ has branching, then the natural action of $\pi_1(M)$ on $L$ is faithful. We also show that if $F$ has a finite branch locus $B$ whose stabilizer acts on $B$ nontrivially, then the stabilizer is an infinite cyclic group generated by an indivisible element of $\pi_1(M)$.

Submitted 12 March, 2012; originally announced March 2012.

Comments: 16 pages, 2 figures

MSC Class: 57M60

Journal ref: Pacific J. Math. 267 (2014) 399-416

Showing 1–15 of 15 results for author: Kano, Y