YODAS: Youtube-Oriented Dataset for Audio and Speech

Authors: Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe

Abstract: In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset currently comprising over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, including manual or automatic subtitles, facilitate supervised model training. Conversely, the unlabeled subsets are apt for self-supervised learning applications. YODAS is distinctive as the first publicly available dataset of its scale, and it is distributed under a Creative Commons license. We introduce the collection methodology utilized for YODAS, which contributes to large-scale speech dataset construction. Subsequently, we provide a comprehensive analysis of the speech and text contained within the dataset. Finally, we describe the speech recognition baselines over the top-15 languages.

Submitted 2 June, 2024; originally announced June 2024.

Comments: ASRU 2023

arXiv:2402.18932 [pdf, other]

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

Authors: Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov

Abstract: Collecting high-quality studio recordings of audio is challenging, which limits the language coverage of text-to-speech (TTS) systems. This paper proposes a framework for scaling a multilingual TTS model to 100+ languages using found data without supervision. The proposed framework combines speech-text encoder pretraining with unsupervised training using untranscribed speech and unspoken text data sources, thereby leveraging massively multilingual joint speech and text representation learning. Without any transcribed speech in a new language, this TTS model can generate intelligible speech in >30 unseen languages (CER difference of <10% to ground truth). With just 15 minutes of transcribed, found data, we can reduce the intelligibility difference to 1% or less from the ground-truth, and achieve naturalness scores that match the ground-truth in several languages.

Submitted 29 February, 2024; originally announced February 2024.

Comments: To appear in ICASSP 2024

arXiv:2401.16812 [pdf, other]

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Authors: Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari

Abstract: While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper proposes reference-aware automatic evaluation methods for speech generation inspired by evaluation metrics in natural language processing. The proposed SpeechBERTScore computes the BERTScore for self-supervised dense speech features of the generated and reference speech, which can have different sequential lengths. We also propose SpeechBLEU and SpeechTokenDistance, which are computed on speech discrete tokens. The evaluations on synthesized speech show that our method correlates better with human subjective ratings than mel cepstral distortion and a recent mean opinion score prediction model. Also, they are effective in noisy speech evaluation and have cross-lingual applicability.

Submitted 12 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by Interspeech 2024. An extended version with Appendix. Code: https://github.com/Takaaki-Saeki/DiscreteSpeechMetrics
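
As a rough illustration of the idea in the SpeechBERTScore abstract above, the sketch below applies a BERTScore-style greedy matching to two frame-level feature matrices of different lengths. It is a minimal sketch, not the authors' implementation: the function name `bertscore_from_features`, the use of cosine similarity, and returning precision, recall, and F1 together are assumptions made here for illustration, and the features are assumed to have been extracted beforehand by some self-supervised speech model.

```python
import numpy as np


def bertscore_from_features(gen: np.ndarray, ref: np.ndarray) -> tuple[float, float, float]:
    """BERTScore-style precision, recall and F1 between two feature sequences.

    gen: (T_gen, D) features of the generated speech (hypothetical input).
    ref: (T_ref, D) features of the reference speech (hypothetical input).
    T_gen and T_ref may differ.
    """
    eps = 1e-12  # guards against division by zero for all-zero frames
    # L2-normalise each frame so that dot products become cosine similarities.
    gen_n = gen / np.maximum(np.linalg.norm(gen, axis=1, keepdims=True), eps)
    ref_n = ref / np.maximum(np.linalg.norm(ref, axis=1, keepdims=True), eps)

    sim = gen_n @ ref_n.T  # shape (T_gen, T_ref)

    # Greedy matching: each frame is paired with its most similar frame
    # in the other sequence.
    precision = float(sim.max(axis=1).mean())  # averaged over generated frames
    recall = float(sim.max(axis=0).mean())     # averaged over reference frames
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

Calling `bertscore_from_features(gen, ref)` with a 120-frame and a 150-frame matrix of the same feature dimension returns three floats; identical inputs give values close to 1. The abstract does not say which of the three quantities is reported as the final score.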

arXiv:2309.08127 [pdf, other]

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

Authors: Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari

Abstract: This paper proposes a method for extracting a lightweight subset from a text-to-speech (TTS) corpus ensuring synthetic speech quality. In recent years, methods have been proposed for constructing large-scale TTS corpora by collecting diverse data from massive sources such as audiobooks and YouTube. Although these methods have gained significant attention for enhancing the expressive capabilities of TTS systems, they often prioritize collecting vast amounts of data without considering practical constraints like storage capacity and computation time in training, which limits the available data quantity. Consequently, the need arises to efficiently collect data within these volume constraints. To address this, we propose a method for selecting the core subset (known as core-set) from a TTS corpus on the basis of a diversity metric, which measures the degree to which a subset encompasses a wide range. Experimental results demonstrate that our proposed method performs significantly better than the baseline phoneme-balanced data selection across language and corpus size.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2302.13652 [pdf, ps, other]

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Authors: Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari

Abstract: Pause insertion, also known as phrase break prediction and phrasing, is an essential part of TTS systems because proper pauses with natural duration significantly enhance the rhythm and intelligibility of synthetic speech. However, conventional phrasing models ignore various speakers' different styles of inserting silent pauses, which can degrade the performance of the model trained on a multi-speaker speech corpus. To this end, we propose more powerful pause insertion frameworks based on a pre-trained language model. Our approach uses bidirectional encoder representations from transformers (BERT) pre-trained on a large-scale text corpus, injecting speaker embedding to capture various speaker characteristics. We also leverage duration-aware pause insertion for more natural multi-speaker TTS. We develop and evaluate two types of models. The first improves conventional phrasing models on the position prediction of respiratory pauses (RPs), i.e., silent pauses at word transitions without punctuation. It performs speaker-conditioned RP prediction considering contextual information and is used to demonstrate the effect of speaker information on the prediction. The second model is further designed for phoneme-based TTS models and performs duration-aware pause insertion, predicting both RPs and punctuation-indicated pauses (PIPs) that are categorized by duration. The evaluation results show that our models improve the precision and recall of pause insertion and the rhythm of synthetic speech.

Submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2301.12596 [pdf, other]

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Authors: Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource languages for which only textual resources are available, making TTS accessible to thousands of languages. Inspired by the strong cross-lingual transferability of multilingual language models, our framework first performs masked language model pretraining with multilingual text-only data. Then we train this model with paired data in a supervised manner, while freezing a language-aware embedding layer. This allows inference even for languages not included in the paired data but present in the text-only data. Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.

Submitted 27 May, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

Comments: To appear in IJCAI 2023
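
The character error rate (CER) quoted in the abstract above is conventionally the character-level edit distance between a reference text and a hypothesis text, divided by the reference length. The sketch below shows that standard definition only. How the hypothesis text is obtained is not stated in the abstract; treating it as a transcript of the synthesized speech is an assumption here, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("speech", "spech")` counts one deleted character against a six-character reference. Because insertions are counted too, CER can exceed 1.0 when the hypothesis is much longer than the reference.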

arXiv:2212.04559 [pdf, other]

SpeechLMScore: Evaluating speech generation using speech language model

Authors: Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe

Abstract: While human evaluation is the most reliable metric for evaluating speech generation systems, it is generally costly and time-consuming. Previous studies on automatic speech quality assessment address the problem by predicting human evaluation scores with machine learning models. However, they rely on supervised learning and thus suffer from high annotation costs and domain-shift problems. We propose SpeechLMScore, an unsupervised metric to evaluate generated speech using a speech-language model. SpeechLMScore computes the average log-probability of a speech signal by mapping it into discrete tokens and measures the average probability of generating the sequence of tokens. Therefore, it does not require human annotation and is a highly scalable framework. Evaluation results demonstrate that the proposed metric shows a promising correlation with human evaluation scores on different speech generation tasks including voice conversion, text-to-speech, and speech enhancement.

Submitted 8 December, 2022; originally announced December 2022.
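
To make the "average log-probability of a token sequence" in the SpeechLMScore abstract above concrete, here is a toy sketch. It assumes the speech has already been mapped to integer token IDs, and it swaps in a small add-one-smoothed bigram model for the speech language model. The bigram model and the names `train_bigram` and `average_log_prob` are illustrative assumptions, not what the paper uses.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab_size):
    """Fit an add-one-smoothed bigram model; returns a log_prob(prev, cur) function."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def log_prob(prev, cur):
        # Add-one smoothing keeps unseen token pairs at a finite log-probability.
        return math.log((pair_counts[(prev, cur)] + 1)
                        / (context_counts[prev] + vocab_size))

    return log_prob


def average_log_prob(tokens, log_prob):
    """Mean log-probability of each token given its predecessor."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    return sum(log_prob(p, c) for p, c in pairs) / len(pairs)
```

A sequence that follows the patterns seen in training receives a higher (less negative) score than one that does not. No human ratings enter the computation, which is the property the abstract emphasizes.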

arXiv:2210.15447 [pdf, other]

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

Authors: Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran

Abstract: This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. Existing multilingual TTS typically supports tens of languages, which are a small fraction of the thousands of languages in the world. One difficulty in scaling multilingual TTS to hundreds of languages is collecting high-quality speech-text paired data in low-resource languages. This study extends Maestro, a speech-text joint pretraining framework for automatic speech recognition (ASR), to speech generation tasks. To train a TTS model from various types of speech and text data, different training schemes are designed to handle supervised (paired TTS and ASR data) and unsupervised (untranscribed speech and unspoken text) datasets. Experimental evaluation shows that 1) multilingual TTS models trained on Virtuoso can achieve significantly better naturalness and intelligibility than baseline ones in seen languages, and 2) they can synthesize reasonably intelligible and naturally sounding speech for unseen languages where no high-quality paired TTS data is available.

Submitted 15 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: To appear in ICASSP 2023

arXiv:2210.14850 [pdf, other]

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

Authors: Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari

Abstract: This paper proposes a method for selecting training data for text-to-speech (TTS) synthesis from dark data. TTS models are typically trained on high-quality speech corpora that cost much time and money for data collection, which makes it very challenging to increase speaker variation. In contrast, there is a large amount of data whose availability is unknown (a.k.a. "dark data"), such as YouTube videos. To utilize data other than TTS corpora, previous studies have selected speech data from the corpora on the basis of acoustic quality. However, considering that TTS models robust to data noise have been proposed, we should select data on the basis of its importance as training data to the given TTS model, not the quality of speech itself. Our method with a loop of training and evaluation selects training data on the basis of the automatically predicted quality of synthetic speech of a given TTS model. Results of evaluations using YouTube data reveal that our method outperforms the conventional acoustic-quality-based method.

Submitted 26 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2210.09815 [pdf, other]

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion

Authors: Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: We present a training method with linguistic speech regularization that improves the robustness of spontaneous speech synthesis methods with filled pause (FP) insertion. Spontaneous speech synthesis is aimed at producing speech with human-like disfluencies, such as FPs. Because modeling the complex data distribution of spontaneous speech with a rich FP vocabulary is challenging, the quality of FP-inserted synthetic speech is often limited. To address this issue, we present a method for synthesizing spontaneous speech that improves robustness to diverse FP insertions. Regularization is used to stabilize the synthesis of the linguistic speech (i.e., non-FP) elements. To further improve robustness to diverse FP insertions, it utilizes pseudo-FPs sampled using an FP word prediction model as well as ground-truth FPs. Our experiments demonstrated that the proposed method improves the naturalness of synthetic speech with ground-truth and predicted FPs by 0.24 and 0.26, respectively.

Submitted 19 September, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Accepted to SSW12

arXiv:2210.07559 [pdf, ps, other]

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

Authors: Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: We present a comprehensive empirical study for personalized spontaneous speech synthesis on the basis of linguistic knowledge. With the advent of voice cloning for reading-style speech synthesis, a new voice cloning paradigm for human-like and spontaneous speech synthesis is required. We, therefore, focus on personalized spontaneous speech synthesis that can clone both the individual's voice timbre and speech disfluency. Specifically, we deal with filled pauses, a major source of speech disfluency, which is known to play an important role in speech generation and communication in psychology and linguistics. To comparatively evaluate personalized filled pause insertion and non-personalized filled pause prediction methods, we developed a speech synthesis method with a non-personalized external filled pause predictor trained with a multi-speaker corpus. The results clarify the position-word entanglement of filled pauses, i.e., the necessity of precisely predicting positions for naturalness and the necessity of precisely predicting words for individuality on the evaluation of synthesized speech.

Submitted 19 September, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted to APSIPA ASC 2022

arXiv:2204.02536 [pdf, other]

Next-Generation Superconducting RF Technology based on Advanced Thin Film Technologies and Innovative Materials for Accelerator Enhanced Performance and Energy Reach

Authors: A.-M. Valente-Feliciano, C. Antoine, S. Anlage, G. Ciovati, J. Delayen, F. Gerigk, A. Gurevich, T. Junginger, S. Keckert, G. Keppe, J. Knobloch, T. Kubo, O. Kugeler, D. Manos, C. Pira, T. Proslier, U. Pudasaini, C. E. Reece, R. A. Rimmer, G. J. Rosaz, T. Saeki, R. Vaglio, R. Valizadeh, H. Vennekate, W. Venturini Delsolaro, et al. (3 additional authors not shown)

Abstract: Superconducting RF is a key technology for future particle accelerators, now relying on advanced surfaces beyond bulk Nb for a leap in performance and efficiency. The SRF thin film strategy aims at transforming the current SRF technology by using highly functional materials, addressing all the necessary functions. The community is deploying efforts in three research thrusts to develop next-generation thin-film based cavities. Nb on Cu cavities are developed to perform as well as or better than bulk Nb at reduced cost and with better thermal stability. Recent results showing improved accelerating field and dramatically reduced Q slope show their potential for many applications. The second research thrust is to develop cavities coated with materials that can operate at higher temperatures or sustain higher fields. Proof of principle has been established for the merit of Nb3Sn for SRF application. Research is now needed to further exploit the material and reach its full potential with novel deposition techniques. The third line of research is to push SRF performance beyond the capabilities of the superconductors alone with multilayered coatings. In parallel, developments are needed to provide quality substrates, cooling schemes and cryomodule design tailored to thin film cavities. Recent results in these three research thrusts suggest that SRF thin film technologies are on the eve of a technological revolution. For them to mature, active community support and sustained funding are needed to address fundamental developments supporting material deposition techniques, surface and RF research, and technical challenges associated with scaling and industrialization. With dedicated and sustained investment, next-generation thin-film based cavities will become a reality with high performance and efficiency, facilitating energy sustainable science while enabling higher luminosity and higher energy.

Submitted 5 April, 2022; originally announced April 2022.

Comments: Contribution to Snowmass 2021

arXiv:2204.02152 [pdf, other]

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Authors: Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: We present the UTokyo-SaruLab mean opinion score (MOS) prediction system submitted to VoiceMOS Challenge 2022. The challenge is to predict the MOS values of speech samples collected from previous Blizzard Challenges and Voice Conversion Challenges for two tracks: a main track for in-domain prediction and an out-of-domain (OOD) track for which there is less labeled data from different listening tests. Our system is based on ensemble learning of strong and weak learners. Strong learners incorporate several improvements to the previous fine-tuning models of self-supervised learning (SSL) models, while weak learners use basic machine-learning methods to predict scores from SSL features. In the Challenge, our system had the highest score on several metrics for both the main and OOD tracks. In addition, we conducted ablation studies to investigate the effectiveness of our proposed methods.

Submitted 29 June, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Accepted to INTERSPEECH 2022

arXiv:2203.15683 [pdf, other]

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning

Authors: Takaaki Saeki, Kentaro Tachibana, Ryuichi Yamamoto

Abstract: Most text-to-speech (TTS) methods use high-quality speech corpora recorded in a well-designed environment, incurring a high cost for data collection. To solve this problem, existing noise-robust TTS methods are intended to use noisy speech corpora as training data. However, they only address either time-invariant or time-variant noises. We propose a degradation-robust TTS method, which can be trained on speech corpora that contain both additive noises and environmental distortions. It jointly represents the time-variant additive noises with a frame-level encoder and the time-invariant environmental distortions with an utterance-level encoder. We also propose a regularization method to attain clean environmental embedding that is disentangled from the utterance-dependent information such as linguistic contents and speaker characteristics. Evaluation results show that our method achieved significantly higher-quality synthetic speech than previous methods in the condition including both additive noise and reverberation.

Submitted 29 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted to INTERSPEECH 2022

arXiv:2203.14725 [pdf, other]

vTTS: visual-text to speech

Authors: Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari

Abstract: This paper proposes visual-text to speech (vTTS), a method for synthesizing speech from visual text (i.e., text as an image). Conventional TTS converts phonemes or characters into discrete symbols and synthesizes a speech waveform from them, thus losing the visual features that the characters essentially have. Therefore, our method synthesizes speech not from discrete symbols but from visual text. The proposed vTTS extracts visual features with a convolutional neural network and then generates acoustic features with a non-autoregressive model inspired by FastSpeech2. Experimental results show that 1) vTTS is capable of generating speech with naturalness comparable to or better than a conventional TTS, 2) it can transfer emphasis and emotion attributes in visual text to speech without additional labels and architectures, and 3) it can synthesize more natural and intelligible speech from unseen and rare characters than conventional TTS.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022

arXiv:2203.12937 [pdf, other]

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Authors: Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari

Abstract: We present a self-supervised speech restoration method without paired speech corpora. Because the previous general speech restoration method uses artificial paired data created by applying various distortions to high-quality speech corpora, it cannot sufficiently represent acoustic distortions of real data, limiting the applicability. Our model consists of analysis, synthesis, and channel modules that simulate the recording process of degraded speech and is trained with real degraded speech data in a self-supervised manner. The analysis module extracts distortionless speech features and distortion features from degraded speech, while the synthesis module synthesizes the restored speech waveform, and the channel module adds distortions to the speech waveform. Our model also enables audio effect transfer, in which only acoustic distortions are extracted from degraded speech and added to arbitrary high-quality audio. Experimental evaluations with both simulated and real data show that our method achieves significantly higher-quality speech restoration than the previous supervised method, suggesting its applicability to real degraded speech materials.

Submitted 27 June, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Accepted to INTERSPEECH 2022

arXiv:2203.09961 [pdf, other]

Personalized Filled-pause Generation with Group-wise Prediction Models

Authors: Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: In this paper, we propose a method to generate personalized filled pauses (FPs) with group-wise prediction models. Compared with fluent text generation, disfluent text generation has not been widely explored. To generate more human-like texts, we addressed disfluent text generation. The usage of disfluency, such as FPs, rephrases, and word fragments, differs from speaker to speaker, and thus, the generation of personalized FPs is required. However, it is difficult to predict them because of the sparsity of position and the frequency difference between more and less frequently used FPs. Moreover, it is sometimes difficult to adapt FP prediction models to each speaker because of the large variation of the tendency within each speaker. To address these issues, we propose a method to build group-dependent prediction models by grouping speakers on the basis of their tendency to use FPs. This method does not require a large amount of data and time to train each speaker model. We further introduce a loss function and a word embedding model suitable for FP prediction. Our experimental results demonstrate that group-dependent models can predict FPs with higher scores than a non-personalized one and the introduced loss function and word embedding model improve the prediction performance.

Submitted 22 April, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted to LREC 2022

arXiv:2203.07622 [pdf, other]

The International Linear Collider: Report to Snowmass 2021

Authors: Alexander Aryshev, Ties Behnke, Mikael Berggren, James Brau, Nathaniel Craig, Ayres Freitas, Frank Gaede, Spencer Gessner, Stefania Gori, Christophe Grojean, Sven Heinemeyer, Daniel Jeans, Katja Kruger, Benno List, Jenny List, Zhen Liu, Shinichiro Michizono, David W. Miller, Ian Moult, Hitoshi Murayama, Tatsuya Nakada, Emilio Nanni, Mihoko Nojiri, Hasan Padamsee, Maxim Perelstein, et al. (487 additional authors not shown)

Abstract: The International Linear Collider (ILC) is on the table now as a new global energy-frontier accelerator laboratory taking data in the 2030s. The ILC addresses key questions for our current understanding of particle physics. It is based on a proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community.

Submitted 16 January, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 356 pages, Large pdf file (40 MB) submitted to Snowmass 2021; v2 references to Snowmass contributions added, additional authors; v3 references added, some updates, additional authors

Report number: DESY-22-045, IFT--UAM/CSIC--22-028, KEK Preprint 2021-61, PNNL-SA-160884, SLAC-PUB-17662

arXiv:2203.07371 [pdf]

doi 10.1088/1748-0221/18/04/T04005

Medium-Grain Niobium SRF Cavity Production Technology for Science Frontiers and Accelerator Applications

Authors: G. Myneni, Hani E. Elsayed-Ali, Md Obidul Islam, Md Nizam Sayeed, G. Ciovati, P. Dhakal, R. A. Rimmer, M. Carl, A. Fajardo, N. Lannoy, B. Khanal, T. Dohmae, A. Kumar, T. Saeki, K. Umemori, M. Yamanaka, S. Michizono, A. Yamamoto

Abstract: We propose cost-effective production of medium grain (MG) niobium (Nb) discs directly sliced from forged and annealed billet. This production method provides clean surface conditions and reliable mechanical characteristics with sub-millimeter average grain size resulting in stable SRF cavity production. We propose to apply this material to particle accelerator applications in the science and industrial frontiers. The science applications require high field gradients (>~40 MV/m) particularly in pulse mode. The industrial applications require high Q0 values with moderate gradients (~30 MV/m) in CW mode operation. This report describes the MG Nb disc production recently demonstrated and discusses future prospects for application in advanced particle accelerators in the science and industrial frontiers.

Submitted 11 March, 2022; originally announced March 2022.

Comments: Contribution to Snowmass 2021

arXiv:2112.09323 [pdf, other]

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

Authors: Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, Shinji Watanabe

Abstract: In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although recent end-to-end learning requires large-size speech corpora, open-sourced such corpora for languages other than English have not yet been established. In this paper, we describe the construction of a corpus from YouTube videos and subtitles for speech recognition and speaker verification. Our method can automatically filter the videos and subtitles with almost no language-dependent processes. We consistently employ Connectionist Temporal Classification (CTC)-based techniques for automatic speech recognition (ASR) and a speaker variation-based method for automatic speaker verification (ASV). We build 1) a large-scale Japanese ASR benchmark with more than 1,300 hours of data and 2) 900 hours of data for Japanese ASV.

Submitted 17 December, 2021; originally announced December 2021.

Comments: Submitted to ICASSP2022

arXiv:2110.07840 [pdf, other]

ESPnet2-TTS: Extending the Edge of TTS Research

Authors: Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

Abstract: This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling, which simplify the training pipeline and further enhance TTS performance. The unified design of our recipes enables users to quickly reproduce state-of-the-art E2E-TTS results. We also provide many pre-trained models in a unified Python interface for inference, offering a quick means for users to generate baseline samples and build demos. Experimental evaluations with English and Japanese corpora demonstrate that our provided models synthesize utterances comparable to ground-truth ones, achieving state-of-the-art TTS performance. The toolkit is available online at https://github.com/espnet/espnet.

Submitted 14 October, 2021; originally announced October 2021.

Comments: Submitted to ICASSP2022. Demo HP: https://espnet.github.io/icassp2022-tts/

arXiv:2109.10724 [pdf, other]

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network

Authors: Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: Incremental text-to-speech (TTS) synthesis generates utterances in small linguistic units for the sake of real-time and low-latency applications. We previously proposed an incremental TTS method that leverages a large pre-trained language model to take unobserved future context into account without waiting for the subsequent segment. Although this method achieves comparable speech quality to that of a method that waits for the future context, it entails a huge amount of processing for sampling from the language model at each time step. In this paper, we propose an incremental TTS method that directly predicts the unobserved future context with a lightweight model, instead of sampling words from the large-scale language model. We perform knowledge distillation from a GPT2-based context prediction network into a simple recurrent model by minimizing a teacher-student loss defined between the context embedding vectors of those models. Experimental results show that the proposed method requires about ten times less inference time to achieve comparable synthetic speech quality to that of our previous method, and it can perform incremental synthesis much faster than the average speaking speed of human English speakers, demonstrating the availability of our method to real-time applications.

Submitted 22 September, 2021; originally announced September 2021.

Comments: Accepted for ASRU2021

arXiv:2105.06025 [pdf]

Machine-learning-based investigation on classifying binary and multiclass behavior outcomes of children with PIMD/SMID

Authors: Von Ralph Dane Marquez Herbuela, Tomonori Karita, Yoshiya Furukawa, Yoshinori Wada, Yoshihiro Yagi, Shuichiro Senba, Eiko Onishi, Tatsuo Saeki

Abstract: Recently, the importance of weather parameters and location information to better understand the context of the communication of children with profound intellectual and multiple disabilities (PIMD) or severe motor and intellectual disorders (SMID) has been proposed. However, an investigation on whether these data can be used to classify their behavior for system optimization aimed for predicting their behavior for independent communication and mobility has not been done. Thus, this study investigates whether recalibrating the datasets including either minor or major behavior categories or both, combining location and weather data and feature selection method training (Boruta) would allow more accurate classification of behavior discriminated to binary and multiclass classification outcomes using eXtreme Gradient Boosting (XGB), support vector machine (SVM), random forest (RF), and neural network (NN) classifiers. Multiple single-subject face-to-face and video-recorded sessions were conducted among 20 purposively sampled 8 to 10 -year old children diagnosed with PIMD/SMID or severe or profound intellectual disabilities and their caregivers.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2012.12612 [pdf, ps, other]

doi 10.1109/LSP.2021.3073869

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model

Authors: Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: This letter presents an incremental text-to-speech (TTS) method that performs synthesis in small linguistic units while maintaining the naturalness of output speech. Incremental TTS is generally subject to a trade-off between latency and synthetic speech quality. It is challenging to produce high-quality speech with a low-latency setup that does not make much use of an unobserved future sentence (hereafter, "lookahead"). To resolve this issue, we propose an incremental TTS method that uses a pseudo lookahead generated with a language model to take the future contextual information into account without increasing latency. Our method can be regarded as imitating a human's incremental reading and uses pretrained GPT2, which accounts for the large-scale linguistic knowledge, for the lookahead generation. Evaluation results show that our method 1) achieves higher speech quality than the method taking only observed information into account and 2) achieves a speech quality equivalent to waiting for the future context observation.

Submitted 14 April, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: Accepted for IEEE Signal Processing Letters

arXiv:2009.00260 [pdf]

Children with PIMD/SMID expressive behaviors: Development and testing of ChildSIDE app, the first step for independent communication and mobility

Authors: Von Ralph Dane Marquez Herbuela, Tomonori Karita, Yoshiya Furukawa, Yoshinori Wada, Shuichiro Senba, Eiko Onishi, Tatsuo Saeki

Abstract: Children with profound intellectual and multiple disabilities or severe motor and intellectual disabilities only communicate through movements, vocalizations, body postures, muscle tensions, or facial expressions on a pre- or protosymbolic level. Yet, to the best of our knowledge, hardly any system has been developed to interpret their expressive behaviors. This paper describes the design, development, and testing of ChildSIDE in collecting behaviors of children and transmitting location and environmental data to the database. The movements associated with each behavior were also identified for future system development. ChildSIDE app was pilot tested among purposively recruited child-caregiver dyads. ChildSIDE was more likely to collect correct behavior data than paper-based method and it had >93% in detecting and transmitting location and environment data except for iBeacon data. Behaviors were manifested mainly through hand and body movements and vocalizations.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2002.06778 [pdf, other]

Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

Authors: Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract: In this paper, we propose computationally efficient and high-quality methods for statistical voice conversion (VC) with direct waveform modification based on spectral differentials. The conventional method with a minimum-phase filter achieves high-quality conversion but requires heavy computation in filtering. This is because the minimum phase using a fixed lifter of the Hilbert transform often results in a long-tap filter. One of our methods is a data-driven method for lifter training. Since this method takes filter truncation into account in training, it can shorten the tap length of the filter while preserving conversion accuracy. Our other method is sub-band processing for extending the conventional method from narrow-band (16 kHz) to full-band (48 kHz) VC, which can convert a full-band waveform with higher converted-speech quality. Experimental results indicate that 1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading the converted-speech quality and 2) the proposed sub-band-processing method for full-band VC can improve the converted-speech quality than the conventional method.

Submitted 17 February, 2020; originally announced February 2020.

Comments: 5 pages, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing 2020 (ICASSP 2020)

arXiv:1907.03514 [pdf, ps, other]

Evaluation of the superconducting characteristics of multi-layer thin-film structures of NbN and SiO$_2$ on pure Nb substrate

Authors: R. Katayama, H. Hayano, T. Kubo, T. Saeki, Hayato Ito, Y. Iwashita, H. Tongu, C. Z. Antoine, R. Ito, T. Nagata

Abstract: In recent years, it has been pointed out that the maximum accelerating gradient of a superconducting RF cavity can be increased by coating the inner surface of the cavity with a multilayer thin-film structure consisting of alternating insulating and superconducting layers. In this structure, the principal parameter that limits the performance of the cavity is the critical magnetic field or effective $H_{C1}$ at which vortices begin penetrating into the superconductor layer. This is predicted to depend on the combination of the film thickness. We made samples that have a NbN/SiO$_2$ thin-film structure on a pure Nb substrate with several layers of NbN film deposited using DC magnetron sputtering method. Here, we report the measurement results of effective $H_{C1}$ of NbN/SiO$_2$(30 nm)/Nb multilayer samples with thicknesses of NbN layers in the range from 50 nm to 800 nm by using the third-harmonic voltage method. Experimental results show that an optimum thickness exists, which increases the effective $H_{C1}$ by 23.8 %.

Submitted 8 July, 2019; originally announced July 2019.

Comments: Contribution to the 19th International Conference on RF Superconductivity (SRF2019)

arXiv:1907.03410 [pdf]

Lower critical field measurement of NbN multilayer thin film superconductor at KEK

Authors: H. Ito, H. Hayano, T. Kubo, T. Saeki, R. Katayama, Y. Iwashita, H. Tongu, R. Ito, T. Nagata, C. Z. Antoine

Abstract: The multilayer thin film structure of the superconductor has been proposed by A. Gurevich to enhance the maximum gradient of SRF cavities. The lower critical field Hc1 at which the vortex starts penetrating the superconducting material will be improved by coating Nb with thin film superconductor such as NbN. It is expected that the enhancement of Hc1 depends on the thickness of each layer. In order to determine the optimum thickness of each layer and to compare the measurement results with the theoretical prediction proposed by T. Kubo, we developed the Hc1 measurement system using the third harmonic response of the applied AC magnetic field at KEK. For the Hc1 measurement without the influence of the edge or the shape effects, the AC magnetic field can be applied locally by the solenoid coil of 5mm diameter in our measurement system. ULVAC made the NbN-SiO2 multilayer thin film samples of various NbN thicknesses. In this report, the measurement result of the bulk Nb sample and NbN-SiO2 multilayer thin film samples of different thickness of NbN layer will be discussed.

Submitted 8 July, 2019; originally announced July 2019.

Comments: 5 pages, 7 figures, contribution to the 19th International Conference on RF Superconductivity (SRF2019)

arXiv:1906.08468 [pdf]

doi 10.1016/j.nima.2019.163284

Lower Critical Field Measurement System based on Third-Harmonic Method for Superconducting RF Materials

Authors: Hayato Ito, Hitoshi Hayano, Takayuki Kubo, Takayuki Saeki

Abstract: We develop a lower critical field (Hc1) measurement system using the third-harmonic response of an applied AC magnetic field from a solenoid coil positioned above a superconducting sample. Parameter Hc1 is measured via detection of the third-harmonic component, which drastically changes when a vortex begins to penetrate the superconductor with temperature increase. The magnetic field locally applied to one side of the sample mimics the magnetic field within superconducting radio-frequency (SRF) cavities and prevents edge effects of the superconducting sample. With this approach, our measurement system can potentially characterize surface-engineered SRF materials such as Superconductor-Insulator-Superconductor multilayer structure (S-I-S structure). As a validation test, we measure the temperature dependence of Hc1 of two high-RRR bulk Nb samples and obtain results consistent with the literature. We also confirm that our system can apply magnetic fields of at least 120 mT at 4-5 K without any problem of heat generation of the coil. This field value is higher than those reported in previous works and makes it possible to more accurately estimate Hc1 at lower temperatures.

Submitted 20 June, 2019; originally announced June 2019.

Comments: 17 pages, 11 figures

arXiv:1804.04824 [pdf, other]

An Improvement of Non-binary Code Correcting Single b-Burst of Insertions or Deletions

Authors: Toyohiko Saeki, Takayuki Nozaki

Abstract: This paper constructs a non-binary code correcting a single $b$-burst of insertions or deletions with a large cardinality. This paper also proposes a decoding algorithm of this code and evaluates a lower bound of the cardinality of this code. Moreover, we evaluate an asymptotic upper bound on the cardinality of codes which correct a single burst of insertions or deletions.

Submitted 9 August, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

Comments: 7 pages, accepted to ISITA 2018

arXiv:1407.0771 [pdf, ps, other]

Review of the multilayer coating model

Authors: Takayuki Kubo, Yoshihisa Iwashita, Takayuki Saeki

Abstract: The recent theoretical study on the multilayer-coating model published in Applied Physics Letters [1] is reviewed. Magnetic-field attenuation behavior in a multilayer coating model is different from a semi-infinite superconductor and a superconducting thin film. This difference causes that of the vortex-penetration field at which the Bean-Livingston surface barrier disappears. A material with smaller penetration depth, such as a pure Nb, is preferable as the substrate for pushing up the vortex-penetration field of the superconductor layer. The field limit of the whole structure of the multilayer coating model is limited not only by the vortex-penetration field of the superconductor layer, but also by that of the substrate. Appropriate thicknesses of superconductor and insulator layers can be extracted from contour plots of the field limit of the multilayer coating model given in Ref.[1].

Submitted 2 July, 2014; originally announced July 2014.

Comments: 3 pages, 3 figures, the 5th International Particle Accelerator Conference (IPAC14), Dresden, Germany, 15-20 June, 2014

Journal ref: Proceedings of IPAC14, Dresden, Germany (2014), p. 2522, WEPRI023

arXiv:1307.0583 [pdf, ps, other]

Vortex penetration field of the multilayer coating model

Authors: Takayuki Kubo, Yoshihisa Iwashita, Takayuki Saeki

Abstract: The vortex penetration field of the multilayer coating model with a single superconductor layer and a single insulator layer formed on a bulk superconductor are derived. The same formula can be applied to a model with a superconductor layer formed on a bulk superconductor without an insulator layer.

Submitted 1 July, 2013; originally announced July 2013.

Comments: 3 pages, 2 figures, the 16th International Conference on RF Superconductivity (SRF 2013), Paris, France, 22-27 September, 2013

arXiv:1306.4823 [pdf, ps, other]

RF field-attenuation formulae for the multilayer coating model

Authors: Takayuki Kubo, Takayuki Saeki, Yoshihisa Iwashita

Abstract: Formulae that describe the RF electromagnetic field attenuation for the multilayer coating model with a single superconductor layer and a single insulator layer deposited on a bulk superconductor are derived from a rigorous calculation with the Maxwell equations and the London equation.

Submitted 30 July, 2013; v1 submitted 20 June, 2013; originally announced June 2013.

Comments: 3 pages, the 4th International Particle Accelerator Conference (IPAC13), Shanghai, China, 12-17 May, 2013

Journal ref: in proceedings of the 4th International Particle Accelerator Conference, IPAC13, Shanghai, China (2013), p. 2343

arXiv:1304.6876 [pdf, ps, other]

doi 10.1063/1.4862892

Radio-frequency electromagnetic field and vortex penetration in multilayered superconductors

Authors: Takayuki Kubo, Yoshihisa Iwashita, Takayuki Saeki

Abstract: A multilayered structure with a single superconductor layer and a single insulator layer formed on a bulk superconductor is studied. General formulae for the vortex-penetration field of the superconductor layer and the magnetic field on the bulk superconductor, which is shielded by the superconductor and insulator layers, are derived with a rigorous calculation of the magnetic field attenuation in the multilayered structure. The achievable peak surface field depends on the thickness and its material of the superconductor layer, the thickness of the insulator layer and material of the bulk superconductor. The calculation shows a good agreement with an experimental result. A combination of the thicknesses of superconductor and insulator layers to enhance the field limit can be given by the formulae for any given materials.

Submitted 9 January, 2014; v1 submitted 25 April, 2013; originally announced April 2013.

Comments: 4 pages, 4 figures; figure and table are added, discussions extended, references added, to appear in Applied Physics Letters

Journal ref: Appl. Phys. Lett. 104, 032603 (2014)

arXiv:astro-ph/0205344 [pdf, ps, other]

doi 10.1016/S0927-6505(02)00195-0

Precise Measurements of Atmospheric Muon Fluxes with the BESS Spectrometer

Authors: M. Motoki, T. Sanuki, S. Orito, K. Abe, K. Anraku, Y. Asaoka, M. Fujikawa, H. Fuke, S. Haino, M. Imori, K. Izumi, T. Maeno, Y. Makida, N. Matsui, H. Matsumoto, H. Matsunaga, J. Mitchell, T. Mitsui, A. Moiseev, J. Nishimura, M. Nozaki, J. Ormes, T. Saeki, M. Sasaki, E. S. Seo, et al. (14 additional authors not shown)

Abstract: The vertical absolute fluxes of atmospheric muons and muon charge ratio have been measured precisely at different geomagnetic locations by using the BESS spectrometer. The observations had been performed at sea level (30 m above sea level) in Tsukuba, Japan, and at 360 m above sea level in Lynn Lake, Canada. The vertical cutoff rigidities in Tsukuba (36.2 N, 140.1 E) and in Lynn Lake (56.5 N, 101.0 W) are 11.4 GV and 0.4 GV, respectively. We have obtained vertical fluxes of positive and negative muons in a momentum range from 0.6 to 20 GeV/c with systematic errors less than 3 % in both measurements. By comparing the data collected at two different geomagnetic latitudes, we have seen an effect of cutoff rigidity. The dependence on the atmospheric pressure and temperature, and the solar modulation effect have been also clearly observed. We also clearly observed the decrease of charge ratio of muons at low momentum side with at higher cutoff rigidity region.

Submitted 21 May, 2002; originally announced May 2002.

Comments: 35 pages, 9 figures. Submitted to Astroparticle Physics

Journal ref: Astropart.Phys. 19 (2003) 113-126

arXiv:astro-ph/0109007 [pdf, ps, other]

doi 10.1103/PhysRevLett.88.051101

Measurements of Cosmic-ray Low-energy Antiproton and Proton Spectra in a Transient Period of the Solar Field Reversal

Authors: Y. Asaoka, Y. Shikaze, K. Abe, K. Anraku, M. Fujikawa, H. Fuke, S. Haino, M. Imori, K. Izumi, T. Maeno, Y. Makida, S. Matsuda, N. Matsui, T. Matsukawa, H. Matsumoto, H. Matsunaga, J. Mitchell, T. Mitsui, A. Moiseev, M. Motoki, J. Nishimura, M. Nozaki, S. Orito, J. F. Ormes, T. Saeki, et al. (17 additional authors not shown)

Abstract: The energy spectra of cosmic-ray low-energy antiprotons and protons have been measured by BESS in 1999 and 2000, during a period covering the solar magnetic field reversal. Based on these measurements, a sudden increase of the antiproton to proton flux ratio following the solar magnetic field reversal was observed, and it generally agrees with a drift model of the solar modulation.

Submitted 25 January, 2002; v1 submitted 2 September, 2001; originally announced September 2001.

Comments: 4 pages, 4 figures, revised version accepted for publication in Phys. Rev. Lett

Journal ref: Phys.Rev.Lett.88:051101,2002

arXiv:astro-ph/0002481 [pdf, ps, other]

doi 10.1086/317873

Precise Measurement of Cosmic-Ray Proton and Helium Spectra with the BESS Spectrometer

Authors: T. Sanuki, M. Motoki, H. Matsumoto, E. S. Seo, J. Z. Wang, K. Abe, K. Anraku, Y. Asaoka, M. Fujikawa, M. Imori, T. Maeno, Y. Makida, N. Matsui, H. Matsunaga, J. Mitchell, T. Mitsui, A. Moiseev, J. Nishimura, M. Nozaki, S. Orito, J. Ormes, T. Saeki, M. Sasaki, Y. Shikaze, T. Sonoda, et al. (9 additional authors not shown)

Abstract: We report cosmic-ray proton and helium spectra in energy ranges of 1 to 120 GeV and 1 to 54 GeV/nucleon, respectively, measured by a balloon flight of the BESS spectrometer in 1998. The magnetic-rigidity of the cosmic-rays was reliably determined by highly precise measurement of the circular track in a uniform solenoidal magnetic field of 1 Tesla. Those spectra were determined within overall uncertainties of +-5 % for protons and +- 10 % for helium nuclei including statistical and systematic errors.

Submitted 25 February, 2000; originally announced February 2000.

Comments: 12 pages, 4 figures

Journal ref: Astrophys.J. 545 (2000) 1135

arXiv:hep-ex/9906036 [pdf, ps, other]

W mass measurement at LEP

Authors: Takayuki Saeki

Abstract: In 1998, the four experiments of LEP, i.e. ALEPH, DELPHI, L3 and OPAL, collected data of about 175 /pb per experiment at the center-of-mass energy of 189 GeV. Using these data, the mass of W boson was directly measured by reconstructing the decay products of two W bosons from the e+e- collisions. The W mass measurement was combined personally with the results obtained from data at 161, 172, and 183 GeV. This yielded the private LEP2 average of Mw = 80.350 +/- 0.056 GeV.

Submitted 24 June, 1999; originally announced June 1999.

Comments: 6 pages, 2 figures; uses Moriond.sty; to appear in the proceedings of the XXXIVth Rencontres de Moriond QCD, March 1999

arXiv:astro-ph/9906426 [pdf, ps, other]

doi 10.1103/PhysRevLett.84.1078

Precision Measurement of Cosmic-Ray Antiproton Spectrum

Authors: S. Orito, T. Maeno, H. Matsunaga, K. Abe, K. Anraku, Y. Asaoka, M. Fujikawa, M. Imori, M. Ishino, Y. Makida, N. Matsui, H. Matsumoto, J. Mitchell, T. Mitsui, A. Moiseev, M. Motoki, J. Nishimura, M. Nozaki, J. Ormes, T. Saeki, T. Sanuki, M. Sasaki, E. S. Seo, Y. Shikaze, T. Sonoda, et al. (9 additional authors not shown)

Abstract: The energy spectrum of cosmic-ray antiprotons has been measured in the range 0.18 to 3.56 GeV, based on 458 antiprotons collected by BESS in recent solar-minimum period. We have detected for the first time a distinctive peak at 2 GeV of antiprotons originating from cosmic-ray interactions with the interstellar gas. The peak spectrum is reproduced by theoretical calculations, implying that the propagation models are basically correct and that different cosmic-ray species undergo a universal propagation. Future BESS flights toward the solar maximum will help us to study the solar modulation and the propagation in detail and to search for primary antiproton components.

Submitted 26 June, 1999; originally announced June 1999.

Comments: REVTeX, 4 pages including 4 eps figures

Journal ref: Phys.Rev.Lett.84:1078-1081,2000

arXiv:astro-ph/9809326 [pdf, ps, other]

doi 10.1103/PhysRevLett.81.4052

Measurement of Low-Energy Cosmic-Ray Antiprotons at Solar Minimum

Authors: H. Matsunaga, S. Orito, H. Matsumoto, K. Yoshimura, A. Moiseev, K. Anraku, R. Golden, M. Imori, Y. Makida, J. Mitchell, M. Motoki, J. Nishimura, M. Nozaki, J. Ormes, T. Saeki, T. Sanuki, R. Streitmatter, J. Suzuki, K. Tanaka, I. Ueda, N. Yajima, T. Yamagami, A. Yamamoto, T. Yoshida

Abstract: The absolute fluxes of the cosmic-ray antiprotons at solar minimum are measured in the energy range 0.18 to 1.4 GeV, based on 43 events unambiguously detected in BESS '95 data. The resultant energy spectrum appears to be flat below 1 GeV, compatible with a possible admixture of primary antiproton component with a soft energy spectrum, while the possibility of secondary antiprotons alone explaining the data cannot be excluded with the present accuracy. Further improvement of statistical accuracy and extension of the energy range are planned in future BESS flights.

Submitted 25 September, 1998; originally announced September 1998.

Comments: REVTeX, 4 pages including 4 eps figures. Submitted to PRL

Report number: RESCEU 20/98, UT-ICEPP 98-02

Journal ref: Phys.Rev.Lett.81:4052-4055,1998

arXiv:astro-ph/9710228 [pdf, ps, other]

doi 10.1016/S0370-2693(98)00131-2

A New Limit on the Flux of Cosmic Antihelium

Authors: T. Saeki, K. Anraku, S. Orito, J. Ormes, M. Imori, B. Kimbell, Y. Makida, H. Matsumoto, H. Matsunaga, J. Mitchell, M. Motoki, J. Nishimura, M. Nozaki, M. Otoba, T. Sanuki, R. Streitmatter, J. Suzuki, K. Tanaka, I. Ueda, N. Yajima, T. Yamagami, A. Yamamoto, T. Yoshida, K. Yoshimura

Abstract: A very sensitive search for cosmic-ray antihelium was performed using data obtained from three scientific flights of BESS magnetic rigidity spectrometer. We have not observed any antihelium; this places a model-independent upper limit (95 % C.L.) on the antihelium flux of 6*10**(-4) m**(-2)sr**(-1)s**(-1) at the top of the atmosphere in the rigidity region 1 to 16 GV, after correcting for the estimated interaction loss of antihelium in the air and in the instrument. The corresponding upper limit on the Hebar/He flux ratio is 3.1*10**(-6), 30 times more stringent than the limits obtained in similar rigidity regions with magnetic spectrometers previous to BESS.

Submitted 21 October, 1997; originally announced October 1997.

Comments: REVTeX, 4 pages (including 5 EPS figures). Submitted to PRL

Journal ref: Phys.Lett. B422 (1998) 319-324

Showing 1–41 of 41 results for author: Saeki, T