Showing 251–300 of 845 results for author: Watanabe, S

  1. arXiv:2203.10273  [pdf, ps, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Crystalline Electronic Field in Rare-Earth Based Quasicrystal and Approximant: Analysis of Quantum Critical Au-Al-Yb Quasicrystal and Approximant

    Authors: Shinji Watanabe, Mina Kawamoto

    Abstract: On the basis of the point charge model, we formulate the crystalline electronic field (CEF) Hamiltonian $H_{\rm CEF}$ in the rare-earth based quasicrystal (QC) and approximant crystal (AC) with ligand ions located at pseudo 5-fold configurations by using the operator equivalent method. By setting the total angular momentum $J=7/2$, the CEF in the quantum critical QC Au$_{51}$Al$_{34}$Yb$_{15}$ and…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: 5 pages, 6 figures

    Journal ref: J. Phys. Soc. Jpn. 90 (2021) 063701

  2. arXiv:2203.10242  [pdf]

    physics.optics cond-mat.mtrl-sci physics.ins-det

    Temporal-offset dual-comb vibrometer with picometer axial precision

    Authors: A. Iwasaki, D. Nishikawa, M. Okano, S. Tateno, K. Yamanoi, Y. Nozaki, S. Watanabe

    Abstract: We demonstrate a dual-comb vibrometer where the pulses of one frequency-comb are split into pulse pairs. We introduce a delay between the two pulses of each pulse pair in front of the sample, and after the corresponding two consecutive reflections at the vibrating sample surface, the initially introduced delay is cancelled by a modified Sagnac geometry. The remaining phase difference between the t…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: 22 pages, 5 figures

    Journal ref: APL Photonics 7(10), 106101 (2022)

  3. Measurements of second-harmonic Fourier coefficients from azimuthal anisotropies in $p$$+$$p$, $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

    Authors: N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, V. Andrieux, K. Aoki, N. Apadula, H. Asano, C. Ayuso, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont, A. Berdnikov, Y. Berdnikov , et al. (368 additional authors not shown)

    Abstract: Recently, the PHENIX Collaboration has published second- and third-harmonic Fourier coefficients $v_2$ and $v_3$ for midrapidity ($|η|<0.35$) charged hadrons in 0\%--5\% central $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV utilizing three sets of two-particle correlations for two detector combinations with different pseudorapidity acceptance [Phys. Rev. C {\bf 105},…

    Submitted 4 March, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: 393 authors from 72 institutions, 14 pages, 10 figures, 2014, 2015, and 2016 data. v2 is version accepted for publication in Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. C 107, 024907 (2023)

  4. arXiv:2203.07960  [pdf, other]

    eess.AS

    Investigating self-supervised learning for speech enhancement and separation

    Authors: Zili Huang, Shinji Watanabe, Shu-wen Yang, Paola Garcia, Sanjeev Khudanpur

    Abstract: Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise while speech separation extracts target speech from interfering speakers. Despite a great number of supervised learning-based enhancement and separation methods having been proposed and achieving good performance, studies on applying self-supervised learning (SSL…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: To appear in ICASSP 2022

  5. arXiv:2203.06884  [pdf, other]

    cs.LG math.ST

    Asymptotic Behavior of Bayesian Generalization Error in Multinomial Mixtures

    Authors: Takumi Watanabe, Sumio Watanabe

    Abstract: Multinomial mixtures are widely used in the information engineering field, however, their mathematical properties are not yet clarified because they are singular learning models. In fact, the models are non-identifiable and their Fisher information matrices are not positive definite. In recent years, the mathematical foundation of singular statistical models are clarified by using algebraic geomet…

    Submitted 14 March, 2022; originally announced March 2022.

  6. arXiv:2203.06849  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

  7. Study of $φ$-meson production in $p$$+$Al, $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

    Authors: U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, V. Andrieux, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau , et al. (346 additional authors not shown)

    Abstract: Small nuclear collisions are mainly sensitive to cold-nuclear-matter effects; however, the collective behavior observed in these collisions shows a hint of hot-nuclear-matter effects. The identified-particle spectra, especially the $φ$ mesons which contain strange and antistrange quarks and have a relatively small hadronic-interaction cross section, are a good tool to study these effects. The PHEN…

    Submitted 26 July, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: 371 authors from 72 institutions, 13 pages, 7 figures, 7 tables, 2014 and 2015 data. v2 is version accepted for publication Physical Review C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. C 106, 014908 (2022)

  8. arXiv:2203.05749  [pdf, ps, other]

    stat.ME stat.ML

    Classification from Positive and Biased Negative Data with Skewed Labeled Posterior Probability

    Authors: Shotaro Watanabe, Hidetoshi Matsui

    Abstract: The binary classification problem has a situation where only biased data are observed in one of the classes. In this paper, we propose a new method to approach the positive and biased negative (PbN) classification problem, which is a weakly supervised learning method to learn a binary classifier from positive data and negative data with biased observations. We incorporate a method to correct the n…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: 14 pages, 1 figure

  9. arXiv:2203.04575  [pdf, other]

    math.PR cs.IT math.ST

    Geometric Aspects of Data-Processing of Markov Chains

    Authors: Geoffrey Wolfer, Shun Watanabe

    Abstract: We examine data-processing of Markov chains through the lens of information geometry. We first establish a theory of congruent Markov morphisms within the framework of stochastic matrices. Specifically, we introduce and justify the concept of a linear right inverse (congruent embedding) for lumping, a well-known operation used in Markov chains to extract coarse information. Furthermore, we inspect…

    Submitted 20 December, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

    MSC Class: 60J10

  10. arXiv:2203.03022  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  11. arXiv:2203.00232  [pdf, other]

    cs.SD cs.CL eess.AS

    Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

    Authors: Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

    Abstract: Graph-based temporal classification (GTC), a generalized form of the connectionist temporal classification loss, was recently proposed to improve automatic speech recognition (ASR) systems using graph-based supervision. For example, GTC was first used to encode an N-best list of pseudo-label sequences into a graph for semi-supervised learning. In this paper, we propose an extension of GTC to model…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: To appear in ICASSP2022

  12. arXiv:2202.12298  [pdf, other]

    eess.AS cs.SD

    Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge

    Authors: Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe

    Abstract: This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones. The core of our approach combines Deep Neural Network (DNN) driven complex spectral mapping with linear beamformers such as the multi-frame multi-channel Wiener filter. Our proposed system has two DNNs and a linear beamformer in between. Both DNNs are trained to…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: to be published in IEEE ICASSP 2022

  13. Acoustic Event Detection with Classifier Chains

    Authors: Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi

    Abstract: This paper proposes acoustic event detection (AED) with classifier chains, a new classifier based on the probabilistic chain rule. The proposed AED with classifier chains consists of a gated recurrent unit and performs iterative binary detection of each event one by one. In each iteration, the event's activity is estimated and used to condition the next output based on the probabilistic chain rule…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 5pages, presented at Interspeech2021

  14. Measurement of Direct-Photon Cross Section and Double-Helicity Asymmetry at $\sqrt{s}=510$ GeV in $\vec{p}+\vec{p}$ Collisions

    Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, M. Alfred, N. Apadula, Y. Aramaki, H. Asano, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont , et al. (336 additional authors not shown)

    Abstract: We present measurements of the cross section and double-helicity asymmetry $A_{LL}$ of direct-photon production in $\vec{p}+\vec{p}$ collisions at $\sqrt{s}=510$ GeV. The measurements have been performed at midrapidity ($|η|<0.25$) with the PHENIX detector at the Relativistic Heavy Ion Collider. At relativistic energies, direct photons are dominantly produced from the initial quark-gluon hard scat…

    Submitted 6 May, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: 358 authors from 72 institutions, 8 pages, 2 figures, 1 table, 2013 data. v2 is version accepted by Physical Review Letters. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

  15. arXiv:2202.06497  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Universal and Efficient p-Doping of Organic Semiconductors by Electrophilic Attack of Cations

    Authors: Jing Guo, Ying Liu, Ping-An Chen, Xinhao Wang, Yanpei Wang, Jing Guo, Xincan Qiu, Zebing Zeng, Lang Jiang, Yuanping Yi, Shun Watanabe, Lei Liao, Yugang Bai, Thuc-Quyen Nguyen, Yuanyuan Hu

    Abstract: Doping is of great importance to tailor the electrical properties of semiconductors. However, the present doping methodologies for organic semiconductors (OSCs) are either inefficient or can only apply to a small number of OSCs, seriously limiting their general application. Herein, we reveal a novel p-doping mechanism by investigating the interactions between the dopant trityl cation and poly(3-he…

    Submitted 14 February, 2022; originally announced February 2022.

  16. arXiv:2202.05256  [pdf, other]

    eess.AS cs.LG cs.SD

    Conditional Diffusion Probabilistic Model for Speech Enhancement

    Authors: Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

    Abstract: Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorit…

    Submitted 10 February, 2022; originally announced February 2022.

  17. Measurement of $ψ(2S)$ nuclear modification at backward and forward rapidity in $p$$+$$p$, $p$$+$Al, and $p$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

    Authors: U. A. Acharya, C. Aidala, Y. Akiba, M. Alfred, V. Andrieux, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, N. S. Bandara, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau, J. S. Bok, V. Borisov, M. L. Brooks, J. Bryslawskyj, V. Bumazhnov , et al. (291 additional authors not shown)

    Abstract: Suppression of the $J/ψ$ nuclear-modification factor has been seen as a trademark signature of final-state effects in large collision systems for decades. In small systems, the nuclear modification was attributed to cold-nuclear-matter effects until the observation of strong differential suppression of the $ψ(2S)$ state in $p/d$$+$$A$ collisions suggested the presence of final-state effects. Resul…

    Submitted 30 June, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: 315 authors from 69 institutions, 16 pages, 9 figures, 4 tables, 2015 data. v2 is version accepted for publication in Physical Review C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. C 105, 064912 (2022)

  18. arXiv:2202.01405  [pdf, other]

    eess.AS cs.CL cs.SD

    Joint Speech Recognition and Audio Captioning

    Authors: Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe

    Abstract: Speech samples recorded in both indoor and outdoor environments are often contaminated with secondary audio sources. Most end-to-end monaural speech recognition systems either remove these background sounds using speech enhancement or train noise-robust models. For better model interpretability and holistic understanding, we aim to bring together the growing field of automated audio captioning (AA…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures. Accepted for ICASSP 2022

  19. arXiv:2201.13005  [pdf, ps, other]

    cs.IT

    On Sub-optimality of Random Binning for Distributed Hypothesis Testing

    Authors: Shun Watanabe

    Abstract: We investigate the quantize and binning scheme, known as the Shimokawa-Han-Amari (SHA) scheme, for the distributed hypothesis testing. We develop tools to evaluate the critical rate attainable by the SHA scheme. For a product of binary symmetric double sources, we present a sequential scheme that improves upon the SHA scheme.

    Submitted 2 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: 6 pages; v2 added a reference

  20. arXiv:2201.10190  [pdf, ps, other]

    eess.AS cs.SD

    Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR

    Authors: Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe

    Abstract: A streaming style inference of encoder-decoder automatic speech recognition (ASR) system is important for reducing latency, which is essential for interactive use cases. To this end, we propose a novel blockwise synchronous decoding algorithm with a hybrid approach that combines endpoint prediction and endpoint post-determination. In the endpoint prediction, we compute the expectation of the numbe…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted for ICASSP2022

  21. arXiv:2201.10103  [pdf, other]

    eess.AS cs.SD

    Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

    Authors: Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang

    Abstract: While Transformers have achieved promising results in end-to-end (E2E) automatic speech recognition (ASR), their autoregressive (AR) structure becomes a bottleneck for speeding up the decoding process. For real-world deployment, ASR systems are desired to be highly accurate while achieving fast inference. Non-autoregressive (NAR) models have become a popular alternative due to their fast inference…

    Submitted 26 January, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted by ICASSP2022

  22. arXiv:2201.05420  [pdf, other]

    eess.AS cs.SD

    A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

    Authors: Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe

    Abstract: In this study, we present recent developments of models trained with the RNN-T loss in ESPnet. It involves the use of various architectures such as recently proposed Conformer, multi-task learning with different auxiliary criteria and multiple decoding strategies, including our own proposition. Through experiments and benchmarks, we show that our proposed systems can be competitive against other s…

    Submitted 14 January, 2022; originally announced January 2022.

  23. arXiv:2112.09872  [pdf]

    physics.optics

    Hyperparameter tuning of optical neural network classifiers for high-order gaussian beams

    Authors: Shunsuke Watanabe, Tomoyoshi Shimobaba, Takashi Kakue, Tomoyoshi Ito

    Abstract: High-order Gaussian beams with multiple propagation modes have been studied for free-space optical communications. Fast classification of beams using a diffractive deep neural network, D2NN, has been proposed. D2NN optimization is important because it has numerous hyperparameters, such as interlayer distances and mode combinations. In this study, we classify Hermite-Gaussian beams, which are high-…

    Submitted 18 December, 2021; originally announced December 2021.

  24. arXiv:2112.09382  [pdf, other]

    cs.SD cs.LG eess.AS

    Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem

    Authors: Jing Shi, Xuankai Chang, Tomoki Hayashi, Yen-Ju Lu, Shinji Watanabe, Bo Xu

    Abstract: Deep learning based models have significantly improved the performance of speech separation with input mixtures like the cocktail party. Prominent methods (e.g., frequency-domain and time-domain speech separation) usually build regression models to predict the ground-truth speech from the mixture, using the masking-based design and the signal-level loss criterion (e.g., MSE or SI-SNR). This study…

    Submitted 9 January, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: 5 pages, https://shincling.github.io/discreteSeparation/

  25. arXiv:2112.09323  [pdf, other]

    cs.SD eess.AS

    JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

    Authors: Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, Shinji Watanabe

    Abstract: In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although recent end-to-end learning requires large-size speech corpora, open-sourced such corpora for languages other than English have not yet been established. In this paper, we describe the construction of a corpus from YouTube videos and subtitles for speech recognition and speaker verification. Our method can autom…

    Submitted 17 December, 2021; originally announced December 2021.

    Comments: Submitted to ICASSP2022

  26. Transverse-single-spin asymmetries of charged pions at midrapidity in transversely polarized $p{+}p$ collisions at $\sqrt{s}=200$ GeV

    Authors: U. A. Acharya, C. Aidala, Y. Akiba, M. Alfred, V. Andrieux, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, N. S. Bandara, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau, J. S. Bok, V. Borisov, M. L. Brooks, J. Bryslawskyj, V. Bumazhnov , et al. (286 additional authors not shown)

    Abstract: In 2015, the PHENIX collaboration has measured single-spin asymmetries for charged pions in transversely polarized proton-proton collisions at the center of mass energy of $\sqrt{s}=200$ GeV. The pions were detected at central rapidities of $|η|<0.35$. The single-spin asymmetries are consistent with zero for each charge individually, as well as consistent with the previously published neutral-pion…

    Submitted 9 February, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: 311 authors from 68 institutions, 8 pages, 3 figures, 1 table. 2015 data. v2 is version accepted for publication in Physical Review D. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. D 105, 032003 (2022)

  27. arXiv:2112.02492  [pdf, other]

    cond-mat.soft

    History-dependent deformation of a rotated granular pile governed by granular friction

    Authors: T. Irie, R. Yamaguchi, S. Watanabe, H. Katsuragi

    Abstract: We experimentally examined the history dependence of the rotation-induced granular deformation. As an initial state, we prepared a quasi-two-dimensional granular pile whose apex is at the rotational axis and its initial inclination is at the angle of repose. The rotation rate was increased from $0$ to $620$~(rpm) and then decreased back to $0$. During the rotation, deformation of the rotated granu…

    Submitted 2 June, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: 11 pages, 10 figures

  28. arXiv:2112.02489  [pdf, other]

    cond-mat.soft astro-ph.EP

    Deformation of a rotated granular pile governed by body-force-dependent friction

    Authors: T. Irie, R. Yamaguchi, S. Watanabe, H. Katsuragi

    Abstract: Although the gravity dependence of granular friction is crucial to understand various natural phenomena, its precise characterization is difficult. We propose a method to characterize granular friction under various gravity (body force) conditions controlled by centrifugal force; specifically, the deformation of a rotated granular pile was measured. To understand the mechanics governing the observ…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: 10 pages, 5 figures

  29. arXiv:2111.15016  [pdf, other]

    cs.CL cs.SD eess.AS

    Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

    Authors: Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

    Abstract: Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint m…

    Submitted 29 November, 2021; originally announced November 2021.

  30. arXiv:2111.14706  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

    Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

    Abstract: As Automatic Speech Processing (ASR) systems are getting better, there is an increasing interest of using the ASR output to do downstream Natural Language Processing (NLP) tasks. However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open source standard that can b…

    Submitted 3 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022 (5 pages)

  31. arXiv:2111.08201  [pdf, other]

    eess.AS cs.CL

    Attention-based Multi-hypothesis Fusion for Speech Summarization

    Authors: Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe

    Abstract: Speech summarization, which generates a text summary from speech, can be achieved by combining automatic speech recognition (ASR) and text summarization (TS). With this cascade approach, we can exploit state-of-the-art models and large training datasets for both subtasks, i.e., Transformer for ASR and Bidirectional Encoder Representations from Transformers (BERT) for TS. However, ASR errors direct…

    Submitted 15 November, 2021; originally announced November 2021.

  32. Systematic study of nuclear effects in $p$$+$Al, $p$$+$Au, $d$$+$Au, and $^{3}$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV using $π^0$ production

    Authors: U. A. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, V. Andrieux, A. Angerami, K. Aoki, N. Apadula, Y. Aramaki, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, N. S. Bandara, B. Bannier, K. N. Barish , et al. (529 additional authors not shown)

    Abstract: The PHENIX collaboration presents a systematic study of $π^0$ production from $p$$+$$p$, $p$$+$Al, $p$$+$Au, $d$$+$Au, and $^{3}$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. Measurements were performed with different centrality selections as well as the total inelastic, 0%--100%, selection for all collision systems. For 0%--100% collisions, the nuclear modification factors, $R_{xA}$, are cons…

    Submitted 6 June, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

    Comments: 554 authors from 81 institutions, 21 pages, 13 figures, and 3 tables. Data from 2008, 2014, and 2015. v2 is version accepted for publication in Physical Review C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. C 105, 064902 (2022)

  33. arXiv:2111.01326  [pdf, other]

    eess.AS cs.CL cs.SD

    Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

    Authors: Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W Black

    Abstract: Speech processing systems currently do not support the vast majority of languages, in part due to the lack of data in low-resource languages. Cross-lingual transfer offers a compelling way to help bridge this digital divide by incorporating high-resource data into low-resource systems. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-re…

    Submitted 1 November, 2021; originally announced November 2021.

  34. arXiv:2111.01272  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Sequence Transduction with Graph-based Supervision

    Authors: Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

    Abstract: The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if…

    Submitted 31 March, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted for publication at IEEE ICASSP 2022

  35. arXiv:2110.15018  [pdf, other]

    eess.AS cs.SD

    TorchAudio: Building Blocks for Audio and Speech Processing

    Authors: Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi

    Abstract: This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically dif…

    Submitted 16 February, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  36. The Boué--Dupuis formula and the exponential hypercontractivity in the Gaussian space

    Authors: Yuu Hariya, Sou Watanabe

    Abstract: This paper concerns a variational representation formula for Wiener functionals. Let $B=\{ B_{t}\} _{t\ge 0}$ be a standard $d$-dimensional Brownian motion. Boué and Dupuis (1998) showed that, for any bounded measurable functional $F(B)$ of $B$ up to time $1$, the expectation $\mathbb{E}\!\left[ e^{F(B)}\right] $ admits a variational representation in terms of drifted Brownian motions. In this pap…

    Submitted 3 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: 15 pages: newly added reference [9] by Chandra et al. (arXiv:2006.15933); also added is a corollary (Corollary 2.1) to Theorem 1.1, in which the case of bounded drifts is treated

    MSC Class: 60H30 (Primary) 60J65; 60E15 (Secondary)

  37. arXiv:2110.14139  [pdf, other]

    eess.AS cs.SD

    Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

    Authors: Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian

    Abstract: The deep learning based time-domain models, e.g. Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on the time-domain speech enhancement model are done in simulated conditions, and it is not well studied whether the good performance can generalize to real-world scenarios. In this paper, we aim to provide an insightful inv…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, accepted by IEEE WASPAA 2021

  38. arXiv:2110.07840  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet2-TTS: Extending the Edge of TTS Research

    Authors: Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling, which simplify the training pipeline and further enhance T…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP2022. Demo HP: https://espnet.github.io/icassp2022-tts/

  39. Transverse single spin asymmetries of forward neutrons in $p$$+$$p$, $p$$+$Al, and $p$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV as a function of transverse and longitudinal momenta

    Authors: U. A. Acharya, C. Aidala, Y. Akiba, M. Alfred, V. Andrieux, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, N. S. Bandara, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau, J. S. Bok, V. Borisov, M. L. Brooks, J. Bryslawskyj, V. Bumazhnov , et al. (286 additional authors not shown)

    Abstract: In 2015 the PHENIX collaboration at the Relativistic Heavy Ion Collider recorded $p$$+$$p$, $p$$+$Al, and $p$$+$Au collision data at center of mass energies of $\sqrt{s_{_{NN}}}=200$ GeV with the proton beam(s) transversely polarized. At very forward rapidities $η>6.8$ relative to the polarized proton beam, neutrons were detected either inclusively or in (anti)correlation with detector activity re…

    Submitted 9 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 311 authors from 68 institutions, 12 pages, 8 figures, 2015 data. v2 is version accepted for publication in Physical Review D. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. D 105, 032004 (2022)

  40. arXiv:2110.06280  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

    Authors: Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda

    Abstract: This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech representation (S3R) is valuable in its potential to replace the expensive supervised representation adopted by state-of-the-art VC systems. Moreover, we claim that VC is a good probing task for S3R analysis. In this work, we…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022. Code available at: https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/a2o-vc-vcc2020

  41. arXiv:2110.05571  [pdf, other]

    eess.AS cs.CL

    SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition

    Authors: Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, Shinji Watanabe

    Abstract: The Transformer architecture has been well adopted as a dominant architecture in most sequence transduction tasks including automatic speech recognition (ASR), since its attention mechanism excels in capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNN, a novel network architecture, SRU++, was recently proposed. By combining the fa…

    Submitted 11 October, 2021; originally announced October 2021.

  42. arXiv:2110.05249  [pdf, other]

    eess.AS cs.CL cs.SD

    A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

    Authors: Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

    Abstract: Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces the inference speed at the cost of accuracy drop compared to autoregressive baselines. Showing great potential for real-time applications, an increasing number of NAR models have been explored in different fields to mitigate the performance gap against AR models. In this work, we con…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to ASRU2021

  43. arXiv:2110.04694  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-Channel End-to-End Neural Diarization with Distributed Microphones

    Authors: Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

    Abstract: Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of t…

    Submitted 28 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  44. arXiv:2110.04590  [pdf, other]

    cs.CL cs.SD eess.AS

    An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

    Authors: Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe

    Abstract: Self-supervised pretraining on speech data has achieved a lot of progress. High-fidelity representation of the speech signal is learned from a lot of untranscribed data and shows promising performance. Recently, there are several works focusing on evaluating the quality of self-supervised pretrained representations on various tasks without domain restriction, e.g. SUPERB. However, such evaluations…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: To appear in ASRU2021

  45. arXiv:2110.00285  [pdf, ps, other]

    math.CO

    Independence and orthogonality of algebraic eigenvectors over the max-plus algebra

    Authors: Yuki Nishida, Sennosuke Watanabe, Yoshihide Watanabe

    Abstract: The max-plus algebra $\mathbb{R}\cup \{-\infty \}$ is a semiring with the two operations: addition $a \oplus b := \max(a,b)$ and multiplication $a \otimes b := a + b$. Roots of the characteristic polynomial of a max-plus matrix are called algebraic eigenvalues. Recently, algebraic eigenvectors with respect to algebraic eigenvalues were introduced as a generalized concept of eigenvectors. In this p…

    Submitted 3 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 29 pages, 1 figure

    MSC Class: 15A16; 15A80

  46. arXiv:2109.12804  [pdf, other]

    eess.AS cs.CL cs.SD

    Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

    Authors: Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe

    Abstract: The multi-decoder (MD) end-to-end speech translation model has demonstrated high translation quality by searching for better intermediate automatic speech recognition (ASR) decoder states as hidden intermediates (HI). It is a two-pass decoding model decomposing the overall task into ASR and machine translation sub-tasks. However, the decoding speed is not fast enough for real-world applications be…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Accepted at IEEE ASRU 2021

  47. Diffuse Supernova Neutrino Background Search at Super-Kamiokande

    Authors: Super-Kamiokande Collaboration: K. Abe, C. Bronner, Y. Hayato, K. Hiraide, M. Ikeda, S. Imaizumi, J. Kameda, Y. Kanemura, Y. Kataoka, S. Miki, M. Miura, S. Moriyama, Y. Nagao, M. Nakahata, S. Nakayama, T. Okada, K. Okamoto, A. Orii, G. Pronost, H. Sekiya, M. Shiozawa, Y. Sonoda, Y. Suzuki , et al. (197 additional authors not shown)

    Abstract: A new search for the diffuse supernova neutrino background (DSNB) flux has been conducted at Super-Kamiokande (SK), with a $22.5\times2970$-kton$\cdot$day exposure from its fourth operational phase IV. The new analysis improves on the existing background reduction techniques and systematic uncertainties and takes advantage of an improved neutron tagging algorithm to lower the energy threshold comp…

    Submitted 2 November, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

    Comments: 42 pages, 37 figures, 14 tables

  48. arXiv:2109.04411  [pdf, other]

    eess.AS cs.CL cs.SD

    Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

    Authors: Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

    Abstract: This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models. End-to-end speech translation models have several advantages over traditional cascade systems such as inference latency reduction. However, conventional AR decoding methods are not fast enough because each token is generated incrementally. NAR models, however, can accelera…

    Submitted 9 September, 2021; originally announced September 2021.

  49. arXiv:2109.03868  [pdf, ps, other]

    math-ph cond-mat.supr-con

    An operator-theoretical study on the BCS-Bogoliubov model of superconductivity near absolute zero temperature

    Authors: Shuji Watanabe

    Abstract: In the preceding papers the present author gave another proof of the existence and uniqueness of the solution to the BCS-Bogoliubov gap equation for superconductivity from the viewpoint of operator theory, and showed that the solution is partially differentiable with respect to the temperature twice. Thanks to these results, we can indeed partially differentiate the solution and the thermodynamic…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 9 pages

    MSC Class: 45G10; 47H10; 47N50; 82D55

    Journal ref: Scientific Reports 11 (2021), 15983

  50. arXiv:2109.00360  [pdf, other]

    physics.ins-det astro-ph.IM

    First Gadolinium Loading to Super-Kamiokande

    Authors: K. Abe, C. Bronner, Y. Hayato, K. Hiraide, M. Ikeda, S. Imaizumi, J. Kameda, Y. Kanemura, Y. Kataoka, S. Miki, M. Miura, S. Moriyama, Y. Nagao, M. Nakahata, S. Nakayama, T. Okada, K. Okamoto, A. Orii, G. Pronost, H. Sekiya, M. Shiozawa, Y. Sonoda, Y. Suzuki, A. Takeda, Y. Takemoto , et al. (192 additional authors not shown)

    Abstract: In order to improve Super-Kamiokande's neutron detection efficiency and to thereby increase its sensitivity to the diffuse supernova neutrino background flux, 13 tons of $\rm Gd_2(\rm SO_4)_3\cdot \rm 8H_2O$ (gadolinium sulfate octahydrate) was dissolved into the detector's otherwise ultrapure water from July 14 to August 17, 2020, marking the start of the SK-Gd phase of operations. During the loa…

    Submitted 15 December, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: 37 pages, 19 Figures, Accepted for publication in Nucl. Instrum. Meth. A

    Journal ref: Nuclear Inst. and Methods in Physics Research, A 1027 (2022) 166248