-
Input Decoupling of Lagrangian Systems via Coordinate Transformation: General Characterization and its Application to Soft Robotics
Authors:
Pietro Pustina,
Cosimo Della Santina,
Frédéric Boyer,
Alessandro De Luca,
Federico Renda
Abstract:
Suitable representations of dynamical systems can simplify their analysis and control. Along this line of thought, this paper aims to answer the following question: can a transformation of the generalized coordinates be found under which the actuators directly perform work on a subset of the configuration variables? Not only do we show that the answer is yes, but we also provide necessary and sufficient conditions. More specifically, we look for a representation of the configuration space such that the right-hand side of the dynamics in Euler-Lagrange form becomes $[\boldsymbol{I} \; \boldsymbol{O}]^{T}\boldsymbol{u}$, where $\boldsymbol{u}$ is the system input. We identify a class of systems, called collocated, for which this problem is solvable. Under mild conditions on the input matrix, a simple test is presented to verify whether a system is collocated. By exploiting power invariance, we show that a change of coordinates that decouples the input channels exists if and only if the dynamics is collocated. In addition, we use the collocated form to derive novel controllers for damped underactuated mechanical systems. To demonstrate the theoretical findings, we consider several Lagrangian systems, with a focus on continuum soft robots.
Submitted 23 February, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
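The abstract does not spell out the collocation test. To illustrate the power-invariance idea only, the sketch below writes the input term in the original coordinates as $\boldsymbol{A}^{T}(\boldsymbol{q})\boldsymbol{u}$. The symbol $\boldsymbol{A}$ is our assumption; the abstract only says "the input matrix". It considers new coordinates $\boldsymbol{\theta} = \boldsymbol{h}(\boldsymbol{q})$ with Jacobian $\boldsymbol{J}_h = \partial \boldsymbol{h}/\partial \boldsymbol{q}$. Generalized forces then transform as $\boldsymbol{\tau}_q = \boldsymbol{J}_h^{T}\boldsymbol{\tau}_\theta$. If the first rows of $\boldsymbol{J}_h$ equal $\boldsymbol{A}(\boldsymbol{q})$, the input appears as $[\boldsymbol{I} \; \boldsymbol{O}]^{T}\boldsymbol{u}$ in $\boldsymbol{\theta}$. That requires each row of $\boldsymbol{A}$ to be a gradient, which the hypothetical helper `rows_are_gradients` checks pointwise through symmetry of partial derivatives. This is a simplified sufficient-case illustration, not the authors' test.

```python
import numpy as np

def numerical_jacobian(f, q, eps=1e-6):
    """Central-difference Jacobian of f: R^n -> R^p evaluated at q."""
    q = np.asarray(q, dtype=float)
    f0 = np.asarray(f(q), dtype=float)
    jac = np.zeros((f0.size, q.size))
    for k in range(q.size):
        dq = np.zeros(q.size)
        dq[k] = eps
        jac[:, k] = (np.asarray(f(q + dq)) - np.asarray(f(q - dq))) / (2.0 * eps)
    return jac

def rows_are_gradients(A, q, tol=1e-5):
    """Pointwise integrability check for A(q) = dh_a/dq (hypothetical helper).

    Row i of A can be a gradient field only if its Jacobian is symmetric,
    i.e. d a_ij / d q_k == d a_ik / d q_j. If this holds everywhere on a
    simply connected domain, such an h_a exists (Poincare lemma).
    Here the condition is checked only at the single point q.
    """
    m = np.asarray(A(q)).shape[0]
    for i in range(m):
        def row(x, i=i):
            return np.asarray(A(x), dtype=float)[i]
        jac_row = numerical_jacobian(row, q)
        if not np.allclose(jac_row, jac_row.T, atol=tol):
            return False
    return True

def input_force_in_q(J_h, u):
    """Map the decoupled input [I O]^T u back to q-coordinates.

    Power invariance gives tau_q = J_h^T tau_theta with J_h = dh/dq.
    If the first m rows of J_h equal A(q), the result equals A(q)^T u.
    """
    n, m = J_h.shape[0], u.size
    tau_theta = np.concatenate([u, np.zeros(n - m)])
    return J_h.T @ tau_theta
```

For example, with $\boldsymbol{A}(\boldsymbol{q}) = [1 \;\; 2q_2]$, the map $\theta_1 = q_1 + q_2^2$ (completed by $\theta_2 = q_2$) makes the input act only on $\theta_1$. In contrast, $[1 \;\; q_1]$ fails the symmetry check. A failed check here does not by itself rule out collocation under the paper's more general conditions.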
-
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Authors:
Florian Boyer,
Yusuke Shinohara,
Takaaki Ishii,
Hirofumi Inaguma,
Shinji Watanabe
Abstract:
In this study, we present recent developments of models trained with the RNN-T loss in ESPnet. These involve various architectures, such as the recently proposed Conformer, multi-task learning with different auxiliary criteria, and multiple decoding strategies, including one of our own. Through experiments and benchmarks, we show that our proposed systems can be competitive with other state-of-the-art systems on well-known datasets such as LibriSpeech and AISHELL-1. Additionally, we demonstrate that these models are promising compared with the systems already implemented in ESPnet with regard to both performance and decoding speed, opening the way to powerful systems for streaming tasks. With these additions, we hope to expand the usefulness of the ESPnet toolkit for the research community and also give the ASR industry tools to deploy our systems in realistic and production environments.
Submitted 14 January, 2022;
originally announced January 2022.
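Here is some background on the RNN-T loss named in the abstract. The following is a minimal pure-Python sketch of the standard transducer forward recursion over the (frame, label) lattice. It is not ESPnet's implementation, which is batched and runs on GPU. The `log_probs[t][u][k]` layout, meaning the joint-network log-probability of symbol `k` at encoder frame `t` after emitting `u` labels, is an assumption made for illustration.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def rnnt_loss(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` summed over all transducer alignments.

    log_probs has shape T x (U + 1) x V (assumed layout, see text).
    alpha[t][u] is the log-probability of reaching frame t with u labels emitted.
    """
    T, U = len(log_probs), len(labels)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at (t - 1, u): advance one frame.
            from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # Arrive by emitting label u at (t, u - 1): stay on the same frame.
            from_label = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]] if u > 0 else -math.inf
            alpha[t][u] = log_add(from_blank, from_label)
    # Every alignment ends with a final blank at the last frame.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

As a quick check, take two frames, one label, and uniform probabilities 0.5 over {blank, label}. There are two alignments, each with probability $0.5^3$, so the loss is $\log 4$.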
-
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Authors:
Shinji Watanabe,
Florian Boyer,
Xuankai Chang,
Pengcheng Guo,
Tomoki Hayashi,
Yosuke Higuchi,
Takaaki Hori,
Wen-Chin Huang,
Hirofumi Inaguma,
Naoyuki Kamo,
Shigeki Karita,
Chenda Li,
**g Shi,
Aswin Shanmugam Subramanian,
Wangyou Zhang
Abstract:
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. This project was initiated in December 2017, mainly to handle end-to-end speech recognition experiments based on sequence-to-sequence modeling. The project has grown rapidly and now covers a wide range of speech processing applications. ESPnet now also includes text-to-speech (TTS), voice conversion (VC), speech translation (ST), and speech enhancement (SE) with support for beamforming, speech separation, denoising, and dereverberation. All applications are trained in an end-to-end manner, thanks to the generic properties of sequence-to-sequence modeling, and they can be further integrated and jointly optimized. ESPnet also provides reproducible all-in-one recipes for these applications, achieving state-of-the-art performance on various benchmarks by incorporating the Transformer, advanced data augmentation, and the Conformer. This project aims to provide an up-to-date speech processing experience to the community so that researchers in academia and in industry at various scales can develop their technologies collaboratively.
Submitted 23 December, 2020;
originally announced December 2020.
-
Recent Developments on ESPnet Toolkit Boosted by Conformer
Authors:
Pengcheng Guo,
Florian Boyer,
Xuankai Chang,
Tomoki Hayashi,
Yosuke Higuchi,
Hirofumi Inaguma,
Naoyuki Kamo,
Chenda Li,
Daniel Garcia-Romero,
Jiatong Shi,
**g Shi,
Shinji Watanabe,
Kun Wei,
Wangyou Zhang,
Yuekai Zhang
Abstract:
In this study, we present recent developments on ESPnet: End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer (Convolution-augmented Transformer). This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes using open-source and publicly available corpora for all the above tasks, together with pre-trained models. Our aim for this work is to contribute to the research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.
Submitted 29 October, 2020; v1 submitted 26 October, 2020;
originally announced October 2020.
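The abstract centers on the Conformer block. As a simplified NumPy illustration of its layout, the sketch below follows the original Conformer paper (Gulati et al., 2020): a half-step feed-forward module, self-attention, a convolution module, a second half-step feed-forward module, and a final layer norm, each module wrapped in a residual connection. It uses single-head attention without relative positional encoding, and it omits BatchNorm, dropout, and learnable norm parameters. The parameter names and `make_params` are hypothetical, and this is not ESPnet's implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each frame over the feature axis (no learnable gain/bias here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def feed_forward(x, w1, w2):
    # LayerNorm -> Linear (d -> 4d) -> Swish -> Linear (4d -> d)
    return swish(layer_norm(x) @ w1) @ w2

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the T frames of x.
    h = layer_norm(x)
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def conv_module(x, pw1, dw, pw2):
    # LayerNorm -> pointwise (d -> 2d) -> GLU (2d -> d) -> depthwise conv -> Swish -> pointwise
    h = layer_norm(x) @ pw1
    a, b = np.split(h, 2, axis=-1)
    h = a * sigmoid(b)
    k = dw.shape[0]  # odd kernel size, one temporal filter per channel
    hp = np.pad(h, ((k // 2, k // 2), (0, 0)))  # zero-pad in time to keep length T
    h = sum(hp[i:i + h.shape[0]] * dw[i] for i in range(k))
    return swish(h) @ pw2  # the original BatchNorm before Swish is omitted

def conformer_block(x, p):
    # Macaron layout: half-step FFN, attention, conv, half-step FFN, final LayerNorm.
    x = x + 0.5 * feed_forward(x, p["ff1_w1"], p["ff1_w2"])
    x = x + self_attention(x, p["wq"], p["wk"], p["wv"])
    x = x + conv_module(x, p["pw1"], p["dw"], p["pw2"])
    x = x + 0.5 * feed_forward(x, p["ff2_w1"], p["ff2_w2"])
    return layer_norm(x)

def make_params(d, kernel_size, rng):
    # Hypothetical random initialization, only to exercise the block.
    shapes = {
        "ff1_w1": (d, 4 * d), "ff1_w2": (4 * d, d),
        "wq": (d, d), "wk": (d, d), "wv": (d, d),
        "pw1": (d, 2 * d), "dw": (kernel_size, d), "pw2": (d, d),
        "ff2_w1": (d, 4 * d), "ff2_w2": (4 * d, d),
    }
    return {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}
```

For instance, `conformer_block(x, make_params(8, 3, np.random.default_rng(0)))` maps a `(T, 8)` feature sequence to another `(T, 8)` sequence. The residual wrappers are what allow such blocks to be stacked.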
-
End-to-End Speech Recognition: A review for the French Language
Authors:
Florian Boyer,
Jean-Luc Rouas
Abstract:
Recently, end-to-end ASR based either on sequence-to-sequence networks or on the CTC objective function has gained a lot of interest from the community, achieving results competitive with traditional systems that rely on robust but complex pipelines. One of the main features of end-to-end systems, in addition to their ability to dispense with extra linguistic resources such as dictionaries or language models, is their capacity to model acoustic units such as characters, subwords, or directly words. This opens up the possibility of directly transcribing speech with different representations or levels of knowledge depending on the target language. In this paper, we review the existing end-to-end ASR approaches for the French language. We compare their results to conventional state-of-the-art ASR systems and discuss which units are best suited to model French.
Submitted 23 October, 2019; v1 submitted 18 October, 2019;
originally announced October 2019.
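The abstract mentions the CTC objective and the choice of modeling units. As a minimal sketch, best-path (greedy) CTC decoding merges consecutive repeated frame labels and then drops blanks. The same rule applies whether the units are characters or whole words. Here the per-frame labels are given directly; in a real system they would be the arg-max of the network output at each frame. The blank symbol `_` is an arbitrary choice for illustration.

```python
from itertools import groupby

def ctc_greedy_decode(frame_labels, blank="_"):
    """Best-path CTC decoding: merge consecutive repeats, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]

# Character units: the blank between the two "l" runs keeps the double letter.
chars = ctc_greedy_decode(list("ee_ll__lle"))
# Word units: the same rule applies to whole-word labels.
words = ctc_greedy_decode(["_", "elle", "elle", "_", "est", "est", "_", "là"])
```

Without the blank between the two `l` runs, the repeats would merge and "elle" would decode as "ele". This is one reason the unit inventory and frame rate interact in character-based CTC models.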