Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Authors: Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. Additionally, it provides interactive visualizations and demonstrations of classic models for educational purposes. The initial release of Amphion v0.1 supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components like data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion.

Submitted 22 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Amphion Website: https://github.com/open-mmlab/Amphion

arXiv:2209.12642 [pdf]

Design of Automatic Driving Safety Level and Positioning Accuracy

Authors: Tiantian Tang, Hao Xu, Chengcheng Wu, Sijie Lye, Yan Xiang

Abstract: Autonomous driving is a hot research topic in the frontier of science and technology. Technology companies and traditional car companies are developing and designing autonomous driving technology from two different directions. Based on the automatic driving classification standard and ISO safety level, combined with the number of traffic accidents and death data in China, and referring to the risk allocation method of the automated driving virtual drive system in the United States, the risk allocation of China's virtual drive system will be carried out. In addition, combined with the vehicle "positioning box" model, the theoretical calculation of the alarm limit of positioning accuracy in China will be carried out and the positioning accuracy requirements of related vehicles will be designed.

Submitted 26 September, 2022; originally announced September 2022.

Comments: in Chinese language

arXiv:2103.14297 [pdf, other]

CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier

Authors: Tiantian Tang, Xinyuan Zhou, Yanhua Long, Yijie Li, Jiaen Liang

Abstract: Domain mismatch is a noteworthy issue in acoustic event detection tasks, as the target domain data is difficult to access in most real applications. In this study, we propose a novel CNN-based discriminative training framework as a domain compensation method to handle this issue. It uses a parallel CNN-based discriminator to learn a pair of high-level intermediate acoustic representations. Together with a binary discriminative loss, the discriminators are forced to maximally exploit the discrimination of heterogeneous acoustic information in each audio clip with target events, which results in a robust paired representations that can well discriminate the target events and background/domain variations separately. Moreover, to better learn the transient characteristics of target events, a frame-wise classifier is designed to perform the final classification. In addition, a two-stage training with the CNN-based discriminator initialization is further proposed to enhance the system training. All experiments are performed on the DCASE 2018 Task3 datasets. Results show that our proposal significantly outperforms the official baseline on cross-domain conditions in AUC by relative $1.8-12.1$% without any performance degradation on in-domain evaluation conditions.

Submitted 26 March, 2021; originally announced March 2021.

arXiv:2009.10543 [pdf]

An unnoticed side effect of electric vehicles

Authors: Tao Wang, Ying Yang, Tieqiao Tang, Xiaobo Qu

Abstract: We illustrate that the electrification of our transport system might impose unnecessary extra congestion and delay for daily commuting passengers. By modelling travel behaviors of these passengers, it is found that more of them tend to depart at a narrower peak-hour time window. The occurrence of this shift is mainly caused by (1) the energy consumption of electric vehicles (EVs) is much lower than that of traditional vehicles and (2) the energy consumption of EVs is less sensitive to congestion than that of traditional vehicles. We further examine the role of congestion toll in minimizing the extra congestion and delay.

Submitted 22 September, 2020; originally announced September 2020.

Comments: 11 pages, 3 figures

arXiv:2006.03169 [pdf, other]

Fast CRDNN: Towards on Site Training of Mobile Construction Machines

Authors: Yusheng Xiang, Tian Tang, Tianqing Su, Christine Brach, Libo Liu, Samuel Mao, Marcus Geimer

Abstract: The CRDNN is a combined neural network that can increase the holistic efficiency of torque based mobile working machines by about 9% by means of accurately detecting the truck loading cycles. On the one hand, it is a robust but offline learning algorithm so that it is more accurate and much quicker than the previous methods. However, on the other hand, its accuracy can not always be guaranteed because of the diversity of the mobile machines industry and the nature of the offline method. To address the problem, we utilize the transfer learning algorithm and the Internet of Things (IoT) technology. Concretely, the CRDNN is first trained by computer and then saved in the on-board ECU. In case that the pre-trained CRDNN is not suitable for the new machine, the operator can label some new data by our App connected to the on-board ECU of that machine through Bluetooth. With the newly labeled data, we can directly further train the pretrained CRDNN on the ECU without overloading since transfer learning requires less computation effort than training the networks from scratch. In our paper, we prove this idea and show that CRDNN is always competent, with the help of transfer learning and IoT technology by field experiment, even the new machine may have a different distribution. Also, we compared the performance of other SOTA multivariate time series algorithms on predicting the working state of the mobile machines, which denotes that the CRDNNs are still the most suitable solution. As a by-product, we build up a human-machine communication system to label the dataset, which can be operated by engineers without knowledge about Artificial Intelligence (AI).

Submitted 4 June, 2020; originally announced June 2020.

Comments: 15 pages, 18 figures

arXiv:2001.03233 [pdf, other]

RSL-Net: Localising in Satellite Images From a Radar on the Ground

Authors: Tim Y. Tang, Daniele De Martini, Dan Barnes, Paul Newman

Abstract: This paper is about localising a vehicle in an overhead image using FMCW radar mounted on a ground vehicle. FMCW radar offers extraordinary promise and efficacy for vehicle localisation. It is impervious to all weather types and lighting conditions. However the complexity of the interactions between millimetre radar wave and the physical environment makes it a challenging domain. Infrastructure-free large-scale radar-based localisation is in its infancy. Typically here a map is built and suitable techniques, compatible with the nature of sensor, are brought to bear. In this work we eschew the need for a radar-based map; instead we simply use an overhead image -- a resource readily available everywhere. This paper introduces a method that not only naturally deals with the complexity of the signal type but does so in the context of cross modal processing.

Submitted 6 February, 2020; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L)

Showing 1–6 of 6 results for author: Tang, T