Search | arXiv e-print repository

MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.
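The abstract does not spell out the SMT-ABC layout, so the following is only a minimal sketch of the general idea of bar-synchronized multi-track ABC. It assumes each track has already been split into bars and that measure b of every voice is emitted together, tagged with ABC inline voice fields ([V:1], [V:2], ...). The function name `interleave_tracks` is hypothetical and does not reproduce MuPT's actual preprocessing.

```python
def interleave_tracks(tracks: list[list[str]]) -> str:
    """Hypothetical sketch of bar-synchronized multi-track ABC.

    tracks[v][b] holds the ABC text of bar b in voice v (0-based v).
    Bars with the same index are emitted together, each prefixed by an
    inline voice field [V:n], so one output line is one aligned measure.
    """
    if not tracks:
        return ""
    n_bars = len(tracks[0])
    if any(len(t) != n_bars for t in tracks):
        raise ValueError("all tracks must have the same number of bars")
    lines = []
    for b in range(n_bars):
        # Concatenate bar b from every voice before moving on to bar b + 1.
        lines.append("".join(f"[V:{v + 1}]{tracks[v][b]}|" for v in range(len(tracks))))
    return "\n".join(lines)
```

Keeping measure b of every voice adjacent means a left-to-right model cannot run several bars ahead in one track while another lags behind, which is one plausible way to avoid the misaligned measures the abstract mentions.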

arXiv:2312.15946

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement

Authors: Bo Han, Yi Ren, Hao Peng, Teng Zhang, Zeyu Ling, Xiang Yin, Feilin Han

Abstract: The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make it difficult to generate music-aligned dance movements; 2) the lack of a large-scale music-dance dataset, which hinders the generation of generalized dance movements from music; and 3) the protracted nature of dance movements, which makes it difficult to maintain a consistent dance style. In this work, we introduce the EnchantDance framework, a state-of-the-art method for dance generation. Because the original dance sequence is redundant along the time axis, EnchantDance first constructs a strong dance latent space and then trains a dance diffusion model on that latent space. To address the data gap, we construct a large-scale music-dance dataset, the ChoreoSpectrum3D Dataset, which includes four dance genres with a total duration of 70.32 hours, making it the largest reported music-dance dataset to date. To enhance consistency between music genre and dance style, we pre-train a music genre prediction network using transfer learning and incorporate music genre as extra conditional information in the training of the dance diffusion model. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on dance quality, diversity, and consistency.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.10155

Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

Authors: Feng Han, **gang Yi

Abstract: External and internal convertible (EIC) form-based motion control is one of the effective designs for simultaneous trajectory tracking and balance of underactuated balance robots. Under certain conditions, however, the EIC-based control design leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC modeling structure. Two GP-based learning controllers are presented that exploit the EIC structure property. The partial EIC (PEIC)-based control design partitions the robotic dynamics into a fully actuated subsystem and one reduced-order underactuated system. The null-space EIC (NEIC)-based control compensates for the uncontrolled motion in a subspace, while the other closed-loop dynamics are not affected. Under the PEIC- and NEIC-based controls, the tracking and balance tasks are guaranteed, and a convergence rate and bounded errors are achieved without the uncontrolled motion caused by the original EIC-based control. We validate the results and demonstrate the performance of the GP-based learning control design using two inverted pendulum platforms.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2310.11178

FocDepthFormer: Transformer with LSTM for Depth Estimation from Focus

Authors: Xueyang Kang, Fengze Han, Abdur Fayjie, Dong Gong

Abstract: Depth estimation from focal stacks is a fundamental computer vision problem that aims to infer depth from focus/defocus cues in the image stacks. Most existing methods tackle this problem by applying convolutional neural networks (CNNs) with 2D or 3D convolutions over a set of fixed stack images to learn features across images and stacks. Their performance is restricted by the local properties of CNNs, and they are constrained to process the same fixed number of stack images in training and inference, which limits generalization to stacks of arbitrary length. To handle these limitations, we develop a novel Transformer-based network, FocDepthFormer, composed mainly of a Transformer with an LSTM module and a CNN decoder. The self-attention in the Transformer enables learning more informative features via an implicit non-local cross reference. The LSTM module learns to integrate the representations across a stack with an arbitrary number of images. To directly capture the low-level features of various degrees of focus/defocus, we propose to use multi-scale convolutional kernels in an early-stage encoder. Benefiting from the LSTM-based design, our FocDepthFormer can be pre-trained with abundant monocular RGB depth estimation data for visual pattern capturing, alleviating the demand for hard-to-collect focal stack data. Extensive experiments on various focal stack benchmark datasets show that our model outperforms the state-of-the-art models on multiple metrics.

Submitted 17 October, 2023; originally announced October 2023.

Comments: 20 pages, 18 figures, journal paper

ACM Class: I.4.9; I.2.10
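For contrast with the learned approach, the classical depth-from-focus idea in the abstract's first sentence can be sketched in a few lines: score each pixel's sharpness in every stack image with a focus measure, then assign the focus distance of the sharpest slice. This is a minimal sketch, not FocDepthFormer; the squared 4-neighbour Laplacian focus measure and the `focus_dists` values are assumptions chosen for illustration. It accepts stacks of any length, but like the CNNs the abstract discusses it only looks at a small local neighbourhood.

```python
import numpy as np


def laplacian_energy(img):
    """Squared 4-neighbour discrete Laplacian as a per-pixel sharpness score.

    np.roll wraps around at the borders, so edge pixels see the opposite edge.
    """
    img = np.asarray(img, dtype=float)
    lap = (-4.0 * img
           + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
    return lap ** 2


def depth_from_focus(stack, focus_dists):
    """stack: N x H x W grayscale focal stack; focus_dists: N focus distances.

    Returns an H x W depth map holding, for each pixel, the focus distance
    of the slice in which that pixel is sharpest.
    """
    scores = np.stack([laplacian_energy(s) for s in stack])  # N x H x W
    sharpest = np.argmax(scores, axis=0)                      # H x W slice index
    return np.asarray(focus_dists, dtype=float)[sharpest]
```

Real focal stacks need a smoother focus measure (for example, one aggregated over a window) to be robust to noise; the single-pixel Laplacian keeps the example short.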

arXiv:2309.15784

Gaussian Process-Enhanced, External and Internal Convertible (EIC) Form-Based Control of Underactuated Balance Robots

Authors: Feng Han, **gang Yi

Abstract: External and internal convertible (EIC) form-based motion control (i.e., EIC-based control) is one of the effective approaches for underactuated balance robots. Through sequential controller design, trajectory tracking of the actuated subsystem and balance of the unactuated subsystem can be achieved simultaneously. However, under certain conditions, the EIC-based control produces uncontrolled robot motion. We first identify these conditions and then propose an enhanced EIC-based control with a Gaussian process (GP) data-driven robot dynamic model. Under the new enhanced EIC-based control, the stability and performance of the closed-loop system are guaranteed. We demonstrate the GP-enhanced EIC-based control experimentally using two examples of underactuated balance robots.

Submitted 27 September, 2023; originally announced September 2023.
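The abstract does not give the GP formulation. One common way to build a GP data-driven dynamic model is to keep a physics-based nominal model and let a GP learn only the residual between measured and nominal dynamics. The sketch below does that for a scalar input with a squared-exponential kernel and posterior-mean prediction. The kernel, hyperparameters, function names, and any toy dynamics used with it are assumptions for illustration, not the authors' model.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def fit_residual_gp(x_train, y_train, nominal, noise_var=1e-4, **kernel_args):
    """Fit a zero-mean GP to the residual y - nominal(x) (hypothetical helper).

    Returns a predictor that adds the GP posterior mean of the residual
    to the nominal (physics-based) model.
    """
    x_train = np.asarray(x_train, dtype=float)
    residual = np.asarray(y_train, dtype=float) - nominal(x_train)
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    # Solve K @ weights = residual once; prediction is then a kernel-weighted sum.
    weights = np.linalg.solve(K, residual)

    def predict(x):
        x = np.asarray(x, dtype=float)
        return nominal(x) + rbf_kernel(x, x_train, **kernel_args) @ weights

    return predict
```

Learning only the residual means the GP needs to capture just the unmodeled part of the dynamics, and predictions fall back to the nominal model far from the training data, where the GP mean decays to zero.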

arXiv:2305.00250

A Direct Sampling-Based Deep Learning Approach for Inverse Medium Scattering Problems

Authors: Jianfeng Ning, Fuqun Han, Jun Zou

Abstract: In this work, we focus on the inverse medium scattering problem (IMSP), which aims to recover unknown scatterers based on measured scattered data. Motivated by the efficient direct sampling method (DSM) introduced in [23], we propose a novel direct sampling-based deep learning approach (DSM-DL) for reconstructing inhomogeneous scatterers. In particular, we use the U-Net neural network to learn the relation between the index functions and the true contrasts. Our proposed DSM-DL is computationally efficient, robust to noise, easy to implement, and able to naturally incorporate multiple measured data to achieve high-quality reconstructions. Representative tests are carried out with varying numbers of incident waves and different noise levels to evaluate the performance of the proposed method. The results demonstrate the promising benefits of combining deep learning techniques with the DSM for IMSP.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2303.07089

Range Resolution Enhanced Method with Spectral Properties for Hyperspectral Lidar

Authors: Yuhao Xia, Shilong Xu, Hui Shao, Ahui Hou, Jiajie Fang, Fei Han, Youlong Chen, Jiaqi Wen, Yuwei Chen, Yihua Hu

Abstract: Waveform decomposition is a necessary first step in extracting various types of geometric and spectral information from hyperspectral full-waveform LiDAR echoes. We present a new approach to the "Pseudo-monopulse" waveform, which is formed by the overlapping waveforms of multiple targets that are very close together. We first fit the waveforms of all spectral channels with a single skew-normal distribution (SND) model and then examine the distribution of the echoes' geometric center positions to decide whether the waveform contains multiple targets. The distribution of geometric center positions of a "Pseudo-monopulse" shows aggregation and asymmetry as the wavelength changes, whereas no such asymmetry is found in the echoes of a single target. Both theoretical and experimental data support this observation. Based on it, we further propose a hyperspectral waveform decomposition method using the SND mixture model that: 1) initializes new waveform component parameters and their ranges based on the differences in three characteristics (geometric center position, pulse width, and skew coefficient) between the echo and the fitted SND waveform; 2) conducts single-channel waveform decomposition for all channels; 3) sets thresholds to find outlier channels based on statistical parameters of all single-channel decomposition results (the standard deviation and mean of the geometric center positions); and 4) re-conducts single-channel waveform decomposition for these outlier channels. The proposed method significantly improves the range resolution, from 60 cm to as fine as 5 cm for a 4 ns wide laser pulse, and represents the state of the art in "Pseudo-monopulse" waveform decomposition.

Submitted 2 March, 2023; originally announced March 2023.

Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission
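The skew-normal distribution (SND) that the abstract fits to each channel has density f(t) = (2/ω) φ(z) Φ(αz) with z = (t - ξ)/ω, where φ and Φ are the standard normal PDF and CDF, ξ is the location, ω the scale, and α the skew coefficient. The sketch below evaluates an SND pulse and compares its intensity-weighted centroid with the analytic mean ξ + ωδ√(2/π), where δ = α/√(1 + α²). Reading the abstract's "geometric center position" as this centroid is an assumption, and the paper's multi-channel mixture decomposition is not reproduced; the point is only that a non-zero α moves the pulse center away from ξ.

```python
import math


def skew_normal(t, amp, xi, omega, alpha):
    """Skew-normal pulse: amp * (2/omega) * phi(z) * Phi(alpha * z), z = (t - xi)/omega."""
    z = (t - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)        # standard normal PDF
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))       # standard normal CDF
    return amp * 2.0 / omega * phi * Phi


def centroid(ts, ys):
    """Intensity-weighted center of a uniformly sampled waveform."""
    return sum(t * y for t, y in zip(ts, ys)) / sum(ys)


def analytic_mean(xi, omega, alpha):
    """Mean of the skew-normal distribution: xi + omega * delta * sqrt(2/pi)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return xi + omega * delta * math.sqrt(2.0 / math.pi)
```

With α = 0 the pulse is Gaussian and the centroid coincides with ξ; increasing α shifts it later in time, which is the kind of skew-dependent shift that distinguishes an SND fit from a symmetric Gaussian fit.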

arXiv:2301.09080

Dance2MIDI: Dance-driven multi-instruments music generation

Authors: Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

Abstract: Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw audio generation, while the multi-instrument scenario is under-explored. The challenges of dance-driven multi-instrument music (MIDI) generation are twofold: 1) there is no publicly available paired dataset of multi-instrument MIDI and video, and 2) the correlation between music and video is weak. To tackle these challenges, we build the first paired dataset of multi-instrument MIDI and dance (D2MIDI). Based on this dataset, we introduce a multi-instrument MIDI generation framework (Dance2MIDI) conditioned on dance video. Specifically, 1) to capture the relationship between dance and music, we employ a Graph Convolutional Network to encode the dance motion, which allows us to extract features related to dance movement and dance style; 2) to generate a harmonious rhythm, we use a Transformer model with a cross-attention mechanism to decode the drum track sequence; and 3) we model the generation of the remaining tracks, based on the drum track, as a sequence understanding and completion task. A BERT-like model is employed to comprehend the context of the entire music piece through self-supervised learning. We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method achieves state-of-the-art performance.

Submitted 27 February, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: Accepted by the Computational Visual Media journal

arXiv:2205.14029

Lesion classification by model-based feature extraction: A differential affine invariant model of soft tissue elasticity

Authors: Weiguo Cao, Marc J. Pomeroy, Zhengrong Liang, Yongfeng Gao, Yongyi Shi, Jiaxing Tan, Fangfang Han, **g Wang, Jianhua Ma, Hongbin Lu, Almas F. Abbasi, Perry J. Pickhardt

Abstract: The elasticity of soft tissues has been widely considered a characteristic property for differentiating between healthy and diseased tissues and has therefore motivated several elasticity imaging modalities, such as Ultrasound Elastography, Magnetic Resonance Elastography, and Optical Coherence Elastography. This paper proposes an alternative approach that models elasticity using the Computed Tomography (CT) imaging modality for model-based feature extraction and machine learning (ML) differentiation of lesions. The model describes a dynamic non-rigid (or elastic) deformation in a differential manifold to mimic soft-tissue elasticity under wave fluctuation in vivo. Based on the model, three local deformation invariants are constructed from two tensors defined by the first- and second-order derivatives of the CT images and are used to generate elastic feature maps after normalization via a novel signal suppression method. The model-based elastic image features are extracted from the feature maps and fed to machine learning to perform lesion classification. Two pathologically proven image datasets of colon polyps (44 malignant and 43 benign) and lung nodules (46 malignant and 20 benign) were used to evaluate the proposed model-based lesion classification. This modeling approach reached an area under the receiver operating characteristic curve (AUC) of 94.2% for the polyps and 87.4% for the nodules, an average gain of 5% to 30% over ten existing state-of-the-art lesion classification methods. The gains from modeling tissue elasticity for ML differentiation of lesions are striking, indicating great potential for extending this modeling strategy to other tissue properties.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 12 pages, 4 figures, 3 tables
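The results above are reported as the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the function below computes AUC through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive case (e.g., malignant) receives a higher classifier score than a randomly chosen negative case (benign), with ties counted as one half. The function name and any scores passed to it are illustrative, not data from the paper.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC = P(score of a random positive > score of a random negative).

    Ties count as 1/2. O(len(pos) * len(neg)), which is fine for datasets
    with tens of cases per class.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 1.0 means every positive outranks every negative, and 0.5 corresponds to chance-level ranking.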

arXiv:2204.01101

Learning-Based Safe Motion Control of Vehicle Ski-Stunt Maneuvers

Authors: Feng Han, **gang Yi

Abstract: This paper presents a safety-guaranteed control method for an autonomous vehicle ski-stunt maneuver, that is, a vehicle moving on the two wheels of one side. To capture the vehicle dynamics precisely, a Gaussian process model is used as an additional correction to the nominal model obtained from physical principles. We construct a probabilistic control barrier function (CBF) to guarantee the safety of the planar motion. The CBF and the balance equilibrium manifold are enforced as constraints in a safety-critical control formulation. Under the proposed control method, the vehicle avoids collisions with obstacles and safely maintains balance during autonomous ski-stunt maneuvers. We validate the control design through numerical simulation. Preliminary experimental results using a scaled RC truck are also presented to confirm the learning-based motion control for autonomous ski-stunt maneuvers.

Submitted 3 April, 2022; originally announced April 2022.
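The paper's CBF is probabilistic and built on a GP-corrected vehicle model, neither of which the abstract details. As a much simpler, deterministic illustration of how a CBF acts as a safety constraint, the sketch below filters a nominal command for the scalar system x' = u with safe set h(x) = x_max - x ≥ 0. The CBF condition h' ≥ -α h reduces to u ≤ α (x_max - x), and the quadratic program that minimally modifies u_nom subject to it has the closed-form solution min(u_nom, α (x_max - x)). The system, gains, and function names are hypothetical.

```python
def cbf_filter(u_nom: float, x: float, x_max: float, alpha: float = 1.0) -> float:
    """CBF safety filter for x' = u with h(x) = x_max - x.

    Solves  min (u - u_nom)^2  s.t.  u <= alpha * (x_max - x),
    whose solution simply clips u_nom from above.
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0: float = 0.0, x_max: float = 1.0, u_nom: float = 2.0,
             dt: float = 0.01, steps: int = 500) -> list[float]:
    """Forward-Euler rollout of the filtered closed loop."""
    x = x0
    traj = [x]
    for _ in range(steps):
        x += dt * cbf_filter(u_nom, x, x_max)
        traj.append(x)
    return traj
```

Although the nominal command would drive the state straight past x_max, the filtered trajectory approaches the boundary asymptotically and never crosses it (here alpha * dt < 1, so the discrete steps cannot overshoot).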

arXiv:2203.11777

Autonomous Bikebot Control for Crossing Obstacles with Assistive Leg Impulsive Actuation

Authors: Feng Han, Xinyan Huang, Zenghao Wang, **gang Yi, Tao Liu

Abstract: As a single-track mobile platform, the bikebot (i.e., bicycle-based robot) has attractive navigation capability to pass through narrow, off-road terrain at high speed and with high energy efficiency. However, crossing step-like obstacles is challenging for intrinsically unstable, underactuated bikebots. This paper presents a novel autonomous bikebot control with assistive leg actuation for crossing obstacles. The proposed design integrates external/internal convertible-based control with leg-assisted impulse control. The leg-terrain interaction generates assistive impulsive torques that help maintain navigation and balance when running across obstacles. The control performance is analyzed and guaranteed. The experimental results confirm that, under the control design, the bikebot can smoothly cross multiple step-like obstacles with heights greater than one third of the wheel radius. Comparison results demonstrate superior performance over velocity and steering control alone, without assistive leg impulsive actuation.

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.10210

Coordinated Pose Control of Mobile Manipulation with an Unstable Bikebot Platform

Authors: Feng Han, Alborz Jelvani, **gang Yi, Tao Liu

Abstract: Bikebot manipulation combines the mobility of a single-track robot with manipulation dexterity. We present a coordinated pose control of mobile manipulation with the stationary bikebot. The challenges of bikebot manipulation include the limited steering balance capability of the unstable bikebot and the kinematic redundancy of the manipulator. We first present a steering balance model to analyze and explore the maximum steering capability for balancing the stationary platform. A balancing equilibrium manifold is then proposed to describe the necessary condition for simultaneous platform balance and end-effector posture control. A coordinated planning and control design is presented to determine the balance-prioritized posture control under kinematic and dynamic constraints. Extensive experiments are conducted to demonstrate the mechatronic design for autonomous plant inspection in agricultural applications. The results confirm the feasibility of using bikebot manipulation for plant inspection, with end-effector position and orientation errors of about 5 mm and 0.3 deg, respectively.

Submitted 18 March, 2022; originally announced March 2022.

arXiv:1912.10269

UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing

Authors: Nan Wang, Yabin Zhou, Fenglei Han, Haitao Zhu, **gzheng Yao

Abstract: In real-world underwater environments, seabed resource exploration, underwater archaeology, and underwater fishing rely on a variety of sensors, of which the vision sensor is the most important because of its high information content and its non-intrusive, passive nature. However, wavelength-dependent light attenuation and back-scattering cause color distortion and a haze effect, which degrade image visibility. To address this problem, we first propose an unsupervised generative adversarial network (GAN) that generates realistic underwater images (with color distortion and haze effect) from pairs of in-air images and depth maps, based on an improved underwater imaging model. Second, a U-Net, trained efficiently on the synthetic underwater dataset, is adopted for color restoration and dehazing. Our model directly reconstructs clear underwater images using end-to-end autoencoder networks while maintaining the structural similarity of the scene content. The results of our method were compared with existing methods qualitatively and quantitatively. Experimental results show that the proposed model performs well on open real-world underwater datasets, and the processing speed can reach up to 125 FPS on a single NVIDIA 1060 GPU. Source code and sample datasets are publicly available at https://github.com/infrontofme/UWGAN_UIE.

Submitted 26 March, 2021; v1 submitted 21 December, 2019; originally announced December 2019.

Comments: 10 pages, 8 figures
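UWGAN's generator relies on an "improved underwater imaging model" whose details are not given in the abstract. For orientation only, the sketch below implements the widely used simplified formation model I = J·t + B·(1 - t) with per-channel transmission t = exp(-β·d), where J is the in-air image, d the depth map, β the per-channel attenuation coefficients, and B the background (veiling) light. This is not the paper's model, and any coefficient values passed to it are illustrative assumptions. Choosing β larger for red than for green and blue reproduces the wavelength-dependent attenuation the abstract describes.

```python
import numpy as np


def synthesize_underwater(J, depth, beta, B):
    """Simplified underwater formation model: I = J * t + B * (1 - t).

    J:     H x W x 3 in-air image with values in [0, 1]
    depth: H x W scene depth (same length unit as 1 / beta)
    beta:  per-channel attenuation coefficients, shape (3,)
    B:     per-channel background light, shape (3,)
    """
    J = np.asarray(J, dtype=float)
    depth = np.asarray(depth, dtype=float)
    beta = np.asarray(beta, dtype=float)
    B = np.asarray(B, dtype=float)
    # Transmission per pixel and channel: (H x W x 1) * (3,) -> H x W x 3.
    t = np.exp(-depth[..., None] * beta)
    # Direct (attenuated) signal plus back-scattered veiling light.
    return J * t + B * (1.0 - t)
```

At zero depth the output equals the in-air image; as depth grows, each channel decays toward its background light at its own rate, which produces both the color cast and the haze that the restoration network must undo.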

Showing 1–13 of 13 results for author: Han, F