-
Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
Authors:
Yu-Hua Chen,
Woosung Choi,
Wei-Hsiang Liao,
Marco Martínez-Ramírez,
Kin Wai Cheuk,
Yuki Mitsufuji,
Jyh-Shing Roger Jang,
Yi-Hsuan Yang
Abstract:
Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers or effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally-aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. A very recent work…
▽ More
Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers or effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally-aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. A very recent work done by Wright et al. has explored the potential of leveraging unpaired data for training, using a generative adversarial network (GAN)-based framework. This paper extends their work by using more advanced discriminators in the GAN, and using more unpaired data for training. Specifically, drawing inspiration from recent advancements in neural vocoders, we employ in our GAN-based model for guitar amplifier modeling two sets of discriminators, one based on multi-scale discriminator (MSD) and the other multi-period discriminator (MPD). Moreover, we experiment with adding unprocessed audio signals that do not have the corresponding rendered audio of a target tone to the training data, to see how much the GAN model benefits from the unpaired data. Our experiments show that the proposed two extensions contribute to the modeling of both low-gain and high-gain guitar amplifiers.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases
Authors:
Xiangde Luo,
Zihan Li,
Shaoting Zhang,
Wenjun Liao,
Guotai Wang
Abstract:
Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annot…
▽ More
Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annotations) from 413 patients each with 17 (female) or 19 (male) labelled organs, manually delineated by oncologists. We grouped scans based on clinical information into 1) diagnosis/radiotherapy (317 volumes), 2) partial excision without the whole organ missing (22 volumes), and 3) excision with the whole organ missing (74 volumes). RAOS provides a potential benchmark for evaluating model robustness including organ hallucination. It also includes some organs that can be very hard to access on public datasets like the rectum, colon, intestine, prostate and seminal vesicles. We benchmarked several state-of-the-art methods in these three clinical groups to evaluate performance and robustness. We also assessed cross-generalization between RAOS and three public datasets. This dataset and comprehensive analysis establish a potential baseline for future robustness research: \url{https://github.com/Luoxd1996/RAOS}.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
SilentCipher: Deep Audio Watermarking
Authors:
Mayank Kumar Singh,
Naoya Takahashi,
Weihsiang Liao,
Yuki Mitsufuji
Abstract:
In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restricts their usage in professional settings. In this study…
▽ More
In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restricts their usage in professional settings. In this study, we introduce three key innovations. Firstly, our work is the first deep learning-based model to integrate psychoacoustic model based thresholding to achieve imperceptible watermarks. Secondly, we introduce psuedo-differentiable compression layers, enhancing the robustness of our watermarking algorithm. Lastly, we introduce a method to eliminate the need for perceptual losses, enabling us to achieve SOTA in both robustness as well as imperceptible watermarking. Our contributions lead us to SilentCipher, a model enabling users to encode messages within audio signals sampled at 44.1kHz.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Authors:
Yixiao Zhang,
Yukara Ikemiya,
Woosung Choi,
Naoki Murata,
Marco A. Martínez-Ramírez,
Liwei Lin,
Gus Xia,
Wei-Hsiang Liao,
Yuki Mitsufuji,
Simon Dixon
Abstract:
Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; o…
▽ More
Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; other research uses large language models to predict edited music, resulting in imprecise audio reconstruction. To Combine the strengths and address these limitations, we introduce Instruct-MusicGen, a novel approach that finetunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach involves a modification of the original MusicGen architecture by incorporating a text fusion module and an audio fusion module, which allow the model to process instruction texts and audio inputs concurrently and yield the desired edited music. Remarkably, Instruct-MusicGen only introduces 8% new parameters to the original MusicGen model and only trains for 5K steps, yet it achieves superior performance across all tasks compared to existing baselines, and demonstrates performance comparable to the models trained for specific tasks. This advancement not only enhances the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments.
△ Less
Submitted 29 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
Authors:
Hao Hao Tan,
Kin Wai Cheuk,
Taemin Cho,
Wei-Hsiang Liao,
Yuki Mitsufuji
Abstract:
This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, a…
▽ More
This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, and token shuffling are proposed. These methods are evaluated on the Slakh2100 dataset, demonstrating improved onset F1 scores and reduced instrument leakage. In addition to the conventional multi-instrument transcription F1 score, new metrics such as the instrument leakage ratio and the instrument detection F1 score are introduced for a more comprehensive assessment of transcription quality. The study also explores the issue of domain overfitting by evaluating MT3 on single-instrument monophonic datasets such as ComMU and NSynth. The findings, along with the source code, are shared to facilitate future work aimed at refining token-based multi-instrument AMT models.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation
Authors:
Weibin Liao,
Yinghao Zhu,
Xinyuan Wang,
Chengwei Pan,
Yasha Wang,
Liantao Ma
Abstract:
UNet and its variants have been widely used in medical image segmentation. However, these models, especially those based on Transformer architectures, pose challenges due to their large number of parameters and computational loads, making them unsuitable for mobile health applications. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as competitive alternatives to CNN and Tr…
▽ More
UNet and its variants have been widely used in medical image segmentation. However, these models, especially those based on Transformer architectures, pose challenges due to their large number of parameters and computational loads, making them unsuitable for mobile health applications. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as competitive alternatives to CNN and Transformer architectures. Building upon this, we employ Mamba as a lightweight substitute for CNN and Transformer within UNet, aiming at tackling challenges stemming from computational resource limitations in real medical settings. To this end, we introduce the Lightweight Mamba UNet (LightM-UNet) that integrates Mamba and UNet in a lightweight framework. Specifically, LightM-UNet leverages the Residual Vision Mamba Layer in a pure Mamba fashion to extract deep semantic features and model long-range spatial dependencies, with linear computational complexity. Extensive experiments conducted on two real-world 2D/3D datasets demonstrate that LightM-UNet surpasses existing state-of-the-art literature. Notably, when compared to the renowned nnU-Net, LightM-UNet achieves superior segmentation performance while drastically reducing parameter and computation costs by 116x and 21x, respectively. This highlights the potential of Mamba in facilitating model lightweighting. Our code implementation is publicly available at https://github.com/MrBlankness/LightM-UNet.
△ Less
Submitted 11 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Authors:
Yixiao Zhang,
Yukara Ikemiya,
Gus Xia,
Naoki Murata,
Marco A. Martínez-Ramírez,
Wei-Hsiang Liao,
Yuki Mitsufuji,
Simon Dixon
Abstract:
Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and inst…
▽ More
Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing to \textit{latent space manipulation} while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.
△ Less
Submitted 28 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Improving the Accuracy and Interpretability of Neural Networks for Wind Power Forecasting
Authors:
Wenlong Liao,
Fernando Porte-Agel,
Jiannong Fang,
Birgitte Bak-Jensen,
Zhe Yang,
Gonghao Zhang
Abstract:
Deep neural networks (DNNs) are receiving increasing attention in wind power forecasting due to their ability to effectively capture complex patterns in wind data. However, their forecasted errors are severely limited by the local optimal weight issue in optimization algorithms, and their forecasted behavior also lacks interpretability. To address these two challenges, this paper firstly proposes…
▽ More
Deep neural networks (DNNs) are receiving increasing attention in wind power forecasting due to their ability to effectively capture complex patterns in wind data. However, their forecasted errors are severely limited by the local optimal weight issue in optimization algorithms, and their forecasted behavior also lacks interpretability. To address these two challenges, this paper firstly proposes simple but effective triple optimization strategies (TriOpts) to accelerate the training process and improve the model performance of DNNs in wind power forecasting. Then, permutation feature importance (PFI) and local interpretable model-agnostic explanation (LIME) techniques are innovatively presented to interpret forecasted behaviors of DNNs, from global and instance perspectives. Simulation results show that the proposed TriOpts not only drastically improve the model generalization of DNNs for both the deterministic and probabilistic wind power forecasting, but also accelerate the training process. Besides, the proposed PFI and LIME techniques can accurately estimate the contribution of each feature to wind power forecasting, which helps to construct feature engineering and understand how to obtain forecasted values for a given sample.
△ Less
Submitted 25 December, 2023;
originally announced December 2023.
-
SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma
Authors:
Xiangde Luo,
Jia Fu,
Yunxin Zhong,
Shuolin Liu,
Bing Han,
Mehdi Astaraki,
Simone Bendazzoli,
Iuliana Toma-Dasu,
Yiwen Ye,
Ziyang Chen,
Yong Xia,
Yanzhou Su,
** Ye,
Junjun He,
Zhaohu Xing,
Hongqiu Wang,
Lei Zhu,
Kaixiang Yang,
Xin Fang,
Zhiwei Wang,
Chan Woong Lee,
Sang Joon Park,
Jaehee Chun,
Constantin Ulrich,
Klaus H. Maier-Hein
, et al. (17 additional authors not shown)
Abstract:
Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…
▽ More
Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68\% to 86.70\%, and 70.42\% to 73.44\% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org
△ Less
Submitted 15 December, 2023;
originally announced December 2023.
-
An Explainable Framework for Machine learning-Based Reactive Power Optimization of Distribution Network
Authors:
Wenlong Liao,
Benjamin Schäfer,
Dalin Qin,
Gonghao Zhang,
Zhixian Wang,
Zhe Yang
Abstract:
To reduce the heavy computational burden of reactive power optimization of distribution networks, machine learning models are receiving increasing attention. However, most machine learning models (e.g., neural networks) are usually considered as black boxes, making it challenging for power system operators to identify and comprehend potential biases or errors in the decision-making process of mach…
▽ More
To reduce the heavy computational burden of reactive power optimization of distribution networks, machine learning models are receiving increasing attention. However, most machine learning models (e.g., neural networks) are usually considered as black boxes, making it challenging for power system operators to identify and comprehend potential biases or errors in the decision-making process of machine learning models. To address this issue, an explainable machine-learning framework is proposed to optimize the reactive power in distribution networks. Firstly, a Shapley additive explanation framework is presented to measure the contribution of each input feature to the solution of reactive power optimizations generated from machine learning models. Secondly, a model-agnostic approximation method is developed to estimate Shapley values, so as to avoid the heavy computational burden associated with direct calculations of Shapley values. The simulation results show that the proposed explainable framework can accurately explain the solution of the machine learning model-based reactive power optimization by using visual analytics, from both global and instance perspectives. Moreover, the proposed explainable framework is model-agnostic, and thus applicable to various models (e.g., neural networks).
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
Explainable Modeling for Wind Power Forecasting: A Glass-Box Approach with High Accuracy
Authors:
Wenlong Liao,
Fernando Porte-Agel,
Jiannong Fang,
Birgitte Bak-Jensen,
Guangchun Ruan,
Zhe Yang
Abstract:
Machine learning models (e.g., neural networks) achieve high accuracy in wind power forecasting, but they are usually regarded as black boxes that lack interpretability. To address this issue, the paper proposes a glass-box approach that combines high accuracy with transparency for wind power forecasting. Specifically, the core is to sum up the feature effects by constructing shape functions, whic…
▽ More
Machine learning models (e.g., neural networks) achieve high accuracy in wind power forecasting, but they are usually regarded as black boxes that lack interpretability. To address this issue, the paper proposes a glass-box approach that combines high accuracy with transparency for wind power forecasting. Specifically, the core is to sum up the feature effects by constructing shape functions, which effectively map the intricate non-linear relationships between wind power output and input features. Furthermore, the forecasting model is enriched by incorporating interaction terms that adeptly capture interdependencies and synergies among the input features. The additive nature of the proposed glass-box approach ensures its interpretability. Simulation results show that the proposed glass-box approach effectively interprets the results of wind power forecasting from both global and instance perspectives. Besides, it outperforms most benchmark models and exhibits comparable performance to the best-performing neural networks. This dual strength of transparency and high accuracy positions the proposed glass-box approach as a compelling choice for reliable wind power forecasting.
△ Less
Submitted 26 February, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
-
On the Language Encoder of Contrastive Cross-modal Models
Authors:
Mengjie Zhao,
Junya Ono,
Zhi Zhong,
Chieh-Hsin Lai,
Yuhta Takida,
Naoki Murata,
Wei-Hsiang Liao,
Takashi Shibuya,
Hiromi Wakaki,
Yuki Mitsufuji
Abstract:
Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descriptions of image/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding…
▽ More
Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descriptions of image/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding training affect language encoder quality and cross-modal task performance. In VL pretraining, we found that sentence embedding training language encoder quality and aids in cross-modal tasks, improving contrastive VL models such as CyCLIP. In contrast, AL pretraining benefits less from sentence embedding training, which may result from the limited amount of pretraining data. We analyze the representation spaces to understand the strengths of sentence embedding training, and find that it improves text-space uniformity, at the cost of decreased cross-modal alignment.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
Authors:
Frank Cwitkowitz,
Kin Wai Cheuk,
Woosung Choi,
Marco A. Martínez-Ramírez,
Keisuke Toyama,
Wei-Hsiang Liao,
Yuki Mitsufuji
Abstract:
In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but th…
▽ More
In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues. We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients, selecting between either output during the decoding stage via a simple switch mechanism. In this way, the model learns to produce coefficients corresponding to timbre-less audio, which can be interpreted as pitch salience. We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods, while only requiring a small amount of annotated data.
△ Less
Submitted 24 January, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
Authors:
Carlos Hernandez-Olivan,
Koichi Saito,
Naoki Murata,
Chieh-Hsin Lai,
Marco A. Martínez-Ramirez,
Wei-Hsiang Liao,
Yuki Mitsufuji
Abstract:
Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrinsic properties, making it versatile across various restoration tasks. In this paper, we identify that there are potential iss…
▽ More
Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrinsic properties, making it versatile across various restoration tasks. In this paper, we identify that there are potential issues which will degrade current DPS-based methods' performance and introduce the way to mitigate the issues inspired by diverse diffusion guidance techniques including the RePaint (RP) strategy and the Pseudoinverse-Guided Diffusion Models ($Π$GDM). We demonstrate our methods for the vocal declip** and bandwidth extension tasks under various levels of distortion and cutoff frequency, respectively. In both tasks, our methods outperform the current DPS-based music restoration benchmarks. We refer to \url{http://carlosholivan.github.io/demos/audio-restoration-2023.html} for examples of the restored audio samples.
△ Less
Submitted 13 September, 2023;
originally announced September 2023.
-
The Sound Demixing Challenge 2023 $\unicode{x2013}$ Music Demixing Track
Authors:
Giorgio Fabbro,
Stefan Uhlich,
Chieh-Hsin Lai,
Woosung Choi,
Marco Martínez-Ramírez,
Weihsiang Liao,
Igor Gadelha,
Geraldo Ramos,
Eddie Hsu,
Hugo Rodrigues,
Fabian-Robert Stöter,
Alexandre Défossez,
Yi Luo,
Jianwei Yu,
Dipam Chakraborty,
Sharada Mohanty,
Roman Solovyev,
Alexander Stempkovskiy,
Tatiana Habruseva,
Nabarun Goswami,
Tatsuya Harada,
Minseok Kim,
Jun Hyung Lee,
Yuanliang Dong,
Xinran Zhang
, et al. (2 additional authors not shown)
Abstract:
This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…
▽ More
This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.
△ Less
Submitted 19 April, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
From CNN to Transformer: A Review of Medical Image Segmentation Models
Authors:
Wenjian Yao,
Jiajun Bai,
Wei Liao,
Yuheng Chen,
Mengjuan Liu,
Yao Xie
Abstract:
Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tas…
▽ More
Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tasks, transformer-based models like TransUNet have achieved desirable performance on multiple medical image segmentation datasets. In this paper, we conduct a survey of the most representative four medical image segmentation models in recent years. We theoretically analyze the characteristics of these models and quantitatively evaluate their performance on two benchmark datasets (i.e., Tuberculosis Chest X-rays and ovarian tumors). Finally, we discuss the main challenges and future trends in medical image segmentation. Our work can assist researchers in the related field to quickly establish medical segmentation models tailored to specific regions.
△ Less
Submitted 9 August, 2023;
originally announced August 2023.
-
Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
Authors:
Keisuke Toyama,
Taketo Akama,
Yukara Ikemiya,
Yuhta Takida,
Wei-Hsiang Liao,
Yuki Mitsufuji
Abstract:
Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capability of self-attention mechanism in Transformers to capture these long-term dependencies in the frequency and time axes. In this…
▽ More
Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capability of self-attention mechanism in Transformers to capture these long-term dependencies in the frequency and time axes. In this work, we propose hFT-Transformer, which is an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture. The first hierarchy includes a convolutional block in the time axis, a Transformer encoder in the frequency axis, and a Transformer decoder that converts the dimension in the frequency axis. The output is then fed into the second hierarchy which consists of another Transformer encoder in the time axis. We evaluated our method with the widely used MAPS and MAESTRO v3.0.0 datasets, and it demonstrated state-of-the-art performance on all the F1-scores of the metrics among Frame, Note, Note with Offset, and Note with Offset and Velocity estimations.
△ Less
Submitted 9 July, 2023;
originally announced July 2023.
-
Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data
Authors:
Hongmin Cai,
Xiaoke Huang,
Zhengliang Liu,
Wenxiong Liao,
Haixing Dai,
Zihao Wu,
Dajiang Zhu,
Hui Ren,
Quanzheng Li,
Tianming Liu,
Xiang Li
Abstract:
Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database. The proposed approach invo…
▽ More
Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and Graph Neural Network (GNN) that constructs a graph from the speech transcript, and extracts features using GNN for AD detection. Data augmentation techniques, including synonym replacement, GPT-based augmenter, and so on, were used to address the small dataset size. Audio data was also introduced, and WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using it for contrastive learning with the original audio. We conducted intensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
Real Time Bearing Fault Diagnosis Based on Convolutional Neural Network and STM32 Microcontroller
Authors:
Wenhao Liao
Abstract:
With the rapid development of big data and edge computing, many researchers focus on improving the accuracy of bearing fault classification using deep learning models, and implementing the deep learning classification model on limited resource platforms such as STM32. To this end, this paper realizes the identification of bearing fault vibration signal based on convolutional neural network, the fa…
▽ More
With the rapid development of big data and edge computing, many researchers focus on improving the accuracy of bearing fault classification using deep learning models, and implementing the deep learning classification model on limited resource platforms such as STM32. To this end, this paper realizes the identification of bearing fault vibration signal based on convolutional neural network, the fault identification accuracy of the optimised model can reach 98.9%. In addition, this paper successfully applies the convolutional neural network model to STM32H743VI microcontroller, the running time of each diagnosis is 19ms. Finally, a complete real-time communication framework between the host computer and the STM32 is designed, which can perfectly complete the data transmission through the serial port and display the diagnosis results on the TFT-LCD screen.
△ Less
Submitted 14 April, 2023;
originally announced April 2023.
-
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
Authors:
Junghyun Koo,
Marco A. Martínez-Ramírez,
Wei-Hsiang Liao,
Stefan Uhlich,
Kyogu Lee,
Yuki Mitsufuji
Abstract:
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a reference music recording. All our models are trained in a self-supervised manner from an already-processed wet multitrack dat…
▽ More
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a reference music recording. All our models are trained in a self-supervised manner from an already-processed wet multitrack dataset with an effective data preprocessing method that alleviates the data scarcity of obtaining unprocessed dry data. We analyze the proposed encoder for the disentanglement capability of audio effects and also validate its performance for mixing style transfer through both objective and subjective evaluations. From the results, we show the proposed system not only converts the mixing style of multitrack audio close to a reference but is also robust with mixture-wise style transfer upon using a music source separation model.
△ Less
Submitted 11 April, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Automatic music mixing with deep learning and out-of-domain data
Authors:
Marco A. Martínez-Ramírez,
Wei-Hsiang Liao,
Giorgio Fabbro,
Stefan Uhlich,
Chihiro Nagashima,
Yuki Mitsufuji
Abstract:
Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge (e.g., a mixing engineer). The automation of music production tasks has become an emerging field in recent years, where rule-based methods and machine learning approaches have been explored. Nevertheless, the lack of dry o…
▽ More
Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge (e.g., a mixing engineer). The automation of music production tasks has become an emerging field in recent years, where rule-based methods and machine learning approaches have been explored. Nevertheless, the lack of dry or clean instrument recordings limits the performance of such models, which is still far from professional human-made mixes. We explore whether we can use out-of-domain data such as wet or processed multitrack music recordings and repurpose it to train supervised deep learning models that can bridge the current gap in automatic mixing quality. To achieve this we propose a novel data preprocessing method that allows the models to perform automatic music mixing. We also redesigned a listening test method for evaluating music mixing systems. We validate our results through such subjective tests using highly experienced mixing engineers as participants.
△ Less
Submitted 29 August, 2022; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Scribble-Supervised Medical Image Segmentation via Dual-Branch Network and Dynamically Mixed Pseudo Labels Supervision
Authors:
Xiangde Luo,
Minhao Hu,
Wenjun Liao,
Shuwei Zhai,
Tao Song,
Guotai Wang,
Shaoting Zhang
Abstract:
Medical image segmentation plays an irreplaceable role in computer-assisted diagnosis, treatment planning, and following-up. Collecting and annotating a large-scale dataset is crucial to training a powerful segmentation model, but producing high-quality segmentation masks is an expensive and time-consuming procedure. Recently, weakly-supervised learning that uses sparse annotations (points, scribb…
▽ More
Medical image segmentation plays an irreplaceable role in computer-assisted diagnosis, treatment planning, and following-up. Collecting and annotating a large-scale dataset is crucial to training a powerful segmentation model, but producing high-quality segmentation masks is an expensive and time-consuming procedure. Recently, weakly-supervised learning that uses sparse annotations (points, scribbles, bounding boxes) for network training has achieved encouraging performance and shown the potential for annotation cost reduction. However, due to the limited supervision signal of sparse annotations, it is still challenging to employ them for networks training directly. In this work, we propose a simple yet efficient scribble-supervised image segmentation method and apply it to cardiac MRI segmentation. Specifically, we employ a dual-branch network with one encoder and two slightly different decoders for image segmentation and dynamically mix the two decoders' predictions to generate pseudo labels for auxiliary supervision. By combining the scribble supervision and auxiliary pseudo labels supervision, the dual-branch network can efficiently learn from scribble annotations end-to-end. Experiments on the public ACDC dataset show that our method performs better than current scribble-supervised segmentation methods and also outperforms several semi-supervised segmentation methods.
△ Less
Submitted 3 March, 2022;
originally announced March 2022.
-
Short-Term Power Prediction for Renewable Energy Using Hybrid Graph Convolutional Network and Long Short-Term Memory Approach
Authors:
Wenlong Liao,
Birgitte Bak-Jensen,
Jayakrishnan Radhakrishna Pillai,
Zhe Yang,
Kuangpu Liu
Abstract:
Accurate short-term solar and wind power predictions play an important role in the planning and operation of power systems. However, the short-term power prediction of renewable energy has always been considered a complex regression problem, owing to the fluctuation and intermittence of output powers and the law of dynamic change with time due to local weather conditions, i.e. spatio-temporal corr…
▽ More
Accurate short-term solar and wind power predictions play an important role in the planning and operation of power systems. However, the short-term power prediction of renewable energy has always been considered a complex regression problem, owing to the fluctuation and intermittence of output powers and the law of dynamic change with time due to local weather conditions, i.e. spatio-temporal correlation. To capture the spatio-temporal features simultaneously, this paper proposes a new graph neural network-based short-term power forecasting approach, which combines the graph convolutional network (GCN) and long short-term memory (LSTM). Specifically, the GCN is employed to learn complex spatial correlations between adjacent renewable energies, and the LSTM is used to learn dynamic changes of power generation curves. The simulation results show that the proposed hybrid approach can model the spatio-temporal correlation of renewable energies, and its performance outperforms popular baselines on real-world datasets.
△ Less
Submitted 7 February, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
WORD: A large scale dataset, benchmark and clinical applicable study for abdominal organ segmentation from CT image
Authors:
Xiangde Luo,
Wenjun Liao,
Jianghong Xiao,
Jieneng Chen,
Tao Song,
Xiaofan Zhang,
Kang Li,
Dimitris N. Metaxas,
Guotai Wang,
Shaoting Zhang
Abstract:
Whole abdominal organ segmentation is important in diagnosing abdomen lesions, radiotherapy, and follow-up. However, oncologists' delineating all abdominal organs from 3D volumes is time-consuming and very expensive. Deep learning-based medical image segmentation has shown the potential to reduce manual delineation efforts, but it still requires a large-scale fine annotated dataset for training, a…
▽ More
Whole abdominal organ segmentation is important in diagnosing abdomen lesions, radiotherapy, and follow-up. However, oncologists' delineating all abdominal organs from 3D volumes is time-consuming and very expensive. Deep learning-based medical image segmentation has shown the potential to reduce manual delineation efforts, but it still requires a large-scale fine annotated dataset for training, and there is a lack of large-scale datasets covering the whole abdomen region with accurate and detailed annotations for the whole abdominal organ segmentation. In this work, we establish a new large-scale \textit{W}hole abdominal \textit{OR}gan \textit{D}ataset (\textit{WORD}) for algorithm research and clinical application development. This dataset contains 150 abdominal CT volumes (30495 slices). Each volume has 16 organs with fine pixel-level annotations and scribble-based sparse annotations, which may be the largest dataset with whole abdominal organ annotation. Several state-of-the-art segmentation methods are evaluated on this dataset. And we also invited three experienced oncologists to revise the model predictions to measure the gap between the deep learning method and oncologists. Afterwards, we investigate the inference-efficient learning on the WORD, as the high-resolution image requires large GPU memory and a long inference time in the test stage. We further evaluate the scribble-based annotation-efficient learning on this dataset, as the pixel-wise manual annotation is time-consuming and expensive. The work provided a new benchmark for the abdominal multi-organ segmentation task, and these experiments can serve as the baseline for future research and clinical application development.
△ Less
Submitted 12 February, 2023; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
Authors:
Bo-Yu Chen,
Wei-Han Hsu,
Wei-Hsiang Liao,
Marco A. Martínez Ramírez,
Yuki Mitsufuji,
Yi-Hsuan Yang
Abstract:
A central task of a Disc Jockey (DJ) is to create a mixset of mu-sic with seamless transitions between adjacent tracks. In this paper, we explore a data-driven approach that uses a generative adversarial network to create the song transition by learning from real-world DJ mixes. In particular, the generator of the model uses two differentiable digital signal processing components, an equalizer (EQ…
▽ More
A central task of a Disc Jockey (DJ) is to create a mixset of mu-sic with seamless transitions between adjacent tracks. In this paper, we explore a data-driven approach that uses a generative adversarial network to create the song transition by learning from real-world DJ mixes. In particular, the generator of the model uses two differentiable digital signal processing components, an equalizer (EQ) and a fader, to mix two tracks selected by a data generation pipeline. The generator has to set the parameters of the EQs and fader in such away that the resulting mix resembles real mixes created by humanDJ, as judged by the discriminator counterpart. Result of a listening test shows that the model can achieve competitive results compared with a number of baselines.
△ Less
Submitted 17 February, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Probabilistic Reach-Avoid Reachability in Nondeterministic Systems with Time-VaryingTargets and Obstacles
Authors:
Wei Liao,
Taotao Liang,
Xiaohui Wei,
Qiaozhi Yin
Abstract:
The probabilistic reachability problems of nondeterministic systems are studied. Based on the existing studies, the definition of probabilistic reachable sets is generalized by taking into account time-varying target set and obstacle. A numerical method is proposed to compute probabilistic reachable sets. First, a scalar function in the state space is constructed by backward recursion and grid int…
▽ More
The probabilistic reachability problems of nondeterministic systems are studied. Based on the existing studies, the definition of probabilistic reachable sets is generalized by taking into account time-varying target set and obstacle. A numerical method is proposed to compute probabilistic reachable sets. First, a scalar function in the state space is constructed by backward recursion and grid interpolation, and then the probability reachable set is represented as a nonzero level set of this scalar function. In addition, based on the constructed scalar function, the optimal control policy can be designed. At the end of this paper, some examples are taken to illustrate the validity and accuracy of the proposed method.
△ Less
Submitted 7 August, 2021;
originally announced August 2021.
-
Computation of Reachable Sets Based on Hamilton-Jacobi-Bellman Equation with Running Cost Function
Authors:
Weiwei Liao,
Tao Liang
Abstract:
A novel method for computing reachable sets is proposed in this paper. In the proposed method, a Hamilton-Jacobi-Bellman equation with running cost functionis numerically solved and the reachable sets of different time horizons are characterized by a family of non-zero level sets of the solution of the Hamilton-Jacobi-Bellman equation. In addition to the classical reachable set, by setting differe…
▽ More
A novel method for computing reachable sets is proposed in this paper. In the proposed method, a Hamilton-Jacobi-Bellman equation with running cost functionis numerically solved and the reachable sets of different time horizons are characterized by a family of non-zero level sets of the solution of the Hamilton-Jacobi-Bellman equation. In addition to the classical reachable set, by setting different running cost functions and terminal conditionsof the Hamilton-Jacobi-Bellman equation, the proposed method allows to compute more generalized reachable sets, which are referred to as cost-limited reachable sets. In order to overcome the difficulty of solving the Hamilton-Jacobi-Bellman equation caused by the discontinuity of the solution, a method based on recursion and grid interpolation is employed.
At the end of this paper, some examples are taken to illustrate the validity and generality of the proposed method.
△ Less
Submitted 16 May, 2022; v1 submitted 25 July, 2021;
originally announced July 2021.
-
Stability and Super-resolution of MUSIC and ESPRIT for Multi-snapshot Spectral Estimation
Authors:
Weilin Li,
Zengying Zhu,
Weiguo Gao,
Wen**g Liao
Abstract:
This paper studies the spectral estimation problem of estimating the locations of a fixed number of point sources given multiple snapshots of Fourier measurements collected by a uniform array of sensors. We prove novel stability bounds for MUSIC and ESPRIT as a function of the noise standard deviation, number of snapshots, source amplitudes, and support. Our most general result is a perturbation b…
▽ More
This paper studies the spectral estimation problem of estimating the locations of a fixed number of point sources given multiple snapshots of Fourier measurements collected by a uniform array of sensors. We prove novel stability bounds for MUSIC and ESPRIT as a function of the noise standard deviation, number of snapshots, source amplitudes, and support. Our most general result is a perturbation bound of the signal space in terms of the minimum singular value of Fourier matrices. When the point sources are located in several separated clumps, we provide an explicit upper bound of the noise-space correlation perturbation error in MUSIC and the support error in ESPRIT in terms of a Super-Resolution Factor (SRF). The upper bound for ESPRIT is then compared with a new Cramér-Rao lower bound for the clumps model. As a result, we show that ESPRIT is comparable to that of the optimal unbiased estimator(s) in terms of the dependence on noise, number of snapshots and SRF. As a byproduct of our analysis, we discover several fundamental differences between the single-snapshot and multi-snapshot problems. Our theory is validated by numerical experiments.
△ Less
Submitted 22 September, 2022; v1 submitted 29 May, 2021;
originally announced May 2021.
-
A Novel Unified Framework for Solving Reachability, Viability and Invariance Problems
Authors:
Wei Liao,
Taotao Liang,
Xiaohui Wei,
Jizhou Lai
Abstract:
The level set method is a widely used tool for solving reachability and invariance problems. However, some shortcomings, such as the difficulties of handling dissipation function and constructing terminal conditions for solving the Hamilton-Jacobi partial differential equation, limit the application of the level set method in some problems with non-affine nonlinear systems and irregular target set…
▽ More
The level set method is a widely used tool for solving reachability and invariance problems. However, some shortcomings, such as the difficulties of handling dissipation function and constructing terminal conditions for solving the Hamilton-Jacobi partial differential equation, limit the application of the level set method in some problems with non-affine nonlinear systems and irregular target sets. This paper proposes a method that can effectively avoid the above tricky issues and thus has better generality. In the proposed method, the reachable or invariant sets with different time horizons are characterized by some non-zero sublevel sets of a value function. This value function is not obtained by solving a viscosity solution of the partial differential equation but by recursion and interpolation approximation. At the end of this paper, some examples are taken to illustrate the accuracy and generality of the proposed method.
△ Less
Submitted 29 November, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Scenario Generation for Cooling, Heating, and Power Loads Using Generative Moment Matching Networks
Authors:
Wenlong Liao,
Yusen Wang,
Yuelong Wang,
Kody Powell,
Qi Liu,
Zhe Yang
Abstract:
Scenario generations of cooling, heating, and power loads are of great significance for the economic operation and stability analysis of integrated energy systems. In this paper, a novel deep generative network is proposed to model cooling, heating, and power load curves based on a generative moment matching networks (GMMN) where an auto-encoder transforms high-dimensional load curves into low-dim…
▽ More
Scenario generations of cooling, heating, and power loads are of great significance for the economic operation and stability analysis of integrated energy systems. In this paper, a novel deep generative network is proposed to model cooling, heating, and power load curves based on a generative moment matching networks (GMMN) where an auto-encoder transforms high-dimensional load curves into low-dimensional latent variables and the maximum mean discrepancy represents the similarity metrics between the generated samples and the real samples. After training the model, the new scenarios are generated by feeding Gaussian noises to the scenario generator of the GMMN. Unlike the explicit density models, the proposed GMMN does not need to artificially assume the probability distribution of the load curves, which leads to stronger universality. The simulation results show that the GMMN not only fits the probability distribution of multi-class load curves well, but also accurately captures the shape (e.g., large peaks, fast ramps, and fluctuation), frequency-domain characteristics, and temporal-spatial correlations of cooling, heating, and power loads. Furthermore, the energy consumption of generated samples closely resembles that of real samples.
△ Less
Submitted 21 April, 2022; v1 submitted 5 February, 2021;
originally announced February 2021.
-
A Review of Graph Neural Networks and Their Applications in Power Systems
Authors:
Wenlong Liao,
Birgitte Bak-Jensen,
Jayakrishnan Radhakrishna Pillai,
Yuelong Wang,
Yusen Wang
Abstract:
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks is typically represented in Euclidean domains. Nevertheless, there is an increasing number of applications in power systems, where data are collected from non-Euclidean domains and represented as graph-structured data with high dimens…
▽ More
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks is typically represented in Euclidean domains. Nevertheless, there is an increasing number of applications in power systems, where data are collected from non-Euclidean domains and represented as graph-structured data with high dimensional features and interdependency among nodes. The complexity of graph-structured data has brought significant challenges to the existing deep neural networks defined in Euclidean domains. Recently, many publications generalizing deep neural networks for graph-structured data in power systems have emerged. In this paper, a comprehensive overview of graph neural networks (GNNs) in power systems is proposed. Specifically, several classical paradigms of GNNs structures (e.g., graph convolutional networks) are summarized, and key applications in power systems, such as fault scenario application, time series prediction, power flow calculation, and data generation are reviewed in detail. Furthermore, main issues and some research trends about the applications of GNNs in power systems are discussed.
△ Less
Submitted 12 June, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
An Improved Level Set Method for Reachability Problems in Differential Games
Authors:
Wei Liao,
Taotao Liang,
Pengwen Xiong,
Chen Wang,
Aiguo Song,
Peter X. Liu
Abstract:
This study focuses on reachability problems in differential games. An improved level set method for computing reachable tubes is proposed in this paper. The reachable tube is described as a sublevel set of a value function, which is the viscosity solution of a Hamilton-Jacobi equation with running cost. We generalize the concept of reachable tubes and propose a new class of reachable tubes, which…
▽ More
This study focuses on reachability problems in differential games. An improved level set method for computing reachable tubes is proposed in this paper. The reachable tube is described as a sublevel set of a value function, which is the viscosity solution of a Hamilton-Jacobi equation with running cost. We generalize the concept of reachable tubes and propose a new class of reachable tubes, which are referred to as cost-limited one. In particular, a performance index can be specified for the system, and a cost-limited reachable tube is a set of initial states of the system's trajectories that can reach the target set before the performance index increases to a given admissible cost. Such a reachable tube can be obtained by specifying the corresponding running cost function for the Hamilton-Jacobi equation. Different non-zero sublevel sets of the viscosity solution of the Hamilton-Jacobi equation at a certain time point can be used to characterize the cost-limited reachable tubes with different admissible costs (or the reachable tubes with different time horizons), thus reducing the storage space consumption. Several examples are provided to illustrate the validity and accuracy of the proposed method.
△ Less
Submitted 16 May, 2022; v1 submitted 23 January, 2021;
originally announced January 2021.
-
Recursive Regret Matching: A General Method for Solving Time-invariant Nonlinear Zero-sum Differential Games
Authors:
Wei Liao,
Xiaohui Wei,
Jizhou Lai
Abstract:
In this paper, a new method is proposed to compute the rolling Nash equilibrium of the time-invariant nonlinear two-person zero-sum differential games. The idea is to discretize the time to transform a differential game into a sequential game with several steps, and by introducing state-value function, transform the sequential game into a recursion consisting of several normal-form games, finally,…
▽ More
In this paper, a new method is proposed to compute the rolling Nash equilibrium of the time-invariant nonlinear two-person zero-sum differential games. The idea is to discretize the time to transform a differential game into a sequential game with several steps, and by introducing state-value function, transform the sequential game into a recursion consisting of several normal-form games, finally, each normal-form game is solved with action abstraction and regret matching. To improve the real-time property of the proposed method, the state-value function can be kept in memory. This method can deal with the situations that the saddle point exists or does not exist, and the analysises of the existence of the saddle point can be avoided. If the saddle point does not exist, the mixed optimal control pair can be obtained. At the end of this paper, some examples are taken to illustrate the validity of the proposed method.
△ Less
Submitted 12 November, 2020;
originally announced November 2020.
-
Automated Pancreas Segmentation Using Multi-institutional Collaborative Deep Learning
Authors:
Pochuan Wang,
Chen Shen,
Holger R. Roth,
Dong Yang,
Daguang Xu,
Masahiro Oda,
Kazunari Misawa,
Po-Ting Chen,
Kao-Lang Liu,
Wei-Chih Liao,
Weichung Wang,
Kensaku Mori
Abstract:
The performance of deep learning-based methods strongly relies on the number of datasets used for training. Many efforts have been made to increase the data in the medical image analysis field. However, unlike photography images, it is hard to generate centralized databases to collect medical images because of numerous technical, legal, and privacy issues. In this work, we study the use of federat…
▽ More
The performance of deep learning-based methods strongly relies on the number of datasets used for training. Many efforts have been made to increase the data in the medical image analysis field. However, unlike photography images, it is hard to generate centralized databases to collect medical images because of numerous technical, legal, and privacy issues. In this work, we study the use of federated learning between two institutions in a real-world setting to collaboratively train a model without sharing the raw data across national boundaries. We quantitatively compare the segmentation models obtained with federated learning and local training alone. Our experimental results show that federated learning models have higher generalizability than standalone training.
△ Less
Submitted 28 September, 2020;
originally announced September 2020.
-
Coupled Convolutional Neural Network with Adaptive Response Function Learning for Unsupervised Hyperspectral Super-Resolution
Authors:
Ke Zheng,
Lianru Gao,
Wenzhi Liao,
Danfeng Hong,
Bing Zhang,
Ximin Cui,
Jocelyn Chanussot
Abstract:
Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most…
▽ More
Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most of these methods assume that the prior information of the Point Spread Function (PSF) and Spectral Response Function (SRF) are known. However, in practice, this information is often limited or unavailable. In this work, an unsupervised deep learning-based fusion method - HyCoNet - that can solve the problems in HSI-MSI fusion without the prior PSF and SRF information is proposed. HyCoNet consists of three coupled autoencoder nets in which the HSI and MSI are unmixed into endmembers and abundances based on the linear unmixing model. Two special convolutional layers are designed to act as a bridge that coordinates with the three autoencoder nets, and the PSF and SRF parameters are learned adaptively in the two convolution layers during the training process. Furthermore, driven by the joint loss function, the proposed method is straightforward and easily implemented in an end-to-end training manner. The experiments performed in the study demonstrate that the proposed method performs well and produces robust results for different datasets and arbitrary PSFs and SRFs.
△ Less
Submitted 28 July, 2020;
originally announced July 2020.
-
SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing
Authors:
Xue Yang,
Junchi Yan,
Wenlong Liao,
Xiaokang Yang,
** Tang,
Tao He
Abstract:
Small and cluttered objects are common in real-world which are challenging for detection. The difficulty is further pronounced when the objects are rotated, as traditional detectors often routinely locate the objects in horizontal bounding box such that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first innovatively introduce the idea of d…
▽ More
Small and cluttered objects are common in real-world which are challenging for detection. The difficulty is further pronounced when the objects are rotated, as traditional detectors often routinely locate the objects in horizontal bounding box such that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first innovatively introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection to small and cluttered objects. To handle the rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long standing boundary problem, which to our analysis, is mainly caused by the periodicity of angular (PoA) and exchangeability of edges (EoE). By combing these two features, our proposed detector is termed as SCRDet++. Extensive experiments are performed on large aerial images public datasets DOTA, DIOR, UCAS-AOD as well as natural image dataset COCO, scene text dataset ICDAR2015, small traffic light dataset BSTLD and our released S$^2$TLD by this paper. The results show the effectiveness of our approach. The released dataset S2TLD is made public available, which contains 5,786 images with 14,130 traffic light instances across five categories.
△ Less
Submitted 28 April, 2022; v1 submitted 28 April, 2020;
originally announced April 2020.
-
Asynchronous Distributed ADMM for Large-Scale Optimization- Part II: Linear Convergence Analysis and Numerical Performance
Authors:
Tsung-Hui Chang,
Wei-Cheng Liao,
Mingyi Hong,
Xiangfeng Wang
Abstract:
The alternating direction method of multipliers (ADMM) has been recognized as a versatile approach for solving modern large-scale machine learning and signal processing problems efficiently. When the data size and/or the problem dimension is large, a distributed version of ADMM can be used, which is capable of distributing the computation load and the data set to a network of computing nodes. Unfo…
▽ More
The alternating direction method of multipliers (ADMM) has been recognized as a versatile approach for solving modern large-scale machine learning and signal processing problems efficiently. When the data size and/or the problem dimension is large, a distributed version of ADMM can be used, which is capable of distributing the computation load and the data set to a network of computing nodes. Unfortunately, a direct synchronous implementation of such algorithm does not scale well with the problem size, as the algorithm speed is limited by the slowest computing nodes. To address this issue, in a companion paper, we have proposed an asynchronous distributed ADMM (AD-ADMM) and studied its worst-case convergence conditions. In this paper, we further the study by characterizing the conditions under which the AD-ADMM achieves linear convergence. Our conditions as well as the resulting linear rates reveal the impact that various algorithm parameters, network delay and network size have on the algorithm performance. To demonstrate the superior time efficiency of the proposed AD-ADMM, we test the AD-ADMM on a high-performance computer cluster by solving a large-scale logistic regression problem.
△ Less
Submitted 8 September, 2015;
originally announced September 2015.
-
Asynchronous Distributed ADMM for Large-Scale Optimization- Part I: Algorithm and Convergence Analysis
Authors:
Tsung-Hui Chang,
Mingyi Hong,
Wei-Cheng Liao,
Xiangfeng Wang
Abstract:
Aiming at solving large-scale learning problems, this paper studies distributed optimization methods based on the alternating direction method of multipliers (ADMM). By formulating the learning problem as a consensus problem, the ADMM can be used to solve the consensus problem in a fully parallel fashion over a computer network with a star topology. However, traditional synchronized computation do…
▽ More
Aiming at solving large-scale learning problems, this paper studies distributed optimization methods based on the alternating direction method of multipliers (ADMM). By formulating the learning problem as a consensus problem, the ADMM can be used to solve the consensus problem in a fully parallel fashion over a computer network with a star topology. However, traditional synchronized computation does not scale well with the problem size, as the speed of the algorithm is limited by the slowest workers. This is particularly true in a heterogeneous network where the computing nodes experience different computation and communication delays. In this paper, we propose an asynchronous distributed ADMM (AD-AMM) which can effectively improve the time efficiency of distributed optimization. Our main interest lies in analyzing the convergence conditions of the AD-ADMM, under the popular partially asynchronous model, which is defined based on a maximum tolerable delay of the network. Specifically, by considering general and possibly non-convex cost functions, we show that the AD-ADMM is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points as long as the algorithm parameters are chosen appropriately according to the network delay. We further illustrate that the asynchrony of the ADMM has to be handled with care, as slightly modifying the implementation of the AD-ADMM can jeopardize the algorithm convergence, even under a standard convex setting.
△ Less
Submitted 19 February, 2016; v1 submitted 8 September, 2015;
originally announced September 2015.