-
Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation
Authors:
Miseul Kim,
Soo-Whan Chung,
Youna Ji,
Hong-Goo Kang,
Min-Seok Choi
Abstract:
This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by the target acoustic scene of the reference prompt. Specifically, AST-LDM is a latent diffusion model conditioned by CLAP embeddings that describe target acoustic scenes in either audio or text modalities. The contributions of this paper include introducing the AST task and implementing its baseline model. For AST-LDM, we emphasize its core framework, which is to preserve the input speech and generate audio consistently with both the given speech and the target acoustic environment. Experiments, including objective and subjective tests, validate the feasibility and efficacy of our approach.
Submitted 18 June, 2024;
originally announced June 2024.
-
Learning Force Control for Legged Manipulation
Authors:
Tifanny Portela,
Gabriel B. Margolis,
Yandong Ji,
Pulkit Agrawal
Abstract:
Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.
Submitted 20 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
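
For readers unfamiliar with the term, the impedance control mentioned in the abstract above usually means commanding a force that behaves like a virtual spring-damper around a target pose. The sketch below shows only that textbook law in one dimension. The function name, gains, and feedforward term are illustrative assumptions; the paper's controller is a learned RL policy and is not reproduced here.

```python
def impedance_force(x, v, x_des, v_des=0.0, stiffness=50.0, damping=5.0, f_ff=0.0):
    """Textbook 1-D impedance law: virtual spring-damper plus a feedforward force.

    All gains are hypothetical illustration values, not taken from the paper.
    """
    return stiffness * (x_des - x) + damping * (v_des - v) + f_ff


# End effector 2 cm below the target, moving towards it at 0.1 m/s.
force = impedance_force(x=0.98, v=0.1, x_des=1.0)
# A lower stiffness makes the same displacement produce a gentler (more compliant) response.
soft_force = impedance_force(x=0.98, v=0.1, x_des=1.0, stiffness=10.0)
```

Setting `f_ff` to the weight of a held payload would correspond to simple gravity compensation, and varying `stiffness` is one way to picture "variable compliance".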
-
Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving
Authors:
Hang Zhou,
Haichao Liu,
Hongliang Lu,
Dan Xu,
Jun Ma,
Yiding Ji
Abstract:
Recent years have seen growing research interest in applying Deep Neural Networks (DNNs) to autonomous vehicle technology. The trend started with perception and prediction a few years ago and is gradually extending to motion planning tasks. Although network performance improves over time, DNN planners inherit the natural drawbacks of deep learning. Learning-based planners have limitations in achieving perfect accuracy on the training dataset, and network performance can be affected by the out-of-distribution problem. In this paper, we propose FusionAssurance, a novel trajectory-based end-to-end driving fusion framework that incorporates physics-informed control for safety assurance. By incorporating a Potential Field into Model Predictive Control, FusionAssurance is capable of navigating through scenarios that are not included in the training dataset and scenarios where neural networks fail to generalize. The effectiveness of the approach is demonstrated by extensive experiments under various scenarios on the CARLA benchmark.
Submitted 5 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Taming Lookup Tables for Efficient Image Retouching
Authors:
Sidi Yang,
Binxiao Huang,
Mingdeng Cao,
Yatai Ji,
Hanzhong Guo,
Ngai Wong,
Yujiu Yang
Abstract:
The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT) that adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, upkeeping the performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order of magnitude faster than any CNN solution. Code is available at https://github.com/Stephen0808/ICELUT.
Submitted 28 March, 2024;
originally announced March 2024.
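
The general idea behind converting a pointwise mapping into a lookup table, as described in the abstract above, can be illustrated in a few lines: any function of a quantized input can be evaluated once for every possible input value and then replaced by indexing. In this sketch, `tone_curve` is a hypothetical stand-in for a trained pointwise network. ICELUT's actual tables, quantization scheme, and split fully connected layer are not reproduced.

```python
def tone_curve(value):
    """Hypothetical stand-in for a trained pointwise mapping on one 8-bit channel."""
    normalized = value / 255.0
    return round(255.0 * normalized ** 0.8)  # mild brightening gamma, chosen arbitrarily


# Precompute the mapping once for every possible 8-bit input.
lut = [tone_curve(v) for v in range(256)]


def enhance(pixels):
    """Apply the LUT to a list of (r, g, b) tuples; inference is pure table lookup."""
    return [(lut[r], lut[g], lut[b]) for r, g, b in pixels]


image = [(0, 128, 255), (64, 64, 64)]
enhanced = enhance(image)
```

After the table is built, no arithmetic on pixel values is needed at inference time, which is why this style of deployment suits constrained hardware.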
-
Physics Sensor Based Deep Learning Fall Detection System
Authors:
Zeyuan Qu,
Tiange Huang,
Yuxin Ji,
Yongjun Li
Abstract:
Fall detection based on embedded sensors has been a practical and popular research direction in recent years. In this application, fall detection methods based on physical sensors such as gyroscopes and accelerometers have typically used traditional hand-crafted features fed into machine learning models such as Markov chains, or simple threshold-based classification. In this paper, we build a complete system named TSFallDetect, comprising a data-receiving device based on embedded sensors, a platform for deploying deep-learning models on mobile devices, and a simple server that gathers models and data for future expansion. We also apply sequential deep-learning methods to the fall prediction problem, using data collected by inertial and film pressure sensors. We conduct an empirical study on existing datasets and on datasets collected with our system, which shows that deep-learning models have more potential than traditional methods. We also propose a new deep-learning model based on time series data to predict falls, which may be superior to other sequential models in this particular field.
Submitted 29 February, 2024;
originally announced March 2024.
-
Exploiting Manifold Structured Data Priors for Improved MR Fingerprinting Reconstruction
Authors:
Peng Li,
Yuping Ji,
Yue Hu
Abstract:
Estimating tissue parameter maps with high accuracy and precision from highly undersampled measurements presents one of the major challenges in MR fingerprinting (MRF). Many existing works project the recovered voxel fingerprints onto the Bloch manifold to improve reconstruction performance. However, little research focuses on exploiting the latent manifold structure priors among fingerprints. To fill this gap, we propose a novel MRF reconstruction framework based on manifold structured data priors. Since it is difficult to directly estimate the fingerprint manifold structure, we model the tissue parameters as points on a low-dimensional parameter manifold. We reveal that the fingerprint manifold shares the same intrinsic topology as the parameter manifold, although being embedded in different Euclidean spaces. To exploit the non-linear and non-local redundancies in MRF data, we divide the MRF data into spatial patches, and the similarity measurement among data patches can be accurately obtained using the Euclidean distance between the corresponding patches in the parameter manifold. The measured similarity is then used to construct the graph Laplacian operator, which represents the fingerprint manifold structure. Thus, the fingerprint manifold structure is introduced in the reconstruction framework by using the low-dimensional parameter manifold. Additionally, we incorporate the locally low-rank prior in the reconstruction framework to further utilize the local correlations within each patch for improved reconstruction performance. We also adopt a GPU-accelerated NUFFT library to accelerate reconstruction in non-Cartesian sampling scenarios. Experimental results demonstrate that our method can achieve significantly improved reconstruction performance with reduced computational time over the state-of-the-art methods.
Submitted 16 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
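
The step in the abstract above where patch similarity, measured by Euclidean distance in the parameter manifold, is turned into a graph Laplacian can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian weighting, the `sigma` value, the unnormalized form L = D - W, and the toy patch descriptors are all choices made here, since the abstract does not specify them.

```python
import math


def graph_laplacian(points, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W with Gaussian weights on Euclidean distance."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weights[i][j] = weights[j][i] = math.exp(-dist_sq / (2.0 * sigma ** 2))
    laplacian = [[-w for w in row] for row in weights]
    for i in range(n):
        laplacian[i][i] = sum(weights[i])  # degree on the diagonal
    return laplacian


# Hypothetical patch descriptors in a 2-D parameter space (arbitrary units).
patches = [(1.0, 0.1), (1.1, 0.1), (3.0, 0.5)]
L = graph_laplacian(patches)
```

Patches that are close in parameter space receive a large weight, so a regularizer built on `L` penalizes differences between their reconstructions more strongly than between dissimilar patches.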
-
High-risk Factor Prediction in Lung Cancer Using Thin CT Scans: An Attention-Enhanced Graph Convolutional Network Approach
Authors:
Xiaotong Fu,
Xiangyu Meng,
Jing Zhou,
Ying Ji
Abstract:
Lung cancer, particularly in its advanced stages, remains a leading cause of death globally. Though early detection via low-dose computed tomography (CT) is promising, the identification of high-risk factors crucial for surgical mode selection remains a challenge. Addressing this, our study introduces an Attention-Enhanced Graph Convolutional Network (AE-GCN) model to classify whether there are high-risk factors in stage I lung cancer based on preoperative CT images. This will aid surgeons in determining the optimal surgical method before the operation. Unlike previous studies that relied on 3D patch techniques to represent nodule spatial features, our method employs a GCN model to capture the spatial characteristics of pulmonary nodules. Specifically, we regard each slice of the nodule as a graph vertex, and the inherent spatial relationships between slices form the edges. Then, to enhance the expression of nodule features, we integrate both channel and spatial attention mechanisms with a pre-trained VGG model for adaptive feature extraction from pulmonary nodules. Lastly, the effectiveness of the proposed method is demonstrated using real-world data collected from hospitals, thereby emphasizing its potential utility in clinical practice.
Submitted 27 August, 2023;
originally announced August 2023.
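
The slice-as-vertex formulation in the abstract above can be illustrated with a generic graph convolution over a chain of slices. This is a sketch under stated assumptions, not the AE-GCN architecture: linking only adjacent slices, the symmetric normalization, the single ReLU layer, and all numeric values are choices made here for illustration.

```python
import math


def chain_adjacency(num_slices):
    """Adjacency with self-loops for a chain graph: slice i is linked to slices i-1 and i+1."""
    adj = [[0.0] * num_slices for _ in range(num_slices)]
    for i in range(num_slices):
        adj[i][i] = 1.0  # self-loop
        if i + 1 < num_slices:
            adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    size = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Four slices with hypothetical 2-D features (in practice, CNN embeddings per slice).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
weight = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical weights standing in for learned ones
hidden = gcn_layer(normalize(chain_adjacency(4)), features, weight)
```

Each output row mixes a slice's features with those of its neighbouring slices, which is how a graph layer can encode spatial context along the scan axis.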
-
Adaptive Fusion of Radiomics and Deep Features for Lung Adenocarcinoma Subtype Recognition
Authors:
Jing Zhou,
Xiaotong Fu,
Xirong Li,
Wei Feng,
Zhang Zhang,
Ying Ji
Abstract:
The most common type of lung cancer, lung adenocarcinoma (LUAD), has been increasingly detected since the advent of low-dose computed tomography screening technology. In clinical practice, pre-invasive LUAD (Pre-IAs) should only require regular follow-up care, while invasive LUAD (IAs) should receive immediate treatment with appropriate lung cancer resection, based on the cancer subtype. However, prior research on diagnosing LUAD has mainly focused on classifying Pre-IAs/IAs, as techniques for distinguishing different subtypes of IAs have been lacking. In this study, we proposed a multi-head attentional feature fusion (MHA-FF) model for not only distinguishing IAs from Pre-IAs, but also for distinguishing the different subtypes of IAs. To predict the subtype of each nodule accurately, we leveraged both radiomics and deep features extracted from computed tomography images. Furthermore, those features were aggregated through an adaptive fusion module that can learn attention-based discriminative features. The utility of our proposed method is demonstrated here by means of real-world data collected from a multi-center cohort.
Submitted 26 August, 2023;
originally announced August 2023.
-
EEG-based Emotion Style Transfer Network for Cross-dataset Emotion Recognition
Authors:
Yijin Zhou,
Fu Li,
Yang Li,
Youshuo Ji,
Lijian Zhang,
Yuanfang Chen,
Wenming Zheng,
Guangming Shi
Abstract:
As the key to realizing aBCIs, EEG emotion recognition has been widely studied by many researchers. Previous methods have performed well for intra-subject EEG emotion recognition. However, the style mismatch between source domain (training data) and target domain (test data) EEG samples caused by huge inter-domain differences is still a critical problem for EEG emotion recognition. To solve the problem of cross-dataset EEG emotion recognition, in this paper, we propose an EEG-based Emotion Style Transfer Network (E2STN) to obtain EEG representations that contain the content information of the source domain and the style information of the target domain, which we call stylized emotional EEG representations. These representations are helpful for cross-dataset discriminative prediction. Concretely, E2STN consists of three modules: a transfer module, a transfer evaluation module, and a discriminative prediction module. The transfer module encodes the domain-specific information of the source and target domains and then reconstructs the source domain's emotional pattern and the target domain's statistical characteristics into new stylized EEG representations. In this process, the transfer evaluation module constrains the generated representations so that they more precisely fuse the two kinds of complementary information from the source and target domains and avoid distortion. Finally, the generated stylized EEG representations are fed into the discriminative prediction module for final classification. Extensive experiments show that E2STN can achieve state-of-the-art performance on cross-dataset EEG emotion recognition tasks.
Submitted 9 August, 2023;
originally announced August 2023.
-
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders
Authors:
Doyeon Kim,
Soo-Whan Chung,
Hyewon Han,
Youna Ji,
Hong-Goo Kang
Abstract:
This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then restore missing signal components, our model performs these tasks in parallel using two heterogeneous decoder networks. Based on the U-Net style encoder-decoder framework, we attach an additional decoder so that each decoder network performs noise suppression or restoration separately. We carefully design each decoder architecture to operate appropriately depending on its objectives. Additionally, we improve performance by leveraging a learnable weighting factor that aggregates the two decoder output waveforms. Experimental results with objective metrics across various environments clearly demonstrate the effectiveness of our approach over single-decoder or multi-stage systems for the general speech restoration task.
Submitted 2 June, 2023;
originally announced June 2023.
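
One simple way to picture the learnable weighting factor that aggregates the two decoder outputs in the abstract above is a convex blend whose mixing coefficient comes from a trainable scalar. The sketch below assumes a single scalar passed through a sigmoid; the abstract does not state the actual form of the weighting, and the function name and sample values are hypothetical.

```python
import math


def fuse(suppressed, restored, logit):
    """Blend two decoder waveforms with alpha = sigmoid(logit), so 0 < alpha < 1."""
    alpha = 1.0 / (1.0 + math.exp(-logit))
    return [alpha * s + (1.0 - alpha) * r for s, r in zip(suppressed, restored)]


suppression_out = [0.2, -0.1, 0.0]   # hypothetical noise-suppression decoder output
restoration_out = [0.4, 0.1, -0.2]   # hypothetical restoration decoder output
mixed = fuse(suppression_out, restoration_out, logit=0.0)  # logit 0 gives alpha = 0.5
```

During training, `logit` would be updated by gradient descent alongside the network weights, letting the model decide how much each decoder contributes.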
-
An empirical study on speech restoration guided by self supervised speech representation
Authors:
Jaeuk Byun,
Youna Ji,
Soo Whan Chung,
Soyeon Choe,
Min Seok Choi
Abstract:
Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech representation learning on the speech restoration task. Specifically, we employ speech representation in various speech restoration networks and evaluate their performance under complicated distortion scenarios. Our experiments demonstrate that the contextual information provided by the self-supervised speech representation can enhance speech restoration performance in various distortion scenarios, while also increasing robustness against the duration of speech attenuation and mismatched test conditions.
Submitted 30 May, 2023;
originally announced May 2023.
-
Efficient Neural Music Generation
Authors:
Max W. Y. Lam,
Qiao Tian,
Tang Li,
Zongyu Yin,
Siyuan Feng,
Ming Tu,
Yuliang Ji,
Rui Xia,
Mingbo Ma,
Xuchen Song,
Jitong Chen,
Yuping Wang,
Yuxuan Wang
Abstract:
Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs for semantic, coarse acoustic, and fine acoustic modeling, respectively. Yet, sampling with MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for real-time generation. Efficient music generation with quality on par with MusicLM remains a significant challenge. In this paper, we present MeLoDy (M for music; L for LM; D for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while reducing the forward passes in MusicLM by 95.7% or 99.6% for sampling 10s or 30s of music, respectively. MeLoDy inherits the highest-level LM from MusicLM for semantic modeling, and applies a novel dual-path diffusion (DPD) model and an audio VAE-GAN to efficiently decode the conditioning semantic tokens into waveform. DPD is proposed to simultaneously model the coarse and fine acoustics by incorporating the semantic information into segments of latents effectively via cross-attention at each denoising step. Our experimental results suggest the superiority of MeLoDy, not only in its practical advantages in sampling speed and infinitely continuable generation, but also in its state-of-the-art musicality, audio quality, and text correlation.
Our samples are available at https://Efficient-MeLoDy.github.io/.
Submitted 25 May, 2023;
originally announced May 2023.
-
Generation of Structurally Realistic Retinal Fundus Images with Diffusion Models
Authors:
Sojung Go,
Younghoon Ji,
Sang Jun Park,
Soochahn Lee
Abstract:
We introduce a new technique for generating retinal fundus images that have anatomically accurate vascular structures, using diffusion models. We generate artery/vein masks to create the vascular structure, which we then use as a condition to produce retinal fundus images. The proposed method can generate high-quality images with more realistic vascular structures and can create a diverse range of images based on the strengths of the diffusion model. We present quantitative evaluations that demonstrate the performance improvement from using our method for data augmentation on vessel segmentation and artery/vein classification. We also present Turing test results by clinical experts, showing that our generated images are difficult to distinguish from real images. We believe that our method can be applied to construct stand-alone datasets that are free of patient privacy concerns.
Submitted 11 May, 2023;
originally announced May 2023.
-
Semi-Asynchronous Federated Edge Learning Mechanism via Over-the-air Computation
Authors:
Zhoubin Kou,
Yun Ji,
Xiaoxiong Zhong,
Sheng Zhang
Abstract:
Over-the-air Computation (AirComp) has been demonstrated as an effective transmission scheme to boost the efficiency of federated edge learning (FEEL). However, existing FEEL systems with the AirComp scheme often employ traditional synchronous aggregation mechanisms for local model aggregation in each global round, which suffer from the straggler issue. In this paper, we propose a semi-asynchronous aggregation FEEL mechanism with the AirComp scheme (PAOTA) to improve the training efficiency of the FEEL system in the case of significant heterogeneity in data and devices. Taking the staleness and divergence of model updates from edge devices into consideration, we minimize the convergence upper bound of the FEEL global model by adjusting the uplink transmit power of edge devices at each aggregation period. The simulation results demonstrate that our proposed algorithm achieves convergence performance close to that of the ideal Local SGD. Furthermore, with the same target accuracy, the training time required for PAOTA is less than that of the ideal Local SGD and the synchronous FEEL algorithm via AirComp.
Submitted 29 May, 2023; v1 submitted 6 May, 2023;
originally announced May 2023.
-
mHealth hyperspectral learning for instantaneous spatiospectral imaging of hemodynamics
Authors:
Yuhyun Ji,
Sang Mok Park,
Semin Kwon,
Jung Woo Leem,
Vidhya Vijayakrishnan Nair,
Yunjie Tong,
Young L. Kim
Abstract:
Hyperspectral imaging acquires data in both the spatial and frequency domains to offer abundant physical or biological information. However, conventional hyperspectral imaging has intrinsic limitations of bulky instruments, slow data acquisition rate, and spatiospectral tradeoff. Here we introduce hyperspectral learning for snapshot hyperspectral imaging in which sampled hyperspectral data in a small subarea are incorporated into a learning algorithm to recover the hypercube. Hyperspectral learning exploits the idea that a photograph is more than merely a picture and contains detailed spectral information. A small sampling of hyperspectral data enables spectrally informed learning to recover a hypercube from an RGB image. Hyperspectral learning is capable of recovering full spectroscopic resolution in the hypercube, comparable to high spectral resolutions of scientific spectrometers. Hyperspectral learning also enables ultrafast dynamic imaging, leveraging ultraslow video recording in an off-the-shelf smartphone, given that a video comprises a time series of multiple RGB images. To demonstrate its versatility, an experimental model of vascular development is used to extract hemodynamic parameters via statistical and deep-learning approaches. Subsequently, the hemodynamics of peripheral microcirculation is assessed at an ultrafast temporal resolution up to a millisecond, using a conventional smartphone camera. This spectrally informed learning method is analogous to compressed sensing; however, it further allows for reliable hypercube recovery and key feature extractions with a transparent learning algorithm. This learning-powered snapshot hyperspectral imaging method yields high spectral and temporal resolutions and eliminates the spatiospectral tradeoff, offering simple hardware requirements and potential applications of various machine-learning techniques.
Submitted 5 April, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Deep Learning-based Eye-Tracking Analysis for Diagnosis of Alzheimer's Disease Using 3D Comprehensive Visual Stimuli
Authors:
Fangyu Zuo,
Peiguang Jing,
Jinglin Sun,
Jizhong Duan,
Yong Ji,
Yu Liu
Abstract:
Alzheimer's Disease (AD) causes a continuous decline in memory, thinking, and judgment. Traditional diagnoses are usually based on clinical experience, which is limited by some realistic factors. In this paper, we focus on exploiting deep learning techniques to diagnose AD based on eye-tracking behaviors. Visual attention, as typical eye-tracking behavior, is of great clinical value to detect cognitive abnormalities in AD patients. To better analyze the differences in visual attention between AD patients and normals, we first conduct a 3D comprehensive visual task on a non-invasive eye-tracking system to collect visual attention heatmaps. We then propose a multi-layered comparison convolution neural network (MC-CNN) to distinguish the visual attention differences between AD patients and normals. In MC-CNN, the multi-layered representations of heatmaps are obtained by hierarchical convolution to better encode eye-movement behaviors, which are further integrated into a distance vector to benefit the comprehensive visual task. Extensive experimental results on the collected dataset demonstrate that MC-CNN achieves consistent validity in classifying AD patients and normals with eye-tracking data.
Submitted 13 March, 2023;
originally announced March 2023.
-
Target Controllability of Multiagent Systems under Directed Weighted Topology
Authors:
Yanan Ji,
Zhijian Ji,
Yungang Liu,
Chong Lin
Abstract:
In this paper, the target controllability of multiagent systems under directed weighted topology is studied. A graph partition is constructed in which part of the nodes, selected as leaders, are divided into different cells. The remaining nodes are divided by the maximum equitable partition. By taking advantage of reachable nodes and the graph partition, we provide a necessary and sufficient condition for the target controllability of a first-order multiagent system. It is shown that the system is target controllable if and only if each cell contains no more than one target node and there are no unreachable target nodes, with $\delta$-reachable nodes belonging to the same cell in the above graph partition. By means of controllability decomposition, a necessary and sufficient condition for the target controllability of the system is given, as well as a target node selection method to ensure target controllability. In a high-order multiagent system, once the topology, leaders, and target nodes are fixed, the target controllability of the high-order multiagent system is shown to be the same as that of the first-order one. This paper also considers a general linear system. If there is an independent strongly connected component that contains only target nodes and the general linear system is target controllable, then graph $\mathcal{G}$ is leader-target follower connected.
Submitted 10 February, 2023;
originally announced February 2023.
-
Diffusion-based Generative Speech Source Separation
Authors:
Robin Scheibler,
Youna Ji,
Soo-Whan Chung,
Jaeuk Byun,
Soyeon Choe,
Min-Seok Choi
Abstract:
We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse time SDE that progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0 2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.
Submitted 2 November, 2022; v1 submitted 31 October, 2022;
originally announced October 2022.
-
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Authors:
Yan Jia,
Mi Hong,
Jingyu Hou,
Kailong Ren,
Sifan Ma,
** Wang,
Fangzhen Peng,
Yinglin Ji,
Lin Yang,
Junjie Wang
Abstract:
This paper describes the LeVoice automatic speech recognition systems for Track 2 of the Intelligent Cockpit Speech Recognition Challenge 2022. Track 2 is a speech recognition task with no limit on model size. Our main points include deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques, and speech recognition model fusion. We compared and fused the hybrid architecture and two kinds of end-to-end architecture. For end-to-end modeling, we used models based on connectionist temporal classification/attention-based encoder-decoder architecture and recurrent neural network transducer/attention-based encoder-decoder architecture. The performance of these models is evaluated with an additional language model to improve word error rates. As a result, our system achieved a 10.2% character error rate on the challenge test set and ranked third among the systems submitted to the challenge.
Submitted 16 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
SOFFLFM: Super-resolution optical fluctuation Fourier light-field microscopy
Authors:
Haixin Huang,
Haoyuan Qiu,
Hanzhe Wu,
Yihong Ji,
Heng Li,
Bin Yu,
Danni Chen,
Junle Qu
Abstract:
Fourier light-field microscopy (FLFM) uses a micro-lens array (MLA) to segment the Fourier plane of the microscope objective lens and generate multiple two-dimensional perspective views, from which the three-dimensional (3D) structure of the sample is reconstructed by 3D deconvolution without scanning. However, the resolution of FLFM is still limited by diffraction and, furthermore, depends on the aperture division. To improve its resolution, super-resolution optical fluctuation Fourier light-field microscopy (SOFFLFM) is proposed here, in which the SOFI method, with its super-resolution capability, is introduced into FLFM. SOFFLFM applies higher-order cumulant statistical analysis to an image sequence collected by FLFM, and then carries out 3D deconvolution to reconstruct the 3D structure of the sample. The theoretical basis for the resolution improvement of SOFFLFM is explained and then verified with simulations. Simulation results demonstrate that, compared with FLFM, SOFFLFM improves lateral and axial resolution by more than $\sqrt{2}$ and 2 times with the 2nd- and 4th-order cumulants, respectively.
Submitted 26 August, 2022;
originally announced August 2022.
-
Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot
Authors:
Yandong Ji,
Zhongyu Li,
Yinan Sun,
Xue Bin Peng,
Sergey Levine,
Glen Berseth,
Koushil Sreenath
Abstract:
We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitations and motion stability during the control of a dynamic legged robot. Moreover, we need to consider motion planning to shoot the hard-to-model deformable ball rolling on the ground with uncertain friction to a desired location. In this paper, we propose a hierarchical framework that leverages deep reinforcement learning to train (a) a robust motion control policy that can track arbitrary motions and (b) a planning policy to decide the desired kicking motion to shoot a soccer ball to a target. We deploy the proposed framework on an A1 quadrupedal robot and enable it to accurately shoot the ball to random targets in the real world.
Submitted 1 August, 2022;
originally announced August 2022.
-
Designs, Motion Mechanism, Motion Coordination, and Communication of Bionic Robot Fishes: A Survey
Authors:
Zhiwei Yu,
Kai Li,
Yu Ji,
Simon X. Yang
Abstract:
In the last few years, there have been many new developments and significant accomplishments in the research of bionic robot fishes. However, in terms of swimming performance, existing bionic robot fishes lag far behind fish, prompting researchers to constantly develop innovative designs of various bionic robot fishes. In this paper, the latest designs of robot fishes are presented in detail, distinguished by the propulsion mode. New robot fishes mainly include soft robot fishes and rigid-soft coupled robot fishes. The latest progress in the study of the swimming mechanism is analyzed on the basis of summarizing the main swimming theories of fish. The current state-of-the-art research in the new field of motion coordination and communication of multiple robot fishes is summarized. The general research trend in robot fishes is to utilize more efficient and robust methods to best mimic real fish while exhibiting superior swimming performance. The current challenges and potential future research directions are discussed. Various methods are needed to narrow the gap in swimming performance between robot fishes and fish. This paper is a first step to bring together roboticists and marine biologists interested in learning state-of-the-art research on bionic robot fishes.
Submitted 30 June, 2022;
originally announced June 2022.
-
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
Authors:
Yuanfeng Ji,
Haotian Bai,
Jie Yang,
Chongjian Ge,
Ye Zhu,
Ruimao Zhang,
Zhen Li,
Lingyan Zhang,
Wanling Ma,
Xiang Wan,
Ping Luo
Abstract:
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.
Submitted 1 September, 2022; v1 submitted 16 June, 2022;
originally announced June 2022.
-
GMSS: Graph-Based Multi-Task Self-Supervised Learning for EEG Emotion Recognition
Authors:
Yang Li,
Ji Chen,
Fu Li,
Boxun Fu,
Hao Wu,
Youshuo Ji,
Yijin Zhou,
Yi Niu,
Guangming Shi,
Wenming Zheng
Abstract:
Previous electroencephalogram (EEG) emotion recognition relies on single-task learning, which may lead to overfitting and to learned emotion features that lack generalization. In this paper, a graph-based multi-task self-supervised learning model (GMSS) for EEG emotion recognition is proposed. GMSS has the ability to learn more general representations by integrating multiple self-supervised tasks, including spatial and frequency jigsaw puzzle tasks, and contrastive learning tasks. By learning from multiple tasks simultaneously, GMSS can find a representation that captures all of the tasks, thereby decreasing the chance of overfitting on the original task, i.e., the emotion recognition task. In particular, the spatial jigsaw puzzle task aims to capture the intrinsic spatial relationships of different brain regions. Considering the importance of frequency information in EEG emotional signals, the goal of the frequency jigsaw puzzle task is to explore the crucial frequency bands for EEG emotion recognition. To further regularize the learned features and encourage the network to learn inherent representations, a contrastive learning task is adopted in this work by mapping the transformed data into a common feature space. The performance of the proposed GMSS is compared with several popular unsupervised and supervised methods. Experiments on SEED, SEED-IV, and MPED datasets show that the proposed model has remarkable advantages in learning more discriminative and general features for EEG emotional signals.
Submitted 11 April, 2022;
originally announced May 2022.
-
TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding
Authors:
Ruiteng Zhang,
Jianguo Wei,
Xugang Lu,
Wenhuan Lu,
Di Jin,
Junhai Xu,
Lin Zhang,
Yantao Ji,
Jianwu Dang
Abstract:
Speaker embedding is an important front-end module to explore discriminative speaker features for many speech applications where speaker information is needed. Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation. However, naively adding many branches of multi-scale features with the simple fully convolutional operation could not efficiently improve the performance due to the rapid increase of model parameters and computational complexity. Therefore, in most current state-of-the-art network architectures, only a few branches corresponding to a limited number of temporal scales could be designed for speaker embeddings. To address this problem, in this paper, we propose an effective temporal multi-scale (TMS) model where multi-scale branches could be efficiently designed in a speaker embedding network almost without increasing computational costs. The new model is based on the conventional TDNN, where the network architecture is smartly separated into two modeling operators: a channel-modeling operator and a temporal multi-branch modeling operator. Adding temporal multi-scale in the temporal multi-branch operator requires only a slight increase in the number of parameters, and thus saves computational budget for adding more branches with large temporal scales. Moreover, in the inference stage, we further developed a systemic re-parameterization method to convert the TMS-based model into a single-path-based topology in order to increase inference speed. We investigated the performance of the new TMS method for automatic speaker verification (ASV) on in-domain and out-of-domain conditions. Results show that the TMS-based model obtained a significant increase in performance over the SOTA ASV models while also achieving a faster inference speed.
Submitted 17 March, 2022;
originally announced March 2022.
-
A Data Augmentation Method for Fully Automatic Brain Tumor Segmentation
Authors:
Yu Wang,
Yarong Ji,
Hongbing Xiao
Abstract:
Automatic segmentation of glioma and its subregions is of great significance for diagnosis, treatment and monitoring of disease. In this paper, an augmentation method, called TensorMixup, was proposed and applied to the three dimensional U-Net architecture for brain tumor segmentation. First, two image patches of size 128 in each of the three dimensions were selected, according to the glioma information in the ground truth labels, from the magnetic resonance imaging data of any two patients with the same modality. Next, a tensor in which all elements were independently sampled from a Beta distribution was used to mix the image patches. Then the tensor was mapped to a matrix which was used to mix the one-hot encoded labels of the above image patches. In this way, a new image and its one-hot encoded label were synthesized. Finally, the new data was used to train the model which could be used to segment glioma. The experimental results show that the mean Dice scores are 91.32%, 85.67%, and 82.20% on the whole tumor, tumor core, and enhancing tumor segmentation, respectively, which shows that the proposed TensorMixup is feasible and effective for brain tumor segmentation.
Submitted 17 February, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.
-
Online State Estimation for Supervisor Synthesis in Discrete-Event Systems with Communication Delays and Losses
Authors:
Yunfeng Hou,
Yunfeng Ji,
Gang Wang,
Ching-Yen Weng,
Qingdu Li
Abstract:
In the context of networked discrete-event systems (DESs), communication delays and losses exist between the plant and the supervisor for observation and between the supervisor and the actuator for control. In this paper, we first introduce a new framework for supervisory control of networked DESs. Under the introduced framework, we address the state estimation problem for supervisor synthesis of networked DESs with both communication delays and losses. The estimation algorithm considers the effect of the controls imposed on the system. Additionally, the estimation algorithm is based on the control decisions available up to the moment, and all the future control decisions are assumed to be unknowable. Two notions, called "observation channel configuration" for tracking observation delays and losses and "control channel configuration" for tracking control delays and losses, are defined. Then, we introduce an online approach for state estimation of the controlled system. Compared with the existing approach, the proposed approach under the introduced framework can estimate the state of the controlled system more accurately. As an application of the proposed approach, we finally show that the existing methods can be easily applied to synthesize maximally permissible and safe networked supervisors.
Submitted 6 October, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
-
Progressive Graph Convolution Network for EEG Emotion Recognition
Authors:
Yijin Zhou,
Fu Li,
Yang Li,
Youshuo Ji,
Guangming Shi,
Wenming Zheng,
Lijian Zhang,
Yuanfang Chen,
Rui Cheng
Abstract:
Studies in the area of neuroscience have revealed the relationship between emotional patterns and brain functional regions, demonstrating that dynamic relationships between different brain regions are an essential factor affecting emotion recognition determined through electroencephalography (EEG). Moreover, in EEG emotion recognition, we can observe that clearer boundaries exist between coarse-grained emotions than those between fine-grained emotions, based on the same EEG data; this indicates the concurrence of large coarse- and small fine-grained emotion variations. Thus, the progressive classification process from coarse- to fine-grained categories may be helpful for EEG emotion recognition. Consequently, in this study, we propose a progressive graph convolution network (PGCN) for capturing this inherent characteristic in EEG emotional signals and progressively learning the discriminative EEG features. To fit different EEG patterns, we constructed a dual-graph module to characterize the intrinsic relationship between different EEG channels, containing the dynamic functional connections and static spatial proximity information of brain regions from neuroscience research. Moreover, motivated by the observation of the relationship between coarse- and fine-grained emotions, we adopt a dual-head module that enables the PGCN to progressively learn more discriminative EEG features, from coarse-grained (easy) to fine-grained categories (difficult), referring to the hierarchical characteristic of emotion. To verify the performance of our model, extensive experiments were conducted on two public datasets: SEED-IV and multi-modal physiological emotion database (MPED).
Submitted 13 December, 2021;
originally announced December 2021.
-
CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization
Authors:
Ruiteng Zhang,
Jianguo Wei,
Wenhuan Lu,
Lin Zhang,
Yantao Ji,
Junhai Xu,
Xugang Lu
Abstract:
Automatic speaker verification (ASV) systems, which determine whether two utterances are from the same speaker, mainly focus on verification accuracy while ignoring inference speed. However, in real applications, both inference speed and verification accuracy are essential. This study proposes cross-sequential re-parameterization (CS-Rep), a novel topology re-parameterization strategy for multi-type networks, to increase the inference speed and verification accuracy of models. CS-Rep solves the problem that existing re-parameterization methods are unsuitable for typical ASV backbones. When a model applies CS-Rep, the training-period network utilizes a multi-branch topology to capture speaker information, whereas the inference-period model converts to a time-delay neural network (TDNN)-like plain backbone with stacked TDNN layers to achieve fast inference. Based on CS-Rep, an improved TDNN that is friendly to testing and deployment, called Rep-TDNN, is proposed. Compared with the state-of-the-art model ECAPA-TDNN, which is highly recognized in the industry, Rep-TDNN increases the actual inference speed by about 50% and reduces the EER by 10%. The code will be released.
Submitted 3 April, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
A New Approach for Verification of Delay Coobservability of Discrete-Event Systems
Authors:
Yunfeng Hou,
Qingdu Li,
Yunfeng Ji,
Gang Wang,
Ching-Yen Weng
Abstract:
In decentralized networked supervisory control of discrete-event systems (DESs), the local supervisors observe event occurrences subject to observation delays to make correct control decisions. Delay coobservability describes whether these local supervisors can make sufficient observations. In this paper, we provide an efficient way to verify delay coobservability. For each controllable event, we partition the specification language into a finite number of sets such that strings in different sets have different lengths. For each of the sets, we construct a verifier to check if delay coobservability holds for the controllable event. The computational complexity of the proposed approach is polynomial with respect to the number of states, the number of events, and the upper bounds on observation delays and only exponential with respect to the number of local supervisors. It has lower complexity order than the existing approaches. In addition, we investigate the relationship between the decentralized supervisory control of networked DESs and the decentralized fault diagnosis of networked DESs and show that delay $K$-codiagnosability is transformable to delay coobservability. Thus, techniques for the verification of delay coobservability can be leveraged to verify delay $K$-codiagnosability.
Submitted 19 May, 2022; v1 submitted 1 October, 2021;
originally announced October 2021.
-
The Medical Segmentation Decathlon
Authors:
Michela Antonelli,
Annika Reinke,
Spyridon Bakas,
Keyvan Farahani,
Annette Kopp-Schneider,
Bennett A. Landman,
Geert Litjens,
Bjoern Menze,
Olaf Ronneberger,
Ronald M. Summers,
Bram van Ginneken,
Michel Bilello,
Patrick Bilic,
Patrick F. Christ,
Richard K. G. Do,
Marc J. Gollub,
Stephan H. Heckers,
Henkjan Huisman,
William R. Jarnagin,
Maureen K. McHugo,
Sandy Napel,
Jennifer S. Goli Pernicka,
Kawal Rhode,
Catalina Tobon-Gomez,
Eugene Vorontsov
, et al. (34 additional authors not shown)
Abstract:
International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD), a biomedical image analysis challenge in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with a consistently good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non-AI experts.
Submitted 10 June, 2021;
originally announced June 2021.
-
Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems
Authors:
Xianfu Chen,
Celimuge Wu,
Tao Chen,
Zhi Liu,
Honggang Zhang,
Mehdi Bennis,
Hang Liu,
Yusheng Ji
Abstract:
This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system, which is deployed by an infrastructure provider (InP). A third-party real-time application service provider provides computing services to the subscribed mobile users (MUs) with the limited communication and computation resources from the InP based on a long-term business agreement. Due to the dynamic characteristics, the interactions among the MUs are modelled by a non-cooperative stochastic game, in which the control policies are coupled and each MU aims to selfishly maximize its own expected long-term payoff. To address the Nash equilibrium solutions, we propose that each MU behaves in accordance with the local system states and conjectures, based on which the stochastic game is transformed into a single-agent Markov decision process. Moreover, we derive a novel online deep reinforcement learning (RL) scheme that adopts two separate double deep Q-networks for each MU to approximate the Q-factor and the post-decision Q-factor. Using the proposed deep RL scheme, each MU in the system is able to make decisions without a priori statistical knowledge of dynamics. Numerical experiments examine the potentials of the proposed scheme in balancing the age of information and the energy consumption.
Submitted 15 July, 2020;
originally announced July 2020.
-
Distributed Consensus of Nonlinear Multi-Agent Systems With Mismatched Uncertainties and Unknown High-Frequency Gains (Extended Version)
Authors:
Gang Wang,
Chaoli Wang,
Zhengtao Ding,
Yunfeng Ji
Abstract:
This brief addresses the distributed consensus problem of nonlinear multi-agent systems under a general directed communication topology. Each agent is governed by higher-order dynamics with mismatched uncertainties, multiple completely unknown high-frequency gains, and external disturbances. The main contribution of this brief is to present a new distributed consensus algorithm, enabling the control input of each agent to require minimal information from its neighboring agents, that is, only their output information. To this end, a dynamic system is explicitly constructed for each agent to generate a reference output. Theoretical and simulation verifications of the proposed algorithm are rigorously studied to ensure that asymptotic consensus can be achieved and that all closed-loop signals remain bounded.
Submitted 1 June, 2020;
originally announced June 2020.
-
Multi-cell Edge Coverage Enhancement Using Mobile UAV-Relay
Authors:
Yukuan Ji,
Zhaohui Yang,
Hong Shen,
Wei Xu,
Kezhi Wang,
Xiaodai Dong
Abstract:
Unmanned aerial vehicle (UAV)-assisted communication is a promising technology in future wireless communication networks. UAVs can not only help offload data traffic from ground base stations (GBSs), but also improve the quality of service of cell-edge users (CEUs). In this paper, we consider the enhancement of cell-edge communications through a mobile relay, i.e., a UAV, in multi-cell networks. During each transmission period, GBSs first send data to the UAV, and then the UAV forwards its received data to CEUs according to a certain association strategy. To maximize the sum rate of all CEUs, we jointly optimize the UAV mobility management, including trajectory, velocity, and acceleration, and the association strategy of CEUs to the UAV, subject to minimum rate requirements of CEUs, mobility constraints of the UAV, and causal buffer constraints in practice. To address the mixed-integer nonconvex problem, we transform it into two convex subproblems by applying tight bounds and relaxations. An iterative algorithm is proposed to solve the two subproblems in an alternating manner. Numerical results show that the proposed algorithm achieves higher CEU rates than existing benchmark schemes.
Submitted 3 May, 2020;
originally announced May 2020.
-
Super Resolution Convolutional Neural Network for Feature Extraction in Spectroscopic Data
Authors:
Han Peng,
Xiang Gao,
Yu He,
Yiwei Li,
Yuchen Ji,
Chuhang Liu,
Sandy A. Ekahana,
Ding Pei,
Zhongkai Liu,
Zhixun Shen,
Yulin Chen
Abstract:
Two-dimensional (2D) peak finding is a common practice in data analysis for physics experiments and is typically achieved by computing the local derivatives. However, this method is inherently unstable when the local landscape is complicated or the signal-to-noise ratio of the data is low. In this work, we propose a new method in which the peak tracking task is formalized as an inverse problem and can thus be solved with a convolutional neural network (CNN). In addition, we show that the underlying physics principle of the experiments can be used to generate the training data. By applying the trained neural network to real experimental data, we show that the CNN method can achieve comparable or better results than traditional derivative-based methods. This approach can be further generalized to different physics experiments when the physical process is known.
Submitted 29 January, 2020;
originally announced January 2020.
-
Orthonormal Embedding-based Deep Clustering for Single-channel Speech Separation
Authors:
Soyeon Choe,
Soo-Whan Chung,
Youna Ji,
Hong-Goo Kang
Abstract:
Deep clustering is a deep neural network-based speech separation algorithm that first learns high-dimensional embeddings of the mixed signal components, and then uses a clustering algorithm to separate the sources in each mixture. In this paper, we extend the baseline criterion of deep clustering with an additional regularization term to further improve the overall performance. This term constrains the embeddings so that the embedding dimensions are less correlated with one another, leading to better decomposition of the spectral bins. The regularization term helps to mitigate the unavoidable permutation problem in the conventional deep clustering method, which enables better clustering through the formation of optimal embeddings. We evaluate the results by varying the embedding dimension, signal-to-interference ratio (SIR), and gender dependency. The performance comparison with the source separation measurement metric, i.e., signal-to-distortion ratio (SDR), confirms that the proposed method outperforms the conventional deep clustering method.
Submitted 15 January, 2019;
originally announced January 2019.
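The decorrelating role of the regularization term in the abstract above can be illustrated with a minimal sketch. The abstract does not give the term's exact form, so the penalty below (the sum of squared off-diagonal entries of the Gram matrix V^T V) is a stand-in chosen for illustration, not the authors' formulation. The function names are hypothetical.

```python
from typing import List, Sequence


def gram_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return V^T V (D x D) for an N x D embedding matrix V given as rows."""
    if not embeddings:
        raise ValueError("embeddings must contain at least one row")
    dim = len(embeddings[0])
    gram = [[0.0] * dim for _ in range(dim)]
    for row in embeddings:
        if len(row) != dim:
            raise ValueError("all embedding rows must have the same length")
        for i in range(dim):
            for j in range(dim):
                gram[i][j] += row[i] * row[j]
    return gram


def decorrelation_penalty(embeddings: Sequence[Sequence[float]]) -> float:
    """Sum of squared off-diagonal Gram entries (hypothetical stand-in penalty).

    The value is zero when the embedding dimensions (columns of V) are
    mutually orthogonal, and grows as they become more correlated.
    """
    gram = gram_matrix(embeddings)
    dim = len(gram)
    return sum(
        gram[i][j] ** 2 for i in range(dim) for j in range(dim) if i != j
    )
```

In a training loop, a penalty of this kind would be added to the baseline deep clustering objective with a weighting factor. That weight is a tuning choice the abstract does not specify.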
-
Electronics of Time-of-flight Measurement for Back-n at CSNS
Authors:
T. Yu,
P. Cao,
X. Y. Ji,
L. K. Xie,
X. R. Huang,
Q. An,
H. Y. Bai,
J. Bao,
Y. H. Chen,
P. J. Cheng,
Z. Q. Cui,
R. R. Fan,
C. Q. Feng,
M. H. Gu,
Z. J. Han,
G. Z. He,
Y. C. He,
Y. F. He,
H. X. Huang,
W. L. Huang,
X. L. Ji,
H. Y. Jiang,
W. Jiang,
H. Y. Jing,
L. Kang
, et al. (46 additional authors not shown)
Abstract:
Back-n is a white neutron experimental facility at China Spallation Neutron Source (CSNS). The time structure of the primary proton beam makes the TOF (time-of-flight) method fully applicable to neutron energy measurement. We implement the electronics of TOF measurement on the general-purpose readout electronics designed for all seven detectors in Back-n. The electronics is based on the PXIe (Peripheral Component Interconnect Express eXtensions for Instrumentation) platform and is composed of FDM (Field Digitizer Modules), TCM (Trigger and Clock Module), and SCM (Signal Conditioning Module). The T0 signal, synchronous to the CSNS accelerator, represents the neutron emission from the target and serves as the start time stamp. The TCM receives, synchronizes, and distributes the T0 signal to each FDM over the PXIe backplane bus. Meanwhile, the conditioned detector signals are fed into the FDMs for waveform digitizing. The first sample point of the signal is the stop time stamp. From the start and stop time stamps and the over-threshold time of the signal, the total TOF can be obtained. An FPGA-based (Field Programmable Gate Array) TDC is implemented on the TCM to accurately acquire the time interval between the asynchronous T0 signal and the global synchronous clock phase. There is also an FPGA-based TDC on the FDM to accurately acquire the time interval between T0 arriving at the FDM and the first sample point of the detector signal; the over-threshold time of the signal is obtained offline. This method for TOF measurement is efficient and requires no additional modules. Test results show that the TOF accuracy is sub-nanosecond and meets the requirement for Back-n at CSNS.
Submitted 24 June, 2018;
originally announced June 2018.
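The TOF-to-energy step that motivates the Back-n electronics above can be sketched as follows. The relativistic relation between flight time and kinetic energy is standard physics. The rule for combining the start stamp, stop stamp, and over-threshold time, along with the function names and the flight path length used in the example, are assumptions for illustration and are not taken from the paper.

```python
import math

NEUTRON_REST_ENERGY_MEV = 939.56542    # neutron rest energy m_n * c^2, in MeV
SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # speed of light, in metres per nanosecond


def total_tof_ns(start_ns: float, stop_ns: float,
                 over_threshold_offset_ns: float) -> float:
    """Combine the time stamps into a total TOF.

    Assumed rule: (stop - start) plus the offset of the threshold crossing
    measured from the first sample point of the digitized waveform.
    """
    tof = (stop_ns - start_ns) + over_threshold_offset_ns
    if tof <= 0:
        raise ValueError("total TOF must be positive")
    return tof


def neutron_kinetic_energy_mev(tof_ns: float, flight_path_m: float) -> float:
    """Relativistic kinetic energy E = m c^2 (gamma - 1) from TOF and path."""
    if tof_ns <= 0 or flight_path_m <= 0:
        raise ValueError("TOF and flight path must be positive")
    beta = flight_path_m / (tof_ns * SPEED_OF_LIGHT_M_PER_NS)
    if beta >= 1.0:
        raise ValueError("TOF implies a speed at or above the speed of light")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return NEUTRON_REST_ENERGY_MEV * (gamma - 1.0)


if __name__ == "__main__":
    # Hypothetical numbers: a 60 m flight path and illustrative time stamps.
    tof = total_tof_ns(start_ns=0.0, stop_ns=1000.0, over_threshold_offset_ns=12.5)
    print(neutron_kinetic_energy_mev(tof, flight_path_m=60.0))
```

Because energy depends on the square of the velocity, a timing error matters most for fast neutrons with short flight times, which is why sub-nanosecond TOF accuracy is relevant.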
-
T0 Fan-out for Back-n White Neutron Facility at CSNS
Authors:
X. Y. Ji,
P. Cao,
T. Yu,
L. K. Xie,
X. R. Huang,
Q. An,
H. Y. Bai,
J. Bao,
Y. H. Chen,
P. J. Cheng,
Z. Q. Cui,
R. R. Fan,
C. Q. Feng,
M. H. Gu,
Z. J. Han,
G. Z. He,
Y. C. He,
Y. F. He,
H. X. Huang,
W. L. Huang,
X. L. Ji,
H. Y. Jiang,
W. Jiang,
H. Y. Jing,
L. Kang
, et al. (46 additional authors not shown)
Abstract:
The main physics goal of the Back-n white neutron facility at China Spallation Neutron Source (CSNS) is to measure nuclear data. The energy of neutrons is one of the most important parameters for measuring nuclear data. The time-of-flight (TOF) method is used to obtain the energy of neutrons. The time when proton bunches hit the thick tungsten target is considered the start point of TOF. The T0 signal, generated from the CSNS accelerator, represents this start time. In addition, the T0 signal is used as the gate control signal that triggers the readout electronics. The timing precision of T0 therefore directly affects the measurement precision of TOF and controls the running of the readout electronics. In this paper, the T0 fan-out for the Back-n white neutron facility at CSNS is proposed. The T0 signal travelling from the CSNS accelerator is fanned out to the two underground experiment stations over long cables. To guarantee the timing precision, the T0 signal is conditioned to have a good signal edge. Furthermore, signal pre-emphasis and equalization techniques are used to improve signal quality after T0 is transmitted over long cables of about 100 m. Experiments show that the T0 fan-out works well: the T0 signal transmitted over 100 m retains a good time resolution, with a standard deviation of 25 ps. This meets the required accuracy of the TOF measurement.
Submitted 24 June, 2018;
originally announced June 2018.
-
Stochastic Interchange Scheduling in the Real-Time Electricity Market
Authors:
Yuting Ji,
Tongxin Zheng,
Lang Tong
Abstract:
The problem of multi-area interchange scheduling in the presence of stochastic generation and load is considered. A new interchange scheduling technique based on a two-stage stochastic minimization of overall expected operating cost is proposed. Because directly solving the stochastic optimization is intractable, an equivalent problem that maximizes the expected social welfare is formulated. The proposed technique leverages the operator's capability of forecasting locational marginal prices (LMPs) and obtains the optimal interchange schedule without iterations among operators.
Submitted 10 January, 2016;
originally announced January 2016.
-
Probabilistic Forecast of Real-Time LMP and Network Congestion
Authors:
Yuting Ji,
Robert J. Thomas,
Lang Tong
Abstract:
The short-term forecasting of real-time locational marginal price (LMP) and network congestion is considered from a system operator perspective. A new probabilistic forecasting technique is proposed based on a multiparametric programming formulation that partitions the uncertainty parameter space into critical regions from which the conditional probability distribution of the real-time LMP/congestion is obtained. The proposed method incorporates load/generation forecasts, time-varying operation constraints, and contingency models. By shifting the computation cost associated with multiparametric programs offline, the online computation cost is significantly reduced. An online simulation technique that generates critical regions dynamically is also proposed, reducing the computational cost by several orders of magnitude relative to standard Monte Carlo methods.
Submitted 24 June, 2016; v1 submitted 20 March, 2015;
originally announced March 2015.