Search | arXiv e-print repository

Efficient UAV Hovering, Resource Allocation, and Trajectory Design for ISAC with Limited Backhaul Capacity

Authors: Ata Khalili, Atefeh Rezaei, Dongfang Xu, Falko Dressler, Robert Schober

Abstract: In this paper, we investigate the joint resource allocation and trajectory design for a multi-user, multi-target unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) system, where the link capacity between a ground base station (BS) and the UAV is limited. The UAV conducts target sensing and information transmission in orthogonal time slots to prevent interference. As is common in practical systems, sensing is performed while the UAV hovers, allowing the UAV to acquire high-quality sensing data. Subsequently, the acquired sensing data is offloaded to the ground BS for further processing. We jointly optimize the UAV trajectory, UAV velocity, beamforming for the communication users, power allocated to the sensing beam, and time of hovering for sensing to minimize the power consumption of the UAV while ensuring the communication quality of service (QoS) and successful sensing. Due to the prohibitively high complexity of the resulting non-convex mixed integer non-linear program (MINLP), we employ a series of transformations and optimization techniques, including semidefinite relaxation, big-M method, penalty approach, and successive convex approximation, to obtain a low-complexity suboptimal solution.
Our simulation results reveal that 1) the proposed design achieves significant power savings compared to two baseline schemes; 2) stricter sensing requirements lead to longer sensing times, highlighting the challenge of efficiently managing both sensing accuracy and sensing time; 3) the optimized trajectory design ensures precise hovering directly above the targets during sensing, enhancing sensing quality and enabling the application of energy-focused beams; and 4) the proposed trajectory design balances the capacity of the backhaul link and the downlink rate of the communication users.

Submitted 30 April, 2024; originally announced June 2024.

Comments: Submitted to IEEE for possible publications. arXiv admin note: text overlap with arXiv:2302.10124

arXiv:2406.10897 [pdf, ps, other]

When NOMA Meets AIGC: Enhanced Wireless Federated Learning

Authors: Ding Xu, Lingjie Duan, Hongbo Zhu

Abstract: Wireless federated learning (WFL) enables devices to collaboratively train a global model via local model training, uploading and aggregating. However, WFL faces the data scarcity/heterogeneity problem (i.e., data are limited and unevenly distributed among devices) that degrades the learning performance. In this regard, artificial intelligence generated content (AIGC) can synthesize various types of data to compensate for the insufficient local data. Nevertheless, downloading synthetic data or uploading local models iteratively takes a lot of time, especially for a large number of devices. To address this issue, we propose to leverage non-orthogonal multiple access (NOMA) to achieve efficient synthetic data and local model transmission. This paper is the first to combine AIGC and NOMA with WFL to maximally enhance the learning performance. For the proposed NOMA+AIGC-enhanced WFL, the problem of jointly optimizing the synthetic data distribution, two-way communication and computation resource allocation to minimize the global learning error is investigated. The problem belongs to NP-hard mixed integer nonlinear programming, whose optimal solution is intractable to find. We first employ the block coordinate descent method to decouple the complicated-coupled variables, and then resort to our analytical method to derive an efficient low-complexity local optimal solution with partial closed-form results.
Extensive simulations validate the superiority of the proposed scheme compared to the existing and benchmark schemes such as the frequency/time division multiple access based AIGC-enhanced schemes.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages, submitted to IEEE TWC for possible publication

arXiv:2406.10895 [pdf, ps, other]

Fair Computation Offloading for RSMA-Assisted Mobile Edge Computing Networks

Authors: Ding Xu, Lingjie Duan, Haitao Zhao, Hongbo Zhu

Abstract: Rate splitting multiple access (RSMA) provides a flexible transmission framework that can be applied in mobile edge computing (MEC) systems. However, the research work on RSMA-assisted MEC systems is still in its infancy and many design issues remain unsolved, such as the MEC server and channel allocation problem in general multi-server and multi-channel scenarios as well as the user fairness issues. In this regard, we study an RSMA-assisted MEC system with multiple MEC servers, channels and devices, and consider the fairness among devices. A max-min fairness computation offloading problem to maximize the minimum computation offloading rate is investigated. Since the problem is difficult to solve optimally, we develop an efficient algorithm to obtain a suboptimal solution. Particularly, the time allocation and the computing frequency allocation are derived as closed-form functions of the transmit power allocation and the successive interference cancellation (SIC) decoding order, while the SIC decoding order is obtained heuristically, and the bisection search and the successive convex approximation methods are employed to optimize the transmit power allocation. For the MEC server and channel allocation problem, we transform it into a hypergraph matching problem and solve it by matching theory. Simulation results demonstrate that the proposed RSMA-assisted MEC system outperforms current MEC systems under various system setups.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages, submitted to IEEE TWC for possible publication

arXiv:2405.12357 [pdf]

Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI reconstruction time while maintaining the reconstruction quality. Methods: Patients who underwent free-breathing liver 4D MRI were included in the study. Fully- and retrospectively under-sampled data at 3, 6 and 10 times (3x, 6x and 10x) were first reconstructed using the nuFFT algorithm. Re-Con-GAN then trained input and output in pairs. Three types of networks, ResNet9, UNet and reconstruction swin transformer, were explored as generators. PatchGAN was selected as the discriminator. Re-Con-GAN processed the data (3D+t) as temporal slices (2D+t). A total of 48 patients with 12332 temporal slices were split into training (37 patients with 10721 slices) and test (11 patients with 1611 slices). Results: Re-Con-GAN consistently achieved comparable/better PSNR, SSIM, and RMSE scores compared to CS/UNet models. The inference times of Re-Con-GAN, UNet and CS are 0.15s, 0.16s, and 120s, respectively. The GTV detection task showed that Re-Con-GAN and CS, compared to UNet, better improved the Dice score (3x Re-Con-GAN 80.98%; 3x CS 80.74%; 3x UNet 79.88%) of unprocessed under-sampled images (3x 69.61%).
Conclusion: A generative network with adversarial training is proposed with promising and efficient reconstruction results demonstrated on an in-house dataset. The rapid and qualitative reconstruction of 4D liver MR has the potential to facilitate online adaptive MR-guided radiotherapy for liver cancer.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11093 [pdf, other]

AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

Authors: David Xu

Abstract: Multi-modal learning in the audio-language domain has seen significant advancements in recent years. However, audio-language learning faces challenges due to limited and lower-quality data compared to image-language tasks. Existing audio-language datasets are notably smaller, and manual labeling is hindered by the need to listen to entire audio clips for accurate labeling. Our method systematically generates audio-caption pairs by augmenting audio clips with natural language labels and corresponding audio signal processing operations. Leveraging a Large Language Model, we generate descriptions of augmented audio clips with a prompt template. This scalable method produces AudioSetMix, a high-quality training dataset for text-and-audio related models. Integration of our dataset improves model performance on benchmarks by providing diversified and better-aligned examples. Notably, our dataset addresses the absence of modifiers (adjectives and adverbs) in existing datasets. By enabling models to learn these concepts, and generating hard negative examples during training, we achieve state-of-the-art performance on multiple benchmarks.

Submitted 7 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: typos corrected

arXiv:2405.04274 [pdf, other]

Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression

Authors: Zhenghao Chen, Lu** Zhou, Zhihao Hu, Dong Xu

Abstract: Content-adaptive compression is crucial for enhancing the adaptability of the pre-trained neural codec for various contents. Although these methods have been very practical in neural image compression (NIC), their application in neural video compression (NVC) is still limited due to two main aspects: 1), video compression relies heavily on temporal redundancy, therefore updating just one or a few frames can lead to significant errors accumulating over time; 2), NVC frameworks are generally more complex, with many large components that are not easy to update quickly during encoding. To address the previously mentioned challenges, we have developed a content-adaptive NVC technique called Group-aware Parameter-Efficient Updating (GPU). Initially, to minimize error accumulation, we adopt a group-aware approach for updating encoder parameters. This involves adopting a patch-based Group of Pictures (GoP) training strategy to segment a video into patch-based GoPs, which will be updated to facilitate a globally optimized domain-transferable solution. Subsequently, we introduce a parameter-efficient delta-tuning strategy, which is achieved by integrating several light-weight adapters into each coding component of the encoding process by both serial and parallel configuration. Such architecture-agnostic modules stimulate the components with large parameters, thereby reducing both the update cost and the encoding time.
We incorporate our GPU into the latest NVC framework and conduct comprehensive experiments, whose results showcase outstanding video compression efficiency across four video benchmarks and adaptability of one medical image benchmark.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00316 [pdf, other]

Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving

Authors: Hang Zhou, Haichao Liu, Hongliang Lu, Dan Xu, Jun Ma, Yiding Ji

Abstract: Recent years have seen a growing research interest in applications of Deep Neural Networks (DNN) on autonomous vehicle technology. The trend started with perception and prediction a few years ago and it is gradually being applied to motion planning tasks. Although the performance of networks improves over time, DNN planners inherit the natural drawbacks of Deep Learning. Learning-based planners have limitations in achieving perfect accuracy on the training dataset and network performance can be affected by the out-of-distribution problem. In this paper, we propose FusionAssurance, a novel trajectory-based end-to-end driving fusion framework which combines physics-informed control for safety assurance. By incorporating Potential Field into Model Predictive Control, FusionAssurance is capable of navigating through scenarios that are not included in the training dataset and scenarios where neural networks fail to generalize. The effectiveness of the approach is demonstrated by extensive experiments under various scenarios on the CARLA benchmark.

Submitted 5 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.18705 [pdf, other]

Wireless Information and Energy Transfer in the Era of 6G Communications

Authors: Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

Abstract: Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting the quality-of-service demands of WIET, in terms of both data transfer and power delivery, requires effective co-design of the information and energy signals. In this article, we present the main principles and design aspects of WIET, focusing on its integration in 6G networks. First, we discuss how conventional communication notions such as resource allocation and waveform design need to be revisited in the context of WIET. Next, we consider various candidate 6G technologies that can boost WIET efficiency, namely, holographic multiple-input multiple-output, near-field beamforming, terahertz communication, intelligent reflecting surfaces (IRSs), and reconfigurable (fluid) antenna arrays. We introduce respective WIET design methods, analyze the promising performance gains of these WIET systems, and discuss challenges, open issues, and future research directions. Finally, a near-field energy beamforming scheme and a power-based IRS beamforming algorithm are experimentally validated using a wireless energy transfer testbed.
The vision of WIET in communication systems has been gaining momentum in recent years, with constant progress with respect to theoretical but also practical aspects. The comprehensive overview of the state of the art of WIET presented in this paper highlights the potentials of WIET systems as well as their overall benefits in 6G networks.

Submitted 16 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Proceedings of the IEEE, 36 pages, 33 figures

arXiv:2403.16477 [pdf, other]

Safeguarding Next Generation Multiple Access Using Physical Layer Security Techniques: A Tutorial

Authors: Lu Lv, Dongyang Xu, Rose Qingyang Hu, Yinghui Ye, Long Yang, Xianfu Lei, Xianbin Wang, Dong In Kim, Arumugam Nallanathan

Abstract: Driven by the ever-increasing requirements of ultra-high spectral efficiency, ultra-low latency, and massive connectivity, the forefront of wireless research calls for the design of advanced next generation multiple access schemes to facilitate provisioning of these stringent demands. This inspires the embrace of non-orthogonal multiple access (NOMA) in future wireless communication networks. Nevertheless, the support of massive access via NOMA leads to additional security threats, due to the open nature of the air interface, the broadcast characteristic of radio propagation as well as the intertwined relationship among paired NOMA users. To address this specific challenge, the superimposed transmission of NOMA can be explored as a new opportunity for security-aware design; for example, multiuser interference inherent in NOMA can be constructively engineered to benefit communication secrecy and privacy. The purpose of this tutorial is to provide a comprehensive overview of the state-of-the-art physical layer security techniques that guarantee wireless security and privacy for NOMA networks, along with the opportunities, technical challenges, and future research trends.

Submitted 21 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Invited paper by Proceedings of the IEEE

arXiv:2402.09976 [pdf, ps, other]

Sensing-assisted Robust SWIPT for Mobile Energy Harvesting Receivers

Authors: Yiming Xu, Dongfang Xu, Shenghui Song

Abstract: Simultaneous wireless information and power transfer (SWIPT) has been proposed to offer communication services and transfer power to the energy harvesting receiver (EHR) concurrently. However, existing works mainly focused on static EHRs, without considering the location uncertainty caused by the movement of EHRs and location estimation errors. To tackle this issue, this paper considers the sensing-assisted SWIPT design in a networked integrated sensing and communication (ISAC) system in the presence of location uncertainty. A two-phase robust design is proposed to reduce the location uncertainty and improve the power transfer efficiency. In particular, each time frame is divided into two phases, i.e., sensing and WPT phases, via time-splitting. The sensing phase performs collaborative sensing to localize the EHR, whose results are then utilized in the WPT phase for efficient WPT. To minimize the power consumption with given communication and power transfer requirements, a two-layer optimization framework is proposed to jointly optimize the time-splitting ratio, coordinated beamforming policy, and sensing node selection. Simulation results validate the effectiveness of the proposed design and demonstrate the existence of an optimal time-splitting ratio for given location uncertainty.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.09974 [pdf, ps, other]

Interference Mitigation for Network-Level ISAC: An Optimization Perspective

Authors: Dongfang Xu, Yiming Xu, Xin Zhang, Xianghao Yu, Shenghui Song, Robert Schober

Abstract: Future wireless networks are envisioned to simultaneously provide high data-rate communication and ubiquitous environment-aware services for numerous users. One promising approach to meet this demand is to employ network-level integrated sensing and communications (ISAC) by jointly designing the signal processing and resource allocation over the entire network. However, to unleash the full potential of network-level ISAC, some critical challenges must be tackled. Among them, interference management is one of the most significant ones. In this article, we build up a bridge between interference mitigation techniques and the corresponding optimization methods, which facilitates efficient interference mitigation in network-level ISAC systems. In particular, we first identify several types of interference in network-level ISAC systems, including self-interference, mutual interference, crosstalk, clutter, and multiuser interference. Then, we present several promising techniques that can be utilized to suppress specific types of interference. For each type of interference, we discuss the corresponding problem formulation and identify the associated optimization methods. Moreover, to illustrate the effectiveness of the proposed interference mitigation techniques, two concrete network-level ISAC systems, namely coordinated cellular network-based and distributed antenna-based ISAC systems, are investigated from an interference management perspective.
Experimental results indicate that it is beneficial to collaboratively employ different interference mitigation techniques and leverage the network structure to achieve the full potential of network-level ISAC. Finally, we highlight several promising future research directions for the design of ISAC systems.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures, and the relevant simulation code can be found at https://dongfang-xu.github.io/homepage/code/Two_cases.zip

arXiv:2402.09463 [pdf]

Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, ** Ye, Mireia Alenyà, Valentin Comte, Oscar Camara, et al. (42 additional authors not shown)

Abstract: Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in- and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to anatomical complexity.
The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

arXiv:2401.02678 [pdf, other]

MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music

Authors: Yikai Qian, Tianle Wang, Xinyi Tong, Xin **, Duo Xu, Bo Zheng, Tiezheng Ge, Feng Yu, Song-Chun Zhu

Abstract: In addressing the challenge of interpretability and generalizability of artificial music intelligence, this paper introduces a novel symbolic representation that amalgamates both explicit and implicit musical information across diverse traditions and granularities. Utilizing a hierarchical and-or graph representation, the model employs nodes and edges to encapsulate a broad spectrum of musical elements, including structures, textures, rhythms, and harmonies. This hierarchical approach expands the representability across various scales of music. This representation serves as the foundation for an energy-based model, uniquely tailored to learn musical concepts through a flexible algorithm framework relying on the minimax entropy principle. Utilizing an adapted Metropolis-Hastings sampling technique, the model enables fine-grained control over music generation. A comprehensive empirical evaluation, contrasting this novel approach with existing methodologies, manifests considerable advancements in interpretability and controllability. This study marks a substantial contribution to the fields of music analysis, composition, and computational musicology.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.02192 [pdf]

Nodule detection and generation on chest X-rays: NODE21 Challenge

Authors: Ecem Sogancioglu, Bram van Ginneken, Finn Behrendt, Marcel Bengs, Alexander Schlaefer, Miron Radu, Di Xu, Ke Sheng, Fabien Scalzo, Eric Marcus, Samuele Papa, Jonas Teuwen, Ernst Th. Scholten, Steven Schalekamp, Nils Hendrix, Colin Jacobs, Ward Hendrix, Clara I Sánchez, Keelin Murphy

Abstract: Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms to augment training data and hence improve the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on the detection algorithm performance.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 15 pages, 5 figures

arXiv:2311.16771 [pdf, other]

The HR-Calculus: Enabling Information Processing with Quaternion Algebra

Authors: Danilo P. Mandic, Sayed Pouria Talebi, Clive Cheong Took, Yili Xia, Dongpo Xu, Min Xiang, Pauline Bourigault

Abstract: From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic field theory through to forming the basis of quantum field theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing techniques specifically designed for quaternion-valued signals have only recently come to the attention of the machine learning, signal processing, and control communities. The most important development in this direction is the introduction of the HR-calculus, which provides the required mathematical foundation for deriving adaptive information processing techniques directly in the quaternion domain. In this article, the foundations of the HR-calculus are revised and the required tools for deriving adaptive learning techniques suitable for dealing with quaternion-valued signals, such as the gradient operator, chain and product derivative rules, and Taylor series expansion are presented. This serves to establish the most important applications of adaptive information processing in the quaternion domain for both single-node and multi-node formulations. The article is supported by Supplementary Material, which will be referred to as SM.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.08829 [pdf, other]

Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection

Authors: Yifan Zhou, Dongxing Xu, Haoran Wei, Yanhua Long

Abstract: In industry, machine anomalous sound detection (ASD) is in great demand. However, collecting enough abnormal samples is difficult due to the high cost, which boosts the rapid development of unsupervised ASD algorithms. Autoencoder (AE) based methods have been widely used for unsupervised ASD, but suffer from problems including 'shortcut', poor anti-noise ability and sub-optimal quality of features. To address these challenges, we propose a new AE-based framework termed AEGM. Specifically, we first insert an auxiliary classifier into AE to enhance ASD in a multi-task learning manner. Then, we design a group-based decoder structure, accompanied by an adaptive loss function, to endow the model with domain-specific knowledge. Results on the DCASE 2021 Task 2 development set show that our methods achieve a relative improvement of 13.11% and 15.20% respectively in average AUC over the official AE and MobileNetV2 across test sets of seven machines.

Submitted 15 November, 2023; originally announced November 2023.

Comments: Submitted to the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

arXiv:2311.02376 [pdf, ps, other]

Intelligent Reflecting Surface-Aided Wireless Communication with Movable Elements

Authors: Guojie Hu, Qingqing Wu, Dognhui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Intelligent reflecting surface (IRS) has been recognized as a powerful technology for boosting communication performance. To reduce manufacturing and control costs, it is preferable to consider discrete phase shifts (DPSs) for IRS, which are set by default as uniformly distributed in the range of $[-\pi, \pi)$ in the literature. Such a setting, however, cannot achieve a desirable performance over the general Rician fading where the channel phase concentrates in a narrow range with a higher probability. Motivated by this drawback, in this paper we design optimal non-uniform DPSs for IRS to achieve a desirable performance level. The fundamental challenge is the possible offset in phase distribution across different cascaded source-element-destination channels, if adopting conventional IRS where the position of each element is fixed. Such a phenomenon leads to different patterns of optimal non-uniform DPSs for each IRS element and thus causes huge manufacturing costs, especially when the number of IRS elements is large. Driven by the recently emerging fluid antenna system (or movable antenna technology), we demonstrate that if the position of each IRS element can be flexibly adjusted, the above phase distribution offset can be surprisingly eliminated, leading to the same pattern of DPSs for each IRS element. Armed with this, we then determine the form of unified non-uniform DPSs based on a low-complexity iterative algorithm. Simulations show that our proposed design significantly improves the system performance compared to competitive benchmarks.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2311.00188 [pdf]

A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain

Authors: Di Xu, Qihui Lyu, Dan Ruan, Ke Sheng

Abstract: Dual-energy computed tomography (DECT) utilizes separate X-ray energy spectra to improve multi-material decomposition (MMD) for various diagnostic applications. However, accurately decomposing more than two types of material remains challenging using conventional methods. Deep learning (DL) methods have shown promise to improve the MMD performance, but typical approaches of conducting DL-MMD in the image domain fail to fully utilize projection information or under iterative setup are computationally inefficient in both training and prediction. In this work, we present a clinically applicable MMD (>2) framework rFast-MMDNet, operating with raw projection data in non-recursive setup, for breast tissue differentiation. rFast-MMDNet is a two-stage algorithm, including stage-one SinoNet to perform dual energy projection decomposition on tissue sinograms and stage-two FBP-DenoiseNet to perform domain adaptation and image post-processing. rFast-MMDNet was tested on a 2022 DL-Spectral-Challenge breast phantom dataset. The two stages of rFast-MMDNet were evaluated separately and then compared with four noniterative reference methods including a direct inversion method (AA-MMD), an image domain DL method (ID-UNet), AA-MMD/ID-UNet + DenoiseNet and a sinogram domain DL method (Triple-CBCT).
Our results show that models trained from information stored in DE transmission domain can yield high-fidelity decomposition of the adipose, calcification, and fibroglandular materials with averaged RMSE, MAE, negative PSNR, and SSIM of 0.004 +/- 0, 0.001 +/- 0, -45.027 +/- 0.542, and 0.002 +/- 0 benchmarking to the ground truth, respectively. Training of entire rFast-MMDNet on a 4xRTX A6000 GPU cluster took a day with inference time <1s. All DL methods generally led to more accurate MMD than AA-MMD. rFast-MMDNet outperformed Triple-CBCT, but both are superior to the image-domain based methods.

Submitted 31 October, 2023; originally announced November 2023.

Comments: AAPM 2023 Dl-spectral Challenge Summary

arXiv:2310.04114 [pdf, other]

Aorta Segmentation from 3D CT in MICCAI SEG.A. 2023 Challenge

Authors: Andriy Myronenko, Dong Yang, Yufan He, Daguang Xu

Abstract: The aorta provides the main blood supply of the body. Screening of the aorta with imaging helps for early aortic disease detection and monitoring. In this work, we describe our solution to the Segmentation of the Aorta (SEG.A.231) from 3D CT challenge. We use the automated segmentation method Auto3DSeg available in MONAI. Our solution achieves an average Dice score of 0.920 and 95th percentile of the Hausdorff Distance (HD95) of 6.013, which ranks first and wins the SEG.A. 2023 challenge.

Submitted 6 October, 2023; originally announced October 2023.

Comments: MICCAI 2023, SEG.A. 2023 challenge 1st place

arXiv:2309.10227 [pdf]

Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer

Authors: Di Xu, Hengjie Liu, Dan Ruan, Ke Sheng

Abstract: Dynamic magnetic resonance imaging (DMRI) is an effective imaging tool for diagnosis tasks that require motion tracking of a certain anatomy. To speed up DMRI acquisition, k-space measurements are commonly undersampled along spatial or spatial-temporal domains. The difficulty of recovering useful information increases with increasing undersampling ratios. Compressed sensing was invented for this purpose and was the most popular method until deep learning (DL) based DMRI reconstruction methods emerged in the past decade. Nevertheless, existing DL networks are still limited in long-range sequential dependency understanding and computational efficiency and are not fully automated. Considering the success of Transformers' positional embedding and "swin window" self-attention mechanism in the vision community, especially natural video understanding, we hereby propose a novel architecture named Reconstruction Swin Transformer (RST) for 4D MRI. RST inherits the backbone design of the Video Swin Transformer with a novel reconstruction head introduced to restore pixel-wise intensity. A convolution network called SADXNet is used for rapid initialization of 2D MR frames before RST learning to effectively reduce the model complexity, GPU hardware demand, and training time. Experimental results in the cardiac 4D MR dataset further substantiate the superiority of RST, achieving the lowest RMSE of 0.0286 +/- 0.0199 and 1 - SSIM of 0.0872 +/- 0.0783 on 9 times accelerated validation sequences.

Submitted 18 September, 2023; originally announced September 2023.

Comments: MICCAI 2023 Workshop

arXiv:2309.02171 [pdf, other]

A Wideband MIMO Channel Model for Aerial Intelligent Reflecting Surface-Assisted Wireless Communications

Authors: Shaoyi Liu, Nan Ma, Yaning Chen, Ke Peng, Dongsheng Xue

Abstract: Compared to traditional intelligent reflecting surfaces (IRS), aerial IRS (AIRS) has unique advantages, such as more flexible deployment and wider service coverage. However, modeling AIRS in the channel presents new challenges due to their mobility. In this paper, a three-dimensional (3D) wideband channel model for AIRS and IRS joint-assisted multiple-input multiple-output (MIMO) communication system is proposed, which considers the rotational degrees of freedom in three directions and the motion angles of AIRS in space. Based on the proposed model, the channel impulse response (CIR), correlation function, and channel capacity are derived, and several feasible joint phase-shift schemes for AIRS and IRS units are proposed. Simulation results show that the proposed model can capture the channel characteristics accurately, and the proposed phase-shift methods can effectively improve the channel statistical characteristics and increase the system capacity. Additionally, we observe that in certain scenarios, the paths involving the IRS and the line-of-sight (LoS) paths exhibit similar characteristics. These findings provide valuable insights for the future development of intelligent communication systems.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 6 pages, 7 figures

arXiv:2308.15696 [pdf, other]

Implementation and Evaluation of Physical Layer Key Generation on SDR based LoRa Platform

Authors: Yingying Hu, Dongyang Xu, Tiantian Zhang

Abstract: Physical layer key generation technology, which leverages channel randomness to generate secret keys, has recently attracted extensive attention in long range (LoRa)-based networks. In this paper, we develop a software-defined radio (SDR) based LoRa communications platform using GNU Radio on universal software radio peripheral (USRP) to implement and evaluate typical physical layer key generation schemes. Thanks to the flexibility and configurability of GNU Radio to extract LoRa packets, we are able to obtain the fine-grained channel frequency response (CFR) through LoRa preamble based channel estimation for key generation. Besides, we propose a low-complexity preprocessing method to enhance the randomness of quantization while reducing the secret key disagreement ratio. The results indicate that we can achieve 367 key bits with a high level of randomness through just a single effective channel probing in an indoor environment at a distance of 2 meters under the circumstance of a spreading factor (SF) of 7, a preamble length of 8, a signal bandwidth of 250 kHz, and a sampling rate of 1 MHz.

Submitted 29 August, 2023; originally announced August 2023.

Comments: Submitted to IEEE VTC2023 Fall

arXiv:2308.12526 [pdf, other]

UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023

Authors: Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu

Abstract: This report describes the UNISOUND submission for Track 1 and Track 2 of VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev. Large-scale ResNet and RepVGG architectures are developed for the challenge. We propose a consistency-aware score calibration method, which leverages the stability of audio voiceprints in similarity score by a Consistency Measure Factor (CMF). CMF brings a huge performance boost in this challenge. Our final system is a fusion of six models and achieves first place in Track 1 and second place in Track 2 of VoxSRC 2023. The minDCF of our submission is 0.0855 and the EER is 1.5880%.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.05760 [pdf, other]

Unified Statistical Channel Modeling and performance analysis of Vertical Underwater Wireless Optical Communication Links considering Turbulence-Induced Fading

Authors: Dongling Xu, Xiang Yi, Yalçın Ata, Xinyue Tao, Yuxuan Li, Peng Yue

Abstract: The reliability of a vertical underwater wireless optical communication (UWOC) network is seriously impacted by turbulence-induced fading due to fluctuations in the water temperature and salinity, which vary with depth. To better assess the vertical UWOC system performances, an accurate probability distribution function (PDF) model that can describe this fading is indispensable. In view of the limitations of theoretical and experimental studies, this paper is the first to establish a more accurate modeling scheme for wave optics simulation (WOS) by fully considering the constraints of sampling conditions on multi-phase screen parameters. On this basis, we complete the modeling of light propagation in a vertical oceanic turbulence channel and subsequently propose a unified statistical model named mixture Weibull-generalized Gamma (WGG) distribution model to characterize turbulence-induced fading in vertical links. Interestingly, the WGG model is shown to provide a perfect fit with the acquired data under all considered channel conditions. We further show that the application of the WGG model leads to closed-form and analytically tractable expressions for key UWOC system performance metrics such as the average bit-error rate (BER). The presented results give valuable insight into the practical aspects of development of UWOC networks.

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2308.02304 [pdf, other]

Movable Antenna-Enhanced Multiuser Communication: Optimal Discrete Antenna Positioning and Beamforming

Authors: Yifei Wu, Dongfang Xu, Derrick Wing Kwan Ng, Wolfgang Gerstacker, Robert Schober

Abstract: Movable antennas (MAs) are a promising paradigm to enhance the spatial degrees of freedom of conventional multi-antenna systems by flexibly adapting the positions of the antenna elements within a given transmit area. In this paper, we model the motion of the MA elements as discrete movements and study the corresponding resource allocation problem for MA-enabled multiuser multiple-input single-output (MISO) communication systems. Specifically, we jointly optimize the beamforming and the MA positions at the base station (BS) for the minimization of the total transmit power while guaranteeing the minimum required signal-to-interference-plus-noise ratio (SINR) of each individual user. To obtain the globally optimal solution to the formulated resource allocation problem, we develop an iterative algorithm capitalizing on the generalized Benders decomposition with guaranteed convergence. Our numerical results demonstrate that the proposed MA-enabled communication system can significantly reduce the BS transmit power and the number of antenna elements needed to achieve a desired performance compared to state-of-the-art techniques, such as antenna selection. Furthermore, we observe that refining the step size of the MA motion driver improves performance at the expense of a higher computational complexity.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.06634 [pdf, ps, other]

Coherent Compensation based ISAC Signal Processing for Long-range Sensing

Authors: Lin Wang, Zhiqing Wei, Liyan Su, Zhiyong Feng, Huici Wu, Dongsheng Xue

Abstract: Integrated sensing and communication (ISAC) will greatly enhance the efficiency of physical resource utilization. Designing the ISAC signal based on the orthogonal frequency division multiplexing (OFDM) signal is the mainstream approach. However, when detecting a long-range target, the delay of the echo signal exceeds the CP duration, which results in inter-symbol interference (ISI) and inter-carrier interference (ICI), limiting the sensing range. To address this problem, we propose to increase useful signal power through coherent compensation and improve the signal to interference plus noise power ratio (SINR) of each OFDM block. Compared with the traditional 2D-FFT algorithm, the improvement of SINR of the range-Doppler map (RDM) is verified by simulation, which will expand the sensing range.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2306.10476 [pdf, other]

Bid Optimization for Offsite Display Ad Campaigns on eCommerce

Authors: Hangjian Li, Dong Xu, Konstantin Shmakov, Kuang-Chih Lee, Wei Shen

Abstract: Online retailers often use third-party demand-side-platforms (DSPs) to conduct offsite advertising and reach shoppers across the Internet on behalf of their advertisers. The process involves the retailer participating in instant auctions with real-time bidding for each ad slot of their interest. In this paper, we introduce a bid optimization system that leverages the dimensional bidding function provided by most well-known DSPs for Walmart offsite display ad campaigns. The system starts by automatically searching for the optimal segmentation of the ad requests space based on their characteristics such as geo location, time, ad format, serving website, device type, etc. Then, it assesses the quality of impressions observed from each dimension based on revenue signals driven by the campaign effect. During the campaign, the system iteratively approximates the bid landscape based on the data observed and calculates the bid adjustments for each dimension. Finally, a higher bid adjustment factor is applied to dimensions with potentially higher revenue over ad spend (ROAS), and vice versa. The initial A/B test results of the proposed optimization system show its effectiveness in increasing the ROAS and conversion rate while reducing the effective cost per mille for ad serving.

Submitted 11 August, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Journal ref: Workshop on Decision Intelligence and Analytics for Online Marketplaces, KDD 2023

arXiv:2305.10655 [pdf, other]

doi 10.1007/978-3-031-17027-0_2

DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images

Authors: Andres Diaz-Pinto, Pritesh Mehta, Sachidanand Alle, Muhammad Asad, Richard Brown, Vishwesh Nath, Alvin Ihsani, Michela Antonelli, Daniel Palkovics, Csaba Pinter, Ron Alkalay, Steve Pieper, Holger R. Roth, Daguang Xu, Prerna Dogra, Tom Vercauteren, Andrew Feng, Abood Quraini, Sebastien Ourselin, M. Jorge Cardoso

Abstract: Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation, that allows automatic and semi-automatic segmentation, and click-based refinement. DeepEdit combines the power of two methods: a non-interactive (i.e. automatic segmentation using nnU-Net, UNET or UNETR) and an interactive segmentation method (i.e. DeepGrow), into a single deep learning model. It allows easy integration of uncertainty-based ranking strategies (i.e. aleatoric and epistemic uncertainty computation) and active learning. We propose and implement a method for training DeepEdit by using standard training combined with user interaction simulation. Once trained, DeepEdit allows clinicians to quickly segment their datasets by using the algorithm in auto segmentation mode or by providing clicks via a user interface (i.e. 3D Slicer, OHIF). We show the value of DeepEdit through evaluation on the PROSTATEx dataset for prostate/prostatic lesions and the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) dataset for abdominal CT segmentation, using state-of-the-art network architectures as baseline for comparison. DeepEdit could reduce the time and effort annotating 3D medical images compared to DeepGrow alone. Source code is available at https://github.com/Project-MONAI/MONAILabel

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.09302 [pdf, other]

Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions

Authors: Di Xu, Yang Zhao, Xiang Hao, Xin Meng

Abstract: We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations. The purpose of this dataset is to aid researchers in the analysis of the spread of Pomacea canaliculata species by utilizing deep learning techniques, as well as supporting other investigative pursuits that require visual data pertaining to the eggs of Pomacea canaliculata. It is worth noting, however, that the identity of the eggs in question is not definitively established, as other species within the same taxonomic family have been observed to lay similar-looking eggs in regions of the Americas. Therefore, a crucial prerequisite to any decision regarding the elimination of these eggs would be to establish with certainty whether they are exclusively attributable to invasive Pomacea canaliculata or if other species are also involved. The dataset is available at https://www.kaggle.com/datasets/deeshenzhen/pinkeggs

Submitted 16 May, 2023; originally announced May 2023.

Report number: 02

arXiv:2305.06879 [pdf, ps, other]

doi 10.1109/TSP.2023.3328053

Convex Quaternion Optimization for Signal Processing: Theory and Applications

Authors: Shuning Sun, Qiankun Diao, Dongpo Xu, Pauline Bourigault, Danilo P. Mandic

Abstract: Convex optimization methods have been extensively used in the fields of communications and signal processing. However, the theory of quaternion optimization is currently not as fully developed and systematic as that of complex and real optimization. To this end, we establish an essential theory of convex quaternion optimization for signal processing based on the generalized Hamilton-real (GHR) calculus. This is achieved in a way which conforms with traditional complex and real optimization theory. For rigor, we present five discriminant theorems for convex quaternion functions, and four discriminant criteria for strongly convex quaternion functions. Furthermore, we provide a fundamental theorem for the optimality of convex quaternion optimization problems, and demonstrate its utility through three applications in quaternion signal processing. These results provide a solid theoretical foundation for convex quaternion optimization and open avenues for further developments in signal processing applications.

Submitted 9 May, 2023; originally announced May 2023.

Journal ref: IEEE Trans. Signal Process., vol. 71, pp. 4106-4115, Oct. 2023

arXiv:2305.05265 [pdf, ps, other]

Joint BS Selection, User Association, and Beamforming Design for Network Integrated Sensing and Communication

Authors: Yiming Xu, Dongfang Xu, Lei Xie, Shenghui Song

Abstract: Different from conventional radar, the cellular network in the integrated sensing and communication (ISAC) system enables collaborative sensing by multiple sensing nodes, e.g., base stations (BSs). However, existing works normally assume designated BSs as the sensing nodes, and thus cannot fully exploit the macro-diversity gain. In this paper, we propose a joint BS selection, user association, and beamforming design to tackle this problem. The total transmit power is minimized while guaranteeing the communication and sensing performance measured by the signal-to-interference-plus-noise ratio (SINR) for the communication users and the Cramer-Rao lower bound (CRLB) for location estimation, respectively. An alternating optimization (AO)-based algorithm is developed to solve the non-convex problem. Simulation results validate the effectiveness of the proposed algorithm and unveil the benefits brought by collaborative sensing and BS selection.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 6 pages

arXiv:2305.01213 [pdf, ps, other]

Integrated Sensing and Communication in Coordinated Cellular Networks

Authors: Dongfang Xu, Chang Liu, Shenghui Song, Derrick Wing Kwan Ng

Abstract: Integrated sensing and communication (ISAC) is a promising technique to provide sensing services in future wireless networks. Numerous existing works have adopted a monostatic radar architecture to realize ISAC, i.e., employing the same base station (BS) to transmit the ISAC signal and receive the echo. Yet, the concurrent information transmission causes unavoidable self-interference (SI) to the radar echo at the BS. To overcome this difficulty, we propose a coordinated cellular network-supported multistatic radar architecture to implement ISAC, which allows us to spatially separate the ISAC signal transmission and radar echo reception, intrinsically circumventing the problem of SI. To this end, we jointly optimize the transmit and receive beamforming policy to minimize the sensing beam pattern mismatch error subject to ISAC quality-of-service requirements. The resulting non-convex optimization problem is tackled by an alternating optimization-based suboptimal algorithm. Simulation results show that the proposed scheme outperforms the two baseline schemes adopting conventional designs.

Submitted 16 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 6 pages, 3 figures

arXiv:2304.11521 [pdf, other]

An Order-Complexity Model for Aesthetic Quality Assessment of Homophony Music Performance

Authors: Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Jialin Sun

Abstract: Although computational aesthetics evaluation has made certain achievements in many fields, its application to music performance remains to be explored. At present, subjective evaluation is still the ultimate method of music aesthetics research, but it consumes a lot of human and material resources. In addition, the music performance generated by AI is still mechanical, monotonous and lacking in beauty. In order to guide the generation task of AI music performance, and to improve the performance effect of human performers, this paper uses Birkhoff's aesthetic measure to propose a method of objective measurement of beauty. The main contributions of this paper are as follows: first, we put forward an objective aesthetic evaluation method to measure the aesthetics of music performance; second, we propose 10 basic music features and 4 aesthetic music features. Experiments show that our method performs well on performance assessment.

Submitted 22 April, 2023; originally announced April 2023.

Journal ref: AIART 2023 ICME Workshop

arXiv:2304.07106 [pdf, ps, other]

Extremum Seeking Nonlinear Regulator with Concurrent Uncertainties in Exosystems and Control Directions

Authors: Shimin Wang, Martin Guay, Dabo Xu, Denis Dochain

Abstract: This paper proposes a non-adaptive control solution framework to the practical output regulation problem (PORP) for a class of nonlinear systems with uncertain parameters, unknown control directions and uncertain exosystem dynamics. The concurrence of the unknown control directions and uncertainties in both the system dynamics and the exosystem poses a significant challenge to the problem. In light of a nonlinear internal model approach, we first convert the robust PORP into a robust non-adaptive stabilization problem for the augmented system with integral Input-to-State Stable (iISS) inverse dynamics. By employing an extremum-seeking control (ESC) approach, the construction of our solution method avoids the use of Nussbaum-type gain techniques to address the robust PORP subject to unknown control directions with time-varying coefficients. The stability of the non-adaptive output regulation design is proven via a Lie bracket averaging technique where uniform ultimate boundedness of the closed-loop signals is guaranteed. As a result, both the estimation and tracking errors converge to zero exponentially, provided that the frequency of the dither signal goes to infinity. Finally, a simulation example with unknown coefficients is provided to exemplify the validity of the proposed control solution framework.

Submitted 8 May, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: 11 pages, 7 figures

arXiv:2303.12136 [pdf]

Improving Fabrication Fidelity of Integrated Nanophotonic Devices Using Deep Learning

Authors: Dusan Gostimirovic, Yuri Grinberg, Dan-Xia Xu, Odile Liboiron-Ladouceur

Abstract: Next-generation integrated nanophotonic device designs leverage advanced optimization techniques such as inverse design and topology optimization which achieve high performance and extreme miniaturization by optimizing a massively complex design space enabled by small feature sizes. However, unless the optimization is heavily constrained, the generated small features are not reliably fabricated, leading to optical performance degradation. Even for simpler, conventional designs, fabrication-induced performance degradation still occurs. The degree of deviation from the original design not only depends on the size and shape of its features, but also on the distribution of features and the surrounding environment, presenting complex, proximity-dependent behavior. Without proprietary fabrication process specifications, design corrections can only be made after calibrating fabrication runs take place. In this work, we introduce a general deep machine learning model that automatically corrects photonic device design layouts prior to first fabrication. Only a small set of scanning electron microscopy images of engineered training features are required to create the deep learning model. With correction, the outcome of the fabricated layout is closer to what is intended, and thus so too is the performance of the design.
Without modifying the nanofabrication process, adding significant computation in design, or requiring proprietary process specifications, we believe our model opens the door to new levels of reliability and performance in next-generation photonic circuits.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 13 pages, 8 figures

arXiv:2302.11795 [pdf, other]

Bridging Synthetic and Real Images: a Transferable and Multiple Consistency aided Fundus Image Enhancement Framework

Authors: Erjian Guo, Huazhu Fu, Luping Zhou, Dong Xu

Abstract: Deep learning based image enhancement models have largely improved the readability of fundus images in order to decrease the uncertainty of clinical observations and the risk of misdiagnosis. However, due to the difficulty of acquiring paired real fundus images at different qualities, most existing methods have to adopt synthetic image pairs as training data. The domain shift between the synthetic and the real images inevitably hinders the generalization of such models on clinical data. In this work, we propose an end-to-end optimized teacher-student framework to simultaneously conduct image enhancement and domain adaptation. The student network uses synthetic pairs for supervised enhancement, and regularizes the enhancement model to reduce domain-shift by enforcing teacher-student prediction consistency on the real fundus images without relying on enhanced ground-truth. Moreover, we also propose a novel multi-stage multi-attention guided enhancement network (MAGE-Net) as the backbone of our teacher and student networks. Our MAGE-Net utilizes a multi-stage enhancement module and a retinal structure preservation module to progressively integrate the multi-scale features and simultaneously preserve the retinal structures for better fundus image quality enhancement. Comprehensive experiments on both real and synthetic datasets demonstrate that our framework outperforms the baseline approaches. Moreover, our method also benefits the downstream clinical tasks.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.10124 [pdf, other]

Energy-Aware Resource Allocation and Trajectory Design for UAV-Enabled ISAC

Authors: Ata Khalili, Atefeh Rezaei, Dongfang Xu, Robert Schober

Abstract: In this paper, we investigate joint resource allocation and trajectory design for multi-user multi-target unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC). To improve sensing accuracy, the UAV is forced to hover during sensing. In particular, we jointly optimize the two-dimensional trajectory, velocity, downlink information and sensing beamformers, and sensing indicator to minimize the average power consumption of a fixed-altitude UAV, while considering the quality of service of the communication users and the sensing tasks. To tackle the resulting non-convex mixed integer non-linear program (MINLP), we exploit semidefinite relaxation, the big-M method, and successive convex approximation to develop an alternating optimization-based algorithm. Our simulation results demonstrate the significant power savings enabled by the proposed scheme compared to two baseline schemes employing heuristic trajectories.

Submitted 24 October, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: This paper has been accepted for presentation at IEEE GLOBECOM 2023

arXiv:2302.09696 [pdf]

An Efficient and Robust Method for Chest X-Ray Rib Suppression that Improves Pulmonary Abnormality Diagnosis

Authors: Di Xu, Qifan Xu, Kevin Nhieu, Dan Ruan, Ke Sheng

Abstract: Suppression of thoracic bone shadows on chest X-rays (CXRs) has been indicated to improve the diagnosis of pulmonary disease. Previous approaches can be categorized as unsupervised physical and supervised deep learning models. Nevertheless, with physical models able to preserve morphological details but at the cost of extremely long processing time, existing DL methods face challenges of gathering sufficient/qualitative ground truth (GT) for robust training, thus leading to failure in maintaining clinically acceptable false positive rates. We hereby propose a generalizable yet efficient workflow of two stages: (1) training pairs generation with GT bone shadows eliminated by a physical model in spatially transformed gradient fields; (2) fully supervised image denoising network training on stage-one datasets for fast rib removal on incoming CXRs. For step two, we designed a densely connected network called SADXNet, combined with peak signal to noise ratio and multi-scale structure similarity index measure objective minimization to suppress bony structures. The SADXNet organizes spatial filters in U shape (e.g., X=7; filters = 16, 64, 256, 512, 256, 64, 16) and preserves the feature map dimension throughout the network flow. Visually, SADXNet can suppress the rib edge and that near the lung wall/vertebra without jeopardizing the vessel/abnormality conspicuity. Quantitatively, it achieves RMSE of ~0 during testing with one prediction taking <1s.
Downstream tasks including lung nodule detection as well as common lung disease classification and localization are used to evaluate our proposed rib suppression mechanism. We observed 3.23% and 6.62% area under the curve (AUC) increase as well as 203 and 385 absolute false positive decrease for lung nodule detection and common lung disease localization, respectively.

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2302.09353 [pdf, ps, other]

A Framework for Transmission Design for Active RIS-Aided Communication with Partial CSI

Authors: Gui Zhou, Cunhua Pan, Hong Ren, Dongfang Xu, Zaichen Zhang, Jiangzhou Wang, Robert Schober

Abstract: Active reconfigurable intelligent surfaces (RISs) have recently been proposed to compensate for the severe multiplicative fading effect of conventional passive RIS-aided systems. Each reflecting element of active RISs is assisted by an amplifier such that the incident signal can be reflected and amplified instead of only being reflected as in passive RIS-aided systems. This work addresses the practical challenge that, on the one hand, in active RIS-aided systems the perfect individual CSI of the RIS-aided channels cannot be acquired due to the lack of signal processing power at the active RISs, but, on the other hand, this CSI is required to calculate the expected system data rate and RIS transmit power needed for transceiver design. To address this issue, we first derive closed-form expressions for the average achievable rate and the average RIS transmit power based on partial CSI of the RIS-aided channels. Then, we formulate an average achievable rate maximization problem for jointly optimizing the active beamforming at both the base station (BS) and the RIS. This problem is then tackled using the majorization-minimization (MM) algorithm framework, and, for each iteration, semi-closed-form solutions for the BS and RIS beamforming are derived based on the Karush-Kuhn-Tucker (KKT) conditions. To ensure the quality of service (QoS) of each user, we further formulate a rate outage constrained beamforming problem, which is solved using the Bernstein-Type inequality (BTI) and semidefinite relaxation (SDR) techniques.
Numerical results show that the proposed algorithms can efficiently overcome the challenges imposed by imperfect CSI in active RIS-aided wireless systems.

Submitted 18 February, 2023; originally announced February 2023.

Comments: Active reconfigurable intelligent surfaces, Partial CSI

arXiv:2301.05908 [pdf, other]

An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores

Authors: Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Shuai Cui

Abstract: Computational aesthetics evaluation has made great achievements in the field of visual arts, but the research work on music still needs to be explored. Although the existing work of music generation is very substantial, the quality of music score generated by AI is relatively poor compared with that created by human composers. The music scores created by AI are usually monotonous and devoid of emotion. Based on Birkhoff's aesthetic measure, this paper proposes an objective quantitative evaluation method for homophony music score aesthetic quality assessment. The main contributions of our work are as follows: first, we put forward a homophony music score aesthetic model to objectively evaluate the quality of music score as a baseline model; second, we put forward eight basic music features and four music aesthetic features.

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2301.03081 [pdf]

Automatic Diagnosis of Carotid Atherosclerosis Using a Portable Freehand 3D Ultrasound Imaging System

Authors: Jiawen Li, Yunqian Huang, Sheng Song, Hongbo Chen, Junni Shi, Duo Xu, Haibin Zhang, Man Chen, Rui Zheng

Abstract: The objective of this study is to develop a deep-learning based detection and diagnosis technique for carotid atherosclerosis using a portable freehand 3D ultrasound (US) imaging system. A total of 127 3D carotid artery scans were acquired using a portable 3D US system which consisted of a handheld US scanner and an electromagnetic tracking system. A U-Net segmentation network was first applied to extract the carotid artery on 2D transverse frames, then a novel 3D reconstruction algorithm using the fast dot projection (FDP) method with position regularization was proposed to reconstruct the carotid artery volume. Furthermore, a convolutional neural network was used to classify healthy and diseased cases qualitatively. 3D volume analysis methods including longitudinal image acquisition and stenosis grade measurement were developed to obtain the clinical metrics quantitatively. The proposed system achieved sensitivity of 0.714, specificity of 0.851 and accuracy of 0.803 for diagnosis of carotid atherosclerosis. The automatically measured stenosis grade showed good correlation (r=0.762) with the experienced expert measurement. The developed technique based on 3D US imaging can be applied to the automatic diagnosis of carotid atherosclerosis. The proposed deep-learning based technique was specially designed for a portable 3D freehand US system, which can provide more convenient carotid atherosclerosis examination and decrease the dependence on the clinician's experience.

Submitted 9 November, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

arXiv:2212.10132 [pdf, other]

Content Adaptive Latents and Decoder for Neural Image Compression

Authors: Guanbo Pan, Guo Lu, Zhihao Hu, Dong Xu

Abstract: In recent years, neural image compression (NIC) algorithms have shown powerful coding performance. However, most of them are not adaptive to the image content. Although several content adaptive methods have been proposed by updating the encoder-side components, the adaptability of both latents and the decoder is not well exploited. In this work, we propose a new NIC framework that improves the content adaptability on both latents and the decoder. Specifically, to remove redundancy in the latents, our content adaptive channel dropping (CACD) method automatically selects the optimal quality levels for the latents spatially and drops the redundant channels. Additionally, we propose the content adaptive feature transformation (CAFT) method to improve decoder-side content adaptability by extracting the characteristic information of the image content, which is then used to transform the features in the decoder side. Experimental results demonstrate that our proposed methods with the encoder-side updating algorithm achieve state-of-the-art performance.

Submitted 20 December, 2022; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: V1 is accepted to ECCV 2022. V2 is the improved version

arXiv:2211.08402 [pdf, other]

Introducing Semantics into Speech Encoders

Authors: Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang

Abstract: Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve existing speech encoder spoken language understanding performance by over 10% on intent classification, with modest gains in named entity resolution and slot filling, and spoken question answering F1 score by over 2%. Our unsupervised approach achieves similar performance as supervised methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 11 pages, 3 figures

arXiv:2211.04470 [pdf, other]

Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, et al. (14 additional authors not shown)

Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can show real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile devices; their detailed description is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

arXiv:2211.01571 [pdf, other]

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Authors: Li Li, Dongxing Xu, Haoran Wei, Yanhua Long

Abstract: Exploiting effective target modeling units is very important and has always been a concern in end-to-end automatic speech recognition (ASR). In this work, we propose a phonetic-assisted multi-target units (PMU) modeling approach to enhance the Conformer-Transducer ASR system in a progressive representation learning manner. Specifically, PMU first uses the pronunciation-assisted subword modeling (PASM) and byte pair encoding (BPE) to produce phonetic-induced and text-induced target units separately. Then, three new frameworks are investigated to enhance the acoustic encoder, including a basic PMU, a paraCTC and a pcaCTC, which integrate the PASM and BPE units at different levels for CTC and transducer multi-task training. Experiments on both LibriSpeech and accented ASR tasks show that the proposed PMU significantly outperforms the conventional BPE: it reduces the WER of LibriSpeech clean, other, and six accented ASR testsets by a relative 12.7%, 6.0% and 7.7%, respectively.

Submitted 7 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted by Interspeech 2023

arXiv:2210.14880 [pdf, ps, other]

Integrated Sensing and Communication in Distributed Antenna Networks

Authors: Dongfang Xu, Ata Khalili, Xianghao Yu, Derrick Wing Kwan Ng, Robert Schober

Abstract: In this paper, we investigate the resource allocation design for integrated sensing and communication (ISAC) in distributed antenna networks (DANs). In particular, coordinated by a central processor (CP), a set of remote radio heads (RRHs) provide communication services to multiple users and sense several target locations within an ISAC frame. To avoid the severe interference between the information transmission and the radar echo, we propose to divide the ISAC frame into a communication phase and a sensing phase. During the communication phase, the data signal is generated at the CP and then conveyed to the RRHs via fronthaul links. As for the sensing phase, based on pre-determined RRH-target pairings, each RRH senses a dedicated target location with a synthesized highly-directional beam and then transfers the samples of the received echo to the CP via its fronthaul link for further processing of the sensing information. Taking into account the limited fronthaul capacity and the quality-of-service requirements of both communication and sensing, we jointly optimize the durations of the two phases, the information beamforming, and the covariance matrix of the sensing signal for minimization of the total energy consumption over a given finite time horizon. To solve the formulated non-convex design problem, we develop a low-complexity alternating optimization algorithm which converges to a suboptimal solution. Simulation results show that the proposed scheme achieves significant energy savings compared to two baseline schemes.
Moreover, our results reveal that for efficient ISAC in wireless networks, energy-focused short-duration pulses are favorable for sensing while low-power long-duration signals are preferable for communication.

Submitted 2 May, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: 8 pages, 5 figures

arXiv:2209.10809 [pdf, other]

Automated head and neck tumor segmentation from 3D PET/CT

Authors: Andriy Myronenko, Md Mahfuzur Rahman Siddiquee, Dong Yang, Yufan He, Daguang Xu

Abstract: The head and neck tumor segmentation challenge (HECKTOR) 2022 offers a platform for researchers to compare their solutions to segmentation of tumors and lymph nodes from 3D CT and PET images. In this work, we describe our solution to the HECKTOR 2022 segmentation task. We re-sample all images to a common resolution, crop around the head and neck region, and train the SegResNet semantic segmentation network from MONAI. We use 5-fold cross validation to select the best model checkpoints. The final submission is an ensemble of 15 models from 3 runs. Our solution (team name NVAUTO) achieves 1st place on the HECKTOR22 challenge leaderboard with an aggregated dice score of 0.78802.

Submitted 22 September, 2022; originally announced September 2022.

Comments: HECKTOR22 segmentation challenge. MICCAI 2022. arXiv admin note: text overlap with arXiv:2209.09546

arXiv:2209.10648 [pdf, other]

Automated segmentation of intracranial hemorrhages from 3D CT

Authors: Md Mahfuzur Rahman Siddiquee, Dong Yang, Yufan He, Daguang Xu, Andriy Myronenko

Abstract: Intracranial hemorrhage segmentation challenge (INSTANCE 2022) offers a platform for researchers to compare their solutions to segmentation of hemorrhage stroke regions from 3D CTs. In this work, we describe our solution to INSTANCE 2022. We use a 2D segmentation network, SegResNet from MONAI, operating slice-wise without resampling. The final submission is an ensemble of 18 models. Our solution (team name NVAUTO) achieves the top place in terms of Dice metric (0.721), and overall rank 2. It is implemented with Auto3DSeg.

Submitted 21 September, 2022; originally announced September 2022.

Comments: INSTANCE22 challenge report, MICCAI2022. arXiv admin note: substantial text overlap with arXiv:2209.09546

arXiv:2209.09546 [pdf, other]

Automated ischemic stroke lesion segmentation from 3D MRI

Authors: Md Mahfuzur Rahman Siddique, Dong Yang, Yufan He, Daguang Xu, Andriy Myronenko

Abstract: The Ischemic Stroke Lesion Segmentation challenge (ISLES 2022) offers a platform for researchers to compare their solutions to 3D segmentation of ischemic stroke regions from 3D MRIs. In this work, we describe our solution to the ISLES 2022 segmentation task. We re-sample all images to a common resolution, use two input MRI modalities (DWI and ADC) and train the SegResNet semantic segmentation network from MONAI. The final submission is an ensemble of 15 models (from 3 runs of 5-fold cross validation). Our solution (team name NVAUTO) achieves the top place in terms of Dice metric (0.824), and overall rank 2 (based on the combined metric ranking).

Submitted 21 September, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: ISLES22 challenge report, MICCAI2022

arXiv:2206.13307 [pdf, ps, other]

Robust and Secure Resource Allocation for ISAC Systems: A Novel Optimization Framework for Variable-Length Snapshots

Authors: Dongfang Xu, Xianghao Yu, Derrick Wing Kwan Ng, Anke Schmeink, Robert Schober

Abstract: In this paper, we investigate the robust resource allocation design for secure communication in an integrated sensing and communication (ISAC) system. A multi-antenna dual-functional radar-communication (DFRC) base station (BS) serves multiple single-antenna legitimate users and senses for targets simultaneously, where already identified targets are treated as potential single-antenna eavesdroppers. The DFRC BS scans a sector with a sequence of dedicated beams, and the ISAC system takes a snapshot of the environment during the transmission of each beam. Based on the sensing information, the DFRC BS can acquire the channel state information (CSI) of the potential eavesdroppers. Different from existing works that focused on the resource allocation design for a single snapshot, in this paper, we propose a novel optimization framework that jointly optimizes the communication and sensing resources over a sequence of snapshots with adjustable durations. To this end, we jointly optimize the duration of each snapshot, the beamforming vector, and the covariance matrix of the AN for maximization of the system sum secrecy rate over a sequence of snapshots while guaranteeing a minimum required average achievable rate and a maximum information leakage constraint for each legitimate user. The resource allocation algorithm design is formulated as a non-convex optimization problem, where we account for the imperfect CSI of both the legitimate users and the potential eavesdroppers. To make the problem tractable, we derive a bound for the uncertainty region of the potential eavesdroppers' small-scale fading based on a safe approximation, which facilitates the development of a block coordinate descent-based iterative algorithm for obtaining an efficient suboptimal solution.

Submitted 23 October, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: 38 pages, 12 figures

Showing 1–50 of 143 results for author: Xu, D