Search | arXiv e-print repository

Weakly Supervised Learning of Cortical Surface Reconstruction from Segmentations

Authors: Qiang Ma, Liu Li, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

Abstract: Existing learning-based cortical surface reconstruction approaches heavily rely on the supervision of pseudo ground truth (pGT) cortical surfaces for training. Such pGT surfaces are generated by traditional neuroimage processing pipelines, which are time consuming and difficult to generalize well to low-resolution brain MRI, e.g., from fetuses and neonates. In this work, we present CoSeg, a learni… ▽ More Existing learning-based cortical surface reconstruction approaches heavily rely on the supervision of pseudo ground truth (pGT) cortical surfaces for training. Such pGT surfaces are generated by traditional neuroimage processing pipelines, which are time consuming and difficult to generalize well to low-resolution brain MRI, e.g., from fetuses and neonates. In this work, we present CoSeg, a learning-based cortical surface reconstruction framework weakly supervised by brain segmentations without the need for pGT surfaces. CoSeg introduces temporal attention networks to learn time-varying velocity fields from brain MRI for diffeomorphic surface deformations, which fit an initial surface to target cortical surfaces within only 0.11 seconds for each brain hemisphere. A weakly supervised loss is designed to reconstruct pial surfaces by inflating the white surface along the normal direction towards the boundary of the cortical gray matter segmentation. This alleviates partial volume effects and encourages the pial surface to deform into deep and challenging cortical sulci. We evaluate CoSeg on 1,113 adult brain MRI at 1mm and 2mm resolution. CoSeg achieves superior geometric and morphological accuracy compared to existing learning-based approaches. We also verify that CoSeg can extract high-quality cortical surfaces from fetal brain MRI on which traditional pipelines fail to produce acceptable results. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted by the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024)

arXiv:2405.20336 [pdf, other]

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

Abstract: In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rap** vocals, lyrics, and high-quality 3D holistic bo… ▽ More In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rap** vocals, lyrics, and high-quality 3D holistic body meshes. With the RapVerse dataset, we investigate the extent to which scaling autoregressive multimodal transformers across language, audio, and motion can enhance the coherent and realistic generation of vocals and whole-body human motions. For modality unification, a vector-quantized variational autoencoder is employed to encode whole-body motion sequences into discrete motion tokens, while a vocal-to-unit model is leveraged to obtain quantized audio tokens preserving content, prosodic information, and singer identity. By jointly performing transformer modeling on these three modalities in a unified way, our framework ensures a seamless and realistic blend of vocals and human motions. Extensive experiments demonstrate that our unified generation framework not only produces coherent and realistic singing vocals alongside human motions directly from textual inputs but also rivals the performance of specialized single-modality generation systems, establishing new benchmarks for joint vocal-motion generation. The project page is available for research purposes at https://vis-www.cs.umass.edu/RapVerse. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Project website: https://vis-www.cs.umass.edu/RapVerse

arXiv:2405.08783 [pdf, other]

The Develo** Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction

Authors: Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

Abstract: The Develo** Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 hours to proc… ▽ More The Develo** Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 hours to process a single MRI scan, making it expensive for large-scale neuroimaging studies. In this paper, we propose a fast deep learning (DL) based pipeline for dHCP neonatal cortical surface reconstruction, incorporating DL-based brain extraction, cortical surface reconstruction and spherical projection, as well as GPU-accelerated cortical surface inflation and cortical feature estimation. We introduce a multiscale deformation network to learn diffeomorphic cortical surface reconstruction end-to-end from T2-weighted brain MRI. A fast unsupervised spherical map** approach is integrated to minimize metric distortions between cortical surfaces and projected spheres. The entire workflow of our DL-based dHCP pipeline completes within only 24 seconds on a modern GPU, which is nearly 1000 times faster than the original dHCP pipeline. Manual quality control demonstrates that for 82.5% of the test samples, our DL-based pipeline produces superior (54.2%) or equal quality (28.3%) cortical surfaces compared to the original dHCP pipeline. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2403.16062 [pdf]

Holography inspired self-controlled reconfigurable intelligent surface

Authors: Jieao Zhu, Ze Gu, Qian Ma, Linglong Dai, Tie Jun Cui

Abstract: Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy effi… ▽ More Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy efficiency. To reprogram the channel, each meta-atom on RIS needs an external control signal, which is usually generated by base station (BS). However, BS-controlled RISs require complicated control cables, which hamper their massive deployments. Here, we eliminate the need for BS control by proposing a self-controlled RIS (SC-RIS), which is inspired by the optical holography principle. Different from the existing BS-controlled RISs, each meta-atom of SC-RIS is integrated with an additional power detector for holographic recording. By applying the classical Fourier-transform processing to the measured hologram, SC-RIS is capable of retrieving the user's channel state information required for beamforming, thus enabling autonomous RIS beamforming without control cables. Owing to this WiFi-like plug-and-play capability without the BS control, SC-RISs are expected to enable easy and massive deployments in the future 6G systems. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Traditional BS-controlled RISs suffer from complicated control cables. To "cut" the control cables, we propose a self-controlled RIS by leveraging the holographic interference principle, thus realizing autonomous RIS beamforming

arXiv:2403.03736 [pdf, other]

Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

Authors: Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma

Abstract: Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overlooking the ability of generative models to capture the prior distribution of image content, thus impeding further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivat… ▽ More Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overlooking the ability of generative models to capture the prior distribution of image content, thus impeding further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivated by the capabilities of predictive language models for lossless compression, this paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization, alongside a multi-stage transformer designed to exploit spatial contextual information for modeling the prior distribution. As such, the dual-purpose framework effectively utilizes the learned prior for entropy estimation and assists in the regeneration of lost tokens. Extensive experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception, particularly in ultra-low bitrate scenarios (<=0.03 bpp), pioneering a new direction in generative compression. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.03492 [pdf, other]

Beyond Strong labels: Weakly-supervised Learning Based on Gaussian Pseudo Labels for The Segmentation of Ellipse-like Vascular Structures in Non-contrast CTs

Authors: Qixiang Ma, Antoine Łucas, Huazhong Shu, Adrien Kaladji, Pascal Haigron

Abstract: Deep-learning-based automated segmentation of vascular structures in preoperative CT scans contributes to computer-assisted diagnosis and intervention procedure in vascular diseases. While CT angiography (CTA) is the common standard, non-contrast CT imaging is significant as a contrast-risk-free alternative, avoiding complications associated with contrast agents. However, the challenges of labor-i… ▽ More Deep-learning-based automated segmentation of vascular structures in preoperative CT scans contributes to computer-assisted diagnosis and intervention procedure in vascular diseases. While CT angiography (CTA) is the common standard, non-contrast CT imaging is significant as a contrast-risk-free alternative, avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. This paper introduces a weakly-supervised framework using ellipses' topology in slices, including 1) an efficient annotation process based on predefined standards, 2) ellipse-fitting processing, 3) the generation of 2D Gaussian heatmaps serving as pseudo labels, 4) a training process through a combination of voxel reconstruction loss and distribution loss with the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, particularly focusing on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (1.54\% of Dice score on average), reducing labeling time by around 82.0\%. The efficiency in generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional improvement in performance (2.74\% of Dice score on average) with a reduction of 66.3\% labeling time, where the labeling time remains considerably less than that of strong labels. On the public dataset, the pseudo labels achieve an overall improvement of 1.95\% in Dice score for 2D models while a reduction of 11.65 voxel spacing in Hausdorff distance for 3D model. △ Less

Submitted 10 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02514 [pdf, other]

Deep Supervision by Gaussian Pseudo-label-based Morphological Attention for Abdominal Aorta Segmentation in Non-Contrast CTs

Authors: Qixiang Ma, Antoine Lucas, Adrien Kaladji, Pascal Haigron

Abstract: The segmentation of the abdominal aorta in non-contrast CT images is a non-trivial task for computer-assisted endovascular navigation, particularly in scenarios where contrast agents are unsuitable. While state-of-the-art deep learning segmentation models have been proposed recently for this task, they are trained on manually annotated strong labels. However, the inherent ambiguity in the boundary… ▽ More The segmentation of the abdominal aorta in non-contrast CT images is a non-trivial task for computer-assisted endovascular navigation, particularly in scenarios where contrast agents are unsuitable. While state-of-the-art deep learning segmentation models have been proposed recently for this task, they are trained on manually annotated strong labels. However, the inherent ambiguity in the boundary of the aorta in non-contrast CT may undermine the reliability of strong labels, leading to potential overfitting risks. This paper introduces a Gaussian-based pseudo label, integrated into conventional deep learning models through deep supervision, to achieve Morphological Attention (MA) enhancement. As the Gaussian pseudo label retains the morphological features of the aorta without explicitly representing its boundary distribution, we suggest that it preserves aortic morphology during training while mitigating the negative impact of ambiguous boundaries, reducing the risk of overfitting. It is introduced in various 2D/3D deep learning models and validated on our local data set of 30 non-contrast CT volumes comprising 5749 CT slices. The results underscore the effectiveness of MA in preserving the morphological characteristics of the aorta and addressing overfitting concerns, thereby enhancing the performance of the models. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by 21st IEEE International Symposium on Biomedical Imaging

arXiv:2401.07422 [pdf, other]

Multiperson Detection and Vital-Sign Sensing Empowered by Space-Time-Coding RISs

Authors: Xinyu Li, Jian Wei You, Ze Gu, Qian Ma, **gyuan Zhang, Long Chen, Tie Jun Cui

Abstract: Passive human sensing using wireless signals has attracted increasing attention due to its superiorities of non-contact and robustness in various lighting conditions. However, when multiple human individuals are present, their reflected signals could be intertwined in the time, frequency and spatial domains, making it challenging to separate them. To address this issue, this paper proposes a novel… ▽ More Passive human sensing using wireless signals has attracted increasing attention due to its superiorities of non-contact and robustness in various lighting conditions. However, when multiple human individuals are present, their reflected signals could be intertwined in the time, frequency and spatial domains, making it challenging to separate them. To address this issue, this paper proposes a novel system for multiperson detection and monitoring of vital signs (i.e., respiration and heartbeat) with the assistance of space-time-coding (STC) reconfigurable intelligent metasurfaces (RISs). Specifically, the proposed system scans the area of interest (AoI) for human detection by using the harmonic beams generated by the STC RIS. Simultaneously, frequencyorthogonal beams are assigned to each detected person for accurate estimation of their respiration rate (RR) and heartbeat rate (HR). Furthermore, to efficiently extract the respiration signal and the much weaker heartbeat signal, we propose an improved variational mode decomposition (VMD) algorithm to accurately decompose the complex reflected signals into a smaller number of intrinsic mode functions (IMFs). We build a prototype to validate the proposed multiperson detection and vital-sign monitoring system. Experimental results demonstrate that the proposed system can simultaneously monitor the vital signs of up to four persons. The errors of RR and HR estimation using the improved VMD algorithm are below 1 RPM (respiration per minute) and 5 BPM (beats per minute), respectively. Further analysis reveals that the flexible beam controlling mechanism empowered by the STC RIS can reduce the noise reflected from other irrelative objects on the physical layer, and improve the signal-to-noise ratio of echoes from the human chest. △ Less

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2311.11460 [pdf, other]

Classical Stability Margins by PID Control

Authors: Qi Mao, Yong Xu, Jianqi Chen, Jie Chen, Tryphon Georgiou

Abstract: Proportional-Integral-Derivative (PID) control has been the workhorse of control technology for about a century. Yet to this day, designing and tuning PID controllers relies mostly on either tabulated rules (Ziegler-Nichols) or on classical graphical techniques (Bode). Our goal in this paper is to take a fresh look on PID control in the context of optimizing stability margins for low-order (first-… ▽ More Proportional-Integral-Derivative (PID) control has been the workhorse of control technology for about a century. Yet to this day, designing and tuning PID controllers relies mostly on either tabulated rules (Ziegler-Nichols) or on classical graphical techniques (Bode). Our goal in this paper is to take a fresh look on PID control in the context of optimizing stability margins for low-order (first- and second-order) linear time-invariant systems. Specifically, we seek to derive explicit expressions for gain and phase margins that are achievable using PID control, and thereby gain insights on the role of unstable poles and nonminimum-phase zeros in attaining robust stability. In particular, stability margins attained by PID control for minimum-phase systems match those obtained by more general control, while for nonminimum-phase systems, PID control achieves margins that are no worse than those of general control modulo a predetermined factor. Furthermore, integral action does not contribute to robust stabilization beyond what can be achieved by PD control alone. △ Less

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.07873 [pdf, other]

Passive Human Sensing Enhanced by Reconfigurable Intelligent Surface: Opportunities and Challenges

Authors: Xinyu Li, Jian Wei You, Ze Gu, Qian Ma, Long Chen, **gyuan Zhang, Shi **, Tie Jun Cui

Abstract: Reconfigurable intelligent surfaces (RISs) have flexible and exceptional performance in manipulating electromagnetic waves and customizing wireless channels. These capabilities enable them to provide a plethora of valuable activity-related information for promoting wireless human sensing. In this article, we present a comprehensive review of passive human sensing using radio frequency signals with… ▽ More Reconfigurable intelligent surfaces (RISs) have flexible and exceptional performance in manipulating electromagnetic waves and customizing wireless channels. These capabilities enable them to provide a plethora of valuable activity-related information for promoting wireless human sensing. In this article, we present a comprehensive review of passive human sensing using radio frequency signals with the assistance of RISs. Specifically, we first introduce fundamental principles and physical platform of RISs. Subsequently, based on the specific applications, we categorize the state-of-the-art human sensing techniques into three types, including human imaging,localization, and activity recognition. Meanwhile, we would also investigate the benefits that RISs bring to these applications. Furthermore, we explore the application of RISs in human micro-motion sensing, and propose a vital signs monitoring system enhanced by RISs. Experimental results are presented to demonstrate the promising potential of RISs in sensing vital signs for manipulating individuals. Finally, we discuss the technical challenges and opportunities in this field. △ Less

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2307.11870 [pdf, other]

Conditional Temporal Attention Networks for Neonatal Cortical Surface Reconstruction

Authors: Qiang Ma, Liu Li, Vanessa Kyriakopoulou, Joseph Hajnal, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

Abstract: Cortical surface reconstruction plays a fundamental role in modeling the rapid brain development during the perinatal period. In this work, we propose Conditional Temporal Attention Network (CoTAN), a fast end-to-end framework for diffeomorphic neonatal cortical surface reconstruction. CoTAN predicts multi-resolution stationary velocity fields (SVF) from neonatal brain magnetic resonance images (M… ▽ More Cortical surface reconstruction plays a fundamental role in modeling the rapid brain development during the perinatal period. In this work, we propose Conditional Temporal Attention Network (CoTAN), a fast end-to-end framework for diffeomorphic neonatal cortical surface reconstruction. CoTAN predicts multi-resolution stationary velocity fields (SVF) from neonatal brain magnetic resonance images (MRI). Instead of integrating multiple SVFs, CoTAN introduces attention mechanisms to learn a conditional time-varying velocity field (CTVF) by computing the weighted sum of all SVFs at each integration step. The importance of each SVF, which is estimated by learned attention maps, is conditioned on the age of the neonates and varies with the time step of integration. The proposed CTVF defines a diffeomorphic surface deformation, which reduces mesh self-intersection errors effectively. It only requires 0.21 seconds to deform an initial template mesh to cortical white matter and pial surfaces for each brain hemisphere. CoTAN is validated on the Develo** Human Connectome Project (dHCP) dataset with 877 3D brain MR images acquired from preterm and term born neonates. Compared to state-of-the-art baselines, CoTAN achieves superior performance with only 0.12mm geometric error and 0.07% self-intersecting faces. The visualization of our attention maps illustrates that CoTAN indeed learns coarse-to-fine surface deformations automatically without intermediate supervision. △ Less

Submitted 21 July, 2023; originally announced July 2023.

Comments: Accepted by the 26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023

arXiv:2307.08265 [pdf, other]

Extreme Image Compression using Fine-tuned VQGANs

Authors: Qi Mao, Tinghan Yang, Yinuo Zhang, Zijian Wang, Meng Wang, Shiqi Wang, Siwei Ma

Abstract: Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially in scenarios with low bitrates. However, their efficacy and applicability to achieve extreme compression ratios ($<0.05$ bpp) remain constrained. In this work, we propose a simple yet effective coding framework by introducing vector quantization… ▽ More Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially in scenarios with low bitrates. However, their efficacy and applicability to achieve extreme compression ratios ($<0.05$ bpp) remain constrained. In this work, we propose a simple yet effective coding framework by introducing vector quantization (VQ)--based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model yields a strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ-indices by finding the nearest codeword, which can be encoded using lossless compression methods into bitstreams. We propose clustering a pre-trained large-scale codebook into smaller codebooks through the K-means algorithm, yielding variable bitrates and different levels of reconstruction quality within the coding framework. Furthermore, we introduce a transformer to predict lost indices and restore images in unstable environments. Extensive qualitative and quantitative experiments on various benchmark datasets demonstrate that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception at extremely low bitrates ($\le 0.04$ bpp). Remarkably, even with the loss of up to $20\%$ of indices, the images can be effectively restored with minimal perceptual loss. △ Less

Submitted 15 December, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: Generative Compression, Extreme Compression, VQGANs, Low Bitrate

arXiv:2305.00837 [pdf, other]

LCAUnet: A skin lesion segmentation network with enhanced edge and body fusion

Authors: Qisen Ma, Keming Mao, Gao Wang, Lisheng Xu, Yuhai Zhao

Abstract: Accurate segmentation of skin lesions in dermatoscopic images is crucial for the early diagnosis of skin cancer and improving the survival rate of patients. However, it is still a challenging task due to the irregularity of lesion areas, the fuzziness of boundaries, and other complex interference factors. In this paper, a novel LCAUnet is proposed to improve the ability of complementary representa… ▽ More Accurate segmentation of skin lesions in dermatoscopic images is crucial for the early diagnosis of skin cancer and improving the survival rate of patients. However, it is still a challenging task due to the irregularity of lesion areas, the fuzziness of boundaries, and other complex interference factors. In this paper, a novel LCAUnet is proposed to improve the ability of complementary representation with fusion of edge and body features, which are often paid little attentions in traditional methods. First, two separate branches are set for edge and body segmentation with CNNs and Transformer based architecture respectively. Then, LCAF module is utilized to fuse feature maps of edge and body of the same level by local cross-attention operation in encoder stage. Furthermore, PGMF module is embedded for feature integration with prior guided multi-scale adaption. Comprehensive experiments on public available dataset ISIC 2017, ISIC 2018, and PH2 demonstrate that LCAUnet outperforms most state-of-the-art methods. The ablation studies also verify the effectiveness of the proposed fusion techniques. △ Less

Submitted 1 May, 2023; originally announced May 2023.

Comments: 14 pages, 10 figures

arXiv:2304.13471 [pdf, other]

OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

Authors: Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Abstract: 360° omnidirectional images have gained research attention due to their immersive and interactive experience, particularly in AR/VR applications. However, they suffer from lower angular resolution due to being captured by fisheye lenses with the same sensor size for capturing planar images. To solve the above issues, we propose a two-stage framework for 360° omnidirectional image superresolution.… ▽ More 360° omnidirectional images have gained research attention due to their immersive and interactive experience, particularly in AR/VR applications. However, they suffer from lower angular resolution due to being captured by fisheye lenses with the same sensor size for capturing planar images. To solve the above issues, we propose a two-stage framework for 360° omnidirectional image superresolution. The first stage employs two branches: model A, which incorporates omnidirectional position-aware deformable blocks (OPDB) and Fourier upsampling, and model B, which adds a spatial frequency fusion module (SFF) to model A. Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images. The second stage performs same-resolution enhancement based on the structure of model A with a pixel unshuffle operation. In addition, we collected data from YouTube to improve the fitting ability of the transformer, and created pseudo low-resolution images using a degradation network. Our proposed method achieves superior performance and wins the NTIRE 2023 challenge of 360° omnidirectional image super-resolution. △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted to CVPRW 2023

arXiv:2304.12205 [pdf, other]

doi 10.1109/TIV.2023.3331024

Synthetic Datasets for Autonomous Driving: A Survey

Authors: Zhihang Song, Zimin He, Xingyu Li, Qiming Ma, Ruibo Ming, Zhiqi Mao, Huaxin Pei, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

Abstract: Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and chan… ▽ More Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and changeable data as an effective complement to the real world and to improve the performance of algorithms. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date in synthetic datasets related to single and multi-task categories for to autonomous driving study. We also discuss the role that synthetic dataset plays the evaluation, gap test, and positive effect in autonomous driving related algorithm testing, especially on trustworthiness and safety aspects. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems of real-world deployment of autonomous driving technology and provides researchers with a possible solution. △ Less

Submitted 27 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 19 pages, 5 figures

Journal ref: in IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 1847-1864, Jan. 2024

arXiv:2303.00155 [pdf, other]

Exponential Consensus of Multiple Agents over Dynamic Network Topology: Controllability, Connectivity, and Compactness

Authors: Qichao Ma, Jiahu Qin, Brian D. O. Anderson, Long Wang

Abstract: This paper investigates the problem of securing exponentially fast consensus (exponential consensus for short) for identical agents with finite-dimensional linear system dynamics over dynamic network topology. Our aim is to find the weakest possible conditions that guarantee exponentially fast consensus using a Lyapunov function consisting of a sum of terms of the same functional form. We first in… ▽ More This paper investigates the problem of securing exponentially fast consensus (exponential consensus for short) for identical agents with finite-dimensional linear system dynamics over dynamic network topology. Our aim is to find the weakest possible conditions that guarantee exponentially fast consensus using a Lyapunov function consisting of a sum of terms of the same functional form. We first investigate necessary conditions, starting by examining the system (both agent and network) parameters. It is found that controllability of the linear agents is necessary for reaching consensus. Then, to work out necessary conditions incorporating the network topology, we construct a set of Laplacian matrix-valued functions. The precompactness of this set of functions is shown to be a significant generalization of existing assumptions on network topology, including the common assumption that the edge weights are bounded piecewise constant functions or continuous functions. With the aid of such a precompactness assumption and restricting the Lyapunov function to one consisting of a sum of terms of the same functional form, we prove that a joint $(δ, T)$-connectivity condition on the network topology is necessary for exponential consensus. Finally, we investigate how the above two ``necessities'' work together to guarantee exponential consensus. To partially address this problem, we define a synchronization index to characterize the interplay between agent parameters and network topology. Based on this notion, it is shown that by designing a proper feedback matrix and under the precompactness assumption, exponential consensus can be reached globally and uniformly if the joint $(δ,T)$-connectivity and controllability conditions are satisfied, and the synchronization index is not less than one. △ Less

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2210.16197 [pdf]

Dimensionality Reduced Antenna Array for Beamforming/steering

Authors: Shiyi Xia, Mingyang Zhao, Qian Ma, Xunnan Zhang, Ling Yang, Yazhi Pi, Hyunchul Chung, Ad Reniers, A. M. J. Koonen, Zizheng Cao

Abstract: Beamforming makes possible a focused communication method. It is extensively employed in many disciplines involving electromagnetic waves, including arrayed ultrasonic, optical, and high-speed wireless communication. Conventional beam steering often requires the addition of separate active amplitude phase control units after each radiating element. The high power consumption and complexity of larg… ▽ More Beamforming makes possible a focused communication method. It is extensively employed in many disciplines involving electromagnetic waves, including arrayed ultrasonic, optical, and high-speed wireless communication. Conventional beam steering often requires the addition of separate active amplitude phase control units after each radiating element. The high power consumption and complexity of large-scale phased arrays can be overcome by reducing the number of active controllers, pushing beamforming into satellite communications and deep space exploration. Here, we suggest a brand-new design for a phased array antenna with a dimension reduced cascaded angle offset (DRCAO-PAA). Furthermore, the suggested DRCAO-PAA was compressed by using the concept of singular value deposition. To pave the way for practical application the particle swarm optimization algorithm and deep neural network Transformer were adopted. Based on this theoretical framework, an experimental board was built to verify the theory. Finally, the 16/8/4 -array beam steering was demonstrated by using 4/3/2 active controllers, respectively. △ Less

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2209.06434 [pdf, other]

ConvNeXt Based Neural Network for Audio Anti-Spoofing

Authors: Qiaowei Ma, **ghui Zhong, Yitao Yang, Weiheng Liu, Ying Gao, Wing W. Y. Ng

Abstract: With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. In recent years, researchers had proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than raw waveform will lose implicit information for anti-spoofing. Inspired by the pr… ▽ More With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. In recent years, researchers had proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than raw waveform will lose implicit information for anti-spoofing. Inspired by the promising performance of ConvNeXt in image classification tasks, we revise the ConvNeXt network architecture and propose a lightweight end-to-end anti-spoofing model. By integrating with the channel attention block and using the focal loss function, the proposed model can focus on the most informative sub-bands of speech representations and the difficult samples that are hard to classify. Experiments show that our proposed system could achieve an equal error rate of 0.64% and min-tDCF of 0.0187 for the ASVSpoof 2019 LA evaluation dataset, which outperforms the state-of-the-art systems. △ Less

Submitted 21 December, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: 6 pages

arXiv:2209.05482 [pdf, ps, other]

Improved Fuzzy $H_{\infty}$ Filter Design Method for Nonlinear Systems with Time-Varing Delay

Authors: Qianqian Ma, Li Li, Junhui Shen, Haowei Guan, Guangcheng Ma, Hongwei Xia

Abstract: This paper investigates the fuzzy $H_{\infty}$ filter design issue for nonlinear systems with time-varying delay. In order to obtain less conservative fuzzy $H_{\infty}$ filter design method, a novel integral inequality is employed to replace the conventional Lebniz-Newton formula to analyze the stability conditions of the filtering error system. Besides, the information of the membership function… ▽ More This paper investigates the fuzzy $H_{\infty}$ filter design issue for nonlinear systems with time-varying delay. In order to obtain less conservative fuzzy $H_{\infty}$ filter design method, a novel integral inequality is employed to replace the conventional Lebniz-Newton formula to analyze the stability conditions of the filtering error system. Besides, the information of the membership functions is introduced in the criterion to further relax the derived results. The proposed delay dependent filter design method is presented as LMI-based conditions, and corresponding definite expressions of fuzzy $H_{\infty}$ filter are given as well. Finally, a simulation example is provided to prove the effectiveness and superiority of the designed fuzzy $H_{\infty}$ filter. △ Less

Submitted 11 September, 2022; originally announced September 2022.

Comments: This paper was published in 2017 IEEE SMC. arXiv admin note: text overlap with arXiv:2209.04989. text overlap with arXiv:2209.04989

arXiv:2209.04989 [pdf, ps, other]

A New Fuzzy $H_{\infty}$ Filter Design for Nonlinear Time-Delay Systems with Mismatched Premise Membership Functions

Authors: Qianqian Ma, Hongwei Xia, Li Li, Guangcheng Ma

Abstract: This paper is concerned with the fuzzy $H_{\infty}$ filter design issue for nonlinear systems with time-varying delay. To overcome the shortcomings of the conventional methods with matched preconditions, the fuzzy $H_{\infty}$ filter to be designed and the T-S fuzzy model are assumed to have different premise membership functions and number of rules, thus, greater design flexibility and robustness… ▽ More This paper is concerned with the fuzzy $H_{\infty}$ filter design issue for nonlinear systems with time-varying delay. To overcome the shortcomings of the conventional methods with matched preconditions, the fuzzy $H_{\infty}$ filter to be designed and the T-S fuzzy model are assumed to have different premise membership functions and number of rules, thus, greater design flexibility and robustness to uncertainty can be achieved. However, such design will also make the derived results conservative, to relax the result, a novel integral inequality which is tighter than the traditional inequalities derived from the Leibniz-Newton formula is applied, besides, a fuzzy Lypunov function and the information of the membership functions are also introduced. All the design methods are presented in LMI-based conditions. Finally, two numerical examples are given to prove the effectiveness and superiority of the proposed approach. △ Less

Submitted 13 September, 2022; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: This paper was published at IFAC 2017

arXiv:2208.14022 [pdf, other]

Stabilize, Decompose, and Denoise: Self-Supervised Fluoroscopy Denoising

Authors: Ruizhou Liu, Qiang Ma, Zhiwei Cheng, Yuanyuan Lyu, Jianji Wang, S. Kevin Zhou

Abstract: Fluoroscopy is an imaging technique that uses X-ray to obtain a real-time 2D video of the interior of a 3D object, hel** surgeons to observe pathological structures and tissue functions especially during intervention. However, it suffers from heavy noise that mainly arises from the clinical use of a low dose X-ray, thereby necessitating the technology of fluoroscopy denoising. Such denoising is… ▽ More Fluoroscopy is an imaging technique that uses X-ray to obtain a real-time 2D video of the interior of a 3D object, hel** surgeons to observe pathological structures and tissue functions especially during intervention. However, it suffers from heavy noise that mainly arises from the clinical use of a low dose X-ray, thereby necessitating the technology of fluoroscopy denoising. Such denoising is challenged by the relative motion between the object being imaged and the X-ray imaging system. We tackle this challenge by proposing a self-supervised, three-stage framework that exploits the domain knowledge of fluoroscopy imaging. (i) Stabilize: we first construct a dynamic panorama based on optical flow calculation to stabilize the non-stationary background induced by the motion of the X-ray detector. (ii) Decompose: we then propose a novel mask-based Robust Principle Component Analysis (RPCA) decomposition method to separate a video with detector motion into a low-rank background and a sparse foreground. Such a decomposition accommodates the reading habit of experts. (iii) Denoise: we finally denoise the background and foreground separately by a self-supervised learning strategy and fuse the denoised parts into the final output via a bilateral, spatiotemporal filter. To assess the effectiveness of our work, we curate a dedicated fluoroscopy dataset of 27 videos (1,568 frames) and corresponding ground truth. Our experiments demonstrate that it achieves significant improvements in terms of denoising and enhancement effects when compared with standard approaches. Finally, expert rating confirms this efficacy. △ Less

Submitted 30 August, 2022; originally announced August 2022.

Comments: 11 pages, 18 figures

arXiv:2206.03173 [pdf, other]

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

Authors: Yinan Bao, Qianwen Ma, Lingwei Wei, Wei Zhou, Songlin Hu

Abstract: The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers are complex and dynamic, which consist of intra- and inter-speaker dependencies, the modeling of speaker-specific information is a vital role in ERC. Although existing researchers have proposed various methods of speaker interaction modelin… ▽ More The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers are complex and dynamic, which consist of intra- and inter-speaker dependencies, the modeling of speaker-specific information is a vital role in ERC. Although existing researchers have proposed various methods of speaker interaction modeling, they cannot explore dynamic intra- and inter-speaker dependencies jointly, leading to the insufficient comprehension of context and further hindering emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. Besides, we propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion. We use different existing methods as the conversational context encoder of our framework, showing the high scalability and flexibility of the proposed framework. Experimental results demonstrate the superiority and effectiveness of SGED. △ Less

Submitted 7 June, 2022; originally announced June 2022.

Comments: Accepted by IJCAI-ECAI 2022

arXiv:2205.08239 [pdf, other]

CAS-Net: Conditional Atlas Generation and Brain Segmentation for Fetal MRI

Authors: Liu Li, Qiang Ma, Matthew Sinclair, Antonios Makropoulos, Joseph Hajnal, A. David Edwards, Bernhard Kainz, Daniel Rueckert, Amir Alansary

Abstract: Fetal Magnetic Resonance Imaging (MRI) is used in prenatal diagnosis and to assess early brain development. Accurate segmentation of the different brain tissues is a vital step in several brain analysis tasks, such as cortical surface reconstruction and tissue thickness measurements. Fetal MRI scans, however, are prone to motion artifacts that can affect the correctness of both manual and automati… ▽ More Fetal Magnetic Resonance Imaging (MRI) is used in prenatal diagnosis and to assess early brain development. Accurate segmentation of the different brain tissues is a vital step in several brain analysis tasks, such as cortical surface reconstruction and tissue thickness measurements. Fetal MRI scans, however, are prone to motion artifacts that can affect the correctness of both manual and automatic segmentation techniques. In this paper, we propose a novel network structure that can simultaneously generate conditional atlases and predict brain tissue segmentation, called CAS-Net. The conditional atlases provide anatomical priors that can constrain the segmentation connectivity, despite the heterogeneity of intensity values caused by motion or partial volume effects. The proposed method is trained and evaluated on 253 subjects from the develo** Human Connectome Project (dHCP). The results demonstrate that the proposed method can generate conditional age-specific atlas with sharp boundary and shape variance. It also segment multi-category brain tissues for fetal MRI with a high overall Dice similarity coefficient (DSC) of $85.2\%$ for the selected 9 tissue labels. △ Less

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2202.08329 [pdf, other]

CortexODE: Learning Cortical Surface Reconstruction by Neural ODEs

Authors: Qiang Ma, Liu Li, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert, Amir Alansary

Abstract: We present CortexODE, a deep learning framework for cortical surface reconstruction. CortexODE leverages neural ordinary differential equations (ODEs) to deform an input surface into a target shape by learning a diffeomorphic flow. The trajectories of the points on the surface are modeled as ODEs, where the derivatives of their coordinates are parameterized via a learnable Lipschitz-continuous def… ▽ More We present CortexODE, a deep learning framework for cortical surface reconstruction. CortexODE leverages neural ordinary differential equations (ODEs) to deform an input surface into a target shape by learning a diffeomorphic flow. The trajectories of the points on the surface are modeled as ODEs, where the derivatives of their coordinates are parameterized via a learnable Lipschitz-continuous deformation network. This provides theoretical guarantees for the prevention of self-intersections. CortexODE can be integrated to an automatic learning-based pipeline, which reconstructs cortical surfaces efficiently in less than 5 seconds. The pipeline utilizes a 3D U-Net to predict a white matter segmentation from brain Magnetic Resonance Imaging (MRI) scans, and further generates a signed distance function that represents an initial surface. Fast topology correction is introduced to guarantee homeomorphism to a sphere. Following the isosurface extraction step, two CortexODE models are trained to deform the initial surface to white matter and pial surfaces respectively. The proposed pipeline is evaluated on large-scale neuroimage datasets in various age groups including neonates (25-45 weeks), young adults (22-36 years) and elderly subjects (55-90 years). Our experiments demonstrate that the CortexODE-based pipeline can achieve less than 0.2mm average geometric error while being orders of magnitude faster compared to conventional processing pipelines. △ Less

Submitted 10 September, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by IEEE Transactions on Medical Imaging

arXiv:2111.13923 [pdf, other]

Learning A 3D-CNN and Transformer Prior for Hyperspectral Image Super-Resolution

Authors: Qing Ma, Junjun Jiang, Xianming Liu, Jiayi Ma

Abstract: To solve the ill-posed problem of hyperspectral image super-resolution (HSISR), an usually method is to use the prior information of the hyperspectral images (HSIs) as a regularization term to constrain the objective function. Model-based methods using hand-crafted priors cannot fully characterize the properties of HSIs. Learning-based methods usually use a convolutional neural network (CNN) to le… ▽ More To solve the ill-posed problem of hyperspectral image super-resolution (HSISR), an usually method is to use the prior information of the hyperspectral images (HSIs) as a regularization term to constrain the objective function. Model-based methods using hand-crafted priors cannot fully characterize the properties of HSIs. Learning-based methods usually use a convolutional neural network (CNN) to learn the implicit priors of HSIs. However, the learning ability of CNN is limited, it only considers the spatial characteristics of the HSIs and ignores the spectral characteristics, and convolution is not effective for long-range dependency modeling. There is still a lot of room for improvement. In this paper, we propose a novel HSISR method that uses Transformer instead of CNN to learn the prior of HSIs. Specifically, we first use the proximal gradient algorithm to solve the HSISR model, and then use an unfolding network to simulate the iterative solution processes. The self-attention layer of Transformer makes it have the ability of spatial global interaction. In addition, we add 3D-CNN behind the Transformer layers to better explore the spatio-spectral correlation of HSIs. Both quantitative and visual results on two widely used HSI datasets and the real-world dataset demonstrate that the proposed method achieves a considerable gain compared to all the mainstream algorithms including the most competitive conventional methods and the recently proposed deep learning-based methods. △ Less

Submitted 27 November, 2021; originally announced November 2021.

Comments: 10 pages, 5 figures

arXiv:2109.03693 [pdf, other]

PialNN: A Fast Deep Learning Framework for Cortical Pial Surface Reconstruction

Authors: Qiang Ma, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert, Amir Alansary

Abstract: Traditional cortical surface reconstruction is time consuming and limited by the resolution of brain Magnetic Resonance Imaging (MRI). In this work, we introduce Pial Neural Network (PialNN), a 3D deep learning framework for pial surface reconstruction. PialNN is trained end-to-end to deform an initial white matter surface to a target pial surface by a sequence of learned deformation blocks. A loc… ▽ More Traditional cortical surface reconstruction is time consuming and limited by the resolution of brain Magnetic Resonance Imaging (MRI). In this work, we introduce Pial Neural Network (PialNN), a 3D deep learning framework for pial surface reconstruction. PialNN is trained end-to-end to deform an initial white matter surface to a target pial surface by a sequence of learned deformation blocks. A local convolutional operation is incorporated in each block to capture the multi-scale MRI information of each vertex and its neighborhood. This is fast and memory-efficient, which allows reconstructing a pial surface mesh with 150k vertices in one second. The performance is evaluated on the Human Connectome Project (HCP) dataset including T1-weighted MRI scans of 300 subjects. The experimental results demonstrate that PialNN reduces the geometric error of the predicted pial surface by 30% compared to state-of-the-art deep learning approaches. △ Less

Submitted 6 September, 2021; originally announced September 2021.

Comments: Accepted in The 4th International Workshop on Machine Learning in Clinical Neuroimaging (MLCN2021)

arXiv:2106.05905 [pdf, other]

Multiple Dynamic Pricing for Demand Response with Adaptive Clustering-based Customer Segmentation in Smart Grids

Authors: Fanlin Meng, Qian Ma, Zixu Liu, Xiao-Jun Zeng

Abstract: In this paper, we propose a realistic multiple dynamic pricing approach to demand response in the retail market. First, an adaptive clustering-based customer segmentation framework is proposed to categorize customers into different groups to enable the effective identification of usage patterns. Second, customized demand models with important market constraints which capture the price-demand relat… ▽ More In this paper, we propose a realistic multiple dynamic pricing approach to demand response in the retail market. First, an adaptive clustering-based customer segmentation framework is proposed to categorize customers into different groups to enable the effective identification of usage patterns. Second, customized demand models with important market constraints which capture the price-demand relationship explicitly, are developed for each group of customers to improve the model accuracy and enable meaningful pricing. Third, the multiple pricing based demand response is formulated as a profit maximization problem subject to realistic market constraints. The overall aim of the proposed scalable and practical method aims to achieve 'right' prices for 'right' customers so as to benefit various stakeholders in the system such as grid operators, customers and retailers. The proposed multiple pricing framework is evaluated via simulations based on real-world datasets. △ Less

Submitted 10 June, 2021; originally announced June 2021.

arXiv:2105.06887 [pdf]

A Frequency Domain Constraint for Synthetic and Real X-ray Image Super Resolution

Authors: Qing Ma, Jae Chul Koh, WonSook Lee

Abstract: Synthetic X-ray images are simulated X-ray images projected from CT data. High-quality synthetic X-ray images can facilitate various applications such as surgical image guidance systems and VR training simulations. However, it is difficult to produce high-quality arbitrary view synthetic X-ray images in real-time due to different CT slice thickness, high computational cost, and the complexity of a… ▽ More Synthetic X-ray images are simulated X-ray images projected from CT data. High-quality synthetic X-ray images can facilitate various applications such as surgical image guidance systems and VR training simulations. However, it is difficult to produce high-quality arbitrary view synthetic X-ray images in real-time due to different CT slice thickness, high computational cost, and the complexity of algorithms. Our goal is to generate high-resolution synthetic X-ray images in real-time by upsampling low-resolution images with deep learning-based super-resolution methods. Reference-based Super Resolution (RefSR) has been well studied in recent years and has shown higher performance than traditional Single Image Super-Resolution (SISR). It can produce fine details by utilizing the reference image but still inevitably generates some artifacts and noise. In this paper, we introduce frequency domain loss as a constraint to further improve the quality of the RefSR results with fine details and without obvious artifacts. To the best of our knowledge, this is the first paper utilizing the frequency domain for the loss functions in the field of super-resolution. We achieved good results in evaluating our method on both synthetic and real X-ray image datasets. △ Less

Submitted 10 August, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

arXiv:2011.04976 [pdf, other]

Conceptual Compression via Deep Structure and Texture Synthesis

Authors: Jianhui Chang, Zhenghui Zhao, Chuanmin Jia, Shiqi Wang, Lingbo Yang, Qi Mao, Jian Zhang, Siwei Ma

Abstract: Existing compression methods typically focus on the removal of signal-level redundancies, while the potential and versatility of decomposing visual data into compact conceptual components still lack further study. To this end, we propose a novel conceptual compression framework that encodes visual data into compact structure and texture representations, then decodes in a deep synthesis fashion, ai… ▽ More Existing compression methods typically focus on the removal of signal-level redundancies, while the potential and versatility of decomposing visual data into compact conceptual components still lack further study. To this end, we propose a novel conceptual compression framework that encodes visual data into compact structure and texture representations, then decodes in a deep synthesis fashion, aiming to achieve better visual reconstruction quality, flexible content manipulation, and potential support for various vision tasks. In particular, we propose to compress images by a dual-layered model consisting of two complementary visual features: 1) structure layer represented by structural maps and 2) texture layer characterized by low-dimensional deep representations. At the encoder side, the structural maps and texture representations are individually extracted and compressed, generating the compact, interpretable, inter-operable bitstreams. During the decoding stage, a hierarchical fusion GAN (HF-GAN) is proposed to learn the synthesis paradigm where the textures are rendered into the decoded structural maps, leading to high-quality reconstruction with remarkable visual realism. Extensive experiments on diverse images have demonstrated the superiority of our framework with lower bitrates, higher reconstruction quality, and increased versatility towards visual analysis and content manipulation tasks. △ Less

Submitted 10 March, 2022; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: 15 pages, 14 figures

arXiv:2007.13975 [pdf, other]

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

Authors: **g**g Chen, Qirong Mao, Dong Liu

Abstract: The dominant speech separation models are based on complex recurrent or convolution neural network that model speech sequences indirectly conditioning on context, such as passing information through many intermediate states in recurrent neural network, leading to suboptimal separation performance. In this paper, we propose a dual-path transformer network (DPTNet) for end-to-end speech separation,… ▽ More The dominant speech separation models are based on complex recurrent or convolution neural network that model speech sequences indirectly conditioning on context, such as passing information through many intermediate states in recurrent neural network, leading to suboptimal separation performance. In this paper, we propose a dual-path transformer network (DPTNet) for end-to-end speech separation, which introduces direct context-awareness in the modeling for speech sequences. By introduces a improved transformer, elements in speech sequences can interact directly, which enables DPTNet can model for the speech sequences with direct context-awareness. The improved transformer in our approach learns the order information of the speech sequences without positional encodings by incorporating a recurrent neural network into the original transformer. In addition, the structure of dual paths makes our model efficient for extremely long speech sequence modeling. Extensive experiments on benchmark datasets show that our approach outperforms the current state-of-the-arts (20.6 dB SDR on the public WSj0-2mix data corpus). △ Less

Submitted 14 August, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

Comments: 5 pages. Accepted by INTERSPEECH 2020

arXiv:2004.07442 [pdf, other]

Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

Authors: Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa

Abstract: With the development of smart devices, such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type of biometric identifier. Current studies on voiceprint priva… ▽ More With the development of smart devices, such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type of biometric identifier. Current studies on voiceprint privacy protection do not provide either a meaningful privacy-utility trade-off or a formal and rigorous definition of privacy. In this study, we design a novel and rigorous privacy metric for voiceprint privacy, which is referred to as voice-indistinguishability, by extending differential privacy. We also propose mechanisms and frameworks for privacy-preserving speech data release satisfying voice-indistinguishability. Experiments on public datasets verify the effectiveness and efficiency of the proposed methods. △ Less

Submitted 15 April, 2020; originally announced April 2020.

Comments: The paper has been accepted by the IEEE International Conference on Multimedia & Expo 2020(ICME 2020)

arXiv:2003.10916 [pdf, other]

Age of Processing: Age-driven Status Sampling and Processing Offloading for Edge Computing-enabled Real-time IoT Applications

Authors: Rui Li, Qian Ma, Jie Gong, Zhi Zhou, Xu Chen

Abstract: The freshness of status information is of great importance for time-critical Internet of Things (IoT) applications. A metric measuring status freshness is the age-of-information (AoI), which captures the time elapsed from the status being generated at the source node (e.g., a sensor) to the latest status update.However, in intelligent IoT applications such as video surveillance, the status informa… ▽ More The freshness of status information is of great importance for time-critical Internet of Things (IoT) applications. A metric measuring status freshness is the age-of-information (AoI), which captures the time elapsed from the status being generated at the source node (e.g., a sensor) to the latest status update.However, in intelligent IoT applications such as video surveillance, the status information is revealed after some computation intensive and time-consuming data processing operations, which would affect the status freshness. In this paper, we propose a novel metric, age-of-processing (AoP), to quantify such status freshness, which captures the time elapsed of the newest received processed status data since it is generated. Compared with AoI, AoP further takes the data processing time into account. Since an IoT device has limited computation and energy resource, the device can choose to offload the data processing to the nearby edge server under constrained status sampling frequency.We aim to minimize the average AoP in a long-term process by jointly optimizing the status sampling frequency and processing offloading policy. We formulate this online problem as an infinite-horizon constrained Markov decision process (CMDP) with average reward criterion. We then transform the CMDP problem into an unconstrained Markov decision process (MDP) by leveraging a Lagrangian method, and propose a Lagrangian transformation framework for the original CMDP problem. Furthermore, we integrate the framework with perturbation based refinement for achieving the optimal policy of the CMDP problem. Extensive numerical evaluations show that the proposed algorithm outperforms the benchmarks, with an average AoP reduction up to 30%. △ Less

Submitted 24 March, 2020; originally announced March 2020.

Comments: submitted for review

arXiv:1912.00191 [pdf, other]

Learning a Decision Module by Imitating Driver's Control Behaviors

Authors: Junning Huang, Sirui Xie, Jiankai Sun, Qiurui Ma, Chunxiao Liu, Jian** Shi, Dahua Lin, Bolei Zhou

Abstract: Autonomous driving systems have a pipeline of perception, decision, planning, and control. The decision module processes information from the perception module and directs the execution of downstream planning and control modules. On the other hand, the recent success of deep learning suggests that this pipeline could be replaced by end-to-end neural control policies, however, safety cannot be well… ▽ More Autonomous driving systems have a pipeline of perception, decision, planning, and control. The decision module processes information from the perception module and directs the execution of downstream planning and control modules. On the other hand, the recent success of deep learning suggests that this pipeline could be replaced by end-to-end neural control policies, however, safety cannot be well guaranteed for the data-driven neural networks. In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning. This hybrid framework can preserve the merits of the classical pipeline such as the strict enforcement of physical and logical constraints while learning complex driving decisions from data. To circumvent the ambiguous annotation of human driving decisions, our method learns high-level driving decisions by imitating low-level control behaviors. We show in the simulation experiments that our modular driving agent can generalize its driving decision and control to various complex scenarios where the rule-based programs fail. It can also generate smoother and safer driving trajectories than end-to-end neural policies. △ Less

Submitted 5 May, 2021; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: Proceedings of the Conference on Robot Learning (CoRL) 2020

arXiv:1910.10202 [pdf, other]

Complex Transformer: A Framework for Modeling Complex-Valued Sequence

Authors: Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov

Abstract: While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models barely use complex numbers. However, speech, signal and audio data are naturally complex-valued after Fourier Transform, and studies have shown a potentially richer representation of complex nets. In this paper, we propose a Complex Transformer, which incorporates the transformer… ▽ More While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models barely use complex numbers. However, speech, signal and audio data are naturally complex-valued after Fourier Transform, and studies have shown a potentially richer representation of complex nets. In this paper, we propose a Complex Transformer, which incorporates the transformer model as a backbone for sequence modeling; we also develop attention and encoder-decoder network operating for complex input. The model achieves state-of-the-art performance on the MusicNet dataset and an In-phase Quadrature (IQ) signal dataset. △ Less

Submitted 6 August, 2021; v1 submitted 22 October, 2019; originally announced October 2019.

arXiv:1910.00497 [pdf]

Intelligent Metasurface Imager and Recognizer

Authors: Lianlin Li, Ya Shuang, Qian Ma, Haoyang Li, Hanting Zhao, Menglin Wei1, Che Liu, Chenglong Hao, Cheng-Wei Qiu, Tie Jun Cui

Abstract: It is ever-increasingly demanded to remotely monitor people in daily life using radio-frequency probing signals. However, conventional systems can hardly be deployed in real-world settings since they typically require objects to either deliberately cooperate or carry a wireless active device or identification tag. To accomplish the complicated successive tasks using a single device in real time, w… ▽ More It is ever-increasingly demanded to remotely monitor people in daily life using radio-frequency probing signals. However, conventional systems can hardly be deployed in real-world settings since they typically require objects to either deliberately cooperate or carry a wireless active device or identification tag. To accomplish the complicated successive tasks using a single device in real time, we propose a smart metasurface imager and recognizer simultaneously, empowered by a network of artificial neural networks (ANNs) for adaptively controlling data flow. Here, three ANNs are employed in an integrated hierarchy: transforming measured microwave data into images of whole human body; classifying the specifically designated spots (hand and chest) within the whole image; and recognizing human hand signs instantly at Wi-Fi frequency of 2.4 GHz. Instantaneous in-situ imaging of full scene and adaptive recognition of hand signs and vital signs of multiple non-cooperative people have been experimentally demonstrated. We also show that the proposed intelligent metasurface system work well even when it is passively excited by stray Wi-Fi signals that ubiquitously exist in our daily lives. The reported strategy could open a new avenue for future smart cities, smart homes, human-device interactive interfaces, healthy monitoring, and safety screening free of visual privacy issues. △ Less

Submitted 2 September, 2019; originally announced October 2019.

arXiv:1907.09320 [pdf]

An Efficient Target Detection and Recognition Method in Aerial Remote-sensing Images Based on Multiangle Regions-of-Interest

Authors: Guangcun Shan, Hongyu Wang, Wei Liang, Congcong Liu, Qizi Ma, Quan Quan

Abstract: Recently, deep learning technology have been extensively used in the field of image recognition. However, its main application is the recognition and detection of ordinary pictures and common scenes. It is challenging to effectively and expediently analyze remote-sensing images obtained by the image acquisition systems on unmanned aerial vehicles (UAVs), which includes the identification of the ta… ▽ More Recently, deep learning technology have been extensively used in the field of image recognition. However, its main application is the recognition and detection of ordinary pictures and common scenes. It is challenging to effectively and expediently analyze remote-sensing images obtained by the image acquisition systems on unmanned aerial vehicles (UAVs), which includes the identification of the target and calculation of its position. Aerial remote sensing images have different shooting angles and methods compared with ordinary pictures or images, which makes remote-sensing images play an irreplaceable role in some areas. In this study, a new target detection and recognition method in remote-sensing images is proposed based on deep convolution neural network (CNN) for the provision of multilevel information of images in combination with a region proposal network used to generate multiangle regions-of-interest. The proposed method generated results that were much more accurate and precise than those obtained with traditional ways. This demonstrated that the model proposed herein displays tremendous applicability potential in remote-sensing image recognition. △ Less

Submitted 7 June, 2022; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: 5 pages, 3 figures

arXiv:1907.03548 [pdf, ps, other]

Unified Attentional Generative Adversarial Network for Brain Tumor Segmentation From Multimodal Unpaired Images

Authors: Wenguang Yuan, Jia Wei, Jiabing Wang, Qianli Ma, Tolga Tasdizen

Abstract: In medical applications, the same anatomical structures may be observed in multiple modalities despite the different image characteristics. Currently, most deep models for multimodal segmentation rely on paired registered images. However, multimodal paired registered images are difficult to obtain in many cases. Therefore, develo** a model that can segment the target objects from different modal… ▽ More In medical applications, the same anatomical structures may be observed in multiple modalities despite the different image characteristics. Currently, most deep models for multimodal segmentation rely on paired registered images. However, multimodal paired registered images are difficult to obtain in many cases. Therefore, develo** a model that can segment the target objects from different modalities with unpaired images is significant for many clinical applications. In this work, we propose a novel two-stream translation and segmentation unified attentional generative adversarial network (UAGAN), which can perform any-to-any image modality translation and segment the target objects simultaneously in the case where two or more modalities are available. The translation stream is used to capture modality-invariant features of the target anatomical structures. In addition, to focus on segmentation-related features, we add attentional blocks to extract valuable features from the translation stream. Experiments on three-modality brain tumor segmentation indicate that UAGAN outperforms the existing methods in most cases. △ Less

Submitted 8 July, 2019; originally announced July 2019.

Comments: 9 pages, 4 figures, Accepted by MICCAI2019

arXiv:1807.07840 [pdf, other]

On Synchronization of Dynamical Systems over Directed Switching Topologies: An Algebraic and Geometric Perspective

Authors: Jiahu Qin, Qichao Ma, Xinghuo Yu, Long Wang

Abstract: In this paper, we aim to investigate the synchronization problem of dynamical systems, which can be of generic linear or Lipschitz nonlinear type, communicating over directed switching network topologies. A mild connectivity assumption on the switching topologies is imposed, which allows them to be directed and jointly connected. We propose a novel analysis framework from both algebraic and geomet… ▽ More In this paper, we aim to investigate the synchronization problem of dynamical systems, which can be of generic linear or Lipschitz nonlinear type, communicating over directed switching network topologies. A mild connectivity assumption on the switching topologies is imposed, which allows them to be directed and jointly connected. We propose a novel analysis framework from both algebraic and geometric perspectives to justify the attractiveness of the synchronization manifold. Specifically, it is proven that the complementary space of the synchronization manifold can be spanned by certain subspaces. These subspaces can be the eigenspaces of the nonzero eigenvalues of Laplacian matrices in linear case. They can also be subspaces in which the projection of the nonlinear self-dynamics still retains the Lipschitz property. This allows to project the states of the dynamical systems into these subspaces and transform the synchronization problem under consideration equivalently into a convergence one of the projected states in each subspace. Then, assuming the joint connectivity condition on the communication topologies, we are able to work out a simple yet effective and unified convergence analysis for both types of dynamical systems. More specifically, for partial-state coupled generic linear systems, it is proven that synchronization can be reached if an extra condition, which is easy to verify in several cases, on the system dynamics is satisfied. For Lipschitz-type nonlinear systems with positive-definite inner coupling matrix, synchronization is realized if the coupling strength is strong enough to stabilize the evolution of the projected states in each subspace under certain conditions. △ Less

Submitted 20 July, 2018; originally announced July 2018.

Comments: 17 pages, 11 figures

Showing 1–38 of 38 results for author: Ma, Q