Search | arXiv e-print repository

Exploiting Data Significance in Remote Estimation of Discrete-State Markov Sources

Abstract: We consider the semantics-aware remote estimation of a discrete-state Markov source with normal (low-priority) and alarm (high-priority) states. Erroneously announcing a normal state at the destination when the source is actually in an alarm state (i.e., missed alarm error) incurs a significantly higher cost than falsely announcing an alarm state when the source is in a normal state (i.e., false a… ▽ More We consider the semantics-aware remote estimation of a discrete-state Markov source with normal (low-priority) and alarm (high-priority) states. Erroneously announcing a normal state at the destination when the source is actually in an alarm state (i.e., missed alarm error) incurs a significantly higher cost than falsely announcing an alarm state when the source is in a normal state (i.e., false alarm error). Moreover, successive reception of an estimation error may cause significant lasting impact, e.g., maintenance cost and misoperations. Motivated by this, we assign different costs to different estimation errors and introduce two new age metrics, namely the Age of Missed Alarm (AoMA) and the Age of False Alarm (AoFA), to account for the lasting impact incurred by different estimation errors. Notably, the two age processes evolve dependently and can distinguish between different types of estimation errors and different synced states. The aim is to achieve an optimal trade-off between the cost of estimation error, lasting impact, and communication utilization. The problem is formulated as an average-cost, countably infinite state-space Markov decision process (MDP). We show that the optimal policy exhibits a switching-type structure, making it amenable to policy storage and algorithm design. Notably, when the source is symmetric and states are equally important, the optimal policy has identical thresholds, i.e., threshold-type. Theoretical and numerical results underscore that our approach extends the current understanding of the Age of Incorrect Information (AoII) and the cost of actuation error (CAE), showing that they are specific instances within our broader framework. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.05671 [pdf, ps, other]

doi 10.1109/LWC.2024.3410383

Efficient Beamforming Feedback Information-Based Wi-Fi Sensing by Feature Selection

Authors: Xin Li, **gzhi Hu, Jun Luo

Abstract: Wi-Fi sensing leveraging plain-text beamforming feedback information (BFI) in multiple-input-multiple-output (MIMO) systems attracts increasing attention. However, due to the implicit relationship between BFI and the channel state information (CSI), quantifying the sensing capability of BFI poses a challenge in building efficient BFI-based sensing algorithms. In this letter, we first derive a math… ▽ More Wi-Fi sensing leveraging plain-text beamforming feedback information (BFI) in multiple-input-multiple-output (MIMO) systems attracts increasing attention. However, due to the implicit relationship between BFI and the channel state information (CSI), quantifying the sensing capability of BFI poses a challenge in building efficient BFI-based sensing algorithms. In this letter, we first derive a mathematical model of BFI, characterizing its relationship with CSI explicitly, and then develop a closed-form expression of BFI for 2x2 MIMO systems. To enhance the efficiency of BFI-based sensing by selecting only the most informative features, we quantify the sensing capacity of BFI using the Cramer-Rao bound (CRB) and then propose an efficient CRB-based BFI feature selection algorithm. Simulation results verify that BFI and CSI exhibit comparable sensing capabilities and that the proposed algorithm halves the number of features, reducing 20% more parameters than baseline methods, at the cost of only slightly increasing positioning errors. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.20064 [pdf, other]

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t… ▽ More Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. This paper presents the system developed by a multi-site team for the participation in the Odyssey 2024 Emotion Recognition Challenge Track-1. The challenge data has the aforementioned properties and therefore the presented systems aimed to tackle these issues, by introducing focal loss in optimisation when applying class weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, by sacrificing performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions - with the aim to have different properties for different data. Hence these models show different performance preferences on major classes and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. On the Odyssey 2024 Emotion Recognition Challenge Task-1 data the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.06299 [pdf, other]

Cross-domain Learning Framework for Tracking Users in RIS-aided Multi-band ISAC Systems with Sparse Labeled Data

Authors: **gzhi Hu, Dusit Niyato, Jun Luo

Abstract: Integrated sensing and communications (ISAC) is pivotal for 6G communications and is boosted by the rapid development of reconfigurable intelligent surfaces (RISs). Using the channel state information (CSI) across multiple frequency bands, RIS-aided multi-band ISAC systems can potentially track users' positions with high precision. Though tracking with CSI is desirable as no communication overhead… ▽ More Integrated sensing and communications (ISAC) is pivotal for 6G communications and is boosted by the rapid development of reconfigurable intelligent surfaces (RISs). Using the channel state information (CSI) across multiple frequency bands, RIS-aided multi-band ISAC systems can potentially track users' positions with high precision. Though tracking with CSI is desirable as no communication overheads are incurred, it faces challenges due to the multi-modalities of CSI samples, irregular and asynchronous data traffic, and sparse labeled data for learning the tracking function. This paper proposes the X2Track framework, where we model the tracking function by a hierarchical architecture, jointly utilizing multi-modal CSI indicators across multiple bands, and optimize it in a cross-domain manner, tackling the sparsity of labeled data for the target deployment environment (namely, target domain) by adapting the knowledge learned from another environment (namely, source domain). Under X2Track, we design an efficient deep learning algorithm to minimize tracking errors, based on transformer neural networks and adversarial learning techniques. Simulation results verify that X2Track achieves decimeter-level axial tracking errors even under scarce UL data traffic and strong interference conditions and can adapt to diverse deployment environments with fewer than 5% training data, or equivalently, 5 minutes of UE tracks, being labeled. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.04848 [pdf, other]

Task-Aware Encoder Control for Deep Video Compression

Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, **g Geng, Yan Wang, Jun Zhang, Hongwei Qin

Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an… ▽ More Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an innovative encoder controller for deep video compression for machines. This controller features a mode prediction and a Group of Pictures (GoP) selection module. Our approach centralizes control at the encoding stage, allowing for adaptable encoder adjustments across different tasks, such as detection and tracking, while maintaining compatibility with a standard pre-trained DVC decoder. Empirical evidence demonstrates that our method is applicable across multiple tasks with various existing pre-trained DVCs. Moreover, extensive experiments demonstrate that our method outperforms previous DVC by about 25% bitrate for different tasks, with only one pre-trained decoder. △ Less

Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.16855 [pdf, other]

Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints

Authors: Ji** Luo, Nikolaos Pappas

Abstract: This paper studies semantic-aware communication for remote estimation of multiple Markov sources over a lossy and rate-constrained channel. Unlike most existing studies that treat all source states equally, we exploit the semantics of information and consider that the remote actuator has different tolerances for the estimation errors of different states. We aim to find an optimal scheduling policy… ▽ More This paper studies semantic-aware communication for remote estimation of multiple Markov sources over a lossy and rate-constrained channel. Unlike most existing studies that treat all source states equally, we exploit the semantics of information and consider that the remote actuator has different tolerances for the estimation errors of different states. We aim to find an optimal scheduling policy that minimizes the long-term state-dependent costs of estimation errors under a transmission frequency constraint. We theoretically show the structure of the optimal policy by leveraging the average-cost Constrained Markov Decision Process (CMDP) theory and the Lagrangian dynamic programming. By exploiting the optimal structural results, we develop a novel policy search algorithm, termed intersection search plus relative value iteration (Insec-RVI), that can find the optimal policy using only a few iterations. To avoid the ``curse of dimensionality'' of MDPs, we propose an online low-complexity drift-plus-penalty (DPP) scheduling algorithm based on the Lyapunov optimization theorem. We also design an efficient average-cost Q-learning algorithm to estimate the optimal policy without knowing a priori the channel and source statistics. Numerical results show that continuous transmission is inefficient, and remarkably, our semantic-aware policies can attain the optimum by strategically utilizing fewer transmissions by exploiting the timing of the important information. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13030 [pdf, other]

Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

Authors: Jixiang Luo, Yan Wang, Hongwei Qin

Abstract: Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially at below $0.2bpp$. Besides, deformation on human faces and text… ▽ More Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially at below $0.2bpp$. Besides, deformation on human faces and text is unacceptable for visual quality assessment, and the problem becomes more prominent on small faces and text. To solve this problem, we combine the advantage of MSE-based models and generative models by utilizing region of interest (ROI). We propose Hierarchical-ROI (H-ROI), to split images into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear map** within the channel dimension to constrain the bit rate while maintaining the visual quality. Exhaustive experiments demonstrate that our methods achieve better visual quality on small faces and text with lower bit rates, e.g., $0.7X$ bits of HiFiC and $0.5X$ bits of BPG. △ Less

Submitted 20 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.09871 [pdf, other]

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

Authors: Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, **yang Luo, Yan Liu, Ming Xi, Kejun Zhang

Abstract: The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music descripti… ▽ More The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced (https://github.com/CarlWangChina/MuChin/). △ Less

Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)

MSC Class: 68Txx(Primary)14F05; 91Fxx(Secondary) ACM Class: I.2.7; J.5

arXiv:2402.03919 [pdf, other]

Sensing Mutual Information with Random Signals in Gaussian Channels: Bridging Sensing and Communication Metrics

Authors: Lei Xie, Fan Liu, Jia** Luo, Shenghui Song

Abstract: Sensing performance is typically evaluated by classical radar metrics, such as Cramer-Rao bound and signal-to-clutter-plus-noise ratio. The recent development of the integrated sensing and communication (ISAC) framework motivated the efforts to unify the performance metric for sensing and communication, where mutual information (MI) was proposed as a sensing performance metric with deterministic s… ▽ More Sensing performance is typically evaluated by classical radar metrics, such as Cramer-Rao bound and signal-to-clutter-plus-noise ratio. The recent development of the integrated sensing and communication (ISAC) framework motivated the efforts to unify the performance metric for sensing and communication, where mutual information (MI) was proposed as a sensing performance metric with deterministic signals. However, the need of communication in ISAC systems necessitates the transmission of random signals for sensing applications, whereas an explicit evaluation for the sensing mutual information (SMI) with random signals is not yet available in the literature. This paper aims to fill the research gap and investigate the unification of sensing and communication performance metrics. For that purpose, we first derive the explicit expression for the SMI with random signals utilizing random matrix theory. On top of that, we further build up the connections between SMI and traditional sensing metrics, such as ergodic minimum mean square error (EMMSE), ergodic linear minimum mean square error (ELMMSE), and ergodic Bayesian Cramér-Rao bound (EBCRB). Such connections open up the opportunity to unify sensing and communication performance metrics, which facilitates the analysis and design for ISAC systems. Finally, SMI is utilized to optimize the precoder for both sensing-only and ISAC applications. Simulation results validate the accuracy of the theoretical results and the effectiveness of the proposed precoding designs. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2311.07081

arXiv:2402.01186 [pdf, other]

Ambient-Pix2PixGAN for Translating Medical Images from Noisy Data

Authors: Wentao Chen, Xichen Xu, Jie Luo, Weimin Zhou

Abstract: Image-to-image translation is a common task in computer vision and has been rapidly increasing the impact on the field of medical imaging. Deep learning-based methods that employ conditional generative adversarial networks (cGANs), such as Pix2PixGAN, have been extensively explored to perform image-to-image translation tasks. However, when noisy medical image data are considered, such methods cann… ▽ More Image-to-image translation is a common task in computer vision and has been rapidly increasing the impact on the field of medical imaging. Deep learning-based methods that employ conditional generative adversarial networks (cGANs), such as Pix2PixGAN, have been extensively explored to perform image-to-image translation tasks. However, when noisy medical image data are considered, such methods cannot be directly applied to produce clean images. Recently, an augmented GAN architecture named AmbientGAN has been proposed that can be trained on noisy measurement data to synthesize high-quality clean medical images. Inspired by AmbientGAN, in this work, we propose a new cGAN architecture, Ambient-Pix2PixGAN, for performing medical image-to-image translation tasks by use of noisy measurement data. Numerical studies that consider MRI-to-PET translation are conducted. Both traditional image quality metrics and task-based image quality metrics are employed to assess the proposed Ambient-Pix2PixGAN. It is demonstrated that our proposed Ambient-Pix2PixGAN can be successfully trained on noisy measurement data to produce high-quality translated images in target imaging modality. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: SPIE Medical Imaging 2024

arXiv:2311.07346 [pdf, other]

Goal-oriented Estimation of Multiple Markov Sources in Resource-constrained Systems

Authors: Ji** Luo, Nikolaos Pappas

Abstract: This paper investigates goal-oriented communication for remote estimation of multiple Markov sources in resource-constrained networks. An agent decides the updating times of the sources and transmits the packet to a remote destination over an unreliable channel with delay. The destination is tasked with source reconstruction for actuation. We utilize the metric \textit{cost of actuation error} (CA… ▽ More This paper investigates goal-oriented communication for remote estimation of multiple Markov sources in resource-constrained networks. An agent decides the updating times of the sources and transmits the packet to a remote destination over an unreliable channel with delay. The destination is tasked with source reconstruction for actuation. We utilize the metric \textit{cost of actuation error} (CAE) to capture the state-dependent actuation costs. We aim for a sampling policy that minimizes the long-term average CAE subject to an average resource constraint. We formulate this problem as an average-cost constrained Markov Decision Process (CMDP) and relax it into an unconstrained problem by utilizing \textit{Lyapunov drift} techniques. Then, we propose a low-complexity \textit{drift-plus-penalty} (DPP) policy for systems with known source/channel statistics and a Lyapunov optimization-based deep reinforcement learning (LO-DRL) policy for unknown environments. Our policies significantly reduce the number of uninformative transmissions by exploiting the timing of the important information. △ Less

Submitted 3 June, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted to be presented at the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2024

arXiv:2311.03175 [pdf]

Frequency Domain Decomposition Translation for Enhanced Medical Image Translation Using GANs

Authors: Zhuhui Wang, Jianwei Zuo, Xuliang Deng, Jiajia Luo

Abstract: Medical Image-to-image translation is a key task in computer vision and generative artificial intelligence, and it is highly applicable to medical image analysis. GAN-based methods are the mainstream image translation methods, but they often ignore the variation and distribution of images in the frequency domain, or only take simple measures to align high-frequency information, which can lead to d… ▽ More Medical Image-to-image translation is a key task in computer vision and generative artificial intelligence, and it is highly applicable to medical image analysis. GAN-based methods are the mainstream image translation methods, but they often ignore the variation and distribution of images in the frequency domain, or only take simple measures to align high-frequency information, which can lead to distortion and low quality of the generated images. To solve these problems, we propose a novel method called frequency domain decomposition translation (FDDT). This method decomposes the original image into a high-frequency component and a low-frequency component, with the high-frequency component containing the details and identity information, and the low-frequency component containing the style information. Next, the high-frequency and low-frequency components of the transformed image are aligned with the transformed results of the high-frequency and low-frequency components of the original image in the same frequency band in the spatial domain, thus preserving the identity information of the image while destroying as little stylistic information of the image as possible. We conduct extensive experiments on MRI images and natural images with FDDT and several mainstream baseline models, and we use four evaluation metrics to assess the quality of the generated images. Compared with the baseline models, optimally, FDDT can reduce Fréchet inception distance by up to 24.4%, structural similarity by up to 4.4%, peak signal-to-noise ratio by up to 5.8%, and mean squared error by up to 31%. Compared with the previous method, optimally, FDDT can reduce Fréchet inception distance by up to 23.7%, structural similarity by up to 1.8%, peak signal-to-noise ratio by up to 6.8%, and mean squared error by up to 31.6%. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.06336 [pdf, other]

HoloFed: Environment-Adaptive Positioning via Multi-band Reconfigurable Holographic Surfaces and Federated Learning

Authors: **gzhi Hu, Zhe Chen, Tianyue Zheng, Robert Schober, Jun Luo

Abstract: Positioning is an essential service for various applications and is expected to be integrated with existing communication infrastructures in 5G and 6G. Though current Wi-Fi and cellular base stations (BSs) can be used to support this integration, the resulting precision is unsatisfactory due to the lack of precise control of the wireless signals. Recently, BSs adopting reconfigurable holographic s… ▽ More Positioning is an essential service for various applications and is expected to be integrated with existing communication infrastructures in 5G and 6G. Though current Wi-Fi and cellular base stations (BSs) can be used to support this integration, the resulting precision is unsatisfactory due to the lack of precise control of the wireless signals. Recently, BSs adopting reconfigurable holographic surfaces (RHSs) have been advocated for positioning as RHSs' large number of antenna elements enable generation of arbitrary and highly-focused signal beam patterns. However, existing designs face two major challenges: i) RHSs only have limited operating bandwidth, and ii) the positioning methods cannot adapt to the diverse environments encountered in practice. To overcome these challenges, we present HoloFed, a system providing high-precision environment-adaptive user positioning services by exploiting multi-band(MB)-RHS and federated learning (FL). For improving the positioning performance, a lower bound on the error variance is obtained and utilized for guiding MB-RHS's digital and analog beamforming design. For better adaptability while preserving privacy, an FL framework is proposed for users to collaboratively train a position estimator, where we exploit the transfer learning technique to handle the lack of position labels of the users. Moreover, a scheduling algorithm for the BS to select which users train the position estimator is designed, jointly considering the convergence and efficiency of FL. Our simulation results confirm that HoloFed achieves a 57% lower positioning error variance compared to a beam-scanning baseline and can effectively adapt to diverse environments. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.05959 [pdf, other]

doi 10.1109/TVT.2023.3321335

A Computationally Efficient Bi-level Coordination Framework for CAVs at Unsignalized Intersections

Authors: Ji** Luo, Tingting Zhang, Qinyu Zhang

Abstract: In this paper, we investigate cooperative vehicle coordination for connected and automated vehicles (CAVs) at unsignalized intersections. To support high traffic throughput while reducing computational complexity, we present a novel collision region model and decompose the optimal coordination problem into two sub-problems: \textit{centralized} priority scheduling and \textit{distributed} trajecto… ▽ More In this paper, we investigate cooperative vehicle coordination for connected and automated vehicles (CAVs) at unsignalized intersections. To support high traffic throughput while reducing computational complexity, we present a novel collision region model and decompose the optimal coordination problem into two sub-problems: \textit{centralized} priority scheduling and \textit{distributed} trajectory planning. Then, we propose a bi-level coordination framework which includes: (i) a Monte Carlo Tree Search (MCTS)-based high-level priority scheduler aims to find high-quality passing orders to maximize traffic throughput, and (ii) a priority-based low-level trajectory planner that generates optimal collision-free control inputs. Simulation results demonstrate that our bi-level strategy achieves near-optimal coordination performance, comparable to state-of-the-art centralized strategies, and significantly outperform the traffic signal control systems in terms of traffic throughput. Moreover, our approach exhibits good scalability, with computational complexity scaling linearly with the number of vehicles. Video demonstrations can be found online at \url{https://youtu.be/WYAKFMNnQfs}. △ Less

Submitted 21 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.04672 [pdf, other]

SSHNN: Semi-Supervised Hybrid NAS Network for Echocardiographic Image Segmentation

Authors: Renqi Chen, **g**g Luo, Fan Nian, Yuhui Cen, Yiheng Peng, Zekuan Yu

Abstract: Accurate medical image segmentation especially for echocardiographic images with unmissable noise requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) realizes better segmentation results due to larger search space and automatic optimization, but most of the existing methods are weak in layer-wise feature aggregation and adopt a ``strong encoder, weak de… ▽ More Accurate medical image segmentation especially for echocardiographic images with unmissable noise requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) realizes better segmentation results due to larger search space and automatic optimization, but most of the existing methods are weak in layer-wise feature aggregation and adopt a ``strong encoder, weak decoder" structure, insufficient to handle global relationships and local details. To resolve these issues, we propose a novel semi-supervised hybrid NAS network for accurate medical image segmentation termed SSHNN. In SSHNN, we creatively use convolution operation in layer-wise feature fusion instead of normalized scalars to avoid losing details, making NAS a stronger encoder. Moreover, Transformers are introduced for the compensation of global context and U-shaped decoder is designed to efficiently connect global context with local features. Specifically, we implement a semi-supervised algorithm Mean-Teacher to overcome the limited volume problem of labeled medical image dataset. Extensive experiments on CAMUS echocardiography dataset demonstrate that SSHNN outperforms state-of-the-art approaches and realizes accurate segmentation. Code will be made publicly available. △ Less

Submitted 27 December, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP2024

arXiv:2308.13789 [pdf]

Sensiverse: A dataset for ISAC study

Authors: Jia** Luo, Baojian Zhou, Yang Yu, ** Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong

Abstract: In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the… ▽ More In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the dataset are also described, and finally the use of the dataset is illustrated with examples through the evaluation of use cases such as 3D environment reconstruction and moving targets. △ Less

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.13287 [pdf, other]

Efficient Learned Lossless JPEG Recompression

Authors: Lina Guo, Yuanyuan Wang, Tongda Xu, Jixiang Luo, Dailan He, Zhenjun Ji, Shanshan Wang, Yang Wang, Hongwei Qin

Abstract: JPEG is one of the most popular image compression methods. It is beneficial to compress those existing JPEG files without introducing additional distortion. In this paper, we propose a deep learning based method to further compress JPEG images losslessly. Specifically, we propose a Multi-Level Parallel Conditional Modeling (ML-PCM) architecture, which enables parallel decoding in different granula… ▽ More JPEG is one of the most popular image compression methods. It is beneficial to compress those existing JPEG files without introducing additional distortion. In this paper, we propose a deep learning based method to further compress JPEG images losslessly. Specifically, we propose a Multi-Level Parallel Conditional Modeling (ML-PCM) architecture, which enables parallel decoding in different granularities. First, luma and chroma are processed independently to allow parallel coding. Second, we propose pipeline parallel context model (PPCM) and compressed checkerboard context model (CCCM) for the effective conditional modeling and efficient decoding within luma and chroma components. Our method has much lower latency while achieves better compression ratio compared with previous SOTA. After proper software optimization, we can obtain a good throughput of 57 FPS for 1080P images on NVIDIA T4 GPU. Furthermore, combined with quantization, our approach can also act as a lossy JPEG codec which has obvious advantage over SOTA lossy compression methods in high bit rate (bpp$>0.9$). △ Less

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2307.11757 [pdf, other]

Improving Video Colorization by Test-Time Tuning

Authors: Ya** Zhao, Haitian Zheng, Jiebo Luo, Edmund Y. Lam

Abstract: With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored. However, the existing approaches often suffer from overfitting the training dataset and sequentially lead to suboptimal performance on colorizing testing samples. To address this issue, we propose an effective method, wh… ▽ More With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored. However, the existing approaches often suffer from overfitting the training dataset and sequentially lead to suboptimal performance on colorizing testing samples. To address this issue, we propose an effective method, which aims to enhance video colorization through test-time tuning. By exploiting the reference to construct additional training samples during testing, our approach achieves a performance boost of 1~3 dB in PSNR on average compared to the baseline. Code is available at: https://github.com/IndigoPurple/T3 △ Less

Submitted 25 June, 2023; originally announced July 2023.

Comments: 5 pages, 4 figures

arXiv:2307.05386 [pdf, other]

Exploring the Potential of Integrated Optical Sensing and Communication (IOSAC) Systems with Si Waveguides for Future Networks

Authors: Xiangpeng Ou, Ying Qiu, Ming Luo, Fujun Sun, Peng Zhang, Gang Yang, Junjie Li, Jianfeng Gao, Xiaobin He, Anyan Du, Bo Tang, Bin Li, Zichen Liu, Zhihua Li, Ling Xie, Xi Xiao, Jun Luo, Wenwu Wang, ** Tao, Yan Yang

Abstract: Advanced silicon photonic technologies enable integrated optical sensing and communication (IOSAC) in real time for the emerging application requirements of simultaneous sensing and communication for next-generation networks. Here, we propose and demonstrate the IOSAC system on the silicon nitride (SiN) photonics platform. The IOSAC devices based on microring resonators are capable of monitoring t… ▽ More Advanced silicon photonic technologies enable integrated optical sensing and communication (IOSAC) in real time for the emerging application requirements of simultaneous sensing and communication for next-generation networks. Here, we propose and demonstrate the IOSAC system on the silicon nitride (SiN) photonics platform. The IOSAC devices based on microring resonators are capable of monitoring the variation of analytes, transmitting the information to the terminal along with the modulated optical signal in real-time, and replacing bulk optics in high-precision and high-speed applications. By directly integrating SiN ring resonators with optical communication networks, simultaneous sensing and optical communication are demonstrated by an optical signal transmission experimental system using especially filtering amplified spontaneous emission spectra. The refractive index (RI) sensing ring with a sensitivity of 172 nm/RIU, a figure of merit (FOM) of 1220, and a detection limit (DL) of 8.2*10-6 RIU is demonstrated. Simultaneously, the 1.25 Gbps optical on-off-keying (OOK) signal is transmitted at the concentration of different NaCl solutions, which indicates the bit-error-ratio (BER) decreases with the increase in concentration. The novel IOSAC technology shows the potential to realize high-performance simultaneous biosensing and communication in real time and further accelerate the development of IoT and 6G networks. △ Less

Submitted 27 June, 2023; originally announced July 2023.

Comments: 11pages, 5 figutres

arXiv:2306.16556 [pdf, other]

Inter-Rater Uncertainty Quantification in Medical Image Segmentation via Rater-Specific Bayesian Neural Networks

Authors: Qingqiao Hu, Hao Wang, **g Luo, Yunhao Luo, Zhiheng Zhangg, Jan S. Kirschke, Benedikt Wiestler, Bjoern Menze, Jianguo Zhang, Hongwei Bran Li

Abstract: Automated medical image segmentation inherently involves a certain degree of uncertainty. One key factor contributing to this uncertainty is the ambiguity that can arise in determining the boundaries of a target region of interest, primarily due to variations in image appearance. On top of this, even among experts in the field, different opinions can emerge regarding the precise definition of spec… ▽ More Automated medical image segmentation inherently involves a certain degree of uncertainty. One key factor contributing to this uncertainty is the ambiguity that can arise in determining the boundaries of a target region of interest, primarily due to variations in image appearance. On top of this, even among experts in the field, different opinions can emerge regarding the precise definition of specific anatomical structures. This work specifically addresses the modeling of segmentation uncertainty, known as inter-rater uncertainty. Its primary objective is to explore and analyze the variability in segmentation outcomes that can occur when multiple experts in medical imaging interpret and annotate the same images. We introduce a novel Bayesian neural network-based architecture to estimate inter-rater uncertainty in medical image segmentation. Our approach has three key advancements. Firstly, we introduce a one-encoder-multi-decoder architecture specifically tailored for uncertainty estimation, enabling us to capture the rater-specific representation of each expert involved. Secondly, we propose Bayesian modeling for the new architecture, allowing efficient capture of the inter-rater distribution, particularly in scenarios with limited annotations. Lastly, we enhance the rater-specific representation by integrating an attention module into each decoder. This module facilitates focused and refined segmentation results for each rater. We conduct extensive evaluations using synthetic and real-world datasets to validate our technical innovations rigorously. Our method surpasses existing baseline methods in five out of seven diverse tasks on the publicly available \emph{QUBIQ} dataset, considering two evaluation metrics encompassing different uncertainty aspects. Our codes, models, and the new dataset are available through our GitHub repository: https://github.com/HaoWang420/bOEMD-net . △ Less

Submitted 25 August, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: submitted to a journal for review

arXiv:2306.15753 [pdf]

Integrated Simulation Platform for Quantifying the Traffic-Induced Environmental and Health Impacts

Authors: Xuanpeng Zhao, Guoyuan Wu, Akula Venkatram, Ji Luo, Peng Hao, Kanok Boriboonsomsin, Shaohua Hu

Abstract: Air quality and human exposure to mobile source pollutants have become major concerns in urban transportation. Existing studies mainly focus on mitigating traffic congestion and reducing carbon footprints, with limited understanding of traffic-related health impacts from the environmental justice perspective. To address this gap, we present an innovative integrated simulation platform that models… ▽ More Air quality and human exposure to mobile source pollutants have become major concerns in urban transportation. Existing studies mainly focus on mitigating traffic congestion and reducing carbon footprints, with limited understanding of traffic-related health impacts from the environmental justice perspective. To address this gap, we present an innovative integrated simulation platform that models traffic-related air quality and human exposure at the microscopic level. The platform consists of five modules: SUMO for traffic modeling, MOVES for emissions modeling, a 3D grid-based dispersion model, a Matlab-based concentration visualizer, and a human exposure model. Our case study on multi-modal mobility on-demand services demonstrates that a distributed pickup strategy can reduce human cancer risk associated with PM2.5 by 33.4% compared to centralized pickup. Our platform offers quantitative results of traffic-related air quality and health impacts, useful for evaluating environmental issues and improving transportation systems management and operations strategies. △ Less

Submitted 13 June, 2023; originally announced June 2023.

Comments: 35 pages, 11 figures

arXiv:2306.13843 [pdf, other]

Score-based Generative Models for Photoacoustic Image Reconstruction with Rotation Consistency Constraints

Authors: Shangqing Tong, Hengrong Lan, Liming Nie, Jianwen Luo, Fei Gao

Abstract: Photoacoustic tomography (PAT) is a newly emerged imaging modality which enables both high optical contrast and acoustic depth of penetration. Reconstructing images of photoacoustic tomography from limited amount of senser data is among one of the major challenges in photoacoustic imaging. Previous works based on deep learning were trained in supervised fashion, which directly map the input partia… ▽ More Photoacoustic tomography (PAT) is a newly emerged imaging modality which enables both high optical contrast and acoustic depth of penetration. Reconstructing images of photoacoustic tomography from limited amount of senser data is among one of the major challenges in photoacoustic imaging. Previous works based on deep learning were trained in supervised fashion, which directly map the input partially known sensor data to the ground truth reconstructed from full field of view. Recently, score-based generative models played an increasingly significant role in generative modeling. Leveraging this probabilistic model, we proposed Rotation Consistency Constrained Score-based Generative Model (RCC-SGM), which recovers the PAT images by iterative sampling between Langevin dynamics and a constraint term utilizing the rotation consistency between the images and the measurements. Our proposed method can generalize to different measurement processes (32.29 PSNR with 16 measurements under random sampling, whereas 28.50 for supervised counterpart), while supervised methods need to train on specific inverse map**s. △ Less

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2305.16789 [pdf, other]

Modulate Your Spectrum in Self-Supervised Learning

Authors: Xi Weng, Yunhao Ni, Tengwei Song, Jie Luo, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, Lei Huang

Abstract: Whitening loss offers a theoretical guarantee against feature collapse in self-supervised learning (SSL) with joint embedding architectures. Typically, it involves a hard whitening approach, transforming the embedding and applying loss to the whitened output. In this work, we introduce Spectral Transformation (ST), a framework to modulate the spectrum of embedding and to seek for functions beyond… ▽ More Whitening loss offers a theoretical guarantee against feature collapse in self-supervised learning (SSL) with joint embedding architectures. Typically, it involves a hard whitening approach, transforming the embedding and applying loss to the whitened output. In this work, we introduce Spectral Transformation (ST), a framework to modulate the spectrum of embedding and to seek for functions beyond whitening that can avoid dimensional collapse. We show that whitening is a special instance of ST by definition, and our empirical investigations unveil other ST instances capable of preventing collapse. Additionally, we propose a novel ST instance named IterNorm with trace loss (INTL). Theoretical analysis confirms INTL's efficacy in preventing collapse and modulating the spectrum of embedding toward equal-eigenvalues during optimization. Our experiments on ImageNet classification and COCO object detection demonstrate INTL's potential in learning superior representations. The code is available at https://github.com/winci-ai/INTL. △ Less

Submitted 21 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted at ICLR 2024. The code is available at https://github.com/winci-ai/intl

arXiv:2305.06459 [pdf, other]

SlicerTMS: Real-Time Visualization of Transcranial Magnetic Stimulation for Mental Health Treatment

Authors: Loraine Franke, Tae Young Park, Jie Luo, Yogesh Rathi, Steve Pieper, Lipeng Ning, Daniel Haehn

Abstract: We present a real-time visualization system for Transcranial Magnetic Stimulation (TMS), a non-invasive neuromodulation technique for treating various brain disorders and mental health diseases. Our solution targets the current challenges of slow and labor-intensive practices in treatment planning. Integrating Deep Learning (DL), our system rapidly predicts electric field (E-field) distributions i… ▽ More We present a real-time visualization system for Transcranial Magnetic Stimulation (TMS), a non-invasive neuromodulation technique for treating various brain disorders and mental health diseases. Our solution targets the current challenges of slow and labor-intensive practices in treatment planning. Integrating Deep Learning (DL), our system rapidly predicts electric field (E-field) distributions in 0.2 seconds for precise and effective brain stimulation. The core advancement lies in our tool's real-time neuronavigation visualization capabilities, which support clinicians in making more informed decisions quickly and effectively. We assess our system's performance through three studies: First, a real-world use case scenario in a clinical setting, providing concrete feedback on applicability and usability in a practical environment. Second, a comparative analysis with another TMS tool focusing on computational efficiency across various hardware platforms. Lastly, we conducted an expert user study to measure usability and influence in optimizing TMS treatment planning. The system is openly available for community use and further development on GitHub: \url{https://github.com/lorifranke/SlicerTMS}. △ Less

Submitted 12 March, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures, 2 tables, MICCAI

arXiv:2304.06883 [pdf, other]

Intelligent Reflecting Surface Aided Wireless Communication Systems: Joint Location and Passive Beamforming Design

Authors: **tao Luo, Sixing Yin

Abstract: Intelligent reflecting surface (IRS) has been widely studied in recent years, it has emerged as a new technology which can reflect the incident signal by intelligently configuring the reflection elements, thus changing the signal propagation environment, enhancing the signals users desire and suppressing the interference between users. In this paper, we study an IRS aided multi-users wireless comm… ▽ More Intelligent reflecting surface (IRS) has been widely studied in recent years, it has emerged as a new technology which can reflect the incident signal by intelligently configuring the reflection elements, thus changing the signal propagation environment, enhancing the signals users desire and suppressing the interference between users. In this paper, we study an IRS aided multi-users wireless communication where the base station (BS) sends a variety of signals, each user receives desired signals. In order to guarantee the fairness of wireless communications, we need to maximize the minimum rates of users, subject to the power constraint of BS and the phase constraint of IRS. Prior works on IRS mainly consider optimizing BS beamforming and IRS passive beamforming, this paper also aims to optimize the IRS location. The considered problem is shown to be non-convex, we decompose the problem into two subproblems, transforming the two subproblems into a lower bound problem and using alternating optimization (AO) and successive convex approximation (SCA) to solve them, respectively. Finally, the two subproblems are optimized alternately to make the objective function value converge in an acceptable range. Simulation results verify the convergence results of our proposed algorithm, and the performance improvement compared with the benchmark scheme in wireless communication system. △ Less

Submitted 5 May, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: Following the publication of our work, we identified errors in our data analysis process. To uphold the standards of academic integrity and the accuracy of our findings, we feel it necessary to withdraw the current version of our paper. We plan to submit a revised version upon thorough review and correction of these errors

arXiv:2303.15686 [pdf, other]

Multi-band Reconfigurable Holographic Surface Based ISAC Systems: Design and Optimization

Authors: **gzhi Hu, Zhe Chen, Jun Luo

Abstract: Metamaterial-based reconfigurable holographic surfaces (RHSs) have been proposed as novel cost-efficient antenna arrays, which are promising for improving the positioning and communication performance of integrated sensing and communications (ISAC) systems. However, due to the high frequency selectivity of the metamaterial elements, RHSs face challenges in supporting ultra-wide bandwidth (UWB), wh… ▽ More Metamaterial-based reconfigurable holographic surfaces (RHSs) have been proposed as novel cost-efficient antenna arrays, which are promising for improving the positioning and communication performance of integrated sensing and communications (ISAC) systems. However, due to the high frequency selectivity of the metamaterial elements, RHSs face challenges in supporting ultra-wide bandwidth (UWB), which significantly limits the positioning precision. In this paper, to avoid the physical limitations of UWB RHS while enhancing the performance of RHS-based ISAC systems, we propose a multi-band (MB) based ISAC system. We analyze its positioning precision and propose an efficient algorithm to optimize the large number of variables in analog and digital beamforming. Through comparison with benchmark results, simulation results verify the efficiency of our proposed system and algorithm, and show that the system achieves $42\%$ less positioning error, which reduces $82\%$ communication capacity loss. △ Less

Submitted 27 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures, IEEE ICC

arXiv:2303.07687 [pdf, other]

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

Authors: Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, **g Xiao

Abstract: Because of predicting all the target tokens in parallel, the non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), finding the monotonic alignment that minimizes the cross-entropy loss through dynamic progr… ▽ More Because of predicting all the target tokens in parallel, the non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), finding the monotonic alignment that minimizes the cross-entropy loss through dynamic programming, (2) Dynamic Rectification, creating new training samples by replacing some masks with model predicted tokens. The AXE ignores the absolute position alignment between prediction and ground truth sentence and focuses on tokens matching in relative order. The dynamic rectification method makes the model capable of simulating the non-mask but possible wrong tokens, even if they have high confidence. Our experiments on WSJ dataset demonstrated that not only AXE loss but also the rectification method could improve the WER performance of Mask CTC. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2303.02666 [pdf, other]

Learned Lossless Compression for JPEG via Frequency-Domain Prediction

Authors: Jixiang Luo, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

Abstract: JPEG images can be further compressed to enhance the storage and transmission of large-scale image datasets. Existing learned lossless compressors for RGB images cannot be well transferred to JPEG images due to the distinguishing distribution of DCT coefficients and raw pixels. In this paper, we propose a novel framework for learned lossless compression of JPEG images that achieves end-to-end opti… ▽ More JPEG images can be further compressed to enhance the storage and transmission of large-scale image datasets. Existing learned lossless compressors for RGB images cannot be well transferred to JPEG images due to the distinguishing distribution of DCT coefficients and raw pixels. In this paper, we propose a novel framework for learned lossless compression of JPEG images that achieves end-to-end optimized prediction of the distribution of decoded DCT coefficients. To enable learning in the frequency domain, DCT coefficients are partitioned into groups to utilize implicit local redundancy. An autoencoder-like architecture is designed based on the weight-shared blocks to realize entropy modeling of grouped DCT coefficients and independently compress the priors. We attempt to realize learned lossless compression of JPEG images in the frequency domain. Experimental results demonstrate that the proposed framework achieves superior or comparable performance in comparison to most recent lossless compressors with handcrafted context modeling for JPEG images. △ Less

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2302.02447 [pdf, other]

cross-modal fusion techniques for utterance-level emotion recognition from text and speech

Authors: Jiachen Luo, Huy Phan, Joshua Reiss

Abstract: Multimodal emotion recognition (MER) is a fundamental complex research problem due to the uncertainty of human emotional expression and the heterogeneity gap between different modalities. Audio and text modalities are particularly important for a human participant in understanding emotions. Although many successful attempts have been designed multimodal representations for MER, there still exist m… ▽ More Multimodal emotion recognition (MER) is a fundamental complex research problem due to the uncertainty of human emotional expression and the heterogeneity gap between different modalities. Audio and text modalities are particularly important for a human participant in understanding emotions. Although many successful attempts have been designed multimodal representations for MER, there still exist multiple challenges to be addressed: 1) bridging the heterogeneity gap between multimodal features and model inter- and intra-modal interactions of multiple modalities; 2) effectively and efficiently modelling the contextual dynamics in the conversation sequence. In this paper, we propose Cross-Modal RoBERTa (CM-RoBERTa) model for emotion detection from spoken audio and corresponding transcripts. As the core unit of the CM-RoBERTa, parallel self- and cross- attention is designed to dynamically capture inter- and intra-modal interactions of audio and text. Specially, the mid-level fusion and residual module are employed to model long-term contextual dependencies and learn modality-specific patterns. We evaluate the approach on the MELD dataset and the experimental results show the proposed approach achieves the state-of-art performance on the dataset. △ Less

Submitted 5 February, 2023; originally announced February 2023.

Comments: 6 pages, 2 figures

arXiv:2302.02419 [pdf, other]

deep learning of segment-level feature representation for speech emotion recognition in conversations

Authors: Jiachen Luo, Huy Phan, Joshua Reiss

Abstract: Accurately detecting emotions in conversation is a necessary yet challenging task due to the complexity of emotions and dynamics in dialogues. The emotional state of a speaker can be influenced by many different factors, such as interlocutor stimulus, dialogue scene, and topic. In this work, we propose a conversational speech emotion recognition method to deal with capturing attentive contextual d… ▽ More Accurately detecting emotions in conversation is a necessary yet challenging task due to the complexity of emotions and dynamics in dialogues. The emotional state of a speaker can be influenced by many different factors, such as interlocutor stimulus, dialogue scene, and topic. In this work, we propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions. First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances. Second, an attentive bi-directional gated recurrent unit (GRU) models contextual-sensitive information and explores intra- and inter-speaker dependencies jointly in a dynamic manner. The experiments conducted on the standard conversational dataset MELD demonstrate the effectiveness of the proposed method when compared against state-of the-art methods. △ Less

Submitted 5 February, 2023; originally announced February 2023.

Comments: 6 pages, 4 figures

arXiv:2301.06681 [pdf]

Cross-domain Self-supervised Framework for Photoacoustic Computed Tomography Image Reconstruction

Authors: Hengrong Lan, Lijie Huang, Zhiqiang Li, **g Lv, Jianwen Luo

Abstract: Accurate image reconstruction is crucial for photoacoustic (PA) computed tomography (PACT). Recently, deep learning has been used to reconstruct the PA image with a supervised scheme, which requires high-quality images as ground truth labels. In practice, there are inevitable trade-offs between cost and performance since the use of more channels is an expensive strategy to access more measurements… ▽ More Accurate image reconstruction is crucial for photoacoustic (PA) computed tomography (PACT). Recently, deep learning has been used to reconstruct the PA image with a supervised scheme, which requires high-quality images as ground truth labels. In practice, there are inevitable trade-offs between cost and performance since the use of more channels is an expensive strategy to access more measurements. Here, we propose a cross-domain unsupervised reconstruction (CDUR) strategy with a pure transformer model, which overcomes the lack of ground truth labels from limited PA measurements. The proposed approach exploits the equivariance of PACT to achieve high performance with a smaller number of channels. We implement a self-supervised reconstruction in a model-based form. Meanwhile, we also leverage the self-supervision to enforce the measurement and image consistency on three partitions of measured PA data, by randomly masking different channels. We find that dynamically masking a high proportion of the channels, e.g., 80%, yields nontrivial self-supervisors in both image and signal domains, which decrease the multiplicity of the pseudo solution to efficiently reconstruct the image from fewer PA measurements with minimum error of the image. Experimental results on in-vivo PACT dataset of mice demonstrate the potential of our unsupervised framework. In addition, our method shows a high performance (0.83 structural similarity index (SSIM) in the extreme sparse case with 13 channels), which is close to that of supervised scheme (0.77 SSIM with 16 channels). On top of all the advantages, our method may be deployed on different trainable models in an end-to-end manner. △ Less

Submitted 20 September, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

arXiv:2212.10431 [pdf, other]

QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity

Authors: Siyu Huang, Jie An, Donglai Wei, Jiebo Luo, Hanspeter Pfister

Abstract: The mechanism of existing style transfer algorithms is by minimizing a hybrid loss function to push the generated image toward high similarities in both content and style. However, this type of approach cannot guarantee visual fidelity, i.e., the generated artworks should be indistinguishable from real ones. In this paper, we devise a new style transfer framework called QuantArt for high visual-fi… ▽ More The mechanism of existing style transfer algorithms is by minimizing a hybrid loss function to push the generated image toward high similarities in both content and style. However, this type of approach cannot guarantee visual fidelity, i.e., the generated artworks should be indistinguishable from real ones. In this paper, we devise a new style transfer framework called QuantArt for high visual-fidelity stylization. QuantArt pushes the latent representation of the generated artwork toward the centroids of the real artwork distribution with vector quantization. By fusing the quantized and continuous latent representations, QuantArt allows flexible control over the generated artworks in terms of content preservation, style similarity, and visual fidelity. Experiments on various style transfer settings show that our QuantArt framework achieves significantly higher visual fidelity compared with the existing style transfer methods. △ Less

Submitted 5 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted to CVPR 2023. Code is available at https://github.com/siyuhuang/QuantArt

arXiv:2211.07357 [pdf, other]

Controlling Commercial Cooling Systems Using Reinforcement Learning

Authors: Jerry Luo, Cosmin Paduraru, Octavian Voicu, Yuri Chervonyi, Scott Munns, Jerry Li, Crystal Qian, Praneet Dutta, Jared Quincy Davis, Ningjia Wu, Xingwei Yang, Chu-Ming Chang, Ted Li, Rob Rose, Mingyan Fan, Hootan Nakhost, Tinglin Liu, Brian Kirkman, Frank Altamura, Lee Cline, Patrick Tonker, Joel Gouker, Dave Uden, Warren Buddy Bryan, Jason Law , et al. (11 additional authors not shown)

Abstract: This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments ha… ▽ More This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites. △ Less

Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 27 pages, 11 figures

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, **hua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, **gfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We develope… ▽ More Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information to enhance the boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2209.12265 [pdf, other]

Cooperative Sensing and Heterogeneous Information Fusion in VCPS: A Multi-agent Deep Reinforcement Learning Approach

Authors: Xincao Xu, Kai Liu, Penglin Dai, Ruitao Xie, **g**g Cao, Jiangtao Luo

Abstract: Cooperative sensing and heterogeneous information fusion are critical to realize vehicular cyber-physical systems (VCPSs). This paper makes the first attempt to quantitatively measure the quality of VCPS by designing a new metric called Age of View (AoV). Specifically, we first present the system architecture where heterogeneous information can be cooperatively sensed and uploaded via vehicle-to-i… ▽ More Cooperative sensing and heterogeneous information fusion are critical to realize vehicular cyber-physical systems (VCPSs). This paper makes the first attempt to quantitatively measure the quality of VCPS by designing a new metric called Age of View (AoV). Specifically, we first present the system architecture where heterogeneous information can be cooperatively sensed and uploaded via vehicle-to-infrastructure (V2I) communications in vehicular edge computing (VEC). Logical views are constructed by fusing the heterogeneous information at edge nodes. Further, we formulate the problem by deriving a cooperative sensing model based on the multi-class M/G/1 priority queue, and defining the AoV by modeling the timeliness, completeness and consistency of the logical views. On this basis, a multi-agent deep reinforcement learning solution is proposed. In particular, the system state includes vehicle sensed information, edge cached information and view requirements. The vehicle action space consists of the sensing frequencies and uploading priorities of information. A difference-reward-based credit assignment is designed to divide the system reward, which is defined as the VCPS quality, into the difference reward for vehicles. Edge node allocates V2I bandwidth to vehicles based on predicted vehicle trajectories and view requirements. Finally, we build the simulation model and give a comprehensive performance evaluation, which conclusively demonstrates the superiority of the proposed solution. △ Less

Submitted 27 January, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

arXiv:2209.08112 [pdf, other]

Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning

Authors: William Wong, Praneet Dutta, Octavian Voicu, Yuri Chervonyi, Cosmin Paduraru, Jerry Luo

Abstract: Reinforcement learning (RL) techniques have been developed to optimize industrial cooling systems, offering substantial energy savings compared to traditional heuristic policies. A major challenge in industrial control involves learning behaviors that are feasible in the real world due to machinery constraints. For example, certain actions can only be executed every few hours while other actions c… ▽ More Reinforcement learning (RL) techniques have been developed to optimize industrial cooling systems, offering substantial energy savings compared to traditional heuristic policies. A major challenge in industrial control involves learning behaviors that are feasible in the real world due to machinery constraints. For example, certain actions can only be executed every few hours while other actions can be taken more frequently. Without extensive reward engineering and experimentation, an RL agent may not learn realistic operation of machinery. To address this, we use hierarchical reinforcement learning with multiple agents that control subsets of actions according to their operation time scales. Our hierarchical approach achieves energy savings over existing baselines while maintaining constraints such as operating chillers within safe bounds in a simulated HVAC control environment. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: 11 pages, 5 figures

arXiv:2207.14524 [pdf, other]

Evaluating the Practicality of Learned Image Compression

Authors: Hongjiu Yu, Qiancheng Sun, ** Hu, Xingyuan Xue, Jixiang Luo, Dailan He, Yilong Li, Pengbo Wang, Yuanyuan Wang, Yaxu Dai, Yan Wang, Hongwei Qin

Abstract: Learned image compression has achieved extraordinary rate-distortion performance in PSNR and MS-SSIM compared to traditional methods. However, it suffers from intensive computation, which is intolerable for real-world applications and leads to its limited industrial application for now. In this paper, we introduce neural architecture search (NAS) to designing more efficient networks with lower lat… ▽ More Learned image compression has achieved extraordinary rate-distortion performance in PSNR and MS-SSIM compared to traditional methods. However, it suffers from intensive computation, which is intolerable for real-world applications and leads to its limited industrial application for now. In this paper, we introduce neural architecture search (NAS) to designing more efficient networks with lower latency, and leverage quantization to accelerate the inference process. Meanwhile, efforts in engineering like multi-threading and SIMD have been made to improve efficiency. Optimized using a hybrid loss of PSNR and MS-SSIM for better visual quality, we obtain much higher MS-SSIM than JPEG, JPEG XL and AVIF over all bit rates, and PSNR between that of JPEG XL and AVIF. Our software implementation of LIC achieves comparable or even faster inference speed compared to jpeg-turbo while being multiple times faster than JPEG XL and AVIF. Besides, our implementation of LIC reaches stunning throughput of 145 fps for encoding and 208 fps for decoding on a Tesla T4 GPU for 1080p images. On CPU, the latency of our implementation is comparable with JPEG XL. △ Less

Submitted 29 July, 2022; originally announced July 2022.

arXiv:2207.03605 [pdf, other]

Learning-based Autonomous Channel Access in the Presence of Hidden Terminals

Authors: Yulin Shao, Yucheng Cai, Taotao Wang, Ziyang Guo, Peng Liu, Jiajun Luo, Deniz Gunduz

Abstract: We consider the problem of autonomous channel access (AutoCA), where a group of terminals tries to discover a communication strategy with an access point (AP) via a common wireless channel in a distributed fashion. Due to the irregular topology and the limited communication range of terminals, a practical challenge for AutoCA is the hidden terminal problem, which is notorious in wireless networks… ▽ More We consider the problem of autonomous channel access (AutoCA), where a group of terminals tries to discover a communication strategy with an access point (AP) via a common wireless channel in a distributed fashion. Due to the irregular topology and the limited communication range of terminals, a practical challenge for AutoCA is the hidden terminal problem, which is notorious in wireless networks for deteriorating the throughput and delay performances. To meet the challenge, this paper presents a new multi-agent deep reinforcement learning paradigm, dubbed MADRL-HT, tailored for AutoCA in the presence of hidden terminals. MADRL-HT exploits topological insights and transforms the observation space of each terminal into a scalable form independent of the number of terminals. To compensate for the partial observability, we put forth a look-back mechanism such that the terminals can infer behaviors of their hidden terminals from the carrier sensed channel states as well as feedback from the AP. A window-based global reward function is proposed, whereby the terminals are instructed to maximize the system throughput while balancing the terminals' transmission opportunities over the course of learning. Extensive numerical experiments verified the superior performance of our solution benchmarked against the legacy carrier-sense multiple access with collision avoidance (CSMA/CA) protocol. △ Less

Submitted 2 December, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Keywords: multiple channel access, hidden terminal, multi-agent deep reinforcement learning, Wi-Fi, proximal policy optimization

arXiv:2206.13689 [pdf, other]

Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation

Authors: Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Xulong Zhang, **g Xiao

Abstract: Time-domain Transformer neural networks have proven their superiority in speech separation tasks. However, these models usually have a large number of network parameters, thus often encountering the problem of GPU memory explosion. In this paper, we proposed Tiny-Sepformer, a tiny version of Transformer network for speech separation. We present two techniques to reduce the model parameters and mem… ▽ More Time-domain Transformer neural networks have proven their superiority in speech separation tasks. However, these models usually have a large number of network parameters, thus often encountering the problem of GPU memory explosion. In this paper, we proposed Tiny-Sepformer, a tiny version of Transformer network for speech separation. We present two techniques to reduce the model parameters and memory consumption: (1) Convolution-Attention (CA) block, spliting the vanilla Transformer to two paths, multi-head attention and 1D depthwise separable convolution, (2) parameter sharing, sharing the layer parameters within the CA block. In our experiments, Tiny-Sepformer could greatly reduce the model size, and achieves comparable separation performance with vanilla Sepformer on WSJ0-2/3Mix datasets. △ Less

Submitted 30 June, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted by Interspeech 2022

arXiv:2206.10499 [pdf, other]

doi 10.1109/LWC.2022.3186160

A Learning Aided Flexible Gradient Descent Approach to MISO Beamforming

Authors: Zhixiong Yang, **g-Yuan Xia, Junshan Luo, Shuanghui Zhang, Deniz Gündüz

Abstract: This paper proposes a learning aided gradient descent (LAGD) algorithm to solve the weighted sum rate (WSR) maximization problem for multiple-input single-output (MISO) beamforming. The proposed LAGD algorithm directly optimizes the transmit precoder through implicit gradient descent based iterations, at each of which the optimization strategy is determined by a neural network, and thus, is dynami… ▽ More This paper proposes a learning aided gradient descent (LAGD) algorithm to solve the weighted sum rate (WSR) maximization problem for multiple-input single-output (MISO) beamforming. The proposed LAGD algorithm directly optimizes the transmit precoder through implicit gradient descent based iterations, at each of which the optimization strategy is determined by a neural network, and thus, is dynamic and adaptive. At each instance of the problem, this network is initialized randomly, and updated throughout the iterative solution process. Therefore, the LAGD algorithm can be implemented at any signal-to-noise ratio (SNR) and for arbitrary antenna/user numbers, does not require labelled data or training prior to deployment. Numerical results show that the LAGD algorithm can outperform of the well-known WMMSE algorithm as well as other learning-based solutions with a modest computational complexity. Our code is available at https://github.com/XiaGroup/LAGD. △ Less

Submitted 25 July, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Journal ref: [J]. IEEE Wireless Communications Letters, 2022

arXiv:2205.14501 [pdf, other]

PO-ELIC: Perception-Oriented Efficient Learned Image Coding

Authors: Dailan He, Ziming Yang, Hongjiu Yu, Tongda Xu, Jixiang Luo, Yuan Chen, Chenjian Gao, Xinjie Shi, Hongwei Qin, Yan Wang

Abstract: In the past years, learned image compression (LIC) has achieved remarkable performance. The recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and texture missing. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, w… ▽ More In the past years, learned image compression (LIC) has achieved remarkable performance. The recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and texture missing. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, we propose PO-ELIC, i.e., Perception-Oriented Efficient Learned Image Coding. To be specific, we adapt ELIC, one of the state-of-the-art LIC models, with adversarial training techniques. We apply a mixture of losses including hinge-form adversarial loss, Charbonnier loss, and style loss, to finetune the model towards better perceptual quality. Experimental results demonstrate that our method achieves comparable perceptual quality with HiFiC with much lower bitrate. △ Less

Submitted 28 May, 2022; originally announced May 2022.

Comments: CVPR2022 Workshop, 5-th CLIC Image Compression Track

arXiv:2205.14329 [pdf, other]

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

Authors: Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, **g Xiao

Abstract: In this paper, we investigated a speech augmentation based unsupervised learning approach for keyword spotting (KWS) task. KWS is a useful speech application, yet also heavily depends on the labeled data. We designed a CNN-Attention architecture to conduct the KWS task. CNN layers focus on the local acoustic features, and attention layers model the long-time dependency. To improve the robustness o… ▽ More In this paper, we investigated a speech augmentation based unsupervised learning approach for keyword spotting (KWS) task. KWS is a useful speech application, yet also heavily depends on the labeled data. We designed a CNN-Attention architecture to conduct the KWS task. CNN layers focus on the local acoustic features, and attention layers model the long-time dependency. To improve the robustness of KWS model, we also proposed an unsupervised learning method. The unsupervised loss is based on the similarity between the original and augmented speech features, as well as the audio reconstructing information. Two speech augmentation methods are explored in the unsupervised learning: speed and intensity. The experiments on Google Speech Commands V2 Dataset demonstrated that our CNN-Attention model has competitive results. Moreover, the augmentation based unsupervised learning could further improve the classification accuracy of KWS task. In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC. △ Less

Submitted 28 May, 2022; originally announced May 2022.

Comments: accepted by WCCI 2022

arXiv:2205.14326 [pdf, other]

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

Authors: Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, **g Xiao

Abstract: Low resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts of training data. The existing models mostly established a bottleneck (BN) layer by pre-training on a large source language, and transferring to the low resource target language. In this work, we introduced an adaptive activation network to the upper layers of ASR m… ▽ More Low resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts of training data. The existing models mostly established a bottleneck (BN) layer by pre-training on a large source language, and transferring to the low resource target language. In this work, we introduced an adaptive activation network to the upper layers of ASR model, and applied different activation functions to different languages. We also proposed two approaches to train the model: (1) cross-lingual learning, replacing the activation function from source language to target language, (2) multilingual learning, jointly training the Connectionist Temporal Classification (CTC) loss of each language and the relevance of different languages. Our experiments on IARPA Babel datasets demonstrated that our approaches outperform the from-scratch training and traditional bottleneck feature based methods. In addition, combining the cross-lingual learning and multilingual learning together could further improve the performance of multilingual speech recognition. △ Less

Submitted 28 May, 2022; originally announced May 2022.

Comments: accepted by WCCI 2022

arXiv:2205.01278 [pdf, other]

doi 10.1109/TITS.2023.3243940

Real-time Cooperative Vehicle Coordination at Unsignalized Road Intersections

Authors: Ji** Luo, Tingting Zhang, Rui Hao, Donglin Li, Chunsheng Chen, Zhenyu Na, Qinyu Zhang

Abstract: Cooperative coordination at unsignalized road intersections, which aims to improve the driving safety and traffic throughput for connected and automated vehicles, has attracted increasing interests in recent years. However, most existing investigations either suffer from computational complexity or cannot harness the full potential of the road infrastructure. To this end, we first present a dedica… ▽ More Cooperative coordination at unsignalized road intersections, which aims to improve the driving safety and traffic throughput for connected and automated vehicles, has attracted increasing interests in recent years. However, most existing investigations either suffer from computational complexity or cannot harness the full potential of the road infrastructure. To this end, we first present a dedicated intersection coordination framework, where the involved vehicles hand over their control authorities and follow instructions from a centralized coordinator. Then a unified cooperative trajectory optimization problem will be formulated to maximize the traffic throughput while ensuring the driving safety and long-term stability of the coordination system. To address the key computational challenges in the real-world deployment, we reformulate this non-convex sequential decision problem into a model-free Markov Decision Process (MDP) and tackle it by devising a Twin Delayed Deep Deterministic Policy Gradient (TD3)-based strategy in the deep reinforcement learning (DRL) framework. Simulation and practical experiments show that the proposed strategy could achieve near-optimal performance in sub-static coordination scenarios and significantly improve the traffic throughput in the realistic continuous traffic flow. The most remarkable advantage is that our strategy could reduce the time complexity of computation to milliseconds, and is shown scalable when the road lanes increase. △ Less

Submitted 22 March, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: in IEEE Transactions on Intelligent Transportation Systems

arXiv:2204.05649 [pdf, other]

ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition

Authors: Zi Huang, Shulei Ji, Zhilan Hu, Chuangjian Cai, **g Luo, Xinyu Yang

Abstract: Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, the learning of affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Only taking log Mel-spectrogram as input, this method uses adapted VGGNet as spatial feature learning module… ▽ More Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, the learning of affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Only taking log Mel-spectrogram as input, this method uses adapted VGGNet as spatial feature learning module (SFLM) to obtain spatial features across different levels. Then, these features are fed into squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to get multi-level emotion-related spatial-temporal features (ESTFs), which can discriminate emotions well in the final emotion space. In addition, a novel data processing is devised to cut the single-channel input into multi-channel to improve calculative efficiency while ensuring the quality of MER. Experiments show that our proposed method achieves 10.43% and 4.82% relative improvement of valence and arousal respectively on the R2 score compared to the state-of-the-art model, meanwhile, performs better on datasets with distinct scales and in multi-task learning. △ Less

Submitted 30 June, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

Comments: It has been received by Interspeech2022

arXiv:2203.10645 [pdf, other]

Breast Cancer Induced Bone Osteolysis Prediction Using Temporal Variational Auto-Encoders

Authors: Wei Xiong, Neil Yeung, Shubo Wang, Haofu Liao, Liyun Wang, Jiebo Luo

Abstract: Objective and Impact Statement. We adopt a deep learning model for bone osteolysis prediction on computed tomography (CT) images of murine breast cancer bone metastases. Given the bone CT scans at previous time steps, the model incorporates the bone-cancer interactions learned from the sequential images and generates future CT images. Its ability of predicting the development of bone lesions in ca… ▽ More Objective and Impact Statement. We adopt a deep learning model for bone osteolysis prediction on computed tomography (CT) images of murine breast cancer bone metastases. Given the bone CT scans at previous time steps, the model incorporates the bone-cancer interactions learned from the sequential images and generates future CT images. Its ability of predicting the development of bone lesions in cancer-invading bones can assist in assessing the risk of impending fractures and choosing proper treatments in breast cancer bone metastasis. Introduction. Breast cancer often metastasizes to bone, causes osteolytic lesions, and results in skeletal related events (SREs) including severe pain and even fatal fractures. Although current imaging techniques can detect macroscopic bone lesions, predicting the occurrence and progression of bone lesions remains a challenge. Methods. We adopt a temporal variational auto-encoder (T-VAE) model that utilizes a combination of variational auto-encoders and long short-term memory networks to predict bone lesion emergence on our micro-CT dataset containing sequential images of murine tibiae. Given the CT scans of murine tibiae at early weeks, our model can learn the distribution of their future states from data. Results. We test our model against other deep learning-based prediction models on the bone lesion progression prediction task. Our model produces much more accurate predictions than existing models under various evaluation metrics. Conclusion. We develop a deep learning framework that can accurately predict and visualize the progression of osteolytic bone lesions. It will assist in planning and evaluating treatment strategies to prevent SREs in breast cancer patients. △ Less

Submitted 28 March, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: 18 pages

arXiv:2201.04302 [pdf, other]

De-Noising of Photoacoustic Microscopy Images by Deep Learning

Authors: Da He, Jiasheng Zhou, Xiaoyu Shang, Jiajia Luo, Sung-Liang Chen

Abstract: As a hybrid imaging technology, photoacoustic microscopy (PAM) imaging suffers from noise due to the maximum permissible exposure of laser intensity, attenuation of ultrasound in the tissue, and the inherent noise of the transducer. De-noising is a post-processing method to reduce noise, and PAM image quality can be recovered. However, previous de-noising techniques usually heavily rely on mathema… ▽ More As a hybrid imaging technology, photoacoustic microscopy (PAM) imaging suffers from noise due to the maximum permissible exposure of laser intensity, attenuation of ultrasound in the tissue, and the inherent noise of the transducer. De-noising is a post-processing method to reduce noise, and PAM image quality can be recovered. However, previous de-noising techniques usually heavily rely on mathematical priors as well as manually selected parameters, resulting in unsatisfactory and slow de-noising performance for different noisy images, which greatly hinders practical and clinical applications. In this work, we propose a deep learning-based method to remove complex noise from PAM images without mathematical priors and manual selection of settings for different input images. An attention enhanced generative adversarial network is used to extract image features and remove various noises. The proposed method is demonstrated on both synthetic and real datasets, including phantom (leaf veins) and in vivo (mouse ear blood vessels and zebrafish pigment) experiments. The results show that compared with previous PAM de-noising methods, our method exhibits good performance in recovering images qualitatively and quantitatively. In addition, the de-noising speed of 0.016 s is achieved for an image with $256\times256$ pixels. Our approach is effective and practical for the de-noising of PAM images. △ Less

Submitted 12 January, 2022; originally announced January 2022.

Comments: 12 pages, 8 figures

arXiv:2112.08744 [pdf, other]

Distributed Nash Equilibrium Seeking for Noncooperative Games of High-Order Nonlinear Multi-Agent Systems Over Weight-Unbalanced Digraphs

Authors: Zhenhua Deng, ** Luo

Abstract: In this paper, we investigate the noncooperative games of multi-agent systems. Different from existing noncooperative games, our formulation involves the high-order nonlinear dynamics of players, and the communication topologies among players are weight-unbalanced digraphs. Due to the high-order nonlinear dynamics and the weight-unbalanced digraphs, existing Nash equilibrium seeking algorithms can… ▽ More In this paper, we investigate the noncooperative games of multi-agent systems. Different from existing noncooperative games, our formulation involves the high-order nonlinear dynamics of players, and the communication topologies among players are weight-unbalanced digraphs. Due to the high-order nonlinear dynamics and the weight-unbalanced digraphs, existing Nash equilibrium seeking algorithms cannot solve our problem. In order to seek the Nash equilibrium of the noncooperative games, we propose two distributed algorithms based on state feedback and output feedback, respectively. Moreover, we analyze the convergence of the two algorithms with the help of variational analysis and Lyapunov stability theory. By the two algorithms, the high-order nonlinear players exponentially converge to the Nash equilibrium. Finally, two simulation examples illustrate the effectiveness of the algorithms. △ Less

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2111.08195 [pdf, other]

doi 10.1145/3485730.3485932

MoRe-Fi: Motion-robust and Fine-grained Respiration Monitoring via Deep-Learning UWB Radar

Authors: Tianyue Zheng, Zhe Chen, Shujie Zhang, Chao Cai, Jun Luo

Abstract: Crucial for healthcare and biomedical applications, respiration monitoring often employs wearable sensors in practice, causing inconvenience due to their direct contact with human bodies. Therefore, researchers have been constantly searching for contact-free alternatives. Nonetheless, existing contact-free designs mostly require human subjects to remain static, largely confining their adoptions in… ▽ More Crucial for healthcare and biomedical applications, respiration monitoring often employs wearable sensors in practice, causing inconvenience due to their direct contact with human bodies. Therefore, researchers have been constantly searching for contact-free alternatives. Nonetheless, existing contact-free designs mostly require human subjects to remain static, largely confining their adoptions in everyday environments where body movements are inevitable. Fortunately, radio-frequency (RF) enabled contact-free sensing, though suffering motion interference inseparable by conventional filtering, may offer a potential to distill respiratory waveform with the help of deep learning. To realize this potential, we introduce MoRe-Fi to conduct fine-grained respiration monitoring under body movements. MoRe-Fi leverages an IR-UWB radar to achieve contact-free sensing, and it fully exploits the complex radar signal for data augmentation. The core of MoRe-Fi is a novel variational encoder-decoder network; it aims to single out the respiratory waveforms that are modulated by body movements in a non-linear manner. Our experiments with 12 subjects and 66-hour data demonstrate that MoRe-Fi accurately recovers respiratory waveform despite the interference caused by body movements. We also discuss potential applications of MoRe-Fi for pulmonary disease diagnoses. △ Less

Submitted 15 November, 2021; originally announced November 2021.

Comments: 14 pages

Journal ref: SenSys '21: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems November 2021

arXiv:2111.04566 [pdf, other]

doi 10.1145/3384419.3430735

RF-Net: a Unified Meta-learning Framework for RF-enabled One-shot Human Activity Recognition

Authors: Shuya Ding, Zhe Chen, Tianyue Zheng, Jun Luo

Abstract: Radio-Frequency (RF) based device-free Human Activity Recognition (HAR) rises as a promising solution for many applications. However, device-free (or contactless) sensing is often more sensitive to environment changes than device-based (or wearable) sensing. Also, RF datasets strictly require on-line labeling during collection, starkly different from image and text data collections where human int… ▽ More Radio-Frequency (RF) based device-free Human Activity Recognition (HAR) rises as a promising solution for many applications. However, device-free (or contactless) sensing is often more sensitive to environment changes than device-based (or wearable) sensing. Also, RF datasets strictly require on-line labeling during collection, starkly different from image and text data collections where human interpretations can be leveraged to perform off-line labeling. Therefore, existing solutions to RF-HAR entail a laborious data collection process for adapting to new environments. To this end, we propose RF-Net as a meta-learning based approach to one-shot RF-HAR; it reduces the labeling efforts for environment adaptation to the minimum level. In particular, we first examine three representative RF sensing techniques and two major meta-learning approaches. The results motivate us to innovate in two designs: i) a dual-path base HAR network, where both time and frequency domains are dedicated to learning powerful RF features including spatial and attention-based temporal ones, and ii) a metric-based meta-learning framework to enhance the fast adaption capability of the base network, including an RF-specific metric module along with a residual classification module. We conduct extensive experiments based on all three RF sensing techniques in multiple real-world indoor environments; all results strongly demonstrate the efficacy of RF-Net compared with state-of-the-art baselines. △ Less

Submitted 28 October, 2021; originally announced November 2021.

Comments: 14 pages

Journal ref: SenSys '20: Proceedings of the 18th Conference on Embedded Networked Sensor Systems, November 2020

Showing 1–50 of 113 results for author: Luo, J