Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

Authors: Meng Cui, Xubo Liu, Haohe Liu, **zheng Zhao, Daoliang Li, Wenwu Wang

Abstract: Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which are essential for optimizing production efficiency, enhancing fish welfare, and improving resource management. Previous reviews have focused on single modalities, limiting their ability to address the diverse challenges encountered in these tasks comprehensively. This review provides a comprehensive analysis of the current state of aquaculture digital technologies, including vision-based, acoustic-based, and biosensor-based methods. We examine the advantages, limitations, and applications of these methods, highlighting recent advancements and identifying critical research gaps. The scarcity of comprehensive fish datasets and the lack of unified evaluation standards, which make it difficult to compare the performance of different technologies, are identified as major obstacles hindering progress in this field. To overcome current limitations and improve the accuracy, robustness, and efficiency of fish monitoring systems, we explore the potential of emerging technologies such as multimodal data fusion and deep learning. Additionally, we contribute to the field by providing a summary of existing datasets available for fish tracking, counting, and behaviour analysis. Future research directions are outlined, emphasizing the need for comprehensive datasets and evaluation standards to facilitate meaningful comparisons between technologies and promote their practical implementation in real-world aquaculture settings.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: under peer review

arXiv:2406.04158 [pdf, other]

Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets

Authors: Da Li, Guoqiang Zhao, Houjun Sun, Jiacheng Bao

Abstract: Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically relies on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar data. Consequently, imaging performance is limited, and acquiring full-aperture data for multi-baseline SAR is costly and sometimes impractical in real-world applications. In this paper, we propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable rendering and cross-modal supervision with optical images to reconstruct highly sparse multi-baseline SAR 3D images of vehicle targets into visually structured and high-resolution images. We meticulously designed the network architecture and training strategies to enhance network generalization capability. Remarkably, CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets, outperforming traditional sparse reconstruction algorithms based on compressed sensing and other learning-based methods. Additionally, using optical images as supervision provides a cost-effective way to build training datasets, reducing the difficulty of method dissemination. Our work showcases the broad prospects of deep learning in multi-baseline SAR 3D imaging and offers a novel path for researching radar imaging based on cross-modal learning theory.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.01010 [pdf, ps, other]

Joint Frame Structure, Beamwidth, and Power Allocation for UAV-Aided Localization and Communication

Authors: Tianhao Liang, Tingting Zhang, Sheng Zhou, Wentao Liu, Dong Li, Qinyu Zhang

Abstract: In wireless sensor networks, integrating localization and communication techniques is crucial for efficient spectrum and hardware utilization. In this paper, we present a novel framework of unmanned aerial vehicle (UAV)-aided localization and communication for a ground node (GN), where the average spectral efficiency (SE) is used to reveal the intricate relationship among frame structure, channel estimation error, and localization accuracy. In particular, we first derive the lower bounds for the channel estimation error and the three-dimensional location prediction error. Leveraging this analysis, we formulate a problem to maximize the average SE in UAV-GN communication, where the frame structure, beamwidth, and power allocation are jointly optimized. Subsequently, we propose an efficient iterative algorithm to address this non-convex problem with closed-form expressions for beamwidth and power allocation. Numerical results demonstrate that the performance of our proposed method can approach the upper bound with much lower complexity, and achieve over 70% performance gain compared to non-localization benchmarks. Additionally, the analysis highlights the dominant impact of the Doppler effect over noise on the average SE.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00372 [pdf]

Performance Evaluation of Damping Systems in Civil Engineering Structures Via Minimal Sensor

Authors: Xinhao He, Dan Li

Abstract: To control structural responses under various actions, the growing use of supplementary damping systems in modern civil engineering structures necessitates inspecting and evaluating their operational performance post-installation. However, due to the dispersed placement and complex nonlinearities of these devices, difficulties arise in determining the minimal sensor configuration. This is inherently connected to a pivotal challenge: establishing a reliable input-output mapping, which comprises both the mathematical model and sensor arrangements. Prior work indicates this can be achieved through theoretical observability analysis or Lie symmetries analysis, both of which provide different perspectives on the existence of a way to access the solutions of a system identification problem uniquely (at least locally). The present study introduces a unified framework, enhanced by algorithm realization as an application guide, for analyzing the observability and Lie symmetries of a given input-output mapping. We demonstrate its implementation via examples of a building structure with various damping systems under different conditions such as seismic loads, wind loads, and operational vibrations. Finally, we present a case study for an isolation building with an inerter damper and minimal sensor arrangement under seismic action. The results demonstrate that the unscented Kalman filter, a system identification method, can precisely estimate structural responses and assess damping device performance once a reliable input-output mapping is established.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.19450 [pdf, other]

FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining

Authors: Dong Li, Yidi Liu, Xueyang Fu, Senyan Xu, Zheng-Jun Zha

Abstract: Image deraining aims to remove rain streaks from rainy images and restore clear backgrounds. Currently, some research that employs the Fourier transform has proved to be effective for image deraining, because it acts as an effective frequency prior for capturing rain streaks. However, although low and high frequencies in images are interdependent, these Fourier-based methods rarely exploit the correlation of different frequencies to couple their learning procedures, limiting the full utilization of frequency information for image deraining. Alternatively, the recently emerged Mamba technique has shown its effectiveness and efficiency for modeling correlation in various domains (e.g., spatial, temporal), and we argue that introducing Mamba into its unexplored Fourier spaces to correlate different frequencies would help improve image deraining. This motivates us to propose a new framework termed FourierMamba, which performs image deraining with Mamba in the Fourier space. Owing to the unique arrangement of frequency orders in Fourier space, the core of FourierMamba lies in the scanning encoding of different frequencies, where the low-high frequency order formats differ between the spatial dimension (unarranged in axis) and the channel dimension (arranged in axis). Therefore, we design FourierMamba to correlate Fourier space information in the spatial and channel dimensions with distinct designs. Specifically, in the spatial dimension Fourier space, we introduce zigzag coding to scan the frequencies and rearrange the orders from low to high frequencies, thereby orderly correlating the connections between frequencies; in the channel dimension Fourier space, where the frequency orders are already arranged along the axis, we can directly use Mamba to perform frequency correlation and improve the channel information representation.
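As a rough illustration of the zigzag scan mentioned in the abstract above, the sketch below orders the cells of a 2D grid along anti-diagonals, alternating direction on each diagonal. It assumes a JPEG-style layout in which the lowest frequency sits at the top-left corner and frequency grows towards the bottom-right; the paper's actual scan over the Fourier spectrum may differ. The function names `zigzag_order` and `zigzag_flatten` are hypothetical and not taken from the paper.

```python
def zigzag_order(rows, cols):
    """Return (row, col) pairs in JPEG-style zigzag order.

    Assumption: the lowest frequency is at (0, 0) and frequency grows
    with row + col, so walking anti-diagonals visits low frequencies first.
    """
    order = []
    for s in range(rows + cols - 1):
        # All cells on the anti-diagonal row + col == s, top to bottom.
        diagonal = [(r, s - r) for r in range(rows) if 0 <= s - r < cols]
        # Alternate the walking direction so consecutive cells stay adjacent.
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_flatten(grid):
    """Flatten a 2D list into a 1D sequence following the zigzag order."""
    rows, cols = len(grid), len(grid[0])
    return [grid[r][c] for r, c in zigzag_order(rows, cols)]


# Hypothetical 3x3 grid whose values record each cell's zigzag position.
example = [
    [0, 1, 5],
    [2, 4, 6],
    [3, 7, 8],
]
flattened = zigzag_flatten(example)
```

The resulting 1D sequence is the kind of ordered input a sequence model can consume, with neighbouring entries corresponding to neighbouring frequencies.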

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16062 [pdf, other]

Movable Antenna Empowered Physical Layer Security Without Eve's CSI: Joint Optimization of Beamforming and Antenna Positions

Authors: Zhiyong Feng, Yujia Zhao, Kan Yu, Dong Li

Abstract: Physical layer security (PLS) technology based on the fixed-position antenna (FPA) has attracted widespread attention. Due to the fixed feature of the antennas, current FPA-based PLS schemes cannot fully utilize the spatial degree of freedom, and thus a weakened security gain in the desired/undesired direction may exist. Different from the concept of FPA, the movable antenna (MA) is a novel technology that reconfigures the wireless channels and enhances the corresponding capacity through the flexible movement of antennas on a minor scale. MA-empowered PLS enjoys huge potential and deserves further investigation. In this paper, we, for the first time, investigate the secrecy performance of an MA-enabled PLS system where an MA-based Alice transmits confidential information to multiple single-antenna Bobs, in the presence of a single-antenna eavesdropper (Eve) and in the absence of perfect channel state information (CSI). To maximize the secrecy rate of the worst Bob, we jointly design the transmit beamforming and antenna positions at Alice, subject to the minimum moving distance of the antenna, the uncertain CSI of Eve, and the maximum transmit power. Furthermore, projected gradient ascent (PGA), alternating optimization (AO), and simulated annealing (SA) are adopted to handle the non-convexity of the secrecy rate maximization problem. Simulation results demonstrate the effectiveness and correctness of the proposed method. In particular, the MA-enabled PLS scheme can significantly enhance the secrecy rate compared to conventional FPA-based ones for different settings of key system parameters.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.06342 [pdf, other]

Compression-Realized Deep Structural Network for Video Quality Enhancement

Authors: Hanchi Sun, Xiaohong Liu, Xinyang Jiang, Yifei Shen, Dongsheng Li, Xiongkuo Min, Guangtao Zhai

Abstract: This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a more "conscious" process of quality enhancement. As a result, we propose the Compression-Realized Deep Structural Network (CRDS), introducing three inductive biases aligned with the three primary processes in the classic compression codec, merging the strengths of classical encoder architecture with deep network capabilities. Inspired by the residual extraction and domain transformation process in the codec, a pre-trained Latent Degradation Residual Auto-Encoder is proposed to transform video frames into a latent feature space, and the mutual neighborhood attention mechanism is integrated for precise motion estimation and residual extraction. Furthermore, drawing inspiration from the quantization noise distribution of the codec, CRDS proposes a novel Progressive Denoising framework with intermediate supervision that decomposes the quality enhancement into a series of simpler denoising sub-tasks. Experimental results on datasets like LDV 2.0 and MFQE 2.0 indicate our approach surpasses state-of-the-art models.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.01170 [pdf, other]

doi 10.1109/TCSVT.2024.3395481

GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

Abstract: Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we propose a novel transformer-based entropy model called GroupedMixer, which enjoys both faster coding speed and better compression performance than previous transformer-based methods. Specifically, our approach builds upon group-wise autoregression by first partitioning the latent variables into groups along spatial-channel dimensions, and then entropy coding the groups with the proposed transformer-based entropy model. The global causal self-attention is decomposed into more efficient group-wise interactions, implemented using inner-group and cross-group token-mixers. The inner-group token-mixer incorporates contextual elements within a group while the cross-group token-mixer interacts with previously decoded groups. Alternating the two token-mixers enables global contextual reference. To further expedite the network inference, we introduce context cache optimization to GroupedMixer, which caches attention activation values in cross-group token-mixers and avoids complex and duplicated computation. Experimental results demonstrate that the proposed GroupedMixer yields state-of-the-art rate-distortion performance with fast compression speed.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted by IEEE TCSVT

arXiv:2405.00307 [pdf, other]

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

Authors: Dongyuan Li, Ying Zhang, Yusong Wang, Funakoshi Kataro, Manabu Okumura

Abstract: Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called AFTER, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are employed to iteratively select a subset of the most informative and diverse samples for fine-tuning, thereby reducing time consumption. Experiments demonstrate that our proposed method AFTER, using only 20% of samples, improves accuracy by 8.45% and reduces time consumption by 79%. The additional extension of AFTER and ablation studies further confirm its effectiveness and applicability to various real-world scenarios. Our source code is available on GitHub for reproducibility (https://github.com/Clearloveyuan/AFTER).

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted by Journal of Natural Language Processing. arXiv admin note: text overlap with arXiv:2310.00283

arXiv:2404.18418 [pdf, other]

Decomposition Model Assisted Energy-Saving Design in Radio Access Network

Authors: Xiaoxue Zhao, Yijun Yu, Yexing Li, Dong Li, Yao Wang, Chungang Yang

Abstract: The continuous emergence of novel services and massive connections leads to huge energy consumption in ultra-dense radio access networks. Moreover, there are many more controllable parameters that can be adjusted to reduce the energy consumption from a network-wide perspective. However, a network-level energy-saving intent usually contains multiple network objectives and constraints. Therefore, it is critical to decompose a network-level energy-saving intent into multiple levels of configured operations from a top-down refinement perspective. In this work, we utilize a softgoal interdependency graph decomposition model to assist energy-saving scheme design. Meanwhile, we propose an energy-saving approach based on a deep Q-network, which achieves a better trade-off among the energy consumption, the throughput, and the first packet delay. In addition, we illustrate how the decomposition model can assist in making energy-saving decisions. Evaluation results demonstrate the performance gain of the proposed scheme in accelerating the model training process.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.15761 [pdf, other]

Rechargeable UAV Trajectory Optimization for Real-Time Persistent Data Collection of Large-Scale Sensor Networks

Authors: Rui Wang, Deshi Li, Qingqing Wu, Kaitao Meng, Boning Feng, Lele Cong

Abstract: Unmanned aerial vehicles (UAVs) have received plenty of attention due to their high flexibility and enhanced communication ability; nonetheless, the limited onboard energy restricts UAVs' application to persistent data collection missions in large areas. In this paper, we propose a rechargeable UAV-assisted periodic data collection scheme, where a UAV is dispatched to periodically collect data from sensor nodes (SNs) in the mission area and is charged by a wireless charging platform. Specifically, the periodic data collection completion time is minimized by optimizing the UAV trajectory to reach the optimal balance among the collection time, flight time, and recharging time. The formulated problem is non-convex and difficult to solve directly. To tackle this problem, we divide the main problem into two sub-problems and address them by leveraging successive convex approximation (SCA), bisection search, and heuristic methods. Then, we propose a periodic trajectory optimization algorithm to iteratively solve the two sub-problems to minimize the completion time. Furthermore, to deal with the dynamics of SNs, we propose a low-complexity trajectory adjustment strategy, where the trajectory can be maintained or adjusted locally when the SNs change, which significantly reduces the computation cost of re-optimization. The simulation results show the superiority and robustness of the proposed scheme, and the completion time is on average 39% and 33% lower than the two benchmarks, respectively.

Submitted 6 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 13 pages, 17 figures, submitted to IEEE for possible publication

arXiv:2404.13277 [pdf, other]

Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang

Abstract: Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent research highlights the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlations within an entire image set. Meanwhile, it is important to note that correlation, such as ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.
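To make the two metric families named in the abstract above concrete, the following sketch computes SROCC (Pearson correlation of rank vectors, with tied values given their average rank) and MSE in plain Python. It only illustrates the metrics and is not the SMA attack itself; the score lists are made-up toy values, and the helper names `average_ranks`, `srocc`, and `mse` are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mse(x, y):
    """Mean squared error between two equally long score lists."""
    return mean((a - b) ** 2 for a, b in zip(x, y))


# Toy data (assumed, not from the paper): subjective ratings and predictions.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_pred = [1.1, 2.0, 3.2, 3.9, 5.1]      # ranking preserved
attacked_pred = [5.0, 4.1, 3.0, 2.2, 0.9]   # ranking fully reversed

clean_srocc = srocc(subjective, clean_pred)        # 1.0
attacked_srocc = srocc(subjective, attacked_pred)  # -1.0
attacked_mse = mse(subjective, attacked_pred)
```

In this toy case the reversed predictions drive SROCC to a negative value while also producing a large MSE, which is the combination of set-level and per-image disruption the abstract describes.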

Submitted 24 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: Submitted to a conference

arXiv:2404.11313 [pdf, other]

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants, and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2404.11278 [pdf, other]

Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging

Authors: Dikai Li, Jian Yu, Qian Chen, Chunhui Zhang, Xiangyu Wan, Leifeng Cao

Abstract: Muon Induced X-ray Emission (MIXE) was discovered by Chinese physicist Zhang Wenyu as early as 1947, and it can conduct non-destructive elemental analysis inside samples. Research has shown that MIXE can retain the high efficiency of direct imaging while benefiting from the low noise of pinhole imaging through encoding holes. The related technology significantly improves the counting rate while maintaining imaging quality. The sphere encoding technology effectively solves the imaging blurring caused by the tilting of the encoding system, and successfully images micrometer-sized X-ray sources. This paper will combine MIXE and X-ray sphere coding imaging techniques, including ball coding and zone plates, to study the method of non-destructive deep structure imaging of ICF targets and obtaining sub element distribution. The aim is to develop a new method for ICF target detection, which is particularly important for inertial confinement fusion. At the same time, this method can be used to detect and analyze materials that are difficult to penetrate or sensitive, and is expected to solve the problem of element resolution and imaging that traditional technologies cannot overcome. It will provide new methods for the future development of multiple fields such as particle physics, material science, and X-ray optics.

Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11171 [pdf, other]

Personalized Heart Disease Detection via ECG Digital Twin Generation

Authors: Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

Abstract: Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development.

Submitted 11 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.02663 [pdf]

Ground-to-UAV sub-Terahertz channel measurement and modeling

Authors: Da Li, Peian Li, Jiabiao Zhao, Jianjian Liang, Jiacheng Liu, Guohao Liu, Yuanshuai Lei, Wenbo Liu, Jianqin Deng, Fuyong Liu, Jianjun Ma

Abstract: Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless channels leveraging UAVs remain underexplored. This work delves into a ground-to-UAV channel at 140 GHz, with a specific focus on the influence of UAV hovering behavior on channel performance. Employing experimental measurements through an unmodulated channel setup and a geometry-based stochastic model (GBSM) that integrates three-dimensional positional coordinates and beamwidth, this work evaluates the impact of UAV dynamic movements and antenna orientation on channel performance. Our findings highlight the minimal impact of UAV orientation adjustments on channel performance and underscore the diminishing necessity for precise alignment between UAVs and ground stations as beamwidth increases.

Submitted 28 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Submitted to Optics Express

arXiv:2404.02661 [pdf]

Terahertz channel modeling based on surface sensing characteristics

Authors: Jiayuan Cui, Da Li, Jiabiao Zhao, Jiacheng Liu, Guohao Liu, Xiangkun He, Yue Su, Fei Song, Peian Li, Jianjun Ma

Abstract: The dielectric properties of environmental surfaces, including walls, floors and the ground, play a crucial role in shaping the accuracy of terahertz (THz) channel modeling, thereby directly impacting the effectiveness of communication systems. Traditionally, acquiring these properties has relied on methods such as terahertz time-domain spectroscopy (THz-TDS) or vector network analyzers (VNA), demanding rigorous sample preparation and entailing a significant expenditure of time. However, such measurements are not always feasible, particularly in novel and uncharacterized scenarios. In this work, we propose a new approach for channel modeling that leverages the inherent sensing capabilities of THz channels. By comparing the results obtained through channel sensing with those derived from THz-TDS measurements, we demonstrate the method's ability to yield dependable surface property information. The application of this approach in both a miniaturized cityscape scenario and an indoor environment has shown consistency with experimental measurements, thereby verifying its effectiveness in real-world settings.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Submitted to Nano Communication Networks

arXiv:2404.01155 [pdf, other]

Dynamic Modeling and Stability Analysis for Repeated LVRT Process of Wind Turbine Based on Switched System Theory

Authors: Qiping Lai, Chen Shen, Dongsheng Li

Abstract: The significant electrical distance between wind power collection points and the main grid poses challenges for weak grid-connected wind power systems. A new type of voltage oscillation phenomenon induced by repeated low voltage ride-through (LVRT) of the wind turbine has been observed, threatening the safe and stable operation of such power systems. Therefore, exploring dynamic evolution mechanisms and developing stability analysis approaches for this phenomenon have become pressing imperatives. This paper introduces switched system theory for dynamic modeling, mechanism elucidation, and stability analysis of the repeated LVRT process. Firstly, considering the external connection impedance and internal control dynamics, a novel wind turbine grid-side converter (WT-GSC) switched system model is established to quantitatively characterize the evolution dynamic and mechanism of the voltage oscillation. Subsequently, a sufficient stability criterion and index grounded in the common Lyapunov function are proposed for stability analysis and assessment of the WT-GSC switched system. Moreover, to enhance the system stability, the Sobol' global sensitivity analysis method is adopted to identify dominant parameters, which can be further optimized via the particle swarm optimization (PSO) algorithm. Finally, simulations conducted on a modified IEEE 39-bus test system verify the effectiveness of the proposed dynamic modeling and stability analysis methods.

Submitted 8 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 10 pages, 10 figures

arXiv:2403.20075 [pdf, ps, other]

Adaptive Decentralized Federated Learning in Energy and Latency Constrained Wireless Networks

Authors: Zhigang Yan, Dong Li

Abstract: In Federated Learning (FL), with parameters aggregated by a central node, the communication overhead is a substantial concern. To circumvent this limitation and alleviate the single point of failure within the FL framework, recent studies have introduced Decentralized Federated Learning (DFL) as a viable alternative. Considering the device heterogeneity and the energy cost associated with parameter aggregation, this paper investigates how to efficiently leverage the limited resources available to enhance the model performance. Specifically, we formulate a problem that minimizes the loss function of DFL while considering energy and latency constraints. The proposed solution involves optimizing the number of local training rounds across diverse devices with varying resource budgets. To make this problem tractable, we first analyze the convergence of DFL with edge devices with different rounds of local training. The derived convergence bound reveals the impact of the rounds of local training on the model performance. Then, based on the derived bound, the closed-form solutions of rounds of local training in different devices are obtained. Meanwhile, since the solutions require the energy cost of aggregation to be as low as possible, we modify different graph-based aggregation schemes to solve this energy consumption minimization problem, which can be applied to different communication scenarios. Finally, a DFL framework which jointly considers the optimized rounds of local training and the energy-saving aggregation scheme is proposed. Simulation results show that the proposed algorithm achieves better performance than the conventional schemes with fixed rounds of local training, and consumes less energy than other traditional aggregation schemes.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.16699 [pdf, other]

Resonant Beam Communications: A New Design Paradigm and Challenges

Authors: Yuanming Tian, Dongxu Li, Chuan Huang, Qingwen Liu, Shengli Zhou

Abstract: Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interference generated by the modulated beam reflected from the receiver affects the transmission of the desired information. To tackle this challenge, a synchronization-based point-to-point RBCom system is proposed to eliminate the echo interference, and the design for the transmitter and receiver is discussed. Subsequently, the performance of the proposed RBCom is evaluated and compared with that of visible light communications (VLC) and free space optical communications (FOC). Finally, future research directions are outlined and several implementation challenges of RBCom systems are highlighted.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16694 [pdf, other]

Design and Performance of Resonant Beam Communications -- Part II: Mobile Scenario

Authors: Dongxu Li, Yuanming Tian, Chuan Huang, Qingwen Liu, Shengli Zhou

Abstract: This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based information transmission scheme and derives the capacity upper and lower bounds for the quasi-static channel case. In Part II, we address the mobile scenario, where the receiver is in relative motion to the transmitter, and derive a mobile RBCom channel model that jointly considers the Doppler effect, channel variation, and echo interference. With the obtained channel model, we prove that the channel gain of the mobile RBCom decreases as the number of transmitted frames increases, and thus show that the considered mobile RBCom terminates after the transmitter sends a certain number of frames without frequency compensation. By deriving an upper bound on the number of successfully transmitted frames, we formulate the throughput maximization problem for the considered mobile RBCom system, and solve it via a sequential parametric convex approximation (SPCA) method. Finally, simulation results validate the analysis of our proposed method in some typical scenarios.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16676 [pdf, other]

Design and Performance of Resonant Beam Communications -- Part I: Quasi-Static Scenario

Authors: Dongxu Li, Yuanming Tian, Chuan Huang, Qingwen Liu, Shengli Zhou

Abstract: This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission rate of the considered system under both the quasi-static and mobile scenarios. Part I of this paper focuses on the quasi-static scenario where the locations of the transmitter and the receiver are relatively fixed. Specifically, we propose a new information-bearing scheme which adopts a synchronization-based amplitude modulation method to mitigate the echo interference caused by the reflected resonant beam. With this scheme, we show that the quasi-static RBCom channel is equivalent to a Markov channel and can be further simplified as an amplitude-constrained additive white Gaussian noise channel. Moreover, we develop an algorithm that jointly employs the bisection and exhaustive search to maximize its capacity upper and lower bounds. Finally, numerical results validate our analysis. Part II of this paper discusses the performance of the RBCom system under the mobile scenario.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.11397 [pdf, other]

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Authors: Yujia Liu, Chenxi Yang, Dingquan Li, Jianhao Ding, Tingting Jiang

Abstract: The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry, aiding in performance evaluation and optimization guidance. However, these models are found to be vulnerable to adversarial attacks, which introduce imperceptible perturbations to input images, resulting in significant changes in predicted scores. In this paper, we propose a defense method to improve the stability in predicted scores when attacked by small perturbations, thus enhancing the adversarial robustness of NR-IQA models. To be specific, we present theoretical evidence showing that the magnitude of score changes is related to the $\ell_1$ norm of the model's gradient with respect to the input image. Building upon this theoretical foundation, we propose a norm regularization training strategy aimed at reducing the $\ell_1$ norm of the gradient, thereby boosting the robustness of NR-IQA models. Experiments conducted on four NR-IQA baseline models demonstrate the effectiveness of our strategy in reducing score changes in the presence of adversarial attacks. To the best of our knowledge, this work marks the first attempt to defend against adversarial attacks on NR-IQA models. Our study offers valuable insights into the adversarial robustness of NR-IQA models and provides a foundation for future research in this area.

Submitted 17 March, 2024; originally announced March 2024.

Comments: accepted by CVPR 2024
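
Note on the $\ell_1$ relation mentioned in the abstract above: one standard way such a link can arise is sketched below. This is not taken from the paper; it assumes a differentiable score function $f$, an input image $x$, and a perturbation $\delta$ bounded in the $\ell_\infty$ norm by a budget $\epsilon$ (all three symbols are assumptions introduced here). A first-order expansion followed by Hölder's inequality gives:

```latex
\begin{aligned}
|f(x+\delta) - f(x)|
  &\approx \bigl|\nabla_x f(x)^{\top}\delta\bigr|
  && \text{(first-order Taylor expansion)} \\
  &\le \|\nabla_x f(x)\|_{1}\,\|\delta\|_{\infty}
  && \text{(H\"older's inequality)} \\
  &\le \epsilon\,\|\nabla_x f(x)\|_{1}
  && \text{(since } \|\delta\|_{\infty} \le \epsilon\text{)}
\end{aligned}
```

Under these assumptions, a smaller gradient $\ell_1$ norm tightens the first-order bound on the score change, which is consistent with the regularization strategy the abstract describes.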

arXiv:2403.07721 [pdf, other]

Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

Authors: Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

Abstract: How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for EEG-based visual reconstruction. In this study, we present an EEG-based visual reconstruction framework. It consists of a plug-and-play EEG encoder called the Adaptive Thinking Mapper (ATM), which is aligned with image embeddings, and a two-stage EEG guidance image generator that first transforms EEG features into image priors and then reconstructs the visual stimuli with a pre-trained image generator. Our approach allows EEG embeddings to achieve superior performance in image classification and retrieval tasks. Our two-stage image generation strategy vividly reconstructs images seen by humans. Furthermore, we analyzed the impact of signals from different time windows and brain regions on decoding and reconstruction. The versatility of our framework is demonstrated in the magnetoencephalogram (MEG) data modality. We report that EEG-based visual decoding achieves SOTA performance, highlighting the portability, low cost, and high temporal resolution of EEG, enabling a wide range of BCI applications. The code of ATM is available at https://github.com/dongyangli-del/EEG_Image_decode.

Submitted 4 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.11250 [pdf, other]

Hierarchical Prior-based Super Resolution for Point Cloud Geometry Compression

Authors: Dingquan Li, Kede Ma, Jing Wang, Ge Li

Abstract: The Geometry-based Point Cloud Compression (G-PCC) has been developed by the Moving Picture Experts Group to compress point clouds. In its lossy mode, the reconstructed point cloud by G-PCC often suffers from noticeable distortions due to the naïve geometry quantization (i.e., grid downsampling). This paper proposes a hierarchical prior-based super resolution method for point cloud geometry compression. The content-dependent hierarchical prior is constructed at the encoder side, which enables coarse-to-fine super resolution of the point cloud geometry at the decoder side. A more accurate prior generally yields improved reconstruction performance, at the cost of increased bits required to encode this side information. With a proper balance between prior accuracy and bit consumption, the proposed method demonstrates substantial Bjontegaard-delta bitrate savings on the MPEG Cat1A dataset, surpassing the octree-based and trisoup-based G-PCC v14. We provide our implementations for reproducible research at https://github.com/lidq92/mpeg-pcc-tmc13.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.10310 [pdf, other]

Interpretable Generative Adversarial Imitation Learning

Authors: Wenliang Liu, Danyang Li, Erfan Aasi, Roberto Tron, Calin Belta

Abstract: Imitation learning methods have demonstrated considerable success in teaching autonomous systems complex tasks through expert demonstrations. However, a limitation of these methods is their lack of interpretability, particularly in understanding the specific task the learning agent aims to accomplish. In this paper, we propose a novel imitation learning method that combines Signal Temporal Logic (STL) inference and control synthesis, enabling the explicit representation of the task as an STL formula. This approach not only provides a clear understanding of the task but also allows for the incorporation of human knowledge and adaptation to new scenarios through manual adjustments of the STL formulae. Additionally, we employ a Generative Adversarial Network (GAN)-inspired training approach for both the inference and the control policy, effectively narrowing the gap between the expert and learned policies. The effectiveness of our algorithm is demonstrated through two case studies, showcasing its practical applicability and adaptability.

Submitted 15 February, 2024; originally announced February 2024.

Comments: Submitted to L4DC 2024 (under review)

arXiv:2402.05441 [pdf]

Spiking Neural Network Enhanced Hand Gesture Recognition Using Low-Cost Single-photon Avalanche Diode Array

Authors: Zhenya Zang, Xingda Li, David Day Uei Li

Abstract: We present a compact spiking convolutional neural network (SCNN) and spiking multilayer perceptron (SMLP) to recognize ten different gestures in dark and bright light environments, using a $9.6 single-photon avalanche diode (SPAD) array. In our hand gesture recognition (HGR) system, photon intensity data was leveraged to train and test the network. A vanilla convolutional neural network (CNN) was also implemented to compare the performance of SCNN with the same network topologies and training strategies. Our SCNN was trained from scratch instead of being converted from the CNN. We tested the three models in dark and ambient light (AL)-corrupted environments. The results indicate that SCNN achieves comparable accuracy (90.8%) to CNN (92.9%) and exhibits lower floating operations with only 8 timesteps. SMLP also presents a trade-off between computational workload and accuracy. The code and collected datasets of this work are available at https://github.com/zzy666666zzy/TinyLiDAR_NET_SNN.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures

arXiv:2402.00565 [pdf, other]

A Review of Carsickness Mitigation: Navigating Challenges and Exploiting Opportunities in the Era of Intelligent Vehicles

Authors: Daofei Li, Tingzhe Yu, Binbin Tang

Abstract: Motion sickness (MS) has long been a common complaint in road transportation. However, in the era of driving automation, MS has become an increasingly significant issue. The future intelligent vehicle is envisioned as a mobile space for work or entertainment, but unfortunately passengers' engagement in non-driving tasks may exacerbate MS. Finding effective MS countermeasures is crucial to ensure a pleasant passenger experience. Nevertheless, due to the complex mechanism of MS, there are numerous challenges in mitigating it, hindering the development of practical countermeasures. To address this, we first review two prevalent theories explaining the mechanism of MS. Subsequently, this paper provides a summary of current subjective and objective approaches for quantifying motion sickness levels. Then, it surveys existing methods for alleviating MS, including passenger adjustment, intelligent vehicle solutions, and motion cues of various modalities. Furthermore, we outline the limitations and remaining challenges of current research and highlight novel opportunities in the context of intelligent vehicles. Finally, we propose an integrated framework for alleviating MS. The findings of this review will enhance our understanding of carsickness and offer valuable insights for future research and practice in MS mitigation within modern vehicles.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 19 pages, 5 figures, 5 tables

arXiv:2401.17837 [pdf, ps, other]

Safe Reinforcement Learning-Based Eco-Driving Control for Mixed Traffic Flows With Disturbances

Authors: Ke Lu, Dongjun Li, Qun Wang, Kaidi Yang, Lin Zhao, Ziyou Song

Abstract: This paper presents a safe learning-based eco-driving framework tailored for mixed traffic flows, which aims to optimize energy efficiency while guaranteeing safety during real-system operations. Even though reinforcement learning (RL) is capable of optimizing energy efficiency in intricate environments, it is challenged by safety requirements during the training process. The lack of safety guarantees is another concern when deploying a trained policy in real-world applications. Compared with RL, model predictive control (MPC) can handle constrained dynamic systems, ensuring safe driving. However, the major challenges lie in complicated eco-driving tasks and the presence of disturbances, which respectively challenge the MPC design and the satisfaction of constraints. To address these limitations, the proposed framework incorporates the tube-based enhanced MPC (RMPC) to ensure the safe execution of the RL policy under disturbances, thereby improving the control robustness. RL not only optimizes the energy efficiency of the connected and automated vehicle in mixed traffic but also handles more uncertain scenarios, in which the energy consumption of the human-driven vehicle and its diverse and stochastic driving behaviors are considered in the optimization framework. Simulation results demonstrate that the proposed algorithm, compared with the RMPC technique, shows an average improvement of 10.88% in holistic energy efficiency, while compared with the RL algorithm, it effectively prevents inter-vehicle collisions.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.14007 [pdf, other]

Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression

Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu

Abstract: Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bit-rates, these methods can introduce visually displeasing artifacts, such as blurring, color shifting, and texture loss, thereby compromising perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for optimal visual fidelity. We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss, to enhance the perceptual quality of image reconstructions. Additionally, we have implemented a latent refinement process to generate content-aware latent codes. These codes adhere to bit-rate constraints, balance the trade-off between distortion and fidelity, and prioritize bit allocation to regions of greater importance. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression. On CLIC2024 validation set, our approach achieves a 62% bitrate saving compared to MS-ILLM under FID metric.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 7 pages, 4 figures

arXiv:2401.11620 [pdf, other]

Real-Time Systems Optimization with Black-box Constraints and Hybrid Variables

Authors: Sen Wang, Dong Li, Shao-Yu Huang, Xuanliang Deng, Ashrarul H. Sifat, Changhee Jung, Ryan Williams, Haibo Zeng

Abstract: When optimizing real-time systems, designers often face a challenging problem where the schedulability constraints are non-convex, non-continuous, or lack an analytical form to understand their properties. Although the optimization framework NORTH proposed in previous work is general (it works with arbitrary schedulability analysis) and scalable, it can only handle problems with continuous variables, which limits its application. In this paper, we extend the applications of the framework NORTH to problems with a hybrid of continuous and discrete variables. This is achieved in a coordinate-descent method, where the continuous and discrete variables are optimized separately during iterations. The new framework, NORTH+, improves solution quality by around 20% over NORTH in experiments.

Submitted 21 January, 2024; originally announced January 2024.

Comments: Workshop on OPtimization for Embedded and ReAl-time systems (OPERA 2023) co-located with the 44th IEEE Real-Time Systems Symposium (RTSS)

arXiv:2401.10278 [pdf, other]

EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model

Authors: Yuqi Chen, Kan Ren, Kaitao Song, Yansen Wang, Yifan Wang, Dongsheng Li, Lili Qiu

Abstract: Self-supervised learning has emerged as a highly effective approach in the fields of natural language processing and computer vision. It is also applicable to brain signals such as electroencephalography (EEG) data, given the abundance of available unlabeled data that exist in a wide spectrum of real-world medical applications ranging from seizure detection to wave analysis. The existing works leveraging self-supervised learning on EEG modeling mainly focus on pretraining upon each individual dataset corresponding to a single downstream task, which cannot leverage the power of abundant data, and they may derive sub-optimal solutions with a lack of generalization. Moreover, these methods rely on end-to-end model learning which is not easy for humans to understand. In this paper, we present a novel EEG foundation model, namely EEGFormer, pretrained on large-scale compound EEG data. The pretrained model can not only learn universal representations on EEG signals with adaptable performance on various downstream tasks but also provide interpretable outcomes of the useful patterns within the data. To validate the effectiveness of our model, we extensively evaluate it on various downstream tasks and assess the performance under different transfer settings. Furthermore, we demonstrate how the learned model exhibits transferable anomaly detection performance and provides valuable interpretability of the acquired patterns via self-supervised learning.

Submitted 11 January, 2024; originally announced January 2024.

Comments: A preprint version of an ongoing work

arXiv:2401.05682 [pdf]

Adaptive Regularized Low-Rank Tensor Decomposition for Hyperspectral Image Denoising and Destriping

Authors: Dongyi Li, Dong Chu, Xiaobin Guan, Wei He, Huanfeng Shen

Abstract: Hyperspectral images (HSIs) are inevitably degraded by a mixture of various types of noise, such as Gaussian noise, impulse noise, stripe noise, and dead pixels, which greatly limits the subsequent applications. Although various denoising methods have already been developed, accurately recovering the spatial-spectral structure of HSIs remains a challenging problem to be addressed. Furthermore, serious stripe noise, which is common in real HSIs, is still not fully separated by the previous models. In this paper, we propose an adaptive hyper-Laplacian regularized low-rank tensor decomposition (LRTDAHL) method for HSI denoising and destriping. On the one hand, the stripe noise is separately modeled by the tensor decomposition, which can effectively encode the spatial-spectral correlation of the stripe noise. On the other hand, adaptive hyper-Laplacian spatial-spectral regularization is introduced to represent the distribution structure of different HSI gradient data by adaptively estimating the optimal hyper-Laplacian parameter, which can reduce the spatial information loss and over-smoothing caused by the previous total variation regularization. The proposed model is solved using the alternating direction method of multipliers (ADMM) algorithm. Extensive simulation and real-data experiments all demonstrate the effectiveness and superiority of the proposed method.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05217 [pdf, other]

Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Tingting Jiang

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack methods for NR-IQA provide a powerful instrument to test the robustness of NR-IQA. However, current attack methods of NR-IQA heavily rely on the gradient of the NR-IQA model, leading to limitations when the gradient information is unavailable. In this paper, we present a pioneering query-based black-box attack against NR-IQA methods. We propose the concept of score boundary and leverage an adaptive iterative approach with multiple score boundaries. Meanwhile, the initial attack directions are also designed to leverage the characteristics of the Human Visual System (HVS). Experiments show our method outperforms all compared state-of-the-art attack methods and is far ahead of previous black-box methods. The effective NR-IQA model DBCNN suffers a Spearman's rank-order correlation coefficient (SROCC) decline of 0.6381 when attacked by our method, revealing the vulnerability of NR-IQA models to black-box attacks. The proposed attack method also provides a potent tool for further exploration into NR-IQA robustness.

Submitted 25 April, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.03284 [pdf, other]

A General and Scalable Method for Optimizing Real-Time Systems

Authors: Sen Wang, Dong Li, Shao-Yu Huang, Xuanliang Deng, Ashrarul H. Sifat, Changhee Jung, Ryan Williams, Haibo Zeng

Abstract: In real-time systems optimization, designers often face a challenging problem posed by the non-convex and non-continuous schedulability conditions, which may even lack an analytical form to understand their properties. To tackle this challenging problem, we treat the schedulability analysis as a black box that only returns true/false results. We propose a general and scalable framework to optimize real-time systems, named Numerical Optimizer with Real-Time Highlight (NORTH). NORTH is built upon the gradient-based active-set methods from the numerical optimization literature but with new methods to manage active constraints for the non-differentiable schedulability constraints. In addition, we also generalize NORTH to NORTH+, to collaboratively optimize certain types of discrete variables (e.g., priority assignments, categorical variables) with continuous variables based on numerical optimization algorithms. We demonstrate the algorithm performance with two example applications: energy minimization based on dynamic voltage and frequency scaling (DVFS), and optimization of control system performance. In these experiments, NORTH achieved $10^2$ to $10^5$ times speed improvements over state-of-the-art methods while maintaining similar or better solution quality. NORTH+ outperforms NORTH by 30% with similar algorithm scalability. Both NORTH and NORTH+ support black-box schedulability analysis, ensuring broad applicability.

Submitted 6 January, 2024; originally announced January 2024.

Comments: Extension of a conference paper

arXiv:2401.01176 [pdf, other]

Fundamental Limitation of Semantic Communications: Neural Estimation for Rate-Distortion

Authors: Dongxu Li, Jianhao Huang, Chuan Huang, Xiaoqi Qin, Han Zhang, Ping Zhang

Abstract: This paper studies the fundamental limit of semantic communications over the discrete memoryless channel. We consider the scenario of sending a semantic source consisting of an observation state and its corresponding semantic state, both of which are recovered at the receiver. To derive the performance limitation, we adopt the semantic rate-distortion function (SRDF) to study the relationship among the minimum compression rate, observation distortion, semantic distortion, and channel capacity. For the case with unknown semantic source distribution, where only a set of the source samples is available, we propose a neural-network-based method by leveraging the generative networks to learn the semantic source distribution. Furthermore, for a special case where the semantic state is a deterministic function of the observation, we design a cascade neural network to estimate the SRDF. For the case with perfectly known semantic source distribution, we propose a general Blahut-Arimoto algorithm to effectively compute the SRDF. Finally, experimental results validate our proposed algorithms for the scenarios with ideal Gaussian semantic source and some practical datasets.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.16419 [pdf]

Radar detection of wake vortex behind the aircraft: the detection range problem

Authors: Jiangkun Gong, Jun Yan, Deyong Kong, Deren Li

Abstract: In this study, we showcased the detection of the wake vortex produced by a medium aircraft at distances exceeding 10 km using an X-band pulse-Doppler radar. We analyzed radar signals within the range profiles behind a Boeing 737 aircraft on February 7, 2021, within the airspace of the Runway Protection Zone (RPZ) at Tianhe Airport, Wuhan, China. The findings revealed that the wake vortex extended up to 6 km from the aircraft, which is 10 km from the radar, displaying distinct stages characterized by scattering patterns and Doppler signatures. Despite the wake vortex exhibiting a scattering power approximately 10 dB lower than that of the aircraft, its Doppler Signal-to-Clutter Ratio (DSCR) values were only 5 dB lower, indicating a notably strong scattering power within a single radar bin. Additionally, certain radar parameters proved inconsistent in the stable detection and tracking of wake vortex, aligning with our earlier concept of cognitive micro-Doppler radar.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.15443 [pdf, other]

Road-Aware Localization With Salient Feature Matching in Heterogeneous Networks

Authors: Lele Cong, Deshi Li, Kaitao Meng, Shuya Zhu

Abstract: Vehicle localization is essential for intelligent transportation. However, achieving low-latency vehicle localization without sacrificing precision is challenging. In this paper, we propose a road-aware localization mechanism in heterogeneous networks (HetNet), where distinct features of HetNet signals are extracted for two-spatial-scale position mapping, enabling low-latency positioning with high precision. Specifically, we propose a sequence segmentation method to extract the low-dimensional positioning space on two spatial scales. To represent roads and sub-segments according to HetNet signals, we propose a salient feature extraction method to eliminate redundant features and retain distinct features, thereby reducing feature-matching complexity and improving representation accuracy. Based on the extracted salient features, a two-spatial-scale localization algorithm is designed through salient feature matching, which can achieve low-latency road-aware localization. Furthermore, high-precision positioning is achieved by coordinate mapping based on curve fitting. Simulation results show that our mechanism can provide a low-latency and high-precision positioning service compared to the benchmark schemes.

Submitted 24 December, 2023; originally announced December 2023.

Comments: 6 pages, 7 figures

arXiv:2312.11974 [pdf, other]

Ms-senet: Enhancing Speech Emotion Recognition Through Multi-scale Feature Fusion With Squeeze-and-excitation Blocks

Authors: Mengbo Li, Yuanzhong Zheng, Dichucheng Li, Yulun Wu, Yaoxuan Wang, Haojun Fei

Abstract: Speech Emotion Recognition (SER) has become a growing focus of research in human-computer interaction. Spatiotemporal features play a crucial role in SER, yet current research lacks comprehensive spatiotemporal feature learning. This paper focuses on addressing this gap by proposing a novel approach. In this paper, we employ Convolutional Neural Network (CNN) with varying kernel sizes for spatial and temporal feature extraction. Additionally, we introduce Squeeze-and-Excitation (SE) modules to capture and fuse multi-scale features, facilitating effective information fusion for improved emotion recognition and a deeper understanding of the temporal evolution of speech emotion. Moreover, we employ skip connections and Spatial Dropout (SD) layers to prevent overfitting and increase the model's depth. Our method outperforms the previous state-of-the-art method, achieving an average UAR and WAR improvement of 1.62% and 1.32%, respectively, across six benchmark SER datasets. Further experiments demonstrated that our method can fully extract spatiotemporal features in low-resource conditions.

Submitted 24 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.10921 [pdf, other]

AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis

Authors: Dongze Li, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, Tieniu Tan

Abstract: Audio-driven talking head synthesis is a promising topic with wide applications in digital human, film making and virtual reality. Recent NeRF-based approaches have shown superiority in quality and fidelity compared to previous studies. However, when it comes to few-shot talking head generation, a practical scenario where only a few seconds of talking video is available for one identity, two limitations emerge: 1) they either have no base model, which serves as a facial prior for fast convergence, or ignore the importance of audio when building the prior; 2) most of them overlook the degree of correlation between different face regions and audio, e.g., the mouth is audio related, while the ear is audio independent. In this paper, we present Audio Enhanced Neural Radiance Field (AE-NeRF) to tackle the above issues, which can generate realistic portraits of a new speaker with a few-shot dataset. Specifically, we introduce an Audio Aware Aggregation module into the feature fusion stage of the reference scheme, where the weight is determined by the similarity of audio between reference and target image. Then, an Audio-Aligned Face Generation strategy is proposed to model the audio related and audio independent regions respectively, with a dual-NeRF framework. Extensive experiments have shown AE-NeRF surpasses the state-of-the-art on image fidelity, audio-lip synchronization, and generalization ability, even in limited training set or training iterations.

Submitted 17 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.10052 [pdf, other]

ESTformer: Transformer Utilizing Spatiotemporal Dependencies for EEG Super-resolution

Authors: Dongdong Li, Zhongliang Zeng, Zhe Wang, Hai Yang

Abstract: Towards practical applications of Electroencephalography (EEG) data, lightweight acquisition devices, equipped with a few electrodes, result in a predicament where analysis methods can only leverage EEG data with extremely low spatial resolution. Recent methods mainly focus on using mathematical interpolation methods and Convolutional Neural Networks for EEG super-resolution (SR), but they suffer from high computation costs, extra bias, and few insights in spatiotemporal dependency modeling. To this end, we propose the ESTformer, an EEG SR framework utilizing spatiotemporal dependencies based on the Transformer. The ESTformer applies positional encoding methods and the Multi-head Self-attention mechanism to the space and time dimensions, which can learn spatial structural information and temporal functional variation. The ESTformer, with the fixed masking strategy, adopts a mask token to up-sample the low-resolution (LR) EEG data in case of disturbance from mathematical interpolation methods. On this basis, we design various Transformer blocks to construct the Spatial Interpolation Module (SIM) and the Temporal Reconstruction Module (TRM). Finally, the ESTformer cascades the SIM and the TRM to capture and model spatiotemporal dependencies for EEG SR with fidelity. Extensive experimental results on two EEG datasets show the effectiveness of the ESTformer against previous state-of-the-art methods and verify the superiority of the SR data to the LR data in EEG-based downstream tasks of person identification and emotion recognition. The proposed ESTformer demonstrates the versatility of the Transformer for EEG SR tasks.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2312.05953 [pdf]

RadImageGAN -- A Multi-modal Dataset-Scale Generative AI for Medical Imaging

Authors: Zelong Liu, Alexander Zhou, Arnold Yang, Alara Yilmaz, Maxwell Yoo, Mikey Sullivan, Catherine Zhang, James Grant, Daiqing Li, Zahi A. Fayad, Sean Huver, Timothy Deyer, Xueyan Mei

Abstract: Deep learning in medical imaging often requires large-scale, high-quality data or initiation with suitably pre-trained weights. However, medical datasets are limited by data availability, domain-specific knowledge, and privacy concerns, and the creation of large and diverse radiologic databases like RadImageNet is highly resource-intensive. To address these limitations, we introduce RadImageGAN, the first multi-modal radiologic data generator, which was developed by training StyleGAN-XL on the real RadImageNet dataset of 102,774 patients. RadImageGAN can generate high-resolution synthetic medical imaging datasets across 12 anatomical regions and 130 pathological classes in 3 modalities. Furthermore, we demonstrate that RadImageGAN generators can be utilized with BigDatasetGAN to generate multi-class pixel-wise annotated paired synthetic images and masks for diverse downstream segmentation tasks with minimal manual annotation. We showed that using synthetic auto-labeled data from RadImageGAN can significantly improve performance on four diverse downstream segmentation datasets by augmenting real training data and/or developing pre-trained weights for fine-tuning. This shows that RadImageGAN combined with BigDatasetGAN can improve model performance and address data scarcity while reducing the resources needed for annotations for segmentation tasks.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.16531 [pdf]

Channel Modeling for Terahertz Communications in Rain

Authors: Peian Li, Wenbo Liu, Jiacheng Liu, Da Li, Guohao Liu, Yuanshuai Lei, Jiabiao Zhao, Xiaopeng Wang, Houjun Sun, Jianjun Ma, John F. Federici

Abstract: Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude-based variations in rainfall, and diverse raindrop sizes, we identified the paramount significance of the number of raindrops in the THz channel, particularly in scenarios with constant rain rates but varying drop sizes. Central to our findings is a novel model grounded in Mie scattering theory, which adeptly incorporates the variability of raindrop size distributions at different altitudes. This model has displayed strong congruence with our experimental results. In essence, our study underscores the inadequacy of solely depending on a fixed ground-based rain rate and emphasizes the imperative of calibrating distribution metrics to cater to specific environmental and operational contexts.

Submitted 28 November, 2023; originally announced November 2023.

Comments: submitted to IEEE Transactions on Antennas and Propagation

arXiv:2311.13616 [pdf, other]

Online Video Quality Enhancement with Spatial-Temporal Look-up Tables

Authors: Zefan Qu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Cairong Zhao

Abstract: Low latency rates are crucial for online video-based applications, such as video conferencing and cloud gaming, which makes improving video quality in online scenarios increasingly important. However, existing quality enhancement methods are limited by slow inference speed and the requirement for temporal information contained in future frames, making it challenging to deploy them directly in online tasks. In this paper, we propose a novel method, STLVQE, specifically designed to address the rarely studied online video quality enhancement (Online-VQE) problem. Our STLVQE designs a new VQE framework which contains a Module-Agnostic Feature Extractor that greatly reduces the redundant computations and redesigns the propagation, alignment, and enhancement modules of the network. A Spatial-Temporal Look-up Table (STL) structure is proposed, which extracts spatial-temporal information in videos while saving substantial inference time. To the best of our knowledge, we are the first to exploit the LUT structure to extract temporal information in video tasks. Extensive experiments on the MFQE 2.0 dataset demonstrate that our STLVQE achieves a satisfactory performance-speed trade-off.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.02818 [pdf, other]

Signal Processing Meets SGD: From Momentum to Filter

Authors: Zhipeng Yao, Guiyuan Fu, Ying Li, Yu Zhang, Dazhou Li, Rui Yu

Abstract: In deep learning, stochastic gradient descent (SGD) and its momentum-based variants are widely used for optimization, but they typically suffer from slow convergence. Conversely, existing adaptive learning rate optimizers speed up convergence but often compromise generalization. To resolve this issue, we propose a novel optimization method designed to accelerate SGD's convergence without sacrificing generalization. Our approach reduces the variance of the historical gradient, improves first-order moment estimation of SGD by applying Wiener filter theory, and introduces a time-varying adaptive gain. Empirical results demonstrate that SGDF (SGD with Filter) effectively balances convergence and generalization compared to state-of-the-art optimizers.

Submitted 24 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.00970 [pdf, other]

Lightweight super resolution network for point cloud geometry compression

Authors: Wei Zhang, Dingquan Li, Ge Li, Wen Gao

Abstract: This paper presents an approach for compressing point cloud geometry by leveraging a lightweight super-resolution network. The proposed method involves decomposing a point cloud into a base point cloud and the interpolation patterns for reconstructing the original point cloud. While the base point cloud can be efficiently compressed using any lossless codec, such as Geometry-based Point Cloud Compression, a distinct strategy is employed for handling the interpolation patterns. Rather than directly compressing the interpolation patterns, a lightweight super-resolution network is utilized to learn this information through overfitting. Subsequently, the network parameter is transmitted to assist in point cloud reconstruction at the decoder side. Notably, our approach differentiates itself from lookup table-based methods, allowing us to obtain more accurate interpolation patterns by accessing a broader range of neighboring voxels at an acceptable computational cost. Experiments on MPEG Cat1 (Solid) and Cat2 datasets demonstrate the remarkable compression performance achieved by our method.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 10 pages, 3 figures, 2 tables, and 27 references

arXiv:2311.00418 [pdf, other]

Intelligent Surface Empowered Integrated Sensing and Communication: From Coexistence to Reciprocity

Authors: Kaitao Meng, Qingqing Wu, Christos Masouros, Wen Chen, Deshi Li

Abstract: Integrated sensing and communication (ISAC) has attracted growing interest for sixth-generation (6G) and beyond wireless networks. The primary challenges faced by highly efficient ISAC include limited sensing and communication (S&C) coverage, constrained integration gain between S&C under weak channel correlations, and unknown performance boundary. Intelligent reflecting/refracting surfaces (IRSs) can effectively expand S&C coverage and control the degree of freedom of channels between the transmitters and receivers, thereby realizing increasing integration gains. In this work, we first delve into the fundamental characteristics of IRS-empowered ISAC and innovative IRS-assisted sensing architectures. Then, we discuss various objectives for IRS channel control and deployment optimization in ISAC systems. Furthermore, the interplay between S&C in different deployment strategies is investigated and some promising directions for IRS enhanced ISAC are outlined.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 8 pages, 4 figures, submitted to IEEE Journal for possible publication

arXiv:2310.19699 [pdf, other]

Optimizing Logical Execution Time Model for Both Determinism and Low Latency

Authors: Sen Wang, Dong Li, Ashrarul H. Sifat, Shao-Yu Huang, Xuanliang Deng, Changhee Jung, Ryan Williams, Haibo Zeng

Abstract: The Logical Execution Time (LET) programming model has recently received considerable attention, particularly because of its timing and dataflow determinism. In LET, task computation appears always to take the same amount of time (called the task's LET interval), and the task reads (resp. writes) at the beginning (resp. end) of the interval. Compared to other communication mechanisms, such as impl… ▽ More The Logical Execution Time (LET) programming model has recently received considerable attention, particularly because of its timing and dataflow determinism. In LET, task computation appears always to take the same amount of time (called the task's LET interval), and the task reads (resp. writes) at the beginning (resp. end) of the interval. Compared to other communication mechanisms, such as implicit communication and Dynamic Buffer Protocol (DBP), LET performs worse on many metrics, such as end-to-end latency (including reaction time and data age) and time disparity jitter. Compared with the default LET setting, the flexible LET (fLET) model shrinks the LET interval while still guaranteeing schedulability by introducing the virtual offset to defer the read operation and using the virtual deadline to move up the write operation. Therefore, fLET has the potential to significantly improve the end-to-end timing performance while kee** the benefits of deterministic behavior on timing and dataflow. To fully realize the potential of fLET, we consider the problem of optimizing the assignments of its virtual offsets and deadlines. We propose new abstractions to describe the task communication pattern and new optimization algorithms to explore the solution space efficiently. The algorithms leverage the linearizability of communication patterns and utilize symbolic operations to achieve efficient optimization while providing a theoretical guarantee. 
The framework supports optimizing multiple performance metrics and guarantees bounded suboptimality when optimizing end-to-end latency. Experimental results show that our optimization algorithms improve upon the default LET and its existing extensions and significantly outperform implicit communication and DBP in terms of various metrics, such as end-to-end latency, time disparity, and its jitter.

Submitted 7 March, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: accepted in RTAS'24

arXiv:2310.14769 [pdf]

An introduction to radar Automatic Target Recognition (ATR) technology in ground-based radar systems

Authors: Jiangkun Gong, Jun Yan, Deyong Kong, Deren Li

Abstract: This paper presents a brief examination of Automatic Target Recognition (ATR) technology within ground-based radar systems. It offers a lucid comprehension of the ATR concept, delves into its historical milestones, and categorizes ATR methods according to different scattering regions. By incorporating ATR solutions into radar systems, this study demonstrates the expansion of radar detection ranges and the enhancement of tracking capabilities, leading to superior situational awareness. Drawing insights from the Russo-Ukrainian War, the paper highlights three pressing radar applications that urgently necessitate ATR technology: detecting stealth aircraft, countering small drones, and implementing anti-jamming measures. Anticipating the next wave of radar ATR research, the study predicts a surge in cognitive radar and machine learning (ML)-driven algorithms. These emerging methodologies aspire to confront challenges associated with system adaptation, real-time recognition, and environmental adaptability. Ultimately, ATR stands poised to revolutionize conventional radar systems, ushering in an era of 4D sensing capabilities.

Submitted 23 October, 2023; originally announced October 2023.

Showing 1–50 of 252 results for author: Li, D