-
Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device
Authors:
Fulan Li,
Yunfei Guo,
Wenda Xu,
Weide Zhang,
Fangyun Zhao,
Baiyu Wang,
Huaguang Du,
Chengkun Zhang
Abstract:
This paper presents the development of an upper limb end-effector-based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms. First, the Implicit Euler velocity control algorithm (IEVC) is highlighted for its state-of-the-art accuracy, stability, efficiency, and generalizability in motion restriction control. Second, an Admittance Virtual Dynamics simulation algorithm achieves smooth and natural human interaction with the non-backdrivable end-effector. Third, a generalized impedance force calculation algorithm allows efficient impedance control on any trajectory or area boundary. Experimental validation demonstrated the system's effectiveness in accurate end-effector position control across various trajectories and configurations. The proposed upper limb end-effector-based rehabilitation device, with its high performance and adaptability, holds significant promise for extensive clinical application, potentially improving rehabilitation outcomes for stroke patients.
Submitted 20 June, 2024;
originally announced June 2024.
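The admittance idea in this abstract can be illustrated with a minimal sketch. It assumes hypothetical first-order virtual dynamics m·dv/dt + b·v = F and integrates them with a backward (implicit) Euler step, so the commanded velocity stays stable for any step size. The function names, mass, damping, and time step are illustrative assumptions; this is not the authors' IEVC or Admittance Virtual Dynamics algorithm.

```python
def admittance_step(v, force, mass=2.0, damping=8.0, dt=0.001):
    """One backward-Euler step of m*dv/dt + b*v = F; returns the new velocity."""
    # Backward Euler: m*(v_new - v)/dt + b*v_new = F, solved for v_new.
    return (mass * v + dt * force) / (mass + dt * damping)


def simulate(force, steps, **params):
    """Apply a constant measured force for `steps` control cycles, starting at rest."""
    v = 0.0
    for _ in range(steps):
        v = admittance_step(v, force, **params)
    return v


if __name__ == "__main__":
    # A constant 4 N push settles at the steady-state velocity F / b = 0.5 m/s.
    print(simulate(4.0, 5000))
    # Even with a very large time step the update does not overshoot or diverge.
    print(simulate(4.0, 3, dt=10.0))
```

In a non-backdrivable device the measured hand force would feed `force`, and the returned velocity would be sent to the position or velocity loop of the ball-screw drive.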
-
RMFA-Net: A Neural ISP for Real RAW to RGB Image Reconstruction
Authors:
Fei Li,
Wenbo Hou,
Peng Jia
Abstract:
Deep learning-based ISP algorithms have demonstrated significant potential in raw2rgb reconstruction. However, existing networks have not fully considered the specific characteristics of raw data, such as black level and CFA, which can negatively impact texture and color if mishandled. Moreover, uneven exposure in raw data is also not considered carefully, leading to adverse effects on contrast and brightness. In this paper, we introduce RMFA-Net to tackle these problems. We perform implicit black level correction to mitigate color shifts in dim scenes. To preserve high-frequency information and prevent misalignment, we propose a novel Three-Channel-Split mode. To address the issue of uneven exposure, we designed an explicit tone mapping module based on the Retinex theory. We train and evaluate our models using the dataset released by the Mobile AI 2022 Learned Smartphone ISP Challenge. It is demonstrated that RMFA-Net outperforms previous algorithms, achieving a PSNR score of over 25 dB, surpassing the state-of-the-art by +1 dB. Furthermore, we developed a lightweight version, RMFANet-tiny, for engineering deployment while still maintaining strong performance, surpassing the SOTA by +0.5 dB.
Submitted 17 June, 2024;
originally announced June 2024.
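For readers unfamiliar with the PSNR figures quoted in this abstract, the following is a minimal sketch of the standard definition, PSNR = 10·log10(peak² / MSE). It assumes 8-bit pixel values flattened into plain lists; the function name and the peak value are illustrative and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ref = [52, 55, 61, 59]
    out = [54, 55, 60, 57]
    print(round(psnr(ref, out), 2))
```

Under this definition a gain of +1 dB corresponds to the mean squared error shrinking by a factor of about 10^(-0.1), roughly 0.79.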
-
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Authors:
Philip Anastassiou,
Jiawei Chen,
Jitong Chen,
Yuanzhe Chen,
Zhuo Chen,
Ziyi Chen,
Jian Cong,
Lelai Deng,
Chuang Ding,
Lu Gao,
Mingqing Gong,
Peisong Huang,
Qingqing Huang,
Zhiying Huang,
Yuanyuan Huo,
Dongya Jia,
Chumin Li,
Feiya Li,
Hui Li,
Jiaxin Li,
Xiaoyang Li,
Xingxing Li,
Lin Liu,
Shouda Liu,
Sichao Liu
, et al. (21 additional authors not shown)
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}.
Submitted 4 June, 2024;
originally announced June 2024.
-
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption
Authors:
Anqi Li,
Yuxi Liu,
Huihui Bai,
Feng Li,
Runmin Cong,
Meng Wang,
Yao Zhao
Abstract:
Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexible rate adaption to diverse compression necessities and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, Control-GIC, the first capable of fine-grained bitrate adaption across a broad spectrum while ensuring high-fidelity and generality compression. We base Control-GIC on a VQGAN framework representing an image as a sequence of variable-length codes (i.e. VQ-indices), which can be losslessly compressed and exhibits a direct positive correlation with the bitrates. Therefore, drawing inspiration from the classical coding principle, we naturally correlate the information density of local image patches with their granular representations, to achieve dynamic adjustment of the code quantity following different granularity decisions. This implies we can flexibly determine a proper allocation of granularity for the patches to acquire desirable compression rates. We further develop a probabilistic conditional decoder that can trace back to historic encoded multi-granularity representations according to transmitted codes, and then reconstruct hierarchical granular features in the formalization of conditional probability, enabling more informative aggregation to improve reconstruction realism. Our experiments show that Control-GIC allows highly flexible and controllable bitrate adaption and even once compression on an entire dataset to fulfill constrained bitrate conditions. Experimental results demonstrate its superior performance over recent state-of-the-art methods.
Submitted 5 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Simplified Self-homodyne Coherent System Based on Alamouti Coding and Digital Subcarrier Multiplexing
Authors:
Wei Wang,
Dongdong Zou,
Zhenpeng Wu,
Qi Sui,
Xingwen Yi,
Fan Li,
Chao Lu,
Zhaohui Li
Abstract:
Coherent technology inherent with more available degrees of freedom is deemed a competitive solution for next-generation ultra-high-speed short-reach optical interconnects. However, the fatal barriers to implementing the conventional coherent system in short-reach optical interconnect are the cost, footprint, and power consumption. Self-homodyne coherent system exhibits its potential to reduce the power consumption of the receiver-side digital signal processing (Rx-DSP) by delivering the local oscillator (LO) from the transmitter. However, an automatic polarization controller (APC) is inevitable in the remote LO link to avoid polarization fading, resulting in additional costs. To address the polarization fading issue, a simplified self-homodyne coherent system is proposed enabled by Alamouti coding in this paper. Benefiting from the Alamouti coding between two polarizations, a polarization-insensitive receiver only including a 3dB coupler, a 90° hybrid, and two balanced photodiodes (BPDs) is sufficient for reception. Meanwhile, the APC in the LO link is needless, simplifying the receiver structure significantly. Besides, the digital subcarrier multiplexing (DSCM) technique is also adopted to relax the computational complexity of the chromatic dispersion compensation (CDC), which is one of the dominant power consumption modules in Rx-DSP. The transmission performance of 50Gbaud 4-subcarrier 16/32QAM (4SC-16/32QAM) DSCM signal based on the proposed simplified self-homodyne coherent system is investigated experimentally. The results show that the bit-error-ratio (BER) performance degradation caused by CD can be solved by increasing 4 taps in the equalizer for 80km single mode fiber (SMF) transmission without individual CDC, which operates in a low-complexity manner.
Submitted 18 March, 2024;
originally announced March 2024.
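To show why Alamouti coding removes the sensitivity to polarization fading described in this abstract, here is a minimal sketch of the textbook two-branch Alamouti code over a noiseless flat channel. The two branches stand in for the two polarizations, and `h1`, `h2` are hypothetical complex coupling coefficients into a single receiver; the function names and the channel model are assumptions, not the authors' exact scheme.

```python
def alamouti_encode(s1, s2):
    """Return the two time slots; each slot holds the symbols for branch 1 and branch 2."""
    # Slot 1 sends (s1, s2); slot 2 sends (-conj(s2), conj(s1)).
    return (s1, s2), (-s2.conjugate(), s1.conjugate())


def transmit(slots, h1, h2):
    """Noiseless single-receiver channel: each slot arrives as h1*a + h2*b."""
    return tuple(h1 * a + h2 * b for a, b in slots)


def alamouti_decode(r1, r2, h1, h2):
    """Linear combining that recovers both symbols from the two received slots."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


if __name__ == "__main__":
    s1, s2 = 1 + 1j, -1 + 3j          # two 16QAM constellation points
    h1, h2 = 0.3 - 0.4j, 0j           # worst case: one branch fully faded
    r1, r2 = transmit(alamouti_encode(s1, s2), h1, h2)
    print(alamouti_decode(r1, r2, h1, h2))
```

Because the combining gain is |h1|² + |h2|², both symbols are recovered whenever at least one branch carries power, which is the property that lets the receiver dispense with active polarization control in this toy model.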
-
Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning
Authors:
Jiajun Shen,
Fengjun Li,
Morteza Hashemi,
Huazhen Fang
Abstract:
In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly compromising the controller, thereby posing a direct risk to physical security. Within the framework of hierarchical control and incentive feedback Stackelberg game, we design a resilient leading controller (leader) that is adaptive to a compromised following controller (follower) such that the compromised follower acts cooperatively with the leader, aligning its strategies with the leader's objective to achieve a team-optimal solution. First, we provide sufficient conditions for the existence of an incentive Stackelberg solution when system dynamics are known. Then, we propose a Q-learning-based Approximate Dynamic Programming (ADP) approach, and corresponding algorithms for the online resolution of the incentive Stackelberg solution without requiring prior knowledge of system dynamics. Last but not least, we prove the convergence of our approach to the optimum.
Submitted 13 March, 2024;
originally announced March 2024.
-
FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
Authors:
Feihong He,
Gang Li,
Mengyuan Zhang,
Leilei Yan,
Lingyu Si,
Fanzhang Li
Abstract:
The rapid development of generative diffusion models has significantly advanced the field of style transfer. However, most current style transfer methods based on diffusion models typically involve a slow iterative optimization process, e.g., model fine-tuning and textual inversion of style concept. In this paper, we introduce FreeStyle, an innovative style transfer method built upon a pre-trained large diffusion model, requiring no further optimization. Besides, our method enables style transfer only through a text description of the desired style, eliminating the necessity of style images. Specifically, we propose a dual-stream encoder and single-stream decoder architecture, replacing the conventional U-Net in diffusion models. In the dual-stream encoder, two distinct branches take the content image and style text prompt as inputs, achieving content and style decoupling. In the decoder, we further modulate features from the dual streams based on a given content image and the corresponding style text prompt for precise style transfer. Our experimental results demonstrate high-quality synthesis and fidelity of our method across various content images and style text prompts. The code and more results are available at our project website: https://freestylefreelunch.github.io/.
Submitted 28 January, 2024;
originally announced January 2024.
-
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
Authors:
Jun Ma,
Feifei Li,
Bo Wang
Abstract:
Convolutional Neural Networks (CNNs) and Transformers have been the most popular architectures for biomedical image segmentation, but both of them have limited ability to handle long-range dependencies because of inherent locality or computational complexity. To address this challenge, we introduce U-Mamba, a general-purpose network for biomedical image segmentation. Inspired by the State Space Sequence Models (SSMs), a new family of deep sequence models known for their strong capability in handling long sequences, we design a hybrid CNN-SSM block that integrates the local feature extraction power of convolutional layers with the abilities of SSMs for capturing the long-range dependency. Moreover, U-Mamba enjoys a self-configuring mechanism, allowing it to automatically adapt to various datasets without manual intervention. We conduct extensive experiments on four diverse tasks, including the 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results reveal that U-Mamba outperforms state-of-the-art CNN-based and Transformer-based segmentation networks across all tasks. This opens new avenues for efficient long-range dependency modeling in biomedical image analysis. The code, models, and data are publicly available at https://wanglab.ai/u-mamba.html.
Submitted 9 January, 2024;
originally announced January 2024.
-
Coordination of Damping Controllers: A Novel Data-Informed Approach for Adaptability
Authors:
Francisco Zelaya-Arrazabal,
Hector Pulgar-Painemal,
Jingzi Liu,
Horacio Silva-Saravia,
Fangxing Li
Abstract:
This paper explores the novel concept of damping controller coordination, which aims to minimize the Total Action metric by identifying an optimal switching combination (on/off) of these controllers. The metric is rooted in power system physics, capturing oscillation energy associated with all synchronous generators in the grid. While coordination has shown promising results, it has relied on computing linear sensitivities based on the grid model. This paper proposes a data-informed framework to accurately estimate total action and subsequently determine an optimal switching combination. The estimation is provided by a multivariate function approximator that captures the nonlinear relationship between system-wide area measurements, the status of damping controllers, and the conditions of the disturbance. By enabling real-time coordination, electromechanical oscillations are reduced, enhancing power system stability. The concept is tested in the Western North America Power System (wNAPS) and compared with the model-based approach for coordination. The proposed coordination outperforms the model-based approach, demonstrating effective adaptability and performance in handling multi-mode events. Additionally, the results show significant reductions in low-frequency electromechanical oscillations even under various operating conditions, fault locations, and time delay considerations.
Submitted 1 May, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Enhanced Q-Learning Approach to Finite-Time Reachability with Maximum Probability for Probabilistic Boolean Control Networks
Authors:
Hongyue Fan,
Jingjie Ni,
Fangfei Li
Abstract:
In this paper, we investigate the problem of controlling probabilistic Boolean control networks (PBCNs) to achieve reachability with maximum probability in the finite time horizon. We address three questions: 1) finding control policies that achieve reachability with maximum probability under fixed, and particularly, varied finite time horizon, 2) leveraging prior knowledge to solve question 1) with faster convergence speed in scenarios where time is a variable framework, and 3) proposing an enhanced Q-learning (QL) method to efficiently address the aforementioned questions for large-scale PBCNs. For question 1), we demonstrate the applicability of QL method on the finite-time reachability problem. For question 2), considering the possibility of varied time frames, we incorporate transfer learning (TL) technique to leverage prior knowledge and enhance convergence speed. For question 3), we propose an enhanced model-free QL approach that improves upon the traditional QL algorithm by introducing memory-efficient modifications, so that these problems can be addressed effectively in large-scale PBCNs. Finally, we apply the proposed method to two examples: a small-scale PBCN and a large-scale PBCN, demonstrating the effectiveness of our approach.
Submitted 11 December, 2023;
originally announced December 2023.
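As background for the Q-learning approach named in this abstract, the sketch below runs plain tabular Q-learning on a toy stochastic network with four states, where reaching a target state within a bounded number of steps is rewarded. The transition rule, reward, hyperparameters, and all names are hypothetical, and the update is the standard discounted one; it is not the authors' enhanced or transfer-learning variant.

```python
import random

TARGET = 3  # hypothetical target state of a toy 4-state network


def step(state, action, rng):
    """Toy stochastic transition: action 1 advances one state with probability 0.9."""
    if action == 1 and rng.random() < 0.9:
        state = min(state + 1, TARGET)
    reward = 1.0 if state == TARGET else 0.0
    return state, reward


def q_learning(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Model-free tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(TARGET + 1)]
    for _ in range(episodes):
        s = rng.randrange(TARGET)  # start in a random non-target state
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.randrange(2)                     # explore
            else:
                a = 0 if q[s][0] >= q[s][1] else 1       # exploit
            s_next, r = step(s, a, rng)
            done = s_next == TARGET
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])        # temporal-difference update
            if done:
                break
            s = s_next
    return q


if __name__ == "__main__":
    table = q_learning()
    print([0 if row[0] >= row[1] else 1 for row in table[:TARGET]])
```

The learner only samples transitions through `step` and never reads the transition probabilities, which is what "model-free" means here.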
-
A 5G DMRS-based Signal for Integrated Sensing and Communication System
Authors:
Zhiqing Wei,
Fengyun Li,
Haotian Liu,
Xu Chen,
Huici Wu,
Kaifeng Han,
Zhiyong Feng
Abstract:
Integrated sensing and communication (ISAC) is considered a potential key technology of future mobile communication systems. The signal design is fundamental for the ISAC system. The reference signals in mobile communication systems have good detection performance, which is worth further research. Existing studies applied a single reference signal to radar sensing. In this paper, a multiple reference signals collaborative sensing scheme is designed. Specifically, we jointly apply channel state information reference signal (CSI-RS), positioning reference signal (PRS) and demodulation reference signal (DMRS) in radar sensing, which improves the performance of radar sensing via obtaining continuous time-frequency resource mapping. The Cramér-Rao lower bound (CRLB) of the joint reference signal for distance and velocity estimation is derived. The impacts of carrier frequency and subcarrier spacing on the performance of distance and velocity estimation are revealed. The results of simulation experiments show that compared with the single reference signal sensing scheme, the multiple reference signals collaborative sensing scheme effectively improves the sensing accuracy. Moreover, because of the discontinuous OFDM symbols, the accuracy of velocity estimation could be further improved via compressed sensing (CS). This paper has verified that multiple reference signals, instead of a single reference signal, have superior performance on radar sensing, which is a practical and efficient approach in designing ISAC signals.
Submitted 2 March, 2024; v1 submitted 1 November, 2023;
originally announced December 2023.
-
Convolutional Neural Networks for Segmentation of Malignant Pleural Mesothelioma: Analysis of Probability Map Thresholds (CALGB 30901, Alliance)
Authors:
Mena Shenouda,
Eyjólfur Gudmundsson,
Feng Li,
Christopher M. Straus,
Hedy L. Kindler,
Arkadiusz Z. Dudek,
Thomas Stinchcombe,
Xiaofei Wang,
Adam Starkey,
Samuel G. Armato III
Abstract:
Malignant pleural mesothelioma (MPM) is the most common form of mesothelioma. To assess response to treatment, tumor measurements are acquired and evaluated based on a patient's longitudinal computed tomography (CT) scans. Tumor volume, however, is the more accurate metric for assessing tumor burden and response. Automated segmentation methods using deep learning can be employed to acquire volume, which otherwise is a tedious task performed manually. The deep learning-based tumor volume and contours can then be compared with a standard reference to assess the robustness of the automated segmentations. The purpose of this study was to evaluate the impact of probability map threshold on MPM tumor delineations generated using a convolutional neural network (CNN). Eighty-eight CT scans from 21 MPM patients were segmented by a VGG16/U-Net CNN. A radiologist modified the contours generated at a 0.5 probability threshold. Percent difference of tumor volume and overlap using the Dice Similarity Coefficient (DSC) were compared between the standard reference provided by the radiologist and CNN outputs for thresholds ranging from 0.001 to 0.9. CNN annotations consistently yielded smaller tumor volumes than radiologist contours. Reducing the probability threshold from 0.5 to 0.1 decreased the absolute percent volume difference, on average, from 43.96% to 24.18%. Median and mean DSC ranged from 0.58 to 0.60, with a peak at a threshold of 0.5; no distinct threshold was found for percent volume difference. No single output threshold in the CNN probability maps was optimal for both tumor volume and DSC. This work underscores the need to assess tumor volume and spatial overlap when evaluating CNN performance. While automated segmentations may yield comparable tumor volumes to that of the reference standard, the spatial region delineated by the CNN at a specific threshold is equally important.
Submitted 30 November, 2023;
originally announced December 2023.
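The two metrics compared in this abstract can be sketched on a toy example. The code below binarizes a made-up probability map at several thresholds and reports the Dice Similarity Coefficient and the signed percent volume difference against a reference mask. The six-voxel data and the function names are invented for illustration and have no connection to the study's scans.

```python
def binarize(prob_map, threshold):
    """Turn per-voxel probabilities into a 0/1 mask at the given threshold."""
    return [1 if p >= threshold else 0 for p in prob_map]


def dice(pred, ref):
    """Dice Similarity Coefficient: 2*|A and B| / (|A| + |B|)."""
    overlap = sum(a and b for a, b in zip(pred, ref))
    total = sum(pred) + sum(ref)
    return 1.0 if total == 0 else 2.0 * overlap / total


def percent_volume_difference(pred, ref):
    """Signed volume difference of the prediction relative to the reference, in percent."""
    return 100.0 * (sum(pred) - sum(ref)) / sum(ref)


if __name__ == "__main__":
    probs = [0.05, 0.2, 0.6, 0.8, 0.95, 0.3]   # toy network output, one value per voxel
    reference = [0, 1, 1, 1, 1, 1]             # toy reference mask
    for t in (0.1, 0.5, 0.9):
        mask = binarize(probs, t)
        print(t, dice(mask, reference), percent_volume_difference(mask, reference))
```

In this toy case lowering the threshold enlarges the predicted region, so the volume deficit shrinks; overlap and volume are separate quantities, which is why the abstract argues for reporting both.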
-
Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks
Authors:
Xianlun Peng,
Yang Tang,
Fangfei Li,
Yang Liu
Abstract:
In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obtains optimal attack strategies for large-scale PBCNs that the standard QL algorithm cannot handle. Finally, we verify the effectiveness of our proposed approach by considering two attacked PBCNs, including a 10-node network and a 28-node network.
Submitted 29 November, 2023;
originally announced November 2023.
-
Investigating the Use of Traveltime and Reflection Tomography for Deep Learning-Based Sound-Speed Estimation in Ultrasound Computed Tomography
Authors:
Gangwon Jeong,
Fu Li,
Umberto Villa,
Mark A. Anastasio
Abstract:
Ultrasound computed tomography (USCT) is actively being developed to quantify acoustic tissue properties such as the speed-of-sound (SOS). Although full-waveform inversion (FWI) is an effective method for accurate SOS reconstruction, it can be computationally challenging for large-scale problems. Deep learning-based image-to-image learned reconstruction (IILR) methods are being investigated as scalable and computationally efficient alternatives. This study investigates the impact of the chosen input modalities on IILR methods for high-resolution SOS reconstruction in USCT. The selected modalities are traveltime tomography (TT) and reflection tomography (RT), which produce a low-resolution SOS map and a reflectivity map, respectively. These modalities have been chosen for their lower computational cost relative to FWI and their capacity to provide complementary information: TT offers a direct -- while low resolution -- SOS measure, while RT reveals tissue boundary information. Systematic analyses were facilitated by employing a stylized USCT imaging system with anatomically realistic numerical breast phantoms. Within this testbed, a supervised convolutional neural network (CNN) was trained to map dual-channel (TT and RT images) to a high-resolution SOS map. Moreover, the CNN was fine-tuned using a weighted reconstruction loss that prioritized tumor regions to address tumor underrepresentation in the training dataset. To understand the benefits of employing dual-channel inputs, single-input CNNs were trained separately using inputs from each modality alone (TT or RT). The methods were assessed quantitatively using normalized root mean squared error and structural similarity index measure for reconstruction accuracy and receiver operating characteristic analysis to assess signal detection-based performance measures.
Submitted 16 November, 2023;
originally announced November 2023.
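The abstract above mentions a weighted reconstruction loss that prioritizes tumor regions but does not give its form. The following is a minimal sketch of one plausible form, a weighted mean squared error over flattened pixels. The function name `weighted_mse`, the `tumor_weight` parameter, and the normalization by the sum of weights are all assumptions made for illustration, not the authors' definition.

```python
def weighted_mse(pred, target, tumor_mask, tumor_weight=5.0):
    """Weighted mean squared error over flattened pixel lists.

    Pixels flagged in tumor_mask count tumor_weight times as much as
    background pixels. The weighting scheme is a hypothetical choice.
    """
    weights = [tumor_weight if is_tumor else 1.0 for is_tumor in tumor_mask]
    total = sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))
    # Normalize by the total weight so the loss stays on the scale of a mean.
    return total / sum(weights)


# Usage: the second pixel is tumor, so its error dominates the loss.
loss = weighted_mse([1.0, 2.0], [0.0, 0.0], [False, True], tumor_weight=3.0)
```

Raising `tumor_weight` shifts the optimization toward errors inside the tumor mask, which is one way to counter underrepresentation of tumor pixels in training data.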
-
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Authors:
Zeyu Zhu,
Fanrong Li,
Gang Li,
Zejian Liu,
Zitao Mo,
Qinghao Hu,
Xiaoyao Liang,
Jian Cheng
Abstract:
Graph Neural Networks (GNNs) are becoming a promising technique in various domains due to their excellent capabilities in modeling non-Euclidean data. Although a spectrum of accelerators has been proposed to accelerate the inference of GNNs, our analysis demonstrates that the latency and energy consumption induced by DRAM access still significantly impede the improvement of performance and energy efficiency. To address this issue, we propose a Memory-Efficient GNN Accelerator (MEGA) through algorithm and hardware co-design in this work. Specifically, at the algorithm level, through an in-depth analysis of the node property, we observe that the data-independent quantization in previous works is not optimal in terms of accuracy and memory efficiency. This motivates us to propose the Degree-Aware mixed-precision quantization method, in which a proper bitwidth is learned and allocated to a node according to its in-degree to compress GNNs as much as possible while maintaining accuracy. At the hardware level, we employ a heterogeneous architecture design in which the aggregation and combination phases are implemented separately with different dataflows. In order to boost the performance and energy efficiency, we also present an Adaptive-Package format to alleviate the storage overhead caused by the fine-grained bitwidth and diverse sparsity, and a Condense-Edge scheduling method to enhance the data locality and further alleviate the access irregularity induced by the extremely sparse adjacency matrix in the graph. We implement our MEGA accelerator in a 28nm technology node. Extensive experiments demonstrate that MEGA can achieve average speedups of 38.3x, 7.1x, 4.0x, and 3.6x and energy savings of 47.6x, 7.2x, 5.4x, and 4.5x over four state-of-the-art GNN accelerators, HyGCN, GCNAX, GROW, and SGCN, respectively, while retaining task accuracy.
Submitted 16 November, 2023;
originally announced November 2023.
-
Modeling the impact of extreme summer drought on conventional and renewable generation capacity: methods and a case study on the Eastern U.S. power system
Authors:
Hang Shuai,
Fangxing Li,
**xiang Zhu,
William Jerome Tingen II,
Srijib Mukherjee
Abstract:
The United States has witnessed a growing prevalence of droughts in recent years, posing significant challenges to water supplies and power generation. The resulting impacts on power systems, including reduced capacity and the potential for power outages, underscore the need for accurate assessment methods to ensure the reliable operation of the nation's energy infrastructure. A critical step is to evaluate the usable capacity of a regional power system's generation fleet, which is a complex undertaking and requires precise modeling of the effects of hydrological and meteorological conditions on diverse generating technologies. This paper proposes a systematic, analytical approach for assessing the impacts of extreme summer drought events on the available capacity of hydro, thermal, and renewable energy generators. More specifically, the systematic framework provides plant-level capacity derating models for hydroelectric, once-through cooling thermoelectric, recirculating cooling thermoelectric, combustion turbine, solar PV, and wind turbine systems. Application of the proposed impact assessment framework to the 2025 generation fleet of the real-world power system in the PJM and SERC regions yields insightful results. By examining the daily usable capacity of 6,055 at-risk generators throughout the study region, we find that in the event of the recurrence of the 2007 southeastern summer drought in the near future, the usable capacity of all at-risk power plants may experience a substantial decrease compared to a typical summer, falling within the range of 71% to 81%. The sensitivity analysis reveals that the usable capacity would experience a more pronounced decline under more severe drought conditions. The findings of this study offer valuable insights, enabling stakeholders to enhance the resilience of power systems against the potential effects of extreme drought in the future.
Submitted 12 November, 2023;
originally announced November 2023.
-
Transient Thermal and Electrical Characteristics of a Cylindrical LiFeS2 Cell with Equivalent Circuit Model
Authors:
Khaled I Alsharif,
Alexander H Pesch,
Vamsi Borra,
Pedro Cortes,
Eric MacDonald,
Frank X Li,
Kyosung Choo
Abstract:
This study examines the discharge behaviour of a cylindrical LiFeS2 cell to evaluate the parameters that can be used to predict and estimate the nonlinear dynamic response of a battery. A linear model is developed to simulate the discharge behaviour and examine the thermal behaviour. In particular, a commercial-grade battery is discharged with the industry-standard hybrid power pulsing characterization (HPPC) test and the current and voltage responses are recorded. The dynamic system is modelled with the equivalent circuit model (ECM) through MATLAB Simulink. A block diagram representation of the equivalent circuit model governing equations was developed. The parameter estimation tool was utilized to reduce the error and fit the simulation results to the experimental voltage responses, in order to obtain state of charge dependent dynamic parameters. Those parameters were then used in a Dual-Potential Multi-Scale Multi-Domain (MSMD) Battery Model solved in ANSYS Fluent to analyze the thermal behaviour by acquiring the temperature profiles and the temperature distribution within the cell. The nonlinear behaviour of the battery was characterized and the equivalent circuit model parameters were identified and are shown to agree with the experimental voltage responses. Furthermore, it is found that the battery temperature increased by 7.35 deg and was distributed uniformly within the cell.
Submitted 28 October, 2023;
originally announced November 2023.
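The abstract above fits an equivalent circuit model (ECM) to pulse-discharge voltage data but does not state the circuit order. The following is a minimal sketch assuming a first-order Thevenin circuit (series resistance plus one RC branch) with constant parameters. The function name `simulate_ecm` and all parameter values are hypothetical, and the state-of-charge dependence described in the abstract is deliberately left out.

```python
import math


def simulate_ecm(currents, dt, ocv, r0, r1, c1):
    """Terminal voltage of a first-order Thevenin ECM for a current profile.

    currents: current per time step in A (positive means discharge)
    dt: step length in s; ocv: open-circuit voltage in V (held constant here)
    r0: series resistance in ohm; r1, c1: RC-branch resistance and capacitance
    """
    decay = math.exp(-dt / (r1 * c1))  # RC-branch relaxation over one step
    v_rc = 0.0                         # voltage across the RC branch
    voltages = []
    for i in currents:
        # Exact update of the RC branch for a current held constant over dt.
        v_rc = v_rc * decay + r1 * (1.0 - decay) * i
        voltages.append(ocv - i * r0 - v_rc)
    return voltages


# Usage: a long constant 2 A discharge settles at ocv - i * (r0 + r1).
v = simulate_ecm([2.0] * 1000, dt=1.0, ocv=1.8, r0=0.05, r1=0.02, c1=1000.0)
```

In a parameter-estimation workflow, `r0`, `r1`, and `c1` would be adjusted until the simulated voltage matches the measured pulse response at each state of charge.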
-
Harmonic content analysis of a soft starting variable frequency motor drive based on FPGA
Authors:
Yogesh Sapkota,
Suman Devkota,
Vamsi Borra,
Pedro Cortes,
Frank Li,
Srikanth Itapu
Abstract:
As demand for electric vehicles, electric aircraft, unmanned aircraft systems, and other motor-driven systems increases, high-performance motor drives employing variable frequency control with higher efficiency and reliability are becoming increasingly important parts of the ever-changing technological landscape. This study proposes a Field Programmable Gate Array (FPGA)-based variable frequency soft-starting motor drive for a three-phase induction motor. The inverter output voltage and the load currents are analyzed for harmonic content using MATLAB. In the experimental realization, a four-pole squirrel cage delta-connected induction motor is utilized with a switching frequency of 4 kHz. The current and voltage characteristics of the induction motor are studied under different operating conditions to assess harmonic content and the effect of changing the soft-start duration. The findings demonstrate a low-cost, flexible control of the induction motor with improved harmonic performance.
Submitted 28 October, 2023;
originally announced November 2023.
-
Integrated Sensing and Communication Signal Processing Based on Compressed Sensing Over Unlicensed Spectrum Bands
Authors:
Haotian Liu,
Zhiqing Wei,
Fengyun Li,
Yuewei Lin,
Hanyang Qu,
Huici Wu,
Zhiyong Feng
Abstract:
As a promising key technology of the 6th generation (6G) mobile communication system, integrated sensing and communication (ISAC) technology aims to make full use of spectrum resources to enable the functional integration of communication and sensing. ISAC-enabled mobile communication systems regularly operate in non-continuous spectrum bands due to crowded licensed frequency bands. However, conventional sensing algorithms over non-continuous spectrum bands have disadvantages such as reduced peak-to-side lobe ratio (PSLR) and degraded anti-noise performance. To address this challenge, we propose a high-precision ISAC signal processing algorithm based on compressed sensing (CS) in this paper. By integrating the resource block group (RBG) configuration information in 5th generation new radio (5G NR) and channel information matrices, we can dynamically and accurately obtain power estimation spectra. Moreover, we employ the fast iterative shrinkage-thresholding algorithm (FISTA) to address the reconstruction problem and utilize K-fold cross validation (KCV) to obtain optimal parameters. Simulation results show that the proposed algorithm has lower sidelobes or even zero sidelobes compared with conventional sensing algorithms. Meanwhile, compared with the improved 2D FFT algorithm and the conventional 2D FFT algorithm, the proposed algorithm achieves maximum improvements of 54.66 % and 84.36 % in range estimation accuracy, and 41.54 % and 97.09 % in velocity estimation accuracy, respectively.
Submitted 19 April, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
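The abstract above relies on the fast iterative shrinkage-thresholding algorithm (FISTA) for sparse reconstruction. The following is a minimal sketch of generic FISTA for the real-valued problem min 0.5 * ||Ax - b||^2 + lam * ||x||_1, written in plain Python. It is not the paper's ISAC formulation: the function names, the real-valued setting, and the use of the squared Frobenius norm as a conservative step-size bound are assumptions made for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def fista(A, b, lam, n_iter=200):
    """Generic FISTA for min 0.5*||Ax - b||^2 + lam*||x||_1 (A must be nonzero)."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1/L is a safe (if conservative) gradient step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(n_iter):
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step at the extrapolated point, then shrinkage.
        x_new = [soft_threshold(y[j] - grad[j] / L, lam / L) for j in range(n)]
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x


# Usage: with an identity A the solution is the soft-thresholded b.
x_hat = fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

The regularization weight `lam` controls sparsity. In the abstract it is the kind of parameter that K-fold cross validation would select.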
-
Learned Full Waveform Inversion Incorporating Task Information for Ultrasound Computed Tomography
Authors:
Luke Lozenski,
Hanchen Wang,
Fu Li,
Mark A. Anastasio,
Brendt Wohlberg,
Youzuo Lin,
Umberto Villa
Abstract:
Ultrasound computed tomography (USCT) is an emerging imaging modality that holds great promise for breast imaging. Full-waveform inversion (FWI)-based image reconstruction methods incorporate accurate wave physics to produce high spatial resolution quantitative images of speed of sound or other acoustic properties of the breast tissues from USCT measurement data. However, the high computational cost of FWI reconstruction represents a significant burden for its widespread application in a clinical setting. The research reported here investigates the use of a convolutional neural network (CNN) to learn a mapping from USCT waveform data to speed of sound estimates. The CNN was trained using a supervised approach with a task-informed loss function aiming at preserving features of the image that are relevant to the detection of lesions. A large set of anatomically and physiologically realistic numerical breast phantoms (NBPs) and corresponding simulated USCT measurements was employed during training. Once trained, the CNN can perform real-time FWI image reconstruction from USCT waveform data. The performance of the proposed method was assessed and compared against FWI using a hold-out sample of 41 NBPs and corresponding USCT data. Accuracy was measured using relative mean square error (RMSE), structural self-similarity index measure (SSIM), and lesion detection performance (DICE score). This numerical experiment demonstrates that a supervised learning model can achieve accuracy comparable to FWI in terms of RMSE and SSIM, and better performance in terms of task performance, while significantly reducing computational time.
Submitted 30 August, 2023;
originally announced August 2023.
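The abstract above reports lesion detection performance as a DICE score. The following is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions. How the lesion masks are derived from reconstructed images is not stated in the abstract and is not modeled here.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient between two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    size_sum = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if size_sum == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / size_sum


# Usage: one shared pixel, mask sizes 2 and 1, so Dice = 2 * 1 / 3.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```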
-
Dynamic Dual-Graph Fusion Convolutional Network For Alzheimer's Disease Diagnosis
Authors:
Fanshi Li,
Zhihui Wang,
Yifan Guo,
Congcong Liu,
Yanjie Zhu,
Yihang Zhou,
Jun Li,
Dong Liang,
Haifeng Wang
Abstract:
In this paper, a dynamic dual-graph fusion convolutional network is proposed to improve Alzheimer's disease (AD) diagnosis performance. The paper's main contributions are as follows: (a) a novel dynamic GCN architecture is proposed as an end-to-end pipeline for the AD diagnosis task; (b) the proposed architecture can dynamically adjust the graph structure for GCN to produce better diagnosis outcomes by learning the optimal underlying latent graph; (c) feature graph learning and dynamic graph learning are incorporated, giving the useful features of subjects more weight while decreasing the weights of noisy features. Experiments indicate that our model provides flexibility and stability while achieving excellent classification results in AD diagnosis.
Submitted 4 August, 2023;
originally announced August 2023.
-
Distributionally Robust Cross Subject EEG Decoding
Authors:
Tiehang Duan,
Zhenyi Wang,
Gianfranco Doretto,
Fang Li,
Cui Tao,
Donald Adjeroh
Abstract:
Recently, deep learning has been shown to be effective for Electroencephalography (EEG) decoding tasks. Yet, its performance can be negatively influenced by two key factors: 1) the high variance and different types of corruption that are inherent in the signal, and 2) the relatively small size of EEG datasets, given the acquisition cost, annotation cost and amount of effort needed. Data augmentation approaches for alleviating this problem have been empirically studied, with augmentation operations on the spatial, time or frequency domain handcrafted based on domain expertise. In this work, we propose a principled approach to perform dynamic evolution on the data to improve decoding robustness. The approach is based on distributionally robust optimization and achieves robustness by optimizing on a family of evolved data distributions instead of the single training data distribution. We derive a general data evolution framework based on Wasserstein gradient flow (WGF) and provide two different forms of evolution within the framework. Intuitively, the evolution process helps the EEG decoder to learn more robust and diverse features. It is worth mentioning that the proposed approach can be readily integrated with other data augmentation approaches for further improvements. We performed extensive experiments on the proposed approach and tested its performance on different types of corrupted EEG signals. The model significantly outperforms competitive baselines on challenging decoding scenarios.
Submitted 19 August, 2023;
originally announced August 2023.
-
EEG-based Emotion Style Transfer Network for Cross-dataset Emotion Recognition
Authors:
Yi** Zhou,
Fu Li,
Yang Li,
Youshuo Ji,
Lijian Zhang,
Yuanfang Chen,
Wenming Zheng,
Guangming Shi
Abstract:
As the key to realizing aBCIs, EEG emotion recognition has been widely studied by many researchers. Previous methods have performed well for intra-subject EEG emotion recognition. However, the style mismatch between source domain (training data) and target domain (test data) EEG samples caused by huge inter-domain differences is still a critical problem for EEG emotion recognition. To solve the problem of cross-dataset EEG emotion recognition, in this paper, we propose an EEG-based Emotion Style Transfer Network (E2STN) to obtain EEG representations that contain the content information of the source domain and the style information of the target domain, which are called stylized emotional EEG representations. The representations are helpful for cross-dataset discriminative prediction. Concretely, E2STN consists of three modules, i.e., a transfer module, a transfer evaluation module, and a discriminative prediction module. The transfer module encodes the domain-specific information of the source and target domains and then reconstructs the source domain's emotional pattern and the target domain's statistical characteristics into the new stylized EEG representations. In this process, the transfer evaluation module is adopted to constrain the generated representations so that they more precisely fuse the two kinds of complementary information from the source and target domains and avoid distortion. Finally, the generated stylized EEG representations are fed into the discriminative prediction module for final classification. Extensive experiments show that the E2STN can achieve state-of-the-art performance on cross-dataset EEG emotion recognition tasks.
Submitted 9 August, 2023;
originally announced August 2023.
-
You Can Mask More For Extremely Low-Bitrate Image Compression
Authors:
Anqi Li,
Feng Li,
Jiaxin Han,
Huihui Bai,
Runmin Cong,
Chunjie Zhang,
Meng Wang,
Weisi Lin,
Yao Zhao
Abstract:
Learned image compression (LIC) methods have experienced significant progress during recent years. However, these methods are primarily dedicated to optimizing the rate-distortion (R-D) performance at medium and high bitrates (> 0.1 bits per pixel (bpp)), while research on extremely low bitrates is limited. Besides, existing methods fail to explicitly explore the image structure and texture components crucial for image compression, treating them equally alongside uninformative components in networks. This can cause severe perceptual quality degradation, especially under low-bitrate scenarios. In this work, inspired by the success of pre-trained masked autoencoders (MAE) in many downstream tasks, we propose to rethink its mask sampling strategy from structure and texture perspectives for high redundancy reduction and discriminative feature representation, further unleashing the potential of LIC methods. Therefore, we present a dual-adaptive masking approach (DA-Mask) that samples visible patches based on the structure and texture distributions of original images. We combine DA-Mask and pre-trained MAE in masked image modeling (MIM) as an initial compressor that abstracts informative semantic context and texture representations. Such a pipeline can well cooperate with LIC networks to achieve further secondary compression while preserving promising reconstruction quality. Consequently, we propose a simple yet effective masked compression model (MCM), the first framework that unifies MIM and LIC end-to-end for extremely low-bitrate image compression. Extensive experiments have demonstrated that our approach outperforms recent state-of-the-art methods in R-D performance, visual quality, and downstream applications, at very low bitrates. Our code is available at https://github.com/lianqi1008/MCM.git.
Submitted 27 June, 2023;
originally announced June 2023.
-
Non-Integer-Oversampling Digital Signal Processing for Coherent Passive Optical Networks
Authors:
Haide Wang,
Ji Zhou,
**yang Yang,
Jianrui Zeng,
Wei** Liu,
Changyuan Yu,
Fan Li,
Zhaohui Li
Abstract:
Beyond 100G passive optical networks (PONs) will be required to meet the ever-increasing traffic demand in the future. Coherent optical technologies are competitive solutions for the future beyond 100G PON but also face challenges such as the high computational complexity of digital signal processing (DSP), which results from a high oversampling rate. Therefore, DSP running at a non-integer oversampling rate below 2 samples-per-symbol (sps) is preferred, which not only reduces computational complexity but also significantly lowers the requirement for the analog-to-digital converter. In this paper, we propose a non-integer-oversampling DSP for meeting the requirements of coherent PON. The proposed DSP working at 9/8-sps and 5/4-sps oversampling rates reduces computational complexity by 44.04% and 40.78%, respectively, compared to that working at the 2-sps oversampling rate. Moreover, a 400-Gb/s-net-rate coherent PON based on digital subcarrier multiplexing was demonstrated to verify the feasibility of the non-integer-oversampling DSP. There is almost no penalty on the receiver sensitivity when the non-integer-oversampling DSP is adopted. In conclusion, the non-integer-oversampling DSP shows great potential for future coherent PON.
Submitted 20 June, 2023;
originally announced June 2023.
-
Experts' cognition-driven ensemble deep learning for external validation of predicting pathological complete response to neoadjuvant chemotherapy from histological images in breast cancer
Authors:
Yongquan Yang,
Fengling Li,
Yani Wei,
Yuanyuan Zhao,
**g Fu,
Xiuli Xiao,
Hong Bu
Abstract:
In breast cancer imaging, there has been a trend to directly predict pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) from histological images based on deep learning (DL). However, it is a commonly known problem that the constructed DL-based models numerically perform better in internal validation than in external validation. The primary reason is that the distribution of the external data for validation differs from the distribution of the training data used to construct the predictive model. In this paper, we aim to alleviate this situation with a more intrinsic approach. We propose an experts' cognition-driven ensemble deep learning (ECDEDL) approach for external validation of predicting pCR to NAC from histological images in breast cancer. The proposed ECDEDL takes the cognition of both pathology and artificial intelligence experts into consideration to improve the generalization of the predictive model to external validation, and thus more intrinsically approximates the working paradigm of a human being, who draws on various working experiences to make decisions. The proposed ECDEDL approach was validated with 695 WSIs collected from the same center as the primary dataset to develop the predictive model and perform the internal validation, and 340 WSIs collected from three other centers as the external dataset to perform the external validation. In external validation, the proposed ECDEDL approach improves the AUCs of pCR prediction from 61.52(59.80-63.26) to 67.75(66.74-68.80) and the Accuracies of pCR prediction from 56.09(49.39-62.79) to 71.01(69.44-72.58). The proposed ECDEDL was quite effective for external validation, with results numerically closer to those of the internal validation.
Submitted 19 June, 2023;
originally announced June 2023.
-
Exploring Resolution Fields for Scalable Image Compression with Uncertainty Guidance
Authors:
Dongyi Zhang,
Feng Li,
Man Liu,
Runmin Cong,
Huihui Bai,
Meng Wang,
Yao Zhao
Abstract:
Recently, there have been significant advancements in learning-based image compression methods, which now surpass traditional coding standards. Most of them prioritize achieving the best rate-distortion performance for a particular compression rate, which limits their flexibility and adaptability in various applications with complex and varying constraints. In this work, we explore the potential of resolution fields in scalable image compression and propose the reciprocal pyramid network (RPN) that fulfills the need for more adaptable and versatile compression. Specifically, RPN first builds a compression pyramid and generates the resolution fields at different levels in a top-down manner. The key design lies in the cross-resolution context mining module between adjacent levels, which performs feature enriching and distillation to mine meaningful contextualized information and remove unnecessary redundancy, producing informative resolution fields as residual priors. The scalability is achieved by progressive bitstream reusing and resolution field incorporation varying at different levels. Furthermore, between adjacent compression levels, we explicitly quantify the aleatoric uncertainty from the bottom decoded representations and develop an uncertainty-guided loss to update the upper-level compression parameters, forming a reverse pyramid process that forces the network to focus on the textured pixels with high variance for more reliable and accurate reconstruction. Combining resolution field exploration and uncertainty guidance in a pyramid manner, RPN can effectively achieve spatial and quality scalable image compression. Experiments show the superiority of RPN against existing classical and deep learning-based scalable codecs. Code will be available at https://github.com/JGIroro/RPNSIC.
Submitted 15 June, 2023;
originally announced June 2023.
-
OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition
Authors:
Li Fu,
Siqi Li,
Qingtao Li,
Fangzhu Li,
Li** Deng,
Lu Fan,
Meng Chen,
Youzheng Wu,
Xiaodong He
Abstract:
Self-Supervised Learning (SSL) Automatic Speech Recognition (ASR) models have shown great promise over Supervised Learning (SL) ones in low-resource settings. However, the advantages of SSL are gradually weakened when the amount of labeled data increases in many industrial applications. To further improve the ASR performance when abundant labels are available, we first explore the potential of combining SL and SSL ASR models via analyzing their complementarity in recognition accuracy and optimization property. Then, we propose a novel Optimal Transport based Fusion (OTF) method for SL and SSL models without incurring extra computation cost in inference. Specifically, optimal transport is adopted to softly align the layer-wise weights to unify the two different networks into a single one. Experimental results on the public 1k-hour English LibriSpeech dataset and our in-house 2.6k-hour Chinese dataset show that OTF largely outperforms the individual models with lower error rates.
Submitted 4 June, 2023;
originally announced June 2023.
-
Rethinking PRL: A Multiscale Progressively Residual Learning Network for Inverse Halftoning
Authors:
Feiyu Li,
Jun Yang
Abstract:
Image inverse halftoning is a classic image restoration task, aiming to recover continuous-tone images from halftone images with only bilevel pixels. Because the halftone images lose much of the original image content, inverse halftoning is a classic ill-posed problem. Although existing inverse halftoning algorithms achieve good performance, their results lose image details and features. Therefore, it is still a challenge to recover high-quality continuous-tone images. In this paper, we propose an end-to-end multiscale progressively residual learning network (MSPRL), which has a UNet architecture and takes multiscale input images. To make full use of different input image information, we design a shallow feature extraction module to capture similar features between images of different scales. We systematically study the performance of different methods and compare them with our proposed method. In addition, we employ different training strategies to optimize the model, which is important for optimizing the training process and improving performance. Extensive experiments demonstrate that our MSPRL model obtains considerable performance gains in detail restoration.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
A Laplacian Pyramid Based Generative H&E Stain Augmentation Network
Authors:
Fangda Li,
Zhiqiang Hu,
Wen Chen,
Avinash Kak
Abstract:
Hematoxylin and Eosin (H&E) staining is a widely used sample preparation procedure for enhancing the saturation of tissue sections and the contrast between nuclei and cytoplasm in histology images for medical diagnostics. However, various factors, such as the differences in the reagents used, result in high variability in the colors of the stains actually recorded. This variability poses a challenge in achieving generalization for machine-learning based computer-aided diagnostic tools. To desensitize the learned models to stain variations, we propose the Generative Stain Augmentation Network (G-SAN) -- a GAN-based framework that augments a collection of cell images with simulated yet realistic stain variations. At its core, G-SAN uses a novel and highly computationally efficient Laplacian Pyramid (LP) based generator architecture, that is capable of disentangling stain from cell morphology. Through the task of patch classification and nucleus segmentation, we show that using G-SAN-augmented training data provides on average 15.7% improvement in F1 score and 7.3% improvement in panoptic quality, respectively. Our code is available at https://github.com/lifangda01/GSAN-Demo.
Submitted 14 July, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
PALM: Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation
Authors:
Huihui Fang,
Fei Li,
Junde Wu,
Huazhu Fu,
Xu Sun,
José Ignacio Orlando,
Hrvoje Bogunović,
Xiulan Zhang,
Yanwu Xu
Abstract:
Pathologic myopia (PM) is a common blinding retinal degeneration affecting highly myopic populations. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by helping clinicians identify disease signs or screen mass populations using color fundus photographs as inputs. This paper provides insights about PALM, our open fundus imaging dataset for pathological myopia recognition and anatomical structure annotation. Our database comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea, and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details, such as the labeling process used to construct the database and the quality and characteristics of the samples, and provides other relevant usage notes.
Submitted 12 May, 2023;
originally announced May 2023.
-
Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review
Authors:
Chuang Zhu,
Shengjie Liu,
Zekuan Yu,
Feng Xu,
Arpit Aggarwal,
Germán Corredor,
Anant Madabhushi,
Qixun Qu,
Hongwei Fan,
Fangda Li,
Yueheng Li,
Xianchao Guan,
Yongbing Zhang,
Vivek Kumar Singh,
Farhan Akram,
Md. Mostafa Kamal Sarker,
Zhongyue Shi,
Mulan **
Abstract:
For invasive breast cancer, immunohistochemical (IHC) techniques are often used to detect the expression level of human epidermal growth factor receptor-2 (HER2) in breast tissue to formulate a precise treatment plan. From the perspective of saving manpower, material and time costs, directly generating IHC-stained images from Hematoxylin and Eosin (H&E) stained images is a valuable research direction. Therefore, we held the breast cancer immunohistochemical image generation challenge, aiming to explore novel ideas of deep learning technology in pathological image generation and promote research in this field. The challenge provided registered H&E and IHC-stained image pairs, and participants were required to use these images to train a model that can directly generate IHC-stained images from corresponding H&E-stained images. We selected and reviewed the five highest-ranking methods based on their PSNR and SSIM metrics, while also providing overviews of the corresponding pipelines and implementations. In this paper, we further analyze the current limitations in the field of breast cancer immunohistochemical image generation and forecast the future development of this field. We hope that the released dataset and the challenge will inspire more scholars to jointly study higher-quality IHC-stained image generation.
Submitted 22 September, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Segment Anything in Medical Images
Authors:
Jun Ma,
Yuting He,
Feifei Li,
Lin Han,
Chenyu You,
Bo Wang
Abstract:
Medical image segmentation is a critical component in clinical practice, facilitating accurate diagnosis, treatment planning, and disease monitoring. However, existing methods, often tailored to specific modalities or disease types, lack generalizability across the diverse spectrum of medical image segmentation tasks. Here we present MedSAM, a foundation model designed for bridging this gap by enabling universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types. We conduct a comprehensive evaluation on 86 internal validation tasks and 60 external validation tasks, demonstrating better accuracy and robustness than modality-wise specialist models. By delivering accurate and efficient segmentation across a wide spectrum of tasks, MedSAM holds significant potential to expedite the evolution of diagnostic tools and the personalization of treatment plans.
Submitted 1 April, 2024; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Reinforcement Learning Based Minimum State-flipped Control for the Reachability of Boolean Control Networks
Authors:
Jingjie Ni,
Fangfei Li
Abstract:
To realize reachability as well as reduce control costs of Boolean Control Networks (BCNs) with state-flipped control, a reinforcement learning based method is proposed to obtain flip kernels and the optimal policy with minimal flipping actions to realize reachability. The proposed method is model-free and of low computational complexity. In particular, Q-learning (QL), fast QL, and small memory QL are proposed to find flip kernels. Fast QL and small memory QL are two novel algorithms. Specifically, fast QL, namely, QL combined with transfer learning and special initial states, is of higher efficiency, and small memory QL is applicable to large-scale systems. Meanwhile, we present a novel reward setting, under which the optimal policy with minimal flipping actions to realize reachability is the one with the highest return. Then, to obtain the optimal policy, we propose QL, and fast small memory QL for large-scale systems. Specifically, on the basis of the small memory QL mentioned before, the fast small memory QL uses a changeable reward setting to improve learning efficiency while ensuring the optimality of the policy. For parameter settings, we give some system properties for reference. Finally, two examples, a small-scale system and a large-scale one, are considered to verify the proposed method.
Submitted 10 April, 2023;
originally announced April 2023.
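For readers unfamiliar with the Q-learning (QL) baseline that the abstract above builds on, the following is a minimal sketch of the standard tabular update only. It is not the paper's fast QL or small memory QL; the function name `q_update`, the toy table, the step size `alpha`, the discount `gamma`, and the flip cost of 1 are all assumptions made for illustration.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard tabular Q-learning update in place; return the new value."""
    best_next = max(q_table[next_state])      # greedy value of the successor state
    td_target = reward + gamma * best_next    # one-step bootstrapped target
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical toy table: 2 states x 2 actions, where action 1 stands for
# "flip a node" and is penalized with reward -1 (an assumed cost).
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, state=0, action=1, reward=-1.0, next_state=1)
print(new_value)
```

Penalizing each flip in the reward, as assumed here, is one simple way to make policies with fewer flips score higher; the paper's actual reward setting may differ.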
-
Deep Reinforcement Learning Based Optimal Infinite-Horizon Control of Probabilistic Boolean Control Networks
Authors:
Jingjie Ni,
Fangfei Li,
Zheng-Guang Wu
Abstract:
In this paper, a deep reinforcement learning based method is proposed to obtain optimal policies for optimal infinite-horizon control of probabilistic Boolean control networks (PBCNs). Compared with the existing literature, the proposed method is model-free, namely, the system model and the initial states need not be known. Meanwhile, it is suitable for large-scale PBCNs. First, we establish the connection between deep reinforcement learning and optimal infinite-horizon control, and structure the problem into the framework of the Markov decision process. Then, PBCNs are defined as large-scale or small-scale, depending on whether the memory of the action-values exceeds the RAM of the computer. Based on the newly introduced definition, Q-learning (QL) and double deep Q-network (DDQN) are applied to the optimal infinite-horizon control of small-scale and large-scale PBCNs, respectively. Meanwhile, the optimal state feedback controllers are designed. Finally, two examples are presented: a small-scale PBCN with 3 nodes and a large-scale one with 28 nodes. To verify the convergence of QL and DDQN, the optimal control policy and the optimal action-values obtained from both algorithms are compared with those based on a model-based method named policy iteration. Meanwhile, the performance of QL is compared with DDQN in the small-scale PBCN.
Submitted 7 April, 2023;
originally announced April 2023.
-
Theoretical Evaluation of the Capacity-Achieving Distribution for IM-DD Fiber-Optic Channels
Authors:
Dongdong Zou,
Wei Wang,
Sui Qi,
Fan Li,
Zhaohui Li
Abstract:
The capacity and capacity-achieving distribution for intensity-modulation and direct-detection (IM-DD) fiber-optic channels are theoretically investigated. Different from coherent fiber-optic channels, we indicate that the capacity-achieving distribution of IM-DD systems should be discussed separately in two cases: 1) IM-DD systems without an optical amplifier, which are constrained in peak power; 2) IM-DD systems with an optical amplifier, which are subject to an average power constraint (APC). For the two models, the maximum mutual information achieving distribution, instead of the maximum input entropy achieving distribution, is numerically computed by the iterative Blahut-Arimoto (BA) algorithm. For the IM-DD system under peak power constraint (PPC), a dynamic-assignment BA algorithm is applied to find the capacity-achieving distribution with minimum cardinality. It is observed that the maximum difference between the minimum input cardinality and capacity is around 0.8 bits. For a fixed support input cardinality, although the observed shaping gain is small and only appears in low peak-signal-to-noise ratio (PSNR) regions in the PPC IM-DD system, the probabilistic shaping technique can also be used to introduce rate adaptation to the system by adjusting the shaping and FEC overheads since the capacity-achieving distribution is symmetric. In the IM-DD system under APC, a modified BA algorithm is investigated to solve for the capacity and capacity-achieving distribution, and a significant shaping gain is observed. For PAM8 and PAM16 modulation formats, 0.294 bits/symbol and 0.531 bits/symbol shaping gain can be obtained at the SNR of 20 dB. Furthermore, since the capacity-achieving distribution is asymmetric in this case, a practical discussion of the PS technique is also presented.
Submitted 23 February, 2023;
originally announced February 2023.
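The iterative Blahut-Arimoto (BA) algorithm named in the abstract above can be sketched for a plain discrete memoryless channel as follows. This is the textbook unconstrained iteration, not the dynamic-assignment or modified BA variants of the paper, and it handles no peak or average power constraint. The function name `blahut_arimoto`, the tolerance, and the binary symmetric channel used as input are assumptions for illustration.

```python
import math


def blahut_arimoto(channel, tol=1e-12, max_iter=10_000):
    """Textbook Blahut-Arimoto for a discrete memoryless channel.

    channel[x][y] is p(y | x). Returns (capacity in bits, input distribution).
    """
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = KL divergence (in nats) between row x of the channel and q.
        d = [sum(w * math.log(w / q[y]) for y, w in enumerate(row) if w > 0)
             for row in channel]
        c = [math.exp(v) for v in d]
        z = sum(px * cx for px, cx in zip(p, c))
        lower, upper = math.log(z), math.log(max(c))  # bounds on capacity (nats)
        p = [px * cx / z for px, cx in zip(p, c)]     # multiplicative update
        if upper - lower < tol:
            break
    return lower / math.log(2), p


# Assumed example: binary symmetric channel with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
capacity, dist = blahut_arimoto(bsc)
print(capacity, dist)
```

For the binary symmetric channel the result can be checked against the closed form 1 - H(0.1), where H is the binary entropy function, which is why that channel was chosen for the sketch.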
-
Transcending shift-invariance in the paraxial regime via end-to-end inverse design of freeform nanophotonics
Authors:
William F. Li,
Gaurav Arya,
Charles Roques-Carmes,
Zin Lin,
Steven G. Johnson,
Marin Soljačić
Abstract:
Traditional optical elements and conventional metasurfaces obey shift-invariance in the paraxial regime. For imaging systems obeying paraxial shift-invariance, a small shift in input angle causes a corresponding shift in the sensor image. Shift-invariance has deep implications for the design and functionality of optical devices, such as the necessity of free space between components (as in compound objectives made of several curved surfaces). We present a method for nanophotonic inverse design of compact imaging systems whose resolution is not constrained by paraxial shift-invariance. Our method is end-to-end, in that it integrates density-based full-Maxwell topology optimization with a fully iterative elastic-net reconstruction algorithm. By the design of nanophotonic structures that scatter light in a non-shift-invariant manner, our optimized nanophotonic imaging system overcomes the limitations of paraxial shift-invariance, achieving accurate, noise-robust image reconstruction beyond shift-invariant resolution.
Submitted 3 February, 2023;
originally announced February 2023.
-
Cost-minimization predictive energy management of a postal-delivery fuel cell electric vehicle with intelligent battery State-of-Charge Planner
Authors:
Yang Zhou,
Fuzeng Li,
Xianfeng Xu,
Zhen Zhang,
Alexandre Ravey,
Marie-Cécile Péra,
Ruiqing Ma
Abstract:
Fuel cell electric vehicles have earned substantial attention in recent decades due to their high-efficiency and zero-emission features, while high operating costs remain the major barrier to their large-scale commercialization. In this context, this paper aims to devise an energy management strategy for an urban postal-delivery fuel cell electric vehicle for operating cost mitigation. First, a data-driven dual-loop spatial-domain battery state-of-charge reference estimator is designed to guide battery energy depletion, which is trained by real-world driving data collected in postal delivery missions. Then, a fuzzy C-means clustering enhanced Markov speed predictor is constructed to project the upcoming velocity. Lastly, combining the state-of-charge reference and the forecasted speed, a model predictive control-based cost-optimization energy management strategy is established to mitigate vehicle operating costs imposed by energy consumption and power-source degradation. Validation results have shown that 1) the proposed strategy could mitigate the operating cost by 4.43% and 7.30% on average versus benchmark strategies, denoting its superiority in terms of cost reduction, and 2) the computation burden per step of the proposed strategy averages 0.123 ms, less than the sampling time interval of 1 s, proving its potential for real-time applications.
Submitted 2 March, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Enhancing Federated Learning with spectrum allocation optimization and device selection
Authors:
Tinghao Zhang,
Kwok-Yan Lam,
Jun Zhao,
Feng Li,
Huimei Han,
Norziana Jamil
Abstract:
Machine learning (ML) is a widely accepted means for supporting customized services for mobile devices and applications. Federated Learning (FL), which is a promising approach to implement machine learning while addressing data privacy concerns, typically involves a large number of wireless mobile devices to collect model training data. Under such circumstances, FL is expected to meet stringent training latency requirements in the face of limited resources such as demand for wireless bandwidth, power consumption, and computation constraints of participating devices. Due to practical considerations, FL selects a portion of devices to participate in the model training process at each iteration. Therefore, the tasks of efficient resource management and device selection will have a significant impact on the practical uses of FL. In this paper, we propose a spectrum allocation optimization mechanism for enhancing FL over a wireless mobile network. Specifically, the proposed spectrum allocation optimization mechanism minimizes the time delay of FL while considering the energy consumption of individual participating devices; thus ensuring that all the participating devices have sufficient resources to train their local models. In this connection, to ensure fast convergence of FL, a robust device selection is also proposed to help FL reach convergence swiftly, especially when the local datasets of the devices are not independent and identically distributed (non-iid). Experimental results show that (1) the proposed spectrum allocation optimization method optimizes time delay while satisfying the individual energy constraints; (2) the proposed device selection method enables FL to achieve the fastest convergence on non-iid datasets.
Submitted 27 December, 2022;
originally announced December 2022.
-
Integrated optimization of train timetables rescheduling and response vehicles on a disrupted metro line
Authors:
Hui Wang,
Jialin Liu,
Feng Li,
Hao Ji,
Bin Jia,
Ziyou Gao
Abstract:
When an unexpected metro disruption occurs, metro managers need to reschedule timetables to avoid trains going into the disruption area, and transport passengers stranded at disruption stations as quickly as possible. This paper proposes a two-stage optimization model to jointly make decisions for the two tasks. In the first stage, the timetable rescheduling problem with cancellation and short-turning strategies is formulated as a mixed integer linear programming (MILP) problem. In particular, instantaneous parameters and variables are used to describe the accumulation of time-varying passenger flow. In the second stage, a system-optimal dynamic traffic assignment (SODTA) model is employed to dynamically schedule response vehicles, which is able to capture the dynamic traffic and congestion. Numerical cases of Beijing Metro Line 9 verify the efficiency and effectiveness of our proposed model, and results show that: (1) when a disruption occurs during peak hours, the impact on the normal timetable is greater, and passengers in the direction with fewer train services are more affected; (2) if passengers stranded at the terminal stations of the disruption area are not transported in time, their number will increase rapidly, at a rate of more than 300 passengers per minute; (3) compared with the fixed shortest path, using the response vehicles reduces the total travel time by about 7%. However, it results in increased travel time for some passengers.
Submitted 12 December, 2022;
originally announced December 2022.
-
DiME and AGVis: A Distributed Messaging Environment and Geographical Visualizer for Large-scale Power System Simulation
Authors:
Nicholas Parsly,
Jinning Wang,
Nick West,
Qiwei Zhang,
Hantao Cui,
Fangxing Li
Abstract:
This paper introduces the messaging environment and the geographical visualization tool of the CURENT Large-scale Testbed (LTB) that can be used for large-scale power system closed-loop simulation. First, Distributed Messaging Environment (DiME) implements an asynchronous shared workspace to enable high-concurrent data exchange. Second, Another Grid Visualizer (AGVis) is presented as a geovisualization tool that facilitates the visualization of real-time power system simulation. Third, case studies show the use of DiME and AGVis. The results demonstrate that, with the modular structure, the LTB is capable of not only federal use for real-time, large-scale power system simulation, but also independent use for customized power system research.
Submitted 17 October, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
-
UFO2: A unified pre-training framework for online and offline speech recognition
Authors:
Li Fu,
Siqi Li,
Qingtao Li,
Li** Deng,
Fangzhu Li,
Lu Fan,
Meng Chen,
Xiaodong He
Abstract:
In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) simplifies the two separate training workflows for online and offline modes into one process, and 2) improves the Word Error Rate (WER) performance with limited utterance annotation. Specifically, we extend the conventional offline-mode Self-Supervised Learning (SSL)-based ASR approach to a unified manner, where the model training is conditioned on both the full-context and dynamic-chunked inputs. To enhance the pre-trained representation model, a stop-gradient operation is applied to decouple the online-mode objectives from the quantizer. Moreover, in both the pre-training and the downstream fine-tuning stages, joint losses are proposed to train the unified model with full-weight sharing for the two modes. Experimental results on the LibriSpeech dataset show that UFO2 outperforms the SSL-based baseline method by 29.7% and 18.2% relative WER reduction in offline and online modes, respectively.
Submitted 3 April, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Learning to screen Glaucoma like the ophthalmologists
Authors:
Junde Wu,
Huihui Fang,
Fei Li,
Huazhu Fu,
Yanwu Xu
Abstract:
GAMMA Challenge is organized to encourage the AI models to screen the glaucoma from a combination of 2D fundus image and 3D optical coherence tomography volume, like the ophthalmologists.
Submitted 23 September, 2022;
originally announced September 2022.
-
Virtual Inertia Scheduling for Power Systems with High Penetration of Inverter-based Resources
Authors:
Buxin She,
Fangxing Li,
Hantao Cui,
**nng Wang,
Qiwei Zhang,
Rui Bo
Abstract:
This paper proposes a new concept called virtual inertia scheduling (VIS) to efficiently handle the high penetration of inverter-based resources (IBRs). VIS is an inertia management framework that targets security-constrained and economy-oriented inertia scheduling and generation dispatch of power systems with large-scale renewable generation. Specifically, it schedules the proper power setting points and reserved capacities of both synchronous generators and IBRs, as well as the control modes and control parameters of IBRs, to provide secure and cost-effective inertia support. First, a uniform system model is employed to quantify the frequency dynamics of the IBR-penetrated power system after disturbances. Based on the model, the s-domain and time-domain analytical responses of IBRs with inertia support capability are derived. Then, VIS-based real-time economic dispatch (VIS-RTED) is formulated to minimize generation and reserve costs, with full consideration of dynamic frequency constraints and derived inertia support reserve constraints. The virtual inertia and damping of IBRs are formulated as decision variables. To address the non-linearity of dynamic constraints, deep learning-assisted linearization is employed to solve the optimization problem. Finally, the proposed VIS-RTED is demonstrated on a modified IEEE 39-bus system. A full-order time-domain simulation is performed to verify the scheduling results.
Submitted 14 September, 2022;
originally announced September 2022.
-
Intelligent detect for substation insulator defects based on CenterMask
Authors:
Bo Ye,
Feng Li,
Mingxuan Li,
Peipei Yan,
Huiting Yang,
Lihua Wang
Abstract:
With the development of intelligent operation and maintenance of substations, the daily inspection of substations needs to process massive video and image data. This places higher requirements on the processing speed and accuracy of defect detection. Based on the end-to-end learning paradigm, this paper proposes an intelligent detection method for substation insulator defects based on CenterMask. First, the backbone network VoVNet is improved with a residual connection and an eSE module, which effectively solves the problems of deep network saturation and gradient information loss. On this basis, an insulator mask generation method based on a spatial attention-directed mechanism is proposed, so that insulators with complex image backgrounds are accurately segmented. Then, three strategies of pixel-wise regression prediction, multi-scale features and centerness are introduced, and the anchor-free single-stage target detector accurately locates the defect points of insulators. Finally, an example analysis is carried out with the substation inspection images of a power supply company in a certain area to verify the effectiveness and robustness of the proposed method.
Submitted 30 August, 2022;
originally announced August 2022.
-
AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results
Authors:
Ren Yang,
Radu Timofte,
Xin Li,
Qi Zhang,
Lin Zhang,
Fanglong Liu,
Dongliang He,
Fu li,
He Zheng,
Weihang Yuan,
Pavel Ostyakov,
Dmitry Vyal,
Magauiya Zhussip,
Xueyi Zou,
Youliang Yan,
Lei Li,
**gzhu Tang,
Ming Chen,
Shijie Zhao,
Yu Zhu,
Xiaoran Qin,
Chenghua Li,
Cong Leng,
Jian Cheng,
Claudio Rota
, et al. (28 additional authors not shown)
Abstract:
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
Submitted 25 August, 2022; v1 submitted 23 August, 2022;
originally announced August 2022.
-
Learning Generalizable Latent Representations for Novel Degradations in Super Resolution
Authors:
Fengjun Li,
Xin Feng,
Fanglin Chen,
Guangming Lu,
Wenjie Pei
Abstract:
Typical methods for blind image super-resolution (SR) deal with unknown degradations by directly estimating them or by learning degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by combining various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily true. Real-world degradations can fall outside the scope that handcrafted degradations can simulate; we refer to these as novel degradations. In this work, we propose to learn a latent representation space for degradations that generalizes from handcrafted (base) degradations to novel degradations. The representations obtained for a novel degradation in this latent space are then used to generate degraded images consistent with the novel degradation, which compose paired training data for the SR model. Furthermore, we perform variational inference to match the posterior of degradations in the latent representation space with a prior distribution (e.g., a Gaussian distribution). Consequently, we are able to sample more high-quality representations for a novel degradation to augment the training data for the SR model. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness and advantages of our method for blind super-resolution with novel degradations.
Submitted 25 July, 2022;
originally announced July 2022.
-
Wound Segmentation with Dynamic Illumination Correction and Dual-view Semantic Fusion
Authors:
Honghui Liu,
Changjian Wang,
Kele Xu,
Fangzhao Li,
Ming Feng,
Yuxing Peng,
Hongjun He
Abstract:
Wound image segmentation is a critical component of the clinical diagnosis and timely treatment of wounds. Recently, deep learning has become the mainstream methodology for wound image segmentation. However, pre-processing of the wound image, such as illumination correction, is required before the training phase because it can greatly improve performance. The correction procedure and the training of deep models are independent of each other, which leads to sub-optimal segmentation performance, as a fixed illumination correction may not be suitable for all images. To address these issues, this paper proposes an end-to-end dual-view segmentation approach that incorporates a learnable illumination correction module into deep segmentation models. The parameters of the module are learned and updated automatically during the training stage, while the dual-view fusion can fully exploit the features from both the raw images and the enhanced ones. To demonstrate the effectiveness and robustness of the proposed framework, extensive experiments are conducted on benchmark datasets. The encouraging results suggest that our framework can significantly improve segmentation performance compared to state-of-the-art methods.
Submitted 12 July, 2022;
originally announced July 2022.
-
Electromagnetic Nonreciprocity in a Magnetized Plasma Circulator
Authors:
Feng Li,
Robert J. Davis,
Sara M. Kandil,
Daniel F. Sievenpiper
Abstract:
Nonreciprocal transport of electromagnetic waves within magnetized plasma is a powerful building block towards understanding and exploiting the properties of more general topological systems. Much recent attention has been paid to the theoretical issues of wave interaction within such a medium, but there is a lack of experimental verification that such systems can be viable in a lab or industrial setting. This work provides an experimental proof-of-concept by demonstrating nonreciprocity in a unit component, a microwave plasma circulator. We design an E-plane Y junction plasma circulator operating in the range of 4 to 6 GHz using standardized waveguide specifications. From both simulations and experiments, we observe wide band isolation for the power transmission through the circulator. The performance and the frequency band of the circulator can be easily tuned by changing the plasma density and the magnetic field strength. By linking simulations and experimental results, we estimate the plasma density for the device.
Submitted 29 November, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.