-
Renal digital pathology visual knowledge search platform based on language large model and book knowledge
Authors:
Xiaomin Lv,
Chong Lai,
Liya Ding,
Maode Lai,
Qingrong Sun
Abstract:
Large models have become mainstream, yet their applications in digital pathology still require exploration. Meanwhile, renal pathology images play an important role in the diagnosis of renal diseases. We segmented images and paired them with corresponding text descriptions drawn from 60 renal pathology books, performed clustering analysis on all image and text description features using large models, and ultimately built a retrieval system based on the semantic features of large models. Based on the above analysis, we established a knowledge base of 10,317 renal pathology images with paired text descriptions. We then evaluated the semantic feature capabilities of four large models (GPT2, gemma, Llama, and Qwen) and the image-based feature capabilities of the dinov2 large model. Furthermore, we built a semantic retrieval system that retrieves pathological images based on text descriptions, and named it RppD (aidp.zjsru.edu.cn).
Submitted 26 May, 2024;
originally announced June 2024.
-
ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection
Authors:
Jiayi Wang,
Yi-An Mao,
Xiaoyu Ma,
Sicen Guo,
Yuting Shao,
Xiao Lv,
Wenting Han,
Mark Christopher,
Linda M. Zangwill,
Yanlong Bi,
Rui Fan
Abstract:
Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose semantic segmentation methods using convolutional neural networks (CNNs) and Transformers, there is currently a lack of benchmarks for these state-of-the-art (SoTA) networks specifically trained for ONH detection. Therefore, in this article, we make contributions in three key aspects: network design, the publication of a dataset, and the establishment of a comprehensive benchmark. Our newly developed ONH detection network, referred to as ODFormer, is based upon the Swin Transformer architecture and incorporates two novel components: a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. Our published large-scale dataset, known as TongjiU-DROD, provides multi-resolution fundus images for each participant, captured using two distinct types of cameras. Our established benchmark involves three datasets: DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD, created by researchers from different countries and containing fundus images captured from participants of diverse races and ages. Extensive experimental results demonstrate that our proposed ODFormer outperforms other SoTA networks in both accuracy and generalizability. Our dataset and source code are publicly available at mias.group/ODFormer.
Submitted 2 June, 2024; v1 submitted 15 April, 2024;
originally announced May 2024.
-
Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation
Authors:
Minghui Zhang,
Yangqian Wu,
Hanxiao Zhang,
Yulei Qin,
Hao Zheng,
Wen Tang,
Corey Arnold,
Chenhao Pei,
Pengxin Yu,
Yang Nan,
Guang Yang,
Simon Walsh,
Dominic C. Marshall,
Matthieu Komorowski,
Puyang Wang,
Dazhou Guo,
Dakai Jin,
Ya'nan Wu,
Shuiqing Zhao,
Runsheng Chang,
Boyu Zhang,
Xing Lv,
Abdul Qayyum,
Moona Mazher,
Qi Su, et al. (11 additional authors not shown)
Abstract:
Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation closer to the limit of image resolution. Since the EXACT'09 pulmonary airway segmentation challenge, limited effort has been directed to the quantitative comparison of newly emerged algorithms, which are driven by the maturity of deep learning based approaches and the clinical drive for resolving finer details of distal airways for early intervention in pulmonary diseases. Thus far, public annotated datasets are extremely limited, hindering the development of data-driven methods and detailed performance evaluation of new algorithms. To provide a benchmark for the medical imaging community, we organized the Multi-site, Multi-domain Airway Tree Modeling (ATM'22) challenge, which was held as an official challenge event during the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed pulmonary airway annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for testing). The dataset was collected from different sites and further includes a portion of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three teams participated in the entire phase of the challenge, and the algorithms of the top ten teams are reviewed in this paper. Quantitative and qualitative results revealed that deep learning models embedded with topological continuity enhancement achieved superior performance in general. The ATM'22 challenge follows an open-call design: the training data and the gold-standard evaluation are available upon successful registration via its homepage.
Submitted 27 June, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Reconfigurable Intelligent Surfaces for 6G -- Applications, Challenges and Solutions
Authors:
Yajun Zhao,
Xin Lv
Abstract:
It is expected that scholars will continue to strengthen the depth and breadth of theoretical research on RIS, and provide a higher theoretical upper bound for the engineering application of RIS. Alongside breakthroughs in academic research, rapid progress has also been made in engineering application research and industrialization. This paper provides an overview of RIS engineering applications and a systematic, in-depth analysis of the challenges of these applications and their candidate solutions. Future trends and challenges are also discussed.
Submitted 3 December, 2022;
originally announced December 2022.
-
A Multi-scale Video Denoising Algorithm for Raw Image
Authors:
Bin Ma,
Yueli Hu,
Xianxian Lv,
Kai Li
Abstract:
Video denoising for raw images has always been a difficulty in camera image processing. On the one hand, image denoising performance largely determines image quality; moreover, the denoising effect on raw images affects the accuracy of the subsequent operations in the ISP processing flow. On the other hand, compared with still images, video carries motion information in its time sequence, so motion estimation, which is complex and computationally expensive, is needed in video denoising. In view of these problems, this paper proposes a video denoising algorithm for raw images that performs multiple cascaded processing stages on raw-RGB images based on a convolutional neural network and carries out implicit motion estimation within the network. The denoising performance is far superior to that of traditional algorithms with minimal computation and bandwidth, and the algorithm has computational advantages compared with most deep learning algorithms.
Submitted 4 September, 2022;
originally announced September 2022.
-
Network Coexistence Analysis of RIS-Assisted Wireless Communications
Authors:
Yajun Zhao,
Xin Lv
Abstract:
Reconfigurable intelligent surfaces (RISs) have attracted the attention of academia and industry circles because of their ability to control the electromagnetic characteristics of channel environments. However, it has been found that the introduction of an RIS may bring new and more serious network coexistence problems. It may even further deteriorate the network performance if these new network coexistence problems cannot be effectively solved. In this paper, an RIS network coexistence model is proposed and discussed in detail, and these problems are deeply analysed. Two novel RIS design mechanisms, including a novel multilayer RIS structure with an out-of-band filter and an RIS blocking mechanism, are further explored. Finally, numerical results and a discussion are given.
Submitted 27 July, 2022;
originally announced August 2022.
-
Dual-space Compressed Sensing
Authors:
Xudong Lv,
Ashok Ajoy
Abstract:
Compressed sensing (CS) is a powerful method routinely employed to accelerate image acquisition. It is particularly suited to situations where the image under consideration is sparse but can be sampled in a basis where it is non-sparse. Here we propose an alternate CS regime for situations where the image can be sampled in two incoherent spaces simultaneously, with a special focus on image sampling in Fourier reciprocal spaces (e.g. real-space and k-space). Information is fed forward from one space to the other, allowing new opportunities to efficiently solve the optimization problem at the heart of CS image reconstruction. We show that considerable gains in imaging acceleration are then possible over conventional CS. The technique provides enhanced robustness to noise and is well suited to edge-detection problems. We envision applications for imaging collections of nanodiamond (ND) particles targeting specific regions in a volume of interest, exploiting the ability of lattice defects (NV centers) to allow ND particles to be imaged in reciprocal spaces simultaneously via optical fluorescence and 13C magnetic resonance imaging (MRI), respectively. Broadly, this work suggests the potential to interface CS principles with hybrid sampling strategies to yield speedup in signal acquisition in many practical settings.
Submitted 15 July, 2022;
originally announced July 2022.
-
Stochastic Event-triggered Variational Bayesian Filtering
Authors:
Xiaoxu Lv,
Peihu Duan,
Zhisheng Duan,
Guanrong Chen,
Ling Shi
Abstract:
This paper proposes an event-triggered variational Bayesian filter for remote state estimation with unknown and time-varying noise covariances. After presetting multiple nominal process noise covariances and an initial measurement noise covariance, a variational Bayesian method and a fixed-point iteration method are utilized to jointly estimate the posterior state vector and the unknown noise covariances under a stochastic event-triggered mechanism. The proposed algorithm ensures a low communication load and excellent estimation performance for a wide range of unknown noise covariances. Finally, the performance of the proposed algorithm is demonstrated by tracking simulations of a vehicle.
Submitted 14 June, 2022;
originally announced June 2022.
-
The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge
Authors:
Maokui He,
Xiang Lv,
Weilin Zhou,
Jingjing Yin,
Xiaoqi Zhang,
Yuxuan Wang,
Shutong Niu,
Yuhang Cao,
Heng Lu,
Jun Du,
Chin-Hui Lee
Abstract:
We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge. These techniques are designed to handle multi-speaker conversations in real-world meeting scenarios with high speaker-overlap ratios and under heavily reverberant and noisy conditions. First, for data preparation and augmentation in training TS-VAD models, speech data containing both real meetings and simulated indoor conversations are used. Second, in refining results obtained after TS-VAD based decoding, we perform a series of post-processing steps to improve the VAD results needed to reduce diarization error rates (DERs). Tested on the ALIMEETING corpus, the newly released Mandarin meeting dataset used in M2MeT, our proposed system achieves relative DER reductions of up to 66.55% and 60.59% on the Eval and Test sets, respectively, compared with classical clustering-based diarization.
Submitted 10 February, 2022;
originally announced February 2022.
-
Achieving Optimal Output Consensus for Discrete-time Linear Multi-agent Systems with Disturbance Rejection
Authors:
Yutao Tang,
Hao Zhu,
Xiaoyong Lv
Abstract:
In this paper, an optimal output consensus problem is studied for discrete-time linear multi-agent systems subject to external disturbances. Each agent is assigned a local cost function that is known only to itself. Distributed protocols are to be designed to guarantee output consensus for these high-order agents while minimizing the aggregate cost, defined as the sum of the local costs. To overcome the difficulties brought by high-order dynamics and external disturbances, we develop an embedded design and constructively present a distributed rule to solve this problem. The proposed control includes three terms: an optimal signal generator under a directed information graph, an observer-based compensator to reject the disturbances, and a reference tracking controller for the linear agents. It is shown to solve the formulated problem under some mild assumptions. A numerical example is also provided to illustrate the effectiveness of our proposed distributed control laws.
Submitted 25 December, 2020; v1 submitted 18 July, 2020;
originally announced July 2020.
-
Music Style Classification with Compared Methods in XGB and BPNN
Authors:
Lifeng Tan,
Cong Jin,
Zhiyuan Cheng,
Xin Lv,
Leiyu Song
Abstract:
Scientists have used many different classification methods to solve the problem of music classification, but the efficiency of each method differs. In this paper, we compare two methods on the task of music style classification. More specifically, feature extraction for representing timbral texture, rhythmic content, and pitch content is proposed. Comparative evaluations of the performance of the two classifiers were conducted for music classification with different styles. The results show that XGB is better suited for small datasets than BPNN.
Submitted 3 December, 2019;
originally announced December 2019.