Showing 1–50 of 53 results for author: Xie, W

Searching in archive eess.
  1. arXiv:2406.00956  [pdf, other]

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  2. arXiv:2404.10556  [pdf, other]

    cs.NI eess.SP

    Generative AI for Advanced UAV Networking

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Hongyang Du, Jiawen Kang, Jing Wu, Sumei Sun, Ping Zhang

    Abstract: With the impressive achievements of chatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve the problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, we discuss key applications of GAI in improving unmanned aerial veh…

    Submitted 16 April, 2024; originally announced April 2024.

  3. arXiv:2401.16423  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Synchformer: Efficient Synchronization from Sparse Cues

    Authors: Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

    Abstract: Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Extended version of the ICASSP 24 paper. Project page: https://www.robots.ox.ac.uk/~vgg/research/synchformer/ Code: https://github.com/v-iashin/Synchformer

  4. arXiv:2401.11141  [pdf, other]

    cs.IT eess.SP

    Wideband Beamforming for RIS Assisted Near-Field Communications

    Authors: Ji Wang, Jian Xiao, Yixuan Zou, Wenwu Xie, Yuanwei Liu

    Abstract: A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with frequency-dependent hybrid…

    Submitted 20 January, 2024; originally announced January 2024.

  5. arXiv:2312.17183  [pdf, other]

    eess.IV cs.CV

    One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

    Authors: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this study, we focus on building up a model that aims to Segment Anything in medical scenarios, driven by Text prompts, termed as SAT. Our main contributions are threefold: (i) for dataset construction, we combine multiple knowledge sources to construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; then we build up the largest and most compreh…

    Submitted 1 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 53 pages

  6. arXiv:2311.18788  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM physics.med-ph

    Automated interpretation of congenital heart disease from multi-view echocardiograms

    Authors: Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao, Hanwen Zhang, Xin Zhang, Wanqing Xie, Binbin Wang

    Abstract: Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonate death in China. Clinical diagnosis can be based on the selected 2D key-frames from five views. Limited by the availability of multi-view data, most methods have to rely on the insufficient single view analysis. This study proposes to automatically analyze the multi-view echocardiograms with a practical…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Published in Medical Image Analysis

    Journal ref: Medical Image Analysis (Volume 69, April 2021, 101942)

  7. arXiv:2311.17624  [pdf, other]

    eess.SP cs.NI

    Combating Multi-path Interference to Improve Chirp-based Underwater Acoustic Communication

    Authors: Wenjun Xie, Enqi Zhang, Lizhao You, Deqing Wang, Zhaorui Wang, Liqun Fu

    Abstract: Linear chirp-based underwater acoustic communication has been widely used due to its reliability and long-range transmission capability. However, unlike the counterpart chirp technology in wireless -- LoRa, its throughput is severely limited by the number of modulated chirps in a symbol. The fundamental challenge lies in the underwater multi-path channel, where the delayed copies of one symbol may…

    Submitted 29 November, 2023; originally announced November 2023.

  8. Adaptive Digital Twin for UAV-Assisted Integrated Sensing, Communication, and Computation Networks

    Authors: Bin Li, Wenshuai Liu, Wancheng Xie, Ning Zhang, Yan Zhang

    Abstract: In this paper, we study a digital twin (DT)-empowered integrated sensing, communication, and computation network. Specifically, the users perform radar sensing and computation offloading on the same spectrum, while unmanned aerial vehicles (UAVs) are deployed to provide edge computing service. We first formulate a multi-objective optimization problem to minimize the beampattern performance of mult…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 14 pages, 11 figures

    Journal ref: IEEE Transactions on Green Communications and Networking, 2023

  9. arXiv:2310.15371  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG physics.med-ph

    Vicinal Feature Statistics Augmentation for Federated 3D Medical Volume Segmentation

    Authors: Yongsong Huang, Wanqing Xie, Mingzhen Li, Mingmei Cheng, Jinzhou Wu, Weixiao Wang, Jane You, Xiaofeng Liu

    Abstract: Federated learning (FL) enables multiple client medical institutes to collaboratively train a deep learning (DL) model with privacy protection. However, the performance of FL can be constrained by the limited availability of labeled data in small institutes and the heterogeneous (i.e., non-i.i.d.) data distribution across institutes. Though data augmentation has been a proven technique to boost the g…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023): Oral Paper

    Journal ref: In: Frangi, A., de Bruijne, M., Wassermann, D., Navab, N. (eds) Information Processing in Medical Imaging. IPMI 2023. Lecture Notes in Computer Science, vol 13939. Springer, Cham

  10. arXiv:2309.11500  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    A Large-scale Dataset for Audio-Language Representation Learning

    Authors: Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

    Abstract: The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning community, the present audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an innovative and automatic a…

    Submitted 3 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  11. arXiv:2309.02576  [pdf, other]

    eess.IV cs.CV cs.LG

    Emphysema Subtyping on Thoracic Computed Tomography Scans using Deep Neural Networks

    Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Dirk Jan Slebos, Bram van Ginneken

    Abstract: Accurate identification of emphysema subtypes and severity is crucial for effective management of COPD and the study of disease heterogeneity. Manual analysis of emphysema subtypes and severity is laborious and subjective. To address this challenge, we present a deep learning-based approach for automating the Fleischner Society's visual score system for emphysema subtyping and severity analysis. W…

    Submitted 5 September, 2023; originally announced September 2023.

    Journal ref: Sci Rep. 2023 Aug 29;13(1):14147

  12. arXiv:2308.11980  [pdf, other]

    eess.AS cs.SD

    Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

    Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

    Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

  13. arXiv:2307.12717  [pdf, ps, other]

    cs.CV eess.IV

    Dense Transformer based Enhanced Coding Network for Unsupervised Metal Artifact Reduction

    Authors: Wangduo Xie, Matthew B. Blaschko

    Abstract: CT images corrupted by metal artifacts have serious negative effects on clinical diagnosis. Considering the difficulty of collecting paired data with ground truth in clinical settings, unsupervised methods for metal artifact reduction are of high interest. However, it is difficult for previous unsupervised methods to retain structural information from CT images while handling the non-local charact…

    Submitted 28 July, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  14. arXiv:2306.02054  [pdf]

    eess.AS

    Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

    Authors: Yanxiong Li, Wenchang Cao, Wei Xie, Qisheng Huang, Wenfeng Pang, Qianhua He

    Abstract: We present a work on low-complexity acoustic scene classification (ASC) with multiple devices, namely the subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples of multiple devices with a low-complexity model, where two main difficulties need to be overcome. First, the audio samples are recorded by different devices, and there is mismatch of recording dev…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th IEEE International Conference on Signal Processing (IEEE ICSP)

  15. arXiv:2306.02053  [pdf]

    eess.AS

    Few-shot Class-incremental Audio Classification Using Stochastic Classifier

    Authors: Yanxiong Li, Wenchang Cao, Jialong Li, Wei Xie, Qianhua He

    Abstract: It is generally assumed that the number of classes is fixed in current audio classification methods, and the model can recognize pregiven classes only. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and may even be infeasible. In this study, we propose a method for few-shot class-incremental aud…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, 4 tables. Accepted for publication in INTERSPEECH 2023

  16. Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

    Authors: Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos

    Abstract: Most existing methods for audio classification assume that the vocabulary of audio classes to be classified is fixed. When novel (unseen) audio classes appear, audio classification systems need to be retrained with abundant labeled samples of all audio classes for recognizing base (initial) and novel audio classes. If novel audio classes continue to appear, the existing methods for audio classific…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 13 pages, 8 figures, 12 tables. Accepted for publication in IEEE TMM

  17. arXiv:2305.18045  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

    Authors: Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao, Tuomas Virtanen

    Abstract: New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned on…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 5 pages, 2 figures. Accepted by Interspeech 2023

  18. arXiv:2303.10372  [pdf, other]

    cs.CV cs.MM eess.IV

    Just Noticeable Visual Redundancy Forecasting: A Deep Multimodal-driven Approach

    Authors: Wuyuan Xie, Shukang Wang, Sukun Tian, Lirong Huang, Ye Liu, Miaohui Wang

    Abstract: Just noticeable difference (JND) refers to the maximum visual change that human eyes cannot perceive, and it has a wide range of applications in multimedia systems. However, most existing JND approaches only focus on a single modality, and rarely consider the complementary effects of multimodal information. In this article, we investigate the JND modeling from an end-to-end homologous multimodal p…

    Submitted 18 March, 2023; originally announced March 2023.

    Journal ref: AAAI 2023

  19. Energy Efficient Computation Offloading in Aerial Edge Networks With Multi-Agent Cooperation

    Authors: Wenshuai Liu, Bin Li, Wancheng Xie, Yueyue Dai, Zesong Fei

    Abstract: With the high flexibility of supporting resource-intensive and time-sensitive applications, unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) is proposed as an innovational paradigm to support the mobile users (MUs). As a promising technology, digital twin (DT) is capable of timely mapping the physical entities to virtual models, and reflecting the MEC network state in real-time.…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 14 pages, 13 figures

  20. arXiv:2301.02228  [pdf, other]

    eess.IV cs.CL cs.CV

    MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

    Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information,…

    Submitted 3 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  21. arXiv:2210.07055  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

    Authors: Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

    Abstract: The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For such videos, the events that may be harnessed for synchronisation cues may be spatially small and may occur only infrequently during a many seconds-long video clip, i.e. the synchronisation signal is 'sparse in space and time'. This contrasts with the case of synchronising videos of talking heads, wher…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted as a spotlight presentation for the BMVC 2022. Code: https://github.com/v-iashin/SparseSync Project page: https://v-iashin.github.io/SparseSync

  22. arXiv:2209.05477  [pdf, other]

    eess.IV cs.CV cs.LG

    Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images

    Authors: Pak-Hei Yeung, Moska Aliasi, Monique Haak, The INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete

    Abstract: Two-dimensional (2D) freehand ultrasound is the mainstay in prenatal care and fetal growth monitoring. The task of matching corresponding cross-sectional planes in the 3D anatomy for a given 2D ultrasound brain scan is essential in freehand scanning, but challenging. We propose AdLocUI, a framework that Adaptively Localizes 2D Ultrasound Images in the 3D anatomical atlas without using any external…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2022

  23. arXiv:2208.06222  [pdf, other]

    cs.CV eess.IV

    Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator

    Authors: Xiangbo Gao, Cheng Luo, Qinliang Lin, Weicheng Xie, Minmin Liu, Linlin Shen, Keerthy Kusumam, Siyang Song

    Abstract: Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black box scenarios and poor robustness to defense algorithms. Recent CNN generator-based attack approaches can synthesize unrestricted and semantically meaningful entities to the image, which is shown to be transferable and robust. However, such methods attack images by either synthesizing local…

    Submitted 19 November, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

  24. arXiv:2206.12772  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

    Authors: Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang

    Abstract: We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos. To understand what enables the learning of useful representations, we systematically investigate the effects of data augmentations, and reveal that (1) composition of data augmentations plays a critical role, i.e. explicitly encouraging the audio-visual representat…

    Submitted 15 August, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: Camera-ready Version for ACMMM 2022, Project page is https://jinxiang-liu.github.io/SSL-TIE/

  25. arXiv:2206.06947  [pdf, other]

    eess.IV cs.CV

    K-Space Transformer for Undersampled MRI Reconstruction

    Authors: Ziheng Zhao, Tianjiao Zhang, Weidi Xie, Yanfeng Wang, Ya Zhang

    Abstract: This paper considers the problem of undersampled MRI reconstruction. We propose a novel Transformer-based framework for directly processing signal in k-space, going beyond the limitation of regular grids as ConvNets do. We adopt an implicit representation of k-space spectrogram, treating spatial coordinates as inputs, and dynamically query the sparsely sampled points to reconstruct the spectrogram…

    Submitted 10 November, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

  26. arXiv:2205.03920  [pdf, other]

    q-bio.QM eess.SY

    From Discovery to Production: Challenges and Novel Methodologies for Next Generation Biomanufacturing

    Authors: Wei Xie, Giulia Pedrielli

    Abstract: The increasingly pressing demand for novel drugs (e.g., gene therapies for personalized cancer care, ever evolving vaccines) with unprecedented levels of personalization has put remarkable pressure on the traditionally long time required by the pharma R&D and manufacturing to go from design to production of new products. The revolution has already brought important changes in the technologies us…

    Submitted 28 June, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: 15 pages, 5 figures

  27. arXiv:2205.03229  [pdf]

    eess.SP physics.optics

    Multi-core fiber enabled fading noise suppression in φ-OFDR based quantitative distributed vibration sensing

    Authors: Yuxiang Feng, Weilin Xie, Yinxia Meng, Jiang Yang, Qiang Yang, Yan Ren, Tianwai Bo, Zhongwei Tan, Wei Wei, Yi Dong

    Abstract: Coherent fading has been regarded as a critical issue in phase-sensitive optical frequency domain reflectometry (φ-OFDR) based distributed fiber-optic sensing. Here, we report on an approach for fading noise suppression in φ-OFDR with multi-core fiber. By exploiting the independent nature of the randomness in the distribution of reflective index in each of the cores, the drastic phase fluctuations…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 4 pages

  28. arXiv:2203.08980  [pdf, other]

    stat.ME eess.SY

    Stochastic Simulation Uncertainty Analysis to Accelerate Flexible Biomanufacturing Process Development

    Authors: Wei Xie, Russell R. Barton, Barry L. Nelson, Keqi Wang

    Abstract: Motivated by critical challenges and needs from biopharmaceuticals manufacturing, we propose a general metamodel-assisted stochastic simulation uncertainty analysis framework to accelerate the development of a simulation model with modular design for flexible production processes. There are often very limited process observations. Thus, there exist both simulation and model uncertainties in the sy…

    Submitted 3 September, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: 32 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2011.04207

  29. arXiv:2201.03116  [pdf, other]

    eess.SY cs.LG

    Opportunities of Hybrid Model-based Reinforcement Learning for Cell Therapy Manufacturing Process Control

    Authors: Hua Zheng, Wei Xie, Keqi Wang, Zheng Li

    Abstract: Driven by the key challenges of cell therapy manufacturing, including high complexity, high uncertainty, and very limited process observations, we propose a hybrid model-based reinforcement learning (RL) to efficiently guide process control. We first create a probabilistic knowledge graph (KG) hybrid model characterizing the risk- and science-based understanding of biomanufacturing process mechani…

    Submitted 25 January, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

    Comments: 14 pages, 2 figures

  30. arXiv:2112.07948  [pdf, other]

    cs.CV eess.IV

    Transcoded Video Restoration by Temporal Spatial Auxiliary Network

    Authors: Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

    Abstract: On most video platforms, such as YouTube and TikTok, the played videos usually have undergone multiple video encodings such as hardware encoding by recording devices, software encoding by video editing apps, and single/multiple video transcoding by video application servers. Previous works in compressed video restoration typically assume the compression artifacts are caused by one-time encoding.…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI2022

  31. arXiv:2112.04432  [pdf, other]

    cs.CV eess.AS

    Audio-Visual Synchronisation in the wild

    Authors: Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this paper, we consider the problem of audio-visual synchronisation applied to videos 'in-the-wild' (i.e. of general classes beyond speech). As a new task, we identify and curate a test set with high audio-visual correlation, namely VGG-Sound Sync. We compare a number of transformer-based architectural variants specifically designed to model audio and visual signals of arbitrary length, while sig…

    Submitted 8 December, 2021; originally announced December 2021.

  32. arXiv:2109.12108  [pdf, other]

    eess.IV cs.CV

    ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation

    Authors: Pak-Hei Yeung, Linde Hesse, Moska Aliasi, Monique Haak, the INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete

    Abstract: The objective of this work is to achieve sensorless reconstruction of a 3D volume from a set of 2D freehand ultrasound images with deep implicit representation. In contrast to the conventional way that represents a 3D volume as a discrete voxel grid, we do so by parameterizing it as the zero level-set of a continuous function, i.e. implicitly representing the 3D volume as a mapping from the spatia…

    Submitted 24 September, 2021; originally announced September 2021.

  33. arXiv:2107.13431  [pdf]

    eess.IV cs.CV

    AI assisted method for efficiently generating breast ultrasound screening reports

    Authors: Shuang Ge, Qiongyu Ye, Wenquan Xie, Desheng Sun, Huabin Zhang, Xiaobo Zhou, Kehong Yuan

    Abstract: Background: Ultrasound is one of the preferred choices for early screening of dense breast cancer. Clinically, doctors have to manually write the screening report which is time-consuming and laborious, and it is easy to miss and miswrite. Aim: We proposed a new pipeline to automatically generate AI breast ultrasound screening reports based on ultrasound images, aiming to assist doctors in improvin…

    Submitted 22 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

  34. arXiv:2106.01351  [pdf, other]

    eess.IV cs.CV

    Deep Clustering Activation Maps for Emphysema Subtyping

    Authors: Weiyi Xie, Colin Jacobs, Bram van Ginneken

    Abstract: We propose a deep learning clustering method that exploits dense features from a segmentation network for emphysema subtyping from computed tomography (CT) scans. Using dense features enables high-resolution visualization of image regions corresponding to the cluster assignment via dense clustering activation maps (dCAMs). This approach provides model interpretability. We evaluated clustering resu…

    Submitted 1 June, 2021; originally announced June 2021.

  35. arXiv:2105.11748  [pdf, other]

    eess.IV cs.CV cs.LG

    Dense Regression Activation Maps For Lesion Segmentation in CT scans of COVID-19 patients

    Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Bram van Ginneken

    Abstract: Automatic lesion segmentation on thoracic CT enables rapid quantitative analysis of lung involvement in COVID-19 infections. However, obtaining a large amount of voxel-level annotations for training segmentation networks is prohibitively expensive. Therefore, we propose a weakly-supervised segmentation method based on dense regression activation maps (dRAMs). Most weakly-supervised segmentation ap…

    Submitted 18 November, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

  36. arXiv:2104.14332  [pdf, other]

    cs.LG eess.SY

    Hypernetwork Dismantling via Deep Reinforcement Learning

    Authors: Dengcheng Yan, Wenxin Xie, Yiwen Zhang, Qiang He, Yun Yang

    Abstract: Network dismantling aims to degrade the connectivity of a network by removing an optimal set of nodes. It has been widely adopted in many real-world applications such as epidemic control and rumor containment. However, conventional methods usually focus on simple network modeling with only pairwise interactions, while group-wise interactions modeled by hypernetwork are ubiquitous and critical. In…

    Submitted 8 March, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

  37. arXiv:2104.12044  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Cycle-Consistent Adversarial Networks for Edge Denoising of Computed Tomography Images

    Authors: Xiaowe Xu, Jiawei Zhang, Jinglan Liu, Yukun Ding, Tianchen Wang, Hailong Qiu, Haiyun Yuan, Jian Zhuang, Wen Xie, Yuhao Dong, Qianjun Jia, Meiping Huang, Yiyu Shi

    Abstract: As one of the most commonly ordered imaging tests, computed tomography (CT) scan comes with inevitable radiation exposure that increases the cancer risk to patients. However, CT image quality is directly related to radiation dose, thus it is desirable to obtain high-quality CT images with as little dose as possible. CT image denoising tries to obtain high dose like high-quality CT images (domain X…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: 16 pages, 7 figures, 4 tables, accepted by the ACM Journal on Emerging Technologies in Computing Systems (JETC). arXiv admin note: substantial text overlap with arXiv:2002.12130

  38. arXiv:2104.02691  [pdf, other]

    cs.CV eess.AS eess.IV

    Localizing Visual Sounds the Hard Way

    Authors: Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

    Abstract: The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image fragments, even for images that do contain the object emitting the sound, we can significantly boost the localization performance. We do so elegantly by introducing…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: CVPR2021

  39. arXiv:2103.08357  [pdf, other]

    eess.IV cs.CV

    Learning Frequency-aware Dynamic Network for Efficient Super-Resolution

    Authors: Wenbin Xie, Dehua Song, Chang Xu, Chunjing Xu, Hui Zhang, Yunhe Wang

    Abstract: Deep learning based methods, especially convolutional neural networks (CNNs), have been successfully applied in the field of single image super-resolution (SISR). To obtain better fidelity and visual quality, most existing networks are of heavy design with massive computation. However, the computation resources of modern mobile devices are limited, which cannot easily support the expensive cost.…

    Submitted 16 August, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

  40. arXiv:2012.06867  [pdf, other]

    cs.SD cs.LG eess.AS

    VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge

    Authors: Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman

    Abstract: We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020. The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognize speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition and diarisation dataset from YouTube videos together…

    Submitted 12 December, 2020; originally announced December 2020.

  41. arXiv:2009.05399  [pdf, other]

    eess.SP physics.optics

    Investigation of analog signal distortion introduced by a fiber phase sensitive amplifier

    Authors: Debanuj Chatterjee, Yousra Bouasria, Weilin Xie, Tarek Labidi, Fabienne Goldfarb, Ihsan Fsaifes, Fabien Bretenaker

    Abstract: We numerically simulate the distortion of an analog signal carried in a microwave photonics link containing a phase sensitive amplifier (PSA), focusing mainly on amplitude modulation format. The numerical model is validated by comparison with experimental measurements. By using the well known two-tone test, we compare the situations in which a standard intensity modulator is used with the one wher…

    Submitted 10 September, 2020; originally announced September 2020.

    Journal ref: JOSA B 37, 2405 (2020)

  42. arXiv:2009.03591  [pdf

    eess.SP

    Are wave union methods still suitable for 20 nm FPGA-based high-resolution (< 2 ps) time-to-digital converters?

    Authors: Wujun Xie, Haochang Chen, David Day-Uei Li

    Abstract: This paper presents several new structures to pursue high-resolution (< 2 ps) time-to-digital converters (TDCs) in Xilinx 20 nm UltraScale field-programmable gate arrays (FPGAs). The proposed TDCs combined the advantages of 1) our newly proposed sub-tapped delay line (sub-TDL) architecture effective in removing bubbles and zero-bins and 2) the wave union (WU) A method to improve the resolution and…

    Submitted 8 September, 2020; originally announced September 2020.

  43. arXiv:2006.10915  [pdf, other

    cs.CR eess.SY

    Simulation-Based Digital Twin Development for Blockchain Enabled End-to-End Industrial Hemp Supply Chain Risk Management

    Authors: Keqi Wang, Wei Xie, Wencen Wu, Bo Wang, **xiang Pei, Mike Baker, Qi Zhou

    Abstract: With the passage of the 2018 U.S. Farm Bill, Industrial Hemp production is moved from limited pilot programs to a regulated agriculture production system. However, Industrial Hemp Supply Chain (IHSC) faces critical challenges, including: high complexity and variability, very limited production knowledge, lack of data and information tracking. In this paper, we propose blockchain-enabled IHSC and d…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 11 pages, 2 figures, 2020 Winter Simulation Conference

  44. arXiv:2006.02666  [pdf

    eess.IV cs.CV

    Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis

    Authors: Yesheng Xu, Ming Kong, Wenjia Xie, Run** Duan, Zhengqing Fang, Yuxiao Lin, Qiang Zhu, Siliang Tang, Fei Wu, Yu-Feng Yao

    Abstract: Infectious keratitis is the most common entities of corneal diseases, in which pathogen grows in the cornea leading to inflammation and destruction of the corneal tissues. Infectious keratitis is a medical emergency, for which a rapid and accurate diagnosis is needed for speedy initiation of prompt and precise treatment to halt the disease progress and to limit the extent of corneal damage; otherw…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Accepted by Engineering

  45. arXiv:2005.11684  [pdf

    eess.SP

    Deep Learning-based Modulation Detection for NOMA Systems

    Authors: Wenwu Xie, Jian Xiao, **xia Yang, Xin Peng, Chao Yu, Peng Zhu

    Abstract: Since the signal with strong power should be demodulated first for successive interference cancellation (SIC) demodulation in non-orthogonal multiple access (NOMA) systems, the base station (BS) should inform the near user terminal (UT), which has allocated higher power, of modulation mode of the far user terminal. To avoid unnecessary signaling overhead in this process, a blind detection algorith…

    Submitted 16 October, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

  46. arXiv:2004.14368  [pdf, other

    cs.CV cs.SD eess.AS

    VGGSound: A Large-scale Audio-Visual Dataset

    Authors: Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

    Abstract: Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques. The resulting dataset can be used for training and evaluating audio recognition models. We make three contributions. First, we propose a scalable pipeline based on computer vision techniques to create an audio dataset from open-source media. Our pipeline involves…

    Submitted 24 September, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: ICASSP2020

  47. Relational Modeling for Robust and Efficient Pulmonary Lobe Segmentation in CT Scans

    Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Bram van Ginneken

    Abstract: Pulmonary lobe segmentation in computed tomography scans is essential for regional assessment of pulmonary diseases. Recent works based on convolution neural networks have achieved good performance for this task. However, they are still limited in capturing structured relationships due to the nature of convolution. The shape of the pulmonary lobes affect each other and their borders relate to the…

    Submitted 12 May, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  48. 2.75D: Boosting learning by representing 3D Medical imaging to 2D features for small data

    Authors: Xin Wang, Ruisheng Su, Weiyi Xie, Wen** Wang, Yi Xu, Ritse Mann, Jungong Han, Tao Tan

    Abstract: In medical-data driven learning, 3D convolutional neural networks (CNNs) have started to show superior performance to 2D CNNs in numerous deep learning tasks, proving the added value of 3D spatial information in feature representation. However, the difficulty in collecting more training samples to converge, more computational resources and longer execution time make this approach less applied. Als…

    Submitted 22 January, 2024; v1 submitted 11 February, 2020; originally announced February 2020.

  49. arXiv:2002.00275  [pdf, other

    eess.SY

    Data-Driven Stochastic Optimization for Power Grids Scheduling under High Wind Penetration

    Authors: Wei Xie, Yuan Yi, Zhi Zhou, Keqi Wang

    Abstract: To address the environmental concern and improve the economic efficiency, the wind power is rapidly integrated into smart grids. However, the inherent uncertainty of wind energy raises operational challenges. To ensure the cost-efficient, reliable and robust operation, it is critically important to find the optimal decision that can correctly and rigorously hedge against all sources of uncertainty…

    Submitted 9 November, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

    Comments: 24 pages, 2 figures

  50. arXiv:1912.02522  [pdf, other

    cs.SD cs.LG eess.AS stat.ML

    VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge

    Authors: Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman

    Abstract: The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and workshop held at Inte…

    Submitted 5 December, 2019; originally announced December 2019.

    Comments: ISCA Archive