Showing 1–43 of 43 results for author: Hu, D

Searching in archive eess.
  1. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  2. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  3. arXiv:2311.13052  [pdf, other]

    eess.IV cs.CV cs.LG

    Novel OCT mosaicking pipeline with Feature- and Pixel-based registration

    Authors: Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz

    Abstract: High-resolution Optical Coherence Tomography (OCT) images are crucial for ophthalmology studies but are limited by their relatively narrow field of view (FoV). Image mosaicking is a technique for aligning multiple overlapping images to obtain a larger FoV. Current mosaicking pipelines often struggle with substantial noise and considerable displacement between the input sub-fields. In this paper, w…

    Submitted 21 November, 2023; originally announced November 2023.

  4. arXiv:2310.19721  [pdf, other]

    eess.IV cs.CV

    Promise: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models

    Authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

    Abstract: To address prevalent issues in medical imaging, such as data acquisition challenges and label availability, transfer learning from natural to medical image domains serves as a viable strategy to produce reliable segmentation results. However, several existing barriers between domains need to be broken down, including addressing contrast discrepancies, managing anatomical variability, and adapting…

    Submitted 13 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: updated acknowledgments and fixed typos

  5. arXiv:2309.11845  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification

    Authors: Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu

    Abstract: Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. To well handle the information of the multi-modal data is the key to a better audiovisual modal. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inheren…

    Submitted 26 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: This work has been accepted by ACM MM 2023 for publication

  6. arXiv:2309.07929  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

    Authors: Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li

    Abstract: Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the demanding zero-shot and few-shot scenarios. To achieve this goal, different from existing approaches that mostly employ the encoder-fusion-decoder paradigm…

    Submitted 2 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by AAAI 2024

  7. arXiv:2309.01384  [pdf]

    q-bio.QM eess.IV eess.SY

    Deep Learning Approach for Large-Scale, Real-Time Quantification of Green Fluorescent Protein-Labeled Biological Samples in Microreactors

    Authors: Yuanyuan Wei, Sai Mu Dalike Abaxi, Nawaz Mehmood, Luoquan Li, Fuyang Qu, Guangyao Cheng, Dehua Hu, Yi-Ping Ho, Scott Wu Yuan, Ho-Pui Ho

    Abstract: Absolute quantification of biological samples entails determining expression levels in precise numerical copies, offering enhanced accuracy and superior performance for rare templates. However, existing methodologies suffer from significant limitations: flow cytometers are both costly and intricate, while fluorescence imaging relying on software tools or manual counting is time-consuming and prone…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 23 pages, 6 figures, 1 table

  8. arXiv:2308.06377  [pdf, other]

    eess.IV cs.CV

    CATS v2: Hybrid encoders for robust medical segmentation

    Authors: Hao Li, Han Liu, Dewei Hu, Xing Yao, Jiacheng Wang, Ipek Oguz

    Abstract: Convolutional Neural Networks (CNNs) have exhibited strong performance in medical image segmentation tasks by capturing high-level (local) information, such as edges and textures. However, due to the limited field of view of convolution kernel, it is hard for CNNs to fully represent global information. Recently, transformers have shown good performance for medical image segmentation due to their a…

    Submitted 31 January, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

  9. arXiv:2308.03777  [pdf]

    physics.bio-ph eess.IV eess.SP

    Lab-in-a-Tube: A portable imaging spectrophotometer for cost-effective, high-throughput, and label-free analysis of centrifugation processes

    Authors: Yuanyuan Wei, Dehua Hu, Bijie Bai, Chenqi Meng, Tsz Kin Chan, Xing Zhao, Yuye Wang, Yi-Ping Ho, Wu Yuan, Ho-Pui Ho

    Abstract: Centrifuges serve as essential instruments in modern experimental sciences, facilitating a wide range of routine sample processing tasks that necessitate material sedimentation. However, the study for real time observation of the dynamical process during centrifugation has remained elusive. In this study, we developed an innovative Lab_in_a_Tube imaging spectrophotometer that incorporates capabili…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 21 Pages, 6 Figures

  10. arXiv:2308.01515  [pdf, ps, other]

    cs.IT eess.SP

    Hierarchical Codebook Design and Analytical Beamforming Solution for IRS Assisted Communication

    Authors: Xiyuan Liu, Qingqing Wu, Die Hu, Rui Wang, Jun Wu

    Abstract: In intelligent reflecting surface (IRS) assisted communication, beam search is usually time-consuming as the multiple-input multiple-output (MIMO) of IRS is usually very large. Hierarchical codebooks is a widely accepted method for reducing the complexity of searching time. The performance of this method strongly depends on the design scheme of beamforming of different beamwidths. In this paper, a…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 33 pages, 11 figures

  11. arXiv:2307.07932  [pdf, other]

    eess.IV cs.LG

    A Novel Truncated Norm Regularization Method for Multi-channel Color Image Denoising

    Authors: Yiwen Shan, Dong Hu, Zhi Wang

    Abstract: Due to the high flexibility and remarkable performance, low-rank approximation methods has been widely studied for color image denoising. However, those methods mostly ignore either the cross-channel difference or the spatial variation of noise, which limits their capacity in real world color image denoising. To overcome those drawbacks, this paper is proposed to denoise color images with a double…

    Submitted 3 March, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

  12. arXiv:2307.00511  [pdf]

    eess.IV cs.CV cs.LG q-bio.NC

    SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

    Authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, ** Zhang, Dan Hu, Danhong Wang, Hesheng Liu

    Abstract: Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a lea…

    Submitted 2 July, 2023; originally announced July 2023.

  13. arXiv:2307.00245  [pdf, other]

    eess.IV cs.CV

    Deep Angiogram: Trivializing Retinal Vessel Segmentation

    Authors: Dewei Hu, Xing Yao, Jiacheng Wang, Yuankai K. Tao, Ipek Oguz

    Abstract: Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: 5 pages, 4 figures, SPIE 2023

    Journal ref: In Medical Imaging 2023: Image Processing, vol. 12464, pp. 656-660. SPIE, 2023

  14. arXiv:2306.00499  [pdf, other]

    eess.IV cs.CV

    DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation

    Authors: Yifan Gao, Wei Xia, Dingdu Hu, Xin Gao

    Abstract: Deep learning based automatic medical image segmentation models often suffer from domain shift, where the models trained on a source domain do not generalize well to other unseen domains. As a vision foundation model with powerful generalization capabilities, Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation. However, SAM and its f…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 12 pages. The code is available at https://github.com/yifangao112/DeSAM

  15. arXiv:2305.17993  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Multi-Scale Attention for Audio Question Answering

    Authors: Guangyao Li, Yixin Xu, Di Hu

    Abstract: Audio question answering (AQA), acting as a widely used proxy task to explore scene understanding, has got more attention. The AQA is challenging for it requires comprehensive temporal reasoning from different scales' events of an audio scene. However, existing methods mostly extend the structures of visual question answering task to audio ones in a simple pattern but may not perform well when per…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted by InterSpeech 2023

  16. arXiv:2303.05338  [pdf, other]

    cs.SD cs.MM eess.AS

    MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

    Authors: Ruize Xu, Ruoxuan Feng, Shi-Xiong Zhang, Di Hu

    Abstract: Audio-visual learning helps to comprehensively understand the world by fusing practical information from multiple modalities. However, recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a bottleneck to enhancing the model's performance. We further find that the up-to-date imbalance-mitigating methods fail on some audio-visual fine-grained tasks,…

    Submitted 11 March, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  17. arXiv:2303.05026  [pdf, other]

    cs.CV cs.LG eess.IV

    SSL^2: Self-Supervised Learning meets Semi-Supervised Learning: Multiple Sclerosis Segmentation in 7T-MRI from large-scale 3T-MRI

    Authors: Jiacheng Wang, Hao Li, Han Liu, Dewei Hu, Daiwei Lu, Kee** Yoon, Kelsey Barter, Francesca Bagnato, Ipek Oguz

    Abstract: Automated segmentation of multiple sclerosis (MS) lesions from MRI scans is important to quantify disease progression. In recent years, convolutional neural networks (CNNs) have shown top performance for this task when a large amount of labeled data is available. However, the accuracy of CNNs suffers when dealing with few and/or sparsely labeled datasets. A potential solution is to leverage the in…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted at the International Society for Optics and Photonics - Medical Imaging (SPIE-MI) 2023

  18. arXiv:2302.03533  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Revisiting Pre-training in Audio-Visual Learning

    Authors: Ruoxuan Feng, Wenke Xia, Di Hu

    Abstract: Pre-training technique has gained tremendous success in enhancing model performance on various tasks, but found to perform worse than training from scratch in some uni-modal situations. This inspires us to think: are the pre-trained models always effective in the more complex multi-modal scenario, especially for the heterogeneous modalities such as audio and visual ones? We find that the answer is…

    Submitted 17 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  19. arXiv:2210.15364  [pdf, other]

    cs.SD cs.AI eess.AS

    Explicit Intensity Control for Accented Text-to-speech

    Authors: Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li

    Abstract: Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1). How to control the intensity of accent in the process of TTS is a very interesting research direction, and has attracted more and more attention. Recent work design a speaker-adversarial loss to disentangle the speaker and accent information, and then adjust the loss weig…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures. Submitted to ICASSP 2023. arXiv admin note: text overlap with arXiv:2209.10804

  20. arXiv:2209.08094  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-channel Nuclear Norm Minus Frobenius Norm Minimization for Color Image Denoising

    Authors: Yiwen Shan, Dong Hu, Zhi Wang, Tao Jia

    Abstract: Color image denoising is frequently encountered in various image processing and computer vision tasks. One traditional strategy is to convert the RGB image to a less correlated color space and denoise each channel of the new space separately. However, such a strategy can not fully exploit the correlated information between channels and is inadequate to obtain satisfactory results. To address this…

    Submitted 16 September, 2022; originally announced September 2022.

  21. arXiv:2208.11572  [pdf, other]

    eess.IV cs.CV

    Cats: Complementary CNN and Transformer Encoders for Segmentation

    Authors: Hao Li, Dewei Hu, Han Liu, Jiacheng Wang, Ipek Oguz

    Abstract: Recently, deep learning methods have achieved state-of-the-art performance in many medical image segmentation tasks. Many of these are based on convolutional neural networks (CNNs). For such methods, the encoder is the key part for global and local information extraction from input images; the extracted features are then passed to the decoder for predicting the segmentations. In contrast, several…

    Submitted 24 August, 2022; originally announced August 2022.

  22. arXiv:2205.02845  [pdf, other]

    eess.IV cs.CV

    Invariant Content Synergistic Learning for Domain Generalization of Medical Image Segmentation

    Authors: Yuxin Kang, Hansheng Li, Xuan Zhao, Dongqing Hu, Feihong Liu, Lei Cui, Jun Feng, Lin Yang

    Abstract: While achieving remarkable success for medical image segmentation, deep convolution neural networks (DCNNs) often fail to maintain their robustness when confronting test data with the novel distribution. To address such a drawback, the inductive bias of DCNNs is recently well-recognized. Specifically, DCNNs exhibit an inductive bias towards image style (e.g., superficial texture) rather than invar…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 10 pages, 5 figures

  23. arXiv:2204.06455  [pdf, other]

    eess.IV cs.CV

    WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma

    Authors: Chu Han, Xipeng Pan, Lixu Yan, Huan Lin, Bingbing Li, Su Yao, Shanshan Lv, Zhenwei Shi, **hai Mai, Jiatai Lin, Bingchao Zhao, Zeyan Xu, Zhizhen Wang, Yumeng Wang, Yuan Zhang, Huihui Wang, Chao Zhu, Chunhui Lin, Lijian Mao, Min Wu, Luwen Duan, **gsong Zhu, Dong Hu, Zijie Fang, Yang Chen, et al. (18 additional authors not shown)

    Abstract: Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is the most common subtype. Exploiting the potential value of the histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient…

    Submitted 13 April, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

  24. arXiv:2203.13535  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

    Authors: Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

    Abstract: Recent years have witnessed the success of deep learning on the visual sound separation task. However, existing works follow similar settings where the training and testing datasets share the same musical instrument categories, which to some extent limits the versatility of this task. In this work, we focus on a more general and challenging scenario, namely the separation of unknown musical instru…

    Submitted 25 March, 2022; originally announced March 2022.

  25. arXiv:2203.04959  [pdf, other]

    eess.IV cs.CV

    ModDrop++: A Dynamic Filter Network with Intra-subject Co-training for Multiple Sclerosis Lesion Segmentation with Missing Modalities

    Authors: Han Liu, Yubo Fan, Hao Li, Jiacheng Wang, Dewei Hu, Can Cui, Ho Hin Lee, Huahong Zhang, Ipek Oguz

    Abstract: Multiple Sclerosis (MS) is a chronic neuroinflammatory disease and multi-modality MRIs are routinely used to monitor MS lesions. Many automatic MS lesion segmentation models have been developed and have reached human-level performance. However, most established methods assume the MRI modalities used during training are also available during testing, which is not guaranteed in clinical practice. Pr…

    Submitted 1 July, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: MICCAI 2022

  26. arXiv:2203.00262  [pdf, ps, other]

    eess.IV cs.CV

    Separable-HoverNet and Instance-YOLO for Colon Nuclei Identification and Counting

    Authors: Chunhui Lin, Liukun Zhang, Lijian Mao, Min Wu, Dong Hu

    Abstract: Nuclear segmentation, classification and quantification within Haematoxylin & Eosin stained histology images enables the extraction of interpretable cell-based features that can be used in downstream explainable models in computational pathology (CPath). However, automatic recognition of different nuclei is faced with a major challenge in that there are several different types of nuclei, some of t…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.14485 by other authors

  27. arXiv:2202.06406  [pdf, other]

    cs.CV cs.SD eess.AS

    Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

    Authors: Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou

    Abstract: The task of audio-visual sound source localization has been well studied under constrained scenes, where the audio recordings are clean. However, in real-world scenarios, audios are usually contaminated by off-screen sound and background noise. They will interfere with the procedure of identifying desired sources and building visual-sound connections, making previous studies non-applicable. In thi…

    Submitted 13 February, 2022; originally announced February 2022.

    Comments: Accepted by AAAI Conference on Artificial Intelligence (AAAI) 2022. 16 pages

  28. arXiv:2112.08133  [pdf]

    physics.ins-det eess.IV physics.optics

    Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution

    Authors: Shaowei Jiang, Chengfei Guo, Zichao Bian, Ruihai Wang, Jiakai Zhu, Pengming Song, Patrick Hu, Derek Hu, Zibang Zhang, Kazunori Hoshino, Bin Feng, Guoan Zheng

    Abstract: Traditional microbial detection methods often rely on the overall property of microbial cultures and cannot resolve individual growth event at high spatiotemporal resolution. As a result, they require bacteria to grow to confluence and then interpret the results. Here, we demonstrate the application of an integrated ptychographic sensor for lensless cytometric analysis of microbial cultures over a…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: 18 pages, 6 figures

  29. High-throughput lensless whole slide imaging via continuous height-varying modulation of tilted sensor

    Authors: Shaowei Jiang, Chengfei Guo, Patrick Hu, Derek Hu, Pengming Song, Tianbo Wang, Zichao Bian, Zibang Zhang, Guoan Zheng

    Abstract: We report a new lensless microscopy configuration by integrating the concepts of transverse translational ptychography and defocus multi-height phase retrieval. In this approach, we place a tilted image sensor under the specimen for linearly-increasing phase modulation along one lateral direction. Similar to the operation of ptychography, we laterally translate the specimen and acquire the diffrac…

    Submitted 28 September, 2021; originally announced October 2021.

  30. arXiv:2109.12169  [pdf, other]

    eess.IV cs.CV

    Unsupervised Cross-Modality Domain Adaptation for Segmenting Vestibular Schwannoma and Cochlea with Data Augmentation and Model Ensemble

    Authors: Hao Li, Dewei Hu, Qibang Zhu, Kathleen E. Larson, Huahong Zhang, Ipek Oguz

    Abstract: Magnetic resonance images (MRIs) are widely used to quantify vestibular schwannoma and the cochlea. Recently, deep learning methods have shown state-of-the-art performance for segmenting these structures. However, training segmentation models may require manual labels in target domain, which is expensive and time-consuming. To overcome this problem, domain adaptation is an effective way to leverag…

    Submitted 24 August, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

  31. arXiv:2108.09103  [pdf, other]

    cs.LG cs.NI eess.SY

    Mobility-Aware Cluster Federated Learning in Hierarchical Wireless Networks

    Authors: Chenyuan Feng, Howard H. Yang, Deshun Hu, Zhiwei Zhao, Tony Q. S. Quek, Geyong Min

    Abstract: Implementing federated learning (FL) algorithms in wireless networks has garnered a wide range of attention. However, few works have considered the impact of user mobility on the learning performance. To fill this research gap, firstly, we develop a theoretical model to characterize the hierarchical federated learning (HFL) algorithm in wireless networks where the mobile users may roam across mult…

    Submitted 20 August, 2021; originally announced August 2021.

  32. arXiv:2107.04288  [pdf, other]

    eess.IV cs.CV

    Retinal OCT Denoising with Pseudo-Multimodal Fusion Network

    Authors: Dewei Hu, Joseph D. Malone, Yigit Atay, Yuankai K. Tao, Ipek Oguz

    Abstract: Optical coherence tomography (OCT) is a prevalent imaging technique for retina. However, it is affected by multiplicative speckle noise that can degrade the visibility of essential anatomical structures, including blood vessels and tissue layers. Although averaging repeated B-scan frames can significantly improve the signal-to-noise-ratio (SNR), this requires longer acquisition time, which can int…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted by International Workshop on Ophthalmic Medical Image Analysis (OMIA) 2020

  33. arXiv:2107.04282  [pdf, other]

    eess.IV cs.CV

    LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation

    Authors: Dewei Hu, Can Cui, Hao Li, Kathleen E. Larson, Yuankai K. Tao, Ipek Oguz

    Abstract: Optical coherence tomography (OCT) is a non-invasive imaging technique widely used for ophthalmology. It can be extended to OCT angiography (OCT-A), which reveals the retinal vasculature with improved contrast. Recent deep learning algorithms produced promising vascular segmentation results; however, 3D retinal vessel segmentation remains difficult due to the lack of manually annotated training da…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted by International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021

  34. arXiv:2104.02026  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation

    Authors: Yapeng Tian, Di Hu, Chenliang Xu

    Abstract: There are rich synchronized audio and visual events in our daily life. Inside the events, audio scenes are associated with the corresponding visual objects; meanwhile, sounding objects can indicate and help to separate their individual sounds in the audio track. Based on this observation, in this paper, we propose a cyclic co-learning (CCoL) paradigm that can jointly learn sounding object visual g…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  35. arXiv:2103.02791  [pdf, ps, other]

    eess.SP

    Hybrid Interference Mitigation Using Analog Prewhitening

    Authors: Wei Zhang, Yi Jiang, Bin Zhou, Die Hu

    Abstract: This paper proposes a novel scheme for mitigating strong interferences, which is applicable to various wireless scenarios, including full-duplex wireless communications and uncoordinated heterogenous networks. As strong interferences can saturate the receiver's analog-to-digital converters (ADC), they need to be mitigated both before and after the ADCs, i.e., via hybrid processing. The key idea of…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: 11 pages, 11 figures

  36. arXiv:2102.06102  [pdf, other]

    eess.IV cs.CV

    Deep Iteration Assisted by Multi-level Obey-pixel Network Discriminator (DIAMOND) for Medical Image Recovery

    Authors: Moran Xu, Dianlin Hu, Weifei Wu, Weiwen Wu

    Abstract: Image restoration is a typical ill-posed problem, and it contains various tasks. In the medical imaging field, an ill-posed image interrupts diagnosis and even following image processing. Both traditional iterative and up-to-date deep networks have attracted much attention and obtained a significant improvement in reconstructing satisfying images. This study combines their advantages into one unif…

    Submitted 8 February, 2021; originally announced February 2021.

  37. arXiv:2010.05466  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

    Authors: Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou

    Abstract: Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound scenes, is commonplace for humans, but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in the s…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: To appear in NeurIPS 2020. Previous Title: Learning to Discriminatively Localize Sounding Objects in a Cocktail-party Scenario

  38. arXiv:2008.13570  [pdf]

    eess.IV

    Deep Learning based Spectral CT Imaging

    Authors: Weiwen Wu, Dianlin Hu, Chuang Niu, Lieza Vanden Broeke, Anthony P. H. Butler, Peng Cao, James Atlas, Alexander Chernoglazov, Varut Vardhanabhuti, Ge Wang

    Abstract: Spectral computed tomography (CT) has attracted much attention in radiation dose reduction, metal artifacts removal, tissue quantification and material discrimination. The x-ray energy spectrum is divided into several bins, each energy-bin-specific projection has a low signal-noise-ratio (SNR) than the current-integrating counterpart, which makes image reconstruction a unique challenge. Traditiona…

    Submitted 25 August, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

  39. arXiv:2008.01846  [pdf]

    eess.IV cs.CV cs.LG

    Stabilizing Deep Tomographic Reconstruction

    Authors: Weiwen Wu, Dianlin Hu, Wenxiang Cong, Hongming Shan, Shaoyu Wang, Chuang Niu, **kun Yan, Hengyong Yu, Varut Vardhanabhuti, Ge Wang

    Abstract: Tomographic image reconstruction with deep learning is an emerging field, but a recent landmark study reveals that several deep reconstruction networks are unstable for computed tomography (CT) and magnetic resonance imaging (MRI). Specifically, three kinds of instabilities were reported: (1) strong image artefacts from tiny perturbations, (2) small features missing in a deeply reconstructed image…

    Submitted 13 September, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: 78 pages, 30 figures, 149 references

  40. arXiv:1904.09115  [pdf, other]

    cs.CV cs.HC cs.MM cs.SD eess.AS

    Listen to the Image

    Authors: Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang

    Abstract: Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to the toilsome human-based assessment, we argue that machine model can be also developed for…

    Submitted 19 April, 2019; originally announced April 2019.

    Comments: Accepted by CVPR2019

  41. arXiv:1807.03094  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Deep Multimodal Clustering for Unsupervised Audiovisual Learning

    Authors: Di Hu, Feiping Nie, Xuelong Li

    Abstract: The seen birds twitter, the running cars accompany with noise, etc. These naturally audiovisual correspondences provide the possibilities to explore and understand the outside world. However, the mixed multiple objects and sounds make it intractable to perform efficient matching in the unconstrained environment. To settle this problem, we propose to adequately excavate audio and visual components…

    Submitted 19 April, 2019; v1 submitted 9 July, 2018; originally announced July 2018.

    Comments: Accepted by CVPR2019

  42. A New Technique for INS/GNSS Attitude and Parameter Estimation Using Online Optimization

    Authors: Yuanxin Wu, Jinling Wang, Dewen Hu

    Abstract: Integration of inertial navigation system (INS) and global navigation satellite system (GNSS) is usually implemented in engineering applications by way of Kalman-like filtering. This form of INS/GNSS integration is prone to attitude initialization failure, especially when the host vehicle is moving freely. This paper proposes an online constrained-optimization method to simultaneously estimate the…

    Submitted 10 March, 2014; originally announced March 2014.

    Comments: IEEE Trans. on Signal Processing, to appear

    Journal ref: IEEE Trans. on Signal Processing, 62 (10), 2642 - 2655, 2014

  43. Observability of Strapdown INS Alignment: A Global Perspective

    Authors: Yuanxin Wu, Hongliang Zhang, Meiping Wu, ** Hu, Dewen Hu

    Abstract: Alignment of the strapdown inertial navigation system (INS) has strong nonlinearity, even worse when maneuvers, e.g., tumbling techniques, are employed to improve the alignment. There is no general rule to attack the observability of a nonlinear system, so most previous works addressed the observability of the corresponding linearized system by implicitly assuming that the original nonlinear syste…

    Submitted 22 December, 2011; originally announced December 2011.

    Comments: 25 pages; IEEE Trans. on Aerospace and Electronic Systems, Jan. 2012

    Journal ref: IEEE Trans. on Aerospace and Electronic Systems, 48(1), pp. 78-102, 2012