Search | arXiv e-print repository

JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

Authors: Boyu Chen, Peike Li, Yao Yao, Alex Wang

Abstract: Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper, we propose a novel method for customized text-to-music generation, which can capture the concept from a two-minute reference music and generate a new piece of music conforming to the concept. We achieve this by fine-tuning a pretrained text-to-music model using the reference music. However, directly fine-tuning all parameters leads to overfitting issues. To address this problem, we propose a Pivotal Parameters Tuning method that enables the model to assimilate the new concept while preserving its original generative capabilities. Additionally, we identify a potential concept conflict when introducing multiple concepts into the pretrained model. We present a concept enhancement strategy to distinguish multiple concepts, enabling the fine-tuned model to generate music incorporating either individual or multiple concepts simultaneously. Since we are the first to work on the customized music generation task, we also introduce a new dataset and evaluation protocol for the new task. Our proposed Jen1-DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. Demos will be available at https://www.jenmusic.ai/research#DreamStyler.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.08672 [pdf, other]

EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC), an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundation model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and without knowledge of the ground-truth camera intrinsics. Code is available at https://github.com/BeileiCui/EndoDAC.

Submitted 14 May, 2024; originally announced May 2024.

Comments: early accepted by MICCAI 2024

arXiv:2404.10640 [pdf, other]

Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos

Authors: Jieming Yu, Long Bai, Guankun Wang, An Wang, Xiaoxiao Yang, Huxin Gao, Hongliang Ren

Abstract: The precise tracking and segmentation of surgical instruments have led to a remarkable enhancement in the efficiency of surgical procedures. However, the challenge lies in achieving accurate segmentation of surgical instruments while minimizing the need for manual annotation and reducing the time required for the segmentation process. To tackle this, we propose a novel framework for surgical instrument segmentation and tracking. Specifically, with a tiny subset of frames for segmentation, we ensure accurate segmentation across the entire surgical video. Our method adopts a two-stage approach to efficiently segment videos. Initially, we utilize the Segment-Anything (SAM) model, which has been fine-tuned using Low-Rank Adaptation (LoRA) on the EndoVis17 dataset. The fine-tuned SAM model is applied to segment the initial frames of the video accurately. Subsequently, we deploy the XMem++ tracking algorithm to follow the annotated frames, thereby facilitating the segmentation of the entire video sequence. This workflow enables us to precisely segment and track objects within the video. Through extensive evaluation on the in-distribution dataset (EndoVis17) and the out-of-distribution datasets (EndoVis18 and the endoscopic submucosal dissection surgery (ESD) dataset), our framework demonstrates exceptional accuracy and robustness, thus showcasing its potential to advance automated robot-assisted surgery.

Submitted 16 April, 2024; originally announced April 2024.

Comments: To appear in IEEE ICRA 2024 C4SR+ Workshop
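The Low-Rank Adaptation (LoRA) step mentioned in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' SAM fine-tuning code: the class name `LoRALinear`, the rank, the scaling factor `alpha`, and the initialization are assumptions chosen for illustration. The idea shown is that a frozen weight matrix W is augmented with a trainable low-rank product, so only the two small factors need to be updated.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of plain LoRA, not code from the paper.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero at start
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the low-rank path is added to the frozen path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would receive gradient updates.
        return self.A.size + self.B.size
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and for a 64 x 128 weight with rank 4 the trainable factors hold 768 values instead of 8192.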

arXiv:2404.08199 [pdf, other]

doi 10.1109/TCSII.2023.3266594

Cepstral Analysis Based Artifact Detection, Recognition and Removal for Prefrontal EEG

Authors: Siqi Han, Chao Zhang, Jiaxin Lei, Qingquan Han, Yuhui Du, Anhe Wang, Shuo Bai, Milin Zhang

Abstract: This paper proposes to use the cepstrum for artifact detection, recognition and removal in prefrontal EEG. This work focuses on the artifact caused by eye movement. A database containing artifact-free EEG and eye movement contaminated EEG from different subjects is established. A cepstral analysis-based feature extraction with a support vector machine (SVM) based classifier is designed to identify the artifacts from the target EEG signals. The proposed method achieves an accuracy of 99.62% on the artifact detection task and an accuracy of 82.79% on the 6-category eye movement classification task. A statistical value-based artifact removal method is proposed and evaluated on a public EEG database, where an accuracy improvement of 3.46% is obtained on the 3-category emotion classification task. To make a confident decision for each 5 s EEG segment, the algorithm requires only 0.66M multiplication operations. Compared to the state-of-the-art approaches in artifact detection and removal, the proposed method features higher detection accuracy and lower computational cost, which makes it a more suitable solution to be integrated into a real-time and artifact-robust Brain-Machine Interface (BMI).

Submitted 11 April, 2024; originally announced April 2024.

Comments: 5 pages, 4 figures, published by TCAS-II

Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023
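As a rough illustration of the cepstral analysis named in the abstract above, the real cepstrum of a segment can be computed as the inverse FFT of the log magnitude spectrum. The sketch below uses the textbook definition only; the function names are hypothetical, and the paper's actual feature extraction, windowing, and SVM classifier are not reproduced here.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of x."""
    spectrum = np.fft.fft(x)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_magnitude).real


def dominant_quefrency(x, min_quefrency=1):
    """Index of the largest cepstral coefficient in [min_quefrency, len(x) // 2)."""
    cepstrum = real_cepstrum(x)
    half = len(x) // 2
    return int(np.argmax(cepstrum[min_quefrency:half])) + min_quefrency
```

A periodic ripple in the spectrum, such as the one produced by adding a delayed copy of a signal to itself, shows up as a peak at the quefrency equal to the delay, which is what makes cepstral coefficients usable as compact features.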

arXiv:2403.15411 [pdf, other]

UAV Deployment Optimization in UAV-assisted Wireless Communications

Authors: Xueqi Zhang, Aimin Wang, Geng Sun, Lingling Liu, **g Zhang

Abstract: Because the locations of base stations (BSs) cannot be changed after they are installed, it is very difficult to communicate directly with remote user equipment (UE), which directly affects the lifespan of the system. Unmanned aerial vehicles (UAVs) offer a promising solution as mobile relays for fifth-generation wireless communications due to their flexible and cost-effective deployment. However, with the limited onboard energy of UAVs and slow progress in energy storage technology, achieving energy-efficient communication is a key challenge. Therefore, in this article, we study a wireless communication network using a UAV as a high-altitude relay, and formulate a UAV relay deployment optimization problem (URDOP) to minimize the energy consumption of the system by optimizing the deployment of the UAV, including the locations and number of UAV hover points. Since the formulated URDOP is a mixed-integer programming problem, it presents a significant challenge for conventional gradient-based approaches. To this end, we propose a self-adaptive differential evolution with a variable population size (SaDEVPS) algorithm to solve the formulated URDOP. The performance of the proposed SaDEVPS is verified through simulations, and the results show that it successfully decreases the energy consumption of the system compared to other benchmark algorithms across multiple instances.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures, 2 tables
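For readers unfamiliar with differential evolution, the gradient-free search family the abstract above builds on, here is a minimal sketch of the classic DE/rand/1/bin scheme on a toy objective. It is plain differential evolution with a fixed population, not the self-adaptive, variable-population SaDEVPS of the paper; the function name, the parameter values, and the sphere objective used in the usage note are all assumptions for illustration.

```python
import numpy as np


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over box bounds with classic DE/rand/1/bin (illustrative only)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover; one randomly chosen coordinate always comes from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            trial_fitness = f(trial)
            # Greedy selection: keep the trial only if it is at least as good.
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])
```

Calling `differential_evolution(lambda x: float(np.sum(x ** 2)), [(-5, 5)] * 3)` searches a three-dimensional box for the minimum of a sphere function. Handling integer variables such as the number of hover points would need an additional encoding step that this sketch omits.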

arXiv:2403.15410 [pdf, other]

Secure and Energy-efficient Unmanned Aerial Vehicle-enabled Visible Light Communication via A Multi-objective Optimization Approach

Authors: Lingling Liu, Aimin Wang, **g Wu, Jiao Lu, Jiahui Li, Geng Sun

Abstract: In this research, a unique approach to providing communication service for terrestrial receivers using unmanned aerial vehicle-enabled visible light communication is investigated. Specifically, we consider an unmanned aerial vehicle-enabled visible light communication scenario with multiple transmitters, multiple receivers, and a single eavesdropper, each of which is equipped with a single photodetector. Then, an unmanned aerial vehicle deployment multi-objective optimization problem is formulated to simultaneously make the optical power received by the receiving surface more uniform, minimize the amount of information collected by an eavesdropper, and minimize the energy consumption of unmanned aerial vehicles, while the locations and transmission power of unmanned aerial vehicles are simultaneously optimized under certain constraints. Since the formulated unmanned aerial vehicle deployment multi-objective optimization problem is complex and nonlinear, it is challenging to tackle using conventional methods. To solve the problem, a multi-objective evolutionary algorithm based on decomposition with chaos initiation and crossover mutation is proposed. Simulation outcomes show that the proposed approach is superior to other approaches, and is efficient at improving the security and energy efficiency of the visible light communication system.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 18 pages, 9 tables, 3 tables

arXiv:2403.12985 [pdf, other]

Multi-objective Optimization for Data Collection in UAV-assisted Agricultural IoT

Authors: Lingling Liu, Aimin Wang, Geng Sun, Jiahui Li, Hongyang Pan, Tony Q. S. Quek

Abstract: Ground fixed base stations (BSs) are often deployed inflexibly, have high overheads, and are susceptible to damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve the network coverage and performance of wireless communication, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks; therefore, in this work we consider employing a UAV as an aerial BS to acquire data from agricultural Internet of Things (IoT) devices. To this end, we first formulate a UAV-assisted data collection multi-objective optimization problem (UDCMOP) to efficiently collect the data from agricultural sensing devices. Specifically, we aim to collaboratively optimize the hovering positions of the UAV, the visit sequence of the UAV, the speed of the UAV, and the transmit power of devices, to simultaneously achieve the maximization of the minimum transmit rate of devices, the minimization of the total energy consumption of devices, and the minimization of the total energy consumption of the UAV. Second, the proposed UDCMOP is a non-convex mixed-integer nonlinear optimization problem, which indicates that it includes continuous and discrete solutions, making it intractable to solve. Therefore, we solve it by proposing an improved multi-objective artificial hummingbird algorithm (IMOAHA) with several specific improvement factors, namely the hybrid initialization operator, the Cauchy mutation foraging operator, and the discrete mutation operator. Finally, simulations are carried out to verify that the proposed IMOAHA can effectively improve the system performance compared to other benchmarks.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, 4 tables

arXiv:2403.09327 [pdf, other]

Perspective-Equivariant Imaging: an Unsupervised Framework for Multispectral Pansharpening

Authors: Andrew Wang, Mike Davies

Abstract: Ill-posed image reconstruction problems appear in many scenarios such as remote sensing, where obtaining high quality images is crucial for environmental monitoring, disaster management and urban planning. Deep learning has seen great success in overcoming the limitations of traditional methods. However, these inverse problems rarely come with ground truth data, highlighting the importance of unsupervised learning from partial and noisy measurements alone. We propose perspective-equivariant imaging (EI), a framework that leverages perspective variability in optical camera-based imaging systems, such as satellites or handheld cameras, to recover information lost in ill-posed optical camera imaging problems. This extends previous EI work to include a much richer non-linear class of group transforms and is shown to be an excellent prior for satellite and urban image data, where perspective-EI achieves state-of-the-art results in multispectral pansharpening, outperforming other unsupervised methods in the literature. Code at https://andrewwango.github.io/perspective-equivariant-imaging

Submitted 14 March, 2024; originally announced March 2024.

Comments: Pre-print

arXiv:2403.06459 [pdf, other]

From Pixel to Cancer: Cellular Automata in Computed Tomography

Authors: Yuxiang Lai, Xiaoxi Chen, Angtian Wang, Alan Yuille, Zongwei Zhou

Abstract: AI for cancer detection encounters the bottleneck of data scarcity, annotation difficulty, and low prevalence of early tumors. Tumor synthesis seeks to create artificial tumors in medical images, which can greatly diversify the data and annotations for AI training. However, current tumor synthesis approaches are not applicable across different organs due to their need for specific expertise and design. This paper establishes a set of generic rules to simulate tumor development. Each cell (pixel) is initially assigned a state between zero and ten to represent the tumor population, and a tumor can be developed based on three rules to describe the process of growth, invasion, and death. We apply these three generic rules to simulate tumor development--from pixel to cancer--using cellular automata. We then integrate the tumor state into the original computed tomography (CT) images to generate synthetic tumors across different organs. This tumor synthesis approach allows for sampling tumors at multiple stages and analyzing tumor-organ interaction. Clinically, a reader study involving three expert radiologists reveals that the synthetic tumors and their developing trajectories are convincingly realistic. Technically, we generate tumors at varied stages in 9,262 raw, unlabeled CT images sourced from 68 hospitals worldwide. The performance in segmenting tumors in the liver, pancreas, and kidneys exceeds prevailing literature benchmarks, underlining the immense potential of tumor synthesis, especially for earlier cancer detection. The code and models are available at https://github.com/MrGiovanni/Pixel2Cancer

Submitted 11 March, 2024; originally announced March 2024.
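The abstract above describes cells with states from zero to ten that evolve under growth, invasion, and death rules, but does not spell the rules out. The toy cellular automaton below is only a sketch of that general idea: every threshold, probability, and rule definition (`invade_at`, `invade_p`, the "fully surrounded cells die" rule, the separate `dead` mask) is an assumption made for illustration and should not be read as the rules used in the paper.

```python
import numpy as np

MAX_STATE = 10  # the abstract gives states between zero and ten


def _neighbours(grid):
    """Stack the four axis-aligned neighbours of every cell (zero beyond the border)."""
    p = np.pad(grid, 1, mode="constant", constant_values=0)
    return np.stack([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])


def step(grid, dead, rng, invade_at=5, invade_p=0.5):
    """One synchronous update with three hypothetical rules."""
    nb = _neighbours(grid)
    new = grid.copy()
    # Growth: every tumor cell below the maximum gains one unit of population.
    growing = (grid > 0) & (grid < MAX_STATE)
    new[growing] += 1
    # Invasion: an empty cell next to a sufficiently dense cell may become tumor.
    frontier = (grid == 0) & (nb.max(axis=0) >= invade_at)
    new[frontier & (rng.random(grid.shape) < invade_p)] = 1
    # Death: a saturated cell fully surrounded by saturated cells is marked dead.
    new_dead = dead | ((grid == MAX_STATE) & (nb.min(axis=0) == MAX_STATE))
    return new, new_dead


def simulate(shape=(21, 21), steps=30, seed=0):
    """Grow a tumor from a single seeded pixel at the centre of the grid."""
    rng = np.random.default_rng(seed)
    grid = np.zeros(shape, dtype=np.int64)
    dead = np.zeros(shape, dtype=bool)
    grid[shape[0] // 2, shape[1] // 2] = 1
    for _ in range(steps):
        grid, dead = step(grid, dead, rng)
    return grid, dead
```

Stopping `simulate` after different numbers of steps gives tumor maps at different stages, which loosely mirrors the multi-stage sampling the abstract mentions. Blending the state map into CT intensities is a separate step not sketched here.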

arXiv:2403.04116 [pdf, other]

Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis

Authors: Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille

Abstract: X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while requiring less than 15% of the training time and achieving over 73x the inference speed. The application to sparse-view CT reconstruction also reveals the practical value of our method. Code and models will be publicly available at https://github.com/caiyuanhao1998/X-Gaussian . A video demo of the training process visualization is at https://www.youtube.com/watch?v=gDVf_Ngeghg .

Submitted 6 March, 2024; originally announced March 2024.

Comments: The first 3D Gaussian Splatting-based method for X-ray 3D reconstruction

arXiv:2402.08159 [pdf, other]

Poisson flow consistency models for low-dose CT image denoising

Authors: Dennis Hein, Adam Wang, Ge Wang

Abstract: Diffusion and Poisson flow models have demonstrated remarkable success for a wide range of generative tasks. Nevertheless, their iterative nature results in computationally expensive sampling and the number of function evaluations (NFE) required can be orders of magnitude larger than for single-step methods. Consistency models are a recent class of deep generative models which enable single-step sampling of high quality data without the need for adversarial training. In this paper, we introduce a novel image denoising technique which combines the flexibility afforded in Poisson flow generative models (PFGM)++ with the high-quality, single-step sampling of consistency models. The proposed method first learns a trajectory between a noise distribution and the posterior distribution of interest by training PFGM++ in a supervised fashion. These pre-trained PFGM++ are subsequently "distilled" into Poisson flow consistency models (PFCM) via an updated version of consistency distillation. We call this approach posterior sampling Poisson flow consistency models (PS-PFCM). Our results indicate that the added flexibility of tuning the hyperparameter D, the dimensionality of the augmentation variables in PFGM++, allows us to outperform consistency models, a current state-of-the-art diffusion-style model with NFE=1 on clinical low-dose CT images. Notably, PFCM is in itself a novel family of deep generative models and we provide initial results on the CIFAR-10 dataset.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.09791 [pdf]

BreastRegNet: A Deep Learning Framework for Registration of Breast Faxitron and Histopathology Images

Authors: Negar Golestani, Aihui Wang, Gregory R Bean, Mirabela Rusu

Abstract: A standard treatment protocol for breast cancer entails administering neoadjuvant therapy followed by surgical removal of the tumor and surrounding tissue. Pathologists typically rely on cabinet X-ray radiographs, known as Faxitron, to examine the excised breast tissue and diagnose the extent of residual disease. However, accurately determining the location, size, and focality of residual cancer can be challenging, and incorrect assessments can lead to clinical consequences. The utilization of automated methods can improve the histopathology process, allowing pathologists to choose regions for sampling more effectively and precisely. Despite the recognized necessity, there are currently no such methods available. Training such automated detection models requires accurate ground truth labels on ex-vivo radiology images, which can be acquired through registering Faxitron and histopathology images and mapping the extent of cancer from histopathology to x-ray images. This study introduces a deep learning-based image registration approach trained on mono-modal synthetic image pairs. The models were trained using data from 50 women who received neoadjuvant chemotherapy and underwent surgery. The results demonstrate that our method is faster and yields significantly lower average landmark error ($2.1\pm1.96$ mm) over the state-of-the-art iterative ($4.43\pm4.1$ mm) and deep learning ($4.02\pm3.15$ mm) approaches. Improved performance of our approach in integrating radiology and pathology information facilitates generating large datasets, which allows training models for more accurate breast cancer detection.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2312.05832 [pdf, other]

Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains

Authors: Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren

Abstract: Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. Because of the spatial invariance and pooling layers of conventional CNNs, existing models often neglect crucial global information, resulting in localization errors in fault detection tasks for freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptron (MLP) for visual fault detection of freight trains. We initially present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We propose a dynamic distillation method without a pre-trained teacher, including a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. Such an approach mines more abundant details from lower-level feature appearances and higher-level label semantics as the extra supervision signal, which utilizes efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can be trained jointly with students to further enhance the distillation efficiency. Extensive experiments executed on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at https://github.com/MVME-HBUT/SDD-FTI-FDet.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 10 pages, 6 figures

arXiv:2312.01566 [pdf, other]

Coronary Atherosclerotic Plaque Characterization with Photon-counting CT: a Simulation-based Feasibility Study

Authors: Mengzhou Li, Mingye Wu, Jed Pack, Pengwei Wu, Bruno De Man, Adam Wang, Koen Nieman, Ge Wang

Abstract: Recent development of photon-counting CT (PCCT) brings great opportunities for plaque characterization with much-improved spatial resolution and spectral imaging capability. While existing coronary plaque PCCT imaging results are based on detectors made of CZT or CdTe materials, deep-silicon photon-counting detectors have unique performance characteristics and promise distinct imaging capabilities. In this work, we report a systematic simulation study of a deep-silicon PCCT scanner with a new clinically-relevant digital plaque phantom with realistic geometrical parameters and chemical compositions. This work investigates the effects of spatial resolution, noise, motion artifacts, radiation dose, and spectral characterization. Our simulation results suggest that the deep-silicon PCCT design provides adequate spatial resolution for visualizing a necrotic core and quantitation of key plaque features. Advanced denoising techniques and aggressive bowtie filter designs can keep image noise to acceptable levels at this resolution while keeping radiation dose comparable to that of a conventional CT scan. The ultrahigh resolution of PCCT also means an elevated sensitivity to motion artifacts. It is found that a tolerance of less than 0.4 mm residual movement range requires the application of accurate motion correction methods for best plaque imaging quality with PCCT.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 13 figures, 5 tables

arXiv:2311.10959 [pdf, other]

Structure-Aware Sparse-View X-ray 3D Reconstruction

Authors: Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang

Abstract: X-ray, known for its ability to reveal internal structures of objects, is expected to provide richer information for 3D reconstruction than visible light. Yet, existing neural radiance fields (NeRF) algorithms overlook this important nature of X-ray, leading to their limitations in capturing structural contents of imaged objects. In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction. Firstly, we design a Line Segment-based Transformer (Lineformer) as the backbone of SAX-NeRF. Lineformer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray. Secondly, we present a Masked Local-Global (MLG) ray sampling strategy to extract contextual and geometric information in 2D projection. In addition, we collect a larger-scale dataset X3D covering wider X-ray applications. Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction. Code, models, and data are released at https://github.com/caiyuanhao1998/SAX-NeRF

Submitted 23 March, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: CVPR 2024; The first Transformer-based method for X-ray and CT 3D reconstruction

arXiv:2310.19180 [pdf, other]

JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation

Authors: Yao Yao, Peike Li, Boyu Chen, Alex Wang

Abstract: With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation from scratch. However, finer-grained control over multi-track generation remains an open challenge. Existing models exhibit strong raw generation capability but lack the flexibility to compose separate tracks and combine them in a controllable manner, differing from typical workflows of human composers. To address this issue, we propose JEN-1 Composer, a unified framework to efficiently model marginal, conditional, and joint distributions over multi-track music via a single model. JEN-1 Composer framework exhibits the capacity to seamlessly incorporate any diffusion-based music generation system, e.g., Jen-1, enhancing its capacity for versatile multi-track music generation. We introduce a curriculum training strategy aimed at incrementally instructing the model in the transition from single-track generation to the flexible generation of multi-track combinations. During the inference, users have the ability to iteratively produce and choose music tracks that meet their preferences, subsequently creating an entire musical composition incrementally following the proposed Human-AI co-composition workflow. Quantitative and qualitative assessments demonstrate state-of-the-art performance in controllable and high-fidelity multi-track music synthesis. The proposed JEN-1 Composer represents a significant advance toward interactive AI-facilitated music creation and composition.
Demos will be available at https://www.jenmusic.ai/audio-demos.

Submitted 2 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: Preprints

arXiv:2310.13124 [pdf]

Efficient online cross-covariance monitoring with incremental SVD: An approach for the detection of emerging dependency patterns in IoT systems

Authors: Xinmiao Luan, Qing Zou, Jian Li, Andi Wang

Abstract: The development of the manufacturing systems has made it increasingly necessary to monitor the data generated by multiple interconnected subsystems with rapid incoming of samples. Based on incremental Singular Value Decomposition (ISVD), we develop a general online monitoring approach for the relationship of data generated from two interconnected subsystems, where each subsystem produces big data streams with several variation patterns in normal working condition. When special situation happens and new associations occur, a very small amount of computation is sufficient to update the system status and compute the control statistics by using this approach. The proposed method reduces computational overhead and retains only a small number of pairs of possible dependent patterns at each step. The validation of the method through simulation studies and a case study on semiconductor manufacturing processes further supports its effectiveness.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.00396 [pdf, other]

Joint Scheduling and Trajectory Optimization of Charging UAV in Wireless Rechargeable Sensor Networks

Authors: Yanheng Liu, Hongyang Pan, Geng Sun, Aimin Wang, Jiahui Li, Shuang Liang

Abstract: Wireless rechargeable sensor networks with a charging unmanned aerial vehicle (CUAV) have the broad application prospects in the power supply of the rechargeable sensor nodes (SNs). However, how to schedule a CUAV and design the trajectory to improve the charging efficiency of the entire system is still a vital problem. In this paper, we formulate a joint-CUAV scheduling and trajectory optimization problem (JSTOP) to simultaneously minimize the hovering points of CUAV, the number of the repeatedly covered SNs and the flying distance of CUAV for charging all SNs. Due to the complexity of JSTOP, it is decomposed into two optimization subproblems that are CUAV scheduling optimization problem (CSOP) and CUAV trajectory optimization problem (CTOP). CSOP is a hybrid optimization problem that consists of the continuous and discrete solution space, and the solution dimension in CSOP is not fixed since it should be changed with the number of hovering points of CUAV. Moreover, CTOP is a completely discrete optimization problem. Thus, we propose a particle swarm optimization (PSO) with a flexible dimension mechanism, a K-means operator and a punishment-compensation mechanism (PSOFKP) and a PSO with a discretization factor, a 2-opt operator and a path crossover reduction mechanism (PSOD2P) to solve the converted CSOP and CTOP, respectively. Simulation results evaluate the benefits of PSOFKP and PSOD2P under different scales and settings of the network, and the stability of the proposed algorithms is verified.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2308.07156 [pdf, other]

SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation

Authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation and demonstrates remarkable generalization capabilities across a wide range of downstream scenarios. In this empirical study, we examine SAM's robustness and zero-shot generalizability in the field of robotic surgery. We comprehensively explore different scenarios, including prompted and unprompted situations, bounding box and points-based prompt approaches, as well as the ability to generalize under corruptions and perturbations at five severity levels. Additionally, we compare the performance of SAM with state-of-the-art supervised models. We conduct all the experiments with two well-known robotic instrument segmentation datasets from MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict certain parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as wrong classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, SAM struggles to identify instruments in complex surgical scenarios characterized by the presence of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption.
We also attempt to fine-tune SAM using Low-rank Adaptation (LoRA) and propose SurgicalSAM, which shows the capability in class-wise mask prediction without prompt. Therefore, we can argue that, without further domain-specific fine-tuning, SAM is not ready for downstream surgical tasks.

Submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted as Oral Presentation at MedAGI Workshop - MICCAI 2023 1st International Workshop on Foundation Models for General Medical AI. arXiv admin note: substantial text overlap with arXiv:2304.14674

arXiv:2308.04729 [pdf, other]

JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

Abstract: Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.03094 [pdf]

A reconfigurable multiple-format coherent-dual-band signal generator based on a single optoelectronic oscillation cavity

Authors: Yibei Wang, Yalan Wang, Hongyi Wang, Xiaotong Liu, Hong Chen, ** Zhang, Dongyu Li, Dangwei Wang, Anle Wang

Abstract: An optoelectronic oscillation method with reconfigurable multiple formats for simultaneous generation of coherent dual-band signals is proposed and experimentally demonstrated. By introducing a compatible filtering mechanism based on stimulated Brillouin scattering (SBS) effect into a typical Phase-shifted grating Bragg fiber (PS-FBG) notch filtering cavity, dual mode-selection mechanisms which have independent frequency and time tuning mechanism can be constructed. By regulating three controllers, the proposed scheme can work in different states, named mode 1, mode 2 and mode 3. At mode 1 state, a dual single-frequency hopping signals is achieved with 50 ns hopping speed and flexible central frequency and pulse duration ratio. The mode 2 state is realized by applying the Fourier domain mode-locked (FDML) technology into the proposed optoelectrical oscillator, in which dual coherent pulsed single-frequency signal and broadband signal is generated simultaneously. The adjustability of the time duration of the single-frequency signal and the bandwidth of the broadband signal are shown and discussed. The mode 3 state is a dual broadband signal generator which is realized by injecting a triangular wave into the signal laser. The detection performance of the generated broadband signals has also been evaluated by the pulse compression and the phase noise figure.
The proposed method may provide a multifunctional radar system signal generator based on the simply external controllers, which can realize low-phase-noise or multifunctional detection with high resolution imaging ability, especially in a complex interference environment.

Submitted 6 August, 2023; originally announced August 2023.

Comments: 12 pages, 8 figures

arXiv:2308.02845 [pdf, other]

Landmark Detection using Transformer Toward Robot-assisted Nasal Airway Intubation

Authors: Tianhang Liu, Hechen Li, Long Bai, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Robot-assisted airway intubation application needs high accuracy in locating targets and organs. Two vital landmarks, nostrils and glottis, can be detected during the intubation to accommodate the stages of nasal intubation. Automated landmark detection can provide accurate localization and quantitative evaluation. The Detection Transformer (DeTR) leads object detectors to a new paradigm with long-range dependence. However, current DeTR requires long iterations to converge, and does not perform well in detecting small objects. This paper proposes a transformer-based landmark detection solution with deformable DeTR and the semantic-aligned-matching module for detecting landmarks in robot-assisted intubation. The semantics aligner can effectively align the semantics of object queries and image features in the same embedding space using the most discriminative features. To evaluate the performance of our solution, we utilize a publicly accessible glottis dataset and automatically annotate a nostril detection dataset. The experimental results demonstrate our competitive performance in detection accuracy. Our code is publicly accessible.

Submitted 5 August, 2023; originally announced August 2023.

Comments: ICBIR 2023 (Best Student Paper Award). Code availability: https://github.com/ConorLTH/airway_intubation_landmarks_detection

arXiv:2307.02452 [pdf, other]

LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion

Authors: Long Bai, Tong Chen, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gradually attracts researchers. Given the exuberant development of the denoising diffusion probabilistic model (DDPM) in computer vision, we introduce a WCE LLIE framework based on the multi-scale convolutional neural network (CNN) and reverse diffusion process. The multi-scale design allows models to preserve high-resolution representation and context information from low-resolution, while the curved wavelet attention (CWA) block is proposed for high-frequency and local feature learning. Furthermore, we combine the reverse diffusion procedure to further optimize the shallow output and generate the most realistic image. The proposed method is compared with ten state-of-the-art (SOTA) LLIE methods and significantly outperforms quantitatively and qualitatively. The superior performance on GI disease segmentation further demonstrates the clinical potential of our proposed model. Our code is publicly accessible.

Submitted 22 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/LLCaps

arXiv:2306.16285 [pdf, other]

Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis

Authors: An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract: Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and employed to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to diversify the training data further. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instruments datasets generation and segmentation framework can achieve encouraging performance compared with training with real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows its superiority of generalization by a considerable margin.
We expect that our inspiring results will attract research attention to improving model generalization with data synthesizing.

Submitted 28 June, 2023; originally announced June 2023.

Comments: First two authors contributed equally. Accepted by IROS2023

arXiv:2306.12109 [pdf, other]

DiffuseIR:Diffusion Models For Isotropic Reconstruction of 3D Microscopic Images

Authors: Mingjie Pan, Yulu Gan, Fangxu Zhou, Jiaming Liu, Aimin Wang, Shanghang Zhang, Dawei Li

Abstract: Three-dimensional microscopy is often limited by anisotropic spatial resolution, resulting in lower axial resolution than lateral resolution. Current State-of-The-Art (SoTA) isotropic reconstruction methods utilizing deep neural networks can achieve impressive super-resolution performance in fixed imaging settings. However, their generality in practical use is limited by degraded performance caused by artifacts and blurring when facing unseen anisotropic factors. To address these issues, we propose DiffuseIR, an unsupervised method for isotropic reconstruction based on diffusion models. First, we pre-train a diffusion model to learn the structural distribution of biological tissue from lateral microscopic images, resulting in generating naturally high-resolution images. Then we use low-axial-resolution microscopy images to condition the generation process of the diffusion model and generate high-axial-resolution reconstruction results. Since the diffusion model learns the universal structural distribution of biological tissues, which is independent of the axial resolution, DiffuseIR can reconstruct authentic images with unseen low-axial resolutions into a high-axial resolution without requiring re-training. The proposed DiffuseIR achieves SoTA performance in experiments on EM data and can even compete with supervised methods.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.03511 [pdf, other]

Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical Image Segmentation

Authors: An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract: Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field to alleviate model degradation in the deployment site. To preserve the model performance across multiple testing domains, this work proposes the Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation. In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift in the deployment phase, where the higher the shift is, the harder to recognize the variance. Considering this, we progressively introduce more amplitude information from the target domain to the source domain in the frequency space during the curriculum-style training to smoothly schedule the semantic knowledge transfer in an easier-to-harder manner. Besides, we incorporate the training-time chained augmentation mixing to help expand the data distributions while preserving the domain-invariant semantics, which is beneficial for the acquired model to be more robust and generalize better to unseen domains. Extensive experiments on two segmentation tasks of Retina and Nuclei collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance.
Meanwhile, our approach proves to be more robust under various corruption types and increasing severity levels. In addition, we show our method is also beneficial in the domain-adaptive classification task with skin lesion datasets. The code is available at https://github.com/lofrienger/Curri-AFDA.

Submitted 6 June, 2023; originally announced June 2023.

Comments: Work under review. First three authors contributed equally

arXiv:2306.00451 [pdf, other]

S$^2$ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation

Authors: An Wang, Mengya Xu, Yang Zhang, Mobarakol Islam, Hongliang Ren

Abstract: Fully-supervised polyp segmentation has accomplished significant triumphs over the years in advancing the early diagnosis of colorectal cancer. However, label-efficient solutions from weak supervision like scribbles are rarely explored yet primarily meaningful and demanding in medical practice due to the expensiveness and scarcity of densely-annotated polyp data. Besides, various deployment issues, including data shifts and corruption, put forward further requests for model generalization and robustness. To address these concerns, we design a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning (S$^2$ME). Concretely, for the first time in weakly-supervised medical image segmentation, we promote the dual-branch co-teaching framework by leveraging the intrinsic complementarity of features extracted from the spatial and spectral domains and encouraging cross-space consistency through collaborative optimization. Furthermore, to produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning, we introduce a novel adaptive pixel-wise fusion technique based on the entropy guidance from the spatial and spectral branches. Our strategy efficiently mitigates the deleterious effects of uncertainty and noise present in pseudo labels and surpasses previous alternatives in terms of efficacy. Ultimately, we formulate a holistic optimization objective to learn from the hybrid supervision of scribbles and pseudo labels.
Extensive experiments and evaluation on four public datasets demonstrate the superiority of our method regarding in-distribution accuracy, out-of-distribution generalization, and robustness, highlighting its promising clinical significance. Our code is available at https://github.com/lofrienger/S2ME.

Submitted 1 June, 2023; originally announced June 2023.

Comments: MICCAI 2023 Early Acceptance

arXiv:2304.14674 [pdf, other]

SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective

Authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with the prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery in various settings of (i) prompted vs. unprompted; (ii) bounding box vs. points-based prompt; (iii) generalization under corruptions and perturbations with five severity levels; and (iv) state-of-the-art supervised model vs. SAM. We conduct all the observations with two well-known robotic instrument segmentation datasets of MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict the parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as different classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, it is unable to identify instruments in some complex surgical scenarios of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. Therefore, we can argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning.

Submitted 28 April, 2023; originally announced April 2023.

Comments: Work under active progress

arXiv:2304.06477 [pdf, other]

Building Performance Simulations Can Inform IoT Privacy Leaks in Buildings

Authors: Alan Wang, Bradford Campbell, Arsalan Heydarian

Abstract: As IoT devices become cheaper, smaller, and more ubiquitously deployed, they can reveal more information than their intended design and threaten user privacy. Indoor Environmental Quality (IEQ) sensors previously installed for energy savings and indoor health monitoring have emerged as an avenue to infer sensitive occupant information. For example, light sensors are a known conduit for inspecting room occupancy status with motion-sensitive lights. Light signals can also infer sensitive data such as occupant identity and digital screen information. To limit sensor overreach, we explore the selection of sensor placements as a methodology. Specifically, in this proof-of-concept exploration, we demonstrate the potential of physics-based simulation models to quantify the minimal number of positions necessary to capture sensitive inferences. We show how a single well-placed sensor can be sufficient in specific building contexts to holistically capture its environmental states and how additional well-placed sensors can contribute to more granular inferences. We contribute a device-agnostic and building-adaptive workflow to respectfully capture inferable occupant activity and elaborate on the implications of incorporating building simulations into sensing schemes in the real world.

Submitted 26 March, 2023; originally announced April 2023.

arXiv:2304.01568 [pdf, other]

doi 10.1109/ECAI58194.2023.10193930

Arrhythmia Classifier Based on Ultra-Lightweight Binary Neural Network

Authors: Ninghao Pu, Zhongxing Wu, Ao Wang, Hanshi Sun, Zi** Liu, Hao Liu

Abstract: Reasonably and effectively monitoring arrhythmias through ECG signals has significant implications for human health. With the development of deep learning, numerous ECG classification algorithms based on deep learning have emerged. However, most existing algorithms trade off high accuracy for complex models, resulting in high storage usage and power consumption. This also inevitably increases the difficulty of implementation on wearable Artificial Intelligence-of-Things (AIoT) devices with limited resources. In this study, we proposed a universally applicable ultra-lightweight binary neural network (BNN) that is capable of 5-class and 17-class arrhythmia classification based on ECG signals. Our BNN achieves 96.90% (full precision 97.09%) and 97.50% (full precision 98.00%) accuracy for 5-class and 17-class classification, respectively, with state-of-the-art storage usage (3.76 KB and 4.45 KB). Compared to other binarization works, our approach excels in supporting two multi-classification modes while achieving the smallest known storage space. Moreover, our model achieves optimal accuracy in 17-class classification and boasts an elegantly simple network architecture. The algorithm we use is optimized specifically for hardware implementation. Our research showcases the potential of lightweight deep learning models in the healthcare industry, specifically in wearable medical devices, which hold great promise for improving patient outcomes and quality of life. Code is available on: https://github.com/xpww/ECG_BNN_Net

Submitted 25 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 6 pages, 3 figures
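
Note: the kilobyte-scale storage figures quoted in the abstract above are easier to appreciate with a small illustration of why binarized weights are cheap to store. The following is a minimal sketch, not the authors' implementation: the function names `binarize`, `pack`, and `unpack` are hypothetical, and sign binarization with one bit per weight is an assumed, generic BNN storage scheme.

```python
import numpy as np

def binarize(weights):
    """Map real-valued weights to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(weights >= 0, 1, -1).astype(np.int8)

def pack(binary_weights):
    """Store each +/-1 weight as one bit (+1 -> 1, -1 -> 0), eight weights per byte."""
    bits = (binary_weights.ravel() > 0).astype(np.uint8)
    return np.packbits(bits)

def unpack(packed, count):
    """Recover the first `count` +/-1 weights from the packed bytes."""
    bits = np.unpackbits(packed)[:count]
    return bits.astype(np.int8) * 2 - 1

# Hypothetical layer with 1000 float32 weights (illustrative size only).
rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
wb = binarize(w)
packed = pack(wb)
print(w.nbytes, packed.nbytes)  # 4000 bytes as float32 vs 125 bytes packed
```

Under these assumptions the packed form is 32 times smaller than the float32 original, and `unpack(packed, w.size)` returns the binarized weights exactly.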

arXiv:2303.12148 [pdf, other]

Neural Pre-Processing: A Learning Framework for End-to-end Brain MRI Pre-processing

Authors: Xinzi He, Alan Wang, Mert R. Sabuncu

Abstract: Head MRI pre-processing involves converting raw images to an intensity-normalized, skull-stripped brain in a standard coordinate space. In this paper, we propose an end-to-end weakly supervised learning approach, called Neural Pre-processing (NPP), for solving all three sub-tasks simultaneously via a neural network, trained on a large dataset without individual sub-task supervision. Because the overall objective is highly under-constrained, we explicitly disentangle geometric-preserving intensity mapping (skull-stripping and intensity normalization) and spatial transformation (spatial normalization). Quantitative results show that our model outperforms state-of-the-art methods which tackle only a single sub-task. Our ablation experiments demonstrate the importance of the architecture design we chose for NPP. Furthermore, NPP affords the user the flexibility to control each of these tasks at inference time. The code and model are freely available at https://github.com/Novestars/Neural-Pre-processing.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 8

arXiv:2211.13937 [pdf, other]

Operator Splitting Value Iteration

Authors: Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand

Abstract: We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI achieves a much faster convergence rate when the model is accurate enough. We also introduce a sample-based version of the algorithm called OS-Dyna. Unlike the traditional Dyna architecture, OS-Dyna still converges to the correct value function in the presence of model approximation error.

Submitted 25 November, 2022; originally announced November 2022.

Comments: Accepted to NeurIPS2022
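
Note: the splitting idea mentioned in the abstract above can be illustrated for policy evaluation with a classical matrix-splitting iteration. This is a minimal sketch under stated assumptions, not the paper's algorithm as published: the function name `split_policy_evaluation`, the toy transition matrices, and the specific split I - γP = (I - γP̂) - γ(P - P̂) are assumptions chosen to show how an approximate model P̂ can speed up convergence while the fixed point stays that of the true model P.

```python
import numpy as np

def split_policy_evaluation(P, P_hat, r, gamma, iters):
    """Iterate V <- (I - gamma*P_hat)^{-1} (r + gamma*(P - P_hat) V).

    Any fixed point satisfies (I - gamma*P) V = r, i.e. it is the value
    function of the TRUE model P, even though only P_hat is inverted.
    """
    n = len(r)
    A_hat = np.eye(n) - gamma * P_hat
    V = np.zeros(n)
    for _ in range(iters):
        V = np.linalg.solve(A_hat, r + gamma * (P - P_hat) @ V)
    return V

# Toy 2-state chain under a fixed policy (all numbers are illustrative).
P = np.array([[0.9, 0.1], [0.2, 0.8]])          # true transitions
P_hat = np.array([[0.85, 0.15], [0.25, 0.75]])  # approximate model
r = np.array([1.0, 0.0])
gamma = 0.9

V_exact = np.linalg.solve(np.eye(2) - gamma * P, r)
V_split = split_policy_evaluation(P, P_hat, r, gamma, iters=200)
print(np.max(np.abs(V_split - V_exact)))  # gap to the exact value function
```

When `P_hat` equals `P`, the correction term vanishes and a single iteration already returns the exact value function; with an inaccurate model the iteration still targets the true value function, provided it converges.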

arXiv:2211.12421 [pdf, other]

Data-Driven Network Neuroscience: On Data Collection and Benchmark

Authors: Jiaxing Xu, Yunhan Yang, David Tse Jung Huang, Sophi Shilpa Gururajapathy, Yiping Ke, Miao Qiao, Alan Wang, Haribalan Kumar, Josh McGeown, Eryn Kwon

Abstract: This paper presents a comprehensive and quality collection of functional human brain network data for potential research in the intersection of neuroscience, machine learning, and graph analytics. Anatomical and functional MRI images have been used to understand the functional connectivity of the human brain and are particularly important in identifying underlying neurodegenerative conditions such as Alzheimer's, Parkinson's, and Autism. Recently, the study of the brain in the form of brain networks using machine learning and graph analytics has become increasingly popular, especially to predict the early onset of these conditions. A brain network, represented as a graph, retains rich structural and positional information that traditional examination methods are unable to capture. However, the lack of publicly accessible brain network data prevents researchers from data-driven explorations. One of the main difficulties lies in the complicated domain-specific preprocessing steps and the exhaustive computation required to convert the data from MRI images into brain networks. We bridge this gap by collecting a large amount of MRI images from public databases and a private source, working with domain experts to make sensible design choices, and preprocessing the MRI images to produce a collection of brain network datasets. The datasets originate from 6 different sources, cover 4 brain conditions, and consist of a total of 2,702 subjects. We test our graph datasets on 12 machine learning models to provide baselines and validate the data quality on a recent graph analysis model. To lower the barrier to entry and promote the research in this interdisciplinary field, we release our brain network data and complete preprocessing details including codes at https://doi.org/10.17608/k6.auckland.21397377 and https://github.com/brainnetuoa/data_driven_network_neuroscience.

Submitted 29 October, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Journal ref: Advances in Neural Information Processing Systems, 2023

arXiv:2211.04894 [pdf, other]

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on contents. To understand how these two perspectives affect overall subjective opinions in UGC-VQA, we conduct a large-scale subjective study to collect human quality opinions on the overall quality of videos as well as perceptions from aesthetic and technical perspectives. The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. DOVER achieves state-of-the-art performance in UGC-VQA with very high efficiency. With perspective opinions in DIVIDE-3k, we further propose DOVER++, the first approach to provide reliable clear-cut quality evaluations from a single aesthetic or technical perspective. Code at https://github.com/VQAssessment/DOVER.

Submitted 7 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2209.01386 [pdf, other]

SaleNet: A low-power end-to-end CNN accelerator for sustained attention level evaluation using EEG

Authors: Chao Zhang, Zijian Tang, Taoming Guo, Jiaxin Lei, Jiaxin Xiao, Anhe Wang, Shuo Bai, Milin Zhang

Abstract: This paper proposes SaleNet - an end-to-end convolutional neural network (CNN) for sustained attention level evaluation using prefrontal electroencephalogram (EEG). A bias-driven pruning method is proposed together with group convolution, global average pooling (GAP), near-zero pruning, weight clustering and quantization for the model compression, achieving a total compression ratio of 183.11x. The compressed SaleNet obtains a state-of-the-art subject-independent sustained attention level classification accuracy of 84.2% on the 6-subject EEG database recorded in this work. SaleNet is implemented on an Artix-7 FPGA with a competitive power consumption of 0.11 W and an energy efficiency of 8.19 GOps/W.

Submitted 3 September, 2022; originally announced September 2022.

Comments: 5 pages, 4 figures, to be published in IEEE International Symposium on Circuits and Systems (ISCAS) 2022

arXiv:2203.04294 [pdf, other]

NaviAirway: a Bronchiole-sensitive Deep Learning-based Airway Segmentation Pipeline

Authors: Andong Wang, Terence Chi Chun Tam, Ho Ming Poon, Kun-Chang Yu, Wei-Ning Lee

Abstract: Airway segmentation is essential for chest CT image analysis. Different from natural image segmentation, which pursues high pixel-wise accuracy, airway segmentation focuses on topology. The task is challenging not only because of its complex tree-like structure but also the severe pixel imbalance among airway branches of different generations. To tackle the problems, we present a NaviAirway method which consists of a bronchiole-sensitive loss function for airway topology preservation and an iterative training strategy for accurate model learning across different airway generations. To supplement the features of airway branches learned by the model, we distill the knowledge from numerous unlabeled chest CT images in a teacher-student manner. Experimental results show that NaviAirway outperforms existing methods, particularly in the identification of higher-generation bronchioles and robustness to new CT scans. Moreover, NaviAirway is general enough to be combined with different backbone models to significantly improve their performance. NaviAirway can generate an airway roadmap for Navigation Bronchoscopy and can also be applied to other scenarios when segmenting fine and long tubular structures in biomedical images. The code is publicly available on https://github.com/AntonotnaWang/NaviAirway.

Submitted 16 June, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

arXiv:2202.12943 [pdf, other]

Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization

Authors: Hanshi Sun, Ao Wang, Ninghao Pu, Zhiqing Li, Junguang Huang, Hao Liu, Zhi Qi

Abstract: Cardiovascular diseases (CVDs) are among the most widespread deadly diseases, and detecting them at an early stage is a challenging task. Recently, deep learning and convolutional neural networks have been employed widely for the classification of objects. Moreover, it is promising that many networks can be deployed on wearable devices. An increasing number of methods can be used to realize ECG signal classification for the sake of arrhythmia detection. However, the existing neural networks proposed for arrhythmia detection are not hardware-friendly enough, because their large number of parameters leads to high memory and power consumption. In this paper, we present a 1-D adaptive loss-aware quantization, achieving a high compression rate that reduces memory consumption by 23.36 times. In order to adapt to our compression method, we need a smaller and simpler network. We propose a 17-layer end-to-end neural network classifier to classify 17 different rhythm classes trained on the MIT-BIH dataset, realizing a classification accuracy of 93.5%, which is higher than most existing methods. Because the adaptive bitwidth method gives important layers more attention and offers a chance to prune useless parameters, the proposed quantization method avoids accuracy degradation. It even improves the accuracy to 95.84%, 2.34 percentage points higher than before. Our study achieves a 1-D convolutional neural network with high performance and low resource consumption, which is hardware-friendly and illustrates the possibility of deployment on wearable devices to realize real-time arrhythmia diagnosis.

Submitted 27 February, 2022; originally announced February 2022.

Comments: 7 pages, 7 figures

arXiv:2202.12806 [pdf, other]

doi 10.1364/OE.450321

Deep learning-assisted imaging through stationary scattering media

Authors: Siddharth Rawat, Jonathan Wendoloski, Anna Wang

Abstract: Imaging through scattering media is a challenging problem owing to speckle decorrelations from perturbations in the media itself. For in-line imaging modalities, which are appealing because they are compact, require no moving parts, and are robust, negating the effects of such scattering becomes particularly challenging. Here we explore the effect of stationary scattering media on light scattering in in-line geometries, including digital holographic microscopy. We consider various object-scatterer scenarios where the object is distorted or obscured by additional stationary scatterers, and use an advanced deep learning (DL) generative methodology, generative adversarial networks (GANs), to mitigate the effects of the additional scatterers. Using light scattering simulations and experiments on objects of interest with and without additional scatterers, we find that conditional GANs can be quickly trained with minuscule datasets and can also efficiently learn the one-to-one statistical mapping between the cross-domain input-output image pairs. Training such a network yields a standalone model that can be used later to invert or negate the effect of scattering, yielding clear object reconstructions for object retrieval and downstream processing. Moreover, it is well-known that the coherent point spread function (c-PSF) of a stationary scattering optical system is a speckle pattern which is spatially shift variant. We show that with rapid training using only 20 image pairs, it is possible to negate this undesired scattering to accurately localize diffraction-limited impulses with high spatial accuracy, therefore transforming the earlier shift variant system to a linear shift invariant (LSI) system.

Submitted 25 February, 2022; originally announced February 2022.

Comments: 4 figures

arXiv:2202.02701 [pdf, other]

Hyper-Convolutions via Implicit Kernels for Medical Imaging

Authors: Tianyu Ma, Alan Q. Wang, Adrian V. Dalca, Mert R. Sabuncu

Abstract: The convolutional neural network (CNN) is one of the most commonly used architectures for computer vision tasks. The key building block of a CNN is the convolutional kernel that aggregates information from the pixel neighborhood and shares weights across all pixels. A standard CNN's capacity, and thus its performance, is directly related to the number of learnable kernel weights, which is determined by the number of channels and the kernel size (support). In this paper, we present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates. Hyper-convolutions decouple kernel size from the total number of learnable parameters, enabling a more flexible architecture design. We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters, and increase robustness against noise. We provide our code here: https://github.com/tym002/Hyper-Convolution

Submitted 5 February, 2022; originally announced February 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2105.10559
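
Note: the claim above that kernel size can be decoupled from the number of learnable parameters can be illustrated with a small coordinate-to-weight network. This is a minimal sketch, not the authors' architecture: the class name `HyperKernel`, the single hidden ReLU layer, the hidden width, and the normalization of coordinates to [-1, 1] are all assumptions made for illustration.

```python
import numpy as np

def kernel_coords(k):
    """Return the (k*k, 2) grid of kernel positions, normalized to [-1, 1]."""
    axis = np.linspace(-1.0, 1.0, k)
    yy, xx = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)

class HyperKernel:
    """Tiny MLP mapping a 2-D kernel coordinate to in_ch*out_ch kernel weights."""

    def __init__(self, in_ch, out_ch, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.in_ch, self.out_ch = in_ch, out_ch
        self.w1 = rng.normal(0.0, 1.0, (2, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, in_ch * out_ch))
        self.b2 = np.zeros(in_ch * out_ch)

    def num_params(self):
        """Learnable parameter count; note it does not depend on kernel size."""
        return self.w1.size + self.b1.size + self.w2.size + self.b2.size

    def kernel(self, k):
        """Generate a (out_ch, in_ch, k, k) convolution kernel for any size k."""
        h = np.maximum(kernel_coords(k) @ self.w1 + self.b1, 0.0)  # ReLU
        w = h @ self.w2 + self.b2                                   # (k*k, in*out)
        return w.T.reshape(self.out_ch, self.in_ch, k, k)

hk = HyperKernel(in_ch=3, out_ch=8)
print(hk.kernel(3).shape, hk.kernel(7).shape, hk.num_params())
```

In this sketch a 3x3 and a 7x7 kernel are produced by the same parameters, whereas a regular convolution would need 9 and 49 weights per channel pair, respectively.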

arXiv:2112.10074 [pdf, other]

doi 10.59275/j.melba.2022-354b

QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini , et al. (67 additional authors not shown)

Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS.

Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2112.05893 [pdf, other]

Hybrid Neural Networks for On-device Directional Hearing

Authors: Anran Wang, Maruchi Kim, Hao Zhang, Shyamnath Gollakota

Abstract: On-device directional hearing requires audio source separation from a given direction while achieving stringent human-imperceptible latency requirements. While neural nets can achieve significantly better performance than traditional beamformers, all existing models fall short of supporting low-latency causal inference on computationally-constrained wearables. We present DeepBeam, a hybrid model that combines traditional beamformers with a custom lightweight neural net. The former reduces the computational burden of the latter and also improves its generalizability, while the latter is designed to further reduce the memory and computational overhead to enable real-time and low-latency operations. Our evaluation shows comparable performance to state-of-the-art causal inference models on synthetic data while achieving a 5x reduction of model size, 4x reduction of computation per second, 5x reduction in processing time and generalizing better to real hardware data. Further, our real-time hybrid model runs in 8 ms on mobile CPUs designed for low-power wearable devices and achieves an end-to-end latency of 17.5 ms.

Submitted 10 December, 2021; originally announced December 2021.

Journal ref: AAAI 2022

arXiv:2110.09956 [pdf, other]

Food Odor Recognition via Multi-step Classification

Authors: Ang Xu, Tianzhang Cai, Dinghao Shen, Asher Wang

Abstract: Predicting food labels and freshness from odor remains a decades-old task that requires a complicated algorithm combined with high-sensitivity sensors. In this paper, we introduce a multi-step classifier, which first clusters food into four categories, then classifies the food label within the predicted category, and finally identifies the freshness. We use BME688 gas sensors packed with BME AI studio for data collection and feature extraction. The normalized dataset was preprocessed with PCA and LDA. We evaluated the effectiveness of algorithms such as tree methods, MLP, and CNN through assessment indexes at each stage. We also carried out an ablation experiment to show the necessity and feasibility of the multi-step classifier. The results demonstrated the robustness and adaptability of the multi-step classifier.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2110.06694 [pdf, ps, other]

Joint Optimization of Beam-Hopping Design and NOMA-Assisted Transmission for Flexible Satellite Systems

Authors: Anyue Wang, Lei Lei, Eva Lagunas, Ana I. Perez-Neira, Symeon Chatzinotas, Bjorn Ottersten

Abstract: Next-generation satellite systems require more flexibility in resource management such that available radio resources can be dynamically allocated to meet time-varying and non-uniform traffic demands. Considering potential benefits of beam hopping (BH) and non-orthogonal multiple access (NOMA), we exploit the time-domain flexibility in multi-beam satellite systems by optimizing BH design, and enhance the power-domain flexibility via NOMA. In this paper, we investigate the synergy and mutual influence of beam hopping and NOMA. We jointly optimize power allocation, beam scheduling, and terminal-timeslot assignment to minimize the gap between requested traffic demand and offered capacity. In the solution development, we formally prove the NP-hardness of the optimization problem. Next, we develop a bounding scheme to tightly gauge the global optimum and propose a suboptimal algorithm to enable efficient resource assignment. Numerical results demonstrate the benefits of combining NOMA and BH, and validate the superiority of the proposed BH-NOMA schemes over benchmarks.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2110.06634 [pdf, other]

End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

Authors: Yina Guo, Xiaofei Zhang, Zhenying Gong, Anhong Wang, Wenwu Wang

Abstract: In a recent study of auditory evoked potential (AEP) based brain-computer interface (BCI), it was shown that, with an encoder-decoder framework, it is possible to translate human neural activity to speech (T-CAS). However, current encoder-decoder-based methods often achieve T-CAS with a two-step method where the information is passed between the encoder and decoder with a shared dimension reduction vector, which may result in a loss of information. A potential approach to this problem is to design an end-to-end method by using a dual generative adversarial network (DualGAN) without dimension reduction of passing information, but it cannot realize one-to-one signal-to-signal translation (see Fig. 1 (a) and (b)). In this paper, we propose an end-to-end model to translate human neural activity to speech directly, create a new electroencephalogram (EEG) dataset for participants with good attention by designing a device to detect participants' attention, and introduce a dual-dual generative adversarial network (Dual-DualGAN) (see Fig. 1 (c) and (d)) to address the end-to-end translation of human neural activity to speech (ET-CAS) problem by group labelling EEG signals and speech signals and inserting a transition domain to realize cross-domain mapping. In the transition domain, the transition signals are cascaded by the corresponding EEG and speech signals in a certain proportion, which can build bridges for EEG and speech signals without corresponding features, and realize one-to-one cross-domain EEG-to-speech translation. The proposed method can translate word-length and sentence-length sequences of neural activity to speech. Experimental evaluation has been conducted to show that the proposed method significantly outperforms state-of-the-art methods on both words and sentences of auditory stimulus.

Submitted 26 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: 12 pages, 13 figures

arXiv:2105.12718 [pdf]

Magnetic Particle Spectroscopy (MPS) with One-stage Lock-in Implementation for Magnetic Bioassays with Improved Sensitivities

Authors: Vinit Kumar Chugh, Kai Wu, Venkatramana D. Krishna, Arturo di Girolamo, Robert P. Bloom, Yongqiang Andrew Wang, Renata Saha, Shuang Liang, Maxim C-J Cheeran, Jian-Ping Wang

Abstract: In recent years, magnetic particle spectroscopy (MPS) has become a highly sensitive and versatile sensing technique for quantitative bioassays. It relies on the dynamic magnetic responses of magnetic nanoparticles (MNPs) for the detection of target analytes in liquid phase. There are many research studies reporting the application of MPS for detecting a variety of analytes including viruses, toxins, and nucleic acids. Herein, we report a modified version of the MPS platform with the addition of a one-stage lock-in design to remove the feedthrough signals induced by external driving magnetic fields, thus capturing only MNP responses for improved system sensitivity. This one-stage lock-in MPS system is able to detect as low as 781 ng multi-core Nanomag50 iron oxide MNPs (micromod Partikeltechnologie GmbH) and 78 ng single-core SHB30 iron oxide MNPs (Ocean NanoTech). In addition, using a streptavidin-biotin binding system as a proof-of-concept, we show that these single-core SHB30 MNPs can be used for Brownian relaxation-based bioassays while the multi-core Nanomag50 cannot be used. The effects of MNP amount on the concentration-dependent response profiles for detecting streptavidin were also investigated. Results show that by using lower concentration/amount of MNPs, concentration-response curves shift to lower concentration/amount of target analytes. This lower concentration-response indicates the possibility of improved bioassay sensitivities by using lower amounts of MNPs.

Submitted 26 May, 2021; originally announced May 2021.

Comments: 26 Pages, 11 Figures

arXiv:2105.07961 [pdf, other]

Joint Optimization of Hadamard Sensing and Reconstruction in Compressed Sensing Fluorescence Microscopy

Authors: Alan Q. Wang, Aaron K. LaViolette, Leo Moon, Chris Xu, Mert R. Sabuncu

Abstract: Compressed sensing fluorescence microscopy (CS-FM) proposes a scheme whereby fewer measurements are collected during sensing and reconstruction is performed to recover the image. Much work has gone into optimizing the sensing and reconstruction portions separately. We propose a method of jointly optimizing both sensing and reconstruction end-to-end under a total measurement constraint, enabling learning of the optimal sensing scheme concurrently with the parameters of a neural network-based reconstruction network. We train our model on a rich dataset of confocal, two-photon, and wide-field microscopy images comprising a variety of biological samples. We show that our method outperforms several baseline sensing schemes and a regularized regression reconstruction algorithm.

Submitted 9 July, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Accepted at MICCAI 2021

arXiv:2105.07153 [pdf, other]

Window-Level is a Strong Denoising Surrogate

Authors: Ayaan Haque, Adam Wang, Abdullah-Al-Zubaer Imran

Abstract: CT image quality is heavily reliant on radiation dose, which causes a trade-off between radiation dose and image quality that affects the subsequent image-based diagnostic performance. However, high radiation can be harmful to both patients and operators. Several (deep learning-based) approaches have been attempted to denoise low-dose images. However, those approaches require access to large training sets, specifically the full-dose CT images for reference, which can often be difficult to obtain. Self-supervised learning is an emerging alternative for lowering the reference data requirement, facilitating unsupervised learning. Currently available self-supervised CT denoising works either depend on a foreign domain or use pretexts that are not very task-relevant. To tackle the aforementioned challenges, we propose a novel self-supervised learning approach, namely Self-Supervised Window-Leveling for Image DeNoising (SSWL-IDN), leveraging an innovative, task-relevant, simple, yet effective surrogate: prediction of the window-leveled equivalent. SSWL-IDN leverages residual learning and a hybrid loss combining perceptual loss and MSE, all incorporated in a VAE framework. Our extensive (in- and cross-domain) experimentation demonstrates the effectiveness of SSWL-IDN in aggressive denoising of CT (abdomen and chest) images acquired at only a 5% dose level.

Submitted 15 May, 2021; originally announced May 2021.

Comments: 11 pages, 4 figures

arXiv:2104.04627 [pdf, other]

Accented Speech Recognition Inspired by Human Perception

Authors: Xiangyun Chu, Elizabeth Combs, Amber Wang, Michael Picheny

Abstract: While improvements have been made in automatic speech recognition performance over the last several years, machines continue to have significantly lower performance on accented speech than humans. In addition, the most significant improvements on accented speech primarily arise by overwhelming the problem with hundreds or even thousands of hours of data. Humans typically require much less data to adapt to a new accent. This paper explores methods that are inspired by human perception to evaluate possible performance improvements for recognition of accented speech, with a specific focus on recognizing speech with a novel accent relative to that of the training data. Our experiments are run on small, accessible datasets that are available to the research community. We explore four methodologies: pre-exposure to multiple accents, grapheme and phoneme-based pronunciations, dropout (to improve generalization to a novel accent), and the identification of the layers in the neural network that can specifically be associated with accent modeling. Our results indicate that methods based on human perception are promising in reducing WER and understanding how accented speech is modeled in neural networks for novel accents.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Submitted to INTERSPEECH 2021

arXiv:2101.12490 [pdf, other]

Moment-Based Exact Uncertainty Propagation Through Nonlinear Stochastic Autonomous Systems

Authors: Ashkan Jasour, Allen Wang, Brian C. Williams

Abstract: In this paper, we address the problem of uncertainty propagation through nonlinear stochastic dynamical systems. More precisely, given a discrete-time continuous-state probabilistic nonlinear dynamical system, we aim at finding the sequence of the moments of the probability distributions of the system states up to any desired order over the given planning horizon. Moments of uncertain states can be used in estimation, planning, control, and safety analysis of stochastic dynamical systems. Existing approaches to address moment propagation problems provide approximate descriptions of the moments and are mainly limited to a particular set of uncertainties, e.g., Gaussian disturbances. In this paper, to describe the moments of uncertain states, we introduce trigonometric and also mixed-trigonometric-polynomial moments. Such moments allow us to obtain closed deterministic dynamical systems that describe the exact time evolution of the moments of uncertain states of an important class of autonomous and robotic systems, including underwater, ground, and aerial vehicles, robotic arms, and walking robots. The deterministic dynamical systems obtained in this way can be used, in a receding horizon fashion, to propagate the uncertainties over the planning horizon in real time. To illustrate the performance of the proposed method, we benchmark it against existing approaches, including linear, unscented transformation, and sampling-based uncertainty propagation methods that are widely used in estimation, prediction, planning, and control problems.

Submitted 29 January, 2021; originally announced January 2021.

Comments: This work has been submitted to the IEEE Transactions on Automatic Control

arXiv:2101.08136 [pdf]

doi 10.1007/s11433-021-1730-x

High-throughput fast full-color digital pathology based on Fourier ptychographic microscopy via color transfer

Authors: Yuting Gao, Jiurun Chen, Aiye Wang, An Pan, Caiwen Ma, Baoli Yao

Abstract: Full-color imaging is significant in digital pathology. Compared with a grayscale image or a pseudo-color image that only contains the contrast information, it can identify and detect the target object better with color texture information. Fourier ptychographic microscopy (FPM) is a high-throughput computational imaging technique that breaks the tradeoff between high resolution (HR) and large field-of-view (FOV), which eliminates the artifacts of scanning and stitching in digital pathology and improves its imaging efficiency. However, the conventional full-color digital pathology based on FPM is still time-consuming due to the repeated experiments with tri-wavelengths. A color transfer FPM approach, termed CFPM, is reported. The color texture information of a low resolution (LR) full-color pathologic image is directly transferred to the HR grayscale FPM image captured by only a single wavelength. The color space of FPM based on the standard CIE-XYZ color model and display based on the standard RGB (sRGB) color space were established. Different FPM colorization schemes were analyzed and compared with thirty different biological samples. The average root-mean-square errors (RMSE) of the conventional method and CFPM compared with the ground truth are 5.3% and 5.7%, respectively. Therefore, the acquisition time is significantly reduced by 2/3 with a sacrifice in precision of only 0.4%. The CFPM method is also compatible with advanced fast FPM approaches to reduce computation time further.

Submitted 19 January, 2021; originally announced January 2021.

Comments: 24 pages, 8 figures

Showing 1–50 of 73 results for author: Wang, A