
Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

Authors: Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W. Remedios, Shunxing Bao, Bennett A. Landman, Lee E. Wheless, Lori A. Coburn, Keith T. Wilson, Yaohong Wang, Shilin Zhao, Agnes B. Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo

Abstract: The segment anything model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained by over 1 billion masks on 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). It makes the SAM attractive for medical image analysis, especially for digital pathology where the training data are rare. In this study, we evaluate the zero-shot segmentation performance of SAM model on representative segmentation tasks on whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, (3) cell nuclei segmentation. Core Results: The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfying performance for dense instance object segmentation, even with 20 prompts (clicks/boxes) on each image. We also summarized the identified limitations for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, the few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model to achieve better performance in dense object segmentation.

Submitted 9 April, 2023; originally announced April 2023.

arXiv:2304.03760 [pdf, other]

Zero-shot CT Field-of-view Completion with Unconditional Generative Diffusion Prior

Authors: Kaiwen Xu, Aravind R. Krishnan, Thomas Z. Li, Yuankai Huo, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman

Abstract: Anatomically consistent field-of-view (FOV) completion to recover truncated body sections has important applications in quantitative analyses of computed tomography (CT) with limited FOV. Existing solution based on conditional generative models relies on the fidelity of synthetic truncation patterns at training phase, which poses limitations for the generalizability of the method to potential unknown types of truncation. In this study, we evaluate a zero-shot method based on a pretrained unconditional generative diffusion prior, where truncation pattern with arbitrary forms can be specified at inference phase. In evaluation on simulated chest CT slices with synthetic FOV truncation, the method is capable of recovering anatomically consistent body sections and subcutaneous adipose tissue measurement error caused by FOV truncation. However, the correction accuracy is inferior to the conditionally trained counterpart.

Submitted 7 April, 2023; originally announced April 2023.

Comments: Submitted to MIDL 2023, short paper track

arXiv:2304.02888 [pdf]

doi 10.1016/j.jallcom.2022.168196

Microstructure and mechanical properties of mechanically-alloyed CoCrFeNi high-entropy alloys using low ball-to-powder ratio

Authors: A. Olejarz, W. Y. Huo, M. Zielinski, R. Diduszko, E. Wyszkowska, A. Kosinska, D. Kalita, I. Jozwik, M. Chmielewski, F. Fang, L. Kurpaska

Abstract: High-entropy alloys are extensively studied due to their very promising properties. However manufacturing methods currently used to prepare HEAs are complicated, costly, and likely non-industrially scalable processes. This limits their evolution and poses questions regarding the material's applicability in the future. Considering the abovementioned point, we developed a novel methodology for efficient HEA production using a low ball-to-powder ratio (BPR). Using different milling times, we manufactured four HEA powder precursors using a BPR of 5:1, which were later sintered via the Spark Plasma Sintering technique and heat treated. Microstructural characterization was performed by optical microscopy, Scanning Electron Microscopy equipped with EDS and EBSD detectors, and X-ray diffraction. Mechanical properties were measured using nano and microhardness techniques. In this work, we follow the structural evolution of the material and connect it with the strengthening effect as a function of milling time. Furthermore, we discuss the impact of different sintering and annealing conditions, proving that HEAs characterized by high mechanical properties may be manufactured using low BPR.

Submitted 6 April, 2023; originally announced April 2023.

Journal ref: Volume 938, 25 March 2023, 168196

arXiv:2304.00216 [pdf, other]

Cross-scale Multi-instance Learning for Pathological Image Diagnosis

Authors: Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

Abstract: Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnification) of WSIs, disregarding the vital inter-scale information that is key to diagnoses by human pathologists. In this study, we propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale relationships into a single MIL network for pathological image diagnosis. The contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL) algorithm that integrates the multi-scale information and the inter-scale relationships is proposed; (2) A toy dataset with scale-specific morphological features is created and released to examine and visualize differential cross-scale attention; (3) Superior performance on both in-house and public datasets is demonstrated by our simple cross-scale MIL strategy. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.

Submitted 16 February, 2024; v1 submitted 31 March, 2023; originally announced April 2023.

arXiv:2303.16376 [pdf, other]

A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo

Abstract: Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

Submitted 29 January, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.10674 [pdf]

URM4DMU: an user represention model for darknet markets users

Authors: Hongmeng Liu, Jiapeng Zhao, Yixuan Huo, Yuyan Wang, Chun Liao, Liyan Shen, Shiyao Cui, Jinqiao Shi

Abstract: Darknet markets provide a large platform for trading illicit goods and services due to their anonymity. Learning an invariant representation of each user based on their posts on different markets makes it easy to aggregate user information across different platforms, which helps identify anonymous users. Traditional user representation methods mainly rely on modeling the text information of posts and cannot capture the temporal content and the forum interaction of posts. Recent works mainly use CNNs to model the text information of posts, but fail to effectively model posts whose length changes frequently in an episode. To address the above problems, we propose a model named URM4DMU (User Representation Model for Darknet Markets Users), which mainly improves the post representation by augmenting convolutional operators and self-attention with an adaptive gate mechanism. It performs much better when combined with the temporal content and the forum interaction of posts. We demonstrate the effectiveness of URM4DMU on four darknet markets. The average improvements in MRR and Recall@10 over the state-of-the-art method are 22.5% and 25.5%, respectively.

Submitted 19 March, 2023; originally announced March 2023.

Comments: 9pages

MSC Class: 62 (Primary); 54 (Secondary) ACM Class: I.2.7
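
The URM4DMU abstract above reports gains in MRR and Recall@10. As a reading aid, the two metrics can be sketched as follows. This is a minimal sketch under stated assumptions: each query has exactly one correct answer, and the function names and toy data are hypothetical. The paper's own evaluation protocol is not given in the abstract and may differ.

```python
def mean_reciprocal_rank(ranked_lists, truths):
    """Average of 1/rank of the correct item; a miss contributes 0."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, truths):
        if truth in ranked:
            # list.index is 0-based, ranks are 1-based
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)


def recall_at_k(ranked_lists, truths, k=10):
    """Fraction of queries whose correct item appears in the top k."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


# Hypothetical toy data: three queries, each with a ranked candidate list.
ranked_lists = [["u1", "u2", "u3"], ["u5", "u4", "u6"], ["u7", "u8", "u9"]]
truths = ["u1", "u4", "u0"]

print(mean_reciprocal_rank(ranked_lists, truths))
print(recall_at_k(ranked_lists, truths, k=2))
```

Here the first query ranks the correct user first, the second ranks it second, and the third misses it entirely, so both metrics penalize the miss while MRR also rewards a higher position.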

arXiv:2303.07634 [pdf, other]

I$^2$-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs

Authors: Jingsen Zhu, Yuchi Huo, Qi Ye, Fujun Luan, Jifan Li, Dianbing Xi, Lisha Wang, Rui Tang, Wei Hua, Hujun Bao, Rui Wang

Abstract: In this work, we present I$^2$-SDF, a new method for intrinsic indoor scene reconstruction and editing using differentiable Monte Carlo raytracing on neural signed distance fields (SDFs). Our holistic neural SDF-based framework jointly recovers the underlying shapes, incident radiance and materials from multi-view images. We introduce a novel bubble loss for fine-grained small objects and error-guided adaptive sampling scheme to largely improve the reconstruction quality on large-scale indoor scenes. Further, we propose to decompose the neural radiance field into spatially-varying material of the scene as a neural field through surface-based, differentiable Monte Carlo raytracing and emitter semantic segmentations, which enables physically based and photorealistic scene relighting and editing applications. Through a number of qualitative and quantitative experiments, we demonstrate the superior quality of our method on indoor scene reconstruction, novel view synthesis, and scene editing compared to state-of-the-art baselines.

Submitted 29 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023, project page: https://jingsenzhu.github.io/i2-sdf

arXiv:2303.05785 [pdf, other]

Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation

Authors: Ho Hin Lee, Quan Liu, Shunxing Bao, Qi Yang, Xin Yu, Leon Y. Cai, Thomas Li, Yuankai Huo, Xenofon Koutsoukos, Bennett A. Landman

Abstract: With the inspiration of vision transformers, the concept of depth-wise convolution revisits to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, the segmentation performance might be saturated and even degraded as the kernel sizes scaled up (e.g., $21\times 21\times 21$) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited to maintain an optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances the local convergence with small kernels in parallel, optimal small kernel branches may hinder the computational efficiency for training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current network state-of-the-art (SOTA) (e.g., 3D UX-Net, SwinUNETR) using 6 challenging public datasets. We derive an equivalency between kernel re-parameterization and the branch-wise variation in kernel convergence. Inspired by the spatial frequency in the human visual system, we extend to vary the kernel convergence into element-wise setting and model the spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is leveraged to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent. From the experimental results, RepUX-Net consistently outperforms 3D SOTA benchmarks with internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779) and transfer learning (AMOS: 0.880 to 0.911) scenarios in Dice Score.

Submitted 5 June, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted to MICCAI 2023 (top 13.6%), both codes and pretrained models are available at: https://github.com/MASILab/RepUX-Net
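
The RepUX-Net results above are reported as Dice scores. For readers unfamiliar with the metric, the sketch below computes the Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two flat binary masks. It is an illustrative sketch only: the function name, the toy masks, and the convention that two empty masks score 1.0 are assumptions, and the paper's evaluation code (multi-class, 3D volumes) is not reproduced here.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical toy masks: two of the three foreground voxels overlap.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_score(pred, target))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxel.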

arXiv:2302.06605 [pdf, other]

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Authors: Haoyu Lu, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Wei Zhan, Masayoshi Tomizuka, Mingyu Ding

Abstract: Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed to different modalities and their interactions, with the total number of tunable parameters reduced by partial weight sharing. The unified and knowledge-sharing design enables powerful cross-modal representations that can benefit various downstream tasks, requiring only 1.0%-2.0% tunable parameters of the pre-trained model. Extensive experiments on 6 cross-modal downstream benchmarks (including video-text retrieval, image-text retrieval, VideoQA, and VQA) show that in most cases, UniAdapter not only outperforms the state-of-the-arts, but even beats the full fine-tuning strategy. Particularly, on the MSRVTT retrieval task, UniAdapter achieves 49.7% recall@1 with 2.2% model parameters, outperforming the latest competitors by 2.0%. The code and models are available at https://github.com/RERV/UniAdapter.

Submitted 21 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

arXiv:2302.00259 [pdf, ps, other]

doi 10.1016/j.aml.2023.108595

On the Schrödinger-Poisson system with $(p,q)$-Laplacian

Authors: Yueqiang Song, Yuanyuan Huo, Dušan D. Repovš

Abstract: We study a class of Schrödinger-Poisson systems with $(p,q)$-Laplacian. Using fixed point theory, we obtain a new existence result for nontrivial solutions. The main novelty of the paper is the combination of a double phase operator and the nonlocal term. Our results generalize some known results.

Submitted 1 February, 2023; originally announced February 2023.

MSC Class: 35J47; 35J60; 35R11

Journal ref: Appl. Math. Lett. 141 (2023), art. 108595, 6 pp

arXiv:2302.00133 [pdf, ps, other]

Sublinear Approximation Schemes for Scheduling Precedence Graphs of Bounded Depth

Authors: Bin Fu, Yumei Huo, Hairong Zhao

Abstract: We study the classical scheduling problem on parallel machines where the precedence graph has bounded depth $h$. Our goal is to minimize the maximum completion time. We focus on developing approximation algorithms that use only sublinear space or sublinear time. We develop the first one-pass streaming approximation schemes using sublinear space when all jobs' processing times differ by no more than a constant factor $c$ and the number of machines $m$ is at most $\tfrac{2n\epsilon}{3hc}$. This is so far the best approximation we can have in terms of $m$, since no polynomial time approximation better than $\tfrac{4}{3}$ exists when $m = \tfrac{n}{3}$ unless P=NP. The algorithms are then extended to the more general problem where the largest $\alpha n$ jobs have no more than $c$ factor difference. We also develop the first sublinear time algorithms for both problems. For the more general problem, when $m \le \tfrac{\alpha n \epsilon}{20 c^2 \cdot h}$, our algorithm is a randomized $(1+\epsilon)$-approximation scheme that runs in sublinear time. This work not only provides an algorithmic solution to the studied problem under big data, but also gives a methodological framework for designing sublinear approximation algorithms for other scheduling problems.

Submitted 31 January, 2023; originally announced February 2023.

arXiv:2301.01703 [pdf, other]

Technology Trends for Massive MIMO towards 6G

Authors: Yiming Huo, Xingqin Lin, Boya Di, Hongliang Zhang, Francisco Javier Lorca Hernando, Ahmet Serdar Tan, Shahid Mumtaz, Özlem Tuğfe Demir, Kun Chen-Hu

Abstract: At the dawn of the next-generation wireless systems and networks, massive multiple-input multiple-output (MIMO) has been envisioned as one of the enabling technologies. With the continued success of being applied in the 5G and beyond, the massive MIMO technology has demonstrated its advantageousness, integrability, and extendibility. Moreover, several evolutionary features and revolutionizing trends for massive MIMO have gradually emerged in recent years, which are expected to reshape the future 6G wireless systems and networks. Specifically, the functions and performance of future massive MIMO systems will be enabled and enhanced via combining other innovative technologies, architectures, and strategies such as intelligent omni-surfaces (IOSs)/intelligent reflecting surfaces (IRSs), artificial intelligence (AI), THz communications, cell free architecture. Also, more diverse vertical applications based on massive MIMO will emerge and prosper, such as wireless localization and sensing, vehicular communications, non-terrestrial communications, remote sensing, inter-planetary communications.

Submitted 5 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 7 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2301.00584 [pdf, other]

doi 10.1093/biomet/asae010

Selective conformal inference with false coverage-statement rate control

Authors: Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou

Abstract: Conformal inference is a popular tool for constructing prediction intervals (PI). We consider here the scenario of post-selection/selective conformal inference, that is PIs are reported only for individuals selected from an unlabeled test data. To account for multiplicity, we develop a general split conformal framework to construct selective PIs with the false coverage-statement rate (FCR) control. We first investigate the Benjamini and Yekutieli (2005)'s FCR-adjusted method in the present setting, and show that it is able to achieve FCR control but yields uniformly inflated PIs. We then propose a novel solution to the problem, named as Selective COnditional conformal Predictions (SCOP), which entails performing selection procedures on both calibration set and test set and construct marginal conformal PIs on the selected sets by the aid of conditional empirical distribution obtained by the calibration set. Under a unified framework and exchangeable assumptions, we show that the SCOP can exactly control the FCR. More importantly, we provide non-asymptotic miscoverage bounds for a general class of selection procedures beyond exchangeablity and discuss the conditions under which the SCOP is able to control the FCR. As special cases, the SCOP with quantile-based selection or conformal p-values-based multiple testing procedures enjoys valid coverage guarantee under mild conditions. Numerical results confirm the effectiveness and robustness of SCOP in FCR control and show that it achieves more narrowed PIs over existing methods in many settings.

Submitted 12 March, 2024; v1 submitted 2 January, 2023; originally announced January 2023.
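
The abstract above builds on split conformal prediction. The sketch below shows only the standard, unselective split conformal interval with absolute-residual scores, which is the baseline the paper starts from. It is not the SCOP procedure and includes no selection step or FCR adjustment. The function names and toy numbers are hypothetical.

```python
import math


def split_conformal_halfwidth(cal_preds, cal_labels, alpha=0.1):
    """Half-width of a split conformal interval from a calibration set.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    which gives marginal coverage of at least 1 - alpha when calibration
    and test points are exchangeable.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return scores[rank - 1]


def prediction_interval(test_pred, halfwidth):
    """Symmetric interval around a point prediction."""
    return (test_pred - halfwidth, test_pred + halfwidth)


# Hypothetical calibration data whose absolute residuals are 1, 2, ..., 9.
cal_preds = [10] * 9
cal_labels = [11, 8, 13, 6, 15, 4, 17, 2, 19]

q = split_conformal_halfwidth(cal_preds, cal_labels, alpha=0.5)
print(prediction_interval(10, q))
```

Reporting such intervals only for a data-dependent subset of test points is what breaks the plain guarantee, which is the selection problem the paper addresses.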

arXiv:2212.13126 [pdf, other]

doi 10.1016/j.scib.2023.03.022

Eliminating temporal correlation in quantum-dot entangled photon source by quantum interference

Authors: Run-Ze Liu, Yu-Kun Qiao, Han-Sen Zhong, Zhen-Xuan Ge, Hui Wang, Tung-Hsun Chung, Chao-Yang Lu, Yong-Heng Huo, Jian-Wei Pan

Abstract: Semiconductor quantum dots, as promising solid-state platform, have exhibited deterministic photon pair generation with high polarization entanglement fidelity for quantum information applications. However, due to temporal correlation from inherently cascaded emission, photon indistinguishability is limited, which restricts their potential scalability to multi-photon experiments. Here, by utilizing quantum interferences to decouple polarization entanglement from temporal correlation, we improve multi-photon entanglement fidelity from $(58.7\pm 2.2)\%$ to $(75.5\pm 2.0)\%$. Our work paves the way to realize scalable and high-quality multi-photon states from quantum dots.

Submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.08006 [pdf, other]

doi 10.1038/s41567-024-02530-z

Experimental quantum computational chemistry with optimised unitary coupled cluster ansatz

Authors: Shaojun Guo, Jinzhao Sun, Haoran Qian, Ming Gong, Yukun Zhang, Fusheng Chen, Yangsen Ye, Yulin Wu, Sirui Cao, Kun Liu, Chen Zha, Chong Ying, Qingling Zhu, He-Liang Huang, Youwei Zhao, Shaowei Li, Shiyu Wang, Jiale Yu, Daojin Fan, Dachao Wu, Hong Su, Hui Deng, Hao Rong, Yuan Li, Kaili Zhang, et al. (13 additional authors not shown)

Abstract: Quantum computational chemistry has emerged as an important application of quantum computing. Hybrid quantum-classical computing methods, such as variational quantum eigensolvers (VQE), have been designed as promising solutions to quantum chemistry problems, yet challenges due to theoretical complexity and experimental imperfections hinder progress in achieving reliable and accurate results. Experimental works for solving electronic structures are consequently still restricted to nonscalable (hardware efficient) or classically simulable (Hartree-Fock) ansatz, or limited to a few qubits with large errors. The experimental realisation of scalable and high-precision quantum chemistry simulation remains elusive. Here, we address the critical challenges associated with solving molecular electronic structures using noisy quantum processors. Our protocol presents significant improvements in the circuit depth and running time, key metrics for chemistry simulation. Through systematic hardware enhancements and the integration of error mitigation techniques, we push forward the limit of experimental quantum computational chemistry and successfully scale up the implementation of VQE with an optimised unitary coupled-cluster ansatz to 12 qubits. We produce high-precision results of the ground-state energy for molecules with error suppression by around two orders of magnitude. We achieve chemical accuracy for H$_2$ at all bond distances and LiH at small bond distances in the experiment, even beyond the two recent concurrent works. Our work demonstrates a feasible path towards a scalable solution to electronic structure calculation, validating the key technological features and identifying future challenges for this goal.

Submitted 17 June, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: 11 pages, 4 figures in the main text, and 29 pages supplementary materials with 17 figures

arXiv:2212.00059 [pdf, other]

Single Slice Thigh CT Muscle Group Segmentation with Domain Adaptation and Self-Training

Authors: Qi Yang, Xin Yu, Ho Hin Lee, Leon Y. Cai, Kaiwen Xu, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Sokratis Makrogiannis, Luigi Ferrucci, Bennett A. Landman

Abstract: Objective: Thigh muscle group segmentation is important for assessment of muscle anatomy, metabolic disease and aging. Many efforts have been put into quantifying muscle tissues with magnetic resonance (MR) imaging including manual annotation of individual muscles. However, leveraging publicly available annotations in MR images to achieve muscle group segmentation on single slice computed tomography (CT) thigh images is challenging. Method: We propose an unsupervised domain adaptation pipeline with self-training to transfer labels from 3D MR to single CT slice. First, we transform the image appearance from MR to CT with CycleGAN and feed the synthesized CT images to a segmenter simultaneously. Single CT slices are divided into hard and easy cohorts based on the entropy of pseudo labels inferenced by the segmenter. After refining easy cohort pseudo labels based on anatomical assumption, self-training with easy and hard splits is applied to fine tune the segmenter. Results: On 152 withheld single CT thigh images, the proposed pipeline achieved a mean Dice of 0.888(0.041) across all muscle groups including sartorius, hamstrings, quadriceps femoris and gracilis muscles. Conclusion: To our best knowledge, this is the first pipeline to achieve thigh imaging domain adaptation from MR to CT. The proposed pipeline is effective and robust in extracting muscle groups on 2D single slice CT thigh images. The container is available for public use at https://github.com/MASILab/DA_CT_muscle_seg

Submitted 30 November, 2022; originally announced December 2022.
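
The abstract above says that single CT slices are divided into easy and hard cohorts based on the entropy of the segmenter's pseudo labels. A minimal sketch of such a split follows. The abstract does not say how the entropy is aggregated or thresholded, so the mean per-pixel Shannon entropy, the `threshold` value, and the function names are assumptions for illustration rather than the authors' implementation.

```python
import math


def mean_entropy(prob_map):
    """Mean per-pixel Shannon entropy (in nats) of one slice.

    prob_map is a list of per-pixel class-probability vectors,
    e.g. the softmax output of a segmenter flattened over pixels.
    """
    total = 0.0
    for probs in prob_map:
        # Skip zero probabilities: p * log(p) tends to 0 as p -> 0.
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(prob_map)


def split_by_entropy(prob_maps, threshold):
    """Return (easy, hard) lists of slice indices.

    Assumption: a slice counts as "easy" when its mean entropy is at or
    below the threshold, i.e. the segmenter is confident on it.
    """
    easy, hard = [], []
    for idx, prob_map in enumerate(prob_maps):
        if mean_entropy(prob_map) <= threshold:
            easy.append(idx)
        else:
            hard.append(idx)
    return easy, hard


# Two toy "slices" with two pixels each and three classes.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]
easy, hard = split_by_entropy([confident, uncertain], threshold=0.5)
```

In this toy case the confident slice lands in the easy cohort and the near-uniform slice in the hard cohort. In practice the threshold would have to be chosen from the data, for example as a percentile of the observed entropies.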

arXiv:2211.05436 [pdf, other]

Dynamic nanoindentation and short-range order in equiatomic NiCoCr medium entropy alloy lead to novel density wave ordering

Authors: A. Naghdi, F. J. Dominguez-Gutierrez, W. Y. Huo, K. Karimi, S. Papanikolaou

Abstract: Chemical short-range order (CSRO) is believed to be a key contributor to the exceptional properties of multicomponent alloys. However, direct validation and confirmation of CSRO has been highly elusive in most compounds. Recent studies for equiatomic NiCoCr alloys have shown that thermal treatments (i.e., annealing/aging) may facilitate and manipulate CSRO. In this work, by using molecular simulations, we show that nanomechanical probes, such as nanoindentation, may be utilized towards further manipulation of CSRO, providing explicit validation pathways. By using well established interatomic potentials, we perform hybrid Molecular-Dynamics/Monte-Carlo (MD/MC) at room temperature to demonstrate that particular dwell nanoindentation protocols can lead, through thermal MC equilibration, to the reorganization of CSRO under the indenter tip, to a density-wave stripe pattern (DWO). We characterize the novel DWO structures, which are directly correlated to incipient SRO but are highly anisotropic and dependent on local, nanoindentation-induced stress concentrations, and we show how they deeply originate from the peculiarities of the interatomic potentials. Furthermore, we show that the DWO patterns consistently scale up with the incipient plastic zone under the indenter tip, justifying the observation of the DWO patterning at any experimentally feasible nanoindentation depth.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 5 pages, 4 figures

arXiv:2211.03017 [pdf, other]

Learning-based Inverse Rendering of Complex Indoor Scenes with Differentiable Monte Carlo Raytracing

Authors: Jingsen Zhu, Fujun Luan, Yuchi Huo, Zihao Lin, Zhihua Zhong, Dianbing Xi, Jiaxiang Zheng, Rui Tang, Hujun Bao, Rui Wang

Abstract: Indoor scenes typically exhibit complex, spatially-varying appearance from global illumination, making inverse rendering a challenging ill-posed problem. This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling. The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials. Specifically, we introduce a physically-based differentiable rendering layer with screen-space ray tracing, resulting in more realistic specular reflections that match the input photo. In addition, we create a large-scale, photorealistic indoor scene dataset with significantly richer details like complex furniture and dedicated decorations. Further, we design a novel out-of-view lighting network with uncertainty-aware refinement leveraging hypernetwork-based neural radiance fields to predict lighting outside the view of the input photo. Through extensive evaluations on common benchmark datasets, we demonstrate superior inverse rendering quality of our method compared to state-of-the-art baselines, enabling various applications such as complex object insertion and material editing with high fidelity. Code and data will be made available at https://jingsenzhu.github.io/invrend.

Submitted 23 November, 2022; v1 submitted 5 November, 2022; originally announced November 2022.

arXiv:2211.01254 [pdf, other]

CircleSnake: Instance Segmentation with Circle Representation

Authors: Ethan H. Nguyen, Haichun Yang, Zuhayr Asad, Ruining Deng, Agnes B. Fogo, Yuankai Huo

Abstract: Circle representation has recently been introduced as a medical imaging optimized representation for more effective instance object detection on ball-shaped medical objects. With its superior performance on instance detection, it is appealing to extend the circle representation to instance medical object segmentation. In this work, we propose CircleSnake, a simple end-to-end circle contour deformation-based segmentation method for ball-shaped medical objects. Compared to the prevalent DeepSnake method, our contribution is three-fold: (1) We replace the complicated bounding box to octagon contour transformation with a computation-free and consistent bounding circle to circle contour adaptation for segmenting ball-shaped medical objects; (2) Circle representation has fewer degrees of freedom (DoF=2) as compared with the octagon representation (DoF=8), thus yielding a more robust segmentation performance and better rotation consistency; (3) To the best of our knowledge, the proposed CircleSnake method is the first end-to-end circle representation deep segmentation pipeline method with consistent circle detection, circle contour proposal, and circular convolution. The key innovation is to integrate the circular graph convolution with circle detection into an end-to-end instance segmentation framework, enabled by the proposed simple and consistent circle contour representation. Glomeruli are used to evaluate the performance of the benchmarks. From the results, CircleSnake increases the average precision of glomerular detection from 0.559 to 0.614. The Dice score increased from 0.804 to 0.849. The code has been released: https://github.com/hrlblab/CircleSnake

Submitted 2 November, 2022; originally announced November 2022.

Comments: Machine Learning in Medical Imaging Workshop for 2022 MICCAI
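
Several entries in this listing, including the CircleSnake entry above, report segmentation quality as a Dice score. As a reminder of what that number measures, here is a minimal sketch of the standard Dice coefficient for two binary masks, 2|A∩B| / (|A| + |B|). The flat-list mask format and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration, and none of the listed papers' evaluation code is reproduced here.

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    size_sum = sum(pred) + sum(target)
    if size_sum == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / size_sum


# Toy masks: 3 foreground pixels each, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixel.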

arXiv:2210.16705 [pdf, other]

Distributed Swarm Learning for Internet of Things at the Edge: Where Artificial Intelligence Meets Biological Intelligence

Authors: Yue Wang, Zhi Tian, Xin Fan, Yan Huo, Cameron Nowzari, Kai Zeng

Abstract: With the proliferation of versatile Internet of Things (IoT) services, smart IoT devices are increasingly deployed at the edge of wireless networks to perform collaborative machine learning tasks using locally collected data, giving rise to the edge learning paradigm. Due to device restrictions and resource constraints, edge learning among massive IoT devices faces major technical challenges caused by the communication bottleneck, data and device heterogeneity, non-convex optimization, privacy and security concerns, and dynamic environments. To overcome these challenges, this article studies a new framework of distributed swarm learning (DSL) through a holistic integration of artificial intelligence and biological swarm intelligence. Leveraging efficient and robust signal processing and communication techniques, DSL contributes to novel tools for learning and optimization tailored for real-time operations of large-scale IoT in edge wireless environments, which will benefit a wide range of edge IoT applications.

Submitted 29 October, 2022; originally announced October 2022.

arXiv:2210.09245 [pdf, other]

Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint

Authors: Haoming Li, Xinzhuo Lin, Yang Zhou, Xiang Li, Yuchi Huo, Jiming Chen, Qi Ye

Abstract: 3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because the physical contact is sensitive to small changes in pose, the highly nonlinear mapping from 3D object representations to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle the challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.

Submitted 6 May, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted at IJCAI 2023

arXiv:2210.08652 [pdf, other]

Adaptive Contrastive Learning with Dynamic Correlation for Multi-Phase Organ Segmentation

Authors: Ho Hin Lee, Yucheng Tang, Han Liu, Yubo Fan, Leon Y. Cai, Qi Yang, Xin Yu, Shunxing Bao, Yuankai Huo, Bennett A. Landman

Abstract: Recent studies have demonstrated the superior performance of introducing "scan-wise" contrast labels into contrastive learning for multi-organ segmentation on multi-phase computed tomography (CT). However, such scan-wise labels are limited: (1) a coarse classification, which could not capture the fine-grained "organ-wise" contrast variations across all organs; (2) the label (i.e., contrast phase) is typically manually provided, which is error-prone and may introduce manual biases of defining phases. In this paper, we propose a novel data-driven contrastive loss function that adapts the similar/dissimilar contrast relationship between samples in each minibatch at organ-level. Specifically, as variable levels of contrast exist between organs, we hypothesize that the contrast differences in the organ-level can bring additional context for defining representations in the latent space. An organ-wise contrast correlation matrix is computed with mean organ intensities under one-hot attention maps. The goal of adapting the organ-driven correlation matrix is to model variable levels of feature separability at different phases. We evaluate our proposed approach on multi-organ segmentation with both non-contrast CT (NCCT) datasets and the MICCAI 2015 BTCV Challenge contrast-enhanced CT (CECT) datasets. Compared to the state-of-the-art approaches, our proposed contrastive loss yields a substantial and significant improvement of 1.41% (from 0.923 to 0.936, p-value < 0.01) and 2.02% (from 0.891 to 0.910, p-value < 0.01) on mean Dice scores across all organs with respect to NCCT and CECT cohorts. We further assess the trained model performance with the MICCAI 2021 FLARE Challenge CECT datasets and achieve a substantial improvement of mean Dice score from 0.927 to 0.934 (p-value < 0.01). The code is available at: https://github.com/MASILab/DCC_CL

Submitted 16 October, 2022; originally announced October 2022.

Comments: 11 pages

arXiv:2210.07006 [pdf, other]

Sustainable Online Reinforcement Learning for Auto-bidding

Authors: Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, Bo Zheng

Abstract: Recently, the auto-bidding technique has become an essential tool to increase the revenue of advertisers. Facing the complex and ever-changing bidding environments in the real-world advertising system (RAS), state-of-the-art auto-bidding policies usually leverage reinforcement learning (RL) algorithms to generate real-time bids on behalf of the advertisers. Due to safety concerns, it was believed that the RL training process can only be carried out in an offline virtual advertising system (VAS) that is built based on the historical data generated in the RAS. In this paper, we argue that there exist significant gaps between the VAS and RAS, making the RL training process suffer from the problem of inconsistency between online and offline (IBOO). Firstly, we formally define the IBOO and systematically analyze its causes and influences. Then, to avoid the IBOO, we propose a sustainable online RL (SORL) framework that trains the auto-bidding policy by directly interacting with the RAS, instead of learning in the VAS. Specifically, based on our proof of the Lipschitz smooth property of the Q function, we design a safe and efficient online exploration (SER) policy for continuously collecting data from the RAS. Meanwhile, we derive the theoretical lower bound on the safety of the SER policy. We also develop a variance-suppressed conservative Q-learning (V-CQL) method to effectively and stably learn the auto-bidding policy with the collected data. Finally, extensive simulated and real-world experiments validate the superiority of our approach over the state-of-the-art auto-bidding algorithm.

Submitted 13 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2210.06536 [pdf, other]

doi 10.1109/JIOT.2022.3214471

Distributed Reconfigurable Intelligent Surfaces for Energy Efficient Indoor Terahertz Wireless Communications

Authors: Yiming Huo, Xiaodai Dong, Nuwan Ferdinand

Abstract: With the fifth-generation (5G) networks widely commercialized and fast deployed, the sixth-generation (6G) wireless communication is envisioned to provide competitive quality of service (QoS) in multiple aspects to global users. The critical and underlying research of the 6G is, firstly, highly dependent on the precise modeling and characterization of the wireless propagation when the spectrum is believed to expand to the terahertz (THz) domain. Moreover, future networks' power consumption and energy efficiency are critical factors to consider. In this research, based on a review of the fundamental mechanisms of reconfigurable intelligent surface (RIS) assisted wireless communications, we utilize the 3D ray-tracing method to analyze a realistic indoor THz propagation environment with the existence of human blockers. Furthermore, we propose a distributed RISs framework (DRF) to assist the indoor THz wireless communication to achieve overall energy efficiency. The numerical analysis of simulation results based on more than 2,900 indoor THz wireless communication sub-scenarios has demonstrated the significant efficacy of applying distributed RISs to overcome the mobile human blockage issue, improve the THz signal coverage, and increase signal-to-noise ratios (SNRs) and QoS. With practical hardware design constraints investigated, we eventually envision how to utilize the existing integrated sensing and communication techniques to deploy and operate such a system in reality. Such a distributed RISs framework can also lay the foundation of efficient THz communications for Internet-of-Things (IoT) networks.

Submitted 12 October, 2022; originally announced October 2022.

Comments: 15 Pages, 9 Figures, 2 Tables. To appear in the IEEE Internet of Things Journal

arXiv:2210.06246 [pdf, other]

CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm

Authors: Hongming Zhang, Yintong Huo, Yanai Elazar, Yangqiu Song, Yoav Goldberg, Dan Roth

Abstract: Recently, the community has achieved substantial progress on many commonsense reasoning benchmarks. However, it is still unclear what is learned from the training process: the knowledge, inference capability, or both? We argue that due to the large scale of commonsense knowledge, it is infeasible to annotate a large enough training set for each task to cover all commonsense for learning. Thus we should separate the commonsense knowledge acquisition and inference over commonsense knowledge as two separate tasks. In this work, we focus on investigating models' commonsense inference capabilities from two perspectives: (1) Whether models can know if the knowledge they have is enough to solve the task; (2) Whether models can develop commonsense inference capabilities that generalize across commonsense tasks. We first align commonsense tasks with relevant knowledge from commonsense knowledge bases and ask humans to annotate whether the knowledge is enough or not. Then, we convert different commonsense tasks into a unified question answering format to evaluate models' generalization capabilities. We name the benchmark as Commonsense Inference with Knowledge-in-the-loop Question Answering (CIKQA).

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.01346 [pdf, other]

doi 10.1109/ICRA48891.2023.10161428

ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

Authors: Anjun Chen, Xiangyu Wang, Kun Shi, Shaohao Zhu, Bin Fang, Yingfeng Chen, Jiming Chen, Yuchi Huo, Qi Ye

Abstract: 3D human reconstruction from RGB images achieves decent results in good weather conditions but degrades dramatically in rough weather. Complementarily, mmWave radars have been employed to reconstruct 3D human joints and meshes in rough weather. However, combining RGB and mmWave signals for robust all-weather 3D human reconstruction is still an open challenge, given the sparse nature of mmWave and the vulnerability of RGB images. In this paper, we present ImmFusion, the first mmWave-RGB fusion solution to reconstruct 3D human bodies in all weather conditions robustly. Specifically, our ImmFusion consists of image and point backbones for token feature extraction and a Transformer module for token fusion. The image and point backbones refine global and local features from original data, and the Fusion Transformer Module aims for effective information fusion of two modalities by dynamically selecting informative tokens. Extensive experiments on a large-scale dataset, mmBody, captured in various environments demonstrate that ImmFusion can efficiently utilize the information of two modalities to achieve a robust 3D human body reconstruction in all weather conditions. In addition, our method's accuracy is significantly superior to that of state-of-the-art Transformer-based LiDAR-camera fusion methods.

Submitted 20 September, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: Accepted to ICRA2023, Project Page: https://chen3110.github.io/ImmFusion/index.html

arXiv:2210.00223 [pdf, other]

Contour-Aware Equipotential Learning for Semantic Segmentation

Authors: Xu Yin, Dongbo Min, Yuchi Huo, Sung-Eui Yoon

Abstract: With increasing demands for high-quality semantic segmentation in the industry, hard-to-distinguish semantic boundaries have posed a significant threat to existing solutions. Inspired by real-life experience, i.e., combining varied observations contributes to higher visual recognition confidence, we present the equipotential learning (EPL) method. This novel module transfers the predicted/ground-truth semantic labels to a self-defined potential domain to learn and infer decision boundaries along customized directions. The conversion to the potential domain is implemented via a lightweight differentiable anisotropic convolution without incurring any parameter overhead. Besides, the two designed loss functions, the point loss and the equipotential line loss, implement anisotropic field regression and category-level contour learning, respectively, enhancing prediction consistencies in the inter/intra-class boundary areas. More importantly, EPL is agnostic to network architectures, and thus it can be plugged into most existing segmentation models. This paper is the first attempt to address the boundary segmentation problem with field regression and contour learning. Meaningful performance improvements on Pascal VOC 2012 and Cityscapes demonstrate that the proposed EPL module can benefit the off-the-shelf fully convolutional network models when recognizing semantic boundary areas. Besides, intensive comparisons and analysis show the favorable merits of EPL for distinguishing semantically-similar and irregular-shaped categories.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2209.15076 [pdf, other]

3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation

Authors: Ho Hin Lee, Shunxing Bao, Yuankai Huo, Bennett A. Landman

Abstract: The recent 3D medical ViTs (e.g., SwinUNETR) achieve the state-of-the-art performances on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation in 3D medical datasets. The effectiveness of hybrid approaches is largely credited to the large receptive field for non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel size (e.g. starting from $7\times7\times7$) to enable the larger global receptive fields, inspired by Swin Transformer. We further substitute the multi-layer perceptron (MLP) in Swin Transformer blocks with pointwise depth convolutions and enhance model performances with fewer normalization and activation layers, thus reducing the number of model parameters. 3D UX-Net competes favorably with current SOTA transformers (e.g. SwinUNETR) using three challenging public datasets on volumetric brain and abdominal imaging: 1) MICCAI Challenge 2021 FLARE, 2) MICCAI Challenge 2021 FeTA, and 3) MICCAI Challenge 2022 AMOS. 3D UX-Net consistently outperforms SwinUNETR with improvement from 0.929 to 0.938 Dice (FLARE2021) and 0.867 to 0.874 Dice (FeTA2021). We further evaluate the transfer learning capability of 3D UX-Net with AMOS2022 and demonstrate another improvement of $2.27\%$ Dice (from 0.880 to 0.900). The source code with our proposed model is available at https://github.com/MASILab/3DUX-Net.

Submitted 1 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted to ICLR 2023

arXiv:2209.14467 [pdf, other]

doi 10.1007/978-3-031-16449-1_20

Reducing Positional Variance in Cross-sectional Abdominal CT Slices with Deep Conditional Generative Models

Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

Abstract: 2D low-dose single-slice abdominal computed tomography (CT) enables direct measurements of body composition, which are critical to quantitatively characterizing health relationships on aging. However, longitudinal analysis of body composition changes using 2D abdominal slices is challenging due to positional variance between longitudinal slices acquired in different years. To reduce the positional variance, we extend the conditional generative models to our C-SliceGen that takes an arbitrary axial slice in the abdominal region as the condition and generates a defined vertebral level slice by estimating the structural changes in the latent space. Experiments on 1170 subjects from an in-house dataset and 50 subjects from BTCV MICCAI Challenge 2015 show that our model can generate high quality images in terms of realism and similarity. External experiments on 20 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset that contains longitudinal single abdominal slices validate that our method can harmonize the slice positional variance in terms of muscle and visceral fat area. Our approach provides a promising direction of mapping slices from different vertebral levels to a target slice to reduce positional variance for single slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 11 pages, 4 figures

Journal ref: Medical Image Computing and Computer Assisted Intervention MICCAI 2022, Cham, 2022, pp. 202-212

arXiv:2209.14378 [pdf, other]

UNesT: Local Spatial Representation Learning with Hierarchical Transformer for Efficient Medical Segmentation

Authors: Xin Yu, Qi Yang, Yinchi Zhou, Leon Y. Cai, Riqiang Gao, Ho Hin Lee, Thomas Li, Shunxing Bao, Zhoubing Xu, Thomas A. Lasko, Richard G. Abramson, Zizhao Zhang, Yuankai Huo, Bennett A. Landman, Yucheng Tang

Abstract: Transformer-based models, capable of learning better global dependencies, have recently demonstrated exceptional representation learning capabilities in computer vision and medical image analysis. Transformer reformats the image into separate patches and realizes global communication via the self-attention mechanism. However, positional information between patches is hard to preserve in such 1D sequences, and loss of it can lead to sub-optimal performance when dealing with large amounts of heterogeneous tissues of various sizes in 3D medical image segmentation. Additionally, current methods are not robust and efficient for heavy-duty medical segmentation tasks such as predicting a large number of tissue classes or modeling globally inter-connected tissue structures. To address such challenges and inspired by the nested hierarchical structures in vision transformer, we propose a novel 3D medical image segmentation method (UNesT), employing a simplified and faster-converging transformer encoder design that achieves local communication among spatially adjacent patch sequences by aggregating them hierarchically. We extensively validate our method on multiple challenging datasets, consisting of multiple modalities, anatomies, and a wide range of tissue classes, including 133 structures in the brain, 14 organs in the abdomen, 4 hierarchical components in the kidneys, inter-connected kidney tumors and brain tumors. We show that UNesT consistently achieves state-of-the-art performance and evaluate its generalizability and data efficiency. In particular, the model completes the whole brain segmentation task with 133 tissue classes in a single network, outperforming the prior state-of-the-art method SLANT27, which ensembles 27 networks.

Submitted 7 September, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: 19 pages, 17 figures. arXiv admin note: text overlap with arXiv:2203.02430

arXiv:2209.11388 [pdf, other]

LGDN: Language-Guided Denoising Network for Video-Language Modeling

Authors: Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu

Abstract: Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at video level. However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e.g., scenery shot, transition or teaser). Although a number of recent works deploy attention mechanisms to alleviate this problem, the irrelevant/noisy information still makes it very difficult to address. To overcome this challenge, we propose an efficient and effective model, termed Language-Guided Denoising Network (LGDN), for video-language modeling. Different from most existing methods that utilize all extracted video frames, LGDN dynamically filters out the misaligned or redundant frames under the language supervision and obtains only 2-4 salient frames per video for cross-modal token-level alignment. Extensive experiments on five public datasets show that our LGDN outperforms the state of the art by large margins. We also provide a detailed ablation study to reveal the critical importance of solving the noise issue, in the hope of inspiring future video-language work.

Submitted 5 December, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: Accepted by NeurIPS2022

arXiv:2208.14357 [pdf, other]

Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

Authors: Tianyuan Yao, Chang Qu, Jun Long, Quan Liu, Ruining Deng, Yuanhan Tian, Jiachen Xu, Aadarsh Jha, Zuhayr Asad, Shunxing Bao, Mengyang Zhao, Agnes B. Fogo, Bennett A. Landman, Haichun Yang, Catie Chang, Yuankai Huo

Abstract: With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training a more generalizable AI model has been widely recognized in medical image analysis. However, collecting large-scale task-specific unannotated data can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new resource for obtaining large-scale images. However, published images in healthcare (e.g., radiology and pathology) consist of a considerable amount of compound figures with subplots. In order to extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework without using the traditionally required detection bounding box annotations, with a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. From the results, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The pretrained self-supervised learning model using large-scale mined figures improved the accuracy of downstream image classification tasks with a contrastive learning algorithm. The source code of SimCFS is made publicly available at https://github.com/hrlblab/ImageSeperation.

Submitted 30 August, 2022; originally announced August 2022.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org/papers/2022:025.html. arXiv admin note: substantial text overlap with arXiv:2107.08650

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2208.07322 [pdf, other]

Cross-scale Attention Guided Multi-instance Learning for Crohn's Disease Diagnosis with Pathological Images

Authors: Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

Abstract: Multi-instance learning (MIL) is widely used in the computer-aided interpretation of pathological Whole Slide Images (WSIs) to solve the lack of pixel-wise or patch-wise annotations. Often, this approach directly applies "natural image driven" MIL algorithms which overlook the multi-scale (i.e. pyramidal) nature of WSIs. Off-the-shelf MIL algorithms are typically deployed on a single scale of WSIs (e.g., 20x magnification), while human pathologists usually aggregate the global and local patterns in a multi-scale manner (e.g., by zooming in and out between different magnifications). In this study, we propose a novel cross-scale attention mechanism to explicitly aggregate inter-scale interactions into a single MIL network for Crohn's Disease (CD), which is a form of inflammatory bowel disease. The contribution of this paper is two-fold: (1) a cross-scale attention mechanism is proposed to aggregate features from different resolutions with multi-scale interaction; and (2) differential multi-scale attention visualizations are generated to localize explainable lesion patterns. By training on ~250,000 H&E-stained Ascending Colon (AC) patches from 20 CD patient and 30 healthy control samples at different scales, our approach achieved a superior Area under the Curve (AUC) score of 0.8924 compared with baseline models. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2208.05578 [pdf, other]

CB-DSL: Communication-efficient and Byzantine-robust Distributed Swarm Learning on Non-i.i.d. Data

Authors: Xin Fan, Yue Wang, Yan Huo, Zhi Tian

Abstract: The valuable data collected by IoT devices in edge networks together with the resurgence of ML stimulate the latest trend of edge AI. However, recent FL methods face major challenges including communication bottleneck, data heterogeneity and security concerns in edge IoT scenarios, especially when being adopted for distributed learning among massive IoT devices equipped with limited data and transmission resources. Meanwhile, the swarm nature of IoT systems is overlooked by most existing literature, which calls for new designs of distributed learning algorithms. Inspired by the success of biological intelligence (BI) of gregarious organisms, we propose a novel edge learning approach for swarm IoT, called communication-efficient and Byzantine-robust distributed swarm learning (CB-DSL), through a holistic integration of AI-enabled stochastic gradient descent and BI-enabled particle swarm optimization. To deal with non-i.i.d. data issues and Byzantine attacks, global data samples are introduced in CB-DSL and shared among IoT workers, which not only alleviates the local data heterogeneity effectively but also enables full use of the exploration-exploitation mechanism of swarm intelligence. Further, we provide convergence analysis to theoretically demonstrate that the proposed CB-DSL is superior to the standard FL with better convergence behavior. In addition, to measure the effectiveness of the introduction of the globally shared dataset, we also evaluate the model divergence by deriving its upper bound, which is related to the distance between the data distribution at local IoT devices and the population distribution for the whole datasets. Numerical results verify that the proposed CB-DSL outperforms the existing benchmarks in terms of faster convergence speed, higher convergent accuracy, lower communication cost, and better robustness against non-i.i.d. data and Byzantine attacks.

Submitted 20 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: update theoretical and simulation results

arXiv:2207.06551 [pdf, other]

Body Composition Assessment with Limited Field-of-view Computed Tomography: A Semantic Image Extension Perspective

Authors: Kaiwen Xu, Thomas Li, Mirza S. Khan, Riqiang Gao, Sanja L. Antic, Yuankai Huo, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman

Abstract: Field-of-view (FOV) tissue truncation beyond the lungs is common in routine lung screening computed tomography (CT). This poses limitations for opportunistic CT-based body composition (BC) assessment as key anatomical structures are missing. Traditionally, extending the FOV of CT is considered as a CT reconstruction problem using limited data. However, this approach relies on the projection domain data which might not be available in application. In this work, we formulate the problem from the semantic image extension perspective which only requires image data as inputs. The proposed two-stage method identifies a new FOV border based on the estimated extent of the complete body and imputes missing tissues in the truncated region. The training samples are simulated using CT slices with complete body in FOV, making the model development self-supervised. We evaluate the validity of the proposed method in automatic BC assessment using lung screening CT with limited FOV. The proposed method effectively restores the missing tissues and reduces BC assessment error introduced by FOV tissue truncation. In the BC assessment for a large-scale lung screening CT dataset, this correction improves both the intra-subject consistency and the correlation with anthropometric approximations. The developed method is available at https://github.com/MASILab/S-EFOV.

Submitted 15 April, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: Updated with additional evaluation and clarification

arXiv:2207.02651 [pdf]

How should the contact angle of a noncircular wetting boundary be described?

Authors: Jianhui Zhang, Xiaosheng Chen, Zhenzhen Gui, Zhenlin Chen, Mingdong Ma, Yuxuan Huo, Weirong Zhang, Fan Zhang, Xiaosi Zhou, Xi Huang

Abstract: For over 200 years, wettability has made significant contributions to understanding the properties of objects, advancing technological progress. The theoretical model of the contact angle (CA) for evaluating wettability has constantly been modified to address relevant emerging issues. However, these existing models disregard the difference in the CA along the contact line and use a single-point CA to evaluate the entire contact line. From this perspective, there is no reasonable explanation for noncircular wetting. Here, we reveal that noncircular wetting boundaries result from property differences in the surfaces along the boundary, and utilize friction as a comprehensive factor reflecting local wettability. An average CA is proposed to evaluate the contact line instead of the single-point CA. This makes obsolete the Cassie and Wenzel methods, which take an average property of the whole surface as a weight coefficient of the single-point CA and ignore the subordination between physical properties and roughness in systematics.

Submitted 3 July, 2022; originally announced July 2022.

arXiv:2207.00151 [pdf, other]

doi 10.1109/MC.2022.3160472

Space Broadband Access: The Race Has Just Begun

Authors: Yiming Huo

Abstract: Recent years have witnessed an exponential growth of the commercial space industry, including rocket launch, satellite network deployment, private space travel, and even extraterrestrial colonization. Several trends are predicted in this unprecedented transition to an era of space-enabled broadband access.

Submitted 6 May, 2022; originally announced July 2022.

Comments: 8 pages, 3 figures, 2 tables. Accepted by IEEE Magazine (https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=2)

arXiv:2206.13632 [pdf, other]

Omni-Seg: A Scale-aware Dynamic Network for Renal Pathological Image Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jun Long, Zuhayr Asad, R. Michael Womick, Zheyu Zhu, Agnes B. Fogo, Shilin Zhao, Haichun Yang, Yuankai Huo

Abstract: Comprehensive semantic segmentation on renal pathological images is challenging due to the heterogeneous scales of the objects. For example, on a whole slide image (WSI), the cross-sectional areas of glomeruli can be 64 times larger than those of the peritubular capillaries, making it impractical to segment both objects on the same patch, at the same scale. To handle this scaling issue, prior studies have typically trained multiple segmentation networks in order to match the optimal pixel resolution of heterogeneous tissue types. This multi-network solution is resource-intensive and fails to model the spatial relationship between tissue types. In this paper, we propose the Omni-Seg+ network, a scale-aware dynamic neural network that achieves multi-object (six tissue types) and multi-scale (5X to 40X scale) pathological image segmentation via a single neural network. The contribution of this paper is three-fold: (1) a novel scale-aware controller is proposed to generalize the dynamic neural network from single-scale to multi-scale; (2) semi-supervised consistency regularization of pseudo-labels is introduced to model the inter-scale correlation of unannotated tissue types into a single end-to-end learning paradigm; and (3) superior scale-aware generalization is evidenced by directly applying a model trained on human kidney images to mouse kidney images, without retraining. By learning from ~150,000 human pathological image patches from six tissue types at three different resolutions, our approach achieved superior segmentation performance according to human visual assessment and evaluation of image-omics (i.e., spatial transcriptomics). The official implementation is available at https://github.com/ddrrnn123/Omni-Seg.

Submitted 18 January, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.00123 [pdf, other]

Glo-In-One: Holistic Glomerular Detection, Segmentation, and Lesion Characterization with Large-scale Web Image Mining

Authors: Tianyuan Yao, Yuzhe Lu, Jun Long, Aadarsh Jha, Zheyu Zhu, Zuhayr Asad, Haichun Yang, Agnes B. Fogo, Yuankai Huo

Abstract: The quantitative detection, segmentation, and characterization of glomeruli from high-resolution whole slide imaging (WSI) play essential roles in the computer-assisted diagnosis and scientific research in digital renal pathology. Historically, such comprehensive quantification requires extensive programming skills in order to be able to handle heterogeneous and customized computational tools. To bridge the gap of performing glomerular quantification for non-technical users, we develop the Glo-In-One toolkit to achieve holistic glomerular detection, segmentation, and characterization via a single line of command. Additionally, we release a large-scale collection of 30,000 unlabeled glomerular images to further facilitate the algorithmic development of self-supervised deep learning. The inputs of the Glo-In-One toolkit are WSIs, while the outputs are (1) WSI-level multi-class circle glomerular detection results (which can be directly manipulated with ImageScope), (2) glomerular image patches with segmentation masks, and (3) different lesion types. To leverage the performance of the Glo-In-One toolkit, we introduce self-supervised deep learning to glomerular quantification via large-scale web image mining. The GGS fine-grained classification model achieved a decent performance compared with baseline supervised methods while only using 10% of the annotated data. The glomerular detection achieved an average precision of 0.627 with circle representations, while the glomerular segmentation achieved a 0.955 patch-wise Dice Similarity Coefficient (DSC).
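For readers unfamiliar with the Dice Similarity Coefficient reported above, the following minimal sketch computes the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), for two binary masks. It is an illustration only, not code from the Glo-In-One toolkit. The function name `dice_coefficient` and the convention that two empty masks score 1.0 are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy 2x2 patch: prediction marks two pixels, ground truth marks one of them.
predicted_mask = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
score = dice_coefficient(predicted_mask, ground_truth)  # 2*1 / (2+1)
```

A patch-wise score such as the 0.955 quoted in the abstract would typically be an average of this quantity over many patches, though the abstract does not state the exact aggregation.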

Submitted 31 May, 2022; originally announced June 2022.

arXiv:2205.08567 [pdf, other]

Internet of Spacecraft for Multi-planetary Defense and Prosperity

Authors: Yiming Huo

Abstract: Recent years have seen unprecedentedly fast-growing prosperity in the commercial space industry. Several privately funded aerospace manufacturers, such as Space Exploration Technologies Corporation (SpaceX) and Blue Origin have innovated what we used to know about this capital-intensive industry and gradually reshaped the future of human civilization. As private spaceflight and multi-planetary immigration gradually become realities from science fiction (sci-fi) and theory, both opportunities and challenges are presented. In this article, a review of the progress in space exploration and the underlying space technologies is first provided. Next, a revisit of and predictions for the K-Pg extinction event, the Chelyabinsk event, extra-terrestrialization, terraforming, and planetary defense are given, including the emerging near-Earth object (NEO) observation and NEO impact avoidance technologies and strategies. Furthermore, a framework of the Solar Communication and Defense Networks (SCADN) with advanced algorithms and high efficacy is proposed to enable an internet of distributed deep-space sensing, communications, and defense to cope with disastrous incidents such as asteroid/comet impacts. Finally, the perspectives on the legislation, management, and supervision of founding the proposed SCADN are also discussed in depth.

Submitted 15 May, 2022; originally announced May 2022.

Comments: 28 pages, 19 figures, submitted to a journal as an invited paper

arXiv:2205.05898 [pdf]

Pseudo-Label Guided Multi-Contrast Generalization for Non-Contrast Organ-Aware Segmentation

Authors: Ho Hin Lee, Yucheng Tang, Riqiang Gao, Qi Yang, Xin Yu, Shunxing Bao, James G. Terry, J. Jeffrey Carr, Yuankai Huo, Bennett A. Landman

Abstract: Non-contrast computed tomography (NCCT) is commonly acquired for lung cancer screening, assessment of general abdominal pain or suspected renal stones, trauma evaluation, and many other indications. However, the absence of contrast limits the ability to distinguish boundaries between organs. In this paper, we propose a novel unsupervised approach that leverages pairwise contrast-enhanced CT (CECT) context to compute non-contrast segmentation without ground-truth labels. Unlike generative adversarial approaches, we compute the pairwise morphological context with CECT to provide teacher guidance instead of generating fake anatomical context. Additionally, we further augment the intensity correlations in 'organ-specific' settings and increase the sensitivity to organ-aware boundaries. We validate our approach on multi-organ segmentation with paired non-contrast & contrast-enhanced CT scans using five-fold cross-validation. Full external validations are performed on an independent non-contrast cohort for aorta segmentation. Compared with current abdominal organs segmentation state-of-the-art in fully supervised setting, our proposed pipeline achieves a significantly higher Dice by 3.98% (internal multi-organ annotated), and 8.00% (external aorta annotated) for abdominal organs segmentation. The code and pretrained models are publicly available at https://github.com/MASILab/ContrastMix.

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2205.03050 [pdf, other]

Mechanisms of strength and hardening in austenitic stainless 310S steel: Nanoindentation experiments and multiscale modeling

Authors: F. J. Domínguez-Gutíerrez, K. Mulewska, A. Ustrzycka, R. Alvarez-Donado, A. Kosínska, W. Y. Huo, L. Kurpaska, I. Jozwik, S. Papanikolaou, M. Alava

Abstract: Austenitic stainless steels with low carbon have exceptional mechanical properties and can reduce embrittlement, due to high chromium and nickel alloying; thus they are very attractive for efficient energy production in extreme environments. It is key to perform nanomechanical investigations of the role of chromium and the form of the particular alloy composition that give rise to the excellent mechanical properties of steel. We perform nanoindentation experiments and molecular dynamics (MD) simulations of FCC austenitic stainless steel 310S, using established interatomic potentials, and we use a comparison to the plastic behavior of NiFe solid solutions under similar conditions for the elucidation of key dislocation mechanisms. We combine EBSD images to connect crystalline orientations to nanoindentation results, and provide input data to MD simulations for modeling mechanisms of defect nucleation and interactions. The maps of impressions after nanoindentation indicate that the Ni-Fe-Cr composition in 310S steel leads to strain localization and hardening. A detailed analysis of the dislocation dynamics at different depths leads to the development of an experimentally consistent Kocks-Mecking-based continuum multiscale model. Furthermore, the analysis shows that geometrically necessary dislocations (GNDs) are responsible for the exceptional hardness at low depths, as predicted by the Ma-Clarke constitutive model.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2204.07441 [pdf, other]

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

Authors: Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen

Abstract: Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-stream methods like CLIP and ALIGN with high inference efficiency have also shown promising performance; however, they only consider instance-level alignment between the two streams (thus there is still room for improvement). To overcome these limitations, we propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval by enhancing cross-modal interaction. In addition to instance-level alignment via momentum contrastive learning, we leverage two extra levels of cross-modal interactions in our COTS: (1) Token-level interaction - a masked vision-language modeling (MVLM) learning objective is devised without using a cross-stream network module, where variational autoencoder is imposed on the visual encoder to generate visual tokens for each image. (2) Task-level interaction - a KL-alignment learning objective is devised between text-to-image and image-to-text retrieval tasks, where the probability distribution per task is computed with the negative queues in momentum contrastive learning. Under a fair comparison setting, our COTS achieves the highest performance among all two-stream methods and comparable performance (but with 10,800X faster in inference) w.r.t. the latest single-stream methods. Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art on the widely-used MSR-VTT dataset.

Submitted 20 May, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: Accepted by CVPR2022

arXiv:2204.05575 [pdf, other]

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

Authors: Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, Zaiqing Nie

Abstract: Autonomous driving faces great safety challenges due to a lack of global perspective and the limitation of long-range perception capabilities. It has been widely agreed that vehicle-infrastructure cooperation is required to achieve Level 5 autonomy. However, there is still NO dataset from real scenarios available for computer vision researchers to work on vehicle-infrastructure cooperation-related problems. To accelerate computer vision research and innovation for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD), we release DAIR-V2X Dataset, which is the first large-scale, multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. The Vehicle-Infrastructure Cooperative 3D Object Detection problem (VIC3D) is introduced, formulating the problem of collaboratively locating and identifying 3D objects using sensory inputs from both vehicle and infrastructure. In addition to solving traditional 3D object detection problems, the solution of VIC3D needs to consider the temporal asynchrony problem between vehicle and infrastructure sensors and the data transmission cost between them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late fusion framework for the VIC3D task as a benchmark based on DAIR-V2X. Find data, code, and more up-to-date information at https://thudair.baai.ac.cn/index and https://github.com/AIR-THU/DAIR-V2X.

Submitted 12 April, 2022; originally announced April 2022.

Comments: CVPR2022

arXiv:2204.03237 [pdf, other]

Rule-based Procedural Tree Modeling Approach

Authors: Yinhui Yang, Rui Wang, Yuchi Huo

Abstract: In some entertainment and virtual reality applications, it is necessary to model and draw the real world realistically, so as to improve the fidelity of natural scenes and give users a better sense of immersion. However, the complexity and variety of the morphological structure of trees present many challenges for photorealistic modeling and rendering of trees. This paper reviews the progress achieved in photorealistic modeling and rendering of tree branches, leaves, and bark over the past few decades. The main achievement is a rule-based procedural tree modeling method.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2204.01976 [pdf, other]

Streaming Approximation Scheme for Minimizing Total Completion Time on Parallel Machines Subject to Varying Processing Capacity

Authors: Bin Fu, Yumei Huo, Hairong Zhao

Abstract: We study the problem of minimizing total completion time on parallel machines subject to varying processing capacity. In this paper, we develop an approximation scheme for the problem under the data stream model where the input data is massive and cannot fit into memory and thus can only be scanned in a few passes. Our algorithm can compute the approximate value of the optimal total completion time in one pass and output the schedule with the approximate value in two passes.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2204.01970 [pdf, other]

Streaming Algorithms for Multitasking Scheduling with Shared Processing

Authors: Bin Fu, Yumei Huo, Hairong Zhao

Abstract: In this paper, we design the first streaming algorithms for the problem of multitasking scheduling on parallel machines with shared processing. In one pass, our streaming approximation schemes can provide an approximate value of the optimal makespan. If the jobs can be read in two passes, the algorithm can find the schedule with the approximate value. This work not only provides an algorithmic big data solution for the studied problem, but also gives an insight into the design of streaming algorithms for other problems in the area of scheduling.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2204.01859 [pdf, other]

Multitasking Scheduling with Shared Processing

Authors: Bin Fu, Yumei Huo, Hairong Zhao

Abstract: Recently, the problem of multitasking scheduling has attracted a lot of attention in the service industries where workers frequently perform multiple tasks by switching from one task to another. Hall, Leung and Li (Discrete Applied Mathematics 2016) proposed a shared processing multitasking scheduling model which allows a team to continue to work on the primary tasks while processing the routinely scheduled activities as they occur. The processing sharing is achieved by allocating a fraction of the processing capacity to routine jobs and the remaining fraction, which we denote as sharing ratio, to the primary jobs. In this paper, we generalize this model to parallel machines and allow the fraction of the processing capacity assigned to routine jobs to vary from one to another. The objectives are minimizing makespan and minimizing the total completion time. We show that for both objectives, there is no polynomial time approximation algorithm unless P=NP if the sharing ratios are arbitrary for all machines. Then we consider the problems where the sharing ratios on some machines have a constant lower bound. For each objective, we analyze the performance of the classical scheduling algorithms and their variations and then develop a polynomial time approximation scheme when the number of machines is a constant.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2203.15588 [pdf]

Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review

Authors: Can Cui, Haichun Yang, Yaohong Wang, Shilin Zhao, Zuhayr Asad, Lori A. Coburn, Keith T. Wilson, Bennett A. Landman, Yuankai Huo

Abstract: The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on the various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical data and genomic data). However, such decision-making procedures can be subjective, qualitative, and have large inter-subject variabilities. With the recent advances in multi-modal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews the recent studies on dealing with such a question. Briefly, this review will include the (1) overview of current multi-modal learning workflows, (2) summarization of multi-modal fusion methods, (3) discussion of the performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.

Submitted 26 January, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2203.14441 [pdf, other]

An Interactive Image-based Modeling System

Authors: Zhi He, Rui Wang, Wei Hua, Yuchi Huo

Abstract: This paper proposes an interactive 3D modeling method and corresponding system based on single or multiple uncalibrated images. The main feature of this method is that, according to the modeling habits of ordinary people, the 3D model of the target is reconstructed from coarse to fine images. On the basis of determining the approximate shape, the user adds or modifies projection constraints and spatial constraints, applies topology modifications, gradually realizes camera calibration, refines the rough model, and finally completes the reconstruction of objects with arbitrary geometry and topology. During the interactive process, the geometric parameters and camera projection matrix are solved in real time, and the reconstruction results are displayed in a 3D window.

Submitted 27 March, 2022; originally announced March 2022.

Showing 101–150 of 321 results for author: Huo, Y