Search | arXiv e-print repository

Real-time Rendering and Editing of Scattering Effects for Translucent Objects

Authors: Rui Wang, Wei Hua, Yuchi Huo, Hujun Bao

Abstract: The photorealistic rendering of the transparent effect of translucent objects has been a hot research topic in recent years. A real-time photorealistic rendering and dynamic material editing method for the diffuse scattering effect of translucent objects is proposed, based on the dipole approximation of the bidirectional scattering-surface reflectance distribution function (BSSRDF). The diffuse scattering material function in the dipole approximation is decomposed, through principal component analysis, into the product of a shape-related function and a translucent-material-related function. Using this decomposed representation within a real-time photorealistic rendering framework of precomputed radiative transfer and scattering transfer, real-time editing of translucent object materials is realized under various light sources. In addition, a method for quadratic wavelet compression of precomputed radiative transfer data in the spatial domain is proposed. By exploiting the correlation between the spatial positions of surface points, the data is greatly compressed and rendering efficiency is improved while rendering quality is preserved. The experimental results show that the method can generate a highly realistic translucent effect while maintaining real-time rendering speed.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.11484 [pdf, other]

A Virtual Point Light Generation Method in Close-Range Area

Authors: Shihao **, Rui Wang, Wenting Zheng, Wei Hua, Yuchi Huo

Abstract: This paper proposes a new hybrid algorithm for sampling virtual point lights (VPLs). The indirect lighting calculation of the scene is used to distribute the VPLs reasonably. In the process of generating VPLs, we divide the scene into two parts according to the camera position and orientation: the close-range part, which the camera pays attention to, and the distant-range part, which the camera does not pay attention to or rarely pays attention to. For the close-range part, we use a patch-based VPL sampling method to distribute the VPLs as evenly as possible on the patches in the near-field area; for the distant-range part, we use sparse instant radiosity (IR) for sampling. Compared with the conventional instant radiosity VPL generation algorithm, the proposed method greatly improves the quality of the final rendered image when the number of VPLs is the same, and greatly improves the rendering speed at the same rendering quality.

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.10521 [pdf, other]

Variational Hierarchical Directed Bounding Box Construction for Solid Mesh Models

Authors: Rui Wang, Wei Hua, Gaofeng Xu, Yuchi Huo, Hujun Bao

Abstract: The oriented bounding box tree (OBB-Tree for short) has a wide range of applications in collision detection, real-time rendering, and other areas. The construction of the hierarchical directed bounding box of the solid mesh model is studied, and a new optimization solution method is proposed. First, the part of the external space volume that does not belong to the solid mesh model is used as the error, and an error calculation method based on hardware acceleration is given. Secondly, the hierarchical bounding box construction problem is transformed into a variational approximation problem, and the optimal hierarchical directed bounding box is obtained by solving for the global error minimum. In the optimization calculation, we propose combining Lloyd clustering iteration within the same layer with MultiGrid-like reciprocating iteration between layers. Compared with previous results, this method generates hierarchical directed bounding boxes that approximate the original solid mesh model more tightly. In the practical application of collision detection, the results constructed using this method can reduce the computational time of collision detection and improve detection efficiency.

Submitted 20 March, 2022; originally announced March 2022.

arXiv:2203.04647 [pdf, other]

Normal and Visibility Estimation of Human Face from a Single Image

Authors: Fuzhi Zhong, Rui Wang, Yuchi Huo, Hujun Bao

Abstract: Recent work on the intrinsic image of humans starts to consider the visibility of incident illumination and encodes the light transfer function by spherical harmonics. In this paper, we show that such a light transfer function can be further decomposed into visibility and cosine terms related to surface normal. Such decomposition allows us to recover the surface normal in addition to visibility. We propose a deep learning-based approach with a reconstruction loss for training on real-world images. Results show that compared with previous works, the reconstruction of human face from our method better reveals the surface normal and shading details especially around regions where visibility effect is strong.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2203.04419 [pdf, other]

Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Demographic Data

Authors: Can Cui, Han Liu, Quan Liu, Ruining Deng, Zuhayr Asad, Yaohong Wang, Shilin Zhao, Haichun Yang, Bennett A. Landman, Yuankai Huo

Abstract: Integrating cross-department multi-modal data (e.g., radiological, pathological, genomic, and clinical data) is ubiquitous in brain cancer diagnosis and survival prediction. To date, such an integration is typically conducted by human physicians (and panels of experts), which can be subjective and semi-quantitative. Recent advances in multi-modal deep learning, however, have opened a door to leverage such a process to a more objective and quantitative manner. Unfortunately, the prior arts of using four modalities on brain cancer survival prediction are limited by a "complete modalities" setting (i.e., with all modalities available). Thus, there are still open questions on how to effectively predict brain cancer survival from the incomplete radiological, pathological, genomic, and demographic data (e.g., one or more modalities might not be collected for a patient). For instance, should we use both complete and incomplete data, and more importantly, how to use those data? To answer the preceding questions, we generalize the multi-modal learning on cross-department multi-modal data to a missing data setting. Our contribution is three-fold: 1) We introduce optimal multi-modal learning with missing data (MMD) pipeline with optimized hardware consumption and computational efficiency; 2) We extend multi-modal learning on radiological, pathological, genomic, and demographic data into missing data scenarios; 3) a large-scale public dataset (with 962 patients) is collected to systematically evaluate glioma tumor survival prediction using four modalities.
The proposed method improved the C-index of survival prediction from 0.7624 to 0.8053.

Submitted 18 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

arXiv:2203.03870 [pdf, other]

Morphological Anti-Aliasing Method for Boundary Slope Prediction

Authors: Yuchen Zhong, Yuchi Huo, Rui Wang

Abstract: Image pixel aliasing caused by insufficient sampling is a long-standing problem in the field of computer graphics. It has always been the goal of researchers to seek anti-aliasing algorithms with high speed and good effect. To address the deficiencies in local detection and reconstruction of sloping line boundaries, a morphological anti-aliasing method for boundary slope prediction is proposed. This method uses the information of the local line boundary slope to predict and test the end positions of the line boundary in the global scope, thereby reconstructing boundary information that is more consistent with the actual boundary and obtaining a more accurate linear boundary shape with only a small increase in the amount of calculation. Compared with previous morphological anti-aliasing algorithms, the proposed method is based on the global morphological boundary and can reconstruct the straight line boundary more accurately. Applying it to the anti-aliasing calculation further improves the color transition of the straight line boundary, gives inclined straight line boundaries higher continuity, and obtains a better anti-aliasing effect.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2203.02430 [pdf, other]

Characterizing Renal Structures with 3D Block Aggregate Transformers

Authors: Xin Yu, Yucheng Tang, Yinchi Zhou, Riqiang Gao, Qi Yang, Ho Hin Lee, Thomas Li, Shunxing Bao, Yuankai Huo, Zhoubing Xu, Thomas A. Lasko, Richard G. Abramson, Bennett A. Landman

Abstract: Efficiently quantifying renal structures can provide distinct spatial context and facilitate biomarker discovery for kidney morphology. However, the development and evaluation of the transformer model to segment the renal cortex, medulla, and collecting system remains challenging due to data inefficiency. Inspired by the hierarchical structures in vision transformer, we propose a novel method using a 3D block aggregation transformer for segmenting kidney components on contrast-enhanced CT scans. We construct the first cohort of renal substructures segmentation dataset with 116 subjects under institutional review board (IRB) approval. Our method yields the state-of-the-art performance (Dice of 0.8467) against the baseline approach of 0.8308 with the data-efficient design. The Pearson R achieves 0.9891 between the proposed method and manual standards and indicates the strong correlation and reproducibility for volumetric analysis. We extend the proposed method to the public KiTS dataset, where the method leads to improved accuracy compared to transformer-based approaches. We show that the 3D block aggregation transformer can achieve local communication between sequence representations without modifying self-attention, and it can serve as an accurate and efficient quantification tool for characterizing renal structures.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2202.12567 [pdf, ps, other]

Sparse Sampling and Completion for Light Transport in VPL-based Rendering

Authors: Yuchi Huo, Rui Wang, Xinguo Liu, Hujun Bao

Abstract: The many-light formulation provides a general framework for rendering various illumination effects using hundreds of thousands of virtual point lights (VPLs). To efficiently gather the contributions of the VPLs, lightcuts and its extensions cluster the VPLs, which implicitly approximates the lighting matrix with some representative blocks similar to vector quantization. In this paper, we propose a new approximation method based on the previous lightcut method and a low-rank matrix factorization model. As many researchers pointed out, the lighting matrix is low rank, which implies that it can be completed from a small set of known entries. We first generate a conservative global light cut with bounded error and partition the lighting matrix into slices by the coordinate and normal of the surface points using the method of lightslice. Then we perform two passes of random sampling on each matrix slice. In the first pass, uniformly distributed random entries are sampled to coarsen the global light cut, further clustering the similar lights for the spatially localized surface points of the slices. In the second pass, more entries are sampled according to the probability distribution function estimated from the first sampling result. Then each matrix slice is factorized into a product of two smaller low-rank matrices constrained by the sampled entries, which delivers a completion of the lighting matrix. The factorized form provides an additional speedup for adding up the matrix columns, which is more GPU friendly.
Compared with the previous lightcut-based methods, we approximate the lighting matrix with some signal-specialized bases via factorization. The experimental results show that we can achieve significant acceleration over the state-of-the-art many-light methods.

Submitted 9 March, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.05977 [pdf, other]

doi 10.1111/cgf.14338

Real-time Monte Carlo Denoising with Weight Sharing Kernel Prediction Network

Authors: Hangming Fan, Rui Wang, Yuchi Huo, Hujun Bao

Abstract: Real-time Monte Carlo denoising aims at removing severe noise under low samples per pixel (spp) in a strict time budget. Recently, kernel-prediction methods use a neural network to predict each pixel's filtering kernel and have shown a great potential to remove Monte Carlo noise. However, the heavy computation overhead blocks these methods from real-time applications. This paper expands the kernel-prediction method and proposes a novel approach to denoise very low spp (e.g., 1-spp) Monte Carlo path traced images at real-time frame rates. Instead of using the neural network to directly predict the kernel map, i.e., the complete weights of each per-pixel filtering kernel, we predict an encoding of the kernel map, followed by a high-efficiency decoder with unfolding operations for a high-quality reconstruction of the filtering kernels. The kernel map encoding yields a compact single-channel representation of the kernel map, which can significantly reduce the kernel-prediction network's throughput. In addition, we adopt a scalable kernel fusion module to improve denoising quality. The proposed approach preserves kernel prediction methods' denoising quality while roughly halving its denoising time for 1-spp noisy inputs. In addition, compared with the recent neural bilateral grid-based real-time denoiser, our approach benefits from the high parallelism of kernel-based reconstruction and produces better denoising results at equal time.

Submitted 25 February, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

Journal ref: Computer Graphics Forum. 2021, 40(4): 15-27

arXiv:2202.00855 [pdf, ps, other]

Extension: Adaptive Sampling with Implicit Radiance Field

Authors: Yuchi Huo

Abstract: This manuscript discusses the extension of adaptive light field sampling with implicit radiance fields.

Submitted 8 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

arXiv:2202.00087 [pdf, other]

Holistic Fine-grained GGS Characterization: From Detection to Unbalanced Classification

Authors: Yuzhe Lu, Haichun Yang, Zuhayr Asad, Zheyu Zhu, Tianyuan Yao, Jiachen Xu, Agnes B. Fogo, Yuankai Huo

Abstract: Recent studies have demonstrated the diagnostic and prognostic values of global glomerulosclerosis (GGS) in IgA nephropathy, aging, and end-stage renal disease. However, the fine-grained quantitative analysis of multiple GGS subtypes (e.g., obsolescent, solidified, and disappearing glomerulosclerosis) is typically a resource extensive manual process. Very few automatic methods, if any, have been developed to bridge this gap for such analytics. In this paper, we present a holistic pipeline to quantify GGS (with both detection and classification) from a whole slide image in a fully automatic manner. In addition, we conduct the fine-grained classification for the sub-types of GGS. Our study releases the open-source quantitative analytical tool for fine-grained GGS characterization while tackling the technical challenges in unbalanced classification and integrating detection and classification.

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2201.11034 [pdf]

Meta-optic Accelerators for Object Classifiers

Authors: Hanyu Zheng, Quan Liu, You Zhou, Ivan I. Kravchenko, Yuankai Huo, Jason Valentine

Abstract: Rapid advances in deep learning have led to paradigm shifts in a number of fields, from medical image analysis to autonomous systems. These advances, however, have resulted in digital neural networks with large computational requirements, resulting in high energy consumption and limitations in real-time decision making when computation resources are limited. Here, we demonstrate a meta-optic based neural network accelerator that can off-load computationally expensive convolution operations into high-speed and low-power optics. In this architecture, metasurfaces enable both spatial multiplexing and additional information channels, such as polarization, in object classification. End-to-end design is used to co-optimize the optical and digital systems resulting in a robust classifier that achieves 95% accurate classification of handwriting digits and 94% accuracy in classifying both the digit and its polarization state. This approach could enable compact, high-speed, and low-power image and information processing systems for a wide range of applications in machine-vision and artificial intelligence.

Submitted 26 January, 2022; originally announced January 2022.

Comments: 16 pages, 5 figures

arXiv:2201.05957 [pdf, other]

doi 10.1016/j.scib.2023.04.003

Quantum Neuronal Sensing of Quantum Many-Body States on a 61-Qubit Programmable Superconducting Processor

Authors: Ming Gong, He-Liang Huang, Shiyu Wang, Chu Guo, Shaowei Li, Yulin Wu, Qingling Zhu, Youwei Zhao, Shaojun Guo, Haoran Qian, Yangsen Ye, Chen Zha, Fusheng Chen, Chong Ying, Jiale Yu, Daojin Fan, Dachao Wu, Hong Su, Hui Deng, Hao Rong, Kaili Zhang, Sirui Cao, Jin Lin, Yu Xu, Lihua Sun, et al. (11 additional authors not shown)

Abstract: Classifying many-body quantum states with distinct properties and phases of matter is one of the most fundamental tasks in quantum many-body physics. However, due to the exponential complexity that emerges from the enormous numbers of interacting particles, classifying large-scale quantum states has been extremely challenging for classical approaches. Here, we propose a new approach called quantum neuronal sensing. Utilizing a 61 qubit superconducting quantum processor, we show that our scheme can efficiently classify two different types of many-body phenomena: namely the ergodic and localized phases of matter. Our quantum neuronal sensing process allows us to extract the necessary information coming from the statistical characteristics of the eigenspectrum to distinguish these phases of matter by measuring only one qubit. Our work demonstrates the feasibility and scalability of quantum neuronal sensing for near-term quantum processors and opens new avenues for exploring quantum many-body phenomena in larger-scale systems.

Submitted 20 November, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

Comments: 7 pages, 3 figures in the main text, and 13 pages, 13 figures, and 1 table in supplementary materials

Journal ref: Science Bulletin, 68(9):906-912 (2023)

arXiv:2201.04769 [pdf, other]

MAg: a simple learning-based patient-level aggregation method for detecting microsatellite instability from whole-slide images

Authors: Kaifeng Pang, Zuhayr Asad, Shilin Zhao, Yuankai Huo

Abstract: The prediction of microsatellite instability (MSI) and microsatellite stability (MSS) is essential in predicting both the treatment response and prognosis of gastrointestinal cancer. In clinical practice, a universal MSI testing is recommended, but the accessibility of such a test is limited. Thus, a more cost-efficient and broadly accessible tool is desired to cover the traditionally untested patients. In the past few years, deep-learning-based algorithms have been proposed to predict MSI directly from haematoxylin and eosin (H&E)-stained whole-slide images (WSIs). Such algorithms can be summarized as (1) patch-level MSI/MSS prediction, and (2) patient-level aggregation. Compared with the advanced deep learning approaches that have been employed for the first stage, only the naïve first-order statistics (e.g., averaging and counting) were employed in the second stage. In this paper, we propose a simple yet broadly generalizable patient-level MSI aggregation (MAg) method to effectively integrate the precious patch-level information. Briefly, the entire probabilistic distribution in the first stage is modeled as histogram-based features to be fused as the final outcome with machine learning (e.g., SVM). The proposed MAg method can be easily used in a plug-and-play manner, which has been evaluated upon five broadly used deep neural networks: ResNet, MobileNetV2, EfficientNet, Dpn and ResNext. From the results, the proposed MAg method consistently improves the accuracy of patient-level aggregation for two publicly available datasets.
It is our hope that the proposed method could potentially leverage the low-cost H&E based MSI detection method. The code of our work has been made publicly available at https://github.com/Calvin-Pang/MAg.

Submitted 12 January, 2022; originally announced January 2022.

arXiv:2201.01459 [pdf, other]

doi 10.1145/3510003.3510158

ARCLIN: Automated API Mention Resolution for Unformatted Texts

Authors: Yintong Huo, Yuxin Su, Hongming Zhang, Michael R. Lyu

Abstract: Online technical forums (e.g., StackOverflow) are popular platforms for developers to discuss technical problems such as how to use specific Application Programming Interface (API), how to solve the programming tasks, or how to fix bugs in their codes. These discussions can often provide auxiliary knowledge of how to use the software that is not covered by the official documents. The automatic extraction of such knowledge will support a set of downstream tasks like API searching or indexing. However, unlike official documentation written by experts, discussions in open forums are made by regular developers who write in short and informal texts, including spelling errors or abbreviations. There are three major challenges for the accurate APIs recognition and linking mentioned APIs from unstructured natural language documents to an entry in the API repository: (1) distinguishing API mentions from common words; (2) identifying API mentions without a fully qualified name; and (3) disambiguating API mentions with similar method names but in a different library. In this paper, to tackle these challenges, we propose an ARCLIN tool, which can effectively distinguish and link APIs without using human annotations. Specifically, we first design an API recognizer to automatically extract API mentions from natural language sentences by a Conditional Random Field (CRF) on the top of a Bi-directional Long Short-Term Memory (Bi-LSTM) module, then we apply a context-aware scoring mechanism to compute the mention-entry similarity for each entry in an API repository.
Compared to previous approaches with heuristic rules, our proposed tool without manual inspection outperforms by 8% in a high-quality dataset Py-mention, which contains 558 mentions and 2,830 sentences from five popular Python libraries.

Submitted 20 April, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: Accepted by the 44th International Conference on Software Engineering (ICSE '22)

arXiv:2112.13505 [pdf, other]

doi 10.1103/PhysRevLett.129.030501

Realization of an Error-Correcting Surface Code with Superconducting Qubits

Authors: Youwei Zhao, Yangsen Ye, He-Liang Huang, Yiming Zhang, Dachao Wu, Huijie Guan, Qingling Zhu, Zuolin Wei, Tan He, Sirui Cao, Fusheng Chen, Tung-Hsun Chung, Hui Deng, Daojin Fan, Ming Gong, Cheng Guo, Shaojun Guo, Lianchen Han, Na Li, Shaowei Li, Yuan Li, Futian Liang, Jin Lin, Haoran Qian, Hao Rong, et al. (14 additional authors not shown)

Abstract: Quantum error correction is a critical technique for transitioning from noisy intermediate-scale quantum (NISQ) devices to fully fledged quantum computers. The surface code, which has a high threshold error rate, is the leading quantum error correction code for two-dimensional grid architecture. So far, the repeated error correction capability of the surface code has not been realized experimentally. Here, we experimentally implement an error-correcting surface code, the distance-3 surface code which consists of 17 qubits, on the Zuchongzhi 2.1 superconducting quantum processor. By executing several consecutive error correction cycles, the logical error can be significantly reduced after applying corrections, achieving the repeated error correction of surface code for the first time. This experiment represents a fully functional instance of an error-correcting surface code, providing a key step on the path towards scalable fault-tolerant quantum computing.

Submitted 29 January, 2022; v1 submitted 26 December, 2021; originally announced December 2021.

Journal ref: Phys. Rev. Lett. 129, 030501 (2022)

arXiv:2112.12665 [pdf, other]

Omni-Seg: A Single Dynamic Network for Multi-label Renal Pathology Image Segmentation using Partially Labeled Data

Authors: Ruining Deng, Quan Liu, Can Cui, Zuhayr Asad, Haichun Yang, Yuankai Huo

Abstract: Computer-assisted quantitative analysis on Giga-pixel pathology images has provided a new avenue in histology examination. The innovations have been largely focused on cancer pathology (i.e., tumor segmentation and characterization). In non-cancer pathology, the learning algorithms can be asked to examine more comprehensive tissue types simultaneously, as a multi-label setting. The prior arts typically needed to train multiple segmentation networks in order to match the domain-specific knowledge for heterogeneous tissue types (e.g., glomerular tuft, glomerular unit, proximal tubular, distal tubular, peritubular capillaries, and arteries). In this paper, we propose a dynamic single segmentation network (Omni-Seg) that learns to segment multiple tissue types using partially labeled images (i.e., only one tissue type is labeled for each training image) for renal pathology. By learning from ~150,000 patch-wise pathological images from six tissue types, the proposed Omni-Seg network achieved superior segmentation accuracy and less resource consumption when compared to the previous multiple-network and multi-head design. In the testing stage, the proposed method obtains "completely labeled" tissue segmentation results using only "partially labeled" training images. The source code is available at https://github.com/ddrrnn123/Omni-Seg

Submitted 23 March, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.12636 [pdf, other]

SemParser: A Semantic Parser for Log Analysis

Authors: Yintong Huo, Yuxin Su, Cheryl Lee, Michael R. Lyu

Abstract: Logs, being run-time information automatically generated by software, record system events and activities with their timestamps. Before obtaining more insights into the run-time status of the software, a fundamental step of log analysis, called log parsing, is employed to extract structured templates and parameters from the semi-structured raw log messages. However, current log parsers are all syntax-based and regard each message as a character string, ignoring the semantic information included in parameters and templates. Thus, we propose the semantic-based parser SemParser to unlock the critical bottleneck of mining semantics from log messages. It contains two steps, an end-to-end semantic miner and a joint parser. Specifically, the first step aims to identify explicit semantics inside a single log, and the second step is responsible for jointly inferring implicit semantics and computing structural outputs based on the contextual knowledge base. To analyze the effectiveness of our semantic parser, we first demonstrate that it can derive rich semantics from log messages collected from six widely-applied systems with an average F1 score of 0.985. Then, we conduct two representative downstream tasks, showing that current downstream models improve their performance with appropriately extracted semantics by 1.2%-11.7% and 8.65% on two anomaly detection datasets and a failure identification dataset, respectively. We believe these findings provide insights into semantically understanding log messages for the log analysis community.

Submitted 5 February, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: The paper was accepted by ICSE 2023

arXiv:2112.06437 [pdf, other]

Semi-Supervised Contrastive Learning for Remote Sensing: Identifying Ancient Urbanization in the South Central Andes

Authors: Jiachen Xu, Junlin Guo, James Zimmer-Dauphinee, Quan Liu, Yuxuan Shi, Zuhayr Asad, D. Mitchell Wilkes, Parker VanValkenburgh, Steven A. Wernke, Yuankai Huo

Abstract: Archaeology has long faced fundamental issues of sampling and scalar representation. Traditionally, the local-to-regional-scale views of settlement patterns are produced through systematic pedestrian surveys. Recently, systematic manual survey of satellite and aerial imagery has enabled continuous distributional views of archaeological phenomena at interregional scales. However, such 'brute force' manual imagery survey methods are both time- and labor-intensive, as well as prone to inter-observer differences in sensitivity and specificity. The development of self-supervised learning methods offers a scalable learning scheme for locating archaeological features using unlabeled satellite and historical aerial images. However, archaeological features are generally only visible in a very small proportion relative to the landscape, while the modern contrastive-supervised learning approach typically yields an inferior performance on highly imbalanced datasets. In this work, we propose a framework to address this long-tail problem. As opposed to the existing contrastive learning approaches that treat the labeled and unlabeled data separately, our proposed method reforms the learning paradigm under a semi-supervised setting in order to utilize the precious annotated data (<7% in our setting). Specifically, the highly unbalanced nature of the data is employed as the prior knowledge in order to form pseudo negative pairs by ranking the similarities between unannotated image patches and annotated anchor images.
In this study, we used 95,358 unlabeled images and 5,830 labeled images in order to solve the issues associated with detecting ancient buildings from a long-tailed satellite image dataset. From the results, our semi-supervised contrastive learning model achieved a promising testing balanced accuracy of 79.0%, which is a 3.8% improvement as compared to other state-of-the-art approaches.

Submitted 15 April, 2023; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2112.04489 [pdf, other]

Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning

Authors: Alessa Hering, Lasse Hansen, Tony C. W. Mok, Albert C. S. Chung, Hanna Siebert, Stephanie Häger, Annkristin Lange, Sven Kuckertz, Stefan Heldmann, Wei Shao, Sulaiman Vesal, Mirabela Rusu, Geoffrey Sonn, Théo Estienne, Maria Vakalopoulou, Luyi Han, Yunzhi Huang, Pew-Thian Yap, Mikael Brudfors, Yaël Balbastre, Samuel Joutard, Marc Modat, Gal Lifshitz, Dan Raviv, **xin Lv, et al. (28 additional authors not shown)

Abstract: Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing approaches. The Learn2Reg challenge addresses these limitations by providing a multi-task medical image registration data set for comprehensive characterisation of deformable registration algorithms. A continuous evaluation will be possible at https://learn2reg.grand-challenge.org. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation. We established an easily accessible framework for training and validation of 3D registration methods, which enabled the compilation of results of over 65 individual method submissions from more than 20 unique teams. We used a complementary set of metrics, including robustness, accuracy, plausibility, and runtime, enabling unique insight into the current state-of-the-art of medical image registration. This paper describes datasets, tasks, evaluation methods and results of the challenge, as well as results of further analysis of transferability to new datasets, the importance of label supervision, and resulting bias.
While no single approach worked best across all tasks, many methodological aspects could be identified that push the performance of medical image registration to new state-of-the-art performance. Furthermore, we demystified the common belief that conventional registration methods have to be much slower than deep-learning-based methods.

Submitted 7 October, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

arXiv:2110.14378 [pdf, other]

doi 10.1038/s41467-022-30761-2

Towards artificial general intelligence via a multimodal foundation model

Authors: Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, **gyuan Wen, Haoyu Lu, Ruihua Song, Xin Gao, Tao Xiang, Hao Sun, Ji-Rong Wen

Abstract: The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning with weak semantic correlation data crawled from the Internet and show that promising results can be obtained on a wide range of downstream tasks. Particularly, with the developed model-interpretability tools, we demonstrate that strong imagination ability is now possessed by our foundation model. We believe that our work makes a transformative stride towards AGI, from our common practice of "weak or narrow AI" to that of "strong or generalized AI".

Submitted 8 June, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Published by Nature Communications, see https://www.nature.com/articles/s41467-022-30761-2

arXiv:2110.12093 [pdf, other]

doi 10.1109/TMI.2021.3122835

Circle Representation for Medical Object Detection

Authors: Ethan H. Nguyen, Haichun Yang, Ruining Deng, Yuzhe Lu, Zheyu Zhu, Joseph T. Roland, Le Lu, Bennett A. Landman, Agnes B. Fogo, Yuankai Huo

Abstract: Box representation has been extensively used for object detection in computer vision. Such representation is efficacious but not necessarily optimized for biomedical objects (e.g., glomeruli), which play an essential role in renal pathology. In this paper, we propose a simple circle representation for medical object detection and introduce CircleNet, an anchor-free detection framework. Compared with the conventional bounding box representation, the proposed bounding circle representation is innovative in three ways: (1) it is optimized for ball-shaped biomedical objects; (2) it reduces the degrees of freedom compared with the box representation; (3) it is naturally more rotation invariant. When detecting glomeruli and nuclei on pathological images, the proposed circle representation achieved superior detection performance and was more rotation-invariant, compared with the bounding box. The code has been made publicly available: https://github.com/hrlblab/CircleNet

Submitted 22 October, 2021; originally announced October 2021.

Comments: 10 pages, 8 figures, to be published in IEEE Transactions on Medical Imaging

Journal ref: in IEEE Transactions on Medical Imaging, vol. 41, no. 3, pp. 746-754, March 2022
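
A circle is described by a center and a radius (three degrees of freedom, against four for an axis-aligned box), and the overlap between two circles has a closed form. The following is a minimal sketch of an intersection-over-union measure for two circles; it is not taken from the CircleNet repository, and the function name `circle_iou` and the `(x, y, r)` tuple layout are assumptions made for illustration.

```python
import math


def circle_iou(c1, c2):
    """Intersection over union of two circles given as (x, y, r) tuples.

    Hypothetical helper for illustration only; not the CircleNet API.
    """
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)  # distance between centers

    if d >= r1 + r2:
        # Circles are disjoint (or touch at a single point).
        inter = 0.0
    elif d <= abs(r1 - r2):
        # One circle lies entirely inside the other.
        inter = math.pi * min(r1, r2) ** 2
    else:
        # Partial overlap: area of the lens formed by two circular segments.
        a1 = math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
        a2 = math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
        tri = 0.5 * math.sqrt(
            (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
        )
        inter = r1 * r1 * a1 + r2 * r2 * a2 - tri

    union = math.pi * (r1 * r1 + r2 * r2) - inter
    return inter / union if union > 0 else 0.0
```

Because the measure depends only on the two radii and the center distance, rotating the image about any point leaves it unchanged, which is one way to read the rotation-invariance claim in the abstract.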

arXiv:2110.10219 [pdf, other]

Power Line Communication and Sensing Using Time Series Forecasting

Authors: Yinjia Huo, Gautham Prasad, Lutz Lampe, Victor C. M. Leung

Abstract: Smart electrical grids rely on data communication to support their operation and on sensing for diagnostics and maintenance. Usually, the roles of communication and sensing equipment are different, i.e., communication equipment does not participate in sensing tasks and vice versa. Power line communication (PLC) offers a cost-effective solution for joint communication and sensing for smart grids. This is because the high-frequency PLC signals used for data communication also reveal detailed information regarding the health of the power lines that they travel through. Traditional PLC-based power line or cable diagnostic solutions are dependent on prior knowledge of the cable type, network topology, and/or characteristics of the anomalies. In this paper, we develop a power line sensing technique that can detect various types of cable anomalies without any prior domain knowledge. To this end, we design a solution that first uses time-series forecasting to predict the PLC channel state information at any given point in time based on its historical data. Under the approximation that the prediction error follows a Gaussian distribution, we then perform a chi-squared statistical test to build an anomaly detector which identifies the occurrence of a cable fault. We demonstrate the effectiveness and universality of our sensing solution via evaluations conducted using both synthetic and real-world data extracted from low- and medium-voltage distribution networks.

Submitted 15 July, 2022; v1 submitted 19 October, 2021; originally announced October 2021.
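
The detection step described in the abstract (Gaussian prediction error, then a chi-squared test) can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it tests one scalar sample at a time (one degree of freedom), assumes a zero-mean error with a known standard deviation `sigma` estimated from fault-free data, and uses hypothetical function names. With one degree of freedom, the chi-squared quantile equals the square of a two-sided standard normal quantile, so only the standard library is needed.

```python
from statistics import NormalDist


def chi2_1_threshold(alpha):
    """Upper-alpha quantile of a chi-squared distribution with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-squared with 1 dof, so the quantile
    is the square of the two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def detect_anomalies(actual, predicted, sigma, alpha=0.01):
    """Flag samples whose squared normalized forecast error exceeds the threshold.

    actual, predicted: equal-length sequences of scalar channel measurements
    and their forecasts (hypothetical inputs for illustration).
    sigma: standard deviation of the forecast error under fault-free conditions.
    alpha: false-alarm probability per sample under the Gaussian assumption.
    """
    threshold = chi2_1_threshold(alpha)
    return [((a - p) / sigma) ** 2 > threshold for a, p in zip(actual, predicted)]
```

For example, `detect_anomalies([1.0, 1.1, 5.0], [1.0, 1.0, 1.0], sigma=0.5)` flags only the third sample, whose normalized squared error is far above the threshold.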

arXiv:2110.09660 [pdf, other]

doi 10.1109/JIOT.2022.3164339

BEV-SGD: Best Effort Voting SGD for Analog Aggregation Based Federated Learning against Byzantine Attackers

Authors: Xin Fan, Yue Wang, Yan Huo, Zhi Tian

Abstract: As a promising distributed learning technology, analog aggregation based federated learning over the air (FLOA) provides high communication efficiency and privacy provisioning under the edge computing paradigm. When all edge devices (workers) simultaneously upload their local updates to the parameter server (PS) through commonly shared time-frequency resources, the PS obtains the averaged update only rather than the individual local ones. While such a concurrent transmission and aggregation scheme reduces the latency and communication costs, it unfortunately renders FLOA vulnerable to Byzantine attacks. Aiming at Byzantine-resilient FLOA, this paper starts from analyzing the channel inversion (CI) mechanism that is widely used for power control in FLOA. Our theoretical analysis indicates that although CI can achieve good learning performance in benign scenarios, it has limited defensive capability and fails to work well against Byzantine attacks. Then, we propose a novel scheme called the best effort voting (BEV) power control policy that is integrated with stochastic gradient descent (SGD). Our BEV-SGD enhances the robustness of FLOA to Byzantine attacks by allowing all the workers to send their local updates at their maximum transmit power. Under worst-case attacks, we derive the expected convergence rates of FLOA with CI and BEV power control policies, respectively. The rate comparison reveals that our BEV-SGD outperforms its counterpart with CI in terms of convergence behavior, which is verified by experimental simulations.

Submitted 9 February, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: Version 2: Revised some proofs, fixed some typos, and reworded some sentences

arXiv:2110.04529 [pdf]

The design of quaternary eutectic solder by machine learning

Authors: Zhenhua Guo, Xintong Ren, Jiahua Jiang, Haoyang Liu, Yongjun Huo, Xiuchen Zhao, K. N. Tu, Yingxia Liu

Abstract: In this paper, we obtain a Sn-Bi-In-Pb quaternary near-eutectic alloy composition from a machine learning model. The eutectic points and the alloy composition were evaluated and continuously improved by experimental input. The actual composition is near the result given by machine learning. We conclude that the application of machine learning in solder design has shown the potential to overcome the challenge in searching for the next-generation eutectic solders, which will have a broad impact on the industry.

Submitted 9 October, 2021; originally announced October 2021.

arXiv:2109.09004 [pdf, other]

Random Multi-Channel Image Synthesis for Multiplexed Immunofluorescence Imaging

Authors: Shunxing Bao, Yucheng Tang, Ho Hin Lee, Riqiang Gao, Sophie Chiron, Ilwoo Lyu, Lori A. Coburn, Keith T. Wilson, Joseph T. Roland, Bennett A. Landman, Yuankai Huo

Abstract: Multiplex immunofluorescence (MxIF) is an emerging imaging technique that produces the high sensitivity and specificity of single-cell mapping. With a tenet of 'seeing is believing', MxIF enables iterative staining and imaging of extensive antibodies, which provides comprehensive biomarkers to segment and group different cells on a single tissue section. However, considerable depletion of the scarce tissue is inevitable from extensive rounds of staining and bleaching ('missing tissue'). Moreover, the immunofluorescence (IF) imaging can globally fail for particular rounds ('missing stain'). In this work, we focus on the 'missing stain' issue. It would be appealing to develop digital image synthesis approaches to restore missing stain images without losing more tissue physically. Herein, we aim to develop image synthesis approaches for eleven MxIF structural molecular markers (i.e., epithelial and stromal) on real samples. We propose a novel multi-channel high-resolution image synthesis approach, called pixN2N-HD, to tackle possible missing stain scenarios via a high-resolution generative adversarial network (GAN). Our contribution is three-fold: (1) a single deep network framework is proposed to tackle missing stain in MxIF; (2) the proposed 'N-to-N' strategy reduces a theoretical four years of computational time to 20 hours when covering all possible missing stain scenarios, with up to five missing stains (e.g., '(N-1)-to-1', '(N-2)-to-2'); and (3) this work is the first comprehensive experimental study of investigating cross-stain synthesis in MxIF.
Our results elucidate a promising direction of advancing MxIF imaging with deep image synthesis.

Submitted 18 September, 2021; originally announced September 2021.

Comments: Accepted at the third MICCAI workshop on Computational Pathology (COMPAY 2021)

arXiv:2109.03494 [pdf, other]

Quantum Computational Advantage via 60-Qubit 24-Cycle Random Circuit Sampling

Authors: Qingling Zhu, Sirui Cao, Fusheng Chen, Ming-Cheng Chen, Xiawei Chen, Tung-Hsun Chung, Hui Deng, Yajie Du, Dao** Fan, Ming Gong, Cheng Guo, Chu Guo, Shaojun Guo, Lianchen Han, Linyin Hong, He-Liang Huang, Yong-Heng Huo, Li** Li, Na Li, Shaowei Li, Yuan Li, Futian Liang, Chun Lin, ** Lin, Haoran Qian, et al. (28 additional authors not shown)

Abstract: To ensure a long-term quantum computational advantage, the quantum hardware should be upgraded to withstand the competition of continuously improved classical algorithms and hardware. Here, we demonstrate a superconducting quantum computing system, Zuchongzhi 2.1, which has 66 qubits in a two-dimensional array in a tunable coupler architecture. The readout fidelity of Zuchongzhi 2.1 is considerably improved to an average of 97.74%. The more powerful quantum processor enables us to achieve larger-scale random quantum circuit sampling, with a system scale of up to 60 qubits and 24 cycles. The achieved sampling task is about 6 orders of magnitude more difficult than that of Sycamore [Nature 574, 505 (2019)] in the classical simulation, and 3 orders of magnitude more difficult than the sampling task on Zuchongzhi 2.0 [arXiv:2106.14734 (2021)]. The time consumption of classically simulating the random circuit sampling experiment using a state-of-the-art classical algorithm and supercomputer is extended to tens of thousands of years (about $4.8\times 10^4$ years), while Zuchongzhi 2.1 only takes about 4.2 hours, thereby significantly enhancing the quantum computational advantage.

Submitted 9 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

arXiv:2109.01781 [pdf, other]

Measurement-based Condition Monitoring of Railway Signaling Cables

Authors: Rathinamala Vijay, Gautham Prasad, Yinjia Huo, SM Sachin, TV Prabhakar

Abstract: We propose a composite diagnostics solution for railway infrastructure monitoring. In particular, we address the issue of soft-fault detection in underground railway cables. We first demonstrate the feasibility of an orthogonal multitone time domain reflectometry based fault detection and location method for railway cabling infrastructure by implementing it using software defined radios. Our practical implementation, comprehensive measurement campaign, and our measurement results guide the design of our overall composite solution. With several diagnostics solutions available in the literature, our conglomerated method presents a technique to consolidate results from multiple diagnostics methods to provide an accurate assessment of underground cable health. We present a Bayesian framework based cable health index computation technique that indicates the extent of degradation that a cable is subject to at any stage during its lifespan. We present the performance results of our proposed solution using real-world measurements to demonstrate its effectiveness.

Submitted 3 September, 2021; originally announced September 2021.

Comments: 6 pages, 8 figures, 2 tables, To appear in SmartGridComm 2021

arXiv:2108.11993 [pdf, other]

Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation

Authors: Cam Nguyen, Zuhayr Asad, Yuankai Huo

Abstract: Histopathology has played an essential role in cancer diagnosis. With the rapid advances in convolutional neural networks (CNN), various CNN-based automated pathological image segmentation approaches have been developed in computer-assisted pathological image analysis. In the past few years, Transformer neural networks (Transformer) have shown the unique merit of capturing the global long-distance dependencies across the entire image as a new deep learning paradigm. Such merit is appealing for exploring spatially heterogeneous pathological images. However, there have been very few, if any, studies that have systematically evaluated the current Transformer-based approaches in pathological image segmentation. To assess the performance of Transformer segmentation models on whole slide images (WSI), we quantitatively evaluated six prevalent transformer-based models on tumor segmentation, using the widely used PAIP liver histopathological dataset. For a more comprehensive analysis, we also compare the transformer-based models with six major traditional CNN-based models. The results show that the Transformer-based models exhibit generally superior performance over the CNN-based models. In particular, Segmenter, Swin-Transformer and TransUNet, all transformer-based, came out as the best performers among the twelve evaluated models.

Submitted 22 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

arXiv:2107.12842 [pdf]

Technical Report: Quality Assessment Tool for Machine Learning with Clinical CT

Authors: Riqiang Gao, Mirza S. Khan, Yucheng Tang, Kaiwen Xu, Steve Deppen, Yuankai Huo, Kim L. Sandler, Pierre P. Massion, Bennett A. Landman

Abstract: Image Quality Assessment (IQA) is important for scientific inquiry, especially in medical imaging and machine learning. Potential data quality issues can be exacerbated when human-based workflows use limited views of the data that may obscure digital artifacts. In practice, multiple factors such as network issues, accelerated acquisitions, motion artifacts, and imaging protocol design can impede the interpretation of image collections. The medical image processing community has developed a wide variety of tools for the inspection and validation of imaging data. Yet, IQA of computed tomography (CT) remains an under-recognized challenge, and no user-friendly tool is commonly available to address these potential issues. Here, we create and illustrate a pipeline specifically designed to identify and resolve issues encountered with large-scale data mining of clinically acquired CT data. Using the widely studied National Lung Screening Trial (NLST), we have identified approximately 4% of image volumes with quality concerns out of 17,392 scans. To assess robustness, we applied the proposed pipeline to our internal datasets where we find our tool is generalizable to clinically acquired medical images. In conclusion, the tool has been useful and time-saving for research study of clinical data, and the code and tutorials are publicly available at https://github.com/MASILab/QA_tool.

Submitted 27 July, 2021; originally announced July 2021.

Comments: 18 pages, 13 figures, technical report

arXiv:2107.11882 [pdf, other]

Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing Imputation Perspective

Authors: Riqiang Gao, Yucheng Tang, Kaiwen Xu, Ho Hin Lee, Steve Deppen, Kim Sandler, Pierre Massion, Thomas A. Lasko, Yuankai Huo, Bennett A. Landman

Abstract: Data from multi-modality provide complementary information in clinical prediction, but missing data in clinical cohorts limits the number of subjects in a multi-modal learning context. Multi-modal missing imputation is challenging with existing methods when 1) the missing data span across heterogeneous modalities (e.g., image vs. non-image); or 2) one modality is largely missing. In this paper, we address imputation of missing data by modeling the joint distribution of multi-modal data. Motivated by partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method that imputes one modality combining the conditional knowledge from another modality. Specifically, C-PBiGAN introduces a conditional latent space in a missing imputation framework that jointly encodes the available multi-modal data, along with a class regularization loss on imputed data to recover discriminative information. To our knowledge, it is the first generative adversarial model that addresses multi-modal missing imputation by modeling the joint distribution of image and non-image data. We validate our model with both the national lung screening trial (NLST) dataset and an external clinical validation cohort. The proposed C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods (e.g., AUC values increase in both NLST (+2.9%) and in-house dataset (+4.3%) compared with PBiGAN, p<0.05).

Submitted 25 July, 2021; originally announced July 2021.

Comments: Early Accepted by MICCAI 2021. Traveling Award

arXiv:2107.08650 [pdf, other]

Compound Figure Separation of Biomedical Images with Side Loss

Authors: Tianyuan Yao, Chang Qu, Quan Liu, Ruining Deng, Yuanhan Tian, Jiachen Xu, Aadarsh Jha, Shunxing Bao, Mengyang Zhao, Agnes B. Fogo, Bennett A. Landman, Catie Chang, Haichun Yang, Yuankai Huo

Abstract: Unsupervised learning algorithms (e.g., self-supervised learning, auto-encoder, contrastive learning) allow deep learning models to learn effective image representations from large-scale unlabeled data. In medical image analysis, even unannotated data can be difficult to obtain for individual labs. Fortunately, national-level efforts have been made to provide efficient access to biomedical image data from previous scientific publications. For instance, NIH has launched the Open-i search engine that provides a large-scale image database with free access. However, the images in scientific publications include a considerable number of compound figures with subplots. To extract and curate individual subplots, many different compound figure separation approaches have been developed, especially with the recent advances in deep learning. However, previous approaches typically required resource-intensive bounding box annotation to train detection models. In this paper, we propose a simple compound figure separation (SimCFS) framework that uses weak classification annotations from individual images. Our technical contribution is three-fold: (1) we introduce a new side loss that is designed for compound figure separation; (2) we introduce an intra-class image augmentation method to simulate hard cases; (3) the proposed framework enables an efficient deployment to new classes of images, without requiring resource-intensive bounding box annotations.
From the results, the SimCFS achieved a new state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The source code of SimCFS is made publicly available at https://github.com/hrlblab/ImageSeperation.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2107.06149 [pdf, other]

MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis

Authors: Haocheng Ren, Hao Zhang, Jia Zheng, Jiaxiang Zheng, Rui Tang, Yuchi Huo, Hujun Bao, Rui Wang

Abstract: With the rapid development of data-driven techniques, data has played an essential role in various computer vision tasks. Many realistic and synthetic datasets have been proposed to address different problems. However, many challenges remain unresolved: (1) the creation of a dataset is usually a tedious process with manual annotations, (2) most datasets are only designed for a single specific task, (3) the modification or randomization of the 3D scene is difficult, and (4) the release of commercial 3D data may encounter copyright issues. This paper presents MINERVAS, a Massive INterior EnviRonments VirtuAl Synthesis system, to facilitate the 3D scene modification and the 2D image synthesis for various vision tasks. In particular, we design a programmable pipeline with Domain-Specific Language, allowing users to (1) select scenes from the commercial indoor scene database, (2) synthesize scenes for different tasks with customized rules, and (3) render various imagery data, such as visual color, geometric structures, and semantic labels. Our system eases the difficulty of customizing massive numbers of scenes for different tasks and relieves users from manipulating fine-grained scene configurations by providing user-controllable randomness using multi-level samplers. Most importantly, it empowers users to access commercial scene databases with millions of indoor scenes and protects the copyright of core data assets, e.g., 3D CAD models. We demonstrate the validity and flexibility of our system by using our synthesized data to improve the performance on different kinds of computer vision tasks.

Submitted 30 August, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: Accepted by Computer Graphics Forum, Pacific Graphics 2022. The first two authors contributed equally. Project page: https://coohom.github.io/MINERVAS

arXiv:2107.03484 [pdf, other]

An Overview of Low latency for Wireless Communications: an Evolutionary Perspective

Authors: Xin Fan, Yan Huo

Abstract: Ultra-low latency supported by the fifth generation (5G) gives impetus to the prosperity of many wireless network applications, such as autonomous driving, robotics, telepresence, and virtual reality. Ultra-low latency is not achieved in a moment, but requires long-term evolution of network structure and key enabling communication technologies. In this paper, we provide an evolutionary overview of low latency in mobile communication systems, including two different evolutionary perspectives: 1) network architecture; 2) physical layer air interface technologies. We first describe in detail the evolution of communication network architecture from the second generation (2G) to 5G, highlighting the key points reducing latency. Moreover, we review the evolution of key enabling technologies in the physical layer from 2G to 5G, which is also aimed at reducing latency. We also discuss the challenges and future research directions for low latency in network architecture and physical layer.

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2106.15545 [pdf]

Quantum interference between independent solid-state single-photon sources separated by 300 km fiber

Authors: Xiang You, Ming-Yang Zheng, Si Chen, Run-Ze Liu, Jian Qin, M. -C. Xu, Z. -X. Ge, T. -H. Chung, Y. -K. Qiao, Y. -F. Jiang, H. -S. Zhong, M. -C. Chen, H. Wang, Y. -M. He, X. -P. Xie, H. Li, L. -X. You, C. Schneider, J. Yin, T. -Y. Chen, M. Benyoucef, Yong-Heng Huo, S. Hoefling, Qiang Zhang, Chao-Yang Lu, et al. (1 additional author not shown)

Abstract: In the quest to realize a scalable quantum network, semiconductor quantum dots (QDs) offer distinct advantages including high single-photon efficiency and indistinguishability, high repetition rate (tens of GHz with Purcell enhancement), interconnectivity with spin qubits, and a scalable on-chip platform. However, in the past two decades, the visibility of quantum interference between independent QDs rarely went beyond the classical limit of 50$\%$ and the distances were limited from a few meters to kilometers. Here, we report quantum interference between two single photons from independent QDs separated by 302 km optical fiber. The single photons are generated from resonantly driven single QDs deterministically coupled to microcavities. Quantum frequency conversions are used to eliminate the QD inhomogeneity and shift the emission wavelength to the telecommunication band. The observed interference visibility is 0.67$\pm$0.02 (0.93$\pm$0.04) without (with) temporal filtering. Feasible improvements can further extend the distance to 600 km. Our work represents a key step to long-distance solid-state quantum networks.

Submitted 29 June, 2021; originally announced June 2021.

Comments: 14 pages, 5 figures

arXiv:2106.14734 [pdf, other]

doi 10.1103/PhysRevLett.127.180501

Strong quantum computational advantage using a superconducting quantum processor

Authors: Yulin Wu, Wan-Su Bao, Sirui Cao, Fusheng Chen, Ming-Cheng Chen, Xiawei Chen, Tung-Hsun Chung, Hui Deng, Yajie Du, Dao** Fan, Ming Gong, Cheng Guo, Chu Guo, Shaojun Guo, Lianchen Han, Linyin Hong, He-Liang Huang, Yong-Heng Huo, Li** Li, Na Li, Shaowei Li, Yuan Li, Futian Liang, Chun Lin, ** Lin, et al. (29 additional authors not shown)

Abstract: Scaling up to a large number of qubits with high-precision control is essential in the demonstrations of quantum computational advantage to exponentially outpace the classical hardware and algorithmic improvements. Here, we develop a two-dimensional programmable superconducting quantum processor, \textit{Zuchongzhi}, which is composed of 66 functional qubits in a tunable coupling architecture. To characterize the performance of the whole system, we perform random quantum circuits sampling for benchmarking, up to a system size of 56 qubits and 20 cycles. The computational cost of the classical simulation of this task is estimated to be 2-3 orders of magnitude higher than the previous work on 53-qubit Sycamore processor [Nature \textbf{574}, 505 (2019)]. We estimate that the sampling task finished by \textit{Zuchongzhi} in about 1.2 hours will take the most powerful supercomputer at least 8 years. Our work establishes an unambiguous quantum computational advantage that is infeasible for classical computation in a reasonable amount of time. The high-precision and programmable quantum computing platform opens a new door to explore novel many-body phenomena and implement complex quantum algorithms.

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2106.11480 [pdf, other]

VoxelEmbed: 3D Instance Segmentation and Tracking with Voxel Embedding based Deep Learning

Authors: Mengyang Zhao, Quan Liu, Aadarsh Jha, Ruining Deng, Tianyuan Yao, Anita Mahadevan-Jansen, Matthew J. Tyska, Bryan A. Millis, Yuankai Huo

Abstract: Recent advances in bioimaging have provided scientists with superior spatial-temporal resolution to observe dynamics of living cells as 3D volumetric videos. Unfortunately, 3D biomedical video analysis is lagging, impeded by resource-intensive human curation using off-the-shelf 3D analytic tools. As a result, biologists often need to discard a considerable amount of rich 3D spatial information by compromising on 2D analysis via maximum intensity projection. Recently, pixel embedding-based cell instance segmentation and tracking provided a neat and generalizable computing paradigm for understanding cellular dynamics. In this work, we propose a novel spatial-temporal voxel-embedding (VoxelEmbed) based learning method to perform simultaneous cell instance segmenting and tracking on 3D volumetric video sequences. Our contribution is four-fold: (1) the proposed voxel embedding generalizes the pixel embedding with 3D context information; (2) we present a simple multi-stream learning approach that allows effective spatial-temporal embedding; (3) we accomplish an end-to-end framework for one-stage 3D cell instance segmentation and tracking without heavy parameter tuning; (4) the proposed 3D quantification is memory efficient, running on a single GPU with 12 GB memory. We evaluate our VoxelEmbed method on four 3D datasets (with different cell types) from the ISBI Cell Tracking Challenge. The proposed VoxelEmbed method achieved consistently superior overall performance (OP) on two densely annotated datasets. The performance is also competitive on two sparsely annotated cohorts in which 20.6% and 2% of the dataset have segmentation annotations. The results demonstrate that the VoxelEmbed method is a generalizable and memory-efficient solution.

Submitted 21 June, 2021; originally announced June 2021.

arXiv:2106.07139 [pdf, other]

Pre-Trained Models: Past, Present and Future

Authors: Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Yuan Yao, Ao Zhang, Liang Zhang, Wentao Han, Minlie Huang, Qin **, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-Rong Wen, **hui Yuan, Wayne Xin Zhao, Jun Zhu

Abstract: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.

Submitted 11 August, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

arXiv:2106.01596 [pdf, other]

Semantic-Aware Contrastive Learning for Multi-object Medical Image Segmentation

Authors: Ho Hin Lee, Yucheng Tang, Qi Yang, Xin Yu, Shunxing Bao, Leon Y. Cai, Lucas W. Remedios, Bennett A. Landman, Yuankai Huo

Abstract: Medical image segmentation, or computing voxel-wise semantic masks, is a fundamental yet challenging task. To increase the ability of encoder-decoder neural networks to perform this task across large clinical cohorts, contrastive learning provides an opportunity to stabilize model initialization and enhance encoders without labels. However, multiple target objects (with different semantic meanings) may exist in a single image, which poses a problem for adapting traditional contrastive learning methods from prevalent 'image-level classification' to 'pixel-level segmentation'. In this paper, we propose a simple semantic-aware contrastive learning approach leveraging attention masks to advance multi-object semantic segmentation. Briefly, we embed different semantic objects to different clusters rather than the traditional image-level embeddings. We evaluate our proposed method on a multi-organ medical image segmentation task with both in-house data and MICCAI Challenge 2015 BTCV datasets. Compared with current state-of-the-art training strategies, our proposed pipeline yields substantial improvements of 5.53% and 6.09% in Dice score on the two medical image segmentation cohorts, respectively (p-value<0.01). The performance of the proposed method is further assessed on natural images via the PASCAL VOC 2012 dataset, and achieves a substantial improvement of 2.75% on mIoU (p-value<0.01).

Submitted 8 November, 2021; v1 submitted 3 June, 2021; originally announced June 2021.
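
For readers unfamiliar with the two metrics named in the abstract above, the following is a minimal sketch of how a Dice score and an intersection-over-union (IoU) are commonly computed for one class on binary masks; mIoU is the mean of the per-class IoU values. The function names, the flat 0/1 list representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration and are not taken from the paper.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou_score(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for two flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 6-voxel masks: 2 voxels overlap, each mask has 3 foreground voxels.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
print(dice_score(pred, target))  # 2*2 / (3+3)
print(iou_score(pred, target))   # 2 / (3+3-2)
```

The two metrics are monotonically related for a single class (Dice = 2·IoU / (1 + IoU)), so they rank predictions the same way but are not numerically interchangeable.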

arXiv:2105.11244 [pdf, other]

doi 10.1103/PhysRevB.104.165401

Electric field induced tuning of electronic correlation in weakly confining quantum dots

Authors: Huiying Huang, Diana Csontosová, Santanu Manna, Yongheng Huo, Rinaldo Trotta, Armando Rastelli, Petr Klenovský

Abstract: We conduct a combined experimental and theoretical study of the quantum-confined Stark effect in GaAs/AlGaAs quantum dots obtained with the local droplet etching method. In the experiment, we probe the permanent electric dipole and polarizability of neutral and positively charged excitons weakly confined in GaAs quantum dots by measuring their light emission under the influence of a variable electric field applied along the growth direction. Calculations based on the configuration-interaction method show excellent quantitative agreement with the experiment and allow us to elucidate the role of Coulomb interactions among the confined particles and -- even more importantly -- of electronic correlation effects on the Stark shifts. Moreover, we show how the electric field alters properties such as built-in dipole, binding energy, and heavy-light hole mixing of multiparticle complexes in weakly confining systems, underlining the deficiencies of commonly used models for the quantum-confined Stark effect.

Submitted 14 February, 2022; v1 submitted 24 May, 2021; originally announced May 2021.

Journal ref: Phys. Rev. B 104, 165401 (2021)

arXiv:2104.10875 [pdf, other]

doi 10.1109/TWC.2021.3072553

Joint Time and Power Allocation for 5G NR Unlicensed Systems

Authors: Haizhou Bao, Yiming Huo, Xiaodai Dong, Chuanhe Huang

Abstract: The fifth-generation (5G) and beyond networks are designed to efficiently utilize the spectrum resources to meet various quality of service (QoS) requirements. The unlicensed frequency bands used by WiFi are mainly deployed for indoor applications and are not always fully occupied. The cellular industry has been working to enable cellular and WiFi coexistence. In particular, 5G New Radio in unlicensed channel spectrum (NR-U) supports the uplink and downlink transmission on the maximum channel occupation time (MCOT) duration. In this paper, we consider maximizing the total throughput of both downlink and uplink in NR-U by jointly optimizing the time and power allocation during MCOT while ensuring fair coexistence with WiFi. Fairness is guaranteed in two steps: 1) tuning the access related parameters of NR-U to achieve proportional fairness, and 2) including 3GPP fairness from the throughput perspective as a constraint in NR-U throughput maximization. Numerical analysis and simulation have demonstrated the superior performance of the proposed resource allocation algorithm compared to conventional deployment strategies.

Submitted 22 April, 2021; originally announced April 2021.

Comments: 15 pages, 6 figures, to appear in IEEE Transactions on Wireless Communications

arXiv:2104.07116 [pdf, other]

doi 10.1109/VTC2020-Fall49728.2020.9348592

Meteorologically Introduced Impacts on Aerial Channels and UAV Communications

Authors: Mengan Song, Yiming Huo, Tao Lu, Xiaodai Dong, Zhonghua Liang

Abstract: As 5G wireless systems and networks are now being globally commercialized and deployed, more diversified application scenarios are emerging, quickly reshaping our societies and paving the road to the beyond 5G (6G) era when terahertz (THz) and unmanned aerial vehicle (UAV) communications may play critical roles. In this paper, aerial channel models under multiple meteorological conditions such as rain, fog and snow, have been investigated at frequencies of interest (from 2 GHz to 900 GHz) for UAV communications. Furthermore, the link budget and the received signal-to-noise ratio (SNR) performance under the existing air-to-ground (A2G) channel models are studied with the antenna system taken into account. The relationship between the 3D coverage radius and UAV altitude under the influence of multiple weather (MW) conditions is simulated. Numerical results show that medium rain has the greatest effect on the UAV's coverage for UAV communications at millimeter wave (mmWave) bands, while snow has the largest impact at near THz bands. In addition, when the frequency increases, the corresponding increase in the number of antennas can effectively compensate for the propagation loss introduced by weather factors, while its form factor and weight can be kept to maintain the UAV's payload.

Submitted 14 April, 2021; originally announced April 2021.

Comments: 5 pages, 7 figures, accepted by IEEE VTC2020-FALL

arXiv:2104.03490 [pdf, other]

doi 10.1109/TWC.2021.3130111

Joint Optimization of Communications and Federated Learning Over the Air

Authors: Xin Fan, Yue Wang, Yan Huo, Zhi Tian

Abstract: Federated learning (FL) is an attractive paradigm for making use of rich distributed data while protecting data privacy. Nonetheless, nonideal communication links and limited transmission resources may hinder the implementation of fast and accurate FL. In this paper, we study joint optimization of communications and FL based on analog aggregation transmission in realistic wireless networks. We first derive closed-form expressions for the expected convergence rate of FL over the air, which theoretically quantify the impact of analog aggregation on FL. Based on the analytical results, we develop a joint optimization model for accurate FL implementation, which allows a parameter server to select a subset of workers and determine an appropriate power scaling factor. Since the practical setting of FL over the air encounters unobservable parameters, we reformulate the joint optimization of worker selection and power allocation using controlled approximation. Finally, we efficiently solve the resulting mixed-integer programming problem via a simple yet optimal finite-set search method by reducing the search space. Simulation results show that the proposed solutions developed for realistic wireless analog channels outperform a benchmark method, and achieve performance comparable to the ideal case where FL is implemented over noise-free wireless channels.

Submitted 4 October, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: V2: Corrected typos and some technical content. Added some references. Added some theoretical proofs. Added some content to make our research applicable to more scenarios

arXiv:2103.16055 [pdf, other]

1-Bit Compressive Sensing for Efficient Federated Learning Over the Air

Authors: Xin Fan, Yue Wang, Yan Huo, Zhi Tian

Abstract: For distributed learning among collaborative users, this paper develops and analyzes a communication-efficient scheme for federated learning (FL) over the air, which incorporates 1-bit compressive sensing (CS) into analog aggregation transmissions. To facilitate design parameter optimization, we theoretically analyze the efficacy of the proposed scheme by deriving a closed-form expression for the expected convergence rate of the FL over the air. Our theoretical results reveal the tradeoff between convergence performance and communication efficiency as a result of the aggregation errors caused by sparsification, dimension reduction, quantization, signal reconstruction and noise. Then, we formulate 1-bit CS based FL over the air as a joint optimization problem to mitigate the impact of these aggregation errors through joint optimal design of worker scheduling and power scaling policy. An enumeration-based method is proposed to solve this non-convex problem, which is optimal but becomes computationally infeasible as the number of devices increases. For scalable computing, we resort to the alternating direction method of multipliers (ADMM) technique to develop an efficient implementation that is suitable for large-scale networks. Simulation results show that our proposed 1-bit CS based FL over the air achieves comparable performance to the ideal case where conventional FL without compression and quantification is applied over error-free aggregation, at much reduced communication overhead and transmission latency.

Submitted 29 March, 2021; originally announced March 2021.

Comments: Part of this article is to be published in the IEEE ICC 2021 Workshop Proceedings
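
The abstract above lists sparsification, dimension reduction, and quantization as stages that introduce aggregation error. As a rough illustration of what those three stages typically look like on the sender side of generic 1-bit compressive sensing, here is a minimal sketch. It is not the paper's scheme: the top-k sparsifier, the Gaussian measurement matrix, the function names, and the example vector are all assumptions, and signal reconstruction and analog aggregation are not shown.

```python
import random


def top_k_sparsify(vec, k):
    """Sparsification: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def one_bit_measure(vec, num_measurements, seed=0):
    """Dimension reduction + 1-bit quantization: sign of random projections.

    Each measurement is the sign of an inner product with a random Gaussian
    row, so only +1 / -1 values would need to be transmitted.
    """
    rng = random.Random(seed)  # shared seed stands in for a shared matrix
    signs = []
    for _ in range(num_measurements):
        row = [rng.gauss(0.0, 1.0) for _ in vec]
        projection = sum(a * x for a, x in zip(row, vec))
        signs.append(1 if projection >= 0 else -1)
    return signs


# Hypothetical 8-dimensional local update, compressed to 4 sign bits.
gradient = [0.9, -0.05, 0.02, -1.3, 0.01, 0.4, -0.03, 0.0]
sparse = top_k_sparsify(gradient, k=3)
bits = one_bit_measure(sparse, num_measurements=4)
print(sparse)
print(bits)
```

Because only signs are kept, scaling the input by any positive constant yields the same bits; the magnitude information is lost, which is one source of the quantization error the abstract refers to.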

arXiv:2103.13253 [pdf, other]

Learning Versatile Neural Architectures by Propagating Network Codes

Authors: Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, **gdong Wang, ** Luo

Abstract: This work explores how to design a single neural network capable of adapting to multiple heterogeneous vision tasks, such as image segmentation, 3D detection, and video recognition. This goal is challenging because both network architecture search (NAS) spaces and methods in different tasks are inconsistent. We solve this challenge from both sides. We first introduce a unified design space for multiple tasks and build a multitask NAS benchmark (NAS-Bench-MR) on many widely used datasets, including ImageNet, Cityscapes, KITTI, and HMDB51. We further propose Network Coding Propagation (NCP), which back-propagates gradients of neural predictors to directly update architecture codes along the desired gradient directions to solve various tasks. In this way, optimal architecture configurations can be found by NCP in our large search space in seconds. Unlike prior NAS work that typically focuses on a single task, NCP has several unique benefits. (1) NCP transforms architecture optimization from data-driven to architecture-driven, enabling joint search for an architecture across multiple tasks with different data distributions. (2) NCP learns from network codes but not original data, enabling it to update the architecture efficiently across datasets. (3) In addition to our NAS-Bench-MR, NCP performs well on other NAS benchmarks, such as NAS-Bench-201. (4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transferring between different tasks. Code is available at https://github.com/dingmyu/NCP.

Submitted 17 February, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: ICLR 2022. Project page: https://network-propagation.github.io

arXiv:2103.13217 [pdf, other]

Cross-layer based intermittent jamming schemes for securing energy-constraint networks

Authors: Qinghe Gao, Yan Huo, Tao **g, Liran Ma, ** Qian

Abstract: The Internet-of-Things (IoT) emerges as a paradigm to achieve ubiquitous connectivity via wireless communications between various kinds of physical objects. Due to the wireless broadcasting nature and the energy constraint of physical objects, concerns on IoT security have triggered research on cooperative jamming based physical layer security. With the help of a cooperative jammer, existing solutions can effectively fight against eavesdroppers. However, these schemes are of high energy cost due to continuously transmitting jamming signals. To reduce the energy consumption, we propose a new idea of intermittent jamming and design five specific intermittent jamming schemes (IJSs). By taking the transmit frame format into account, we optimize these IJSs from three aspects, including the jamming power, the jamming method, and the jamming positions. Then we analyze the applicability of the proposed IJSs according to different requirements on the synchronization, the available jamming energy and the jamming power constraints. Extensive MATLAB experiments are conducted on the basis of the WLAN Toolbox, which demonstrate that the proposed IJSs can effectively degrade the reception of the eavesdropper and outperform the widespread continuous jamming scheme (CJS) when the available jamming energy is limited.

Submitted 24 March, 2021; originally announced March 2021.

Comments: 11 pages, 33 subfigures and figures

arXiv:2103.06561 [pdf, other]

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

Authors: Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, **gyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, **ming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, et al. (10 additional authors not shown)

Abstract: Multi-modal pre-training models have been intensively explored to bridge vision and language in recent years. However, most of them explicitly model the cross-modal interaction between image-text pairs, by assuming that there exists strong semantic correlation between the text and image modalities. Since this strong assumption is often invalid in real-world scenarios, we choose to implicitly model the cross-modal correlation for large-scale multi-modal pre-training, which is the focus of the Chinese project 'WenLan' led by our team. Specifically, with the weak correlation assumption over image-text pairs, we propose a two-tower pre-training model called BriVL within the cross-modal contrastive learning framework. Unlike OpenAI CLIP that adopts a simple contrastive learning method, we devise a more advanced algorithm by adapting the latest method MoCo into the cross-modal scenario. By building a large queue-based dictionary, our BriVL can incorporate more negative samples in limited GPU resources. We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model. Extensive experiments demonstrate that the pre-trained BriVL model outperforms both UNITER and OpenAI CLIP on various downstream tasks.

Submitted 8 July, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

Comments: This paper is the outcome of the Chinese multi-modal pre-training project called 'WenLan'

arXiv:2103.05585 [pdf, other]

SimTriplet: Simple Triplet Representation Learning with a Single GPU

Authors: Quan Liu, Peter C. Louis, Yuzhe Lu, Aadarsh Jha, Mengyang Zhao, Ruining Deng, Tianyuan Yao, Joseph T. Roland, Haichun Yang, Shilin Zhao, Lee E. Wheless, Yuankai Huo

Abstract: Contrastive learning is a key technique of modern self-supervised learning. The broader accessibility of earlier approaches is hindered by the need of heavy computational resources (e.g., at least 8 GPUs or 32 TPU cores), which accommodate for large-scale negative samples or momentum. The more recent SimSiam approach addresses such key limitations via stop-gradient without momentum encoders. In medical image analysis, multiple instances can be achieved from the same patient or tissue. Inspired by these advances, we propose a simple triplet representation learning (SimTriplet) approach on pathological images. The contribution of the paper is three-fold: (1) The proposed SimTriplet method takes advantage of the multi-view nature of medical images beyond self-augmentation; (2) The method maximizes both intra-sample and inter-sample similarities via triplets from positive pairs, without using negative samples; and (3) The recent mix precision training is employed to advance the training by only using a single GPU with 16GB memory. By learning from 79,000 unlabeled pathological patch images, SimTriplet achieved 10.58% better performance compared with supervised learning. It also achieved 2.13% better performance compared with SimSiam. Our proposed SimTriplet can achieve decent performance using only 1% labeled data. The code and data are available at https://github.com/hrlblab/SimTriple.

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2103.03166 [pdf, other]

Contrastive Learning Meets Transfer Learning: A Case Study In Medical Image Analysis

Authors: Yuzhe Lu, Aadarsh Jha, Yuankai Huo

Abstract: Annotated medical images are typically rarer than labeled natural images since they are limited by domain knowledge and privacy constraints. Recent advances in transfer and contrastive learning have provided effective solutions to tackle such issues from different perspectives. The state-of-the-art transfer learning (e.g., Big Transfer (BiT)) and contrastive learning (e.g., Simple Siamese Contrastive Learning (SimSiam)) approaches have been investigated independently, without considering the complementary nature of such techniques. It would be appealing to accelerate contrastive learning with transfer learning, given that slow convergence speed is a critical limitation of modern contrastive learning approaches. In this paper, we investigate the feasibility of aligning BiT with SimSiam. From empirical analyses, different normalization techniques (Group Norm in BiT vs. Batch Norm in SimSiam) are the key hurdle of adapting BiT to SimSiam. When combining BiT with SimSiam, we evaluated the performance of using BiT, SimSiam, and BiT+SimSiam on CIFAR-10 and HAM10000 datasets. The results suggest that the BiT models accelerate the convergence speed of SimSiam. When used together, the model gives superior performance over both of its counterparts. We hope this study will motivate researchers to revisit the task of aggregating big pre-trained models with contrastive learning models for image analysis.

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2103.02965 [pdf, other]

doi 10.1016/j.nuclphysb.2021.115443

Self-Renormalization of Quasi-Light-Front Correlators on the Lattice

Authors: Yi-Kai Huo, Yushan Su, Long-Cheng Gui, Xiangdong Ji, Yuan-Yuan Li, Yizhuang Liu, Andreas Schäfer, Maximilian Schlemmer, Peng Sun, Wei Wang, Yi-Bo Yang, Jian-Hui Zhang, Kuan Zhang

Abstract: In applying large-momentum effective theory, renormalization of the Euclidean correlators in lattice regularization is a challenge due to linear divergences in the self-energy of Wilson lines. Based on lattice QCD matrix elements of the quasi-PDF operator at lattice spacing $a$= 0.03 fm $\sim$ 0.12 fm with clover and overlap valence quarks on staggered and domain-wall sea, we design a strategy to disentangle the divergent renormalization factors from finite physics matrix elements, which can be matched to a continuum scheme at short distance such as dimensional regularization and minimal subtraction. Our results indicate that the renormalization factors are universal in the hadron state matrix elements. Moreover, the physical matrix elements appear independent of the valence fermion formulations. These conclusions remain valid even with HYP smearing which reduces the statistical errors albeit reducing control of the renormalization procedure. Moreover, we find a large non-perturbative effect in the popular RI/MOM and ratio renormalization scheme, suggesting favor of the hybrid renormalization procedure proposed recently.

Submitted 4 March, 2021; originally announced March 2021.

Comments: 29 pages, 30 figures

Showing 151–200 of 321 results for author: Huo, Y