arXiv:2203.07659 [pdf]

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Authors: Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian

Abstract: Molecular subtypes of breast cancer are important references for personalized clinical treatment. For cost and labor savings, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Inevitable sampling error is risky due to tumor heterogeneity and could result in a delay in treatment. Molecular subtype prediction from conventional H&E pathological whole slide images (WSIs) using AI methods is useful and critical for helping pathologists pre-screen the proper paraffin block for IHC. It is a challenging task since only WSI-level labels of molecular subtypes can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to be computationally feasible for deep learning. However, with only coarse slide-level labels, patch-based methods may suffer from abundant noise patches, such as folds, overstained regions, or non-tumor tissues. A weakly supervised learning framework based on discriminative patch selecting and multi-instance learning (MIL) was proposed for breast cancer molecular subtype prediction from H&E WSIs. First, a co-teaching strategy was adopted to learn molecular subtype representations and filter out noise patches. Then, a balanced sampling strategy was used to handle the imbalance in subtypes in the dataset. In addition, a noise patch filtering algorithm that used the local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch and slide constraint information was used to fine-tune the MIL framework on the obtained discriminative patches and further improve the performance of molecular subtyping. The experimental results confirmed the effectiveness of the proposed method, and our models outperformed even senior pathologists, with potential to assist pathologists in pre-screening paraffin blocks for IHC in the clinic.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2203.06616 [pdf, other]

LAS-AT: Adversarial Training with Learnable Attack Strategy

Authors: Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao

Abstract: Adversarial training (AT) is always formulated as a minimax problem, whose performance depends on the inner optimization that involves the generation of adversarial examples (AEs). Most previous methods adopt Projected Gradient Descent (PGD) with manually specified attack parameters for AE generation. A combination of the attack parameters can be referred to as an attack strategy. Several works have revealed that using a fixed attack strategy to generate AEs during the whole training phase limits the model robustness and propose to exploit different attack strategies at different training stages to improve robustness. But those multi-stage hand-crafted attack strategies need much domain expertise, and the robustness improvement is limited. In this paper, we propose a novel framework for adversarial training by introducing the concept of "learnable attack strategy", dubbed LAS-AT, which learns to automatically produce attack strategies to improve the model robustness. Our framework is composed of a target network that uses AEs for training to improve robustness and a strategy network that produces attack strategies to control the AE generation. Experimental evaluations on three benchmark databases demonstrate the superiority of the proposed method. The code is released at https://github.com/jiaxiaojunQAQ/LAS-AT.

Submitted 13 March, 2022; originally announced March 2022.

Journal ref: CVPR 2022
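
Illustrative note (not part of the listing): the abstract above describes an attack strategy as a combination of PGD attack parameters. The sketch below shows what a manually specified strategy looks like for L-infinity PGD against a toy linear classifier with logistic loss. The toy model, the function names, and the dictionary keys epsilon, step_size, and num_steps are assumptions made for illustration; this is plain PGD, not the LAS-AT strategy network.

```python
import math


def logistic_loss(w, b, x, y):
    """Logistic loss of a linear classifier on one example, label y in {-1, +1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log(1.0 + math.exp(-y * z))


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    scale = -y / (1.0 + math.exp(y * z))  # equals -y * sigmoid(-y * z)
    return [scale * wi for wi in w]


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_attack(w, b, x, y, strategy):
    """L-infinity PGD; `strategy` is a hypothetical dict of attack parameters."""
    eps = strategy["epsilon"]
    alpha = strategy["step_size"]
    x_adv = list(x)
    for _ in range(strategy["num_steps"]):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Signed-gradient ascent step on the loss.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the epsilon-ball around the clean input.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv
```

In this sketch, changing the values in `strategy` changes how strong the generated adversarial example is, which is the knob the abstract says is usually set by hand.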

arXiv:2203.06429 [pdf, other]

DFTR: Depth-supervised Fusion Transformer for Salient Object Detection

Authors: Heqin Zhu, Xu Sun, Yuexiang Li, Kai Ma, S. Kevin Zhou, Yefeng Zheng

Abstract: Automated salient object detection (SOD) plays an increasingly crucial role in many computer vision applications. By reformulating the depth information as supervision rather than as input, depth-supervised convolutional neural networks (CNN) have achieved promising results on both RGB and RGB-D SOD scenarios with the merits of no requirements for extra depth networks and depth inputs in the inference stage. This paper, for the first time, seeks to expand the applicability of depth supervision to the Transformer architecture. Specifically, we develop a Depth-supervised Fusion TRansformer (DFTR), to further improve the accuracy of both RGB and RGB-D SOD. The proposed DFTR involves three primary features: 1) DFTR, to the best of our knowledge, is the first pure Transformer-based model for depth-supervised SOD; 2) A multi-scale feature aggregation (MFA) module is proposed to fully exploit the multi-scale features encoded by the Swin Transformer in a coarse-to-fine manner; 3) To enable bidirectional information flow across different streams of features, a novel multi-stage feature fusion (MFF) module is further integrated into our DFTR with the emphasis on salient regions at different network learning stages. We extensively evaluate the proposed DFTR on ten benchmarking datasets. Experimental results show that our DFTR consistently outperforms the existing state-of-the-art methods for both RGB and RGB-D SOD tasks. The code and model will be made publicly available.

Submitted 11 April, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

Comments: 15 pages, 5 figures, 4 tables

arXiv:2203.06321 [pdf, other]

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

Authors: Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma

Abstract: Remarkable achievements have been attained with Generative Adversarial Networks (GANs) in image-to-image translation. However, due to a tremendous number of parameters, state-of-the-art GANs usually suffer from low efficiency and bulky memory usage. To tackle this challenge, this paper first investigates GAN performance from a frequency perspective. The results show that GANs, especially small GANs, lack the ability to generate high-quality high-frequency information. To address this problem, we propose a novel knowledge distillation method referred to as wavelet knowledge distillation. Instead of directly distilling the generated images of teachers, wavelet knowledge distillation first decomposes the images into different frequency bands with the discrete wavelet transform and then distills only the high-frequency bands. As a result, the student GAN can pay more attention to its learning on high-frequency bands. Experiments demonstrate that our method leads to 7.08 times compression and 6.80 times acceleration on CycleGAN with almost no performance drop. Additionally, we have studied the relation between discriminators and generators, which shows that the compression of discriminators can promote the performance of compressed generators.

Submitted 11 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR2022
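
Illustrative note (not part of the listing): the abstract above describes decomposing images into frequency bands with a discrete wavelet transform and distilling only the high-frequency bands. A minimal sketch of that idea follows, assuming a one-level Haar transform and an L1 distance between student and teacher detail bands. The choice of Haar wavelet, the L1 distance, and all function names are assumptions for illustration, not details confirmed by the abstract.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image given as a list of rows.

    Height and width must be even. Returns four sub-bands: the low-frequency
    average band followed by three high-frequency detail bands.
    """
    low, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, len(img), 2):
        r_low, r_cols, r_rows, r_diag = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2.0)   # local average
            r_cols.append((a - b + c - d) / 2.0)  # difference across columns
            r_rows.append((a + b - c - d) / 2.0)  # difference across rows
            r_diag.append((a - b - c + d) / 2.0)  # diagonal difference
        low.append(r_low)
        d_cols.append(r_cols)
        d_rows.append(r_rows)
        d_diag.append(r_diag)
    return low, d_cols, d_rows, d_diag


def high_frequency_l1(student_img, teacher_img):
    """Mean absolute difference over the three high-frequency bands only."""
    s_bands = haar_dwt2(student_img)[1:]
    t_bands = haar_dwt2(teacher_img)[1:]
    total, count = 0.0, 0
    for s_band, t_band in zip(s_bands, t_bands):
        for s_row, t_row in zip(s_band, t_band):
            for s_val, t_val in zip(s_row, t_row):
                total += abs(s_val - t_val)
                count += 1
    return total / count
```

Because the low-frequency band is excluded, a student output that differs from the teacher output only by a constant brightness offset incurs zero loss under this sketch; only differences in detail are penalized.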

arXiv:2203.03640 [pdf, other]

doi 10.1109/TMI.2020.3014433

Conquering Data Variations in Resolution: A Slice-Aware Multi-Branch Decoder Network

Authors: Shuxin Wang, Shilei Cao, Zhizhong Chai, Dong Wei, Kai Ma, Liansheng Wang, Yefeng Zheng

Abstract: Fully convolutional neural networks have made promising progress in joint liver and liver tumor segmentation. Instead of following the debates over 2D versus 3D networks (for example, pursuing the balance between large-scale 2D pretraining and 3D context), in this paper, we newly identify the wide variation in the ratio between intra- and inter-slice resolutions as a crucial obstacle to performance. To tackle the mismatch between the intra- and inter-slice information, we propose a slice-aware 2.5D network that emphasizes extracting discriminative features utilizing not only in-plane semantics but also out-of-plane coherence for each separate slice. Specifically, we present a slice-wise multi-input multi-output architecture to instantiate such a design paradigm, which contains a Multi-Branch Decoder (MD) with a Slice-centric Attention Block (SAB) for learning slice-specific features and a Densely Connected Dice (DCD) loss to regularize the inter-slice predictions to be coherent and continuous. Based on the aforementioned innovations, we achieve state-of-the-art results on the MICCAI 2017 Liver Tumor Segmentation (LiTS) dataset. In addition, we test our model on the ISBI 2019 Segmentation of THoracic Organs at Risk (SegTHOR) dataset, and the result demonstrates the robustness and generalizability of the proposed method in other segmentation tasks.

Submitted 7 March, 2022; originally announced March 2022.

Comments: Published by IEEE TMI

arXiv:2203.02390 [pdf, other]

doi 10.1007/978-3-030-87237-3_11

Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D Networks for 3D Coherent Layer Segmentation of Retina OCT Images

Authors: Hong Liu, Dong Wei, Donghuan Lu, Yuexiang Li, Kai Ma, Liansheng Wang, Yefeng Zheng

Abstract: Automated surface segmentation of retinal layers is important and challenging in analyzing optical coherence tomography (OCT). Recently, many deep learning based methods have been developed for this task and yield remarkable performance. However, due to the large spatial gap and potential mismatch between the B-scans of OCT data, all of them are based on 2D segmentation of individual B-scans, which may lose the continuity information across the B-scans. In addition, the 3D surface of the retinal layers can provide more diagnostic information, which is crucial in quantitative image analysis. In this study, a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) is proposed to obtain continuous 3D retinal layer surfaces from OCT. The 2D features of individual B-scans are extracted by an encoder consisting of 2D convolutions. These 2D features are then used to produce the alignment displacement field and layer segmentation by two 3D decoders, which are coupled via a spatial transformer module. The entire framework is trained end-to-end. To the best of our knowledge, this is the first study that attempts 3D retinal layer segmentation in volumetric OCT images based on CNNs. Experiments on a publicly available dataset show that our framework achieves superior results to state-of-the-art 2D methods in terms of both layer segmentation accuracy and cross-B-scan 3D continuity, thus offering more clinical value than previous works.

Submitted 4 March, 2022; originally announced March 2022.

Comments: Presented at MICCAI 2021

arXiv:2203.00270 [pdf, other]

Bidirectional Pricing and Demand Response for Nanogrids with HVAC Systems

Authors: Jiaxin Cao, Bo Yang, Shanying Zhu, Kai Ma, Xinping Guan

Abstract: Owing to fluctuating renewable generation and power demand, the energy surplus or deficit in each nanogrid varies over time. To stimulate local renewable energy consumption and minimize the long-term energy cost, some issues still remain to be explored: when and how the energy demand and bidirectional trading prices are scheduled considering personal comfort preferences and environmental factors. For this purpose, the demand response and two-way pricing problems for nanogrids and a public monitoring entity (PME) are studied concurrently by exploiting the large potential thermal elasticity of heating, ventilation and air-conditioning (HVAC) units. Unlike nanogrids, which aim to minimize time-average costs, the PME aims to set reasonable prices and optimize profits by trading with nanogrids and the main grid bidirectionally. In particular, this bilevel energy management problem is formulated in stochastic form over a long-term horizon. Since there are uncertain system parameters, time-coupled queue constraints and the interplay of bilevel decision-making, it is challenging to solve the formulated problems. To this end, we derive a form of relaxation based on the Lyapunov optimization technique to make the energy management problem tractable without forecasting the related system parameters. The transaction between nanogrids and the PME is captured by a one-leader and multi-follower Stackelberg game framework. Then, theoretical analysis of the existence and uniqueness of the Stackelberg equilibrium (SE) is developed based on the proposed game property. Following that, we devise an optimization algorithm to reach the SE with less information exchange. Numerical experiments validate the effectiveness of the proposed approach.

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.13687 [pdf, other]

AGMR-Net: Attention Guided Multiscale Recovery framework for stroke segmentation

Authors: Xiuquan Du, Kunpeng Ma, Yuhui Song

Abstract: Automatic and accurate lesion segmentation is critical for clinically estimating the lesion statuses of stroke diseases and developing appropriate diagnostic systems. Although existing methods have achieved remarkable results, further adoption of the models is hindered by: (1) inter-class indistinction, where the normal brain tissue resembles the lesion in appearance; and (2) intra-class inconsistency, where large variability exists between different areas of the lesion. To solve these challenges in stroke segmentation, we propose a novel method, namely the Attention Guided Multiscale Recovery framework (AGMR-Net). First, a coarse-grained patch attention module in the encoding stage is adopted to get a patch-based coarse-grained attention map in a multi-stage, explicitly supervised way, enabling target spatial context saliency representation with a patch-based weighting technique that eliminates the effect of intra-class inconsistency. Second, to obtain a more detailed boundary partitioning and thus address inter-class indistinction, a newly designed cross-dimensional feature fusion module is used to capture global contextual information to further guide the selective aggregation of 2D and 3D features, which can compensate for the lack of boundary learning capability of 2D convolution. Lastly, in the decoding stage, an innovatively designed multi-scale deconvolution upsampling, used instead of linear interpolation, enhances the recovery of target space and boundary information. The AGMR-Net is evaluated on the open dataset Anatomical Tracings of Lesions-After-Stroke (ATLAS), achieving the highest dice similarity coefficient (DSC) score of 0.594, a Hausdorff distance of 27.005 mm, and an average symmetry surface distance of 7.137 mm, which demonstrates that our proposed method outperforms other state-of-the-art methods and has great potential in the diagnosis of stroke.

Submitted 16 April, 2022; v1 submitted 28 February, 2022; originally announced February 2022.
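
Illustrative note (not part of the listing): the abstract above reports a dice similarity coefficient (DSC). For readers unfamiliar with the metric, the standard definition is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below computes it for flat binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the paper's own evaluation code may differ.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction that covers two voxels, one of which lies inside a two-voxel reference lesion, gives 2 * 1 / (2 + 2) = 0.5.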

arXiv:2202.13606 [pdf, other]

doi 10.1109/TNSE.2022.3150182

Asynchronous Decentralized Federated Learning for Collaborative Fault Diagnosis of PV Stations

Authors: Qi Liu, Bo Yang, Zhaojian Wang, Dafeng Zhu, Xinyi Wang, Kai Ma, Xinping Guan

Abstract: Due to the different losses caused by various photovoltaic (PV) array faults, accurate diagnosis of fault types is becoming increasingly important. Compared with a single one, multiple PV stations collect sufficient fault samples, but their data is not allowed to be shared directly due to potential conflicts of interest. Therefore, federated learning can be exploited to train a collaborative fault diagnosis model. However, the modeling efficiency is seriously affected by the model update mechanism since each PV station has a different computing capability and amount of data. Moreover, for the safe and stable operation of the PV system, the robustness of collaborative modeling must be guaranteed rather than simply being processed on a central server. To address these challenges, a novel asynchronous decentralized federated learning (ADFL) framework is proposed. Each PV station not only trains its local model but also participates in collaborative fault diagnosis by exchanging model parameters to improve the generalization without losing accuracy. The global model is aggregated distributedly to avoid central node failure. By designing the asynchronous update scheme, the communication overhead and training time are greatly reduced. Both the experiments and numerical simulations are carried out to verify the effectiveness of the proposed method.

Submitted 28 February, 2022; originally announced February 2022.
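
Illustrative note (not part of the listing): the abstract above describes stations exchanging model parameters and aggregating a global model without a central server. One generic way to do this is pairwise gossip averaging, sketched below: whenever two stations communicate, both replace their parameters with the pair's average. This is a textbook technique swapped in for illustration, not the ADFL update rule; the function names and the fixed communication schedule are assumptions.

```python
def gossip_step(models, i, j):
    """Replace the parameter vectors of stations i and j with their average."""
    avg = [(a + b) / 2.0 for a, b in zip(models[i], models[j])]
    models[i] = list(avg)
    models[j] = list(avg)


def run_gossip(models, schedule):
    """Apply pairwise averaging for each (i, j) pair in `schedule`, in order."""
    for i, j in schedule:
        gossip_step(models, i, j)
    return models
```

Each pairwise exchange preserves the sum of the parameters across stations, so on a connected communication graph repeated exchanges drive every station toward the network-wide average without any station acting as a server.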

arXiv:2202.12456 [pdf, other]

doi 10.1109/TAFFC.2022.3154332

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Authors: Kaining Mao, Wei Zhang, Deborah Baofeng Wang, Ang Li, Rongqi Jiao, Yanhui Zhu, Bin Wu, Tiansheng Zheng, Lei Qian, Wei Lyu, Minjie Ye, Jie Chen

Abstract: Depression is increasingly impacting individuals both physically and psychologically worldwide. It has become a major global public health problem and attracts attention from various research fields. Traditionally, the diagnosis of depression is formulated through semi-structured interviews and supplementary questionnaires, which makes the diagnosis rely heavily on physicians' experience and subject to bias. Mental health monitoring and cloud-based remote diagnosis can be implemented through an automated depression diagnosis system. In this article, we propose an attention-based multimodality speech and text representation for depression prediction. Our model is trained to estimate the depression severity of participants using the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. For the audio modality, we use the collaborative voice analysis repository (COVAREP) features provided by the dataset and employ a Bidirectional Long Short-Term Memory Network (Bi-LSTM) followed by a Time-distributed Convolutional Neural Network (T-CNN). For the text modality, we use global vectors for word representation (GloVe) to perform word embeddings, and the embeddings are fed into the Bi-LSTM network. Results show that both audio and text models perform well on the depression severity estimation task, with the best sequence-level F1 score of 0.9870 and patient-level F1 score of 0.9074 for the audio model over five classes (healthy, mild, moderate, moderately severe, and severe), as well as a sequence-level F1 score of 0.9709 and patient-level F1 score of 0.9245 for the text model over five classes. Results are similar for the multimodality fused model, with the highest F1 score of 0.9580 on the patient-level depression detection task over five classes. Experiments show statistically significant improvements over previous works.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 15 pages, 7 figures, already accepted by IEEE Transactions on Affective Computing, listed in early access now

arXiv:2202.11642 [pdf]

Nature and Energy Source of the Strong Waveforms Recorded during the 2008 Wenchuan Earthquake

Authors: ** Mao, Xueqiang Zhang, Yuci Su, Ke Mao, Pengyu Lu, Fei Zhang

Abstract: Earthquakes are indeed triggered by fault dislocations, but whether this process alone can produce the actual earthquake energy released by the mainshock has long been questioned. Therefore, exploring the true source of energy that causes earthquakes after the first motion is necessary. Based on analyses of the waveforms and ray paths at seismic stations close to the epicenter, it is considered that strong earthquake vibrations may not be caused by S-waves. It is also proposed that the reservoirs in sedimentary strata contain large amounts of high-pressure fluids, whose pressures can be released under certain conditions; this release of pressure may be an important component of the main earthquake energy. When a natural fault ruptures and penetrates a reservoir with a large area, the elastic energy produced by the release of pressure can reach the energy released by an earthquake of magnitude 8.0. Artificial engineering activities can lead to small-scale fluid pressure release phenomena, such as blowouts during drilling and earthquakes induced by hydraulic fracturing. Much direct and indirect evidence, such as the characteristics of seismic waves in the time and frequency domains recorded during the Wenchuan earthquake, explosion phenomena observed on the ground and cores obtained by scientific drilling, indicates the possibility of such energy release. We propose that seismicity can be divided into three stages: the microfracturing stage, in which fluid activity occurs and can produce an electrokinetic effect; the significant fracturing stage after the initial movement; and the strong earthquake stage caused by fluid pressure release.

Submitted 23 February, 2022; originally announced February 2022.

Comments: 31 pages, 10 figures

arXiv:2202.11317 [pdf, other]

The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Authors: Yi Sheng, Junhuan Yang, Yawen Wu, Kevin Mao, Yiyu Shi, Jingtong Hu, Weiwen Jiang, Lei Yang

Abstract: Along with the progress of AI democratization, neural networks are being deployed more frequently in edge devices for a wide range of applications. Fairness concerns gradually emerge in many applications, such as face recognition and mobile medical. One fundamental question arises: what will be the fairest neural architecture for edge devices? By examining the existing neural networks, we observe that larger networks are typically fairer. However, edge devices call for smaller neural architectures to meet hardware specifications. To address this challenge, this work proposes a novel Fairness- and Hardware-aware Neural architecture search framework, namely FaHaNa. Coupled with a model freezing approach, FaHaNa can efficiently search for neural networks with balanced fairness and accuracy, while being guaranteed to meet hardware specifications. Results show that FaHaNa can identify a series of neural networks with higher fairness and accuracy on a dermatology dataset. Targeting edge devices, FaHaNa finds a neural architecture with slightly higher accuracy, 5.28x smaller size, and a 15.14% higher fairness score, compared with MobileNetV2; meanwhile, on Raspberry PI and Odroid XU-4, it achieves 5.75x and 5.79x speedup.

Submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted by DAC'22

arXiv:2202.08437 [pdf, other]

doi 10.1109/ISBI52829.2022.9761489

Visual attention analysis of pathologists examining whole slide images of Prostate cancer

Authors: Souradeep Chakraborty, Ke Ma, Rajarsi Gupta, Beatrice Knudsen, Gregory J. Zelinsky, Joel H. Saltz, Dimitris Samaras

Abstract: We study the attention of pathologists as they examine whole-slide images (WSIs) of prostate cancer tissue using a digital microscope. To the best of our knowledge, our study is the first to report in detail how pathologists navigate WSIs of prostate cancer as they accumulate information for their diagnoses. We collected slide navigation data (i.e., viewport location, magnification level, and time) from 13 pathologists in 2 groups (5 genitourinary (GU) specialists and 8 general pathologists) and generated visual attention heatmaps and scanpaths. Each pathologist examined five WSIs from the TCGA PRAD dataset, which were selected by a GU pathology specialist. We examined and analyzed the distributions of visual attention for each group of pathologists after each WSI was examined. To quantify the relationship between a pathologist's attention and evidence for cancer in the WSI, we obtained tumor annotations from a genitourinary specialist. We used these annotations to compute the overlap between the distribution of visual attention and annotated tumor region to identify strong correlations. Motivated by this analysis, we trained a deep learning model to predict visual attention on unseen WSIs. We find that the attention heatmaps predicted by our model correlate quite well with the ground truth attention heatmap and tumor annotations on a test set of 17 WSIs by using various spatial and temporal evaluation metrics.

Submitted 2 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: ISBI 2022 (Oral presentation)

arXiv:2202.08195 [pdf, other]

doi 10.1016/j.media.2023.102933

Nuclei Segmentation with Point Annotations from Pathology Images via Self-Supervised Learning and Co-Training

Authors: Yi Lin, Zhiyong Qu, Hao Chen, Zhongke Gao, Yuexiang Li, Lili Xia, Kai Ma, Yefeng Zheng, Kwang-Ting Cheng

Abstract: Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as point annotations. In this paper, we propose a weakly-supervised learning method for nuclei segmentation that only requires point annotations for training. First, coarse pixel-level labels are derived from the point annotations based on the Voronoi diagram and the k-means clustering method to avoid overfitting. Second, a co-training strategy with an exponential moving average method is designed to refine the incomplete supervision of the coarse labels. Third, a self-supervised visual representation learning method is tailored for nuclei segmentation of pathology images that transforms the hematoxylin component images into the H&E stained images to gain better understanding of the relationship between the nuclei and cytoplasm. We comprehensively evaluate the proposed method using two public datasets. Both visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods, and its competitive performance compared to the fully-supervised methods. Code: https://github.com/hust-linyi/SC-Net

Submitted 17 August, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by MedIA

arXiv:2202.08057 [pdf, other]

Understanding and Improving Graph Injection Attack by Promoting Unnoticeability

Authors: Yongqiang Chen, Han Yang, Yonggang Zhang, Kaili Ma, Tongliang Liu, Bo Han, James Cheng

Abstract: Recently Graph Injection Attack (GIA) emerges as a practical attack scenario on Graph Neural Networks (GNNs), where the adversary can merely inject few malicious nodes instead of modifying existing nodes or edges, i.e., Graph Modification Attack (GMA). Although GIA has achieved promising results, little is known about why it is successful and whether there is any pitfall behind the success. To understand the power of GIA, we compare it with GMA and find that GIA can be provably more harmful than GMA due to its relatively high flexibility. However, the high flexibility will also lead to great damage to the homophily distribution of the original graph, i.e., similarity among neighbors. Consequently, the threats of GIA can be easily alleviated or even prevented by homophily-based defenses designed to recover the original homophily. To mitigate the issue, we introduce a novel constraint -- homophily unnoticeability that enforces GIA to preserve the homophily, and propose Harmonious Adversarial Objective (HAO) to instantiate it. Extensive experiments verify that GIA with HAO can break homophily-based defenses and outperform previous GIA attacks by a significant margin. We believe our methods can serve for a more reliable evaluation of the robustness of GNNs.

Submitted 5 April, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: ICLR2022, 42 pages, 22 figures

arXiv:2202.05601 [pdf, ps, other]

doi 10.1103/PhysRevC.105.044302

Observation of the $π^2σ^2$-bond linear-chain molecular structure in $^{16}$C

Authors: J. X. Han, Y. Liu, Y. L. Ye, J. L. Lou, X. F. Yang, T. Baba, M. Kimura, B. Yang, Z. H. Li, Q. T. Li, J. Y. Xu, Y. C. Ge, H. Hua, Z. H. Yang, J. S. Wang, Y. Y. Yang, P. Ma, Z. Bai, Q. Hu, W. Liu, K. Ma, L. C. Tao, Y. Jiang, L. Y. Hu, H. L. Zang, et al. (15 additional authors not shown)

Abstract: Measurements of the $^2$H($^{16}$C,$^{16}$C$^{*}$$\rightarrow^4$He+$^{12}$Be or $^6$He+$^{10}$Be)$^2$H inelastic excitation and cluster-decay reactions have been carried out at a beam energy of about 23.5 MeV/u. A specially designed detection system, including one multi-layer silicon-strip telescope at around zero degrees, has allowed the high-efficiency three-fold coincident detection and therefore the event-by-event determination of the energy of the unstable nucleus beam. The decay paths from the $^{16}$C resonances to various states of the final $^{10}$Be or $^{12}$Be nucleus are recognized thanks to the well-resolved $Q$-value spectra. The reconstructed resonances at 16.5(1), 17.3(2), 19.4(1) and 21.6(2) MeV are assigned as the $0^+$, $2^+$, $4^+$ and $6^+$ members, respectively, of the positive-parity $(3/2_π^-)^2(1/2_σ^-)^2$-bond linear-chain molecular band in $^{16}$C, based on the angular correlation analysis for the 16.5 MeV state and the excellent agreement of decay patterns between the measurements and theoretical predictions. Moreover, another intriguing high-lying state was observed at 27.2(1) MeV which decays almost exclusively to the $\sim$6 MeV states of $^{10}$Be, in line with the newly predicted pure $σ$-bond linear-chain configuration.

Submitted 11 February, 2022; originally announced February 2022.

Comments: 13 pages, 10 figures

arXiv:2202.05441 [pdf, other]

Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs

Authors: Yongqiang Chen, Yonggang Zhang, Yatao Bian, Han Yang, Kaili Ma, Binghui Xie, Tongliang Liu, Bo Han, James Cheng

Abstract: Despite recent success in using the invariance principle for out-of-distribution (OOD) generalization on Euclidean data (e.g., images), studies on graph data are still limited. Different from images, the complex nature of graphs poses unique challenges to adopting the invariance principle. In particular, distribution shifts on graphs can appear in a variety of forms such as attributes and structures, making it difficult to identify the invariance. Moreover, domain or environment partitions, which are often required by OOD methods on Euclidean data, could be highly expensive to obtain for graphs. To bridge this gap, we propose a new framework, called Causality Inspired Invariant Graph LeArning (CIGA), to capture the invariance of graphs for guaranteed OOD generalization under various distribution shifts. Specifically, we characterize potential distribution shifts on graphs with causal models, concluding that OOD generalization on graphs is achievable when models focus only on subgraphs containing the most information about the causes of labels. Accordingly, we propose an information-theoretic objective to extract the desired subgraphs that maximally preserve the invariant intra-class information. Learning with these subgraphs is immune to distribution shifts. Extensive experiments on 16 synthetic or real-world datasets, including a challenging setting -- DrugOOD, from AI-aided drug discovery, validate the superior OOD performance of CIGA.

Submitted 11 October, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: NeurIPS2022, 46 pages, 72 figures

arXiv:2202.05413 [pdf, other]

A Machine-Learning-Aided Visual Analysis Workflow for Investigating Air Pollution Data

Authors: Yun-Hsin Kuo, Takanori Fujiwara, Charles C. -K. Chou, Chun-houh Chen, Kwan-Liu Ma

Abstract: Analyzing air pollution data is challenging as there are various analysis focuses from different aspects: feature (what), space (where), and time (when). As in most geospatial analysis problems, besides high-dimensional features, the temporal and spatial dependencies of air pollution induce the complexity of performing analysis. Machine learning methods, such as dimensionality reduction, can extract and summarize important information of the data to lift the burden of understanding such a complicated environment. In this paper, we present a methodology that utilizes multiple machine learning methods to uniformly explore these aspects. With this methodology, we develop a visual analytic system that supports a flexible analysis workflow, allowing domain experts to freely explore different aspects based on their analysis needs. We demonstrate the capability of our system and analysis workflow supporting a variety of analysis tasks with multiple use cases.

Submitted 10 February, 2022; originally announced February 2022.

Comments: To appear in the Proceedings of IEEE PacificVis 2022

arXiv:2202.03771 [pdf, ps, other]

doi 10.1016/j.apenergy.2022.118636

Energy Management Based on Multi-Agent Deep Reinforcement Learning for A Multi-Energy Industrial Park

Authors: Dafeng Zhu, Bo Yang, Yuxiang Liu, Zhaojian Wang, Kai Ma, ** Guan

Abstract: Owing to large industrial energy consumption, industrial production has brought a huge burden to the grid in terms of renewable energy access and power supply. Due to the coupling of multiple energy sources and the uncertainty of renewable energy and demand, centralized methods require large calculation and coordination overhead. Thus, this paper proposes a multi-energy management framework achieved by decentralized execution and centralized training for an industrial park. The energy management problem is formulated as a partially-observable Markov decision process, which is intractable by dynamic programming due to the lack of the prior knowledge of the underlying stochastic process. The objective is to minimize long-term energy costs while ensuring the demand of users. To solve this issue and improve the calculation speed, a novel multi-agent deep reinforcement learning algorithm is proposed, which contains the following key points: counterfactual baseline for facilitating contributing agents to learn better policies, soft actor-critic for improving robustness and exploring optimal solutions. A novel reward is designed by Lagrange multiplier method to ensure the capacity constraints of energy storage. In addition, considering that the increase in the number of agents leads to performance degradation due to large observation spaces, an attention mechanism is introduced to enhance the stability of policy and enable agents to focus on important energy-related information, which improves the exploration efficiency of soft actor-critic. Numerical results based on actual data verify the performance of the proposed algorithm with high scalability, indicating that the industrial park can minimize energy costs under different demands.

Submitted 11 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: Accepted by Applied Energy

Journal ref: Applied Energy 311 (2022) 118636

arXiv:2202.02983 [pdf, other]

SpinQ Triangulum: a commercial three-qubit desktop quantum computer

Authors: Guanru Feng, Shi-Yao Hou, Hongyang Zou, Wei Shi, Sheng Yu, Zikai Sheng, Xin Rao, Kaihong Ma, Chenxing Chen, Bing Ren, Guoxing Miao, **gen Xiang, Bei Zeng

Abstract: SpinQ Triangulum is the second generation of the desktop quantum computers designed and manufactured by SpinQ Technology. SpinQ's desktop quantum computer series, based on room temperature NMR spectrometer, provide light-weighted, cost-effective and maintenance-free quantum computing platforms that aim to provide real-device experience for quantum computing education for K-12 and college level. These platforms also feature quantum control design capabilities for studying quantum control and quantum noise. Compared with the first generation product, the two-qubit SpinQ Gemini, Triangulum features a three-qubit QPU, smaller dimensions (61 * 33 * 56 cm^3) and lighter (40 kg). Furthermore, the magnetic field is more stable and the performance of quantum control is more accurate. This paper introduces the system design of Triangulum and its new features. As an example of performing quantum computing tasks, we present the implementation of the Harrow-Hassidim-Lloyd (HHL) algorithm on Triangulum, demonstrating Triangulum's capability of undertaking complex quantum computing tasks. SpinQ will continue to develop desktop quantum computing platform with more qubits. Meanwhile, a simplified version of SpinQ Gemini, namely Gemini Mini (https://www.spinq.cn/products#geminiMini-anchor), has been recently realised. Gemini Mini is much more portable (20 * 35 * 26 cm^3, 14 kg) and affordable for most K-12 schools around the world.

Submitted 7 February, 2022; originally announced February 2022.

Comments: 9 pages, 9 figures

arXiv:2202.02852 [pdf, ps, other]

doi 10.1103/PhysRevB.106.035143

Kolmogorov complexity as intrinsic entropy of a pure state: Perspective from entanglement in free fermion systems

Authors: Ken K. W. Ma, Kun Yang

Abstract: We consider free fermion systems in arbitrary dimensions and represent the occupation pattern of each eigenstate as a classical binary string. We find that the Kolmogorov complexity of the string correctly captures the scaling behavior of its entanglement entropy (EE). In particular, the logarithmically-enhanced area law for EE in the ground state and the volume law for EE in typical highly excited states are reproduced. Since our approach does not require bipartitioning the system, it allows us to distinguish typical and atypical eigenstates directly by their intrinsic complexity. We reveal that the fraction of atypical eigenstates which do not thermalize in the free fermion system vanishes exponentially in the thermodynamic limit. Our results illustrate explicitly the connection between complexity and EE of individual pure states in quantum systems.

Submitted 25 July, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

Comments: Accepted version by PRB

Journal ref: Phys. Rev. B 106, 035143 (2022)

arXiv:2202.00963 [pdf, ps, other]

doi 10.1103/PhysRevE.106.014110

Projective-truncation-approximation study of the one-dimensional $φ^4$ lattice model

Authors: Kou-Han Ma, Yan-Jiang Guo, Lei Wang, Ning-Hua Tong

Abstract: In this paper, we first develop the projective truncation approximation (PTA) in the Green's function equation of motion (EOM) formalism for classical statistical models. To implement PTA for a given Hamiltonian, we choose a set of basis variables and projectively truncate the hierarchical EOM. We apply PTA to the one-dimensional $φ^4$ lattice model. Phonon dispersion and static correlation functions are studied in detail. Using one- and two-dimensional bases, we obtain results identical to and beyond the quadratic variational approximation, respectively. In particular, we analyze the power-law temperature dependence of the static averages in the low- and high-temperature limits, and we give exact exponents.

Submitted 10 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: 14 pages, 6 figures, published version

Journal ref: Phys. Rev. E 106, 014110 (2022)

arXiv:2201.08388 [pdf, other]

Steerable Pyramid Transform Enables Robust Left Ventricle Quantification

Authors: Xiangyang Zhu, Kede Ma, Wufeng Xue

Abstract: Predicting cardiac indices has long been a focal point in the medical imaging community. While various deep learning models have demonstrated success in quantifying cardiac indices, they remain susceptible to mild input perturbations, e.g., spatial transformations, image distortions, and adversarial attacks. This vulnerability undermines confidence in using learning-based automated systems for diagnosing cardiovascular diseases. In this work, we describe a simple yet effective method to learn robust models for left ventricle (LV) quantification, encompassing cavity and myocardium areas, directional dimensions, and regional wall thicknesses. Our success hinges on employing the biologically inspired steerable pyramid transform (SPT) for fixed front-end processing, which offers three main benefits. First, the basis functions of SPT align with the anatomical structure of LV and the geometric features of the measured indices. Second, SPT facilitates weight sharing across different orientations as a form of parameter regularization and naturally captures the scale variations of LV. Third, the residual highpass subband can be conveniently discarded, promoting robust feature learning. Extensive experiments on the Cardiac-Dig benchmark show that our SPT-augmented model not only achieves reasonable prediction accuracy compared to state-of-the-art methods, but also exhibits significantly improved robustness against input perturbations.

Submitted 2 July, 2024; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Code is available at https://github.com/yangyangyang127/RobustLV

arXiv:2201.06230 [pdf, other]

doi 10.3233/FAIA210360

Generalizable Neuro-symbolic Systems for Commonsense Question Answering

Authors: Alessandro Oltramari, Jonathan Francis, Filip Ilievski, Kaixin Ma, Roshanak Mirzaee

Abstract: This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks. Different methods for integrating neural language models and knowledge graphs are discussed. The situations in which this combination is most appropriate are characterized, including quantitative evaluation and qualitative error analysis on a variety of commonsense question answering benchmark datasets.

Submitted 17 January, 2022; originally announced January 2022.

Comments: In Pascal Hitzler, Md Kamruzzaman Sarker (eds.), Neuro-Symbolic Artificial Intelligence: The State of the Art. Frontiers in Artificial Intelligence and Applications Vol. 342, IOS Press, Amsterdam, 2022. arXiv admin note: text overlap with arXiv:2003.04707

arXiv:2201.06216 [pdf, other]

Learning to Reformulate for Linear Programming

Authors: Xijun Li, Qingyu Qu, Fangzhou Zhu, Jia Zeng, Mingxuan Yuan, Kun Mao, Jie Wang

Abstract: It has been verified that the linear programming (LP) is able to formulate many real-life optimization problems, which can obtain the optimum by resorting to corresponding solvers such as OptVerse, Gurobi and CPLEX. In the past decades, a serial of traditional operation research algorithms have been proposed to obtain the optimum of a given LP in a fewer solving time. Recently, there is a trend of using machine learning (ML) techniques to improve the performance of above solvers. However, almost no previous work takes advantage of ML techniques to improve the performance of solver from the front end, i.e., the modeling (or formulation). In this paper, we are the first to propose a reinforcement learning-based reformulation method for LP to improve the performance of solving process. Using an open-source solver COIN-OR LP (CLP) as an environment, we implement the proposed method over two public research LP datasets and one large-scale LP dataset collected from practical production planning scenario. The evaluation results suggest that the proposed method can effectively reduce both the solving iteration number ($25\%\downarrow$) and the solving time ($15\%\downarrow$) over above datasets in average, compared to directly solving the original LP instances.

Submitted 16 January, 2022; originally announced January 2022.

arXiv:2201.06213 [pdf, other]

An Improved Reinforcement Learning Algorithm for Learning to Branch

Authors: Qingyu Qu, Xijun Li, Yunfan Zhou, Jia Zeng, Mingxuan Yuan, Jie Wang, **hu Lv, Kexin Liu, Kun Mao

Abstract: Most combinatorial optimization problems can be formulated as mixed integer linear programming (MILP), in which branch-and-bound (B&B) is a general and widely used method. Recently, learning to branch has become a hot research topic in the intersection of machine learning and combinatorial optimization. In this paper, we propose a novel reinforcement learning-based B&B algorithm. Similar to offline reinforcement learning, we initially train on the demonstration data to accelerate learning massively. With the improvement of the training effect, the agent starts to interact with the environment with its learned policy gradually. It is critical to improve the performance of the algorithm by determining the mixing ratio between demonstration and self-generated data. Thus, we propose a prioritized storage mechanism to control this ratio automatically. In order to improve the robustness of the training process, a superior network is additionally introduced based on Double DQN, which always serves as a Q-network with competitive performance. We evaluate the performance of the proposed algorithm over three public research benchmarks and compare it against strong baselines, including three classical heuristics and one state-of-the-art imitation learning-based branching algorithm. The results show that the proposed algorithm achieves the best performance among compared algorithms and possesses the potential to improve B&B algorithm performance continuously.

Submitted 16 January, 2022; originally announced January 2022.

arXiv:2112.15402 [pdf, other]

Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship

Authors: Quanziang Wang, Renzhen Wang, Yuexiang Li, Dong Wei, Kai Ma, Yefeng Zheng, Deyu Meng

Abstract: Continual learning is a promising machine learning paradigm to learn new tasks while retaining previously learned knowledge over streaming training data. Till now, rehearsal-based methods, keeping a small part of data from old tasks as a memory buffer, have shown good performance in mitigating catastrophic forgetting for previously learned knowledge. However, most of these methods typically treat each new task equally, which may not adequately consider the relationship or similarity between old and new tasks. Furthermore, these methods commonly neglect sample importance in the continual training process and result in sub-optimal performance on certain tasks. To address this challenging problem, we propose Relational Experience Replay (RER), a bi-level learning framework, to adaptively tune task-wise relationships and sample importance within each task to achieve a better 'stability' and 'plasticity' trade-off. As such, the proposed method is capable of accumulating new knowledge while consolidating previously learned old knowledge during continual learning. Extensive experiments conducted on three publicly available datasets (i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet) show that the proposed method can consistently improve the performance of all baselines and surpass current state-of-the-art methods.

Submitted 3 August, 2023; v1 submitted 31 December, 2021; originally announced December 2021.

arXiv:2112.15139 [pdf, other]

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Authors: Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma

Abstract: Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment. However, quantization inevitably leads to a distribution divergence from the original network, which generally degrades the performance. To tackle this issue, massive efforts have been made, but most existing approaches lack statistical considerations and depend on several manual configurations. In this paper, we present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models and smoothly approximated with a concrete Gaussian Mixture (GM). In particular, the network weights are projected in compliance with the GM-approximated sub-distribution. This sub-distribution evolves along with the weight update in a co-tuning schema guided by the direct task-objective optimization. Sufficient experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method. Besides, an efficient deployment flow for the mobile CPU is developed, achieving up to 7.46$\times$ inference acceleration on an octa-core ARM CPU. Our codes have been publicly released at https://github.com/RunpeiDong/DGMS.

Submitted 27 May, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

Comments: Accepted at ICML 2022

arXiv:2112.13227 [pdf, other]

Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression

Authors: Mu Li, Kede Ma, **xing Li, David Zhang

Abstract: Although equirectangular projection (ERP) is a convenient form to store omnidirectional images (also known as 360-degree images), it is neither equal-area nor conformal, thus not friendly to subsequent visual communication. In the context of image compression, ERP will over-sample and deform things and stuff near the poles, making it difficult for perceptually optimal bit allocation. In conventional 360-degree image compression, techniques such as region-wise packing and tiled representation are introduced to alleviate the over-sampling problem, achieving limited success. In this paper, we make one of the first attempts to learn deep neural networks for omnidirectional image compression. We first describe parametric pseudocylindrical representation as a generalization of common pseudocylindrical map projections. A computationally tractable greedy method is presented to determine the (sub)-optimal configuration of the pseudocylindrical representation in terms of a novel proxy objective for rate-distortion performance. We then propose pseudocylindrical convolutions for 360-degree image compression. Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution with the so-called pseudocylindrical padding. To demonstrate the feasibility of our idea, we implement an end-to-end 360-degree image compression system, consisting of the learned pseudocylindrical representation, an analysis transform, a non-uniform quantizer, a synthesis transform, and an entropy model. Experimental results on $19,790$ omnidirectional images show that our method achieves consistently better rate-distortion performance than the competing methods. Moreover, the visual quality by our method is significantly improved for all images at all bitrates.

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2112.11572 [pdf, other]

doi 10.1109/ICMLA52953.2021.00263

Practical Active Learning with Model Selection for Small Data

Authors: Maryam Pardakhti, Nila Mandal, Anson W. K. Ma, Qian Yang

Abstract: Active learning is of great interest for many practical applications, especially in industry and the physical sciences, where there is a strong need to minimize the number of costly experiments necessary to train predictive models. However, there remain significant challenges for the adoption of active learning methods in many practical applications. One important challenge is that many methods assume a fixed model, where model hyperparameters are chosen a priori. In practice, it is rarely true that a good model will be known in advance. Existing methods for active learning with model selection typically depend on a medium-sized labeling budget. In this work, we focus on the case of having a very small labeling budget, on the order of a few dozen data points, and develop a simple and fast method for practical active learning with model selection. Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel. First we show empirically that our method is able to find hyperparameters that lead to the best performance compared to an oracle model on less separable, difficult to classify datasets, and reasonable performance on datasets that are more separable and easier to classify. Then, we demonstrate that it is possible to refine our model selection method using a weighted approach to trade-off between achieving optimal performance on datasets that are easy to classify, versus datasets that are difficult to classify, which can be tuned based on prior domain knowledge about the dataset.

Submitted 21 December, 2021; originally announced December 2021.

Comments: Accepted for publication in the Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)

arXiv:2112.10272 [pdf, other]

A Multi-Layout Design for Immersive Visualization of Network Data

Authors: David Bauer, Chengbo Zheng, Oh-Hyun Kwon, Kwan-Liu Ma

Abstract: Visualization plays a vital role in making sense of complex network data. Recent studies have shown the potential of using extended reality (XR) for the immersive exploration of networks. The additional depth cues offered by XR help users perform better in certain tasks when compared to using traditional desktop setups. However, prior works on immersive network visualization rely on mostly static graph layouts to present the data to the user. This poses a problem since there is no optimal layout for all possible tasks. The choice of layout heavily depends on the type of network and the task at hand. We introduce a multi-layout approach that allows users to effectively explore hierarchical network data in immersive space. The resulting system leverages different layout techniques and interactions to efficiently use the available space in VR and provide an optimal view of the data depending on the task and the level of detail required to solve it. To evaluate our approach, we have conducted a user study comparing it against the state of the art for immersive network visualization. Participants performed tasks at varying spatial scopes. The results show that our approach outperforms the baseline in spatially focused scenarios as well as when the whole network needs to be considered.

Submitted 26 January, 2023; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: 13 pages, 6 figures, this manuscript is currently under revision

arXiv:2112.06729 [pdf, other]

doi 10.1007/JHEP04(2022)123

Hadron Collider Probes of the Quartic Couplings of Gluons to the Photon and $Z$ Boson

Authors: John Ellis, Shao-Feng Ge, Kai Ma

Abstract: We explore the experimental sensitivities of measuring the $gg \rightarrow Z γ$ process at the LHC to the dimension-8 quartic couplings of gluon pairs to the $Z$ boson and photon, in addition to comparing them with the analogous sensitivities in the $gg \to γγ$ process. These processes can both receive contributions from 4 different CP-conserving dimension-8 operators with distinct Lorentz structures that contain a pair of gluon field strengths, $\hat G^a_{μν}$, and a pair of electroweak SU(2) gauge field strengths, $W^i_{μν}$, as well as 4 similar operators containing a pair of $\hat G^a_{μν}$ and a pair of U(1) gauge field strengths, $B_{μν}$. We calculate the scattering angular distributions for $gg \rightarrow Z γ$ and the $Z \to \bar f f$ decay angular distributions for these 4 Lorentz structures, as well as the Standard Model background. We analyze the sensitivity of ATLAS measurements of the $Z(\to \ell^+\ell^-, \bar νν, \bar q q)γ$ final states with integrated luminosities up to 139 fb$^{-1}$ at $\sqrt{s} = 13$ TeV, showing that they exclude values $\lesssim 2$ TeV for the dimension-8 operator scales, and compare the $Z γ$ sensitivity with that of an ATLAS measurement of the $γγ$ final state. We present combined $Z γ$ and $γγ$ constraints on the scales of dimension-8 SMEFT operators and $γγ$ constraints on the nonlinearity scale of the Born-Infeld extension of the Standard Model. We also estimate the sensitivities to dimension-8 operators of experiments at possible future proton-proton colliders with centre-of-mass energies of 25, 50 and 100 TeV, and discuss possible measurements of the $Z$ spin and angular correlations.

Submitted 8 May, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: 25 pages, 25 figures; part of the results were presented at CLHCP2021: https://indico.ihep.ac.cn/event/14560/session/6/contribution/113; v2, Fig.9 is updated by including combined results. 1 table is added; v3, several typos are corrected; v4, match to published version

Report number: KCL-PH-TH/2021-95, CERN-TH-2021-215

Journal ref: JHEP 2022, 123 (2022)

arXiv:2112.04044 [pdf, other]

doi 10.1093/mnras/stab3375

A Radio Polarisation Study of Magnetic Fields in the Small Magellanic Cloud

Authors: J. D. Livingston, N. M. McClure-Griffiths, S. A. Mao, Y. K. Ma, B. M. Gaensler, G. Heald, A. Seta

Abstract: Observing the magnetic fields of low-mass interacting galaxies tells us how they have evolved over cosmic time and their importance in galaxy evolution. We have measured the Faraday rotation of 80 extra-galactic radio sources behind the Small Magellanic Cloud (SMC) using the CSIRO Australia Telescope Compact Array (ATCA) with a frequency range of 1.4 -- 3.0 GHz. Both the sensitivity of our observations and the source density are an order of magnitude improvement on previous Faraday rotation measurements of this galaxy. The SMC generally produces negative rotation measures (RMs) after accounting for the Milky Way foreground contribution, indicating that it has a mean coherent line-of-sight magnetic field strength of $-0.3\pm0.1μ$G, consistent with previous findings. We detect signatures of magnetic fields extending from the north and south of the Bar of the SMC. The random component of the SMC magnetic field has a strength of $\sim 5μ$G with a characteristic size-scale of magneto-ionic turbulence $< 250$ pc, making the SMC like other low-mass interacting galaxies. The magnetic fields of the SMC and Magellanic Bridge appear similar in direction and strength, hinting at a connection between the two fields as part of the hypothesised `pan-Magellanic' magnetic field.

Submitted 7 December, 2021; originally announced December 2021.

Comments: 17 pages, 9 figures, 5 tables

arXiv:2111.12452 [pdf, other]

doi 10.1021/acs.chemmater.1c02683

Group-9 Transition Metal Suboxides Adopting the Filled-Ti$_2$Ni Structure: A Class of Superconductors Exhibiting Exceptionally High Upper Critical Fields

Authors: KeYuan Ma, Robin Lefèvre, Karolina Gornicka, Harald O. Jeschke, Xiaofu Zhang, Zurab Guguchia, Tomasz Klimczuk, Fabian O. von Rohr

Abstract: The Ti$_2$Ni and the related $η$-carbide structure are known to exhibit various intriguing physical properties. The Ti$_2$Ni structure with the cubic space group $Fd\bar{3}m$ is surprisingly complex, consisting of a unit cell with 96 metal atoms. The related $η$-carbide compounds correspond to a filled version of the Ti$_2$Ni structure. Here, we report on the structure and superconductivity in the $η$-carbide type suboxides Ti$_4$M$_2$O with M = Co, Rh, Ir. We have successfully synthesized all three compounds in single phase form. We find all three compounds to be type-II bulk superconductors with transition temperatures of $T_{\rm c}$ = 2.7, 2.8, and 5.4 K, and with normalized specific heat jumps of $ΔC/γT_{\rm c}$ = 1.65, 1.28, and 1.80 for Ti$_4$Co$_2$O, Ti$_4$Rh$_2$O, and Ti$_4$Ir$_2$O, respectively. We find that all three superconductors exhibit high upper critical fields. Particularly noteworthy is Ti$_4$Ir$_2$O with an upper critical field of $μ_0 H_{\rm c2}{\rm (0)}$ = 16.06 T, which exceeds by far the weak-coupling Pauli limit of 9.86 T. The role of the void filling light atom X has so far been uncertain for the overall physical properties of these materials. Herein, we have successfully grown single crystals of Ti$_2$Co. In contrast to the metallic $η$-carbide type suboxides Ti$_4$M$_2$O, we find that Ti$_2$Co displays a semimetallic behavior. Hence, the octahedral void-filling oxygen plays a crucial role for the overall physical properties, even though its effect on the crystal structure is small. Our results indicate that the design of new superconductors by incorporation of electron-acceptor atoms may be a promising future approach in the Ti$_2$Ni-type structures and other materials with crystallographic void positions. The remarkably high upper critical fields in this family of compounds may furthermore spark significant future interest.

Submitted 24 November, 2021; originally announced November 2021.

Comments: Submitted: 3. August 2021, published: 1. November 2021, Supporting Information at publisher's page: https://pubs.acs.org/doi/suppl/10.1021/acs.chemmater.1c02683/suppl_file/cm1c02683_si_001.pdf

Journal ref: Chem. Mater. 2021, 33, 22, 8722-8732

arXiv:2111.12440 [pdf, other]

doi 10.1039/D1SC03026D

Synthetic control over polymorph formation in the d-band semiconductor system FeS$_2$

Authors: KeYuan Ma, Robin Lefèvre, Qingtian Li, Jorge Lago, Olivier Blacque, Wanli Yang, Fabian O. von Rohr

Abstract: Pyrite, also known as fool's gold, is the thermodynamically stable polymorph of FeS$_2$. It is widely considered a promising d-band semiconductor for various applications due to its intriguing physical properties. Marcasite is the other naturally occurring polymorph of FeS$_2$. Measurements on natural crystals have shown that it has electronic, mechanical, and optical properties as promising as those of pyrite. However, it has been only scarcely investigated so far, because the laboratory-based synthesis of phase-pure samples or high-quality marcasite single crystals has been a challenge until now. Here, we report the targeted phase formation via hydrothermal synthesis of marcasite and pyrite. The formation condition and phase purity of the FeS$_2$ polymorphs are systematically studied in the form of a comprehensive synthesis map. We, furthermore, report on a detailed analysis of marcasite single crystal growth by a space-separated hydrothermal synthesis. We observe that single phase product of marcasite forms only on the surface under the involvement of H$_2$S and sulphur vapor. The availability of high-quality crystals of marcasite allows us to measure the fundamental physical properties, including an allowed direct optical bandgap of 0.76 eV, temperature independent diamagnetism, an electronic transport gap of 0.11 eV, and a room-temperature carrier concentration of 4.14 $\times$ 10$^{18}$ cm$^{-3}$. X-ray absorption and emission spectroscopy are employed to measure the band gap of the two FeS$_2$ phases. We find marcasite has a band gap of 0.73 eV, while pyrite has a band gap of 0.87 eV. Our results indicate that marcasite -- that is now synthetically available in a straightforward fashion -- is as promising as pyrite as a candidate for various semiconductor applications based on earth abundant elements.

Submitted 24 November, 2021; originally announced November 2021.

Comments: Supporting Information available at publisher's page: https://www.rsc.org/suppdata/d1/sc/d1sc03026d/d1sc03026d1.pdf

Journal ref: Chemical Science, 2021, 12, 13870-13877

arXiv:2111.11177 [pdf, ps, other]

Deep Learning for Beam-Management: State-of-the-Art, Opportunities and Challenges

Authors: Ke Ma, Zhaocheng Wang, Wenqiang Tian, Sheng Chen, Lajos Hanzo

Abstract: Benefiting from huge bandwidth resources, millimeter-wave (mmWave) communications provide one of the most promising technologies for next-generation wireless networks. To compensate for the high pathloss of mmWave signals, large-scale antenna arrays are required both at the base stations and user equipment to establish directional beamforming, where beam-management is adopted to acquire and track the optimal beam pair having the maximum received power. Naturally, narrow beams are required for achieving high beamforming gain, but they impose enormous training overhead and high sensitivity to blockages. As a remedy, deep learning (DL) may be harnessed for beam-management. First, the current state-of-the-art is reviewed, followed by the associated challenges and future research opportunities. We conclude by highlighting the associated DL design insights and novel beam-management mechanisms.

Submitted 12 December, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: Submitted to IEEE

arXiv:2111.09536 [pdf, other]

doi 10.1088/1361-648X/ac3a45

Break of symmetry at the surface of IrTe$_2$ upon phase transition measured by X-ray photoelectron diffraction

Authors: Maxime Rumo, Aki Pulkkinen, KeYuan Ma, Fabian O. von Rohr, Matthias Muntwiler, Claude Monney

Abstract: IrTe$_2$ undergoes a series of charge-ordered phase transitions below room temperature that are characterized by the formation of stripes of Ir dimers of different periodicities. Full hemispherical X-ray photoelectron diffraction (XPD) experiments have been performed to investigate the atomic position changes undergone near the surface of $1T-$IrTe$_2$ in the first-order phase transition, from the $(1\times1)$ phase to the $(5\times1)$ phase. Comparison between experiment and simulation allows us to identify the consequence of the dimerization on the local environment of the Ir atoms. We report that XPD reveals the symmetry breaking of IrTe$_2$ from a trigonal to a monoclinic unit cell and confirms the occurrence of the $(5\times1)$ reconstruction within the first few layers below the surface with a staircase-like stacking of dimers.

Submitted 18 November, 2021; originally announced November 2021.

Comments: This is the version of the article before peer review or editing, as submitted by M. Rumo to J. Phys.: Condens. Matter. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at DOI

arXiv:2111.09461 [pdf]

doi 10.1038/s42256-021-00421-z

Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence

Authors: Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, et al. (21 additional authors not shown)

Abstract: Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalised model in clinical practices. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution under a federated learning framework (FL) without data sharing. Here we show that our FL model outperformed all the local models by a large yield (test sensitivity/specificity in China: 0.973/0.951, in the UK: 0.730/0.942), achieving comparable performance with a panel of professional radiologists. We further evaluated the model on the hold-out (collected from another two hospitals leaving out the FL) and heterogeneous (acquired with contrast materials) data, provided visual explanations for decisions made by the model, and analysed the trade-offs between the model performance and the communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advanced the prospects of utilising federated learning for privacy-preserving AI in digital health.

Submitted 17 November, 2021; originally announced November 2021.

Comments: Nature Machine Intelligence

arXiv:2111.05339 [pdf, other]

doi 10.1017/pasa.2021.59

GASKAP-HI Pilot Survey Science I: ASKAP Zoom Observations of HI Emission in the Small Magellanic Cloud

Authors: N. M. Pingel, J. Dempsey, N. M. McClure-Griffiths, J. M. Dickey, K. E. Jameson, H. Arce, G. Anglada, J. Bland-Hawthorn, S. L. Breen, F. Buckland-Willis, S. E. Clark, J. R. Dawson, H. Dénes, E. M. Di Teodoro, B. -Q. For, Tyler J. Foster, J. F. Gómez, H. Imai, G. Joncas, C. -G. Kim, M. -Y. Lee, C. Lynn, D. Leahy, Y. K. Ma, A. Marchal, et al. (31 additional authors not shown)

Abstract: We present the most sensitive and detailed view of the neutral hydrogen (HI) emission associated with the Small Magellanic Cloud (SMC), through the combination of data from the Australian Square Kilometre Array Pathfinder (ASKAP) and Parkes (Murriyang), as part of the Galactic Australian Square Kilometre Array Pathfinder (GASKAP) pilot survey. These GASKAP-HI pilot observations, for the first time, reveal HI in the SMC on similar physical scales as other important tracers of the interstellar medium, such as molecular gas and dust. The resultant image cube possesses an rms noise level of 1.1 K (1.6 mJy/beam) per 0.98 km s$^{-1}$ spectral channel with an angular resolution of 30$''$ ($\sim$10 pc). We discuss the calibration scheme and the custom imaging pipeline that utilizes a joint deconvolution approach, efficiently distributed across a computing cluster, to accurately recover the emission extending across the entire $\sim$25 deg$^2$ field-of-view. We provide an overview of the data products and characterize several aspects including the noise properties as a function of angular resolution and the represented spatial scales by deriving the global transfer function over the full spectral range. A preliminary spatial power spectrum analysis on individual spectral channels reveals that the power-law nature of the density distribution extends down to scales of 10 pc. We highlight the scientific potential of these data by comparing the properties of an outflowing high velocity cloud with previous ASKAP+Parkes HI test observations.

Submitted 10 December, 2021; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: Accepted for publication in PASA, 34 pages, 18 figures, 5 tables

arXiv:2111.03568 [pdf, ps, other]

doi 10.1103/PhysRevB.105.035132

Quantitative theory of composite fermions in Bose-Fermi mixtures at $ν=1$

Authors: Ken K. W. Ma, Kun Yang

Abstract: Composite fermions provide a simple and unified picture to understand a vast amount of phenomenology in the quantum Hall regime. However it has remained challenging to formulate this concept properly within a single Landau level. Recently a low-energy noncommutative field theory for bosons at Landau-level filling factor $ν=1$ has been formulated by Dong and Senthil. In the limit of long-wavelength and small-amplitude gauge fluctuation, they found it reduces to the celebrated Halperin-Lee-Read theory of composite fermion liquid. In this work we consider a Bose-Fermi mixture at total filling factor $ν=1$. Different from previous work, the number density of composite fermions in the mixture and corresponding Fermi momentum can be tuned by changing the filling factor of bosons, $ν_b = 1 -ν_f$. This tunability enables us to study the dilute limit $ν_b\ll 1$, which allows for a controlled and asymptotically exact calculation of the energy dispersion and effective mass of composite fermions. Furthermore, the approximation of the low-energy description by a commutative field theory is manifestly justified. Most importantly, we demonstrate gauge fluctuations acquire a Higgs mass due to the presence of a composite boson condensate, as a result of which the system behaves like a genuine Landau Fermi liquid. Combined with the irrelevance of four-fermion interaction in the dilute limit, we are able to obtain asymptotically exact properties of this composite fermion Fermi liquid. In the opposite limit of $ν_f\ll 1$, the Higgs mass goes to zero and we find crossover between Fermi liquid and non-Fermi liquid as temperature increases. Observing these properties either experimentally or numerically provides unambiguous evidence of not only the composite fermions and the Fermi surface they form, but also the presence of emergent gauge fields and their fluctuations due to strong correlation.

Submitted 23 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: Accepted version by PRB

Journal ref: Phys. Rev. B 105, 035132 (2022)

arXiv:2111.02018 [pdf, other]

Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention

Authors: Sia Huat Tan, Runpei Dong, Kaisheng Ma

Abstract: Most feedforward convolutional neural networks spend roughly the same effort on each pixel. Yet human visual recognition is an interaction between eye movements and spatial attention, in which we take several glimpses of an object in different regions. Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet) which aims to tackle the challenges of high computation and the lack of robustness based on a recurrent downsampled attention mechanism. Specifically, MGNet sequentially selects task-relevant regions of an image to focus on and then adaptively combines all collected information for the final prediction. MGNet exhibits strong resistance against adversarial attacks and common corruptions with less computation. Also, MGNet is inherently more interpretable as it explicitly informs us where it focuses during each iteration. Our experiments on ImageNet100 demonstrate the potential of recurrent downsampled attention mechanisms to improve on the single feedforward manner. For example, MGNet improves accuracy by 4.76% on average under common corruptions at only 36.9% of the computational cost. Moreover, while the baseline incurs an accuracy drop to 7.6%, MGNet manages to maintain 44.2% accuracy under the same PGD attack strength with a ResNet-50 backbone. Our code is available at https://github.com/siahuat0727/MGNet.

Submitted 12 April, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: Accepted at BMVC 2021

Journal ref: The British Machine Vision Conference (BMVC) 2021

arXiv:2110.15114 [pdf, other]

UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation

Authors: Kelong Mao, Jieming Zhu, Xi Xiao, Biao Lu, Zhaowei Wang, Xiuqiang He

Abstract: With the recent success of graph convolutional networks (GCNs), they have been widely applied for recommendation, and achieved impressive performance gains. The core of GCNs lies in their message passing mechanism to aggregate neighborhood information. However, we observed that message passing largely slows down the convergence of GCNs during training, especially for large-scale recommender systems, which hinders their wide adoption. LightGCN makes an early attempt to simplify GCNs for collaborative filtering by omitting feature transformations and nonlinear activations. In this paper, we take one step further to propose an ultra-simplified formulation of GCNs (dubbed UltraGCN), which skips infinite layers of message passing for efficient recommendation. Instead of explicit message passing, UltraGCN resorts to directly approximating the limit of infinite-layer graph convolutions via a constraint loss. Meanwhile, UltraGCN allows for more appropriate edge weight assignments and flexible adjustment of the relative importance among different types of relationships. This finally yields a simple yet effective UltraGCN model, which is easy to implement and efficient to train. Experimental results on four benchmark datasets show that UltraGCN not only outperforms the state-of-the-art GCN models but also achieves more than 10x speedup over LightGCN. Our source code will be available at https://reczoo.github.io/UltraGCN.

Submitted 29 November, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

Comments: Accepted by CIKM 2021. Code available at: https://reczoo.github.io/UltraGCN

arXiv:2110.14209 [pdf, ps, other]

Fast Distributed Stochastic Scheduling for A Multi-Energy Industrial Park

Authors: Dafeng Zhu, Bo Yang, Zhaojian Wang, Chengbin Ma, Kai Ma, Shanying Zhu

Abstract: The multi-energy management framework of industrial parks advocates energy conversion and scheduling, which takes full advantage of the compensation and temporal availability of multiple energy sources. However, how to exploit elastic loads and compensate inelastic loads to match multiple generators and storage is still a key problem under the uncertainty of demand and supply. To solve the issue, the energy management problem is constructed as a stochastic optimization problem. The optimization aims are to minimize the time-averaged energy cost and improve the energy efficiency while respecting the energy constraints. To achieve the distributed implementation in real time without any a priori knowledge of the underlying stochastic process, a distributed stochastic gradient algorithm based on dual decomposition and a fast scheme are proposed. The numerical results based on real data show that the industrial park, by adopting the proposed algorithm, can achieve social welfare maximization asymptotically.

Submitted 24 May, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

arXiv:2110.13408 [pdf, ps, other]

Learning Rich Features for Gait Recognition by Integrating Skeletons and Silhouettes

Authors: Yunjie Peng, Kang Ma, Yang Zhang, Zhiqiang He

Abstract: Gait recognition captures gait patterns from the walking sequence of an individual for identification. Most existing gait recognition methods learn features from silhouettes or skeletons for the robustness to clothing, carrying, and other exterior factors. The combination of the two data modalities, however, is not fully exploited. Previous multimodal gait recognition methods mainly employ the skeleton to assist the local feature extraction where the intrinsic discrimination of the skeleton data is ignored. This paper proposes a simple yet effective Bimodal Fusion (BiFusion) network which mines discriminative gait patterns in skeletons and integrates with silhouette representations to learn rich features for identification. Particularly, the inherent hierarchical semantics of body joints in a skeleton is leveraged to design a novel Multi-Scale Gait Graph (MSGG) network for the feature extraction of skeletons. Extensive experiments on CASIA-B and OUMVLP demonstrate both the superiority of the proposed MSGG network in modeling skeletons and the effectiveness of the bimodal fusion for gait recognition. Under the most challenging condition of walking in different clothes on CASIA-B, our method achieves the rank-1 accuracy of 92.1%.

Submitted 5 May, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: The paper is under consideration at Multimedia Tools and Applications

ACM Class: I.2; I.4; I.5.4

arXiv:2110.13252 [pdf, other]

VAC-CNN: A Visual Analytics System for Comparative Studies of Deep Convolutional Neural Networks

Authors: Xiwei Xuan, Xiaoyu Zhang, Oh-Hyun Kwon, Kwan-Liu Ma

Abstract: The rapid development of Convolutional Neural Networks (CNNs) in recent years has triggered significant breakthroughs in many machine learning (ML) applications. The ability to understand and compare various CNN models available is thus essential. The conventional approach with visualizing each model's quantitative features, such as classification accuracy and computational complexity, is not sufficient for a deeper understanding and comparison of the behaviors of different models. Moreover, most of the existing tools for assessing CNN behaviors only support comparison between two models and lack the flexibility of customizing the analysis tasks according to user needs. This paper presents a visual analytics system, VAC-CNN (Visual Analytics for Comparing CNNs), that supports the in-depth inspection of a single CNN model as well as comparative studies of two or more models. The ability to compare a larger number of (e.g., tens of) models especially distinguishes our system from previous ones. With a carefully designed model visualization and explaining support, VAC-CNN facilitates a highly interactive workflow that promptly presents both quantitative and qualitative information at each analysis stage. We demonstrate VAC-CNN's effectiveness for assisting novice ML practitioners in evaluating and comparing multiple CNN models through two use cases and one preliminary evaluation study using the image classification tasks on the ImageNet dataset.

Submitted 14 January, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 12 pages, 6 figures. This manuscript is currently under review

arXiv:2110.12187 [pdf, other]

AFEC: Active Forgetting of Negative Transfer in Continual Learning

Authors: Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, Yi Zhong

Abstract: Continual learning aims to learn a sequence of tasks from dynamic data distributions. Without accessing to the old training samples, knowledge transfer from the old tasks to each new task is difficult to determine, which might be either positive or negative. If the old knowledge interferes with the learning of a new task, i.e., the forward knowledge transfer is negative, then precisely remembering the old tasks will further aggravate the interference, thus decreasing the performance of continual learning. By contrast, biological neural networks can actively forget the old knowledge that conflicts with the learning of a new experience, through regulating the learning-triggered synaptic expansion and synaptic convergence. Inspired by the biological active forgetting, we propose to actively forget the old knowledge that limits the learning of new tasks to benefit continual learning. Under the framework of Bayesian continual learning, we develop a novel approach named Active Forgetting with synaptic Expansion-Convergence (AFEC). Our method dynamically expands parameters to learn each new task and then selectively combines them, which is formally consistent with the underlying mechanism of biological active forgetting. We extensively evaluate AFEC on a variety of continual learning benchmarks, including CIFAR-10 regression tasks, visual classification tasks and Atari reinforcement tasks, where AFEC effectively improves the learning of new tasks and achieves the state-of-the-art performance in a plug-and-play way.

Submitted 4 November, 2021; v1 submitted 23 October, 2021; originally announced October 2021.

Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2110.09699 [pdf, ps, other]

Image Quality Assessment in the Modern Age

Authors: Kede Ma, Yuming Fang

Abstract: This tutorial provides the audience with the basic theories, methodologies, and current progresses of image quality assessment (IQA). From an actionable perspective, we will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli. We will then present in detail the design principles of objective quality assessment models, supplemented by an in-depth analysis of their advantages and disadvantages. Both hand-engineered and (deep) learning-based methods will be covered. Moreover, the limitations with the conventional model comparison methodology for objective quality models will be pointed out, and novel comparison methodologies such as those based on the theory of "analysis by synthesis" will be introduced. We will last discuss the real-world multimedia applications of IQA, and give a list of open challenging problems, in the hope of encouraging more and more talented researchers and engineers devoting to this exciting and rewarding research field.

Submitted 18 October, 2021; originally announced October 2021.

Comments: ACM Multimedia 2021 Tutorial

arXiv:2110.09260 [pdf, other]

doi 10.1109/TMI.2020.3045775

A Unified Framework for Generalized Low-Shot Medical Image Segmentation with Scarce Data

Authors: Hengji Cui, Dong Wei, Kai Ma, Shi Gu, Yefeng Zheng

Abstract: Medical image segmentation has achieved remarkable advancements using deep neural networks (DNNs). However, DNNs often need big amounts of data and annotations for training, both of which can be difficult and costly to obtain. In this work, we propose a unified framework for generalized low-shot (one- and few-shot) medical image segmentation based on distance metric learning (DML). Unlike most existing methods which only deal with the lack of annotations while assuming abundance of data, our framework works with extreme scarcity of both, which is ideal for rare diseases. Via DML, the framework learns a multimodal mixture representation for each category, and performs dense predictions based on cosine distances between the pixels' deep embeddings and the category representations. The multimodal representations effectively utilize the inter-subject similarities and intraclass variations to overcome overfitting due to extremely limited data. In addition, we propose adaptive mixing coefficients for the multimodal mixture distributions to adaptively emphasize the modes better suited to the current input. The representations are implicitly embedded as weights of the fc layer, such that the cosine distances can be computed efficiently via forward propagation. In our experiments on brain MRI and abdominal CT datasets, the proposed framework achieves superior performances for low-shot segmentation towards standard DNN-based (3D U-Net) and classical registration-based (ANTs) methods, e.g., achieving mean Dice coefficients of 81%/69% for brain tissue/abdominal multiorgan segmentation using a single training sample, as compared to 52%/31% and 72%/35% by the U-Net and ANTs, respectively.

Submitted 18 October, 2021; originally announced October 2021.

Comments: Published in IEEE TRANSACTIONS ON MEDICAL IMAGING
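
Note on arXiv:2110.09260: the abstract states that category representations are stored as fc-layer weights so that cosine distances come out of a forward pass. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes one representation per category (the paper describes a multimodal mixture with adaptive coefficients, which is omitted here), and the names `cosine_logits`, `pixels`, and `prototypes` are hypothetical.

```python
import numpy as np

def cosine_logits(embeddings, weights, eps=1e-12):
    """Cosine similarity between every embedding and every weight row.

    embeddings: (N, D) array, one row per pixel embedding (assumed layout).
    weights:    (C, D) array, one row per category representation,
                playing the role of a bias-free fc layer's weights.
    Returns an (N, C) array of cosine similarities.
    """
    # L2-normalize both sides; a plain matrix product then yields cosines.
    e = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    return e @ w.T

# Toy data (illustrative values, not from the paper).
pixels = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
prototypes = np.array([[3.0, 0.0], [0.0, 0.5]])

sims = cosine_logits(pixels, prototypes)
labels = sims.argmax(axis=1)  # nearest category by cosine similarity
```

Because both operands are normalized, vector magnitudes do not affect the result: the first pixel matches the first category and the second pixel matches the second, regardless of scale.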

arXiv:2110.09035 [pdf, other]

Edge Rewiring Goes Neural: Boosting Network Resilience without Rich Features

Authors: Shanchao Yang, Kaili Ma, Baoxiang Wang, Tianshu Yu, Hongyuan Zha

Abstract: Improving the resilience of a network is a fundamental problem in network science, which protects the underlying system from natural disasters and malicious attacks. This is traditionally achieved via successive degree-preserving edge rewiring operations, with the major limitation of being transductive. Inductively solving graph-related tasks with sequential actions is accomplished by adopting graph neural networks (GNNs) coupled with reinforcement learning under the scenario with rich graph features. However, such frameworks cannot be directly applied to resilience tasks where only pure topological structure is available. In this case, GNNs can barely learn useful information, resulting in prohibitive difficulty in making actions for successively rewiring edges under a reinforcement learning context. In this paper, we study in depth the reasons why typical GNNs cause such failure. Based on this investigation, we propose ResiNet, the first end-to-end trainable inductive framework to discover resilient network topologies while balancing network utility. To this end, we reformulate resilience optimization as an MDP equipped with edge rewiring action space, and propose a pure topology-oriented variant of GNN called filtration enhanced graph neural network (FireGNN), which can learn from graphs without rich features. Extensive experiments demonstrate that ResiNet achieves a near-optimal resilience gain on various graphs while balancing the utility, and outperforms existing approaches by a large margin.

Submitted 22 May, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: Code: https://github.com/yangysc/ResiNet

arXiv:2110.08866 [pdf, other]

Alleviating Noisy-label Effects in Image Classification via Probability Transition Matrix

Authors: Ziqi Zhang, Yuexiang Li, Hongxin Wei, Kai Ma, Tao Xu, Yefeng Zheng

Abstract: Deep-learning-based image classification frameworks often suffer from the noisy label problem caused by the inter-observer variation. Recent studies employed learning-to-learn paradigms (e.g., Co-teaching and JoCoR) to filter the samples with noisy labels from the training set. However, most of them use a simple cross-entropy loss as the criterion for noisy label identification. The hard samples, which are beneficial for classifier learning, are often mistakenly treated as noises in such a setting since both the hard samples and ones with noisy labels lead to a relatively larger loss value than the easy cases. In this paper, we propose a plugin module, namely noise ignoring block (NIB), consisting of a probability transition matrix and an inter-class correlation (IC) loss, to separate the hard samples from the mislabeled ones, and further boost the accuracy of image classification network trained with noisy labels. Concretely, our IC loss is calculated as Kullback-Leibler divergence between the network prediction and the accumulative soft label generated by the probability transition matrix. Such that, with the lower value of IC loss, the hard cases can be easily distinguished from mislabeled cases. Extensive experiments are conducted on natural and medical image datasets (CIFAR-10 and ISIC 2019). The experimental results show that our NIB module consistently improves the performances of the state-of-the-art robust training methods.

Submitted 19 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

Journal ref: The British Machine Vision Conference (BMVC), 2021

Showing 251–300 of 611 results for author: Ma, K