Drug Package Recommendation via Interaction-aware Graph Induction

Authors: Zhi Zheng, Chao Wang, Tong Xu, Dazhong Shen, Penggang Qin, Baoxing Huai, Tongzhu Liu, Enhong Chen

Abstract: Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which highly support the intelligent medical services such as drug recommendation. However, prior arts mainly follow the traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., synergistic or antagonistic effect, have been largely ignored. To that end, in this paper, we target at developing a new paradigm for drug package recommendation with considering the interaction effect within drugs, in which the interaction effects could be affected by patient conditions. Specifically, we first design a pre-training method based on neural collaborative filtering to get the initial embedding of patients and drugs. Then, the drug interaction graph will be initialized based on medical records and domain knowledge. Along this line, we propose a new Drug Package Recommendation (DPR) framework with two variants, respectively DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG) to solve the problem, in which the interactions will be described as signed weights or attribute vectors. In detail, a mask layer is utilized to capture the impact of patient condition, and graph neural networks (GNNs) are leveraged for the final graph induction task to embed the package. Extensive experiments on a real-world data set from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study for the drug package generation task with adequate performance.

Submitted 6 February, 2021; originally announced February 2021.

arXiv:2101.12521 [pdf, other]

doi 10.1109/TIP.2021.3056212

Complementary Pseudo Labels For Unsupervised Domain Adaptation On Person Re-identification

Authors: Hao Feng, Minghao Chen, **ming Hu, Dong Shen, Haifeng Liu, Deng Cai

Abstract: In recent years, supervised person re-identification (re-ID) models have received increasing studies. However, these models trained on the source domain always suffer dramatic performance drop when tested on an unseen domain. Existing methods are primary to use pseudo labels to alleviate this problem. One of the most successful approaches predicts neighbors of each unlabeled image and then uses them to train the model. Although the predicted neighbors are credible, they always miss some hard positive samples, which may hinder the model from discovering important discriminative information of the unlabeled domain. In this paper, to complement these low recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high precision neighbor pseudo labels and high recall group pseudo labels. The group pseudo labels are generated by transitively merging neighbors of different samples into a group to achieve higher recall. However, the merging operation may cause subgroups in the group due to imperfect neighbor predictions. To utilize these group pseudo labels properly, we propose using a similarity-aggregating loss to mitigate the influence of these subgroups by pulling the input sample towards the most similar embeddings. Extensive experiments on three large-scale datasets demonstrate that our method can achieve state-of-the-art performance under the unsupervised domain adaptation re-ID setting.
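
Illustration (not from the paper): the abstract says group pseudo labels come from transitively merging the neighbors of different samples. One plausible reading is connected components over the predicted-neighbor relation, sketched below with a small union-find. The function name `merge_neighbors` and the input layout (a dict from sample id to predicted neighbor ids) are assumptions made for this sketch, and the authors' actual procedure may differ.

```python
def merge_neighbors(neighbors):
    """Group samples by transitively merging predicted neighbors.

    neighbors: dict mapping a sample id to an iterable of predicted
    neighbor ids (hypothetical input format for this sketch).
    Returns a sorted list of groups, each a sorted list of ids.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for sample, predicted in neighbors.items():
        find(sample)  # make sure isolated samples still form a group
        for other in predicted:
            union(sample, other)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


groups = merge_neighbors({"a": ["b"], "b": ["c"], "d": [], "e": ["d"]})
print(groups)
```

Here "a" and "c" end up in one group although neither lists the other as a neighbor. That is the recall gain the abstract describes, and also why an imperfect neighbor prediction can pull unrelated subgroups into a single group.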

Submitted 6 February, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

Comments: 10 pages, 3 figures. Accepted for publication in IEEE Transactions on Image Processing 2021

arXiv:2101.12413 [pdf, other]

doi 10.1103/PhysRevC.104.024902

Cumulants and Correlation Functions of Net-proton, Proton and Antiproton Multiplicity Distributions in Au+Au Collisions at energies available at the BNL Relativistic Heavy Ion Collider

Authors: STAR Collaboration, M. S. Abdallah, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, G. Agakishiev, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, I. Alekseev, D. M. Anderson, A. Aparin, E. C. Aschenauer, M. U. Ashraf, F. G. Atetalla, A. Attri, G. S. Averichev, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied, P. Bhagat, et al. (367 additional authors not shown)

Abstract: We report a systematic measurement of cumulants, $C_{n}$, for net-proton, proton and antiproton multiplicity distributions, and correlation functions, $κ_n$, for proton and antiproton multiplicity distributions up to the fourth order in Au+Au collisions at $\sqrt{s_{\mathrm {NN}}}$ = 7.7, 11.5, 14.5, 19.6, 27, 39, 54.4, 62.4 and 200 GeV. The $C_{n}$ and $κ_n$ are presented as a function of collision energy, centrality and kinematic acceptance in rapidity, $y$, and transverse momentum, $p_{T}$. The data were taken during the first phase of the Beam Energy Scan (BES) program (2010 -- 2017) at the BNL Relativistic Heavy Ion Collider (RHIC) facility. The measurements are carried out at midrapidity ($|y| <$ 0.5) and transverse momentum 0.4 $<$ $p_{\rm T}$ $<$ 2.0 GeV/$c$, using the STAR detector at RHIC. We observe a non-monotonic energy dependence ($\sqrt{s_{\mathrm {NN}}}$ = 7.7 -- 62.4 GeV) of the net-proton $C_{4}$/$C_{2}$ with the significance of 3.1$σ$ for the 0-5\% central Au+Au collisions. This is consistent with the expectations of critical fluctuations in a QCD-inspired model. Thermal and transport model calculations show a monotonic variation with $\sqrt{s_{\mathrm {NN}}}$. For the multiparticle correlation functions, we observe significant negative values for a two-particle correlation function, $κ_2$, of protons and antiprotons, which are mainly due to the effects of baryon number conservation. Furthermore, it is found that the four-particle correlation function, $κ_4$, of protons plays a role in determining the energy dependence of proton $C_4/C_1$ below 19.6 GeV, which cannot be understood by the effect of baryon number conservation.
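
For orientation (not part of the listing): the abstract does not spell out its definitions, so the block below gives the standard textbook ones, which it presumably follows. With $N$ the event-by-event multiplicity and $\delta N = N - \langle N \rangle$, the first four cumulants are written in terms of central moments, and the correlation functions are taken to be the factorial cumulants, i.e. linear combinations of the $C_n$. Identifying $κ_n$ with factorial cumulants is an assumption of this note.

```latex
\begin{align}
C_1 &= \langle N \rangle, &
C_2 &= \langle (\delta N)^2 \rangle, \\
C_3 &= \langle (\delta N)^3 \rangle, &
C_4 &= \langle (\delta N)^4 \rangle - 3\,\langle (\delta N)^2 \rangle^2, \\
\kappa_1 &= C_1, &
\kappa_2 &= -C_1 + C_2, \\
\kappa_3 &= 2C_1 - 3C_2 + C_3, &
\kappa_4 &= -6C_1 + 11C_2 - 6C_3 + C_4 .
\end{align}
```

Under these definitions a Poisson distribution has $C_n = C_1$ for every $n$, so all $\kappa_n$ with $n \ge 2$ vanish. A negative $\kappa_2$ therefore means the distribution is narrower than Poisson ($C_2 < C_1$).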

Submitted 7 August, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

Comments: 34 pages, 25 figures, 8 tables

Journal ref: Phys. Rev. C 104, 024902 (2021)

arXiv:2101.10149 [pdf]

doi 10.1103/PhysRevLett.126.246601

Multiple magnetic topological phases in bulk van der Waals crystal MnSb4Te7

Authors: Shuchun Huan, Shihao Zhang, Zhicheng Jiang, Hao Su, Hongyuan Wang, Xin Zhang, Yichen Yang, Zhengtai Liu, Xia Wang, Na Yu, Zhiqiang Zou, Dawei Shen, Jianpeng Liu, Yanfeng Guo

Abstract: The magnetic van der Waals crystals MnBi2Te4/(Bi2Te3)n have drawn significant attention due to their rich topological properties and the tunability by external magnetic field. Although the MnBi2Te4/(Bi2Te3)n family have been intensively studied in the past few years, their close relatives, the MnSb2Te4/(Sb2Te3)n family, remain much less explored. In this work, combining magnetotransport measurements, angle-resolved photoemission spectroscopy, and first principles calculations, we find that MnSb4Te7, the n = 1 member of the MnSb2Te4/(Sb2Te3)n family, is a magnetic topological system with versatile topological phases which can be manipulated by both carrier doping and magnetic field. Our calculations unveil that its A-type antiferromagnetic (AFM) ground state stays in a Z_2 AFM topological insulator phase, which can be converted to an inversion-symmetry-protected axion insulator phase when in the ferromagnetic (FM) state. Moreover, when this system in the FM phase is slightly carrier doped on either the electron or hole side, it becomes a Weyl semimetal with multiple Weyl nodes in the highest valence bands and lowest conduction bands, which are manifested by the measured notable anomalous Hall effect. Our work thus introduces a new magnetic topological material with different topological phases which are highly tunable by carrier doping or magnetic field.

Submitted 14 February, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

Comments: 28 pages including Supplementary Information, 14 figures, 3 tables

Journal ref: Phys. Rev. Lett. 126, 246601 (2021)

arXiv:2101.09405 [pdf, ps, other]

Channel Estimation for RIS Assisted Wireless Communications: Part II -- An Improved Solution Based on Double-Structured Sparsity

Authors: Xiuhong Wei, Decai Shen, Linglong Dai

Abstract: Reconfigurable intelligent surface (RIS) can manipulate the wireless communication environment by controlling the coefficients of RIS elements. However, due to the large number of passive RIS elements without signal processing capability, channel estimation in RIS assisted wireless communication system requires high pilot overhead. In the second part of this invited paper, we propose to exploit the double-structured sparsity of the angular cascaded channels among users to reduce the pilot overhead. Specifically, we first reveal the double-structured sparsity, i.e., different angular cascaded channels for different users enjoy the completely common non-zero rows and the partially common non-zero columns. By exploiting this double-structured sparsity, we further propose the double-structured orthogonal matching pursuit (DS-OMP) algorithm, where the completely common non-zero rows and the partially common non-zero columns are jointly estimated for all users. Simulation results show that the pilot overhead required by the proposed scheme is lower than existing schemes.

Submitted 22 January, 2021; originally announced January 2021.

Comments: This paper has been accepted by the IEEE Communications Letters as an invited paper. Simulation codes are provided to reproduce the results presented in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

arXiv:2101.09404 [pdf, ps, other]

Channel Estimation for RIS Assisted Wireless Communications: Part I -- Fundamentals, Solutions, and Future Opportunities

Authors: Xiuhong Wei, Decai Shen, Linglong Dai

Abstract: The reconfigurable intelligent surface (RIS) with low hardware cost and energy consumption has been recognized as a potential technique for future 6G communications to enhance coverage and capacity. To achieve this goal, accurate channel state information (CSI) in RIS assisted wireless communication system is essential for the joint beamforming at the base station (BS) and the RIS. However, channel estimation is challenging, since a large number of passive RIS elements cannot transmit, receive, or process signals. In the first part of this invited paper, we provide an overview of the fundamentals, solutions, and future opportunities of channel estimation in the RIS assisted wireless communication system. It is noted that a new channel estimation scheme with low pilot overhead will be provided in the second part of this paper.

Submitted 22 January, 2021; originally announced January 2021.

Comments: This paper has been accepted by the IEEE Communications Letters as an invited paper

arXiv:2101.07731 [pdf, other]

TC-DTW: Accelerating Multivariate Dynamic Time Warping Through Triangle Inequality and Point Clustering

Authors: Daniel Shen, Min Chi

Abstract: Dynamic time warping (DTW) plays an important role in analytics on time series. Despite the large body of research on speeding up univariate DTW, the method for multivariate DTW has not been improved much in the last two decades. The most popular algorithm used today is still the one developed seventeen years ago. This paper presents a solution that, as far as we know, for the first time consistently outperforms the classic multivariate DTW algorithm across dataset sizes, series lengths, data dimensions, temporal window sizes, and machines. The new solution, named TC-DTW, introduces Triangle Inequality and Point Clustering into the algorithm design on lower bound calculations for multivariate DTW. In experiments on DTW-based nearest neighbor finding, the new solution avoids as much as 98% (60% average) DTW distance calculations and yields as much as 25X (7.5X average) speedups.
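
Illustration (not from the paper): for readers unfamiliar with the quantity being accelerated, here is a minimal sketch of the textbook dynamic-programming DTW distance between two multivariate series, with an optional temporal window. It is the plain baseline computation only and contains none of TC-DTW's triangle-inequality or clustering-based lower bounds. The function name, the choice of Euclidean point distance, and the window handling are assumptions of this sketch.

```python
import math


def dtw_distance(a, b, window=None):
    """Textbook DTW distance between two multivariate series.

    a, b: sequences of equal-dimension points (tuples or lists of floats).
    window: optional maximum index offset |i - j| (a Sakoe-Chiba style band);
            it is widened to at least |len(a) - len(b)| so a path exists.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))

    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


series_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
series_b = [(0.0, 0.0), (2.0, 2.0)]
print(dtw_distance(series_a, series_b))
```

Each call fills an n-by-m table. In nearest-neighbor search a cheap lower bound on this distance lets a candidate be discarded without filling the table, and that is the step the abstract says TC-DTW improves.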

Submitted 19 January, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Report number: North Carolina State University TR-2021-2

arXiv:2101.06804 [pdf, other]

What Makes Good In-Context Examples for GPT-$3$?

Authors: Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen

Abstract: GPT-$3$ has attracted lots of attention due to its superior performance across a wide range of NLP tasks, especially with its powerful and versatile in-context few-shot learning ability. Despite its success, we found that the empirical results of GPT-$3$ depend heavily on the choice of in-context examples. In this work, we investigate whether there are more effective strategies for judiciously selecting in-context examples (relative to random sampling) that better leverage GPT-$3$'s few-shot capabilities. Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically-similar to a test sample to formulate its corresponding prompt. Intuitively, the in-context examples selected with such a strategy may serve as more informative inputs to unleash GPT-$3$'s extensive knowledge. We evaluate the proposed approach on several natural language understanding and generation benchmarks, where the retrieval-based prompt selection approach consistently outperforms the random baseline. Moreover, it is observed that the sentence encoders fine-tuned on task-related datasets yield even more helpful retrieval results. Notably, significant gains are observed on tasks such as table-to-text generation (41.9% on the ToTTo dataset) and open-domain question answering (45.5% on the NQ dataset). We hope our investigation could help understand the behaviors of GPT-$3$ and large-scale pre-trained LMs in general and enhance their few-shot capabilities.
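
Illustration (not from the paper): the retrieval idea in the abstract, choosing in-context examples that are semantically similar to the test sample, can be sketched as a nearest-neighbor lookup over sentence embeddings. The embeddings below are hand-written toy vectors standing in for the output of a sentence encoder. The function names, the use of cosine similarity, and the prompt layout are assumptions of this sketch, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_examples(test_vec, train, k=2):
    """Return the texts of the k training examples most similar to test_vec.

    train: list of (text, embedding) pairs; the embeddings are assumed to
    come from some sentence encoder (hypothetical here).
    """
    ranked = sorted(train, key=lambda item: cosine(test_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(examples, query):
    """Concatenate the retrieved examples and the query into one prompt string."""
    return "\n\n".join(examples + [query])


train = [
    ("Q: capital of France? A: Paris", [1.0, 0.0]),
    ("Q: 2 + 2? A: 4", [0.0, 1.0]),
    ("Q: capital of Spain? A: Madrid", [0.9, 0.1]),
]
chosen = select_examples([1.0, 0.0], train, k=2)
print(build_prompt(chosen, "Q: capital of Italy? A:"))
```

The random baseline mentioned in the abstract would correspond to replacing `select_examples` with a uniform sample of `train`.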

Submitted 17 January, 2021; originally announced January 2021.

arXiv:2101.05493 [pdf]

Far-Field Super-Resolution Imaging By Nonlinear Excited Evanescent Waves

Authors: Zhihao Zhou, Wei Liu, Jia**g He, Lei Chen, Xin Luo, Dongyi Shen, Jianjun Cao, Ya** Dan, Xianfeng Chen, Wenjie Wan

Abstract: Abbe's resolution limit, one of the best-known physical limitations, poses a great challenge for any wave systems in imaging, wave transport, and dynamics. Originally formulated in linear optics, this Abbe's limit can be broken using nonlinear optical interactions. Here we extend the Abbe theory into a nonlinear regime and experimentally demonstrate a far-field, label-free, and scan-free super-resolution imaging technique based on nonlinear four-wave mixing to retrieve near-field scattered evanescent waves, achieving sub-wavelength resolution of $λ/15.6$. This method paves the way for application in biomedical imaging, semiconductor metrology, and photolithography.

Submitted 14 January, 2021; originally announced January 2021.

arXiv:2101.00363 [pdf, ps, other]

doi 10.1016/j.jpaa.2021.106990

The Prime Graphs of Some Classes of Finite Groups

Authors: Chris Florez, Jonathan Higgins, Kyle Huang, Thomas Michael Keller, Dawei Shen, Yong Yang

Abstract: In this paper we study prime graphs of finite groups. The prime graph of a finite group $G$, also known as the Gruenberg-Kegel graph, is the graph with vertex set {primes dividing $|G|$} and an edge $p$-$q$ if and only if there exists an element of order $pq$ in $G$. In finite group theory, studying the prime graph of a group has been an important topic for the past almost half century. Only recently prime graphs of solvable groups have been characterized in graph theoretical terms only. In this paper, we continue this line of research and give complete characterizations of several classes of groups, including groups of square-free order, metanilpotent groups, groups of cube-free order, and, for any $n\in \mathbb{N}$, solvable groups of $n^\text{th}$-power-free order. We also explore the prime graphs of groups whose composition factors are cyclic or $A_5$ and draw connections to a conjecture of Maslova. We then propose an algorithm that recovers the prime graph from a dual prime graph.
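
Illustration (not from the paper): the definition of the prime graph in the abstract can be computed directly once the set of element orders of a group is known. The sketch below does this for symmetric groups, where the order of a permutation is the least common multiple of its cycle lengths, so the element orders come from the integer partitions of n. All function names are made up for this example. The check `order % (p * q) == 0` relies on the fact that a group containing an element of order divisible by pq also contains one of order exactly pq (a suitable power of it).

```python
from itertools import combinations
from math import factorial, gcd


def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest


def symmetric_group_element_orders(n):
    """Set of element orders in S_n: lcm of cycle lengths over all cycle types."""
    orders = set()
    for cycle_type in partitions(n):
        order = 1
        for length in cycle_type:
            order = order * length // gcd(order, length)
        orders.add(order)
    return orders


def prime_factors(n):
    """Sorted list of the distinct primes dividing n."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def prime_graph(group_order, element_orders):
    """Return (vertices, edges) of the prime graph for the given data."""
    vertices = prime_factors(group_order)
    edges = {(p, q) for p, q in combinations(vertices, 2)
             if any(order % (p * q) == 0 for order in element_orders)}
    return vertices, edges


print(prime_graph(factorial(5), symmetric_group_element_orders(5)))
```

For $S_5$ the only edge joins 2 and 3, coming from elements of order 6 such as a 3-cycle times a disjoint transposition. The prime 5 is an isolated vertex because $S_5$ has no element of order 10 or 15.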

Submitted 2 January, 2022; v1 submitted 1 January, 2021; originally announced January 2021.

arXiv:2012.13697 [pdf, other]

TSGCNet: Discriminative Geometric Feature Learning with Two-Stream Graph Convolutional Network for 3D Dental Model Segmentation

Authors: Lingming Zhang, Yue Zhao, Deyu Meng, Zhiming Cui, Chenqiang Gao, Xinbo Gao, Chunfeng Lian, Dinggang Shen

Abstract: The ability to segment teeth precisely from digitized 3D dental models is an essential task in computer-aided orthodontic surgical planning. To date, deep learning based methods have been popularly used to handle this task. State-of-the-art methods directly concatenate the raw attributes of 3D inputs, namely coordinates and normal vectors of mesh cells, to train a single-stream network for fully-automated tooth segmentation. This, however, has the drawback of ignoring the different geometric meanings provided by those raw attributes. This issue might possibly confuse the network in learning discriminative geometric features and result in many isolated false predictions on the dental model. Against this issue, we propose a two-stream graph convolutional network (TSGCNet) to learn multi-view geometric information from different geometric attributes. Our TSGCNet adopts two graph-learning streams, designed in an input-aware fashion, to extract more discriminative high-level geometric representations from coordinates and normal vectors, respectively. These feature representations learned from the designed two different streams are further fused to integrate the multi-view complementary information for the cell-wise dense prediction task. We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners, and experimental results demonstrate that our method significantly outperforms state-of-the-art methods for 3D shape segmentation.

Submitted 26 December, 2020; originally announced December 2020.

Comments: 10 pages, 7 figures

arXiv:2012.04743 [pdf, other]

2-Step Sparse-View CT Reconstruction with a Domain-Specific Perceptual Network

Authors: Haoyu Wei, Florian Schiffers, Tobias Würfl, Daming Shen, Daniel Kim, Aggelos K. Katsaggelos, Oliver Cossairt

Abstract: Computed tomography is widely used to examine internal structures in a non-destructive manner. To obtain high-quality reconstructions, one typically has to acquire a densely sampled trajectory to avoid angular undersampling. However, many scenarios require a sparse-view measurement leading to streak-artifacts if unaccounted for. Current methods do not make full use of the domain-specific information, and hence fail to provide reliable reconstructions for highly undersampled data. We present a novel framework for sparse-view tomography by decoupling the reconstruction into two steps: First, we overcome its ill-posedness using a super-resolution network, SIN, trained on the sparse projections. The intermediate result allows for a closed-form tomographic reconstruction with preserved details and highly reduced streak-artifacts. Second, a refinement network, PRN, trained on the reconstructions reduces any remaining artifacts. We further propose a light-weight variant of the perceptual-loss that enhances domain-specific information, boosting restoration accuracy. Our experiments demonstrate an improvement over current solutions by 4 dB.

Submitted 8 December, 2020; originally announced December 2020.

arXiv:2012.03410 [pdf, other]

doi 10.1103/PhysRevB.103.035133

Evidence of topological nodal lines and surface states in the centrosymmetric superconductor SnTaS2

Authors: Wenqing Chen, Lulu Liu, Wentao Yang, Dong Chen, Zhengtai Liu, Yaobo Huang, Tong Zhang, Haijun Zhang, Zhonghao Liu, D. W. Shen

Abstract: The discovery of signatures of topological superconductivity in superconducting bulk materials with topological surface states has attracted intensive research interests recently. Utilizing angle-resolved photoemission spectroscopy and first-principles calculations, here, we demonstrate the existence of topological nodal-line states and drumheadlike surface states in centrosymmetric superconductor SnTaS2, which is a type-II superconductor with a critical transition temperature of about 3 K. The valence bands from Ta 5d orbitals and the conduction bands from Sn 5p orbitals cross each other, forming two nodal lines in the vicinity of the Fermi energy without the inclusion of spin-orbit coupling (SOC), protected by the spatial-inversion symmetry and time-reversal symmetry. The nodal lines are gapped out by SOC. The drumheadlike surface states, the typical characteristics in nodal-line semimetals, are quite visible near the Fermi level. Our findings indicate that SnTaS2 offers a promising platform for exploring the exotic properties of the topological nodal-line fermions and gives a help to study topological superconductivity.

Submitted 18 January, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

Comments: to appear in Physical Review B

Journal ref: Phys. Rev. B 103, 035133 (2021)

arXiv:2012.01666 [pdf, ps, other]

Condition numbers of the mixed least squares-total least squares problem: revisited

Authors: Qiaohua Liu, Qian Zhang, Dongmei Shen

Abstract: A new closed formula for the first order perturbation estimate of the mixed least squares-total least squares (MTLS) solution is presented. It is mathematically equivalent to the one by Zheng and Yang (Numer. Linear Algebra Appl. 2019; 26(4):e2239). With this formula, general and structured normwise, mixed and componentwise condition numbers of the MTLS problem are derived. Perturbation bounds based on the normwise condition number, and compact forms for the upper bounds of mixed and componentwise condition numbers are also given in order for economic storage and efficient computation. It is shown that the condition numbers and perturbation bound of the TLS problem are unified in the ones of the MTLS problem.

Submitted 2 December, 2020; originally announced December 2020.

Comments: 20 pages

MSC Class: 65F35; 65F20

arXiv:2011.11267 [pdf]

Charge density wave and weak Kondo effect in a Dirac semimetal CeSbTe

Authors: Peng Li, Baijiang Lv, Yuan Fang, Wei Guo, Zhongzheng Wu, Yi Wu, Cheng-Maw Cheng, Dawei Shen, Yuefeng Nie, Luca Petaccia, Chao Cao, Zhu-An Xu, Yang Liu

Abstract: Using angle-resolved photoemission spectroscopy (ARPES) and low-energy electron diffraction (LEED), together with density-functional theory (DFT) calculation, we report the formation of charge density wave (CDW) and its interplay with the Kondo effect and topological states in CeSbTe. The observed Fermi surface (FS) exhibits parallel segments that can be well connected by the observed CDW ordering vector, indicating that the CDW order is driven by the electron-phonon coupling (EPC) as a result of the nested FS. The CDW gap is large (~0.3 eV) and momentum-dependent, which naturally explains the robust CDW order up to high temperatures. The gap opening leads to a reduced density of states (DOS) near the Fermi level (EF), which correspondingly suppresses the many-body Kondo effect, leading to very localized 4f electrons at 20 K and above. The topological Dirac cone at the X point is found to remain gapless inside the CDW phase. Our results provide evidence for the competition between CDW and the Kondo effect in a Kondo lattice system. The robust CDW order in CeSbTe and related compounds provide an opportunity to search for the long-sought-after axionic insulator.

Submitted 23 November, 2020; originally announced November 2020.

arXiv:2011.10403 [pdf, ps, other]

Minimal Prime Graphs of Solvable Groups

Authors: Chris Florez, Jonathan Higgins, Kyle Huang, Thomas Michael Keller, Dawei Shen

Abstract: We explore graph theoretical properties of minimal prime graphs of finite solvable groups. In finite group theory studying the prime graph of a group has been an important topic for the past almost half century. Recently prime graphs of solvable groups have been characterized in graph theoretical terms only. This now allows the study of these graphs with methods from graph theory only. Minimal prime graphs turn out to be of particular interest, and in this paper we pursue this further by exploring, among other things, diameters, Hamiltonian cycles and the property of being self-complementary for minimal prime graphs. We also study a new, but closely related notion of minimality for prime graphs and look into counting minimal prime graphs.

Submitted 20 November, 2020; originally announced November 2020.

arXiv:2011.08938 [pdf, ps, other]

The Adjacency Spectra of Some Families of Minimally Connected Prime Graphs

Authors: Chris Florez, Jonathan Higgins, Kyle Huang, Thomas Michael Keller, Dawei Shen

Abstract: In finite group theory, studying the prime graph of a group has been an important topic for almost the past half-century. Recently, prime graphs of solvable groups have been characterized in graph theoretical terms only. This now allows the study of these graphs without any knowledge of the group theoretical background. In this paper we study prime graphs from a linear algebra angle and focus on the class of minimally connected prime graphs introduced in earlier work on the subject. As our main results, we determine the determinants of the adjacency matrices and the spectra of some important families of these graphs.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 23 pages, 4 figures

MSC Class: 15A18; 05C25

arXiv:2011.03942 [pdf, other]

Radiative decays of Upsilon(nS) into S-wave and P-wave charmonium

Authors: Dan-Dan Shen, Chong-Yang Lu, Peng Sun, Ruilin Zhu

Abstract: Motivated by very recent measurement of the radiative decays of $Υ(1S)$ to $χ_{c1}$ at Belle, we use the nonrelativistic QCD factorization theory and calculate the branching fractions of the radiative decays of bottomonium into S-wave and P-wave charmonium, i.e. $Υ(nS) \to η_c(nS)+γ$ and $Υ(nS)\to χ_{cJ}+γ$. We systematically studied the branching fractions of the radiative decays of bottomonium into charmonium. Compared to the previous calculation, we obtained the analytical expression for the decay widths and considered the color-octet contributions. For $Υ(nS) \to η_c(nS)+γ$, the relativistic corrections are also obtained. Through the calculation, the theoretical prediction for $Υ(1S)\toχ_{c1}+γ$ is still smaller than the recent Belle measurement. Further theoretical work and experimental analysis are necessary to understand the $χ_{c1}$ production mechanism in upsilon decays.

Submitted 8 November, 2020; originally announced November 2020.

Comments: 10 pages,3 figures,6 tables

arXiv:2011.03127 [pdf, other]

Causal Imputation via Synthetic Interventions

Authors: Chandler Squires, Dennis Shen, Anish Agarwal, Devavrat Shah, Caroline Uhler

Abstract: Consider the problem of determining the effect of a compound on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (compounds) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the outcome for every pair while only having to perform experiments on a small subset of pairs. This task, which we label "causal imputation", is a generalization of the causal transportability problem. To address this challenge, we extend the recently introduced synthetic interventions (SI) estimator to handle more general data sparsity patterns. We prove that, under a latent factor model, our estimator provides valid estimates for the causal imputation task. We motivate this model by establishing a connection to the linear structural causal model literature. Finally, we consider the prominent CMAP dataset in predicting the effects of compounds on gene expression across cell types. We find that our estimator outperforms standard baselines, thus confirming its utility in biological applications.

Submitted 11 June, 2023; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2011.00593 [pdf, other]

MixKD: Towards Efficient Distillation of Large-scale Language Models

Authors: Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Abstract: Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (both memory and computation) platforms. Knowledge distillation (KD) has been demonstrated as an effective framework for compressing such big models. However, large-scale neural network systems are prone to memorize training instances, and thus tend to make inconsistent predictions when the data distribution is altered slightly. Moreover, the student model has few opportunities to request useful information from the teacher model when there is limited task-specific data available. To address these issues, we propose MixKD, a data-agnostic distillation framework that leverages mixup, a simple yet efficient data augmentation approach, to endow the resulting model with stronger generalization ability. Concretely, in addition to the original training examples, the student model is encouraged to mimic the teacher's behavior on the linear interpolation of example pairs as well. We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error. To verify its effectiveness, we conduct experiments on the GLUE benchmark, where MixKD consistently leads to significant gains over the standard KD training, and outperforms several competitive baselines. Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.

Submitted 17 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: ICLR 2021 Camera Ready
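
The "linear interpolation of example pairs" mentioned in the MixKD abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the inputs are assumed to be equal-length feature vectors (for text, typically embeddings), and drawing the mixing coefficient from a Beta distribution is the usual mixup convention, assumed here rather than taken from the abstract.

```python
import random


def mixup_pair(x_i, x_j, lam):
    """Interpolate two equal-length feature vectors: lam * x_i + (1 - lam) * x_j."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def sample_mixing_coefficient(alpha=0.4):
    """Draw lam from Beta(alpha, alpha); the value of alpha is an assumption."""
    return random.betavariate(alpha, alpha)


# Hypothetical usage: the student would be trained to match the teacher's
# output on the mixed input as well as on the two original examples.
mixed = mixup_pair([0.0, 2.0], [2.0, 0.0], lam=0.25)
```

With lam close to 1 the mixed example stays near the first input, and with lam close to 0 it stays near the second.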

arXiv:2010.14449 [pdf, other]

On Model Identification and Out-of-Sample Prediction of Principal Component Regression: Applications to Synthetic Controls

Authors: Anish Agarwal, Devavrat Shah, Dennis Shen

Abstract: We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In the course of our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions for out-of-sample predictions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. Accordingly, we construct a hypothesis test to check when this condition holds in practice. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach for policy evaluation. To the best of our knowledge, our prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.

Submitted 25 August, 2023; v1 submitted 27 October, 2020; originally announced October 2020.

arXiv:2010.14020 [pdf, ps, other]

Electronic structure of the high-mobility two-dimensional antiferromagnetic metal GdTe$_3$

Authors: J. S. Liu, S. C. Huan, Z. H. Liu, W. L. Liu, Z. T. Liu, X. L. Lu, Z. Huang, Z. C. Jiang, X. Wang, N. Yu, Z. Q. Zou, Y. F. Guo, D. W. Shen

Abstract: The new-found two-dimensional antiferromagnetic GdTe$_3$ is attractive owing to its highest carrier mobility among all known layered magnetic materials, as well as its potential application for novel magnetic twistronic and spintronic devices. Here, we have used high-resolution angle-resolved photoemission spectroscopy to investigate its Fermi surface topology and low-lying electronic band structure. The Fermi surface is partially gapped by charge density wave below the transition temperature; the residual part reconstructs, making GdTe$_3$ metallic. The high carrier mobility can be attributed to the sharp and nearly linear band dispersions near the Fermi energy. We find that the scattering rate of the linear band near the Fermi energy is almost linear within a wide energy range, indicating that GdTe$_3$ is a non-Fermi liquid metal. Our results in this paper provide a fundamental understanding of this layered Van der Waals antiferromagnetic material to guide future studies on it.

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2010.08670 [pdf, other]

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Authors: Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

Abstract: Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation framework dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization objective is introduced to capture the global relationship among all the data samples. A momentum encoder along with a memory bank is further leveraged to better estimate the contrastive loss. To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks. On the GLUE benchmark, CoDA gives rise to an average improvement of 2.2% when applied to the RoBERTa-large model. More importantly, it consistently exhibits stronger results relative to several competitive data augmentation and adversarial training baselines (including the low-resource settings). Extensive experiments show that the proposed contrastive objective can be flexibly combined with various data augmentation approaches to further boost their performance, highlighting the wide applicability of the CoDA framework.

Submitted 16 October, 2020; originally announced October 2020.

arXiv:2010.06040 [pdf, other]

Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model

Authors: Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao

Abstract: Masked Language Model (MLM) framework has been widely adopted for self-supervised language pre-training. In this paper, we argue that randomly sampled masks in MLM would lead to undesirably large gradient variance. Thus, we theoretically quantify the gradient variance via correlating the gradient covariance with the Hamming distance between two different masks (given a certain text sequence). To reduce the variance due to the sampling of masks, we propose a fully-explored masking strategy, where a text sequence is divided into a certain number of non-overlapping segments. Thereafter, the tokens within one segment are masked for training. We prove, from a theoretical perspective, that the gradients derived from this new masking schema have a smaller variance and can lead to more efficient self-supervised training. We conduct extensive experiments on both continual pre-training and general pre-training from scratch. Empirical results confirm that this new masking strategy can consistently outperform standard random masking. Detailed efficiency analysis and ablation studies further validate the advantages of our fully-explored masking strategy under the MLM framework.

Submitted 14 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.
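
The segment-wise masking described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: segments are taken to be contiguous and of near-equal length, every token in the chosen segment is masked, and the function name and the `[MASK]` placeholder are hypothetical.

```python
def fully_explored_masks(tokens, num_segments, mask_token="[MASK]"):
    """Return one masked copy of `tokens` per segment.

    The sequence is split into `num_segments` contiguous, non-overlapping
    segments (an assumption), and each returned view masks exactly one of them.
    """
    n = len(tokens)
    # Integer boundaries so the segments tile the sequence without overlap.
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    views = []
    for start, end in zip(bounds, bounds[1:]):
        views.append(
            [mask_token if start <= i < end else tok for i, tok in enumerate(tokens)]
        )
    return views


views = fully_explored_masks(["a", "b", "c", "d", "e", "f"], num_segments=3)
```

Because the segments are disjoint and cover the sequence, every position is masked in exactly one view, so any two masks differ at all of their masked positions.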

arXiv:2010.05994 [pdf, other]

Improving Text Generation with Student-Forcing Optimal Transport

Authors: Guoyin Wang, Chunyuan Li, Jianqiao Li, Hao Fu, Yuh-Chen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin

Abstract: Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, we propose using optimal transport (OT) to match the sequences generated in these two modes. An extension is further proposed to improve the OT learning, based on the structural and contextual information of the text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.

Submitted 12 October, 2020; originally announced October 2020.

Comments: To appear at EMNLP 2020

arXiv:2009.13818 [pdf, other]

A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation

Authors: Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen

Abstract: Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability. However, it typically requires expensive computation to determine the direction of the injected perturbations. In this paper, we introduce a set of simple yet effective data augmentation strategies dubbed cutoff, where part of the information within an input sentence is erased to yield its restricted views (during the fine-tuning stage). Notably, this process relies merely on stochastic sampling and thus adds little computational overhead. A Jensen-Shannon Divergence consistency loss is further utilized to incorporate these augmented samples into the training objective in a principled manner. To verify the effectiveness of the proposed strategies, we apply cutoff to both natural language understanding and generation problems. On the GLUE benchmark, it is demonstrated that cutoff, in spite of its simplicity, performs on par with or better than several competitive adversarial-based approaches. We further extend cutoff to machine translation and observe significant gains in BLEU scores (based upon the Transformer Base model). Moreover, cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.

Submitted 22 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: Source code is available at: https://github.com/dinghanshen/cutoff
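
The two ingredients named in the cutoff abstract, erasing part of an input and comparing predictions with a Jensen-Shannon divergence, can be sketched in a few lines. This is an illustration only and is not taken from the linked repository: the function names, the `[PAD]` placeholder, and the choice of erasing a contiguous span are assumptions.

```python
import math


def span_cutoff(tokens, start, length, pad_token="[PAD]"):
    """Return a restricted view of `tokens` with one contiguous span erased."""
    return [
        pad_token if start <= i < start + length else tok
        for i, tok in enumerate(tokens)
    ]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


view = span_cutoff(["the", "cat", "sat", "down"], start=1, length=2)
penalty = js_divergence([0.7, 0.3], [0.6, 0.4])  # small when predictions agree
```

The divergence is symmetric and bounded by log 2, which makes it a convenient consistency penalty between predictions on the original input and on its restricted views.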

arXiv:2009.06899 [pdf]

Co-evolution of Functional Brain Network at Multiple Scales during Early Infancy

Authors: Xuyun Wen, Liming Hsu, Weili Lin, Han Zhang, Dinggang Shen

Abstract: The human brains are organized into hierarchically modular networks facilitating efficient and stable information processing and supporting diverse cognitive processes during the course of development. While the remarkable reconfiguration of functional brain network has been firmly established in early life, all these studies investigated the network development from a "single-scale" perspective, which ignores the richness engendered by its hierarchical nature. To fill this gap, this paper leveraged a longitudinal infant resting-state functional magnetic resonance imaging dataset from birth to 2 years of age, and proposed an advanced methodological framework to delineate the multi-scale reconfiguration of functional brain network during early development. Our proposed framework consists of two parts. The first part developed a novel two-step multi-scale module detection method that could uncover efficient and consistent modular structure for longitudinal dataset from multiple scales in a completely data-driven manner. The second part designed a systematic approach that employed the linear mixed-effect model to four global and nodal module-related metrics to delineate scale-specific age-related changes of network organization. By applying our proposed methodological framework on the collected longitudinal infant dataset, we provided the first evidence that, in the first 2 years of life, the brain functional network is co-evolved at different scales, where each scale displays the unique reconfiguration pattern in terms of modular organization.

Submitted 15 September, 2020; originally announced September 2020.

Comments: 10 pages, 4 figures

arXiv:2009.06805 [pdf, other]

doi 10.1038/s41535-022-00477-z

Dual topological superconducting states in the layered titanium-based oxypnictide superconductor BaTi$_2$Sb$_2$O

Authors: Z. Huang, W. L. Liu, H. Y. Wang, Y. L. Su, Z. T. Liu, X. B. Shi, S. Y. Gao, Z. C. Jiang, Z. H. Liu, J. S. Liu, X. L. Lu, Y. C. Yang, J. X. Zhang, S. C. Huan, W. Xia, J. H. Wang, Y. S. Wu, X. Wang, N. Yu, Y. B. Huang, S. Qiao, J. Li, W. W. Zhao, Y. F. Guo, G. Li , et al. (1 additional authors not shown)

Abstract: Topological superconductors have long been predicted to host Majorana zero modes which obey non-Abelian statistics and have potential for realizing non-decoherence topological quantum computation. However, material realization of topological superconductors is still a challenge in condensed matter physics. Utilizing high-resolution angle-resolved photoemission spectroscopy and first-principles calculations, we predict and then unveil the coexistence of topological Dirac semimetal and topological insulator states in the vicinity of Fermi energy ($E_F$) in the titanium-based oxypnictide superconductor BaTi$_2$Sb$_2$O. Further spin-resolved measurements confirm its spin-helical surface states around $E_F$, which are topologically protected and give an opportunity for realization of Majorana zero modes and Majorana flat bands in one material. Hosting dual topological superconducting states, the intrinsic superconductor BaTi$_2$Sb$_2$O is expected to be a promising platform for further investigation of topological superconductivity.

Submitted 14 September, 2020; originally announced September 2020.

Comments: 6 pages, 4 figures

Journal ref: npj Quantum Mater. 7, 70 (2022)

arXiv:2009.02797 [pdf, other]

Deep Modeling of Growth Trajectories for Longitudinal Prediction of Missing Infant Cortical Surfaces

Authors: Peirong Liu, Zhengwang Wu, Gang Li, Pew-Thian Yap, Dinggang Shen

Abstract: Charting cortical growth trajectories is of paramount importance for understanding brain development. However, such analysis necessitates the collection of longitudinal data, which can be challenging due to subject dropouts and failed scans. In this paper, we will introduce a method for longitudinal prediction of cortical surfaces using a spatial graph convolutional neural network (GCNN), which extends conventional CNNs from Euclidean to curved manifolds. The proposed method is designed to model the cortical growth trajectories and jointly predict inner and outer cortical surfaces at multiple time points. Adopting a binary flag in loss calculation to deal with missing data, we fully utilize all available cortical surfaces for training our deep learning model, without requiring a complete collection of longitudinal data. Predicting the surfaces directly allows cortical attributes such as cortical thickness, curvature, and convexity to be computed for subsequent analysis. We will demonstrate with experimental results that our method is capable of capturing the nonlinearity of spatiotemporal cortical growth patterns and can predict cortical surfaces with improved accuracy.

Submitted 11 September, 2020; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: Accepted as oral presentation at IPMI 2019

arXiv:2009.00039 [pdf]

doi 10.1103/PhysRevB.104.075134

Coexistence of Ferromagnetism and Topology by Charge Carrier Engineering in intrinsic magnetic topological insulator MnBi4Te7

Authors: Bo Chen, Fucong Fei, Dinghui Wang, Zhicheng Jiang, Bo Zhang, Jingwen Guo, Hangkai Xie, Yong Zhang, Muhammad Naveed, Yu Du, Zhe Sun, Haijun Zhang, Dawei Shen, Fengqi Song

Abstract: Intrinsic magnetic topological insulators (MTIs) MnBi2Te4 and MnBi2Te4/(Bi2Te3)n are expected to realize the high-temperature quantum anomalous Hall effect (QAHE) and dissipationless electrical transport. Extensive efforts have been made in this field, but there is still a lack of an ideal MTI candidate with a ferromagnetic (FM) ground state. Here, we demonstrate an MTI sample of Mn(Bi0.7Sb0.3)4Te7 in which an FM ground state and topological non-triviality coexist. The dramatic modulation of the magnetism is induced by a charge carrier engineering process by way of Sb substitution in the MnBi4Te7 matrix with AFM ordering. The evolution of magnetism in Mn(Bi1-xSbx)4Te7 is systematically investigated by magnetic measurements and theoretical calculations. The clear topological surface states of the FM sample of x = 0.3 are also verified by angle-resolved photoemission spectra. We are also aware that the FM sample of x = 0.3 is close to the charge neutral point. Therefore, the demonstration of the intrinsic FM-MTI Mn(Bi0.7Sb0.3)4Te7 in this work sheds light on further studies of QAHE realization and optimization.

Submitted 31 August, 2020; originally announced September 2020.

Journal ref: Phys. Rev. B 104, 075134 (2021)

arXiv:2007.16103 [pdf, other]

Learning-based Computer-aided Prescription Model for Parkinson's Disease: A Data-driven Perspective

Authors: Yinghuan Shi, Wanqi Yang, Kim-Han Thung, Hao Wang, Yang Gao, Yang Pan, Li Zhang, Dinggang Shen

Abstract: In this paper, we study a novel problem: "automatic prescription recommendation for PD patients." To realize this goal, we first build a dataset by collecting 1) symptoms of PD patients, and 2) their prescription drugs provided by neurologists. Then, we build a novel computer-aided prescription model by learning the relation between observed symptoms and prescription drugs. Finally, for new patients, we can recommend (predict) suitable prescription drugs based on their observed symptoms using our prescription model. On the methodology side, our proposed model, namely Prescription viA Learning lAtent Symptoms (PALAS), could recommend prescription using the multi-modality representation of the data. In PALAS, a latent symptom space is learned to better model the relationship between symptoms and prescription drugs, as there is a large semantic gap between them. Moreover, we present an efficient alternating optimization method for PALAS. We evaluated our method using the data collected from 136 PD patients at Nanjing Brain Hospital, which can be regarded as a large dataset in the PD research community. The experimental results demonstrate the effectiveness and clinical potential of our method in this recommendation task, compared with other competing methods.

Submitted 31 July, 2020; originally announced July 2020.

Comments: IEEE JBHI 2020

arXiv:2007.14642 [pdf, ps, other]

The moduli space of the tropicalizations of Riemann surfaces

Authors: Dali Shen

Abstract: In this paper we study the moduli space of the tropicalizations of Riemann surfaces. We first tropicalize a smooth pointed Riemann surface by a graph defined by its (hyperbolic) pair of pants decomposition. Then we can construct the moduli space of tropicalizations based on a fixed regular tropicalization, and compactify it by adding strata parametrizing weighted contractions. We show that this compact moduli space is also Hausdorff. In the end, we compare this moduli space with the moduli space of Riemann surfaces, establishing a partial order-preserving correspondence between the stratifications of these two moduli spaces.

Submitted 29 July, 2020; originally announced July 2020.

Comments: 20 pages. Comments welcome

arXiv:2007.05093 [pdf]

Extremum Power Seeking Control of A Hybrid Wind-Solar-Storage DC Power System

Authors: Dan Shen, Afshin Izadian

Abstract: This paper presents a combined power system with a common dc bus that contains solar power, wind power, battery storage, and a constant dc load (CDL). In the wind system, an AC-DC uncontrolled rectifier is used at the first stage and the DC-DC converter is controlled by a maximum power point tracker (MPPT) at the second stage. In the solar system, two cascaded boost converters are controlled through a sliding mode controller (SMC) to regulate the power flow to the load. A supervisory control strategy is also introduced to maximize the simultaneous energy harvesting from both renewable sources and balance the energy between the sources, battery, and the load. According to the level of power generation available at each renewable energy source, the state of charge in the battery, and the load requirement, the controller results in four contingencies. Simulation results show the accurate operation of the supervisory controller and functionality of the maximum power point tracking algorithm for solar and for wind power.

Submitted 15 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: Conference: 41st Annual Conference of the IEEE Industrial Electronics Society, 2015 at Yokohama

arXiv:2007.02096 [pdf]

doi 10.1109/TMI.2021.3055428

Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

Authors: Yue Sun, Kun Gao, Zhengwang Wu, Zhihao Lei, Ying Wei, Jun Ma, ** Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys , et al. (8 additional authors not shown)

Abstract: To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of the major limitations is that the learning-based methods may suffer from the multi-site issue, that is, the models trained on a dataset from one site may not be applicable to the datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, the iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers to the multi-site issue.

Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Medical Imaging, 40(5), 1363-1376, 2021

arXiv:2006.13417 [pdf]

Movie Box office Prediction via Joint Actor Representations and Social Media Sentiment

Authors: Dezhou Shen

Abstract: In recent years, driven by the Asian film industry, such as China and India, the global box office has maintained a steady growth trend. Previous studies have rarely used long-term, full-sample film data in analysis, lack of research on actors' social networks. Existing film box office prediction algorithms only use film meta-data, lack of using social network characteristics and the model is less interpretable. I propose a FC-GRU-CNN binary classification model in of box office prediction task, combining five characteristics, including the film meta-data, Sina Weibo text sentiment, actors' social network measurement, all pairs shortest path and actors' art contribution. Exploiting long-term memory ability of GRU layer in long sequences and the mapping ability of CNN layer in retrieving all pairs shortest path matrix features, proposed model is 14% higher in accuracy than the current best C-LSTM model.

Submitted 23 June, 2020; originally announced June 2020.

Comments: 9 pages, 3 figures, 4 tables

ACM Class: I.2.4; J.4

arXiv:2006.13412 [pdf]

Lower Bounds on Rate of Convergence of Matrix Products in All Pairs Shortest Path of Social Network

Authors: Dezhou Shen

Abstract: With the rapid development of social network applications, social network has become an important medium for people to interact. For the minimum distance computation of all pairs in networks, Alon N[4] proposed an algorithm with matrix multiplication, combining with distance product association law and block matrix multiplication, all pairs shortest path length algorithm on networks has time bound O((2n^3)/B logn). In practical applications, considering the scale-free characteristics of social networks and the precision limitations of floating-point operations on computer hardware, I found that the shortest path algorithm has an improved time bound O((14n^3)/B). Based on the above theory, I propose an all pairs shortest path algorithm that combines sparseness judgment and convergence judgment, leveraging the distance product algorithm with matrix multiplication, distance product association law, block matrix multiplication, scale-free characteristics of social networks, and limitation of floating-point operations on hardware. Testing on a social network dataset with 8508 actors, compared to Alon N algorithm, proposed algorithm has a performance improvement of 39% to 36.2 times on CPU and GPU.

Submitted 23 June, 2020; originally announced June 2020.

Comments: 9 pages, 1 figure, 4 tables

ACM Class: F.2.2
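
The distance-product idea mentioned in the abstract above can be illustrated with a small sketch. This is not the author's algorithm: it is a plain min-plus (distance) matrix product with repeated squaring, assuming non-negative edge weights, and the early exit when the matrix stops changing is only a simple stand-in for the "convergence judgment" the abstract refers to. The function names `min_plus_product` and `all_pairs_shortest_paths` are hypothetical.

```python
import math

INF = math.inf  # marks "no edge" / "unreachable"


def min_plus_product(a, b):
    """Distance product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def all_pairs_shortest_paths(weights):
    """Repeatedly square the weight matrix under the distance product.

    After t squarings the matrix holds shortest paths using at most
    2**t edges, so about log2(n) squarings suffice for n nodes.
    Assumes non-negative weights (illustrative assumption).
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)  # a node reaches itself at cost 0
    max_edges = 1
    while max_edges < n - 1:
        squared = min_plus_product(dist, dist)
        if squared == dist:  # nothing improved: stop early
            break
        dist = squared
        max_edges *= 2
    return dist


# Small directed example: 0 -> 1 -> 2 -> 3 plus a costly shortcut 0 -> 3.
example = [
    [0, 1, INF, 10],
    [INF, 0, 2, INF],
    [INF, INF, 0, 1],
    [INF, INF, INF, 0],
]
shortest = all_pairs_shortest_paths(example)
```

Each squaring costs O(n^3) in this naive form; the abstract's time bounds concern block matrix multiplication on hardware, which this sketch does not attempt to reproduce.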

arXiv:2006.08858 [pdf, other]

Generative Semantic Hashing Enhanced via Boltzmann Machines

Authors: Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen

Abstract: Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint. For the tractability of training, existing generative-hashing methods mostly assume a factorized form for the posterior distribution, enforcing independence among the bits of hash codes. From the perspectives of both model representation and code space size, independence is always not the best assumption. In this paper, to introduce correlations among the bits of hash codes, we propose to employ the distribution of Boltzmann machine as the variational posterior. To address the intractability issue of training, we first develop an approximate method to reparameterize the distribution of a Boltzmann machine by augmenting it as a hierarchical concatenation of a Gaussian-like distribution and a Bernoulli distribution. Based on that, an asymptotically-exact lower bound is further derived for the evidence lower bound (ELBO). With these novel techniques, the entire model can be optimized efficiently. Extensive experimental results demonstrate that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.

Submitted 15 June, 2020; originally announced June 2020.

arXiv:2006.07691 [pdf, other]

Synthetic Interventions

Authors: Anish Agarwal, Devavrat Shah, Dennis Shen

Abstract: Consider a setting with $N$ heterogeneous units (e.g., individuals, sub-populations) and $D$ interventions (e.g., socio-economic policies). Our goal is to learn the expected potential outcome associated with every intervention on every unit, totaling $N \times D$ causal parameters. Towards this, we present a causal framework, synthetic interventions (SI), to infer these $N \times D$ causal parameters while only observing each of the $N$ units under at most two interventions, independent of $D$. This can be significant as the number of interventions, i.e., level of personalization, grows. Under a novel tensor factor model across units, outcomes, and interventions, we prove an identification result for each of these $N \times D$ causal parameters, establish finite-sample consistency of our estimator along with asymptotic normality under additional conditions. Importantly, our estimator also allows for latent confounders that determine how interventions are assigned. The estimator is further furnished with data-driven tests to examine its suitability. Empirically, we validate our framework through a large-scale A/B test performed on an e-commerce platform. We believe our results could have implications for the design of data-efficient randomized experiments (e.g., randomized control trials) with heterogeneous units and multiple interventions.

Submitted 31 October, 2023; v1 submitted 13 June, 2020; originally announced June 2020.

arXiv:2006.00693 [pdf, other]

Improving Disentangled Text Representation Learning with Information-Theoretic Guidance

Authors: Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, Lawrence Carin

Abstract: Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g., the manipulation over the data space cannot be easily achieved). Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text, without any supervision on semantics. A new mutual information upper bound is derived and leveraged to measure dependence between style and content. By minimizing this upper bound, the proposed method induces style and content embeddings into two independent low-dimensional spaces. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation in terms of content and style preservation.

Submitted 12 January, 2022; v1 submitted 31 May, 2020; originally announced June 2020.

Comments: Accepted by the 58th Annual Meeting of the Association for Computational Linguistics (ACL2020)

arXiv:2005.10439 [pdf, other]

HF-UNet: Learning Hierarchically Inter-Task Relevance in Multi-Task U-Net for Accurate Prostate Segmentation

Authors: Kelei He, Chunfeng Lian, Bing Zhang, Xin Zhang, Xiaohuan Cao, Dong Nie, Yang Gao, Junfeng Zhang, Dinggang Shen

Abstract: Accurate segmentation of the prostate is a key step in external beam radiation therapy treatments. In this paper, we tackle the challenging task of prostate segmentation in CT images by a two-stage network with 1) the first stage to fast localize, and 2) the second stage to accurately segment the prostate. To precisely segment the prostate in the second stage, we formulate prostate segmentation into a multi-task learning framework, which includes a main task to segment the prostate, and an auxiliary task to delineate the prostate boundary. Here, the second task is applied to provide additional guidance of unclear prostate boundary in CT images. Besides, the conventional multi-task deep networks typically share most of the parameters (i.e., feature representations) across all tasks, which may limit their data fitting ability, as the specificities of different tasks are inevitably ignored. By contrast, we solve them by a hierarchically-fused U-Net structure, namely HF-UNet. The HF-UNet has two complementary branches for two tasks, with the novel proposed attention-based task consistency learning block to communicate at each level between the two decoding branches. Therefore, HF-UNet endows the ability to learn hierarchically the shared representations for different tasks, and preserve the specificities of learned representations for different tasks simultaneously. We did extensive evaluations of the proposed method on a large planning CT image dataset, including images acquired from 339 patients. The experimental results show HF-UNet outperforms the conventional multi-task network architectures and the state-of-the-art methods.

Submitted 23 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:2005.09230 [pdf, other]

An Auto-Context Deformable Registration Network for Infant Brain MRI

Authors: Dongming Wei, Sahar Ahmad, Yunzhi Huang, Lei Ma, Zhengwang Wu, Gang Li, Li Wang, Qian Wang, Pew-Thian Yap, Dinggang Shen

Abstract: Deformable image registration is fundamental to longitudinal and population analysis. Geometric alignment of the infant brain MR images is challenging, owing to rapid changes in image appearance in association with brain development. In this paper, we propose an infant-dedicated deep registration network that uses the auto-context strategy to gradually refine the deformation fields to obtain highly accurate correspondences. Instead of training multiple registration networks, our method estimates the deformation fields by invoking a single network multiple times for iterative deformation refinement. The final deformation field is obtained by the incremental composition of the deformation fields. Experimental results in comparison with state-of-the-art registration methods indicate that our method achieves higher accuracy while at the same time preserves the smoothness of the deformation fields. Our implementation is available online.

Submitted 5 July, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:2005.09212 [pdf, other]

A Self-ensembling Framework for Semi-supervised Knee Cartilage Defects Assessment with Dual-Consistency

Authors: Jiayu Huo, Li** Si, Xi Ouyang, Kai Xuan, Weiwu Yao, Zhong Xue, Qian Wang, Dinggang Shen, Lichi Zhang

Abstract: Knee osteoarthritis (OA) is one of the most common musculoskeletal disorders and requires early-stage diagnosis. Nowadays, the deep convolutional neural networks have achieved greatly in the computer-aided diagnosis field. However, the construction of the deep learning models usually requires great amounts of annotated data, which is generally high-cost. In this paper, we propose a novel approach for knee cartilage defects assessment, including severity classification and lesion localization. This can be treated as a subtask of knee OA diagnosis. Particularly, we design a self-ensembling framework, which is composed of a student network and a teacher network with the same structure. The student network learns from both labeled data and unlabeled data and the teacher network averages the student model weights through the training course. A novel attention loss function is developed to obtain accurate attention masks. With dual-consistency checking of the attention in the lesion classification and localization, the two networks can gradually optimize the attention distribution and improve the performance of each other, whereas the training relies on partially labeled data only and follows the semi-supervised manner. Experiments show that the proposed method can significantly improve the self-ensembling performance in both knee cartilage defects classification and localization, and also greatly reduce the needs of annotated data.

Submitted 12 October, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: accepted by International Workshop on PRedictive Intelligence In MEdicine, 2020
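
The abstract above says the teacher network "averages the student model weights through the training course". A common way to realize this in self-ensembling methods is an exponential moving average (EMA) of the student's parameters; whether this paper uses exactly that rule is an assumption here, as are the `decay` value and the hypothetical helper name `ema_update`. Weights are shown as plain floats in a dict to keep the sketch framework-free.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher[name] <- decay * teacher[name] + (1 - decay) * student[name]
    The teacher receives no gradient updates of its own in this scheme.
    """
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


# Toy usage: the student holds a fixed weight for three "training steps"
# while the teacher drifts toward it.
teacher_weights = {"w": 1.0}
student_weights = {"w": 0.0}
history = []
for _ in range(3):
    ema_update(teacher_weights, student_weights, decay=0.9)
    history.append(teacher_weights["w"])
```

A larger `decay` makes the teacher change more slowly, which smooths out step-to-step noise in the student at the cost of lagging behind it.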

arXiv:2005.07462 [pdf, other]

MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling

Authors: Kelei He, Chunfeng Lian, Ehsan Adeli, **g Huo, Yang Gao, Bing Zhang, Junfeng Zhang, Dinggang Shen

Abstract: Fully convolutional networks (FCNs), including UNet and VNet, are widely-used network architectures for semantic segmentation in recent studies. However, conventional FCN is typically trained by the cross-entropy or Dice loss, which only calculates the error between predictions and ground-truth labels for pixels individually. This often results in non-smooth neighborhoods in the predicted segmentation. To address this problem, we propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate by a multi-task UNet architecture. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. Therefore, the proposed network has a dual-branch architecture that tackles two tasks: 1) a segmentation sub-network aiming to generate the prostate segmentation, and 2) a voxel-metric learning sub-network aiming to improve the quality of the learned feature space supervised by a metric loss. Specifically, the voxel-metric learning sub-network samples tuples (including triplets and pairs) in voxel-level through the intermediate feature maps. Unlike conventional deep metric learning methods that generate triplets or pairs in image-level before the training phase, our proposed voxel-wise tuples are sampled in an online manner and operated in an end-to-end fashion via multi-task learning. To evaluate the proposed method, we implement extensive experiments on a real CT image dataset consisting of 339 patients. The ablation studies show that our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss. And the comparisons show that the proposed method outperforms the state-of-the-art methods by a reasonable margin.

Submitted 23 January, 2021; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:2005.04043 [pdf, ps, other]

Hypergraph Learning for Identification of COVID-19 with CT Imaging

Authors: Donglin Di, Feng Shi, Fuhua Yan, Liming Xia, Zhanhao Mo, Zhongxiang Ding, Fei Shan, Shengrui Li, Ying Wei, Ying Shao, Miaofei Han, Yaozong Gao, He Sui, Yue Gao, Dinggang Shen

Abstract: The coronavirus disease, named COVID-19, has become the largest global public health crisis since it started in early 2020. CT imaging has been used as a complementary tool to assist early screening, especially for the rapid identification of COVID-19 cases from community acquired pneumonia (CAP) cases. The main challenge in early screening is how to model the confusing cases in the COVID-19 and CAP groups, with very similar clinical manifestations and imaging features. To tackle this challenge, we propose an Uncertainty Vertex-weighted Hypergraph Learning (UVHL) method to identify COVID-19 from CAP using CT images. In particular, multiple types of features (including regional features and radiomics features) are first extracted from CT image for each case. Then, the relationship among different cases is formulated by a hypergraph structure, with each case represented as a vertex in the hypergraph. The uncertainty of each vertex is further computed with an uncertainty score measurement and used as a weight in the hypergraph. Finally, a learning process of the vertex-weighted hypergraph is used to predict whether a new testing case belongs to COVID-19 or not. Experiments on a large multi-center pneumonia dataset, consisting of 2,148 COVID-19 cases and 1,182 CAP cases from five hospitals, are conducted to evaluate the performance of the proposed method. Results demonstrate the effectiveness and robustness of our proposed method on the identification of COVID-19 in comparison to state-of-the-art methods.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2005.03832 [pdf, other]

Synergistic Learning of Lung Lobe Segmentation and Hierarchical Multi-Instance Classification for Automated Severity Assessment of COVID-19 in CT Images

Authors: Kelei He, Wei Zhao, Xingzhi Xie, Wen Ji, Mingxia Liu, Zhenyu Tang, Feng Shi, Yang Gao, Jun Liu, Junfeng Zhang, Dinggang Shen

Abstract: Understanding chest CT imaging of the coronavirus disease 2019 (COVID-19) will help detect infections early and assess the disease progression. Especially, automated severity assessment of COVID-19 in CT images plays an essential role in identifying cases that are in great need of intensive clinical care. However, it is often challenging to accurately assess the severity of this disease in CT images, due to variable infection regions in the lungs, similar imaging biomarkers, and large inter-case variations. To this end, we propose a synergistic learning framework for automated severity assessment of COVID-19 in 3D CT images, by jointly performing lung lobe segmentation and multi-instance classification. Considering that only a few infection regions in a CT image are related to the severity assessment, we first represent each input image by a bag that contains a set of 2D image patches (with each cropped from a specific slice). A multi-task multi-instance deep network (called M$^2$UNet) is then developed to assess the severity of COVID-19 patients and also segment the lung lobe simultaneously. Our M$^2$UNet consists of a patch-level encoder, a segmentation sub-network for lung lobe segmentation, and a classification sub-network for severity assessment (with a unique hierarchical multi-instance learning strategy). Here, the context information provided by segmentation can be implicitly employed to improve the performance of severity assessment. Extensive experiments were performed on a real COVID-19 CT image dataset consisting of 666 chest CT images, with results suggesting the effectiveness of our proposed method compared to several state-of-the-art methods.

Submitted 24 May, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

arXiv:2005.03405 [pdf, ps, other]

doi 10.1016/j.media.2020.101824

Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan

Authors: Xiaofeng Zhu, Bin Song, Feng Shi, Yanbo Chen, Rongyao Hu, Jiangzhang Gan, Wenhai Zhang, Man Li, Liye Wang, Yaozong Gao, Fei Shan, Dinggang Shen

Abstract: With the rapidly worldwide spread of Coronavirus disease (COVID-19), it is of great importance to conduct early diagnosis of COVID-19 and predict the time that patients might convert to the severe stage, for designing effective treatment plan and reducing the clinicians' workloads. In this study, we propose a joint classification and regression method to determine whether the patient would develop severe symptoms in the later time, and if yes, predict the possible conversion time that the patient would spend to convert to the severe stage. To do this, the proposed method takes into account 1) the weight for each sample to reduce the outliers' influence and explore the problem of imbalance classification, and 2) the weight for each feature via a sparsity regularization term to remove the redundant features of high-dimensional data and learn the shared information across the classification task and the regression task. To our knowledge, this study is the first work to predict the disease progression and the conversion time, which could help clinicians to deal with the potential severe cases in time or even save the patients' lives. Experimental analysis was conducted on a real data set from two hospitals with 422 chest computed tomography (CT) scans, where 52 cases were converted to severe on average 5.64 days and 34 cases were severe at admission. Results show that our method achieves the best classification (e.g., 85.91% of accuracy) and regression (e.g., 0.462 of the correlation coefficient) performance, compared to all comparison methods. Moreover, our proposed method yields 76.97% of accuracy for predicting the severe cases, 0.524 of the correlation coefficient, and 0.55 days difference for the converted time.

Submitted 7 May, 2020; originally announced May 2020.

Journal ref: Medical Image Analysis (2020)

arXiv:2005.03264 [pdf, other]

Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT

Authors: Liang Sun, Zhanhao Mo, Fuhua Yan, Liming Xia, Fei Shan, Zhongxiang Ding, Wei Shao, Feng Shi, Huan Yuan, Huiting Jiang, Dijia Wu, Ying Wei, Yaozong Gao, Wanchun Gao, He Sui, Daoqiang Zhang, Dinggang Shen

Abstract: Chest computed tomography (CT) becomes an effective tool to assist the diagnosis of coronavirus disease-19 (COVID-19). Due to the outbreak of COVID-19 worldwide, using the computed-aided diagnosis technique for COVID-19 classification based on CT images could largely alleviate the burden of clinicians. In this paper, we propose an Adaptive Feature Selection guided Deep Forest (AFS-DF) for COVID-19 classification based on chest CT images. Specifically, we first extract location-specific features from CT images. Then, in order to capture the high-level representation of these features with the relatively small-scale data, we leverage a deep forest model to learn high-level representation of the features. Moreover, we propose a feature selection method based on the trained deep forest model to reduce the redundancy of features, where the feature selection could be adaptively incorporated with the COVID-19 classification model. We evaluated our proposed AFS-DF on COVID-19 dataset with 1495 patients of COVID-19 and 1027 patients of community acquired pneumonia (CAP). The accuracy (ACC), sensitivity (SEN), specificity (SPE) and AUC achieved by our method are 91.79%, 93.05%, 89.95% and 96.35%, respectively. Experimental results on the COVID-19 dataset suggest that the proposed AFS-DF achieves superior performance in COVID-19 vs. CAP classification, compared with 4 widely used machine learning methods.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2005.03227 [pdf, other]

doi 10.1109/TMI.2020.2992546

Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning

Authors: Hengyuan Kang, Liming Xia, Fuhua Yan, Zhibin Wan, Feng Shi, Huan Yuan, Huiting Jiang, Dijia Wu, He Sui, Changqing Zhang, Dinggang Shen

Abstract: Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world. Due to the large number of affected patients and heavy labor for doctors, computer-aided diagnosis with machine learning algorithm is urgently needed, and could largely reduce the efforts of clinicians and accelerate the diagnosis process. Chest computed tomography (CT) has been recognized as an informative tool for diagnosis of the disease. In this study, we propose to conduct the diagnosis of COVID-19 with a series of features extracted from CT images. To fully explore multiple features describing CT images from different views, a unified latent representation is learned which can completely encode information from different aspects of features and is endowed with promising class structure for separability. Specifically, the completeness is guaranteed with a group of backward neural networks (each for one type of features), while by using class labels the representation is enforced to be compact within COVID-19/community-acquired pneumonia (CAP) and also a large margin is guaranteed between different types of pneumonia. In this way, our model can well avoid overfitting compared to the case of directly projecting high-dimensional features into classes. Extensive experimental results show that the proposed method outperforms all comparison methods, and rather stable performances are observed when varying the numbers of training data.

Submitted 6 May, 2020; originally announced May 2020.

Journal ref: IEEE Transactions on Medical Imaging (2020)

arXiv:2005.02690 [pdf, other]

Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired Pneumonia

Authors: Xi Ouyang, Jiayu Huo, Liming Xia, Fei Shan, Jun Liu, Zhanhao Mo, Fuhua Yan, Zhongxiang Ding, Qi Yang, Bin Song, Feng Shi, Huan Yuan, Ying Wei, Xiaohuan Cao, Yaozong Gao, Dijia Wu, Qian Wang, Dinggang Shen

Abstract: The coronavirus disease (COVID-19) is rapidly spreading all over the world, and has infected more than 1,436,000 people in more than 200 countries and territories as of April 9, 2020. Detecting COVID-19 at early stage is essential to deliver proper healthcare to the patients and also to protect the uninfected population. To this end, we develop a dual-sampling attention network to automatically diagnose COVID-19 from the community acquired pneumonia (CAP) in chest computed tomography (CT). In particular, we propose a novel online attention module with a 3D convolutional network (CNN) to focus on the infection regions in lungs when making decisions of diagnoses. Note that there exists imbalanced distribution of the sizes of the infection regions between COVID-19 and CAP, partially due to fast progress of COVID-19 after symptom onset. Therefore, we develop a dual-sampling strategy to mitigate the imbalanced learning. Our method is evaluated (to our best knowledge) upon the largest multi-center CT data for COVID-19 from 8 hospitals. In the training-validation stage, we collect 2186 CT scans from 1588 patients for a 5-fold cross-validation. In the testing stage, we employ another independent large-scale testing dataset including 2796 CT scans from 2057 patients. Results show that our algorithm can identify the COVID-19 images with the area under the receiver operating characteristic curve (AUC) value of 0.944, accuracy of 87.5%, sensitivity of 86.9%, specificity of 90.1%, and F1-score of 82.0%. With this performance, the proposed algorithm could potentially aid radiologists with COVID-19 diagnosis from CAP, especially in the early stage of the COVID-19 outbreak.

Submitted 19 May, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: accepted by IEEE Transactions on Medical Imaging, 2020

arXiv:2005.01279 [pdf, other]

Improving Adversarial Text Generation by Modeling the Distant Future

Authors: Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen, Lawrence Carin

Abstract: Auto-regressive text generation models usually focus on local fluency, and may cause inconsistent semantic meaning in long text generation. Further, automatically generating words with similar semantics is challenging, and hand-crafted linguistic rules are difficult to apply. We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues. Specifically, we propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments demonstrate that the proposed method leads to improved performance.

Submitted 4 May, 2020; originally announced May 2020.

Comments: ACL 2020. arXiv admin note: substantial text overlap with arXiv:1811.00696

Showing 251–300 of 472 results for author: Shen, D