Search | arXiv e-print repository

Novel three-dimensional Fermi surface and electron-correlation-induced charge density wave in FeGe

Authors: Lin Wu, Yating Hu, Di Wang, Xiangang Wan

Abstract: As the first magnetic kagome material to exhibit charge density wave (CDW) order, FeGe has attracted much attention in recent studies. Similar to AV$_{3}$Sb$_{5}$ (A = K, Cs, Rb), FeGe exhibits a CDW pattern with an in-plane 2$\times$2 structure and van Hove singularities (vHSs) near the Fermi level. However, in sharp contrast to AV$_{3}$Sb$_{5}$, which has a phonon instability at the $M$ point, all the theoretically calculated phonon frequencies in FeGe remain positive. Here, we perform a comprehensive study of the band structures, Fermi surfaces and nesting function of FeGe through first-principles calculations. Surprisingly, we find that the maximum of the nesting function is at the $K$ point instead of the $M$ point. Two Fermi pockets with Fe-$d_{xz}$ and Fe-$d_{x^{2}-y^{2}}$/$d_{xy}$ orbital characters make a large contribution to the Fermi nesting and evolve significantly with $k_{z}$, indicating the highly three-dimensional (3D) character of FeGe in contrast to AV$_{3}$Sb$_{5}$. Meanwhile, the vHSs are close to the Fermi surface only in a small $k_{z}$ range and do not play a leading role in the nesting function. Considering the effect of local Coulomb interaction, we reveal that the Fermi level eigenstates nested by vector $K$ are mainly distributed from unequal sublattice occupancy, so the instability at the $K$ point is significantly suppressed. Meanwhile, the wave functions nested by vector $M$ have many ingredients located at the same Fe site, so the instability at the $M$ point is enhanced. This indicates that electron correlation, rather than electron-phonon interaction, plays a key role in the CDW transition at the $M$ point.

Submitted 13 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Journal ref: Chinese Physics Letters 40, 117103 (2023)

arXiv:2301.13538 [pdf, other]

AMD: Adaptive Masked Distillation for Object Detection

Authors: Guang Yang, Yin Tang, Jun Li, Jianhua Xu, Xili Wan

Abstract: As a general model compression paradigm, feature-based knowledge distillation allows the student model to learn expressive features from the teacher counterpart. In this paper, we mainly focus on designing an effective feature-distillation framework and propose a spatial-channel adaptive masked distillation (AMD) network for object detection. More specifically, in order to accurately reconstruct important feature regions, we first perform attention-guided feature masking on the feature map of the student network, such that we can identify the important features via spatially adaptive feature masking instead of random masking in the previous methods. In addition, we employ a simple and efficient module to allow the student network channel to be adaptive, improving its model capability in object perception and detection. In contrast to the previous methods, more crucial object-aware features can be reconstructed and learned from the proposed network, which is conducive to accurate object detection. The empirical experiments demonstrate the superiority of our method: with the help of our proposed distillation method, the student networks report 41.3%, 42.4%, and 42.7% mAP scores when RetinaNet, Cascade Mask-RCNN and RepPoints are respectively used as the teacher framework for object detection, which outperforms the previous state-of-the-art distillation methods including FGD and MGD.

Submitted 10 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.12132 [pdf, other]

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

Authors: Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen

Abstract: Large pretrained language models are widely used in downstream NLP tasks via task-specific fine-tuning, but such procedures can be costly. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods have achieved strong task performance while updating far fewer parameters than full model fine-tuning (FFT). However, it is non-trivial to make informed design choices on the PEFT configurations, such as their architecture, the number of tunable parameters, and even the layers in which the PEFT modules are inserted. Consequently, it is highly likely that the current, manually designed configurations are suboptimal in terms of their performance-efficiency trade-off. Inspired by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection: we first design an expressive configuration search space with multiple representative PEFT modules as building blocks. Using multi-objective Bayesian optimisation in a low-cost setup, we then discover a Pareto-optimal set of configurations with strong performance-cost trade-offs across different numbers of parameters that are also highly transferable across different tasks. Empirically, on GLUE and SuperGLUE tasks, we show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par with or better than FFT without incurring substantial training efficiency costs.

Submitted 29 January, 2024; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: Accepted to TACL; pre-MIT Press publication version

arXiv:2301.06277 [pdf, ps, other]

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

Authors: Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou

Abstract: As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional speaker cue extracted from that speaker. Its main challenge lies in how to properly extract and leverage the speaker cue to benefit the extracted speech quality. The cue extraction method adopted in the majority of existing TSE studies is to directly utilize a discriminative speaker embedding, which is extracted from pre-trained models for speaker verification. Although high speaker discriminability is a most desirable property for the speaker verification task, we argue that it may be too sophisticated for TSE. In this study, we propose that a simplified speaker cue with clear class separability might be preferred for TSE. To verify our proposal, we introduce several forms of speaker cues, including naive speaker embeddings (such as x-vector and xi-vector) and new speaker embeddings produced from a sparse LDA transform. Corresponding TSE models are built by integrating these speaker cues with SepFormer (one SOTA speech separation model). The performance of these TSE models is examined on the benchmark WSJ0-2mix dataset. Experimental results validate the effectiveness and generalizability of our proposal, showing up to 9.9% relative improvement in SI-SDRi. Moreover, with SI-SDRi of 19.4 dB and PESQ of 3.78, our best TSE system significantly outperforms the current SOTA systems and offers the top TSE results reported to date on WSJ0-2mix.

Submitted 16 January, 2023; originally announced January 2023.

Comments: ACCEPTED by NCMMSC 2022

arXiv:2301.04904 [pdf, other]

Lesion-aware Dynamic Kernel for Polyp Segmentation

Authors: Ruifei Zhang, Peiwen Lai, Xiang Wan, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li

Abstract: Automatic and accurate polyp segmentation plays an essential role in early colorectal cancer diagnosis. However, it has always been a challenging task due to 1) the diverse shape, size, brightness and other appearance characteristics of polyps, 2) the tiny contrast between concealed polyps and their surrounding regions. To address these problems, we propose a lesion-aware dynamic network (LDNet) for polyp segmentation, which is a traditional u-shape encoder-decoder structure incorporated with a dynamic kernel generation and updating scheme. Specifically, the designed segmentation head is conditioned on the global context features of the input image and iteratively updated by the extracted lesion features according to polyp segmentation predictions. This simple but effective scheme endows our model with powerful segmentation performance and generalization capability. Besides, we utilize the extracted lesion representation to enhance the feature contrast between the polyp and background regions by a tailored lesion-aware cross-attention module (LCA), and design an efficient self-attention module (ESA) to capture long-range context relations, further improving the segmentation accuracy. Extensive experiments on four public polyp benchmarks and our collected large-scale polyp dataset demonstrate the superior performance of our method compared with other state-of-the-art approaches. The source code is available at https://github.com/ReaFly/LDNet.

Submitted 12 January, 2023; originally announced January 2023.

Comments: Accepted by MICCAI2022

arXiv:2301.03950 [pdf, other]

doi 10.1017/fms.2023.125

Positivity of Schur forms for strongly decomposably positive vector bundles

Authors: Xueyuan Wan

Abstract: In this paper, we define two types of strongly decomposable positivity, which serve as generalizations of (dual) Nakano positivity and are stronger than the decomposable positivity introduced by S. Finski. We provide the criteria for strongly decomposable positivity of type I and type II and prove that the Schur forms of a strongly decomposable positive vector bundle of type I are weakly positive, while the Schur forms of a strongly decomposable positive vector bundle of type II are positive. These answer a question of Griffiths affirmatively for strongly decomposably positive vector bundles. Consequently, we present an algebraic proof of the positivity of Schur forms for (dual) Nakano positive vector bundles, which was initially proven by S. Finski.

Submitted 7 December, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: 31 pages, 1 figure, final version, to appear in Forum of Mathematics, Sigma

arXiv:2212.13963 [pdf, other]

First-principles study of spin orbit coupling contribution to anisotropic magnetic interaction

Authors: Di Wang, Xiangyan Bo, Feng Tang, Xiangang Wan

Abstract: Anisotropic magnetic exchange interactions lead to a surprisingly rich variety of magnetic properties. Treating the spin-orbit coupling (SOC) as a perturbation, we extract the general expression of a bilinear spin Hamiltonian, including the isotropic exchange interaction, the antisymmetric Dzyaloshinskii-Moriya (DM) interaction and the symmetric $Γ$ term. Though it is commonly believed that the magnitudes of the DM and $Γ$ interactions correspond to the first and second order of the SOC strength $λ$ respectively, we clarify that the term proportional to $λ^{2}$ also contributes to the DM interaction. By combining the magnetic force theorem and the linear-response approach, we present a method of calculating anisotropic magnetic interactions, which has now been implemented in the open source software WienJ. Furthermore, we introduce another method which can calculate the first and second order SOC contributions to the DM interaction separately and overcomes some shortcomings of previous methods. Our methods are successfully applied to several typical weak ferromagnets among $3d$, $4d$ and $5d$ transition metal oxides. We also predict the conditions under which the DM interactions proportional to $λ$ are forbidden by symmetry while the DM interactions proportional to $λ^{2}$ are nonzero, and believe that this is widespread in certain magnetic materials.

Submitted 28 December, 2022; originally announced December 2022.

Journal ref: Phys. Rev. B 108, 085140 (2023)

arXiv:2212.10305 [pdf, other]

doi 10.1109/TMI.2022.3221666

Which Pixel to Annotate: a Label-Efficient Nuclei Segmentation Framework

Authors: Wei Lou, Haofeng Li, Guanbin Li, Xiaoguang Han, Xiang Wan

Abstract: Recently, deep neural networks, which require a large amount of annotated samples, have been widely applied in nuclei instance segmentation of H&E stained pathology images. However, it is inefficient and unnecessary to label all pixels for a dataset of nuclei images which usually contain similar and redundant patterns. Although unsupervised and semi-supervised learning methods have been studied for nuclei segmentation, very few works have delved into the selective labeling of samples to reduce the workload of annotation. Thus, in this paper, we propose a novel full nuclei segmentation framework that chooses only a few image patches to be annotated, augments the training set from the selected samples, and achieves nuclei segmentation in a semi-supervised manner. In the proposed framework, we first develop a novel consistency-based patch selection method to determine which image patches are the most beneficial to the training. Then we introduce a conditional single-image GAN with a component-wise discriminator to synthesize more training samples. Lastly, our proposed framework trains an existing segmentation model with the above augmented samples. The experimental results show that our proposed method can reach the same level of performance as a fully-supervised baseline by annotating less than 5% of pixels on some benchmarks.

Submitted 20 December, 2022; originally announced December 2022.

Comments: IEEE TMI 2022, Released code: https://github.com/lhaof/NuSeg

ACM Class: I.4.6

arXiv:2212.10171 [pdf, other]

Document-level Relation Extraction with Relation Correlations

Authors: Ridong Han, Tao Peng, Benyou Wang, Lu Liu, Xiang Wan

Abstract: Document-level relation extraction faces two overlooked challenges: the long-tail problem and the multi-label problem. Previous work focuses mainly on obtaining better contextual representations for entity pairs and hardly addresses the above challenges. In this paper, we analyze the co-occurrence correlation of relations and introduce it into the DocRE task for the first time. We argue that the correlations can not only transfer knowledge between data-rich relations and data-scarce ones to assist in the training of tailed relations, but also reflect semantic distance, guiding the classifier to identify semantically close relations for multi-label entity pairs. Specifically, we use relation embedding as a medium, and propose two co-occurrence prediction sub-tasks from both coarse- and fine-grained perspectives to capture relation correlations. Finally, the learned correlation-aware embeddings are used to guide the extraction of relational facts. Substantial experiments on two popular DocRE datasets are conducted, and our method achieves superior results compared to baselines. Insightful analysis also demonstrates the potential of relation correlations to address the above challenges.

Submitted 20 December, 2022; originally announced December 2022.

Comments: 13 pages

arXiv:2212.07019 [pdf]

Data-Driven Prediction and Evaluation on Future Impact of Energy Transition Policies in Smart Regions

Authors: Chunmeng Yang, Siqi Bu, Yi Fan, Wayne Xinwei Wan, Ruoheng Wang, Aoife Foley

Abstract: To meet widely recognised carbon neutrality targets, over the last decade metropolitan regions around the world have implemented policies to promote the generation and use of sustainable energy. Nevertheless, there is an availability gap in formulating and evaluating these policies in a timely manner, since sustainable energy capacity and generation are dynamically determined by various factors along dimensions based on local economic prosperity and societal green ambitions. We develop a novel data-driven platform to predict and evaluate energy transition policies by applying an artificial neural network and a technology diffusion model. Using Singapore, London, and California as case studies of metropolitan regions at distinctive stages of energy transition, we show that in addition to forecasting renewable energy generation and capacity, the platform is particularly powerful in formulating future policy scenarios. We recommend global application of the proposed methodology to future sustainable energy transition in smart regions.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.03609 [pdf, ps, other]

Fractional Path Integrals and its degeneration to Dimensional Regularization

Authors: Zheng-Wei Cheng, You-Kai Wang, Xia Wan

Abstract: In this work we study particles propagating in a fractional path and use fractional derivatives to extend the dynamic dimension of Quantum Field Theory. We construct the Lagrangians of fractional scalar, vector and spinor fields to obtain their propagators by path integral. Then we compute the typical tree-level and one-loop diagrams which correspond to QED cases. The calculations show the dimension dependence of the amplitudes. Additionally, in the one-loop calculation we obtain results which are consistent with dimensional regularization as the dimension approaches the Standard Model value. Therefore, the fractional path integrals can be regarded as an equivalent theoretical representation for regularizing the divergence in the normal Quantum Field Theory. We also derive the equations of motion for scalar, vector and spinor particles propagating in fractal paths and discuss the corresponding gauge symmetry, where we find a special non-local gauge transformation.

Submitted 14 September, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: 17 pages

arXiv:2212.02040 [pdf]

Discovery of a metallic oxide with ultralow thermal conductivity

Authors: Jianhong Dai, Zhehong Liu, Jialin Ji, Xuejuan Dong, Jihai Yu, Xubin Ye, Weipeng Wang, RiCheng Yu, Zhiwei Hu, Huaizhou Zhao, Xiangang Wan, Wenqing Zhang, Youwen Long

Abstract: A compound with metallic electrical conductivity usually has a considerable total thermal conductivity because both electrons and phonons contribute to thermal transport. Here, we show an exceptional example of iridium oxide, Bi3Ir3O11, that concurrently displays metallic electrical conductivity and ultralow thermal conductivity approaching 0.61 W m-1 K-1 at 300 K. The compound crystallizes into a cubic structural framework with space group Pn-3. The edge- and corner-sharing IrO6 octahedra with a mixed Ir4.33+ charge state favor metallic electrical transport. Bi3Ir3O11 exhibits an extremely low lattice thermal conductivity close to the minimum limit in theory owing to its tunnel-like structure with filled heavy atoms Bi rattling inside. Theoretical calculations reveal the underlying mechanisms for the extraordinary compatibility between metallic electrical conductivity and ultralow thermal conductivity. This study may establish a new avenue for designing and developing unprecedented heat-insulation metals.

Submitted 5 December, 2022; originally announced December 2022.

Comments: 20 pages, 4 figures

arXiv:2211.15545 [pdf, other]

doi 10.1103/PhysRevB.108.035138

Magnetic interactions and possible structural distortion in kagome FeGe from first-principles study and symmetry analysis

Authors: Hanjing Zhou, Songsong Yan, Dongze Fan, Di Wang, Xiangang Wan

Abstract: Based on density functional theory and symmetry analysis, we present a comprehensive investigation of the electronic structure, magnetic properties and possible structural distortion of the magnetic kagome metal FeGe. We estimate the magnetic parameters including Heisenberg and Dzyaloshinskii-Moriya (DM) interactions, and find that the ferromagnetic nearest-neighbor $J_{1}$ dominates over the others, while the magnetic interactions between nearest kagome layers favor antiferromagnetic coupling. The Néel temperature $T_{N}$ and Curie-Weiss temperature $θ_{CW}$ are successfully reproduced, and the calculated magnetic anisotropy energy is also consistent with experiment. However, these reasonable Heisenberg interactions and magnetic anisotropy cannot explain the double cone magnetic transition, whereas the DM interactions, which exist even in centrosymmetric materials, can result in this small magnetic cone angle. Unfortunately, due to the crystal symmetry of the high-temperature structure, the net contribution of DM interactions to the double cone magnetic structure is absent. Based on the experimental $2\times 2\times 2$ supercell, we thus explore the subgroups of the parent phase. Group theoretical analysis reveals that there are 68 different distortions, and only four of them (space group $P622$ or $P6_{3}22$), which lack inversion and mirror symmetry, can explain the low-temperature magnetic structure. Furthermore, we suggest that these four proposed CDW phases can be identified by using Raman spectroscopy. Since DM interactions are very sensitive to small atomic displacements and symmetry restrictions, we believe that symmetry analysis is an effective method to reveal the interplay of delicate structural distortions and complex magnetic configurations.

Submitted 28 November, 2022; originally announced November 2022.

Journal ref: Phys. Rev. B 108, 035138 (2023)

arXiv:2211.11618 [pdf, other]

doi 10.1103/PhysRevLett.130.216401

Topological exact flat bands in two dimensional materials under periodic strain

Authors: Xiaohan Wan, Siddhartha Sarkar, Shi-Zeng Lin, Kai Sun

Abstract: We study flat bands and their topology in 2D materials with quadratic band crossing points (QBCPs) under periodic strain. In contrast to Dirac points in graphene, where strain acts as a vector potential, strain for QBCPs serves as a director potential with angular momentum $\ell=2$. We prove that when the strengths of the strain fields hit certain "magic" values, exact flat bands with $C=\pm 1$ emerge at the charge neutrality point in the chiral limit, in strong analogy to magic angle twisted bilayer graphene. These flat bands have ideal quantum geometry for the realization of fractional Chern insulators, and they are always fragile topological. The number of flat bands can be doubled for certain point groups, and the interacting Hamiltonian is exactly solvable at integer fillings. We further demonstrate the stability of these flat bands against deviations from the chiral limit, and discuss possible realization in 2D materials.

Submitted 26 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Journal ref: Phys. Rev. Lett. 130, 216401 (2023)

arXiv:2211.10992 [pdf, other]

How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation

Authors: Jie Ruan, Yue Wu, Xiaojun Wan, Yuesheng Zhu

Abstract: Sarcasm generation has been investigated in previous studies by considering it as a text-to-text generation problem, i.e., generating a sarcastic sentence for an input sentence. In this paper, we study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image. CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities. In addition, there should be some inconsistency between the two modalities, which requires imagination. Moreover, high-quality training data is insufficient. To address these problems, we take a step toward generating sarcastic descriptions from images without paired training data and propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation. Specifically, EGRM first extracts diverse information from an image at different levels and uses the obtained image tags, sentimental descriptive caption, and commonsense-based consequence to generate candidate sarcastic texts. Then, a comprehensive ranking algorithm, which considers image-text relation, sarcasticness, and grammaticality, is proposed to select a final text from the candidate texts. Human evaluation on five criteria over a total of 1200 generated image-text pairs from eight systems and auxiliary automatic evaluation show the superiority of our method.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.08909 [pdf]

Continuous Electrical Manipulation of Magnetic Anisotropy and Spin Flopping in van der Waals Ferromagnetic Devices

Authors: Ming Tang, Junwei Huang, Feng Qin, Kun Zhai, Toshiya Ideue, Zeya Li, Fanhao Meng, Anmin Nie, Linglu Wu, Xiangyu Bi, Caorong Zhang, Ling Zhou, Peng Chen, Caiyu Qiu, Peizhe Tang, Haijun Zhang, Xiangang Wan, Lin Wang, Zhongyuan Liu, Yongjun Tian, Yoshihiro Iwasa, Hongtao Yuan

Abstract: Controlling the magnetic anisotropy of ferromagnetic materials plays a key role in magnetic switching devices and spintronic applications. Examples of spin-orbit torque devices with different magnetic anisotropy geometries (in-plane or out-of-plane directions) have been demonstrated with novel magnetization switching mechanisms for extended device functionalities. Normally, the intrinsic magnetic anisotropy in ferromagnetic materials is unchanged within a fixed direction, and thus it is difficult to realize multifunctional devices. Therefore, continuous modulation of magnetic anisotropy in ferromagnetic materials is highly desired but remains challenging. Here, we demonstrate a gate-tunable magnetic anisotropy transition from out-of-plane to canted and finally to in-plane in layered Fe$_5$GeTe$_2$ by combining the measurements of the angle-dependent anomalous Hall effect and magneto-optical Kerr effect with quantitative Stoner-Wohlfarth analysis. The magnetic easy axis continuously rotates in a spin-flop pathway by gating or temperature modulation. Such observations offer a new avenue for exploring magnetization switching mechanisms and realizing new spintronic functionalities.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 4 figures

arXiv:2211.08584 [pdf, other]

doi 10.18653/v1/2023.acl-short.41

Toward expanding the scope of radiology report summarization to multiple anatomies and modalities

Authors: Zhihong Chen, Maya Varma, Xiang Wan, Curtis Langlotz, Jean-Benoit Delbrouck

Abstract: Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations. First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparisons across different systems and solutions. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose a dataset (MIMIC-RRS) involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate their clinical efficacy via RadGraph, a factual correctness metric.

Submitted 21 July, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023

arXiv:2211.07843 [pdf, other]

Error-Robust Retrieval for Chinese Spelling Check

Authors: Xunjian Yin, Xinyu Hu, Jin Jiang, Xiaojun Wan

Abstract: Chinese Spelling Check (CSC) aims to detect and correct error tokens in Chinese contexts, which has a wide range of applications. However, it is confronted with the challenges of insufficient annotated data and the issue that previous methods may actually not fully leverage the existing datasets. In this paper, we introduce our plug-and-play retrieval method with error-robust information for Chinese Spelling Check (RERIC), which can be directly applied to existing CSC models. The datastore for retrieval is built completely based on the training data, with elaborate designs according to the characteristics of CSC. Specifically, we employ multimodal representations that fuse phonetic, morphologic, and contextual information in the calculation of query and key during retrieval to enhance robustness against potential errors. Furthermore, in order to better judge the retrieved candidates, the n-gram surrounding the token to be checked is regarded as the value and utilized for specific reranking. The experiment results on the SIGHAN benchmarks demonstrate that our proposed method achieves substantial improvements over existing work.

Submitted 25 February, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: 11 pages, 3 figures

arXiv:2211.01543 [pdf, other]

Custodial Symmetry Violation in Scalar Extensions of the Standard Model

Authors: Huayang Song, Xia Wan, Jiang-Hao Yu

Abstract: The new measurement of the W boson mass from the CDF collaboration shows a significant tension with the Standard Model prediction, which evidences violation of custodial symmetry in the scalar sector. We study the scalar extensions of the Standard Model, which can be categorized into two classes, scalar sector with custodial symmetry (Georgi-Machacek model and its generalizations) and scalar sector without custodial symmetry, and explore how these extensions fit to the electroweak precision data and the CDF new $m_W$. The favored oblique parameters are coming from either the large mass splitting in the multiplet via the loop contribution or the large vacuum expectation value which breaks custodial symmetry at the tree level. In particular, we find that $\mathcal{O}(100)$ GeV new particles are allowed in the scalar extension scenarios.

Submitted 2 November, 2022; originally announced November 2022.

Comments: 24 pages, 4 figures

arXiv:2210.16098 [pdf, other]

Predicting Protein-Ligand Binding Affinity with Equivariant Line Graph Network

Authors: Yiqiang Yi, Xu Wan, Kangfei Zhao, Le Ou-Yang, Peilin Zhao

Abstract: Binding affinity prediction of three-dimensional (3D) protein ligand complexes is critical for drug repositioning and virtual drug screening. Existing approaches transform a 3D protein-ligand complex to a two-dimensional (2D) graph, and then use graph neural networks (GNNs) to predict its binding affinity. However, the node and edge features of the 2D graph are extracted based on invariant local coordinate systems of the 3D complex. As a result, the method cannot fully learn the global information of the complex, such as the physical symmetry and the topological information of bonds. To address these issues, we propose a novel Equivariant Line Graph Network (ELGN) for affinity prediction of 3D protein ligand complexes. The proposed ELGN first adds a super node to the 3D complex, and then builds a line graph based on the 3D complex. After that, ELGN uses a new E(3)-equivariant network layer to pass the messages between nodes and edges based on the global coordinate system of the 3D complex. Experimental results on two real datasets demonstrate the effectiveness of ELGN over several state-of-the-art baselines.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.14402 [pdf, other]

Adaptive deep density approximation for fractional Fokker-Planck equations

Authors: Li Zeng, Xiaoliang Wan, Tao Zhou

Abstract: In this work, we propose adaptive deep learning approaches based on normalizing flows for solving fractional Fokker-Planck equations (FPEs). The solution of a FPE is a probability density function (PDF). Traditional mesh-based methods are ineffective because of the unbounded computation domain, a large number of dimensions and the nonlocal fractional operator. To this end, we represent the solution with an explicit PDF model induced by a flow-based deep generative model, simplified KRnet, which constructs a transport map from a simple distribution to the target distribution. We consider two methods to approximate the fractional Laplacian. One method is the Monte Carlo approximation. The other method is to construct an auxiliary model with Gaussian radial basis functions (GRBFs) to approximate the solution such that we may take advantage of the fact that the fractional Laplacian of a Gaussian is known analytically. Based on these two different ways for the approximation of the fractional Laplacian, we propose two models, MCNF and GRBFNF, to approximate stationary FPEs and MCTNF to approximate time-dependent FPEs. To further improve the accuracy, we refine the training set and the approximate solution alternately. A variety of numerical examples is presented to demonstrate the effectiveness of our adaptive deep density approaches.

Submitted 25 October, 2022; originally announced October 2022.

Comments: 25 pages, 22 figures

arXiv:2210.12956 [pdf, other]

doi 10.1088/0256-307X/40/9/097301

Gate-tunable Lifshitz transition of Fermi arcs and its nonlocal transport signatures

Authors: Yue Zheng, Wei Chen, Xiangang Wan, D. Y. Xing

Abstract: One hallmark of the Weyl semimetal is the emergence of Fermi arcs (FAs) in the surface Brillouin zone that connect the projected Weyl nodes of opposite chirality. The unclosed FAs can give rise to various exotic effects that have attracted tremendous research interest. The configurations of the FAs are usually thought to be determined fully by the band topology of the bulk states, which seems impossible to manipulate. Here, we show that the FAs can be simply modified by a surface gate voltage. Because the penetration length of the surface states depends on the in-plane momentum, a surface gate voltage induces an effective energy dispersion. As a result, a continuous deformation of the surface band can be implemented by tuning the surface gate voltage. In particular, as the saddle point of the surface band meets the Fermi energy, the topological Lifshitz transition takes place for the FAs, during which the Weyl nodes switch their partners connected by the FAs. Accordingly, the magnetic Weyl orbits composed of the FAs on opposite surfaces and chiral Landau bands inside the bulk change their configurations. We show that such an effect can be probed by the nonlocal transport measurements in a magnetic field, in which the switching on and off of the nonlocal conductance by the surface gate voltage signals the Lifshitz transition. Our work opens a new route for manipulating the FAs by surface gates and exploring novel transport phenomena associated with the topological Lifshitz transition.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 9 pages, 5 figures

Journal ref: Chinese Phys. Lett. 40 097301 (2023)

arXiv:2210.10312 [pdf, ps, other]

doi 10.1103/PhysRevB.106.144503

Competing ferromagnetic superconducting states in europium-based iron pnictides

Authors: Huai-Xiang Huang, Yu-Qian Cao, Xin Wan

Abstract: In europium-based iron pnictides superconducting Fe-planes can be influenced by a Zeeman field originated from the neighboring Eu-planes. The field tends to induce spin-density waves with a ferromagnetic average which coexists with the superconducting order by forming complementary patterns of the superconducting and magnetic order parameters in a Fulde-Ferrell-Larkin-Ovchinnikov phase and a two-dimensional textured-superconducting phase. The hard gap around the Fermi energy disappears in these fragile inhomogeneous superconducting states, which features, instead, V-shaped spin-resolved local density of states. The inhomogeneous states are also competing with either a homogeneous superconducting or a homogeneous ferromagnetic state, manifesting the intertwining influences of the magnetic orders in Fe and Eu planes, the spin-density wave band structure, and the superconducting pairing order.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 7 pages, 11 figures

Journal ref: PRB 106, 144503 (2022)

arXiv:2210.10199 [pdf, other]

Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization

Authors: Samuel Daulton, Xingchen Wan, David Eriksson, Maximilian Balandat, Michael A. Osborne, Eytan Bakshy

Abstract: Optimizing expensive-to-evaluate black-box functions of discrete (and potentially continuous) design parameters is a ubiquitous problem in scientific and engineering applications. Bayesian optimization (BO) is a popular, sample-efficient method that leverages a probabilistic surrogate model and an acquisition function (AF) to select promising designs to evaluate. However, maximizing the AF over mixed or high-cardinality discrete search spaces is challenging because standard gradient-based methods cannot be used directly and evaluating the AF at every point in the search space would be computationally prohibitive. To address this issue, we propose using probabilistic reparameterization (PR). Instead of directly optimizing the AF over the search space containing discrete parameters, we maximize the expectation of the AF over a probability distribution defined by continuous parameters. We prove that under suitable reparameterizations, the BO policy that maximizes the probabilistic objective is the same as that which maximizes the AF, and therefore, PR enjoys the same regret bounds as the original BO policy using the underlying AF. Moreover, our approach provably converges to a stationary point of the probabilistic objective under gradient ascent using scalable, unbiased estimators of both the probabilistic objective and its gradient. Therefore, as the number of starting points and gradient steps increase, our approach will recover a maximizer of the AF (an often-neglected requisite for commonly used BO regret bounds). We validate our approach empirically and demonstrate state-of-the-art optimization performance on a wide range of real-world applications. PR is complementary to (and benefits) recent work and naturally generalizes to settings with multiple objectives and black-box constraints.

Submitted 18 October, 2022; originally announced October 2022.

Comments: To appear in Advances in Neural Information Processing Systems 35, 2022. Code available at: https://github.com/facebookresearch/bo_pr

arXiv:2210.08859 [pdf, other]

Social Biases in Automatic Evaluation Metrics for NLG

Authors: Mingqi Gao, Xiaojun Wan

Abstract: Many studies have revealed that word embeddings, language models, and models for specific downstream tasks in NLP are prone to social biases, especially gender bias. Recently these techniques have been gradually applied to automatic evaluation metrics for text generation. In the paper, we propose an evaluation method based on Word Embeddings Association Test (WEAT) and Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics and discover that social biases are also widely present in some model-based automatic evaluation metrics. Moreover, we construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image caption and text summarization tasks. Results show that given gender-neutral references in the evaluation, model-based evaluation metrics may show a preference for the male hypothesis, and their performance, i.e. the correlation between evaluation metrics and human judgments, usually has more significant variation after gender swapping.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.08303 [pdf, other]

Improving Radiology Summarization with Radiograph and Anatomy Prompts

Authors: Jinpeng Hu, Zhihong Chen, Yang Liu, Xiang Wan, Tsung-Hui Chang

Abstract: The impression is crucial for the referring physicians to grasp key information since it is concluded from the findings and reasoning of radiologists. To alleviate the workload of radiologists and reduce repetitive human labor in impression writing, many researchers have focused on automatic impression generation. However, recent works on this task mainly summarize the corresponding findings and pay less attention to the radiology images. In clinical practice, radiographs can provide more detailed valuable observations to enhance radiologists' impression writing, especially for complicated cases. Besides, each sentence in findings usually focuses on a single anatomy, so they only need to be matched to corresponding anatomical regions instead of the whole image, which is beneficial for textual and visual features alignment. Therefore, we propose a novel anatomy-enhanced multimodal model to promote impression generation. In detail, we first construct a set of rules to extract anatomies and put these prompts into each sentence to highlight anatomy characteristics. Then, two separate encoders are applied to extract features from the radiograph and findings. Afterward, we utilize a contrastive learning module to align these two representations at the overall level and use a co-attention to fuse them at the sentence level with the help of anatomy-enhanced sentence representation. Finally, the decoder takes the fused information as the input to generate impressions. The experimental results on two benchmark datasets confirm the effectiveness of the proposed method, which achieves state-of-the-art results.

Submitted 27 December, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: 11 pages, ACL2023 Findings

arXiv:2210.02954 [pdf, other]

All hourglass bosonic excitations in the 1651 magnetic space groups and 528 magnetic layer groups

Authors: Dongze Fan, Xiangang Wan, Feng Tang

Abstract: The band connectivity as imposed by the compatibility relations between the irreducible representations of little groups can give rise to the exotic hourglass-like shape composed of four branches of bands and five band crossings (BCs). Such an hourglass band connectivity could enforce the emergence of nontrivial excitations like Weyl fermion, Dirac fermion or even beyond them. On the other hand, the bosons, like phonons, magnons, and photons, were also shown to possess nontrivial topology and a comprehensive symmetry classification of the hourglass bosonic excitations would be of great significance to both materials design and device applications. Here we firstly list all concrete positions and representations of little groups in the Brillouin zone (BZ) related with the hourglass bosonic excitations in all the 1651 magnetic space groups and 528 magnetic layer groups, applicable to three dimensional (3D) and two dimensional (2D) systems, respectively. 255 (42) MSGs (MLGs) are found to essentially host such hourglass BCs: Here ``essentially'' means that the bosonic hourglass BC exists definitely as long as the studied system is crystallized in the corresponding MSG/MLG. We also perform first-principles calculations on hundreds of 3D nonmagnetic materials essentially hosting hourglass phonons and propose that the 2D material AlI can host hourglass phonons. We choose AuX (X=Br and I) as illustrative examples to demonstrate that two essential hourglass band structures can coexist in the phonon spectra for both materials while for AuBr, an accidental band crossing sticking two hourglasses is found interestingly. Our results of symmetry conditions for hourglass bosonic excitations can provide a useful guide of designing artificial structures with hourglass bosonic excitations.

Submitted 6 October, 2022; originally announced October 2022.

Comments: Supplementary Material can be found in the ancillary file

arXiv:2209.13761 [pdf, other]

Image Compressed Sensing with Multi-scale Dilated Convolutional Neural Network

Authors: Zhifeng Wang, Zhenghui Wang, Chunyan Zeng, Yan Yu, Xiangkui Wan

Abstract: Deep Learning (DL) based Compressed Sensing (CS) has been applied for better performance of image reconstruction than traditional CS methods. However, most existing DL methods utilize the block-by-block measurement and each measurement block is restored separately, which introduces harmful blocking effects for reconstruction. Furthermore, the neuronal receptive fields of those methods are designed to be the same size in each layer, which can only collect single-scale spatial information and has a negative impact on the reconstruction process. This paper proposes a novel framework named Multi-scale Dilated Convolution Neural Network (MsDCNN) for CS measurement and reconstruction. During the measurement period, we directly obtain all measurements from a trained measurement network, which employs fully convolutional structures and is jointly trained with the reconstruction network from the input image. It needn't be cut into blocks, which effectively avoids the block effect. During the reconstruction period, we propose the Multi-scale Feature Extraction (MFE) architecture to imitate the human visual system to capture multi-scale features from the same feature map, which enhances the image feature extraction ability of the framework and improves the performance of image reconstruction. In the MFE, there are multiple parallel convolution channels to obtain multi-scale feature information. Then the multi-scale features information is fused and the original image is reconstructed with high quality. Our experimental results show that the proposed method performs favorably against the state-of-the-art methods in terms of PSNR and SSIM.

Submitted 27 September, 2022; originally announced September 2022.

Comments: 28 pages, 8 figures, MsDCNN for CS

arXiv:2209.11906 [pdf, other]

Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

Authors: Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou

Abstract: Overlapped speech detection (OSD) is critical for speech applications in scenarios of multi-party conversation. Despite numerous research efforts and progress, compared with speech activity detection (VAD), OSD remains an open challenge and its overall performance is far from satisfactory. The majority of prior research typically formulates the OSD problem as a standard classification problem, to identify speech with binary (OSD) or three-class label (joint VAD and OSD) at frame level. In contrast to the mainstream, this study investigates the joint VAD and OSD task from a new perspective. In particular, we propose to extend traditional classification network with multi-exit architecture. Such an architecture empowers our system with unique capability to identify class using either low-level features from early exits or high-level features from last exit. In addition, two training schemes, knowledge distillation and dense connection, are adopted to further boost our system performance. Experimental results on benchmark datasets (AMI and DIHARD-III) validated the effectiveness and generality of our proposed system. Our ablations further reveal the complementary contribution of proposed schemes. With $F_1$ score of 0.792 on AMI and 0.625 on DIHARD-III, our proposed system not only outperforms several top performing models on these datasets, but also surpasses the current state-of-the-art by large margins across both datasets. Besides the performance benefit, our proposed system offers another appealing potential for quality-complexity trade-offs, which is highly preferred for efficient OSD deployment.

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.11905 [pdf, other]

Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations

Authors: Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou

Abstract: To address the monaural speech enhancement problem, numerous research studies have been conducted to enhance speech via operations either in time-domain on the inner-domain learned from the speech mixture or in time-frequency domain on the fixed full-band short time Fourier transform (STFT) spectrograms. Very recently, a few studies on sub-band based speech enhancement have been proposed. By enhancing speech via operations on sub-band spectrograms, those studies demonstrated competitive performances on the benchmark dataset of DNS2020. Although attractive, this new research direction has not been fully explored and there is still room for improvement. As such, in this study, we delve into the latest research direction and propose a sub-band based speech enhancement system with perceptually-motivated optimization and dual transformations, called PT-FSE. Specifically, our proposed PT-FSE model improves its backbone, a full-band and sub-band fusion model, by three efforts. First, we design a frequency transformation module that aims to strengthen the global frequency correlation. Then a temporal transformation is introduced to capture long range temporal contexts. Lastly, a novel loss, with leverage of properties of human auditory perception, is proposed to facilitate the model to focus on low frequency enhancement. To validate the effectiveness of our proposed model, extensive experiments are conducted on the DNS2020 dataset. Experimental results show that our PT-FSE system not only achieves substantial improvements over its backbone, but also outperforms the current state-of-the-art while being 27% smaller than the SOTA. With average NB-PESQ of 3.57 on the benchmark dataset, our system offers the best speech enhancement results reported to date.

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.09694 [pdf]

Modulating Thermal Conductivity via Targeted Phonon Excitation

Authors: Xiao Wan, Dongkai Pan, Jing-Tao Lü, Sebastian Volz, Lifa Zhang, Qing Hao, Yangjun Qin, Zhicheng Zong, Nuo Yang

Abstract: Thermal conductivity is a critical material property in numerous applications, such as those related to thermoelectric devices and heat dissipation. Effectively modulating thermal conductivity has become a great concern in the field of heat conduction. In this study, a quantum strategy is proposed to modulate thermal conductivity by exciting targeted phonons. The results show that the thermal conductivity of graphene can be tailored in the range of 1559 W/m-K (49%) to 4093 W/m-K (128%), compared with the intrinsic value of 3189 W/m-K. A similar trend is also observed for graphene nanoribbons. The results are obtained through both ab initio calculations and molecular dynamics simulations. This brand-new quantum strategy to modulate thermal conductivity paves a way for quantum heat conduction.

Submitted 5 April, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.09657 [pdf, other]

doi 10.1109/ISBI52829.2022.9761542

View-Disentangled Transformer for Brain Lesion Detection

Authors: Haofeng Li, Junjia Huang, Guanbin Li, Zhou Liu, Yihong Zhong, Yingying Chen, Yunfei Wang, Xiang Wan

Abstract: Deep neural networks (DNNs) have been widely adopted in brain lesion detection and segmentation. However, locating small lesions in 2D MRI slices is challenging, and requires balancing the granularity of 3D context aggregation against the computational complexity. In this paper, we propose a novel view-disentangled transformer to enhance the extraction of MRI features for more accurate tumour detection. First, the proposed transformer harvests long-range correlation among different positions in a 3D brain scan. Second, the transformer models a stack of slice features as multiple 2D views and enhances these features view-by-view, which approximately achieves the 3D correlation computing in an efficient way. Third, we deploy the proposed transformer module in a transformer backbone, which can effectively detect the 2D regions surrounding brain lesions. The experimental results show that our proposed view-disentangled transformer performs well for brain lesion detection on a challenging brain MRI dataset.

Submitted 20 September, 2022; originally announced September 2022.

Comments: International Symposium on Biomedical Imaging (ISBI) 2022, code: https://github.com/lhaof/ISBI-VDFormer

arXiv:2209.08887 [pdf, other]

Attentive Symmetric Autoencoder for Brain MRI Segmentation

Authors: Junjia Huang, Haofeng Li, Guanbin Li, Xiang Wan

Abstract: Self-supervised learning methods based on image patch reconstruction have witnessed great success in training auto-encoders, whose pre-trained weights can be transferred to fine-tune other downstream tasks of image understanding. However, existing methods seldom study the various importance of reconstructed patches and the symmetry of anatomical structures, when they are applied to 3D medical images. In this paper we propose a novel Attentive Symmetric Auto-encoder (ASA) based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks. We conjecture that forcing the auto-encoder to recover informative image regions can harvest more discriminative representations, than to recover smooth image patches. Then we adopt a gradient based metric to estimate the importance of each image patch. In the pre-training stage, the proposed auto-encoder pays more attention to reconstruct the informative patches according to the gradient metrics. Moreover, we resort to the prior of brain structures and develop a Symmetric Position Encoding (SPE) method to better exploit the correlations between long-range but spatially symmetric regions to obtain effective features. Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models on three brain MRI segmentation benchmarks.

Submitted 19 September, 2022; originally announced September 2022.

Comments: MICCAI 2022, code: https://github.com/lhaof/ASA

arXiv:2209.07759 [pdf, other]

An Empirical Study of Automatic Post-Editing

Authors: Xu Zhang, Xiaojun Wan

Abstract: Automatic post-editing (APE) aims to reduce manual post-editing efforts by automatically correcting errors in machine-translated output. Due to the limited amount of human-annotated training data, data scarcity is one of the main challenges faced by all APE systems. To alleviate the lack of genuine training data, most of the current APE systems employ data augmentation methods to generate large-scale artificial corpora. In view of the importance of data augmentation in APE, we separately study the impact of the construction method of artificial corpora and artificial data domain on the performance of APE models. Moreover, the difficulty of APE varies between different machine translation (MT) systems. We study the outputs of the state-of-art APE model on a difficult APE dataset to analyze the problems in existing APE systems. Primarily, we find that 1) Artificial corpora with high-quality source text and machine-translated text more effectively improve the performance of APE models; 2) In-domain artificial training data can better improve the performance of APE models, while irrelevant out-of-domain data actually interfere with the model; 3) Existing APE model struggles with cases containing long source text or high-quality machine-translated text; 4) The state-of-art APE model works well on grammatical and semantic addition problems, but the output is prone to entity and semantic omission errors.

Submitted 16 September, 2022; originally announced September 2022.

Comments: 14 pages, 4 figures

arXiv:2209.07280 [pdf]

doi 10.1126/sciadv.abq4578

Observation of robust zero-energy state and enhanced superconducting gap in a tri-layer heterostructure of MnTe/Bi2Te3/Fe(Te, Se)

Authors: Shuyue Ding, Chen Chen, Zhipeng Cao, Di Wang, Yongqiang Pan, Ran Tao, Dongming Zhao, Yining Hu, Tianxing Jiang, Yajun Yan, Zhixiang Shi, Xiangang Wan, Donglai Feng, Tong Zhang

Abstract: The interface between magnetic material and superconductors has long been predicted to host unconventional superconductivity, such as spin-triplet pairing and topological nontrivial pairing state, particularly when spin-orbital coupling (SOC) is incorporated. To identify these novel pairing states, fabricating homogenous heterostructures which contain such various properties are preferred, but often challenging. Here we synthesized a tri-layer type van-der Waals heterostructure of MnTe/Bi2Te3/Fe(Te, Se), which combined s-wave superconductivity, thickness dependent magnetism and strong SOC. Via low-temperature scanning tunneling microscopy (STM), we observed robust zero-energy states with notably nontrivial properties and an enhanced superconducting gap size on single unit-cell (UC) MnTe surface. In contrast, no zero-energy state was observed on 2UC MnTe. First-principle calculations further suggest the 1UC MnTe has large interfacial Dzyaloshinskii-Moriya interaction (DMI) and a frustrated AFM state, which could promote non-collinear spin textures. It thus provides a promising platform for exploring topological nontrivial superconductivity.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 33 pages, supplementary materials included

Journal ref: Sci. Adv. 8, eabq4578 (2022)

arXiv:2209.07118 [pdf, other]

Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

Authors: Zhihong Chen, Guanbin Li, Xiang Wan

Abstract: Medical vision-and-language pre-training (Med-VLP) has received considerable attention owing to its applicability to extracting generic vision-and-language representations from medical images and texts. Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP. Although there exist knowledge-enhanced vision-and-language pre-training (VLP) methods in the general domain, most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers), which are unavailable in the medical domain. In this paper, we propose a systematic and effective approach to enhance Med-VLP by structured medical knowledge from three perspectives. First, considering knowledge can be regarded as the intermediate medium between vision and language, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text. Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks. To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark including three tasks. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on all downstream tasks. Further analyses explore the effects of different components of our approach and various settings of pre-training.

Submitted 15 September, 2022; originally announced September 2022.

Comments: Natural Language Processing. 10 pages, 3 figures

arXiv:2209.07098 [pdf, other]

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

Authors: Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang

Abstract: Medical vision-and-language pre-training provides a feasible solution to extract effective vision-and-language representations from medical images and texts. However, few studies have been dedicated to this field to facilitate medical vision-and-language understanding. In this paper, we propose a self-supervised learning paradigm with multi-modal masked autoencoders (M$^3$AE), which learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts. There are three key designs to make this simple approach work. First, considering the different information densities of vision and language, we adopt different masking ratios for the input image and text, where a considerably larger masking ratio is used for images. Second, we use visual and textual features from different layers to perform the reconstruction to deal with different levels of abstraction in visual and language. Third, we develop different designs for vision and language decoders (i.e., a Transformer for vision and a multi-layer perceptron for language). To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark including three tasks. Experimental results demonstrate the effectiveness of our approach, where state-of-the-art results are achieved on all downstream tasks. Besides, we conduct further analysis to better verify the effectiveness of different components of our approach and various settings of pre-training. The source code is available at~\url{https://github.com/zhjohnchan/M3AE}.

Submitted 15 September, 2022; originally announced September 2022.

Comments: Natural Language Processing. 11 pages, 3 figures

arXiv:2209.06209 [pdf, other]

doi 10.1145/3503161.3548166

Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Authors: Zijie Wang, Aichun Zhu, Jingyi Xue, Xili Wan, Chao Liu, Tian Wang, Yifeng Li

Abstract: The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches contrive to learning a latent common manifold mapping paradigm following a \textbf{cross-modal distribution consensus prediction (CDCP)} manner. When mapping features from distribution of one certain modality into the common manifold, feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus so as to embed and align the multi-modal features in a constructed cross-modal common manifold all depends on the experience of the model itself, instead of the actual situation. With such methods, it is inevitable that the multi-modal data can not be well aligned in the common manifold, which finally leads to a sub-optimal retrieval performance. To overcome this \textbf{CDCP dilemma}, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C$^{3}$M) for text-based person retrieval. The core idea of our method, just as a Chinese saying goes, is to `\textit{san si er hou xing}', namely, to \textbf{Look Before yoU Leap (LBUL)}. The common manifold mapping mechanism of LBUL contains a looking step and a leaping step. Compared to CDCP-based methods, LBUL considers distribution characteristics of both the visual and textual modalities before embedding data from one certain modality into C$^{3}$M to achieve a more solid cross-modal distribution consensus, and hence achieve a superior retrieval accuracy. We evaluate our proposed method on two text-based person retrieval datasets CUHK-PEDES and RSTPReid. Experimental results demonstrate that the proposed LBUL outperforms previous methods and achieves the state-of-the-art performance.

Submitted 13 September, 2022; originally announced September 2022.

Comments: Accepted on ACM MM '22. arXiv admin note: text overlap with arXiv:2209.05773

arXiv:2209.05773 [pdf, other]

doi 10.1145/3503161.3548057

CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Authors: Zijie Wang, Aichun Zhu, Jingyi Xue, Xili Wan, Chao Liu, Tian Wang, Yifeng Li

Abstract: Given a natural language description, text-based person retrieval aims to identify images of a target person from a large-scale person image database. Existing methods generally face a \textbf{color over-reliance problem}, which means that the models rely heavily on color information when matching cross-modal data. Indeed, color information is an important decision-making accordance for retrieval, but the over-reliance on color would distract the model from other key clues (e.g. texture information, structural information, etc.), and thereby lead to a sub-optimal retrieval performance. To solve this problem, in this paper, we propose to \textbf{C}apture \textbf{A}ll-round \textbf{I}nformation \textbf{B}eyond \textbf{C}olor (\textbf{CAIBC}) via a jointly optimized multi-branch architecture for text-based person retrieval. CAIBC contains three branches including an RGB branch, a grayscale (GRS) branch and a color (CLR) branch. Besides, with the aim of making full use of all-round information in a balanced and effective way, a mutual learning mechanism is employed to enable the three branches which attend to varied aspects of information to communicate with and learn from each other. Extensive experimental analysis is carried out to evaluate our proposed CAIBC method on the CUHK-PEDES and RSTPReid datasets in both \textbf{supervised} and \textbf{weakly supervised} text-based person retrieval settings, which demonstrates that CAIBC significantly outperforms existing methods and achieves the state-of-the-art performance on all the three tasks.

Submitted 13 September, 2022; originally announced September 2022.

Comments: Accepted on ACM MM '22

arXiv:2209.01370 [pdf, other]

CrossDial: An Entertaining Dialogue Dataset of Chinese Crosstalk

Authors: Baizhou Huang, Shikang Du, Xiaojun Wan

Abstract: Crosstalk is a traditional Chinese theatrical performance art. It is commonly performed by two performers in the form of a dialogue. With the typical features of dialogues, crosstalks are also designed to be hilarious for the purpose of amusing the audience. In this study, we introduce CrossDial, the first open-source dataset containing most classic Chinese crosstalks crawled from the Web. Moreover, we define two new tasks, provide two benchmarks, and investigate the ability of current dialogue generation models in the field of crosstalk generation. The experiment results and case studies demonstrate that crosstalk generation is challenging for straightforward methods and remains an interesting topic for future works.

Submitted 3 September, 2022; originally announced September 2022.

arXiv:2208.12753 [pdf, other]

Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

Authors: Chunyan Zeng, Shixiong Feng, Zhifeng Wang, Xiangkui Wan, Yunfan Chen, Nan Zhao

Abstract: The existing source cell-phone recognition method lacks the long-term feature characterization of the source device, resulting in inaccurate representation of the source cell-phone related features which leads to insufficient recognition accuracy. In this paper, we propose a source cell-phone recognition method based on spatio-temporal representation learning, which includes two main parts: extraction of sequential Gaussian mean matrix features and construction of a recognition model based on spatio-temporal representation learning. In the feature extraction part, based on the analysis of time-series representation of recording source signals, we extract sequential Gaussian mean matrix with long-term and short-term representation ability by using the sensitivity of Gaussian mixture model to data distribution. In the model construction part, we design a structured spatio-temporal representation learning network C3D-BiLSTM to fully characterize the spatio-temporal information, combine 3D convolutional network and bidirectional long short-term memory network for short-term spectral information and long-time fluctuation information representation learning, and achieve accurate recognition of cell-phones by fusing spatio-temporal feature information of recording source signals. The method achieves an average accuracy of 99.03% for the closed-set recognition of 45 cell-phones under the CCNU\_Mobile dataset, and 98.18% in small sample size experiments, with recognition performance better than the existing state-of-the-art methods. The experimental results show that the method exhibits excellent recognition performance in multi-class cell-phones recognition.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 29 pages, 4 figures

arXiv:2208.11920 [pdf]

Digital Audio Tampering Detection Based on ENF Spatio-temporal Features Representation Learning

Authors: Chunyan Zeng, Shuai Kong, Zhifeng Wang, Xiangkui Wan, Yunfan Chen

Abstract: Most digital audio tampering detection methods based on electrical network frequency (ENF) only utilize the static spatial information of ENF, ignoring the variation of ENF in time series, which limit the ability of ENF feature representation and reduce the accuracy of tampering detection. This paper proposes a new method for digital audio tampering detection based on ENF spatio-temporal features representation learning. A parallel spatio-temporal network model is constructed using CNN and BiLSTM, which deeply extracts ENF spatial feature information and ENF temporal feature information to enhance the feature representation capability to improve the tampering detection accuracy. In order to extract the spatial and temporal features of the ENF, this paper firstly uses digital audio high-precision Discrete Fourier Transform analysis to extract the phase sequences of the ENF. The unequal phase series is divided into frames by adaptive frame shifting to obtain feature matrices of the same size to represent the spatial features of the ENF. At the same time, the phase sequences are divided into frames based on ENF time changes information to represent the temporal features of the ENF. Then deep spatial and temporal features are further extracted using CNN and BiLSTM respectively, and an attention mechanism is used to adaptively assign weights to the deep spatial and temporal features to obtain spatio-temporal features with stronger representation capability. Finally, the deep neural network is used to determine whether the audio has been tampered with. The experimental results show that the proposed method improves the accuracy by 2.12%-7.12% compared with state-of-the-art methods under the public database Carioca, New Spanish.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 19 pages, 6 figures

arXiv:2208.06964 [pdf, ps, other]

Curvature of the total space of a Griffiths negative vector bundle and quasi-Fuchsian space

Authors: Inkang Kim, Xueyuan Wan, Genkai Zhang

Abstract: For a holomorphic vector bundle $E$ over a Hermitian manifold $M$ there are two important notions of curvature positivity, the Griffiths positivity and Nakano positivity. We study the consequence of these positivities and the relevant estimates. If $E$ is Griffiths negative over Kähler manifold, then there is a Kähler metric on its total space $E$, and we calculate the curvature and prove the non-positivity of the curvature along the tautological direction. The Nakano positivity can be formulated as a positivity for the Nakano curvature operator and we give estimate the Nakano curvature operator associated with a Nakano positive direct image bundle. As applications we construct a mapping class group invariant Kähler metric on the quasi-Fuchsian space QF$(S)$, which extends the Weil-Petersson metric on the Teichmüller space $\mathcal{T}(S)\subset {\rm QF}(S)$, and we obtain estimates for the Nakano curvature operator for the dual Weil-Petersson metric on the holomorphic cotangent bundle of Teichmüller space.

Submitted 14 August, 2022; originally announced August 2022.

Comments: 27 pages, extension of the previous paper arXiv:1902.04523 New Kähler metric on quasifuchsian space and its curvature properties

arXiv:2208.01433 [pdf]

Review of Energy Transition Policies in Singapore, London, and California

Authors: Chunmeng Yang, Siqi Bu, Yi Fan, Wayne Xinwei Wan, Ruoheng Wang, Aoife Foley

Abstract: The paper contains the online supplementary materials for "Data-Driven Prediction and Evaluation on Future Impact of Energy Transition Policies in Smart Regions". We review the renewable energy development and policies in the three metropolitan cities/regions over recent decades. Depending on the geographic variations in the types and quantities of renewable energy resources and the levels of policymakers' commitment to carbon neutrality, we classify Singapore, London, and California as case studies at the primary, intermediate, and advanced stages of the renewable energy transition, respectively.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2207.10447 [pdf, other]

doi 10.1007/978-3-031-20077-9_36

Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration

Authors: Haotian Bai, Ruimao Zhang, Jiong Wang, Xiang Wan

Abstract: Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications. Recent studies leverage the advantage of self-attention in visual Transformer for long-range dependency to re-active semantic regions, aiming to avoid partial activation in traditional class activation mapping (CAM). However, the long-range modeling in Transformer neglects the inherent spatial coherence of the object, and it usually diffuses the semantic-aware regions far from the object boundary, making localization results significantly larger or far smaller. To address such an issue, we introduce a simple yet effective Spatial Calibration Module (SCM) for accurate WSOL, incorporating semantic similarities of patch tokens and their spatial relationships into a unified diffusion model. Specifically, we introduce a learnable parameter to dynamically adjust the semantic correlations and spatial context intensities for effective information propagation. In practice, SCM is designed as an external module of Transformer, and can be removed during inference to reduce the computation cost. The object-sensitive localization ability is implicitly embedded into the Transformer encoder through optimization in the training phase. It enables the generated attention maps to capture the sharper object boundaries and filter the object-irrelevant background area. Extensive experimental results demonstrate the effectiveness of the proposed method, which significantly outperforms its counterpart TS-CAM on both CUB-200 and ImageNet-1K benchmarks. The code is available at https://github.com/164140757/SCM.

Submitted 10 March, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV2022

arXiv:2207.09405 [pdf, other]

Bayesian Generational Population-Based Training

Authors: Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael A. Osborne

Abstract: Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architectures may be optimal at different points of training. This motivates AutoRL, a class of methods seeking to automate these design choices. One prominent class of AutoRL methods is Population-Based Training (PBT), which have led to impressive performance in several large scale settings. In this paper, we introduce two new innovations in PBT-style methods. First, we employ trust-region based Bayesian Optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space. Second, we show that using a generational approach, we can also learn both architectures and hyperparameters jointly on-the-fly in a single training run. Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to large performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly. Code is available at https://github.com/xingchenwan/bgpbt.

Submitted 19 July, 2022; originally announced July 2022.

Comments: AutoML Conference 2022. 10 pages, 4 figures, 3 tables (28 pages, 10 figures, 7 tables including references and appendices)

arXiv:2207.06141 [pdf, other]

doi 10.1063/5.0121452

The mass of an asymptotically hyperbolic end and distance estimates

Authors: Xiaoxiang Chai, Xueyuan Wan

Abstract: Let $(M,g)$ be a complete connected $n$-dimensional Riemannian spin manifold without boundary such that the scalar curvature satisfies $R_g\geq -n(n-1)$ and $\mathcal{E}\subset M$ be an asymptotically hyperbolic end, we prove that the mass functional of the end $\mathcal{E}$ is timelike future-directed or zero. Moreover, it vanishes if and only if $(M,g)$ is isometric to the hyperbolic space. We also consider the mass of an asymptotically hyperbolic manifold with compact boundary, we prove the mass is timelike future-directed if the mean curvature of the boundary is bounded from below by a function defined using distance estimates. As an application, the mass is timelike future-directed if the mean curvature of the boundary is bounded from below by $-(n-1)$ or the scalar curvature satisfies $R_g\geq (-1+κ)n(n-1)$ for any positive constant $κ$ less than one.

Submitted 13 July, 2022; originally announced July 2022.

Comments: 24 pages, 3 figures

arXiv:2206.13778 [pdf, other]

CC-Riddle: A Question Answering Dataset of Chinese Character Riddles

Authors: Fan Xu, Yunxiang Zhang, Xiaojun Wan

Abstract: The Chinese character riddle is a unique form of cultural entertainment specific to the Chinese language. It typically comprises two parts: the riddle description and the solution. The solution to the riddle is a single character, while the riddle description primarily describes the glyph of the solution, occasionally supplemented with its explanation and pronunciation. Solving Chinese character riddles is a challenging task that demands understanding of character glyph, general knowledge, and a grasp of figurative language. In this paper, we construct a \textbf{C}hinese \textbf{C}haracter riddle dataset named CC-Riddle, which covers the majority of common simplified Chinese characters. The construction process is a combination of web crawling, language model generation and manual filtering. In generation stage, we input the Chinese phonetic alphabet, glyph and meaning of the solution character into the generation model, which then produces multiple riddle descriptions. The generated riddles are then manually filtered and the final CC-Riddle dataset is composed of both human-written riddles and these filtered, generated riddles. In order to assess the performance of language models on the task of solving character riddles, we use retrieval-based, generative and multiple-choice QA strategies to test three language models: BERT, ChatGPT and ChatGLM. The test results reveal that current language models still struggle to solve Chinese character riddles. CC-Riddle is publicly available at \url{https://github.com/pku0xff/CC-Riddle}.

Submitted 24 September, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

ACM Class: I.2.7

arXiv:2206.08023 [pdf, other]

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

Authors: Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, ** Luo

Abstract: Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate the limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.

Submitted 1 September, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.02624 [pdf, other]

Band width estimates of CMC initial data sets

Authors: Xiaoxiang Chai, Xueyuan Wan

Abstract: We generalize a band width estimate of Gromov to CMC initial data sets. We give three independent proofs: via the stability of a hypersurface with prescribed null expansion, via a perturbation of the spacetime harmonic function and via the Dirac operator.

Submitted 6 June, 2022; originally announced June 2022.

Comments: 20 pages, 1 figure

Showing 151–200 of 577 results for author: Wan, X