Search | arXiv e-print repository

HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSIs) into smaller patches (e.g., 256$\times$256, 512$\times$512, 1024$\times$1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to end-to-end learning with 1) a large (4K) resolution base patch for richer visual context and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently represent the rich information in the 4K input. To the best of our knowledge, HoloHisto is the first holistic approach for gigapixel-resolution WSI segmentation, supporting direct I/O of complete WSIs and their corresponding gigapixel masks. Under the HoloHisto platform, we introduce a random 4K sampler for ultra-high-resolution inputs, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively. To facilitate efficient 4K-resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid.
To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. The results show that HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models.
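Grouping image features into a discrete token grid can be pictured as vector quantization. The sketch below assumes a plain nearest-neighbour codebook lookup. HoloHisto's tokenizer is pre-trained and is not reproduced here. The names `nearest_code` and `tokenize_grid` and the toy codebook are hypothetical.

```python
import math


def nearest_code(vec, codebook):
    """Index of the codebook vector closest to `vec` in squared Euclidean distance."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def tokenize_grid(features, codebook):
    """Replace every feature vector in an H x W grid with its nearest token id."""
    return [[nearest_code(vec, codebook) for vec in row] for row in features]


# Toy 2-D features on a 2 x 2 grid and a hypothetical 3-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = [
    [(0.1, -0.1), (0.9, 1.2)],
    [(0.2, 0.8), (1.0, 0.9)],
]
tokens = tokenize_grid(features, codebook)
print(tokens)  # [[0, 1], [2, 1]]
```

The output keeps the spatial layout of the input grid. This is what lets a downstream sequence model treat the tokens as an ordered sequence.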

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02103

Rossby wave instability in weakly ionized protoplanetary disks. I. azimuthal or vertical B-fields

Authors: Can Cui, Ashutosh Tripathi, Cong Yu, Min-Kai Lin, Andrew Youdin

Abstract: Rossby wave instability (RWI) is considered the underlying mechanism of the crescent-shaped azimuthal asymmetries discovered in the (sub-)millimeter dust continuum of many protoplanetary disks. Previous linear-theory studies were conducted in the hydrodynamic limit. However, protoplanetary disks are likely magnetized and weakly ionized. We examine the influence of magnetic fields and non-ideal magnetohydrodynamic (MHD) effects (namely, Ohmic resistivity, Hall drift, and ambipolar diffusion) on the RWI unstable modes. We perform radially global linear analyses, employing constant azimuthal ($B_\phi$) or vertical ($B_z$) background magnetic fields. We find that, in the ideal MHD regime, magnetism can either enhance or diminish RWI growth. Strong non-ideal MHD effects cause RWI growth rates to recover the hydrodynamic results. The sign of the Hall Elsässer number subtly complicates the results, and vertical wavenumbers generically diminish growth rates.
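The strength of non-ideal MHD effects is conventionally measured by Elsässer numbers of the form $\Lambda = v_A^2/(\eta\Omega)$. Small values mean strong non-ideal effects, and the Hall number carries a sign. The sketch below assumes these standard definitions and code units with $\mu_0 = 1$. The paper's own normalization may differ, and the function names are illustrative.

```python
import math


def alfven_speed(b, rho, mu0=1.0):
    """Alfven speed v_A = B / sqrt(mu0 * rho); mu0 = 1 assumes code units."""
    return b / math.sqrt(mu0 * rho)


def elsasser_numbers(v_a, omega, eta_ohm, eta_hall, eta_ambi):
    """Return (Lambda_O, Lambda_H, Am), each defined as v_A^2 / (eta * Omega).

    Small values mean strong non-ideal effects. eta_hall may be negative,
    and its sign carries over to Lambda_H.
    """
    scale = v_a ** 2 / omega
    return scale / eta_ohm, scale / eta_hall, scale / eta_ambi


v_a = alfven_speed(b=0.1, rho=1.0)
print(elsasser_numbers(v_a, omega=1.0, eta_ohm=0.01, eta_hall=-0.001, eta_ambi=0.1))
# approximately (1.0, -10.0, 0.1)
```

In this framing, "strong non-ideal MHD effects" corresponds to Elsässer numbers well below unity.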

Submitted 2 July, 2024; originally announced July 2024.

Comments: 13 pages, 4 figures, submitted to MNRAS

arXiv:2407.00596

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Panoramic image segmentation in computational pathology presents a considerable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the HATs technique, which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function spanning regions, functional units, and cells; (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities; and (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, while eliminating the manual prompt generation required by the conventional Segment Anything Model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs.
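One way to picture an anatomical hierarchy held in a single matrix is a parent-to-child containment matrix. It can be paired with a check that flags a child structure predicted outside every allowed parent. This is a hypothetical illustration, not the HATs loss or its released implementation. The class list, edges and the `hierarchy_violations` counter are all assumptions.

```python
CLASSES = ["cortex", "medulla", "glomerulus", "tubule", "podocyte"]
# Hypothetical (parent, child) containment edges, for illustration only.
EDGES = [
    ("cortex", "glomerulus"),
    ("cortex", "tubule"),
    ("medulla", "tubule"),
    ("glomerulus", "podocyte"),
]


def containment_matrix(classes, edges):
    """M[i][j] = 1 if class j may lie inside class i, else 0."""
    index = {name: i for i, name in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for parent, child in edges:
        m[index[parent]][index[child]] = 1
    return m


def hierarchy_violations(m, classes, present):
    """Count classes predicted at a location while none of their allowed parents is."""
    count = 0
    for j, child in enumerate(classes):
        parents = [i for i in range(len(classes)) if m[i][j]]
        if child in present and parents and not any(classes[i] in present for i in parents):
            count += 1
    return count


M = containment_matrix(CLASSES, EDGES)
print(hierarchy_violations(M, CLASSES, {"glomerulus", "podocyte"}))            # 1
print(hierarchy_violations(M, CLASSES, {"cortex", "glomerulus", "podocyte"}))  # 0
```

A count like this could be turned into a penalty term. The matrix itself is the compact part: adding a class means adding one row and one column.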

Submitted 30 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.19286

arXiv:2406.17226

Extended alternating structure-adapted proximal gradient algorithm for nonconvex nonsmooth problems

Authors: Ying Gao, Chunfeng Cui, Wenxing Zhang, Deren Han

Abstract: The alternating structure-adapted proximal (ASAP) gradient algorithm (M. Nikolova and P. Tan, SIAM J Optim, 29:2053-2078, 2019) has drawn much attention due to its efficiency in solving nonconvex nonsmooth optimization problems. However, the multiblock nonseparable structure of many practical problems, e.g., coupled tensor decomposition, limits the applicability of ASAP. In this paper, we propose an extended ASAP (eASAP) algorithm for nonconvex nonsmooth optimization whose objective is the sum of two nonseparable functions and a coupling one. By exploiting blockwise restricted prox-regularity, eASAP is capable of minimizing objectives whose coupling function is multiblock nonseparable. Moreover, we analyze the global convergence of eASAP by virtue of the Aubin property of the partial subdifferential mapping and the Kurdyka-Łojasiewicz property of the objective. Furthermore, the sublinear convergence rate of eASAP is established within the proximal point algorithmic framework under some mild conditions. Numerical simulations on multimodal data fusion demonstrate the compelling performance of the proposed method.
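The alternating (Gauss-Seidel) pattern behind ASAP-type methods can be illustrated with a plain two-block proximal gradient loop. The problem here is assumed for illustration: two $\ell_1$ terms coupled by a least-squares term, solved with a fixed step size. This is not eASAP, which additionally handles multiblock nonseparable coupling through blockwise restricted prox-regularity.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def alternating_prox_grad(b, lam_x, lam_y, step=1.0, iters=100):
    """Minimize lam_x*||x||_1 + lam_y*||y||_1 + 0.5*||x + y - b||^2.

    Each sweep takes a gradient step on the smooth coupling term for one
    block, applies that block's prox, then moves to the next block using
    the freshly updated value (Gauss-Seidel order). step <= 1 is safe here
    because each block gradient of the coupling term is 1-Lipschitz.
    """
    x = [0.0] * len(b)
    y = [0.0] * len(b)
    for _ in range(iters):
        grad_x = [xi + yi - bi for xi, yi, bi in zip(x, y, b)]
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad_x)], step * lam_x)
        grad_y = [xi + yi - bi for xi, yi, bi in zip(x, y, b)]
        y = soft_threshold([yi - step * g for yi, g in zip(y, grad_y)], step * lam_y)
    return x, y


def objective(x, y, b, lam_x, lam_y):
    l1 = lam_x * sum(abs(v) for v in x) + lam_y * sum(abs(v) for v in y)
    return l1 + 0.5 * sum((xi + yi - bi) ** 2 for xi, yi, bi in zip(x, y, b))


x, y = alternating_prox_grad([3.0], lam_x=0.5, lam_y=0.5)
print(x, y, objective(x, y, [3.0], 0.5, 0.5))  # [2.5] [0.0] 1.375
```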

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16865

Variational Monte Carlo Study of the Doped $t$-$J$ Model on Honeycomb Lattice

Authors: Can Cui, **g-Yu Zhao, Zheng-Yu Weng

Abstract: The ground state of the bipartite $t$-$J$ model must satisfy a specific sign structure, based on which the single-hole and two-hole ground-state Ansätze on the honeycomb lattice are constructed and studied by a variational Monte Carlo (VMC) method. The VMC results are in good agreement with exact diagonalization (ED) calculations. For the single-hole case, the degenerate ground states are characterized by the quantum numbers of a spin-1/2 and an orbital angular momentum $L_z=\pm 2$. The latter is associated with the emergent chiral spin/hole currents mutually surrounding the hole/spin-1/2 as a composite object, or "twisted hole". A vanishing quasiparticle spectral weight is shown in the large-sample limit. In the two-hole ground state, the holes form a spin-singlet pairing with $d+id$ symmetry in the Cooper channel, but are of $s$-wave symmetry as a tightly bound pair of "twisted holes". Such a dichotomous pairing mechanism can be attributed to the elimination of local spin currents, which is unrelated to the long-range antiferromagnetic correlation. The superconducting ground state at finite doping is briefly discussed in terms of tightly bound hole pairs as the building blocks.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 14 pages, 11 figures

arXiv:2406.16168

An All-MLP Sequence Modeling Architecture That Excels at Copying

Authors: Chenwei Cui, Zehao Yan, Gedeon Muhawenayo, Hannah Kerner

Abstract: Recent work demonstrated Transformers' ability to efficiently copy strings of exponential size, distinguishing them from other architectures. We present the Causal Relation Network (CausalRN), an all-MLP sequence modeling architecture that can match Transformers on the copying task. Extending Relation Networks (RNs), we implemented key innovations to support autoregressive sequence modeling while maintaining computational feasibility. We discovered that exponentially activated RNs are reducible to linear time complexity, and that pre-activation normalization induces an infinitely growing memory pool, similar to a KV cache. In an ablation study, we found that both exponential activation and pre-activation normalization are indispensable for Transformer-level copying. Our findings provide new insights into what actually constitutes strong in-context retrieval.
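A simplified pairwise term shows how an exponential activation can remove the quadratic cost. If the pair score is additive before the exponential, the sum over past positions factorizes into a running prefix sum. This is a toy scalar illustration under that assumption and is not claimed to be the exact CausalRN formulation. The function names are hypothetical.

```python
import math


def pairwise_exp_quadratic(u, v):
    """out[t] = sum over i <= t of exp(u[i] + v[t]); O(n^2) work."""
    return [sum(math.exp(u[i] + v[t]) for i in range(t + 1)) for t in range(len(u))]


def pairwise_exp_linear(u, v):
    """Same quantity in O(n): exp(u_i + v_t) = exp(u_i) * exp(v_t),
    so a running prefix sum of exp(u_i) acts as a growing memory."""
    out, running = [], 0.0
    for u_t, v_t in zip(u, v):
        running += math.exp(u_t)
        out.append(math.exp(v_t) * running)
    return out


u = [0.1, -0.4, 0.7, 0.0]
v = [0.3, 0.2, -0.5, 1.0]
same = all(math.isclose(a, b) for a, b in zip(pairwise_exp_quadratic(u, v), pairwise_exp_linear(u, v)))
print(same)  # True
```

The running sum plays a role loosely analogous to a KV cache: it carries everything the future positions need in a state that is updated once per step.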

Submitted 23 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024 Next Generation of Sequence Modeling Architectures Workshop

arXiv:2406.15787

On Physics-Informed Neural Network Control for Power Electronics

Authors: Peifeng Hui, Chenggang Cui, Pengfeng Lin, Amer M. Y. M. Ghias, Xitong Niu, Chuanlin Zhang

Abstract: Given the growing need for precise modeling of power electronics amid operational and environmental uncertainties, this paper introduces a methodology that combines model-driven and data-driven approaches to enhance the stability of power electronics interacting with grid-forming microgrids. Using the physics-informed neural network (PINN) as a foundation, this strategy merges robust data-fitting capabilities with fundamental physical principles, thereby constructing an accurate system model. This significantly enhances the ability to understand and replicate the dynamics of power electronics systems under complex working conditions. Moreover, by incorporating advanced learning-based control methods, the proposed method can make precise predictions and implement satisfactory control laws even under severe uncertainty. Experimental validation demonstrates the effectiveness and robustness of the proposed approach, highlighting its substantial potential for addressing prevalent uncertainties in controlling modern power electronics systems.
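The data-plus-physics idea behind a PINN can be shown as a composite loss. This minimal sketch rests on several assumptions:

- a simple series RL circuit ($L\,di/dt + R\,i = v$) instead of the paper's converter model;
- a plain Python function standing in for the neural network;
- a finite-difference derivative in place of automatic differentiation.

All names and parameter values are hypothetical.

```python
import math


def pinn_style_loss(i_hat, t_data, i_data, t_colloc, R, L, v, weight=1.0, h=1e-5):
    """Data misfit plus the squared residual of L di/dt + R i - v(t) = 0.

    i_hat stands in for the network output; di/dt uses a central finite
    difference instead of automatic differentiation.
    """
    data = sum((i_hat(t) - y) ** 2 for t, y in zip(t_data, i_data)) / len(t_data)

    def residual(t):
        didt = (i_hat(t + h) - i_hat(t - h)) / (2 * h)
        return L * didt + R * i_hat(t) - v(t)

    physics = sum(residual(t) ** 2 for t in t_colloc) / len(t_colloc)
    return data + weight * physics


# Hypothetical series RL circuit driven by a constant 4 V source from i(0) = 0.
R_OHM, L_HENRY, V_SRC = 2.0, 0.5, 4.0


def exact_current(t):
    """Closed-form solution i(t) = (V/R) * (1 - exp(-R t / L))."""
    return V_SRC / R_OHM * (1.0 - math.exp(-R_OHM * t / L_HENRY))


def source(t):
    return V_SRC


t_data = [0.1, 0.3, 0.6]
i_data = [exact_current(t) for t in t_data]
t_colloc = [0.05 * k for k in range(1, 20)]

print(pinn_style_loss(exact_current, t_data, i_data, t_colloc, R_OHM, L_HENRY, source))  # close to 0
print(pinn_style_loss(lambda t: 0.0, t_data, i_data, t_colloc, R_OHM, L_HENRY, source))  # large
```

The physics term penalizes candidates that violate the circuit equation even at times where no measurements exist. This is the sense in which physical principles regularize pure data fitting.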

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.13920

Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks

Authors: Tao Wu, Canyixing Cui, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu

Abstract: Graph neural networks (GNNs) have achieved tremendous success, but recent studies have shown that GNNs are vulnerable to adversarial attacks, which significantly hinders their use in safety-critical scenarios. Therefore, the design of robust GNNs has attracted increasing attention. However, existing research has mainly been conducted via experimental trial and error, and a comprehensive understanding of the vulnerability of GNNs is still lacking. To address this limitation, we systematically investigate the adversarial robustness of GNNs by considering graph data patterns, model-specific factors, and the transferability of adversarial examples. Through extensive experiments, we obtain a set of principled guidelines for improving the adversarial robustness of GNNs, for example: (i) training graph data with diverse structural patterns, rather than highly regular graphs, is crucial for model robustness, which is consistent with the concept of adversarial training; (ii) a large model capacity with sufficient training data has a positive effect on model robustness, and only a small percentage of neurons in GNNs are affected by adversarial attacks; (iii) adversarial transfer is not symmetric, and adversarial examples produced by small-capacity models have stronger adversarial transferability. This work illuminates the vulnerabilities of GNNs and opens many promising avenues for designing robust GNNs.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13499

GraphMU: Repairing Robustness of Graph Neural Networks via Machine Unlearning

Authors: Tao Wu, Xinwen Cao, Chao Wang, Shaojie Qiao, Lin Yuan, Canyixing Cui, Yanbing Liu

Abstract: Graph Neural Networks (GNNs) have demonstrated significant application potential in various fields. However, GNNs are still vulnerable to adversarial attacks. Numerous adversarial defense methods for GNNs have been proposed to address this problem. However, these methods can only serve as a defense before poisoning and cannot repair a poisoned GNN. Therefore, there is an urgent need for a method to repair poisoned GNNs. In this paper, we address this gap by introducing the novel concept of model repair for GNNs. We propose a repair framework, Repairing Robustness of Graph Neural Networks via Machine Unlearning (GraphMU), which aims to fine-tune a poisoned GNN to forget adversarial samples without the need for complete retraining. We also introduce an unlearning validation method to ensure that our approach effectively forgets specified poisoned data. To evaluate the effectiveness of GraphMU, we explore three fine-tuned subgraph construction scenarios based on the available perturbation information: (i) Known Perturbation Ratios, (ii) Known Complete Knowledge of Perturbations, and (iii) Unknown any Knowledge of Perturbations. Our extensive experiments, conducted across four citation datasets and four adversarial attack scenarios, demonstrate that GraphMU can effectively restore the performance of poisoned GNNs.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12628

Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics

Authors: Chenggang Cui, Jiaming Liu, Junkang Feng, Peifeng Hui, Amer M. Y. M. Ghias, Chuanlin Zhang

Abstract: Power electronics, a critical component in modern power systems, face several challenges in control design, including model uncertainties and lengthy, costly design cycles. This paper proposes a multi-agent framework based on Large Language Models (LLMs) for objective-oriented control design in power electronics. The framework leverages the reasoning capabilities of LLMs and a multi-agent workflow to develop an efficient and autonomous controller design process. The LLM agent is able to understand and respond to high-level instructions in natural language, adapting its behavior to the task's specific requirements and constraints from a practical implementation point of view. This novel and efficient approach promises a more flexible and adaptable controller design process in power electronics that will greatly benefit practitioners.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 6 pages, 6 figures

arXiv:2406.12373

WebCanvas: Benchmarking Web Agents in Online Environments

Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment, characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions. WebCanvas contains three main components to facilitate realistic assessments: (1) a novel evaluation metric that reliably captures critical intermediate actions or states necessary for task completion while disregarding noise caused by insignificant events or changed web elements; (2) a benchmark dataset called Mind2Web-Live, a refined version of the original static Mind2Web dataset, containing 542 tasks with 2439 intermediate evaluation states; and (3) lightweight and generalizable annotation tools and testing pipelines that enable the community to collect and maintain a high-quality, up-to-date dataset. Building on WebCanvas, we open-source an agent framework with extensible modules for reasoning, providing a foundation for the community to conduct online inference and evaluations. Our best-performing agent achieves a task success rate of 23.1% and a task completion rate of 48.8% on the Mind2Web-Live test set. Additionally, we analyze the performance discrepancies across various websites, domains, and experimental environments.
We encourage the community to contribute further insights on online agent evaluation, thereby advancing this field of research.
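The sketch below makes two assumptions about the reported metrics. The success rate is taken to count tasks in which every key intermediate state is reached. The completion rate is taken to average, over tasks, the fraction of key states reached. Under those assumptions, the two metrics could be computed as follows. The benchmark's exact definitions may differ, and `success_and_completion` is a hypothetical helper.

```python
def success_and_completion(results):
    """results: one list per task, with True for each key intermediate state reached.

    Success rate: fraction of tasks where every key state was reached.
    Completion rate: per-task fraction of key states reached, averaged over tasks.
    """
    success = sum(all(task) for task in results) / len(results)
    completion = sum(sum(task) / len(task) for task in results) / len(results)
    return success, completion


runs = [[True, True], [True, False, False], [False, True]]
print(success_and_completion(runs))  # approximately (0.333, 0.611)
```

Under these definitions the completion rate is always at least the success rate. This is consistent with the gap between the two reported figures.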

Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2406.11273

Planar Hall Plateau in Magnetic Weyl Semimetals

Authors: Lei Li, Chaoxi Cui, Run-Wu Zhang, Zhi-Ming Yu, Yugui Yao

Abstract: Despite the rapid progress in the study of the planar Hall effect (PHE) in recent years, previous works have only shown that the PHE is connected to local geometric quantities, such as Berry curvature. Here, for the first time, we point out that the PHE in magnetic Weyl semimetals is directly related to a global quantity, namely, the Chern number of the Weyl point. This leads to the remarkable consequence that the PHE signature predicted here is robust against many system details, including the Fermi energy. The main difference between non-magnetic and magnetic Weyl points is that the latter break time-reversal symmetry T and thus generally possess an energy tilt. Via semiclassical Boltzmann theory, we investigate the PHE in generic magnetic Weyl models with energy tilt and arbitrary Chern number. We find that, by aligning the magnetic and electric fields in the same direction, the trace of the PHE conductivity contributed by Berry curvature and orbital moment is proportional to the Chern number and the energy tilt of the Weyl points, resulting in a previously undiscovered quantized PHE plateau as the Fermi energy is varied. We further confirm the existence of PHE plateaus in a more realistic lattice model without T symmetry. By proposing a new quantized physical quantity, our work not only provides a new tool for extracting the topological character of Weyl points but also suggests that the interplay between topology and magnetism can give rise to intriguing physics.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 15 pages, 5 figures

arXiv:2406.00225

Kinematic Model of Magnetic Domain Wall Motion for Fast, High-Accuracy Simulations

Authors: Kristi Doleh, Leonard Humphrey, Chandler M. Linseisen, Michael D. Kitcher, Joanna M. Martin, Can Cui, Jean Anne C. Incorvia, Felipe Garcia-Sanchez, Naimul Hassan, Alexander J. Edwards, Joseph S. Friedman

Abstract: Domain wall (DW) devices have garnered recent interest for diverse applications including memory, logic, and neuromorphic primitives; fast, accurate device models are therefore imperative for large-scale system design and verification. Extant DW motion models are sub-optimal for large-scale system design, either over-consuming compute resources with physics-heavy equations or oversimplifying the physics, which drastically reduces model accuracy. We propose a DW model inspired by the phenomenological similarities between the motion of a DW and that of a classical object acted on by forces like air resistance or static friction. Our proposed phenomenological model predicts DW motion within 1.2% on average compared with micromagnetic simulations that are 400 times slower. Additionally, our model is seven times faster than extant collective coordinate models and 14 times more accurate than extant hyper-reduced models, making it an essential tool for large-scale DW circuit design and simulation. The model is publicly posted along with scripts that automatically extract model parameters from user-provided simulation or experimental data to extend the model to alternative micromagnetic parameters.
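The classical-object analogy can be sketched as a driven particle with two forces. Linear drag plays the air-resistance role, and a static-friction threshold keeps a resting wall pinned. This is a toy, dimensionless illustration with hypothetical parameters. It is not the authors' fitted model or their parameter-extraction scripts.

```python
def simulate_dw(force, mass=1.0, drag=2.0, f_static=0.5, dt=1e-3, steps=5000):
    """Forward-Euler integration of mass * dv/dt = force - drag * v.

    A wall at rest stays pinned unless |force| exceeds the static-friction
    threshold f_static; the drag term plays the role of air resistance.
    Returns (position, velocity) after steps * dt time units.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        if v == 0.0 and abs(force) <= f_static:
            continue  # pinned: drive too weak to depin the wall
        a = (force - drag * v) / mass
        v += a * dt
        x += v * dt
    return x, v


print(simulate_dw(force=0.3))  # (0.0, 0.0): stays pinned
print(simulate_dw(force=2.0))  # velocity close to the terminal value force / drag = 1.0
```

Each time step costs only a few arithmetic operations. This is why kinematic models of this kind are far cheaper than micromagnetic simulation.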

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.19004

An implementation of tensor product patch smoothers on GPU

Authors: Cu Cui, Paul Grosse-Bley, Guido Kanschat, Robert Strzodka

Abstract: We present a GPU implementation of vertex-patch smoothers for higher-order finite element methods in two and three dimensions. Analysis shows that they are memory bound not with respect to GPU DRAM, but with respect to on-chip scratchpad memory. Multigrid operations are optimized through localization and reorganized local operations in on-chip memory, achieving minimal global data transfer and a conflict-free memory access pattern. Performance tests demonstrate that the optimized kernel is at least 2 times faster than the straightforward implementation for the Poisson problem across various polynomial degrees in 2D and 3D, achieving up to 36% of the peak performance in both single and double precision on an Nvidia A100 GPU.
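Tensor-product structure lets such smoothers avoid assembling large local matrices. Applying $A \otimes B$ to a vector reduces to two small matrix products via the identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\mathsf T})$, where vec stacks columns. Below is a minimal CPU-only sketch of that identity. The matrices are placeholders, and none of the paper's GPU kernel design is reproduced.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def kron(a, b):
    """Explicit Kronecker product A (x) B."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def vec(x):
    """Column-major vectorization: stack the columns of x."""
    return [x[r][c] for c in range(len(x[0])) for r in range(len(x))]


def apply_kron_naive(a, b, x):
    """(A (x) B) vec(X) by forming the full Kronecker matrix."""
    xv = vec(x)
    return [sum(kij * xj for kij, xj in zip(row, xv)) for row in kron(a, b)]


def apply_kron_factored(a, b, x):
    """Same result as vec(B X A^T): two small products, no Kronecker matrix."""
    return vec(matmul(matmul(b, x), transpose(a)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
X = [[1, 2], [3, 4]]
print(apply_kron_naive(A, B, X))     # [11, 5, 25, 11]
print(apply_kron_factored(A, B, X))  # [11, 5, 25, 11]
```

Here $A$ is $q\times q$ and $B$ is $p\times p$. The factored form costs $O(pq(p+q))$ operations instead of $O(p^2q^2)$, and the gap widens in 3D and at higher polynomial degree.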

Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

MSC Class: 65N55; 65Y20

arXiv:2405.18982

Multilevel Interior Penalty Methods on GPUs

Authors: Cu Cui, Guido Kanschat

Abstract: We present a matrix-free multigrid method for high-order discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. Leveraging conflict-free access patterns in shared memory, an arithmetic throughput of up to 39% of the peak performance on Nvidia A100 GPUs is achieved. Experimental results affirm the effectiveness of mixed-precision approaches and MPI parallelization in accelerating the algorithms. Furthermore, an assessment of solver efficiency and robustness is provided in both two and three dimensions, with applications to Poisson problems.

Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

MSC Class: 65N55; 65Y20

arXiv:2405.17824

mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis

Authors: Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Vishwesh Nath, Yucheng Tang, Yuankai Huo

Abstract: Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing the accompanying textual pathology information. mTREE combines the localization of key areas (global-to-local) and the development of a WSI-level image-text representation (local-to-global) into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: first, functioning as an attention map to accurately identify key areas, and second, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses on two image-related tasks, classification and survival prediction, showing its marked superiority over baselines.
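One way text can act as an attention map is to score patch embeddings against a text embedding and pool them with softmax weights. Below is a minimal sketch under that assumption. The embeddings are toy values, and `text_guided_pool` is hypothetical rather than mTREE's architecture.

```python
import math


def text_guided_pool(text_emb, patch_embs):
    """Score each patch by its dot product with the text embedding, turn the
    scores into softmax attention weights, and return (weights, pooled vector)."""
    scores = [sum(t * p for t, p in zip(text_emb, patch)) for patch in patch_embs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patch_embs[0])
    pooled = [sum(w * patch[d] for w, patch in zip(weights, patch_embs)) for d in range(dim)]
    return weights, pooled


text = [1.0, 0.0]                   # hypothetical text embedding
patches = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical patch embeddings
weights, pooled = text_guided_pool(text, patches)
print(weights)  # the patch aligned with the text receives the larger weight
```

The weights serve the "where to look" role. The pooled vector is a single slide-level representation that a classifier or survival head could consume.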

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17792

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n \rightarrow 3\nu$ or $nn \rightarrow 2\nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $n \rightarrow {\rm inv}$ and $nn \rightarrow {\rm inv}$. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed by the latest available data, we estimate all backgrounds, including inverse beta decay events of reactor antineutrinos ($\bar{\nu}_e$), natural radioactivity, cosmogenic isotopes, and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the expected JUNO sensitivities at a 90% confidence level are $\tau/B(n \rightarrow {\rm inv}) > 5.0 \times 10^{31}\,{\rm yr}$ and $\tau/B(nn \rightarrow {\rm inv}) > 1.4 \times 10^{32}\,{\rm yr}$.
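Partial-lifetime sensitivities such as the $\tau/B$ values above are conventionally obtained from a counting relation of the general form below. It is a sketch of the standard relation, not the collaboration's full statistical treatment, which also accounts for backgrounds and systematic uncertainties. The bookkeeping assumed here is:

- $N$ is the number of candidate decaying neutrons (or neutron pairs, for the $nn$ mode);
- $\varepsilon$ is the signal selection efficiency;
- $T$ is the live time;
- $S_{90}$ is the 90% confidence-level upper limit on the number of signal events given the expected background.

$$\frac{\tau}{B} > \frac{N\,\varepsilon\,T}{S_{90}}$$

The limit grows linearly with exposure $N T$ only while $S_{90}$ stays roughly constant. This is why background suppression matters as much as detector mass.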

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.15410

Electric Hall Effect and Quantum Electric Hall Effect

Authors: Chaoxi Cui, Run-Wu Zhang, Yilin Han, Zhi-Ming Yu, Yugui Yao

Abstract: Exploring new Hall effects is always a fascinating research topic. The ordinary Hall effect and the quantum Hall effect, initially discovered in two-dimensional (2D) non-magnetic systems, are phenomena in which a transverse current is generated when a system carrying an electron current is placed in a magnetic field perpendicular to the current. In this work, we propose the electric counterparts of these two Hall effects, termed the electric Hall effect (EHE) and the quantum electric Hall effect (QEHE). The EHE and QEHE emerge in 2D magnetic systems, where the transverse current is generated by applying an electric gate-field instead of a magnetic field. We present a symmetry requirement for intrinsic EHE and QEHE. With a weak gate-field, we establish an analytical expression of the intrinsic EHE coefficient. We show that it is determined by intrinsic band geometric quantities: the Berry curvature and its polarizability, which consists of both intraband and interband layer polarization. Via first-principles calculations, we investigate the EHE in monolayer Ca(FeN)$_2$, where a significant EHE coefficient is observed around band crossings. Furthermore, we demonstrate that the QEHE can appear in the semiconductor monolayer $\rm BaMn_2S_3$, whose Hall conductivity exhibits steps that take on the quantized values $0$ and $\pm1$ in units of $e^2/h$ as the gate-field is varied within the experimentally achievable range.
Due to the great tunability of the electric gate-field, the EHE and QEHE proposed here can be easily controlled and should have more potential applications.
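To make the quantized plateau values concrete, the short sketch below evaluates the unit $e^2/h$ from the exact SI values of $e$ and $h$ and scales it by an integer plateau label. The function names and the integer label `n` are illustrative only; the sketch says nothing about how the paper computes the conductivity itself.

```python
# Exact SI values of the elementary charge and the Planck constant (2019 SI).
ELEMENTARY_CHARGE_C = 1.602176634e-19
PLANCK_CONSTANT_J_S = 6.62607015e-34


def conductance_unit() -> float:
    """Return the conductance unit e^2/h in siemens."""
    return ELEMENTARY_CHARGE_C ** 2 / PLANCK_CONSTANT_J_S


def plateau_conductivity(n: int) -> float:
    """Hall conductivity, in siemens, of a plateau labelled by the integer n."""
    return n * conductance_unit()


if __name__ == "__main__":
    unit = conductance_unit()
    print(f"e^2/h = {unit:.6e} S, h/e^2 = {1 / unit:.3f} ohm")
    for n in (-1, 0, 1):  # the plateau values quoted for monolayer BaMn2S3
        print(n, plateau_conductivity(n))
```

The unit comes out at about 3.874e-5 S, i.e. a resistance of about 25812.807 ohm, so the $\pm1$ plateaus correspond to roughly $\pm 3.87 \times 10^{-5}$ S.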

Submitted 24 May, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures

arXiv:2405.14622

Calibrated Self-Rewarding Vision Language Models

Authors: Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao

Abstract: Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This misalignment arises because the model tends to prioritize textual information over visual input, even when both the language model and visual representations are of high quality. Existing methods leverage additional models or human annotations to curate preference data and enhance modality alignment through preference optimization. These approaches may not effectively reflect the target LVLM's preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input. Empirical results demonstrate that CSR enhances performance and reduces hallucinations across ten benchmarks and tasks, achieving a substantial improvement of 7.62% over existing methods.
Our empirical results are further supported by rigorous theoretical analysis which, under mild assumptions, verifies the effectiveness of introducing visual constraints into the self-rewarding paradigm. Additionally, CSR shows compatibility with different vision-language models and the ability to incrementally improve performance through iterative fine-tuning. Our data and code are available at https://github.com/YiyangZhou/CSR.
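The generate, reward and select loop described above can be sketched roughly as follows. This is a minimal illustration, not the CSR implementation: the scorers `text_score` and `visual_score`, the mixing weight `lam`, and the choice of the highest- versus lowest-reward candidate as the preference pair are all hypothetical stand-ins (a real visual score might come from an image-text similarity model).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str], float]  # hypothetical per-sentence scorer


@dataclass
class Candidate:
    """A candidate response split into sentences (the 'steps' being rewarded)."""
    sentences: List[str]


def calibrated_reward(cand: Candidate, text_score: Scorer,
                      visual_score: Scorer, lam: float = 0.5) -> float:
    """Sum of per-sentence rewards, each mixing a language score with a
    visual-grounding score; lam weights the language part (assumption)."""
    return sum(lam * text_score(s) + (1.0 - lam) * visual_score(s)
               for s in cand.sentences)


def build_preference_pair(cands: Sequence[Candidate], text_score: Scorer,
                          visual_score: Scorer, lam: float = 0.5
                          ) -> Tuple[Candidate, Candidate]:
    """Return (chosen, rejected): the highest- and lowest-reward candidates."""
    if len(cands) < 2:
        raise ValueError("need at least two candidates")
    ranked = sorted(cands, key=lambda c: calibrated_reward(c, text_score, visual_score, lam))
    return ranked[-1], ranked[0]
```

For example, with a toy visual scorer that returns 1.0 only for sentences mentioning an object actually present in the image, a fluent but ungrounded response ends up as the rejected member of the pair.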

Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: fix some typos and add acknowledgement section in V3

arXiv:2405.11210

Computational predictions of hydrogen-assisted fatigue crack growth

Authors: C. Cui, P. Bortot, M. Ortolani, E. Martínez-Pañeda

Abstract: A new model is presented to predict hydrogen-assisted fatigue. The model combines a phase field description of fracture and fatigue, stress-assisted hydrogen diffusion, and a toughness degradation formulation with cyclic and hydrogen contributions. Hydrogen-assisted fatigue crack growth predictions exhibit excellent agreement with experiments over all the scenarios considered, spanning multiple load ratios, H2 pressures and loading frequencies. These predictions are obtained without any calibration against hydrogen-assisted fatigue data, taking as input only mechanical and hydrogen transport material properties, the material's fatigue characteristics (from a single test in air), and the sensitivity of fracture toughness to hydrogen content. Furthermore, the model is used to determine: (i) suitable test loading frequencies for obtaining conservative data, and (ii) the underestimation made when samples are not pre-charged. The model can handle both laboratory specimens and large-scale engineering components, enabling the Virtual Testing paradigm in infrastructure exposed to hydrogen environments and cyclic loading.
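A toughness degradation "with cyclic and hydrogen contributions" can be pictured as two multiplicative factors on the fracture energy. The sketch below assumes forms that are common in the phase-field fatigue literature but are not given in the abstract: an asymptotic fatigue degradation $f(\bar\alpha) = (2\alpha_T/(\bar\alpha + \alpha_T))^2$ once the accumulated fatigue variable $\bar\alpha$ exceeds a threshold $\alpha_T$, and a linear hydrogen factor $1 - \chi\theta$ in the hydrogen coverage $\theta$. The parameter names and the product form are assumptions.

```python
def fatigue_degradation(alpha_bar: float, alpha_t: float) -> float:
    """Asymptotic fatigue degradation (assumed form): 1 below the threshold
    alpha_t, then decaying as (2*alpha_t / (alpha_bar + alpha_t))**2."""
    if alpha_bar <= alpha_t:
        return 1.0
    return (2.0 * alpha_t / (alpha_bar + alpha_t)) ** 2


def hydrogen_degradation(theta: float, chi: float) -> float:
    """Linear loss of toughness with hydrogen coverage theta in [0, 1];
    chi is a hypothetical damage coefficient."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    return 1.0 - chi * theta


def effective_toughness(gc0: float, alpha_bar: float, alpha_t: float,
                        theta: float, chi: float) -> float:
    """Fracture energy after combining both degradation factors (assumed product)."""
    return gc0 * fatigue_degradation(alpha_bar, alpha_t) * hydrogen_degradation(theta, chi)
```

The fatigue factor is continuous at $\bar\alpha = \alpha_T$ (both branches give 1), so crack growth under cycling starts smoothly once the accumulated history passes the threshold.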

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.09965

Leveraging Large Language Models for Automated Web-Form-Test Generation: An Empirical Study

Authors: Tao Li, Chenhui Cui, Lei Ma, Dave Towey, Yujie Xie, Rubing Huang

Abstract: The testing of web forms is an essential activity for ensuring the quality of web applications, which mainly involves evaluating the interactions between users and forms. Automated test-case generation remains a challenge for web-form testing: due to the complex, multi-level structure of web pages, it can be difficult to automatically capture their inherent contextual information for inclusion in the tests. Large Language Models (LLMs) have great potential for contextual text generation. OpenAI's GPT LLMs have been receiving a lot of attention in software testing; however, they may fail to be applied in practice because of information security concerns. To the best of our knowledge, no comparative study examining different LLMs has yet been reported for web-form-test generation. To address this gap in the literature, we conducted a comprehensive empirical study investigating the effectiveness of 11 LLMs on 146 web forms from 30 open-source Java web applications. According to the experimental results, different LLMs achieve different levels of testing effectiveness. Notably, the GPT-4, GLM-4, and Baichuan2 LLMs generate better web-form tests than the others. Compared with GPT-4, other LLMs find it difficult to generate appropriate tests for web forms, resulting in decreased successfully-submitted rates (SSRs, measured by the proportions of the LLM-generated web-form tests that can be successfully inserted into the web forms and submitted) ranging from 9.10% to 74.15%.
Nevertheless, some LLMs achieve higher SSRs than GPT-3.5, indicating a better ability to generate appropriate tests for web forms. Our findings also show that, for all LLMs, more effective web-form tests were generated when the designed prompts included complete and clear contextual information about the web forms. Finally, we offer some insights for using LLMs to guide automated web-form testing.
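The SSR metric defined in the abstract is a simple proportion. A minimal sketch, assuming each generated test is recorded with two booleans (the record type and field names are hypothetical, not from the study's tooling):

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FormTestOutcome:
    """Outcome of one LLM-generated web-form test (hypothetical record)."""
    inserted: bool   # all generated values could be entered into the form
    submitted: bool  # the filled form was then submitted successfully


def successfully_submitted_rate(outcomes: Iterable[FormTestOutcome]) -> float:
    """Fraction of generated tests that were both inserted and submitted."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no test outcomes")
    ok = sum(1 for o in outcomes if o.inserted and o.submitted)
    return ok / len(outcomes)
```

A test that fills the form but fails on submission counts against the SSR, just like one that cannot be inserted at all.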

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.06059

A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds

Authors: Christopher Z. Cui, Xiangyu Peng, Mark O. Riedl

Abstract: Open-ended worlds are those in which there are no pre-specified goals or environmental reward signals. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism across a mix of frozen and unfrozen experts. The model learns to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn behaviors associated with new character role types. We show that our agent both obtains more rewards in the zero-shot setting and discovers these rewards with greater sample efficiency in the few-shot learning settings.
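The attention-over-experts idea can be sketched as a scaled dot-product attention whose weights mix the experts' action distributions. This is an illustrative sketch under our own assumptions (one key vector per expert, a shared discrete action set, and a `frozen` flag that only decides which experts would be trained); it is not the authors' architecture.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Expert:
    key: List[float]           # embedding the attention query is matched against
    action_probs: List[float]  # this expert's policy over a shared action set
    frozen: bool               # frozen experts keep their pre-trained weights


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(query: Sequence[float], experts: Sequence[Expert]
                ) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention over experts; returns (weights, mixed policy)."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, e.key)) for e in experts]
    weights = softmax(logits)
    n_actions = len(experts[0].action_probs)
    mixed = [sum(w * e.action_probs[a] for w, e in zip(weights, experts))
             for a in range(n_actions)]
    return weights, mixed


def trainable(experts: Sequence[Expert]) -> List[Expert]:
    """Only unfrozen experts would receive gradient updates during training."""
    return [e for e in experts if not e.frozen]
```

Because the weights sum to one and each expert outputs a probability distribution, the mixed policy is itself a valid distribution; a query that aligns with a frozen expert's key shifts probability mass toward that expert's behavior.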

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03160

Moore Determinant of Dual Quaternion Hermitian Matrices

Authors: Chunfeng Cui, Liqun Qi, Guangjing Song, Qingwen Wang

Abstract: In this paper, we extend the Chen and Moore determinants of quaternion Hermitian matrices to dual quaternion Hermitian matrices. We show that the Chen determinant of dual quaternion Hermitian matrices is invariant under addition, switching, multiplication, and unitary operations on both sides. We then show that the Chen and Moore determinants of dual quaternion Hermitian matrices are equal to each other, and that they are also equal to the product of the eigenvalues. The characteristic polynomial of a dual quaternion Hermitian matrix is also studied.
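The central identity of the abstract can be written compactly as below. The notation is ours, not necessarily the paper's: $\operatorname{Cdet}$ and $\operatorname{Mdet}$ denote the Chen and Moore determinants, $\hat{\mathbb{Q}}$ the dual quaternions, and $\lambda_1, \dots, \lambda_n$ the eigenvalues of $A$, which for a dual quaternion Hermitian matrix are dual numbers $\lambda_i = \lambda_{i,s} + \lambda_{i,d}\,\epsilon$ with $\epsilon^2 = 0$.

```latex
A = A^{*} \in \hat{\mathbb{Q}}^{n \times n}
\quad\Longrightarrow\quad
\operatorname{Cdet}(A) \;=\; \operatorname{Mdet}(A) \;=\; \prod_{i=1}^{n} \lambda_i ,
\qquad
(a + b\epsilon)(c + d\epsilon) = ac + (ad + bc)\,\epsilon .
```

This mirrors the familiar fact that the determinant of a complex Hermitian matrix is the product of its (real) eigenvalues; the last identity shows how the product of dual-number eigenvalues is evaluated.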

Submitted 18 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.18560

Non-convex Pose Graph Optimization in SLAM via Proximal Linearized Riemannian ADMM

Authors: Xin Chen, Chunfeng Cui, Deren Han, Liqun Qi

Abstract: Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and the projection onto the constraints can be calculated by normalization. Then a proximal linearized Riemannian alternating direction method of multipliers (PieADMM) is developed to solve the proposed model, which not only has low memory requirements but also can update the poses in parallel. Furthermore, we establish an iteration complexity of $O(1/\epsilon^{2})$ for PieADMM to find an $\epsilon$-stationary solution of our model. The efficiency of our proposed algorithm is demonstrated by numerical experiments on two synthetic and four 3D SLAM benchmark datasets.
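The remark that projection onto the unit-quaternion constraint "can be calculated by normalization" is easy to see in code: the nearest point on the unit 3-sphere to a nonzero vector is that vector divided by its norm. The sketch below shows the projection and a generic projected gradient step; the step function is a hypothetical illustration of how such a projection is used, not the PieADMM update.

```python
import math
from typing import Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z)


def project_to_unit_sphere(q: Sequence[float]) -> Quaternion:
    """Euclidean projection onto the unit 3-sphere: divide by the norm."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("projection undefined for the zero quaternion")
    w, x, y, z = (c / norm for c in q)
    return (w, x, y, z)


def projected_step(q: Quaternion, grad: Sequence[float], step: float) -> Quaternion:
    """Take an unconstrained gradient step, then restore the unit-norm constraint."""
    moved = [c - step * g for c, g in zip(q, grad)]
    return project_to_unit_sphere(moved)
```

The projection costs one norm and four divisions per pose and touches no other pose, which is consistent with the abstract's points about low memory use and parallel pose updates.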

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18416

Capabilities of Gemini Models in Medicine

Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them and surpassing the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and on medical video question answering, surpassing prior bespoke methods using only in-context learning.
Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17949

Transfer Learning Enhanced Single-choice Decision for Multi-choice Question Answering

Authors: Chenhao Cui, Yufan Jiang, Shuangzhi Wu, Zhoujun Li

Abstract: Multi-choice Machine Reading Comprehension (MMRC) aims to select the correct answer from a set of options based on a given passage and question. Existing methods employ a pre-trained language model as the encoder and share and transfer knowledge through fine-tuning. These methods mainly focus on designing elaborate mechanisms to effectively capture the relationships among the triplet of passage, question and answers. Transferring knowledge from other MRC tasks such as SQuAD is non-trivial, owing to the task-specific nature of MMRC, and has been ignored. In this paper, we reconstruct multi-choice as single-choice by training a binary classifier to distinguish whether a certain answer is correct, and then select the option with the highest confidence score as the final answer. Our proposed method gets rid of the multi-choice framework and can leverage resources of other tasks. We construct our model based on the ALBERT-xxlarge model and evaluate it on the RACE and DREAM datasets. Experimental results show that our model performs better than multi-choice methods. In addition, by transferring knowledge from other kinds of MRC tasks, our model achieves state-of-the-art results in both single and ensemble settings.
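The single-choice reformulation reduces answer selection to independent binary scoring followed by an argmax. A minimal sketch, assuming a `scorer` callable that stands in for the fine-tuned binary classifier (hypothetical; the paper uses ALBERT-xxlarge):

```python
from typing import Callable, List, Sequence, Tuple

# scorer(passage, question, option) -> confidence that this option is correct
Scorer = Callable[[str, str, str], float]


def answer_by_single_choice(passage: str, question: str,
                            options: Sequence[str], scorer: Scorer
                            ) -> Tuple[int, List[float]]:
    """Score each option independently as a binary 'is this correct?' decision
    and return the index of the most confident option plus all scores."""
    if not options:
        raise ValueError("no options given")
    scores = [scorer(passage, question, opt) for opt in options]
    best = max(range(len(options)), key=scores.__getitem__)
    return best, scores
```

Since each (passage, question, option) triple is scored on its own, the same binary classifier can also be trained on data from other MRC tasks, which is what makes the knowledge transfer described above possible.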

Submitted 27 April, 2024; originally announced April 2024.

Comments: 10 pages, 1 figure. This article supersedes arXiv:2011.03292

arXiv:2404.16745

Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness

Authors: Jing Ouyang, Chengyu Cui, Kean Ming Tan, Gongjun Xu

Abstract: In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race) taking into account their latent abilities. However, the large sample size, substantial covariate dimension, and great test length pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, bringing further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings.
We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA).
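A minimal sketch of the kind of model the abstract refers to, assuming a binary response and a logistic link (one member of the generalized family). The paper's exact parameterization and its identifiability conditions are not reproduced; all names are hypothetical. In this sketch, a nonzero `beta` entry for a group-indicator covariate, with the latent factors `theta` held fixed, is what would flag a potentially biased item.

```python
import math
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(a, b))


def response_probability(intercept: float, beta: Sequence[float], x: Sequence[float],
                         gamma: Sequence[float], theta: Sequence[float]) -> float:
    """P(Y_ij = 1) for item j and person i under an assumed logistic model:
    intercept_j + beta_j . x_i (covariate effect) + gamma_j . theta_i (loadings x factors)."""
    eta = intercept + _dot(beta, x) + _dot(gamma, theta)
    # numerically stable logistic function (avoids overflow for large |eta|)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)
```

Comparing two respondents with identical `theta` but different group covariates isolates the covariate effect, which is the quantity the fairness test is about.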

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16425

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from the core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5–4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.
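At $z = 4.859$ the observed soft X-ray band maps to much harder energies at the source, and observed durations are stretched by cosmological time dilation. The sketch below applies the standard $(1+z)$ factors; it uses only the band and redshift quoted above, and any duration fed to it is a hypothetical input, not a measurement from the paper.

```python
from typing import Tuple


def rest_frame_band(e_lo_kev: float, e_hi_kev: float, z: float) -> Tuple[float, float]:
    """Photon energies are redshifted by 1/(1+z); multiply by (1+z) to undo it."""
    return e_lo_kev * (1.0 + z), e_hi_kev * (1.0 + z)


def rest_frame_duration(t_obs_s: float, z: float) -> float:
    """Cosmological time dilation stretches observed durations by (1+z)."""
    return t_obs_s / (1.0 + z)


if __name__ == "__main__":
    lo, hi = rest_frame_band(0.5, 4.0, 4.859)  # EP-WXT trigger band, redshift of EP240315a
    print(f"rest-frame band: {lo:.2f}-{hi:.2f} keV")
```

The 0.5–4 keV trigger band corresponds to roughly 2.93–23.4 keV in the burst's rest frame, and a hypothetical 100 s observed interval would last about 17 s at the source.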

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.15690

Neural Proto-Language Reconstruction

Authors: Chenxuan Cui, Ying Chen, Qinxin Wang, David R. Mortensen

Abstract: Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNNs and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model performs better on the WikiHan dataset, and the data augmentation step stabilizes the training.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.08948

Large Language Models for Mobile GUI Text Input Generation: An Empirical Study

Authors: Chenhui Cui, Tao Li, Junjie Wang, Chunyang Chen, Dave Towey, Rubing Huang

Abstract: Mobile applications (apps) have become an essential part of our daily lives, making quality assurance for them an important activity. GUI testing, a quality assurance method, has frequently been used for mobile apps. When conducting GUI testing, it is important to generate effective text inputs for the text-input components. Some GUIs require these text inputs to move from one page to the next, which remains a challenge to achieving complete UI exploration. Recently, Large Language Models (LLMs) have shown excellent text-generation capabilities. Among the LLMs, OpenAI's GPT series has been widely discussed and used. However, it may not be possible to use these LLMs for GUI testing of actual mobile apps, due to the security and privacy issues related to production data. Therefore, it is necessary to explore the potential of different LLMs to guide text-input generation in mobile GUI testing. This paper reports on a large-scale empirical study that extensively investigates the effectiveness of nine state-of-the-art LLMs in Android text-input generation for UI pages. We collected 114 UI pages from 62 open-source Android apps and extracted contextual information from the UI pages to construct prompts for LLMs to generate text inputs. The experimental results show that some LLMs can generate relatively more effective and higher-quality text inputs, achieving a 50.58% to 66.67% page-pass-through rate, and even detecting some real bugs in open-source apps.
Compared with the GPT-3.5 and GPT-4 LLMs, other LLMs reduce the page-pass-through rates by 17.97% to 84.79% and 21.93% to 85.53%, respectively. We also found that using more complete UI contextual information can increase the page-pass-through rates of LLMs for generating text inputs. In addition, we describe six insights gained regarding the use of LLMs for Android testing. These insights will benefit the Android testing community.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.06054

Pseudo MIMO (pMIMO): An Energy and Spectral Efficient MIMO-OFDM System

Authors: Sen Wang, Tianxiong Wang, Shulun Zhao, Zhen Feng, Guangyi Liu, Chunfeng Cui, Chih-Lin I, Jiangzhou Wang

Abstract: This article introduces an energy and spectral efficient multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) transmission scheme designed for future sixth generation (6G) wireless communication networks. The approach involves connecting each receiving radio frequency (RF) chain with multiple antenna elements and conducting sample-level adjustments for receiving beamforming patterns. The proposed system architecture and the dedicated signal processing methods enable the scheme to transmit more parallel data streams than the number of receiving RF chains, achieving a spectral efficiency close to that of a fully digital (FD) MIMO system with the same number of antenna elements, each equipped with an RF chain. We refer to this system as a "pseudo MIMO" system due to its ability to mimic the functionality of additional invisible RF chains. The article begins by introducing the underlying principles of pseudo MIMO and discussing potential hardware architectures for its implementation. We then highlight several advantages of integrating pseudo MIMO into next-generation wireless networks. Simulation results are presented to demonstrate the superiority of the proposed pseudo MIMO transmission scheme over conventional MIMO systems. Additionally, we validate the feasibility of this new scheme by building the first pseudo MIMO prototype. Furthermore, we present some key challenges and outline potential directions for future research.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.03789

Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture

Authors: Juanwu Lu, Can Cui, Yunsheng Ma, Aniket Bera, Ziran Wang

Abstract: Safety and robustness are crucial factors in developing trustworthy autonomous vehicles. One essential aspect of addressing these factors is to equip vehicles with the capability to predict future trajectories for all moving objects in the surroundings and to quantify prediction uncertainties. In this paper, we propose the Sequential Neural Variational Agent (SeNeVA), a generative model that describes the distribution of future trajectories for a single moving object. Our approach can distinguish Out-of-Distribution data while quantifying uncertainty, and it achieves competitive performance compared to state-of-the-art methods on the Argoverse 2 and INTERACTION datasets. Specifically, a minimum Final Displacement Error of 0.446 m, a minimum Average Displacement Error of 0.203 m, and a Miss Rate of 5.35% are achieved on the INTERACTION test set. Extensive qualitative and quantitative analysis is also provided to evaluate the proposed model. Our open-source code is available at https://github.com/PurdueDigitalTwin/seneva.
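The metrics quoted above compare K predicted trajectories against the ground truth. The sketch below gives one common reading: each metric takes its own minimum over the K modes, and a "miss" is a best endpoint error above a Euclidean threshold. Benchmark definitions differ (some report the ADE of the mode with the lowest FDE, and INTERACTION's miss criterion is not a single Euclidean threshold), so the 2.0 m default is a hypothetical choice.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def min_ade_fde(preds: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[float, float]:
    """minADE and minFDE over K predicted trajectories (each minimized separately)."""
    if not preds or not gt:
        raise ValueError("need at least one prediction and a non-empty ground truth")
    ades: List[float] = []
    fdes: List[float] = []
    for traj in preds:
        if len(traj) != len(gt):
            raise ValueError("prediction and ground truth lengths differ")
        d = [_dist(p, g) for p, g in zip(traj, gt)]
        ades.append(sum(d) / len(d))  # average displacement over all timesteps
        fdes.append(d[-1])            # displacement at the final timestep
    return min(ades), min(fdes)


def is_miss(preds: Sequence[Sequence[Point]], gt: Sequence[Point],
            threshold_m: float = 2.0) -> bool:
    """Simple Euclidean miss criterion: best endpoint error above threshold_m."""
    return min_ade_fde(preds, gt)[1] > threshold_m
```

The Miss Rate is then the fraction of agents for which `is_miss` returns True.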

Submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024

arXiv:2404.02493

A Neural Multigrid Solver for Helmholtz Equations with High Wavenumber and Heterogeneous Media

Authors: Chen Cui, Kai Jiang, Shi Shu

Abstract: Solving high-wavenumber and heterogeneous Helmholtz equations presents a long-standing challenge in scientific computing. In this paper, we introduce a deep learning-enhanced multigrid solver to address this issue. By conducting error analysis on standard multigrid applied to a discrete Helmholtz equation, we devise a strategy to handle errors with different frequencies separately. For error components with frequencies distant from the wavenumber, we perform simple smoothing based on local operations at different levels to eliminate them. On the other hand, to address error components with frequencies near the wavenumber, we utilize another multigrid V-cycle to solve an advection-diffusion-reaction (ADR) equation at a coarse scale. The resulting solver, named Wave-ADR-NS, involves parameters learned through unsupervised training. Numerical results demonstrate that Wave-ADR-NS effectively resolves heterogeneous 2D Helmholtz equations with wavenumbers up to 2000. Comparative experiments against classical multigrid preconditioners and existing deep learning-based multigrid preconditioners reveal the superior performance of Wave-ADR-NS.
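Why error components near the wavenumber resist smoothing can be seen from a textbook local Fourier analysis (not Wave-ADR-NS itself). For the 1D model problem $-u'' - k^2 u = f$ with the standard 3-point stencil on spacing $h$, weighted Jacobi multiplies the Fourier error mode of frequency $\theta$ by $1 - \omega\,(4\sin^2(\theta/2) - (kh)^2)/(2 - (kh)^2)$. The 1D setting, the damping weight $\omega = 2/3$ and the sample values of $k$ and $h$ are our assumptions.

```python
import math


def jacobi_amplification(theta: float, k: float, h: float, omega: float = 2.0 / 3.0) -> float:
    """Weighted-Jacobi error amplification for the Fourier mode with frequency
    theta (radians per grid cell) of the 1D operator -u'' - k^2 u discretized
    with the 3-point stencil: 1 - omega * (4 sin^2(theta/2) - (kh)^2) / (2 - (kh)^2)."""
    kh2 = (k * h) ** 2
    diag = 2.0 - kh2  # Jacobi diagonal, scaled by h^2
    if diag <= 0.0:
        raise ValueError("need k*h < sqrt(2) for a positive Jacobi diagonal")
    symbol = 4.0 * math.sin(theta / 2.0) ** 2 - kh2  # eigenvalue of the operator, times h^2
    return 1.0 - omega * symbol / diag


if __name__ == "__main__":
    k, h = 100.0, 1.0 / 1024
    theta_res = 2.0 * math.asin(k * h / 2.0)  # mode whose eigenvalue vanishes
    for name, theta in [("smooth", 0.01), ("near-k", theta_res), ("oscillatory", math.pi)]:
        print(f"{name:12s} theta={theta:.4f}  factor={jacobi_amplification(theta, k, h):+.4f}")
```

Oscillatory modes are damped (factor of magnitude about 1/3 here), the near-wavenumber mode has a factor of exactly 1 and is not reduced at all, and smooth modes are slightly amplified on the fine grid, so both of the latter must be handled elsewhere, e.g. on coarser levels or by a separate correction such as the ADR solve the abstract describes.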

Submitted 3 April, 2024; originally announced April 2024.

MSC Class: 65N22; 65N55; 68T07
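The Wave-ADR-NS abstract relies on local smoothing removing error components whose frequencies are far from the wavenumber. The sketch below shows the classical analogue of that idea for the 1D Poisson problem (wavenumber zero), where "far" means high-frequency. It is an illustration under that assumption, not the paper's learned Helmholtz smoother. The weighted-Jacobi factor ω = 2/3 and the helper names are assumptions. Each discrete sine mode is an eigenvector of the tridiagonal Laplacian, so the sweep scales it by 1 − 2ω sin²(kπh/2). A smooth mode should barely change, while a high mode should shrink by orders of magnitude.

```python
import math
from typing import List


def jacobi_sweeps(e: List[float], omega: float = 2.0 / 3.0, sweeps: int = 5) -> List[float]:
    """Weighted Jacobi on A e = 0 with A = tridiag(-1, 2, -1) and Dirichlet boundaries.

    With a zero right-hand side the iterate is itself the error, so this
    shows directly how much each error component is damped.
    """
    n = len(e)
    for _ in range(sweeps):
        new = [0.0] * n
        for i in range(n):
            left = e[i - 1] if i > 0 else 0.0
            right = e[i + 1] if i < n - 1 else 0.0
            # Plain Jacobi for row i gives (left + right) / 2; blend it with weight omega.
            new[i] = (1.0 - omega) * e[i] + omega * 0.5 * (left + right)
        e = new
    return e


def sine_mode(k: int, n: int) -> List[float]:
    """Discrete Fourier sine mode k on n interior points (an eigenvector of A)."""
    return [math.sin(k * math.pi * (j + 1) / (n + 1)) for j in range(n)]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


n = 63
for k in (1, 48):  # a smooth mode and a high-frequency mode
    e0 = sine_mode(k, n)
    ratio = norm(jacobi_sweeps(e0)) / norm(e0)
    print(f"mode k={k:2d}: error norm reduced to {ratio:.4f} of its initial value")
```

For Helmholtz, the modes that resist this kind of smoothing sit near the wavenumber rather than at the low end of the spectrum. The abstract routes those modes to a separate coarse-scale ADR solve.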

arXiv:2403.17560 [pdf, other]

Anomalous shift in Andreev reflection from side incidence

Authors: Runze Li, Chaoxi Cui, Ying Liu, Zhi-Ming Yu, Shengyuan A. Yang

Abstract: Andreev reflection at a normal-superconductor interface may be accompanied by an anomalous spatial shift. Studies so far have been limited to the top incidence configuration. Here, we investigate this effect in the side incidence configuration, with the interface parallel to the principal axis of the superconductor. We find that the shift exhibits rich behaviors reflecting the character of the pair potential. It has two contributions: one from the $k$-dependent phase of the pair potential, and the other from the evanescent mode. For chiral $p$-wave pairing, the pairing phase contribution is proportional to the chirality of pairing and is independent of excitation energy, whereas the evanescent mode contribution is independent of chirality and is nonzero only for excitation energy below the gap. The two contributions also have opposite parity with respect to the incident angle. For $d_{x^{2}-y^{2}}$-wave pairing, only the evanescent mode contribution exists, and the shift exhibits suppressed zones in incident angles, manifesting the superconducting nodes. The dependence of the shift on other factors, such as the angle of the incident plane and Fermi surface anisotropy, is also discussed.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.13358 [pdf, other]

GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot

Authors: Wenxuan Song, Han Zhao, Pengxiang Ding, Can Cui, Shangke Lyu, Yaning Fan, Donglin Wang

Abstract: Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose GeRM (Generalist Robotic Model). We utilize offline reinforcement learning to optimize data utilization strategies to learn from both demonstrations and sub-optimal data, thus surpassing the limitations of human demonstrations. Thereafter, we employ a transformer-based VLA network to process multi-modal inputs and output actions. By introducing the Mixture-of-Experts structure, GeRM allows faster inference with higher overall model capacity, and thus resolves the issue of limited RL parameters, enhancing model performance in multi-task learning while controlling computational costs. Through a series of experiments, we demonstrate that GeRM outperforms other methods across all tasks, while also validating its efficiency in both training and inference processes. Additionally, we uncover its potential to acquire emergent skills. We also contribute the QUARD-Auto dataset, collected automatically to support our training approach and foster advancements in multi-task quadruped robot learning. This work presents a new paradigm for reducing the cost of collecting robot data and driving progress in the multi-task learning community. The project page and video are available at https://songwxuan.github.io/GeRM/ .

Submitted 9 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.
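The GeRM abstract credits the Mixture-of-Experts structure with higher model capacity at lower inference cost. The mechanism behind that trade-off is sparse top-k routing, sketched below in a generic form. The gating vectors, expert callables, and k = 2 are hypothetical choices and do not describe GeRM's actual architecture.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: List[float], gate_w: List[List[float]],
                experts: List[Expert], k: int = 2) -> List[float]:
    """Sparse Mixture-of-Experts layer with top-k routing (hypothetical sketch).

    gate_w[e] is a linear gating vector for expert e. Only the k experts with the
    highest gate logits are evaluated, so per-input compute grows with k while
    total parameter count grows with len(experts). Each expert is assumed to
    return a vector with the same length as x.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_w]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    weights = softmax([logits[e] for e in top])  # renormalised over selected experts
    out = [0.0] * len(x)
    for weight, e in zip(weights, top):
        y = experts[e](x)  # unselected experts are never called
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out
```

Adding experts raises capacity without changing per-input cost, because only k of them run for any given input.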

arXiv:2403.10308 [pdf, other]

Eigenvalues of Dual Hermitian Matrices with Application in Formation Control

Authors: Liqun Qi, Chunfeng Cui

Abstract: We propose a supplement matrix method for computing eigenvalues of a dual Hermitian matrix, and discuss its application in multi-agent formation control. Suppose we have a ring, which can be the real field, the complex field, or the quaternion ring. We study dual number symmetric matrices, dual complex Hermitian matrices and dual quaternion Hermitian matrices in a unified frame of dual Hermitian matrices. An $n \times n$ dual Hermitian matrix has $n$ dual number eigenvalues. We define the determinant, characteristic polynomial and supplement matrices for a dual Hermitian matrix. Supplement matrices are Hermitian matrices in the original ring. The standard parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of the standard part Hermitian matrix in the original ring, while the dual parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of those supplement matrices. Hence, by applying any practical method for computing eigenvalues of Hermitian matrices in the original ring, we have a practical method for computing eigenvalues of a dual Hermitian matrix. We call this method the supplement matrix method. In multi-agent formation control, a desired relative configuration scheme may be given. One needs to know whether this scheme is reasonable, that is, whether a feasible configuration of the agents exists. By exploring the eigenvalue problem of dual Hermitian matrices, and its link with the unit gain graph theory, we open a cross-disciplinary approach to solve the relative configuration problem. Numerical experiments are reported.

Submitted 1 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.12988
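The abstract above states that the standard parts of the dual eigenvalues are eigenvalues of the standard part and that the dual parts come from supplement matrices. A minimal sketch covers only the simplest case, a real symmetric 2×2 dual matrix A_st + εA_du (ε² = 0) whose standard eigenvalues are distinct. Expanding (A_st + εA_du)(u + εv) = (λ + εμ)(u + εv) and multiplying by uᵀ gives μ = uᵀA_du u for a unit eigenvector u of A_st. The function names are hypothetical. Repeated standard eigenvalues, where the general supplement-matrix construction matters, are deliberately not handled.

```python
import math
from typing import List, Tuple

Matrix2 = List[List[float]]
Vec2 = Tuple[float, float]


def sym_eig2(a: Matrix2) -> Tuple[float, float, Vec2, Vec2]:
    """Eigen-decomposition of a real symmetric 2x2 matrix [[p, q], [q, r]].

    Returns (lam_lo, lam_hi, u_lo, u_hi) with unit eigenvectors, using the
    rotation angle theta = atan2(2q, p - r) / 2.
    """
    p, q, r = a[0][0], a[0][1], a[1][1]
    mean = 0.5 * (p + r)
    rad = math.hypot(0.5 * (p - r), q)
    theta = 0.5 * math.atan2(2.0 * q, p - r)
    u_hi = (math.cos(theta), math.sin(theta))
    u_lo = (-math.sin(theta), math.cos(theta))
    return mean - rad, mean + rad, u_lo, u_hi


def quad_form(a: Matrix2, u: Vec2) -> float:
    """u^T A u for a 2x2 matrix."""
    return (u[0] * (a[0][0] * u[0] + a[0][1] * u[1])
            + u[1] * (a[1][0] * u[0] + a[1][1] * u[1]))


def dual_sym_eig2(a_st: Matrix2, a_du: Matrix2,
                  tol: float = 1e-12) -> List[Tuple[float, float]]:
    """(standard, dual) eigenvalue pairs of A_st + eps * A_du with eps^2 = 0.

    Only distinct standard eigenvalues are handled: the dual part is then
    u^T A_du u for the unit eigenvector u of A_st.
    """
    lo, hi, u_lo, u_hi = sym_eig2(a_st)
    if hi - lo < tol:
        raise ValueError("repeated standard eigenvalue: not covered by this sketch")
    return [(lo, quad_form(a_du, u_lo)), (hi, quad_form(a_du, u_hi))]
```

Because ε² = 0, the dual part is exactly the first-order sensitivity of each eigenvalue. The finite-difference check on A_st + tA_du for small t below relies on this.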

arXiv:2403.09094 [pdf, other]

Digitization of Astronomical Photographic Plate of China and Astrometric Measurement of Single-exposure Plates

Authors: Zheng-Jun Shang, Yong Yu, Liang-Liang Wang, Mei-Ting Yang, **g Yang, Shi-Yin Shen, Min Liu, Quan-Feng Xu, Chen-Zhou Cui, Dong-Wei Fan, Zheng-Hong Tang, Jian-Hai Zhao

Abstract: From the mid-19th century to the end of the 20th century, photographic plates served as the primary detectors for astronomical observations. Astronomical photographic observations in China began in 1901, and over a century approximately 30,000 astronomical photographic plates were taken. These historical plates play an irreplaceable role in long-term, time-domain astronomical research. To preserve and explore these valuable original observational data, Shanghai Astronomical Observatory organized the transportation of plates taken at night from various stations across the country to the Sheshan Plate Archive for centralized preservation. Statistics on the plate information were compiled for the first time. On this basis, the plates were cleaned and digitally scanned, and digitized images were acquired for 29,314 plates. In this study, using Gaia DR2 as the reference star catalog, astrometric processing, including object extraction, stellar identification, and plate model computation, has been carried out successfully on 15,696 single-exposure plates. As a result, for long focal length telescopes, such as the 40 cm double-tube refractor telescope and the 1.56 m reflector telescope at the Shanghai Astronomical Observatory and the 1 m reflector telescope at the Yunnan Astronomical Observatory, the astrometric accuracy obtained for their plates is approximately 0.1" to 0.3". For medium and short focal length telescopes, the astrometric accuracy ranges from 0.3" to 1.0". The relevant data of this batch of plates, including digitized images and stellar catalogs of the plates, are archived and released by the National Astronomical Data Center. Users can access and download plate data based on keywords such as station, telescope, observation year, and observed celestial coordinates.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted for Research in Astronomy and Astrophysics, 17 pages, 14 figures, 6 tables. Database, https://nadc.china-vo.org/res/r100742/
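The plate-digitization abstract mentions "plate model computation" against Gaia DR2 reference stars without giving the model's form. A common starting point is the linear six-constant plate model, which fits ξ ≈ a·x + b·y + c and η ≈ d·x + e·y + f by least squares. Here (x, y) are measured plate positions and (ξ, η) are standard coordinates of reference stars. The sketch below fits one coordinate at a time. The model order and all function names are assumptions; the paper may use higher-order terms.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(m)
    # Absolute threshold is scale-dependent; centre/scale x, y for real plates.
    if abs(d) < 1e-12:
        raise ValueError("degenerate (e.g. collinear) reference stars")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = v[i]
        sol.append(_det3(mc) / d)
    return sol


def fit_linear_plate(xy: Sequence[Point], target: Sequence[float]) -> List[float]:
    """Least-squares constants (a, b, c) for target ~ a*x + b*y + c.

    Builds the normal equations (X^T X) p = X^T t with design rows [x, y, 1].
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for (x, y), t in zip(xy, target):
        row = (x, y, 1.0)
        for i in range(3):
            atb[i] += row[i] * t
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atb)


def rms_residual(xy: Sequence[Point], target: Sequence[float], consts: List[float]) -> float:
    """Root-mean-square residual of the fitted model (an accuracy estimate)."""
    a, b, c = consts
    res = [t - (a * x + b * y + c) for (x, y), t in zip(xy, target)]
    return (sum(r * r for r in res) / len(res)) ** 0.5
```

In practice ξ and η are fitted separately with the same design matrix. The RMS residuals, converted to arcseconds, give the kind of per-plate accuracy figure quoted in the abstract.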

arXiv:2403.08167 [pdf, other]

MolBind: Multimodal Alignment of Language, Molecules, and Proteins

Authors: Teng Xiao, Chao Cui, Huaisheng Zhu, Vasant G. Honavar

Abstract: Recent advancements in biology and chemistry have leveraged multi-modal learning, integrating molecules and their natural language descriptions to enhance drug discovery. However, current pre-training frameworks are limited to two modalities, and designing a unified network to process different modalities (e.g., natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) remains challenging due to inherent gaps among them. In this work, we propose MolBind, a framework that trains encoders for multiple modalities through contrastive learning, mapping all modalities to a shared feature space for multi-modal semantic alignment. To facilitate effective pre-training of MolBind on multiple modalities, we also build and collect a high-quality dataset with four modalities, MolBind-M4, including graph-language, conformation-language, graph-conformation, and conformation-protein paired data. MolBind shows superior zero-shot learning performance across a wide range of tasks, demonstrating its strong capability of capturing the underlying semantics of multiple modalities.

Submitted 2 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.
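MolBind aligns modality encoders "through contrastive learning" in a shared feature space. A widely used objective for this kind of pairwise alignment is the symmetric (CLIP-style) InfoNCE loss, sketched below for one batch of paired embeddings. MolBind's exact loss and temperature are not given in the abstract, so this choice and the temperature of 0.1 are assumptions.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _normalize(v: Vec) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(a: List[Vec], b: List[Vec], temperature: float = 0.1) -> float:
    """Contrastive loss where a[i] and b[i] are a positive pair (e.g. molecule, text).

    Every other item in the batch serves as a negative, in both the a->b and
    the b->a direction; the two cross-entropy terms are averaged.
    """
    a_n = [_normalize(v) for v in a]
    b_n = [_normalize(v) for v in b]
    n = len(a_n)
    sim = [[sum(x * y for x, y in zip(a_n[i], b_n[j])) / temperature
            for j in range(n)] for i in range(n)]
    loss_ab = sum(_logsumexp(sim[i]) - sim[i][i] for i in range(n)) / n
    loss_ba = sum(_logsumexp([sim[i][j] for i in range(n)]) - sim[j][j]
                  for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)
```

Correctly paired embeddings give a loss near zero. Swapped pairs are penalized heavily, and indistinguishable embeddings give exactly log(batch size).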

arXiv:2403.06570 [pdf, other]

Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications

Authors: Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent

Abstract: Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data. We present a novel study aiming to optimize the use of a Speaker-Attributed ASR (SA-ASR) system in real-life scenarios, such as the AMI meeting corpus, for improved speaker assignment of speech segments. First, we propose a pipeline tailored to real-life applications involving Voice Activity Detection (VAD), Speaker Diarization (SD), and SA-ASR. Second, we advocate using VAD output segments to fine-tune the SA-ASR model, considering that it is also applied to VAD segments during testing, and show that this results in a relative reduction in Speaker Error Rate (SER) of up to 28%. Finally, we explore strategies to enhance the extraction of the speaker embedding templates used as inputs by the SA-ASR system. We show that extracting them from SD output rather than annotated speaker segments results in a relative SER reduction of up to 20%.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Submitted to Odyssey 2024

arXiv:2403.01145 [pdf, other]

Mirror real Chern insulator in two and three dimensions

Authors: Yang Wang, Chaoxi Cui, Run-Wu Zhang, Xiaotian Wang, Zhi-Ming Yu, Gui-Bin Liu, Yugui Yao

Abstract: A real Chern insulator (RCI), featuring a real Chern number and a second-order boundary mode, appears in a two-dimensional (2D) system with space-time inversion symmetry ($PT$). Here, we propose a new kind of RCI: the mirror real Chern insulator (MRCI), which emerges in systems with an additional horizontal mirror symmetry $M_z$. The MRCI is generally characterized by two independent real Chern numbers, respectively defined in the two mirror subsystems of the system. Hence, the MRCI may host second-order boundary modes different from those of the conventional RCI. We show that for spinless systems, the definition of the MRCI is straightforward, as $PT$ keeps each mirror subsystem invariant. For spinful systems with both $PT$ and $M_z$, the real Chern number for the total system remains well defined, as $M_zPT = C_{2z}T$ and $(C_{2z}T)^2 = 1$. However, since $C_{2z}T$ exchanges the two mirror subsystems, the definition of the MRCI in spinful systems requires the help of projective symmetry algebra. We also discuss MRCIs in 3D systems, where the MRCI is defined on certain mirror-invariant 2D planes. Compared with its 2D counterpart, the 3D MRCI can exhibit richer physics when the system has additional nonsymmorphic operators. Several concrete MRCI models, including 2D and 3D, spinless and spinful models, are constructed to further demonstrate our ideas.

Submitted 6 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.00371 [pdf, other]

Quasi-one-dimensional spin transport in altermagnetic $Z^3$ nodal net metals

Authors: Tingli He, Lei Li, Chaoxi Cui, Run-Wu Zhang, Zhi-Ming Yu, Guodong Liu, Xiaoming Zhang

Abstract: In three dimensions, quasi-one-dimensional (Q1D) transport has traditionally been associated with systems featuring a Q1D chain structure. Here, based on first-principles calculations, we go beyond this common belief to show that Q1D transport can also be realized in many three-dimensional (3D) altermagnetic (AM) metals with a topological nodal net in momentum space but lacking a Q1D chain structure in real space, including the existing compounds $β$-Fe$_2$(PO$_4$)O, Co$_2$(PO$_4$)O, and LiTi$_2$O$_4$. These materials exhibit an AM ground state and feature an ideal crossed $Z^3$ Weyl nodal line in each spin channel, formed by three straight and flat nodal lines traversing the entire Brillouin zone. These nodal lines eventually lead to an AM $Z^3$ nodal net. Surprisingly, the longitudinal conductivity $σ_{xx}$ in these topological nodal net metals is dozens of times larger than $σ_{yy}$ and $σ_{zz}$ in the up-spin channel, while $σ_{yy}$ dominates transport in the down-spin channel. This suggests a distinctive Q1D transport signature in each spin channel, with orthogonal principal moving directions for the two spin channels, resulting in Q1D direction-dependent spin transport. This novel phenomenon can be found in neither conventional 3D bulk materials nor Q1D chain materials. In particular, it gradually disappears as the Fermi level moves away from the nodal net, further confirming its topological origin. Our work not only enhances the comprehension of topological physics in altermagnets but also opens a new direction for the exploration of topological spintronics.

Submitted 3 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.19286 [pdf, other]

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intricate spatial interrelations among objects from clinical knowledge. In this research, we introduce a novel universal proposition learning approach, called panoramic renal pathology segmentation (PrPSeg), designed to comprehensively segment panoramic structures within the kidney by integrating extensive knowledge of kidney anatomy. In this paper, we propose (1) the design of a comprehensive universal proposition matrix for renal pathology, facilitating the incorporation of classification and spatial relationships into the segmentation process; (2) a token-based dynamic head single network architecture that improves partial-label image segmentation and supports future data expansion; and (3) an anatomy loss function, quantifying the inter-object relationships across the kidney.

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

arXiv:2402.15771 [pdf, ps, other]

Inertial Accelerated Stochastic Mirror Descent for Large-Scale Generalized Tensor CP Decomposition

Authors: Zehui Liu, Qingsong Wang, Chunfeng Cui, Yong Xia

Abstract: The majority of classic tensor CP decomposition models are designed for squared loss, employing Euclidean distance as a local proximal term. However, the Euclidean distance is unsuitable for the generalized loss function applicable to various types of real-world data, such as integer and binary data. Consequently, algorithms developed under the squared loss are not easily adaptable to handle these generalized losses, partially due to the lack of gradient Lipschitz continuity. This paper considers the generalized tensor CP decomposition. We use the Bregman distance as the proximal term and propose an inertial accelerated block randomized stochastic mirror descent algorithm (iTableSMD). Within a broader multi-block variance reduction and inertial acceleration framework, we demonstrate the sublinear convergence rate for the subsequential sequence produced by the iTableSMD algorithm. We further show that iTableSMD requires at most $O(\varepsilon^{-2})$ iterations in expectation to attain an $\varepsilon$-stationary point and establish the global convergence of the sequence. Numerical experiments on real datasets demonstrate that our proposed algorithm is efficient and achieves better performance than the existing state-of-the-art methods.

Submitted 24 February, 2024; originally announced February 2024.
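The iTableSMD abstract replaces the Euclidean proximal term with a Bregman distance so that losses without Lipschitz gradients (e.g. count data) can be handled. The sketch below illustrates the core mirror-descent step with the entropy kernel h(x) = Σ xᵢ log xᵢ − xᵢ on a Poisson-type loss f(x) = Σ (xᵢ − yᵢ log xᵢ). It is deliberately simplified: a deterministic, single-block update without the paper's random block sampling, variance reduction, or inertia. The loss, step size, and function names are assumptions. Minimizing ⟨g, z⟩ + (1/η)·D_h(z, x) gives the closed form z = x·exp(−ηg), which keeps iterates positive.

```python
import math
from typing import List, Sequence


def kl_bregman(x: Sequence[float], y: Sequence[float]) -> float:
    """Bregman distance generated by h(x) = sum x log x - x (generalised KL)."""
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))


def poisson_grad(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Gradient of f(x) = sum(x_i - y_i * log x_i); it blows up as x_i -> 0."""
    return [1.0 - yi / xi for xi, yi in zip(x, y)]


def mirror_descent(y: Sequence[float], x0: Sequence[float],
                   step: float = 0.5, iters: int = 200) -> List[float]:
    """Entropic mirror descent: x <- argmin_z <g, z> + D_h(z, x) / step = x * exp(-step * g)."""
    x = list(x0)
    for _ in range(iters):
        g = poisson_grad(x, y)
        x = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
    return x
```

The multiplicative update never leaves the positive orthant, so no projection or Lipschitz-based step-size rule is needed. This motivates Bregman geometry for such losses.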

arXiv:2402.12988 [pdf, other]

Spectral Properties of Dual Unit Gain Graphs

Authors: Chunfeng Cui, Yong Lu, Liqun Qi, Ligong Wang

Abstract: In this paper, we study dual quaternion and dual complex unit gain graphs and their spectral properties in a unified frame of dual unit gain graphs. Unit dual quaternions represent rigid movements in 3D space and have wide applications in robotics and computer graphics. Dual complex numbers have recently found applications in brain science. We establish the interlacing theorem for dual unit gain graphs, and show that the spectral radius of a dual unit gain graph is never greater than the spectral radius of the underlying graph, and that these two radii are equal if and only if the dual gain graph is balanced. By using the dual cosine functions, we establish the closed form of eigenvalues of adjacency and Laplacian matrices of dual complex and quaternion unit gain cycles. We then show that the coefficient theorem holds for dual unit gain graphs. Similar results hold for the spectral radius of the Laplacian matrix of the dual unit gain graph.

Submitted 7 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12413 [pdf]

Global Data in Astronomy: Challenges and Opportunities

Authors: Renee Hložek, Chenzhou Cui, Mark Allen, Patricia Whitelock, Jess McIver, Giuseppe Longo, Christopher Fluke, Ajit Kembhavi, Pranav Sharma, Ashish Mahabal

Abstract: Policy Brief on "Global Data in Astronomy: Challenges and Opportunities", distilled from the corresponding panel that was part of the discussions during the S20 Policy Webinar on Astroinformatics for Sustainable Development held on 6-7 July 2023. Astronomy is increasingly becoming a data-driven science. Advances in our understanding of the physical mechanisms at work in the Universe require building ever-more sensitive telescopes to gather observations of the cosmos to test and advance our theoretical models of how the universe works. To confront the observed data with our theoretical models, we require data hosting, archiving and storage, as well as high-performance computing resources to run the theoretical calculations and compare our simulated and observed universe. We also require the development of highly skilled human resources. Newer large projects are often run through international collaborations and partnerships, driving a need for 'open science' and collaborative structures across national boundaries. While astronomical data are scientifically useful, they do not come with the same ethical/privacy-related restrictions as medical/biological data. Moreover, the ability to use data for new scientific analysis extends and expands the impact and reach of scientific surveys -- a strength that national funding agencies should capitalize on. We discuss the management and analysis of such large volumes of data and the corresponding significant challenges that require policy-level preparations. The policy webinar took place during the G20 presidency in India (2023). A summary based on the seven panels can be found here: arXiv:2401.04623.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 5 pages. The panel videos including keynotes and the white papers are available on the S20 site at: https://s20india.org/science-policy-webinar-astroinformatics-for-sustainable-development/

arXiv:2402.11411 [pdf, other]

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

Authors: Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Abstract: Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. This procedure is not perfect and can cause the model to hallucinate, i.e., provide answers that do not accurately reflect the image, even when the core LLM is highly factual and the vision backbone has sufficiently complete representations. In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning. Specifically, we propose POVID to generate feedback data with AI models. We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data. First, we prompt GPT-4V to inject plausible hallucinations into the correct answer. Second, we distort the image to trigger the inherent hallucination behavior of the VLLM. This automated approach does not rely on human data generation or require a perfect expert, which makes it easily scalable. Finally, both of these generation strategies are integrated into an RLHF pipeline via Direct Preference Optimization. In experiments across broad benchmarks, we show that we can not only reduce hallucinations, but also improve model performance across standard benchmarks, outperforming prior approaches. Our data and code are available at https://github.com/YiyangZhou/POVID.

Submitted 17 February, 2024; originally announced February 2024.
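POVID feeds its preferred (ground-truth) and dispreferred (hallucinated) responses into Direct Preference Optimization. The standard per-pair DPO loss is −log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]). A minimal sketch of that loss follows, with the inputs being summed response log-probabilities. The value β = 0.1 and the function name are assumptions, and POVID's training details are not reproduced.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model. The loss is
    -log(sigmoid(margin)), computed stably as softplus(-margin).
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy equals the reference, the margin is zero and the loss is log 2. The loss falls as the policy raises the preferred answer's likelihood relative to the hallucinated one, measured against the reference model.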

arXiv:2402.10042 [pdf, ps, other]

Dynamics of cold circumstellar gas in debris disks

Authors: Can Cui, Sebastian Marino, Quentin Kral, Henrik Latter

Abstract: Mounting observational evidence indicates that cold circumstellar gas is present in debris disk systems. This work focuses on various dynamical processes that debris-disk gas may undergo. We review five mechanisms that can transport angular momentum and their applications to debris disks. These include molecular viscosity, hydrodynamic turbulence, magnetohydrodynamic turbulence, magnetized disk winds, and laminar magnetic stress. We find that molecular viscosity can result in $α$ values of up to $\sim 0.1$ for sufficiently low densities, while the Rossby wave instability is a possible source of hydrodynamic turbulence and structure formation. We argue that the vertical shear instability is unlikely due to the long cooling times. The onset of the magnetorotational instability (MRI) is dichotomous: for low-density disks the MRI can be excited at the midplane, while for high-mass disks it may only operate at $z>2-3H$, if at all. The MHD wind and laminar magnetic stress mechanisms rely on the configuration and strength of any background large-scale magnetic field, the existence of which is uncertain and possibly unlikely. We conclude that the dominant mechanism and its efficiency in transporting angular momentum vary from one system to another, depending especially closely on the gas density. More detailed analyses focusing on representative, nearby debris disks should be performed in the future.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 16 pages, 9 figures, submitted to MNRAS and revised

arXiv:2402.02442 [pdf, other]

A Momentum Accelerated Algorithm for ReLU-based Nonlinear Matrix Decomposition

Authors: Qingsong Wang, Chunfeng Cui, Deren Han

Abstract: Recently, there has been growing interest in Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD), we propose a Tikhonov regularized ReLU-NMD model, referred to as ReLU-NMD-T. Subsequently, we introduce a momentum accelerated algorithm for handling the ReLU-NMD-T model. A distinctive feature, setting our work apart from most existing studies, is the incorporation of both positive and negative momentum parameters in our algorithm. Our numerical experiments on real-world datasets show the effectiveness of the proposed model and algorithm. The code is available at https://github.com/nothing2wang/NMD-TM.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 5 pages, 7 figures
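The ReLU-NMD abstract seeks a low-rank Θ with max(0, Θ) ≈ X for a sparse nonnegative X, adds Tikhonov regularization, and uses momentum parameters that may be positive or negative. The rank-1 sketch below is a hypothetical illustration of that structure and not the authors' algorithm (see their repository for that). It alternates three steps. First, a latent Z equals X on positive entries and min(0, Θ) on zero entries. Second, an extrapolation Z + β(Z − Z_prev) is applied, with β allowed to be negative. Third, ridge least-squares updates are made to the factors u and v. Placing the Tikhonov penalty on the factors is an assumption.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def relu_nmd_rank1(x: Matrix, lam: float = 1e-3, beta: float = 0.0,
                   iters: int = 100) -> Tuple[List[float], List[float]]:
    """Hypothetical rank-1 ReLU-NMD sketch: find u, v with max(0, u v^T) ~ X."""
    m, n = len(x), len(x[0])
    u, v = [1.0] * m, [1.0] * n
    z_prev: Optional[Matrix] = None
    for _ in range(iters):
        theta = [[u[i] * v[j] for j in range(n)] for i in range(m)]
        # Latent Z: fixed to X where X > 0; on zeros, the closest value <= 0 to theta.
        z = [[x[i][j] if x[i][j] > 0 else min(0.0, theta[i][j]) for j in range(n)]
             for i in range(m)]
        # Momentum-style extrapolation; beta may be positive or negative.
        z_ext = z if z_prev is None else [
            [z[i][j] + beta * (z[i][j] - z_prev[i][j]) for j in range(n)] for i in range(m)]
        z_prev = z
        # Ridge (Tikhonov) updates minimising ||Z - u v^T||^2 + lam (||u||^2 + ||v||^2).
        vv = max(sum(b * b for b in v) + lam, 1e-12)
        u = [sum(z_ext[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        uu = max(sum(a * a for a in u) + lam, 1e-12)
        v = [sum(z_ext[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
    return u, v


def relu_error(x: Matrix, u: List[float], v: List[float]) -> float:
    """Frobenius error between max(0, u v^T) and X."""
    return math.sqrt(sum((max(0.0, u[i] * v[j]) - x[i][j]) ** 2
                         for i in range(len(x)) for j in range(len(x[0]))))
```

With lam = 0 and beta = 0, each step is an exact block minimization, so the latent objective never increases. Nonzero momentum can speed up convergence but gives up that monotonicity guarantee.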

arXiv:2401.16150 [pdf]

Sliding ferroelectric memories and synapses

Authors: Xiuzhen Li, Biao Qin, Yaxian Wang, Yue Xi, Zhiheng Huang, Mengze Zhao, Yalin Peng, Zitao Chen, Zitian Pan, Jundong Zhu, Chenyang Cui, Rong Yang, Wei Yang, Sheng Meng, Dongxia Shi, Xuedong Bai, Can Liu, Na Li, Jianshi Tang, Kaihui Liu, Luojun Du, Guangyu Zhang

Abstract: Ferroelectric materials with switchable electric polarization hold great promise for a plethora of emergent applications, such as post-Moore's law nanoelectronics, beyond-Boltzmann transistors, non-volatile memories, and above-bandgap photovoltaic devices. Recent advances have uncovered an exotic sliding ferroelectric mechanism, which makes it possible to design atomically thin ferroelectrics from non-ferroelectric parent monolayers. Although notable progress has been made in understanding its fundamental properties, functional devices based on sliding ferroelectrics, the key touchstone toward applications, remain elusive. Here, we demonstrate rewritable, non-volatile memory devices at room temperature utilizing a two-dimensional (2D) sliding ferroelectric semiconductor of rhombohedral-stacked bilayer molybdenum disulfide. The 2D sliding ferroelectric memories (SFeMs) show superior performance with a large memory window of >8 V, a high conductance ratio of above $10^6$, a long retention time of >10 years, and a programming endurance greater than $10^4$ cycles. Remarkably, flexible SFeMs are achieved with state-of-the-art performance competitive with their rigid counterparts, and they maintain their performance after bending for over $10^3$ cycles. Furthermore, synapse-specific Hebbian forms of plasticity and image recognition with a high accuracy of 97.81% are demonstrated based on flexible SFeMs. Our work demonstrates sliding ferroelectric memories and synaptic plasticity on both rigid and flexible substrates, highlighting the great potential of sliding ferroelectrics for emerging technological applications in brain-inspired in-memory computing, edge intelligence, and energy-efficient wearable electronics.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 16 pages, 4 figures

arXiv:2401.13090 [pdf, other]

Variational Estimation for Multidimensional Generalized Partial Credit Model

Authors: Chengyu Cui, Chun Wang, Gongjun Xu

Abstract: Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model (MGPCM). The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.

Submitted 23 January, 2024; originally announced January 2024.

Showing 1–50 of 328 results for author: Cui, C