-
HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization
Authors:
Yucheng Tang,
Yufan He,
Vishwesh Nath,
Pengfeig Guo,
Ruining Deng,
Tianyuan Yao,
Quan Liu,
Can Cui,
Mengmeng Yin,
Ziyue Xu,
Holger Roth,
Daguang Xu,
Haichun Yang,
Yuankai Huo
Abstract:
In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To the best of our knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSIs and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models.
Submitted 3 July, 2024;
originally announced July 2024.
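The pixel-count claim in the HoloHisto abstract can be roughly checked with a short calculation. The sketch below assumes that "4K" means a 3840 x 2160 frame, which the abstract does not state, and uses the patch sizes the abstract lists as examples. Under that assumption a 512 x 512 patch gives a ratio of about 31.6, which is consistent with the "31 times" figure. The 3D figure is not reproduced because the abstract gives no 3D patch size.

```python
# Assumption: "4K" is taken to be 3840 x 2160 pixels; the abstract gives no exact size.
FOUR_K_PIXELS = 3840 * 2160


def pixel_ratio(patch_side: int) -> float:
    """Ratio of the pixel count of one assumed 4K frame to one square patch."""
    return FOUR_K_PIXELS / (patch_side * patch_side)


# Patch sides taken from the examples in the abstract.
ratios = {side: pixel_ratio(side) for side in (256, 512, 1024)}

for side, ratio in ratios.items():
    print(f"one 4K frame holds {ratio:.1f}x the pixels of a {side}x{side} patch")
```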
-
Rossby wave instability in weakly ionized protoplanetary disks. I. azimuthal or vertical B-fields
Authors:
Can Cui,
Ashutosh Tripathi,
Cong Yu,
Min-Kai Lin,
Andrew Youdin
Abstract:
Rossby wave instability (RWI) is considered the underlying mechanism to crescent-shaped azimuthal asymmetries, discovered in (sub-)millimeter dust continuum of many protoplanetary disks. Previous works on linear theory were conducted in the hydrodynamic limit. Nevertheless, protoplanetary disks are likely magnetized and weakly ionized. We examine the influence of magnetic fields and non-ideal magnetohydrodynamic (MHD) effects - namely, Ohmic resistivity, Hall drift, and ambipolar diffusion - on the RWI unstable modes. We perform radially global linear analyses, employing constant azimuthal ($B_φ$) or vertical ($B_z$) background magnetic fields. It is found that, in the ideal MHD regime, magnetism can either enhance or diminish RWI growth. Strong non-ideal MHD effects cause RWI growth rates to recover hydrodynamic results. The sign of Hall Elsässer number subtly complicates the results, and vertical wavenumbers generically diminish growth rates.
Submitted 2 July, 2024;
originally announced July 2024.
-
HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis
Authors:
Ruining Deng,
Quan Liu,
Can Cui,
Tianyuan Yao,
Juming Xiong,
Shunxing Bao,
Hao Li,
Mengmeng Yin,
Yu Wang,
Shilin Zhao,
Yucheng Tang,
Haichun Yang,
Yuankai Huo
Abstract:
Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, yet eliminating the need for manual prompt generation in conventional segment anything model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs.
Submitted 30 June, 2024;
originally announced July 2024.
-
Extended alternating structure-adapted proximal gradient algorithm for nonconvex nonsmooth problems
Authors:
Ying Gao,
Chunfeng Cui,
Wenxing Zhang,
Deren Han
Abstract:
The alternating structure-adapted proximal (ASAP) gradient algorithm (M. Nikolova and P. Tan, SIAM J Optim, 29:2053-2078, 2019) has drawn much attention due to its efficiency in solving nonconvex nonsmooth optimization problems. However, the multiblock nonseparable structure limits the performance of ASAP on far-reaching practical problems, e.g., coupled tensor decomposition. In this paper, we propose an extended ASAP (eASAP) algorithm for nonconvex nonsmooth optimization whose objective is the sum of two nonseparable functions and a coupling one. By exploiting the blockwise restricted prox-regularity, eASAP is capable of minimizing the objective whose coupling function is multiblock nonseparable. Moreover, we analyze the global convergence of eASAP by virtue of the Aubin property on partial subdifferential mapping and the Kurdyka-Łojasiewicz property on the objective. Furthermore, the sublinear convergence rate of eASAP is built upon the proximal point algorithmic framework under some mild conditions. Numerical simulations on multimodal data fusion demonstrate the compelling performance of the proposed method.
Submitted 24 June, 2024;
originally announced June 2024.
-
Variational Monte Carlo Study of the Doped $t$-$J$ Model on Honeycomb Lattice
Authors:
Can Cui,
Jing-Yu Zhao,
Zheng-Yu Weng
Abstract:
The ground state of the bipartite $t$-$J$ model must satisfy a specific sign structure, based on which the single-hole and two-hole ground state $Ans\ddot{a}tze$ on honeycomb lattice are constructed and studied by a variational Monte Carlo (VMC) method. The VMC results are in good agreement with the exact diagonalization (ED) calculation. For the single-hole case, the degenerate ground states are characterized by quantum numbers of a spin-1/2 and an orbital angular momentum $L_z=\pm 2$. The latter is associated with the emergent chiral spin/hole currents mutually surrounding the hole/spin-1/2 as a composite object or ``twisted hole''. A vanishing quasiparticle spectral weight is shown in the large-sample limit. In the two-hole ground state, the holes form a spin-singlet pairing with $d$+$id$ symmetry in the Cooper channel, but are of $s$-wave symmetry as a tightly bound pair of the ``twisted holes''. Such a pairing mechanism of dichotomy can be attributed to eliminating the local spin currents which has nothing to do with the long-range antiferromagnetic correlation. Superconducting ground state at finite doping is briefly discussed in terms of the tightly bound hole pairs as the building blocks.
Submitted 24 June, 2024;
originally announced June 2024.
-
An All-MLP Sequence Modeling Architecture That Excels at Copying
Authors:
Chenwei Cui,
Zehao Yan,
Gedeon Muhawenayo,
Hannah Kerner
Abstract:
Recent work demonstrated Transformers' ability to efficiently copy strings of exponential sizes, distinguishing them from other architectures. We present the Causal Relation Network (CausalRN), an all-MLP sequence modeling architecture that can match Transformers on the copying task. Extending Relation Networks (RNs), we implemented key innovations to support autoregressive sequence modeling while maintaining computational feasibility. We discovered that exponentially-activated RNs are reducible to linear time complexity, and pre-activation normalization induces an infinitely growing memory pool, similar to a KV cache. In ablation study, we found both exponential activation and pre-activation normalization are indispensable for Transformer-level copying. Our findings provide new insights into what actually constitutes strong in-context retrieval.
Submitted 23 June, 2024;
originally announced June 2024.
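The CausalRN abstract says that exponentially-activated Relation Networks are reducible to linear time complexity. One way such a reduction can arise is the identity exp(a + b) = exp(a) * exp(b), which lets a causal sum over token pairs be replaced by a running sum. The sketch below illustrates only that identity. The scalar lists `a` and `b` are hypothetical stand-ins for per-token pre-activations, and nothing here should be read as the paper's actual architecture.

```python
import math


def pairwise_quadratic(a, b):
    """O(n^2): for each position i, sum exp(a[i] + b[j]) over all j <= i."""
    return [
        sum(math.exp(a[i] + b[j]) for j in range(i + 1))
        for i in range(len(a))
    ]


def pairwise_linear(a, b):
    """O(n): factor exp(a[i] + b[j]) as exp(a[i]) * exp(b[j]) and keep a running sum."""
    out = []
    running = 0.0  # running sum of exp(b[j]) for all j seen so far
    for a_i, b_i in zip(a, b):
        running += math.exp(b_i)
        out.append(math.exp(a_i) * running)
    return out


# Hypothetical pre-activations for a four-token sequence.
a = [0.1, -0.3, 0.7, 0.2]
b = [0.5, 0.0, -0.4, 0.9]

slow = pairwise_quadratic(a, b)
fast = pairwise_linear(a, b)
print(all(math.isclose(s, f, rel_tol=1e-9) for s, f in zip(slow, fast)))
```

Both functions compute the same causal sums; the second touches each token once, which is the sense in which the pairwise computation collapses to linear time in this toy setting.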
-
On Physics-Informed Neural Network Control for Power Electronics
Authors:
Peifeng Hui,
Chenggang Cui,
Pengfeng Lin,
Amer M. Y. M. Ghias,
Xitong Niu,
Chuanlin Zhang
Abstract:
Considering the growing necessity for precise modeling of power electronics amidst operational and environmental uncertainties, this paper introduces an innovative methodology that ingeniously combines model-driven and data-driven approaches to enhance the stability of power electronics interacting with grid-forming microgrids. By employing the physics-informed neural network (PINN) as a foundation, this strategy merges robust data-fitting capabilities with fundamental physical principles, thereby constructing an accurate system model. By this means, it significantly enhances the ability to understand and replicate the dynamics of power electronics systems under complex working conditions. Moreover, by incorporating advanced learning-based control methods, the proposed method is enabled to make precise predictions and implement the satisfactory control laws even under serious uncertain conditions. Experimental validation demonstrates the effectiveness and robustness of the proposed approach, highlighting its substantial potential in addressing prevalent uncertainties in controlling modern power electronics systems.
Submitted 22 June, 2024;
originally announced June 2024.
-
Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks
Authors:
Tao Wu,
Canyixing Cui,
Shaojie Qiao,
Chao Wang,
Lin Yuan,
Shui Yu
Abstract:
Graph neural networks (GNNs) have achieved tremendous success, but recent studies have shown that GNNs are vulnerable to adversarial attacks, which significantly hinders their use in safety-critical scenarios. Therefore, the design of robust GNNs has attracted increasing attention. However, existing research has mainly been conducted via experimental trial and error, and thus far, there remains a lack of a comprehensive understanding of the vulnerability of GNNs. To address this limitation, we systematically investigate the adversarial robustness of GNNs by considering graph data patterns, model-specific factors, and the transferability of adversarial examples. Through extensive experiments, a set of principled guidelines is obtained for improving the adversarial robustness of GNNs, for example: (i) rather than highly regular graphs, the training graph data with diverse structural patterns is crucial for model robustness, which is consistent with the concept of adversarial training; (ii) the large model capacity of GNNs with sufficient training data has a positive effect on model robustness, and only a small percentage of neurons in GNNs are affected by adversarial attacks; (iii) adversarial transfer is not symmetric and the adversarial examples produced by the small-capacity model have stronger adversarial transferability. This work illuminates the vulnerabilities of GNNs and opens many promising avenues for designing robust GNNs.
Submitted 19 June, 2024;
originally announced June 2024.
-
GraphMU: Repairing Robustness of Graph Neural Networks via Machine Unlearning
Authors:
Tao Wu,
Xinwen Cao,
Chao Wang,
Shaojie Qiao,
Lin Yuan,
Canyixing Cui,
Yanbing Liu
Abstract:
Graph Neural Networks (GNNs) have demonstrated significant application potential in various fields. However, GNNs are still vulnerable to adversarial attacks. Numerous adversarial defense methods for GNNs have been proposed to address the problem of adversarial attacks. However, these methods can only serve as a defense before poisoning, but cannot repair a poisoned GNN. Therefore, there is an urgent need for a method to repair poisoned GNNs. In this paper, we address this gap by introducing the novel concept of model repair for GNNs. We propose a repair framework, Repairing Robustness of Graph Neural Networks via Machine Unlearning (GraphMU), which aims to fine-tune a poisoned GNN to forget adversarial samples without the need for complete retraining. We also introduce an unlearning validation method to ensure that our approach effectively forgets specified poisoned data. To evaluate the effectiveness of GraphMU, we explore three fine-tuned subgraph construction scenarios based on the available perturbation information: (i) Known Perturbation Ratios, (ii) Known Complete Knowledge of Perturbations, and (iii) Unknown any Knowledge of Perturbations. Our extensive experiments, conducted across four citation datasets and four adversarial attack scenarios, demonstrate that GraphMU can effectively restore the performance of a poisoned GNN.
Submitted 19 June, 2024;
originally announced June 2024.
-
Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics
Authors:
Chenggang Cui,
Jiaming Liu,
Junkang Feng,
Peifeng Hui,
Amer M. Y. M. Ghias,
Chuanlin Zhang
Abstract:
Power electronics, a critical component in modern power systems, face several challenges in control design, including model uncertainties and lengthy, costly design cycles. This paper proposes a Large Language Model (LLM)-based multi-agent framework for objective-oriented control design in power electronics. The framework leverages the reasoning capabilities of LLMs and a multi-agent workflow to develop an efficient and autonomous controller design process. The LLM agent is able to understand and respond to high-level instructions in natural language, adapting its behavior based on the task's specific requirements and constraints from a practical implementation point of view. This novel and efficient approach promises a more flexible and adaptable controller design process in power electronics that will greatly benefit practitioners.
Submitted 18 June, 2024;
originally announced June 2024.
-
WebCanvas: Benchmarking Web Agents in Online Environments
Authors:
Yichen Pan,
Dehan Kong,
Sida Zhou,
Cheng Cui,
Yifei Leng,
Bing Jiang,
Hangyu Liu,
Yanyi Shang,
Shuyan Zhou,
Tongshuang Wu,
Zhengyang Wu
Abstract:
For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions. WebCanvas contains three main components to facilitate realistic assessments: (1) a novel evaluation metric that reliably captures critical intermediate actions or states necessary for task completion while disregarding noise caused by insignificant events or changed web elements; (2) a benchmark dataset called Mind2Web-Live, a refined version of the original Mind2Web static dataset containing 542 tasks with 2439 intermediate evaluation states; (3) lightweight and generalizable annotation tools and testing pipelines that enable the community to collect and maintain a high-quality, up-to-date dataset. Building on WebCanvas, we open-source an agent framework with extensible modules for reasoning, providing a foundation for the community to conduct online inference and evaluations. Our best-performing agent achieves a task success rate of 23.1% and a task completion rate of 48.8% on the Mind2Web-Live test set. Additionally, we analyze the performance discrepancies across various websites, domains, and experimental environments. We encourage the community to contribute further insights on online agent evaluation, thereby advancing this field of research.
Submitted 27 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
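The WebCanvas abstract reports both a task success rate and a task completion rate, measured against intermediate evaluation states. The sketch below shows one plausible reading of how two such numbers can differ. It assumes that success means reaching every annotated state of a task and that completion is the mean fraction of states reached per task; the abstract does not define either metric, and the `TaskResult` type and the four toy results are made up for illustration. The only figures taken from the abstract are the 542 tasks and 2439 states.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    reached: int  # hypothetical: intermediate states the agent reached
    total: int    # hypothetical: intermediate states annotated for the task


def success_rate(results):
    """Assumed definition: share of tasks in which every state was reached."""
    return sum(r.reached == r.total for r in results) / len(results)


def completion_rate(results):
    """Assumed definition: mean fraction of states reached per task."""
    return sum(r.reached / r.total for r in results) / len(results)


# Made-up results for four tasks, for illustration only.
results = [TaskResult(4, 4), TaskResult(2, 5), TaskResult(0, 3), TaskResult(3, 6)]

# From the abstract: 2439 intermediate states over 542 tasks.
avg_states_per_task = 2439 / 542

print(success_rate(results), completion_rate(results), avg_states_per_task)
```

Under these assumed definitions, partial progress raises the completion rate without raising the success rate, which is one way a benchmark could report a completion rate well above its success rate.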
-
Planar Hall Plateau in Magnetic Weyl Semimetals
Authors:
Lei Li,
Chaoxi Cui,
Run-Wu Zhang,
Zhi-Ming Yu,
Yugui Yao
Abstract:
Despite the rapid progress in the study of planar Hall effect (PHE) in recent years, all the previous works only showed that the PHE is connected to local geometric quantities, such as Berry curvature. Here, for the first time, we point out that the PHE in magnetic Weyl semimetals is directly related to a global quantity, namely, the Chern number of the Weyl point. This leads to a remarkable consequence that the PHE observation predicted here is robust against many system details, including the Fermi energy. The main difference between non-magnetic and magnetic Weyl points is that the latter breaks time-reversal symmetry T, thus generally possessing an energy tilt. Via semiclassical Boltzmann theory, we investigate the PHE in generic magnetic Weyl models with energy tilt and arbitrary Chern number. We find that by aligning the magnetic and electric fields in the same direction, the trace of the PHE conductivity contributed from Berry curvature and orbital moment is proportional to the Chern number and the energy tilt of the Weyl points, resulting in previously undiscovered quantized PHE plateau by varying Fermi energy. We further confirm the existence of PHE plateaus in a more realistic lattice model without T symmetry. By proposing a new quantized physical quantity, our work not only provides a new tool for extracting the topological character of the Weyl points but also suggests that the interplay between topology and magnetism can give rise to intriguing physics.
Submitted 17 June, 2024;
originally announced June 2024.
-
Kinematic Model of Magnetic Domain Wall Motion for Fast, High-Accuracy Simulations
Authors:
Kristi Doleh,
Leonard Humphrey,
Chandler M. Linseisen,
Michael D. Kitcher,
Joanna M. Martin,
Can Cui,
Jean Anne C. Incorvia,
Felipe Garcia-Sanchez,
Naimul Hassan,
Alexander J. Edwards,
Joseph S. Friedman
Abstract:
Domain wall (DW) devices have garnered recent interest for diverse applications including memory, logic, and neuromorphic primitives; fast, accurate device models are therefore imperative for large-scale system design and verification. Extant DW motion models are sub-optimal for large-scale system design either over-consuming compute resources with physics-heavy equations or oversimplifying the physics, drastically reducing model accuracy. We propose a DW model inspired by the phenomenological similarities between motions of a DW and a classical object being acted on by forces like air resistance or static friction. Our proposed phenomenological model predicts DW motion within 1.2% on average compared with micromagnetic simulations that are 400 times slower. Additionally our model is seven times faster than extant collective coordinate models and 14 times more accurate than extant hyper-reduced models making it an essential tool for large-scale DW circuit design and simulation. The model is publicly posted along with scripts that automatically extract model parameters from user-provided simulation or experimental data to extend the model to alternative micromagnetic parameters.
Submitted 31 May, 2024;
originally announced June 2024.
-
An implementation of tensor product patch smoothers on GPU
Authors:
Cu Cui,
Paul Grosse-Bley,
Guido Kanschat,
Robert Strzodka
Abstract:
We present a GPU implementation of vertex-patch smoothers for higher order finite element methods in two and three dimensions. Analysis shows that they are not memory bound with respect to GPU DRAM, but with respect to on-chip scratchpad memory. Multigrid operations are optimized through localization and reorganized local operations in on-chip memory, achieving minimal global data transfer and a conflict free memory access pattern. Performance tests demonstrate that the optimized kernel is at least 2 times faster than the straightforward implementation for the Poisson problem, across various polynomial degrees in 2D and 3D, achieving up to 36% of the peak performance in both single and double precision on Nvidia A100 GPU.
Submitted 30 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Multilevel Interior Penalty Methods on GPUs
Authors:
Cu Cui,
Guido Kanschat
Abstract:
We present a matrix-free multigrid method for high-order discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. Leveraging conflict-free access patterns in shared memory, arithmetic throughput of up to 39% of the peak performance on Nvidia A100 GPUs is achieved. Experimental results affirm the effectiveness of mixed-precision approaches and MPI parallelization in accelerating algorithms. Furthermore, an assessment of solver efficiency and robustness is provided across both two and three dimensions, with applications to Poisson problems.
Submitted 30 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Authors:
Quan Liu,
Ruining Deng,
Can Cui,
Tianyuan Yao,
Vishwesh Nath,
Yucheng Tang,
Yuankai Huo
Abstract:
Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing information from accompanying textual pathology information. mTREE innovatively combines - the localization of key areas (global-to-local) and the development of a WSI-level image-text representation (local-to-global) - into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: firstly, functioning as an attention map to accurately identify key areas, and secondly, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks: classification and survival prediction, showcasing its remarkable superiority over baselines.
Submitted 28 May, 2024;
originally announced May 2024.
-
JUNO Sensitivity to Invisible Decay Modes of Neutrons
Authors:
JUNO Collaboration,
Angel Abusleme,
Thomas Adam,
Kai Adamowicz,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli,
Daniel Bick
, et al. (635 additional authors not shown)
Abstract:
We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.
Submitted 27 May, 2024;
originally announced May 2024.
-
Electric Hall Effect and Quantum Electric Hall Effect
Authors:
Chaoxi Cui,
Run-Wu Zhang,
Yilin Han,
Zhi-Ming Yu,
Yugui Yao
Abstract:
Exploring new Hall effects is always a fascinating research topic. The ordinary Hall effect and the quantum Hall effect, initially discovered in two-dimensional (2D) non-magnetic systems, are phenomena in which a transverse current is generated when a system carrying an electron current is placed in a magnetic field perpendicular to the current. In this work, we propose the electric counterparts of these two Hall effects, termed the electric Hall effect (EHE) and the quantum electric Hall effect (QEHE). The EHE and QEHE emerge in 2D magnetic systems, where the transverse current is generated by applying an electric gate-field instead of a magnetic field. We present a symmetry requirement for intrinsic EHE and QEHE. With a weak gate-field, we establish an analytical expression of the intrinsic EHE coefficient. We show that it is determined by intrinsic band geometric quantities: Berry curvature and its polarizability, which consists of both intraband and interband layer polarization. Via first-principles calculations, we investigate the EHE in the monolayer Ca(FeN)$_2$, where a significant EHE coefficient is observed around band crossings. Furthermore, we demonstrate that the QEHE can appear in the semiconductor monolayer $\rm BaMn_2S_3$, whose Hall conductivity exhibits steps that take on the quantized values $0$ and $\pm1$ in units of $e^2/h$ as the gate-field is varied within the experimentally achievable range. Due to the great tunability of the electric gate-field, the EHE and QEHE proposed here can be easily controlled and should have more potential applications.
Submitted 24 May, 2024;
originally announced May 2024.
-
Calibrated Self-Rewarding Vision Language Models
Authors:
Yiyang Zhou,
Zhiyuan Fan,
Dongjie Cheng,
Sihan Yang,
Zhaorun Chen,
Chenhang Cui,
Xiyao Wang,
Yun Li,
Linjun Zhang,
Huaxiu Yao
Abstract:
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This misalignment arises because the model tends to prioritize textual information over visual input, even when both the language model and visual representations are of high quality. Existing methods leverage additional models or human annotations to curate preference data and enhance modality alignment through preference optimization. These approaches may not effectively reflect the target LVLM's preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input. Empirical results demonstrate that CSR enhances performance and reduces hallucinations across ten benchmarks and tasks, achieving substantial improvements over existing methods by 7.62%. Our empirical results are further supported by rigorous theoretical analysis, under mild assumptions, verifying the effectiveness of introducing visual constraints into the self-rewarding paradigm. Additionally, CSR shows compatibility with different vision-language models and the ability to incrementally improve performance through iterative fine-tuning. Our data and code are available at https://github.com/YiyangZhou/CSR.
Submitted 31 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Computational predictions of hydrogen-assisted fatigue crack growth
Authors:
C. Cui,
P. Bortot,
M. Ortolani,
E. Martínez-Pañeda
Abstract:
A new model is presented to predict hydrogen-assisted fatigue. The model combines a phase field description of fracture and fatigue, stress-assisted hydrogen diffusion, and a toughness degradation formulation with cyclic and hydrogen contributions. Hydrogen-assisted fatigue crack growth predictions exhibit an excellent agreement with experiments over all the scenarios considered, spanning multiple load ratios, H2 pressures and loading frequencies. These are obtained without any calibration with hydrogen-assisted fatigue data, taking as input only mechanical and hydrogen transport material properties, the material's fatigue characteristics (from a single test in air), and the sensitivity of fracture toughness to hydrogen content. Furthermore, the model is used to determine: (i) what are suitable test loading frequencies to obtain conservative data, and (ii) the underestimation made when not pre-charging samples. The model can handle both laboratory specimens and large-scale engineering components, enabling the Virtual Testing paradigm in infrastructure exposed to hydrogen environments and cyclic loading.
Submitted 18 May, 2024;
originally announced May 2024.
-
Leveraging Large Language Models for Automated Web-Form-Test Generation: An Empirical Study
Authors:
Tao Li,
Chenhui Cui,
Lei Ma,
Dave Towey,
Yujie Xie,
Rubing Huang
Abstract:
The testing of web forms is an essential activity for ensuring the quality of web applications, which mainly involves evaluating the interactions between users and forms. Automated test-case generation remains a challenge for web-form testing: Due to the complex, multi-level structure of web pages, it can be difficult to automatically capture their inherent contextual information for inclusion in the tests. Large Language Models (LLMs) have great potential for contextual text generation. OpenAI's GPT LLMs have been receiving a lot of attention in software testing; however, they may fail to be applied in practice because of information security concerns. To the best of our knowledge, no comparative study examining different LLMs has yet been reported for web-form-test generation. To address this gap in the literature, we conducted a comprehensive empirical study investigating the effectiveness of 11 LLMs on 146 web forms from 30 open-source Java web applications. According to the experimental results, different LLMs can achieve different testing effectiveness. Notably, the GPT-4, GLM-4, and Baichuan2 LLMs can generate better web-form tests than the others. Compared with GPT-4, other LLMs find it difficult to generate appropriate tests for web forms, resulting in decreased successfully-submitted rates (SSRs, measured by the proportions of the LLM-generated web-form tests that can be successfully inserted into the web forms and submitted) ranging from 9.10% to 74.15%. Nevertheless, some LLMs achieve higher SSRs than GPT-3.5, indicating a better ability to generate appropriate tests for web forms. Our findings also show that, for all LLMs, when the designed prompts include complete and clear contextual information about the web forms, more effective web-form tests are generated. Finally, we offer some insights for using LLMs to guide automated web-form testing.
Submitted 16 May, 2024;
originally announced May 2024.
-
A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds
Authors:
Christopher Z. Cui,
Xiangyu Peng,
Mark O. Riedl
Abstract:
Open-ended worlds are those in which there are no pre-specified goals or environmental reward signal. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism across a mix of frozen and unfrozen experts. The model learns when to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn behaviors associated with new character role types. We show that our agent both obtains more rewards in the zero-shot setting, and discovers these rewards with greater sample efficiency in the few-shot learning settings.
Submitted 9 May, 2024;
originally announced May 2024.
-
Moore Determinant of Dual Quaternion Hermitian Matrices
Authors:
Chunfeng Cui,
Liqun Qi,
Guangjing Song,
Qingwen Wang
Abstract:
In this paper, we extend the Chen and Moore determinants of quaternion Hermitian matrices to dual quaternion Hermitian matrices. We show that the Chen determinant of dual quaternion Hermitian matrices is invariant under addition, switching, multiplication, and unitary operations on both the left- and right-hand sides. We then show that the Chen and Moore determinants of dual quaternion Hermitian matrices are equal to each other, and that they are also equal to the products of eigenvalues. The characteristic polynomial of a dual quaternion Hermitian matrix is also studied.
Submitted 18 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Non-convex Pose Graph Optimization in SLAM via Proximal Linearized Riemannian ADMM
Authors:
Xin Chen,
Chunfeng Cui,
Deren Han,
Liqun Qi
Abstract:
Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and the projection onto the constraints can be calculated by normalization. Then a proximal linearized Riemannian alternating direction method of multipliers (PieADMM) is developed to solve the proposed model, which not only has low memory requirements, but also can update the poses in parallel. Furthermore, we establish the iteration complexity of $O(1/ε^{2})$ of PieADMM for finding an $ε$-stationary solution of our model. The efficiency of our proposed algorithm is demonstrated by numerical experiments on two synthetic and four 3D SLAM benchmark datasets.
Submitted 29 April, 2024;
originally announced April 2024.
-
Capabilities of Gemini Models in Medicine
Authors:
Khaled Saab,
Tao Tu,
Wei-Hung Weng,
Ryutaro Tanno,
David Stutz,
Ellery Wulczyn,
Fan Zhang,
Tim Strother,
Chunjong Park,
Elahe Vedadi,
Juanma Zambrano Chaves,
Szu-Yeu Hu,
Mike Schaekermann,
Aishwarya Kamath,
Yong Cheng,
David G. T. Barrett,
Cathy Cheung,
Basil Mustafa,
Anil Palepu,
Daniel McDuff,
Le Hou,
Tomer Golany,
Luyang Liu,
Jean-baptiste Alayrac,
Neil Houlsby
, et al. (42 additional authors not shown)
Abstract:
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Transfer Learning Enhanced Single-choice Decision for Multi-choice Question Answering
Authors:
Chenhao Cui,
Yufan Jiang,
Shuangzhi Wu,
Zhoujun Li
Abstract:
Multi-choice Machine Reading Comprehension (MMRC) aims to select the correct answer from a set of options based on a given passage and question. Existing methods employ a pre-trained language model as the encoder and share and transfer knowledge through fine-tuning. These methods mainly focus on designing elaborate mechanisms to effectively capture the relationships among the triplet of passage, question and answers. Transferring knowledge from other MRC tasks such as SQuAD is non-trivial but has been ignored because of the task-specific nature of MMRC. In this paper, we recast multi-choice as single-choice by training a binary classifier to distinguish whether a certain answer is correct, and then select the option with the highest confidence score as the final answer. Our proposed method gets rid of the multi-choice framework and can leverage resources of other tasks. We construct our model based on the ALBERT-xxlarge model and evaluate it on the RACE and DREAM datasets. Experimental results show that our model performs better than multi-choice methods. In addition, by transferring knowledge from other kinds of MRC tasks, our model achieves state-of-the-art results in both single and ensemble settings.
Submitted 27 April, 2024;
originally announced April 2024.
-
Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness
Authors:
Jing Ouyang,
Chengyu Cui,
Kean Ming Tan,
Gongjun Xu
Abstract:
In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race) taking into account their latent abilities. However, the large sample size, substantial covariate dimension, and great test length pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, bringing further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA).
Submitted 25 April, 2024;
originally announced April 2024.
-
Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a
Authors:
Y. Liu,
H. Sun,
D. Xu,
D. S. Svinkin,
J. Delaunay,
N. R. Tanvir,
H. Gao,
C. Zhang,
Y. Chen,
X. -F. Wu,
B. Zhang,
W. Yuan,
J. An,
G. Bruni,
D. D. Frederiks,
G. Ghirlanda,
J. -W. Hu,
A. Li,
C. -K. Li,
J. -D. Li,
D. B. Malesani,
L. Piro,
G. Raman,
R. Ricci,
E. Troja
, et al. (170 additional authors not shown)
Abstract:
Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.
Submitted 25 April, 2024;
originally announced April 2024.
-
Neural Proto-Language Reconstruction
Authors:
Chenxuan Cui,
Ying Chen,
Qinxin Wang,
David R. Mortensen
Abstract:
Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNNs and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model performs better on the WikiHan dataset, and the data augmentation step stabilizes the training.
Submitted 24 April, 2024;
originally announced April 2024.
-
Large Language Models for Mobile GUI Text Input Generation: An Empirical Study
Authors:
Chenhui Cui,
Tao Li,
Junjie Wang,
Chunyang Chen,
Dave Towey,
Rubing Huang
Abstract:
Mobile applications (apps) have become an essential part of our daily lives, making ensuring their quality an important activity. GUI testing, a quality assurance method, has frequently been used for mobile apps. When conducting GUI testing, it is important to generate effective text inputs for the text-input components. Some GUIs require these text inputs to move from one page to the next, which remains a challenge to achieving complete UI exploration. Recently, Large Language Models (LLMs) have shown excellent text-generation capabilities. Among the LLMs, OpenAI's GPT series has been widely discussed and used. However, it may not be possible to use these LLMs for GUI testing actual mobile apps, due to the security and privacy issues related to the production data. Therefore, it is necessary to explore the potential of different LLMs to guide text-input generation in mobile GUI testing. This paper reports on a large-scale empirical study that extensively investigates the effectiveness of nine state-of-the-art LLMs in Android text-input generation for UI pages. We collected 114 UI pages from 62 open-source Android apps and extracted contextual information from the UI pages to construct prompts for LLMs to generate text inputs. The experimental results show that some LLMs can generate relatively more effective and higher-quality text inputs, achieving a 50.58% to 66.67% page-pass-through rate, and even detecting some real bugs in open-source apps. Compared with the GPT-3.5 and GPT-4 LLMs, other LLMs reduce the page-pass-through rates by 17.97% to 84.79% and 21.93% to 85.53%, respectively. We also found that using more complete UI contextual information can increase the page-pass-through rates of LLMs for generating text inputs. In addition, we also describe six insights gained regarding the use of LLMs for Android testing: These insights will benefit the Android testing community.
Submitted 13 April, 2024;
originally announced April 2024.
-
Pseudo MIMO (pMIMO): An Energy and Spectral Efficient MIMO-OFDM System
Authors:
Sen Wang,
Tianxiong Wang,
Shulun Zhao,
Zhen Feng,
Guangyi Liu,
Chunfeng Cui,
Chih-Lin I,
Jiangzhou Wang
Abstract:
This article introduces an energy- and spectrally efficient multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) transmission scheme designed for future sixth generation (6G) wireless communication networks. The approach involves connecting each receiving radio frequency (RF) chain with multiple antenna elements and conducting sample-level adjustments for receiving beamforming patterns. The proposed system architecture and the dedicated signal processing methods enable the scheme to transmit a larger number of parallel data streams than the number of receiving RF chains, achieving a spectral efficiency performance close to that of a fully digital (FD) MIMO system with the same number of antenna elements, each equipped with an RF chain. We refer to this system as a "pseudo MIMO" system due to its ability to mimic the functionality of additional invisible RF chains. The article begins by introducing the underlying principles of pseudo MIMO and discussing potential hardware architectures for its implementation. We then highlight several advantages of integrating pseudo MIMO into next-generation wireless networks. To demonstrate the superiority of our proposed pseudo MIMO transmission scheme over conventional MIMO systems, simulation results are presented. Additionally, we validate the feasibility of this new scheme by building the first pseudo MIMO prototype. Furthermore, we present some key challenges and outline potential directions for future research.
Submitted 9 April, 2024;
originally announced April 2024.
-
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Authors:
Juanwu Lu,
Can Cui,
Yunsheng Ma,
Aniket Bera,
Ziran Wang
Abstract:
Safety and robustness are crucial factors in developing trustworthy autonomous vehicles. One essential aspect of addressing these factors is to equip vehicles with the capability to predict future trajectories for all moving objects in the surroundings and quantify prediction uncertainties. In this paper, we propose the Sequential Neural Variational Agent (SeNeVA), a generative model that describes the distribution of future trajectories for a single moving object. Our approach can distinguish Out-of-Distribution data while quantifying uncertainty and achieving competitive performance compared to state-of-the-art methods on the Argoverse 2 and INTERACTION datasets. Specifically, a minimum Final Displacement Error of 0.446 meters, a minimum Average Displacement Error of 0.203 meters, and a Miss Rate of 5.35% are achieved on the INTERACTION test set. Extensive qualitative and quantitative analysis is also provided to evaluate the proposed model. Our open-source code is available at https://github.com/PurdueDigitalTwin/seneva.
Submitted 4 April, 2024;
originally announced April 2024.
-
A Neural Multigrid Solver for Helmholtz Equations with High Wavenumber and Heterogeneous Media
Authors:
Chen Cui,
Kai Jiang,
Shi Shu
Abstract:
Solving high-wavenumber and heterogeneous Helmholtz equations presents a long-standing challenge in scientific computing. In this paper, we introduce a deep learning-enhanced multigrid solver to address this issue. By conducting error analysis on standard multigrid applied to a discrete Helmholtz equation, we devise a strategy to handle errors with different frequencies separately. For error components with frequencies distant from the wavenumber, we perform simple smoothing based on local operations at different levels to eliminate them. On the other hand, to address error components with frequencies near the wavenumber, we utilize another multigrid V-cycle to solve an advection-diffusion-reaction (ADR) equation at a coarse scale. The resulting solver, named Wave-ADR-NS, involves parameters learned through unsupervised training. Numerical results demonstrate that Wave-ADR-NS effectively resolves heterogeneous 2D Helmholtz equations with wavenumbers up to 2000. Comparative experiments against classical multigrid preconditioners and existing deep learning-based multigrid preconditioners reveal the superior performance of Wave-ADR-NS.
Submitted 3 April, 2024;
originally announced April 2024.
-
Anomalous shift in Andreev reflection from side incidence
Authors:
Runze Li,
Chaoxi Cui,
Ying Liu,
Zhi-Ming Yu,
Shengyuan A. Yang
Abstract:
Andreev reflection at a normal-superconductor interface may be accompanied by an anomalous spatial shift. Studies so far have been limited to the top incidence configuration. Here, we investigate this effect in the side incidence configuration, with the interface parallel to the principal axis of the superconductor. We find that the shift exhibits rich behaviors reflecting the character of the pair potential. It has two contributions: one from the $k$-dependent phase of the pair potential, and the other from the evanescent mode. For chiral $p$-wave pairing, the pairing phase contribution is proportional to the chirality of pairing and is independent of excitation energy, whereas the evanescent mode contribution is independent of chirality and is nonzero only for excitation energy below the gap. The two contributions also have opposite parity with respect to the incident angle. For $d_{x^{2}-y^{2}}$-wave pairing, only the evanescent mode contribution exists, and the shift exhibits suppressed zones in incident angles, manifesting the superconducting nodes. The dependence of the shift on other factors, such as the angle of the incident plane and Fermi surface anisotropy, is discussed.
Submitted 26 March, 2024;
originally announced March 2024.
-
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
Authors:
Wenxuan Song,
Han Zhao,
Pengxiang Ding,
Can Cui,
Shangke Lyu,
Yaning Fan,
Donglin Wang
Abstract:
Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose GeRM (Generalist Robotic Model). We utilize offline reinforcement learning to optimize data utilization strategies to learn from both demonstrations and sub-optimal data, thus surpassing the limitations of human demonstrations. Thereafter, we employ a transformer-based VLA network to process multi-modal inputs and output actions. By introducing the Mixture-of-Experts structure, GeRM allows faster inference speed with higher overall model capacity, and thus resolves the issue of limited RL parameters, enhancing model performance in multi-task learning while controlling computational costs. Through a series of experiments, we demonstrate that GeRM outperforms other methods across all tasks, while also validating its efficiency in both training and inference processes. We also uncover its potential to acquire emergent skills. Additionally, we contribute the QUARD-Auto dataset, collected automatically to support our training approach and foster advancements in multi-task quadruped robot learning. This work presents a new paradigm for reducing the cost of collecting robot data and driving progress in the multi-task learning community. You can reach our project and video through the link: https://songwxuan.github.io/GeRM/ .
Submitted 9 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Eigenvalues of Dual Hermitian Matrices with Application in Formation Control
Authors:
Liqun Qi,
Chunfeng Cui
Abstract:
We propose a supplement matrix method for computing eigenvalues of a dual Hermitian matrix, and discuss its application in multi-agent formation control. Suppose we have a ring, which can be the real field, the complex field, or the quaternion ring. We study dual number symmetric matrices, dual complex Hermitian matrices and dual quaternion Hermitian matrices in a unified frame of dual Hermitian matrices. An $n \times n$ dual Hermitian matrix has $n$ dual number eigenvalues. We define determinant, characteristic polynomial and supplement matrices for a dual Hermitian matrix. Supplement matrices are Hermitian matrices in the original ring. The standard parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of the standard part Hermitian matrix in the original ring, while the dual parts of the eigenvalues of that dual Hermitian matrix are the eigenvalues of those supplement matrices. Hence, by applying any practical method for computing eigenvalues of Hermitian matrices in the original ring, we have a practical method for computing eigenvalues of a dual Hermitian matrix. We call this method the supplement matrix method. In multi-agent formation control, a desired relative configuration scheme may be given. People need to know if this scheme is reasonable such that a feasible solution of configurations of these multi-agents exists. By exploring the eigenvalue problem of dual Hermitian matrices, and its link with the unit gain graph theory, we open a cross-disciplinary approach to solve the relative configuration problem. Numerical experiments are reported.
Submitted 1 April, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
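The eigenvalue structure described in the abstract above can be sketched for the smallest case, a 2x2 dual number symmetric matrix A = A_s + εA_d with real symmetric A_s and A_d. The sketch relies on the first-order relation λ_d = u^T A_d u for a unit eigenvector u of a simple standard eigenvalue, and on the eigenvalues of A_d projected onto the eigenspace when the standard eigenvalue is repeated. Treating that projected matrix as the "supplement matrix" is an assumption here, since the paper's exact construction is not reproduced, and all function names are hypothetical.

```python
import math

def sym2_eig(m):
    """Eigenvalues (ascending) and unit eigenvectors of a real symmetric 2x2 matrix."""
    (a, b), (_, c) = m
    mean, half_diff = (a + c) / 2.0, (a - c) / 2.0
    r = math.hypot(half_diff, b)
    if r == 0.0:  # multiple of the identity: any orthonormal basis works
        return [mean, mean], [(1.0, 0.0), (0.0, 1.0)]
    theta = 0.5 * math.atan2(b, half_diff)
    v_hi = (math.cos(theta), math.sin(theta))    # eigenvector for mean + r
    v_lo = (-math.sin(theta), math.cos(theta))   # eigenvector for mean - r
    return [mean - r, mean + r], [v_lo, v_hi]

def quad_form(v, m):
    """Compute v^T m v for a 2-vector v and a 2x2 matrix m."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def dual_eigenvalues(std, dual, tol=1e-12):
    """Return [(standard part, dual part), ...] of the dual number eigenvalues."""
    vals, vecs = sym2_eig(std)
    if abs(vals[1] - vals[0]) <= tol:
        # Repeated standard eigenvalue: the eigenspace is the whole plane,
        # so the projected dual part is `dual` itself.
        dual_vals, _ = sym2_eig(dual)
        return [(vals[0], d) for d in dual_vals]
    # Simple standard eigenvalues: dual part is the Rayleigh quotient of `dual`.
    return [(lam, quad_form(v, dual)) for lam, v in zip(vals, vecs)]

if __name__ == "__main__":
    print(dual_eigenvalues([[2.0, 0.0], [0.0, 5.0]], [[1.0, 3.0], [3.0, 4.0]]))
```

Only ordinary symmetric eigenproblems in the original ring are solved, which is the practical point the abstract makes; for larger matrices the closed-form 2x2 solver would be replaced by any standard Hermitian eigensolver.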
-
Digitization of Astronomical Photographic Plate of China and Astrometric Measurement of Single-exposure Plates
Authors:
Zheng-Jun Shang,
Yong Yu,
Liang-Liang Wang,
Mei-Ting Yang,
Jing Yang,
Shi-Yin Shen,
Min Liu,
Quan-Feng Xu,
Chen-Zhou Cui,
Dong-Wei Fan,
Zheng-Hong Tang,
Jian-Hai Zhao
Abstract:
From the mid-19th century to the end of the 20th century, photographic plates served as the primary detectors for astronomical observations. Astronomical photographic observations in China began in 1901, and over a century, a total of approximately 30,000 astronomical photographic plates have been captured. These historical plates play an irreplaceable role in conducting long-term, time-domain astronomical research. To preserve and explore these valuable original astronomical observational data, Shanghai Astronomical Observatory has organized the transportation of plates taken at night from various stations across the country to the Sheshan Plate Archive for centralized preservation. For the first time, statistics on the plate information were compiled. On this basis, the plates were cleaned and digitally scanned, and finally digitized images were acquired for 29,314 plates. In this study, using Gaia DR2 as the reference star catalog, astrometric processing has been carried out successfully on 15,696 single-exposure plates, including object extraction, stellar identification, and plate model computation. As a result, for long focal length telescopes, such as the 40cm double-tube refractor telescope and the 1.56m reflector telescope at the Shanghai Astronomical Observatory and the 1m reflector telescope at the Yunnan Astronomical Observatory, the astrometric accuracy obtained for their plates is approximately 0.1" to 0.3". The distribution of astrometric accuracy for medium and short focal length telescopes ranges from 0.3" to 1.0". The relevant data of this batch of plates, including digitized images and stellar catalogs of the plates, are archived and released by the National Astronomical Data Center. Users can access and download plate data based on keywords such as station, telescope, observation year, and observed celestial coordinates.
Submitted 14 March, 2024;
originally announced March 2024.
-
MolBind: Multimodal Alignment of Language, Molecules, and Proteins
Authors:
Teng Xiao,
Chao Cui,
Huaisheng Zhu,
Vasant G. Honavar
Abstract:
Recent advancements in biology and chemistry have leveraged multi-modal learning, integrating molecules and their natural language descriptions to enhance drug discovery. However, current pre-training frameworks are limited to two modalities, and designing a unified network to process different modalities (e.g., natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) remains challenging due to inherent gaps among them. In this work, we propose MolBind, a framework that trains encoders for multiple modalities through contrastive learning, mapping all modalities to a shared feature space for multi-modal semantic alignment. To facilitate effective pre-training of MolBind on multiple modalities, we also build and collect a high-quality dataset with four modalities, MolBind-M4, including graph-language, conformation-language, graph-conformation, and conformation-protein paired data. MolBind shows superior zero-shot learning performance across a wide range of tasks, demonstrating its strong capability of capturing the underlying semantics of multiple modalities.
Submitted 2 April, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
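Contrastive alignment of two modalities, as mentioned in the abstract above, is commonly implemented with an InfoNCE-style loss. The following is a minimal sketch of that generic loss, not MolBind's actual objective: the function name, the one-directional form, and the temperature of 0.07 are assumptions made for illustration.

```python
import math

def info_nce(anchor_embs, paired_embs, temperature=0.07):
    """Mean InfoNCE loss; row i of each list is assumed to be a matching pair."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    anchors = [normalize(v) for v in anchor_embs]
    pairs = [normalize(v) for v in paired_embs]
    loss = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in pairs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the true partner i.
        loss += log_denom - logits[i]
    return loss / len(anchors)

if __name__ == "__main__":
    aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned, swapped)
```

The loss is small when each embedding is closest to its own partner and large when partners are swapped, which is the property that pulls paired modalities into a shared feature space.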
-
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
Authors:
Can Cui,
Imran Ahamad Sheikh,
Mostafa Sadeghi,
Emmanuel Vincent
Abstract:
Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data. We present a novel study aiming to optimize the use of a Speaker-Attributed ASR (SA-ASR) system in real-life scenarios, such as the AMI meeting corpus, for improved speaker assignment of speech segments. First, we propose a pipeline tailored to real-life applications involving Voice Activity Detection (VAD), Speaker Diarization (SD), and SA-ASR. Second, we advocate using VAD output segments to fine-tune the SA-ASR model, considering that it is also applied to VAD segments during test, and show that this results in a relative reduction of Speaker Error Rate (SER) up to 28%. Finally, we explore strategies to enhance the extraction of the speaker embedding templates used as inputs by the SA-ASR system. We show that extracting them from SD output rather than annotated speaker segments results in a relative SER reduction up to 20%.
Submitted 11 March, 2024;
originally announced March 2024.
-
Mirror real Chern insulator in two and three dimensions
Authors:
Yang Wang,
Chaoxi Cui,
Run-Wu Zhang,
Xiaotian Wang,
Zhi-Ming Yu,
Gui-Bin Liu,
Yugui Yao
Abstract:
A real Chern insulator (RCI) featuring a real Chern number and a second-order boundary mode appears in a two-dimensional (2D) system with the space-time inversion symmetry ($PT$). Here, we propose a kind of RCI: the mirror real Chern insulator (MRCI), which emerges from a system having an additional horizontal mirror symmetry $M_z$. The MRCI is generally characterized by two independent real Chern numbers, respectively defined in the two mirror subsystems of the system. Hence, the MRCI may host second-order boundary modes different from those of the conventional RCI. We show that for spinless systems, the definition of the MRCI is straightforward, as $PT$ keeps each mirror subsystem invariant. For spinful systems with both $PT$ and $M_z$, the real Chern number for the total system remains well defined, as $M_z PT = C_{2z}T$ and $(C_{2z}T)^2 = 1$. However, since $C_{2z}T$ exchanges the two mirror subsystems, the definition of the MRCI in spinful systems requires the help of projective symmetry algebra. We also discuss the MRCIs in 3D systems, where the MRCI is defined on certain mirror-invariant 2D planes. Compared with its 2D counterpart, the 3D MRCI can exhibit more abundant physics when the systems have additional nonsymmorphic operators. Several concrete MRCI models including 2D and 3D, spinless and spinful models are constructed to further demonstrate our ideas.
Submitted 6 March, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
Quasi-one-dimensional spin transport in altermagnetic $Z^3$ nodal net metals
Authors:
Tingli He,
Lei Li,
Chaoxi Cui,
Run-Wu Zhang,
Zhi-Ming Yu,
Guodong Liu,
Xiaoming Zhang
Abstract:
In three dimensions, quasi-one-dimensional (Q1D) transport has traditionally been associated with systems featuring a Q1D chain structure. Here, based on first-principles calculations, we go beyond the common belief to show that the Q1D transport can also be realized in many three-dimensional (3D) altermagnetic (AM) metals with a topological nodal net in momentum space but lacking Q1D chain structure in real space, including the existing compounds $β$-Fe$_2$(PO$_4$)O, Co$_2$(PO$_4$)O, and LiTi$_2$O$_4$. These materials exhibit an AM ground state and feature an ideal crossed $Z^3$ Weyl nodal line in each spin channel, formed by three straight and flat nodal lines traversing the entire Brillouin zone. These nodal lines eventually lead to an AM $Z^3$ nodal net. Surprisingly, longitudinal conductivity $σ_{xx}$ in these topological nodal net metals is dozens of times larger than $σ_{yy}$ and $σ_{zz}$ in the up-spin channel, while $σ_{yy}$ dominates transport in the down-spin channel. This suggests a distinctive Q1D transport signature in each spin channel, with orthogonal principal moving directions for the two spin channels, resulting in Q1D direction-dependent spin transport. This novel phenomenon cannot be found in either conventional 3D bulk materials or Q1D chain materials. In particular, it gradually disappears as the Fermi level moves away from the nodal net, further confirming its topological origin. Our work not only enhances the comprehension of topological physics in altermagnets but also opens a new direction for the exploration of topological spintronics.
Submitted 3 April, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Authors:
Ruining Deng,
Quan Liu,
Can Cui,
Tianyuan Yao,
Jialin Yue,
Juming Xiong,
Lining Yu,
Yifei Wu,
Mengmeng Yin,
Yu Wang,
Shilin Zhao,
Yucheng Tang,
Haichun Yang,
Yuankai Huo
Abstract:
Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intricate spatial interrelations among objects from clinical knowledge. In this research, we introduce a novel universal proposition learning approach, called panoramic renal pathology segmentation (PrPSeg), designed to comprehensively segment panoramic structures within the kidney by integrating extensive knowledge of kidney anatomy.
In this paper, we propose (1) the design of a comprehensive universal proposition matrix for renal pathology, facilitating the incorporation of classification and spatial relationships into the segmentation process; (2) a token-based dynamic head single network architecture, with the improvement of the partial label image segmentation and capability for future data enlargement; and (3) an anatomy loss function, quantifying the inter-object relationships across the kidney.
Submitted 20 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Inertial Accelerated Stochastic Mirror Descent for Large-Scale Generalized Tensor CP Decomposition
Authors:
Zehui Liu,
Qingsong Wang,
Chunfeng Cui,
Yong Xia
Abstract:
The majority of classic tensor CP decomposition models are designed for squared loss, employing Euclidean distance as a local proximal term. However, the Euclidean distance is unsuitable for the generalized loss function applicable to various types of real-world data, such as integer and binary data. Consequently, algorithms developed under the squared loss are not easily adaptable to handle these generalized losses, partially due to the lack of the gradient Lipschitz continuity. This paper considers the generalized tensor CP decomposition. We use the Bregman distance as the proximal term and propose an inertial accelerated block randomized stochastic mirror descent algorithm (iTableSMD). Within a broader multi-block variance reduction and inertial acceleration framework, we demonstrate the sublinear convergence rate for the subsequential sequence produced by the iTableSMD algorithm. We further show that iTableSMD requires at most $O(ε^{-2})$ iterations in expectation to attain an $ε$-stationary point and establish the global convergence of the sequence. Numerical experiments on real datasets demonstrate that our proposed algorithm is efficient and achieves better performance than the existing state-of-the-art methods.
Submitted 24 February, 2024;
originally announced February 2024.
-
Spectral Properties of Dual Unit Gain Graphs
Authors:
Chunfeng Cui,
Yong Lu,
Liqun Qi,
Ligong Wang
Abstract:
In this paper, we study dual quaternion and dual complex unit gain graphs and their spectral properties in a unified frame of dual unit gain graphs. Unit dual quaternions represent rigid movements in the 3D space, and have wide applications in robotics and computer graphics. Dual complex numbers have recently found application in brain science. We establish the interlacing theorem for dual unit gain graphs, and show that the spectral radius of a dual unit gain graph is never greater than the spectral radius of the underlying graph, and these two radii are equal if and only if the dual gain graph is balanced. By using the dual cosine functions, we establish the closed form of eigenvalues of adjacency and Laplacian matrices of dual complex and quaternion unit gain cycles. We then show the coefficient theorem holds for dual unit gain graphs. Similar results hold for the spectral radius of the Laplacian matrix of the dual unit gain graph too.
Submitted 7 May, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Global Data in Astronomy: Challenges and Opportunities
Authors:
Renee Hložek,
Chenzhou Cui,
Mark Allen,
Patricia Whitelock,
Jess McIver,
Giuseppe Longo,
Christopher Fluke,
Ajit Kembhavi,
Pranav Sharma,
Ashish Mahabal
Abstract:
Policy Brief on "Global Data in Astronomy: Challenges and Opportunities", distilled from the corresponding panel that was part of the discussions during S20 Policy Webinar on Astroinformatics for Sustainable Development held on 6-7 July 2023.
Astronomy is increasingly becoming a data-driven science. Advances in our understanding of the physical mechanisms at work in the Universe require building ever-more sensitive telescopes to gather observations of the cosmos to test and advance our theoretical models of how the universe works. To confront the observed data with our theoretical models we require data hosting, archiving and storage and high-performance computing resources to run the theoretical calculations and compare our simulated and observed universe. We also require the sophisticated development of highly skilled human resources. Newer large projects are often run through international collaborations and partnerships, driving a need for 'open science' and collaborative structure across national boundaries. While astronomical data are useful scientifically, the data do not come with the same ethical/privacy-related restrictions as medical/biological data. Moreover, the ability to use data for new scientific analysis extends and expands the impact and reach of scientific surveys -- this is a strength that national funding agencies should capitalize on. We discuss the management and analysis of such large volumes of data and the corresponding significant challenges that require policy-level preparations.
The policy webinar took place during the G20 presidency in India (2023). A summary based on the seven panels can be found here: arxiv:2401.04623.
Submitted 19 February, 2024;
originally announced February 2024.
-
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Authors:
Yiyang Zhou,
Chenhang Cui,
Rafael Rafailov,
Chelsea Finn,
Huaxiu Yao
Abstract:
Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. This procedure is not perfect and can cause the model to hallucinate - provide answers that do not accurately reflect the image, even when the core LLM is highly factual and the vision backbone has sufficiently complete representations. In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning. Specifically, we propose POVID to generate feedback data with AI models. We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data. First, we prompt GPT-4V to inject plausible hallucinations into the correct answer. Second, we distort the image to trigger the inherent hallucination behavior of the VLLM. This is an automated approach, which does not rely on human data generation or require a perfect expert, which makes it easily scalable. Finally, both of these generation strategies are integrated into an RLHF pipeline via Direct Preference Optimization. In experiments across broad benchmarks, we show that we can not only reduce hallucinations, but also improve model performance across standard benchmarks, outperforming prior approaches. Our data and code are available at https://github.com/YiyangZhou/POVID.
Submitted 17 February, 2024;
originally announced February 2024.
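Direct Preference Optimization, named in the abstract above, is usually written as a logistic loss on the gap between policy-versus-reference log-probability ratios of the preferred and dispreferred responses. The sketch below shows that standard per-pair loss only. It is not the POVID training code, and the function name, argument names, and beta = 0.1 are illustrative assumptions.

```python
import math

def dpo_loss(logp_pref, logp_disp, ref_logp_pref, ref_logp_disp, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) response pair.

    logp_*     : summed token log-probabilities under the policy being trained.
    ref_logp_* : the same quantities under the frozen reference model.
    """
    # How much more the policy favors the preferred response than the reference does.
    margin = beta * ((logp_pref - ref_logp_pref) - (logp_disp - ref_logp_disp))
    # -log(sigmoid(margin)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy has shifted probability toward the preferred response: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In the setting the abstract describes, the preferred response would be the ground-truth instruction answer and the dispreferred one a hallucinated answer, so minimizing this loss pushes the model away from the hallucinated text relative to the reference model.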
-
Dynamics of cold circumstellar gas in debris disks
Authors:
Can Cui,
Sebastian Marino,
Quentin Kral,
Henrik Latter
Abstract:
Mounting observational evidence indicates that cold circumstellar gas is present in debris disk systems. This work focuses on various dynamical processes that debris-disk gas may undergo. We review five mechanisms that can transport angular momentum and their applications to debris disks. These include molecular viscosity, hydrodynamic turbulence, magnetohydrodynamic turbulence, magnetized disk winds, and laminar magnetic stress. We find that molecular viscosity can result in $α$ as high as $\lesssim 0.1$ for sufficiently low densities, while the Rossby wave instability is a possible source of hydrodynamic turbulence and structure formation. We argue that the vertical shear instability is unlikely due to the long cooling times. The onset of the magnetorotational instability (MRI) is dichotomous: for low density disks the MRI can be excited at the midplane, while for high mass disks it may only be operating at $z>2-3H$, if at all. The MHD wind and laminar magnetic stress mechanisms rely on the configuration and strength of any background large-scale magnetic field, the existence of which is uncertain and possibly unlikely. We conclude that the dominant mechanism and its efficiency in transporting angular momentum varies from one system to the other, depending especially closely on the gas density. More detailed analyses shall be performed in the future focusing on representative, nearby debris disks.
Submitted 15 February, 2024;
originally announced February 2024.
-
A Momentum Accelerated Algorithm for ReLU-based Nonlinear Matrix Decomposition
Authors:
Qingsong Wang,
Chunfeng Cui,
Deren Han
Abstract:
Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD), we propose a Tikhonov regularized ReLU-NMD model, referred to as ReLU-NMD-T. Subsequently, we introduce a momentum accelerated algorithm for handling the ReLU-NMD-T model. A distinctive feature, setting our work apart from most existing studies, is the incorporation of both positive and negative momentum parameters in our algorithm. Our numerical experiments on real-world datasets show the effectiveness of the proposed model and algorithm. Moreover, the code is available at https://github.com/nothing2wang/NMD-TM.
Submitted 4 February, 2024;
originally announced February 2024.
-
Sliding ferroelectric memories and synapses
Authors:
Xiuzhen Li,
Biao Qin,
Yaxian Wang,
Yue Xi,
Zhiheng Huang,
Mengze Zhao,
Yalin Peng,
Zitao Chen,
Zitian Pan,
Jundong Zhu,
Chenyang Cui,
Rong Yang,
Wei Yang,
Sheng Meng,
Dongxia Shi,
Xuedong Bai,
Can Liu,
Na Li,
Jianshi Tang,
Kaihui Liu,
Luojun Du,
Guangyu Zhang
Abstract:
Ferroelectric materials with switchable electric polarization hold great promise for a plethora of emergent applications, such as post-Moore's law nanoelectronics, beyond-Boltzmann transistors, non-volatile memories, and above-bandgap photovoltaic devices. Recent advances have uncovered an exotic sliding ferroelectric mechanism, which enables the design of atomically thin ferroelectrics from non-ferroelectric parent monolayers. Although notable progress has been witnessed in understanding its fundamental properties, functional devices based on sliding ferroelectrics, the key touchstone toward applications, remain elusive. Here, we demonstrate rewritable, non-volatile memory devices at room temperature utilizing a two-dimensional (2D) sliding ferroelectric semiconductor of rhombohedral-stacked bilayer molybdenum disulfide. The 2D sliding ferroelectric memories (SFeMs) show superior performances with a large memory window of >8V, a high conductance ratio above $10^6$, a long retention time of >10 years, and a programming endurance greater than $10^4$ cycles. Remarkably, flexible SFeMs are achieved with state-of-the-art performances competitive to their rigid counterparts and maintain their performances after bending over $10^3$ cycles. Furthermore, synapse-specific Hebbian forms of plasticity and image recognition with a high accuracy of 97.81% are demonstrated based on flexible SFeMs. Our work demonstrates the sliding ferroelectric memories and synaptic plasticity on both rigid and flexible substrates, highlighting the great potential of sliding ferroelectrics for emerging technological applications in brain-inspired in-memory computing, edge intelligence and energy-efficient wearable electronics.
Submitted 29 January, 2024;
originally announced January 2024.
-
Variational Estimation for Multidimensional Generalized Partial Credit Model
Authors:
Chengyu Cui,
Chun Wang,
Gongjun Xu
Abstract:
Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model (MGPCM). The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.
Submitted 23 January, 2024;
originally announced January 2024.
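For readers unfamiliar with the model being estimated above, the category probabilities of a multidimensional generalized partial credit model can be sketched as follows. The sketch assumes one common parameterization, in which the logit of category k is the cumulative sum of (a·θ − b_v) over steps v = 1..k and category 0 has logit 0. The paper may use a different (for example slope-intercept) form, and the function and argument names are hypothetical.

```python
import math

def mgpcm_probs(theta, slopes, thresholds):
    """Category probabilities for one item under an assumed MGPCM parameterization.

    theta      : latent trait vector of the respondent.
    slopes     : item discrimination vector a (same length as theta).
    thresholds : step parameters b_1..b_K; the item has K + 1 categories.
    """
    linear = sum(a * t for a, t in zip(slopes, theta))  # a . theta
    logits = [0.0]                                      # category 0
    for b in thresholds:
        logits.append(logits[-1] + linear - b)          # cumulative step sums
    m = max(logits)                                     # softmax with max subtracted
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    print(mgpcm_probs([0.0, 0.0], [1.0, 1.0], [0.0, 0.0]))
    print(mgpcm_probs([2.0, 1.0], [1.0, 0.5], [-1.0, 1.0]))
```

The marginal likelihood integrates these probabilities over θ, which is the intractable step that a Gaussian variational approximation is meant to avoid.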