-
How We Built Cedar: A Verification-Guided Approach
Authors:
Craig Disselkoen,
Aaron Eline,
Shaobo He,
Kyle Headley,
Michael Hicks,
Kesha Hietala,
John Kastner,
Anwar Mamat,
Matt McCutchen,
Neha Rungta,
Bhakti Shah,
Emina Torlak,
Andrew Wells
Abstract:
This paper presents verification-guided development (VGD), a software engineering process we used to build Cedar, a new policy language for expressive, fast, safe, and analyzable authorization. Developing a system with VGD involves writing an executable model of the system and mechanically proving properties about the model; writing production code for the system and using differential random testing (DRT) to check that the production code matches the model; and using property-based testing (PBT) to check properties of unmodeled parts of the production code. Using VGD for Cedar, we can build fast, idiomatic production code, prove our model correct, and find and fix subtle implementation bugs that evade code reviews and unit testing. While carrying out proofs, we found and fixed 4 bugs in Cedar's policy validator, and DRT and PBT helped us find and fix 21 additional bugs in various parts of Cedar.
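A toy sketch of the DRT step follows. It assumes a drastically simplified policy language in which a request is allowed only if at least one matching `permit` policy exists and no matching `forbid` policy does (Cedar's default-deny, forbid-overrides-permit rule). `Policy`, `model_authorize`, `prod_authorize`, and `differential_test` are hypothetical names used for illustration; they are not Cedar's model or API. The "model" transcribes the rule directly, the "production" version short-circuits, and random inputs check that the two agree.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Policy:
    effect: str               # "permit" or "forbid"
    principal: Optional[str]  # None matches any principal
    action: Optional[str]     # None matches any action

    def matches(self, principal: str, action: str) -> bool:
        return self.principal in (None, principal) and self.action in (None, action)

def model_authorize(policies: List[Policy], principal: str, action: str) -> bool:
    """Executable model: direct transcription of the decision rule."""
    permits = [p for p in policies if p.effect == "permit" and p.matches(principal, action)]
    forbids = [p for p in policies if p.effect == "forbid" and p.matches(principal, action)]
    return bool(permits) and not forbids

def prod_authorize(policies: List[Policy], principal: str, action: str) -> bool:
    """'Production' version: single pass, returns early on a matching forbid."""
    allowed = False
    for p in policies:
        if p.matches(principal, action):
            if p.effect == "forbid":
                return False
            allowed = True
    return allowed

PRINCIPALS = ["alice", "bob", "carol"]
ACTIONS = ["read", "write"]

def random_policy(rng: random.Random) -> Policy:
    return Policy(
        effect=rng.choice(["permit", "forbid"]),
        principal=rng.choice(PRINCIPALS + [None]),
        action=rng.choice(ACTIONS + [None]),
    )

def differential_test(trials: int = 10_000, seed: int = 0) -> Optional[Tuple[List[Policy], str, str]]:
    """Return the first random input on which model and production disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        policies = [random_policy(rng) for _ in range(rng.randint(0, 5))]
        principal, action = rng.choice(PRINCIPALS), rng.choice(ACTIONS)
        if model_authorize(policies, principal, action) != prod_authorize(policies, principal, action):
            return policies, principal, action
    return None

print(differential_test())  # None: no disagreement found
```

Suppose a bug is introduced into `prod_authorize`, for example returning `True` as soon as any `permit` matches. `differential_test` then returns a counterexample, which is the kind of discrepancy DRT is meant to surface.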
Submitted 1 July, 2024;
originally announced July 2024.
-
Exploiting Dependency-Aware Priority Adjustment for Mixed-Criticality TSN Flow Scheduling
Authors:
Miao Guo,
Yifei Sun,
Chaojie Gu,
Shibo He,
Zhiguo Shi
Abstract:
Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads to varying delays due to different shaping mechanisms applied to different flow types. Leveraging this insight, we introduce a new scheduling method in mixed-criticality TSN that incorporates a priority adjustment scheme among diverse flow types to mitigate queuing delays and enhance schedulability. Specifically, we propose dependency-aware priority adjustment algorithms tailored to different link-overlapping conditions. Experiments in various settings validate the effectiveness of the proposed method, which enhances the schedulability by 20.57% compared with the SOTA method.
Submitted 1 July, 2024;
originally announced July 2024.
-
A Differentiable Approach to Multi-scale Brain Modeling
Authors:
Chaoming Wang,
Muyang Lyu,
Tianqiu Zhang,
Sichao He,
Si Wu
Abstract:
We present a multi-scale differentiable brain modeling workflow utilizing BrainPy, a unique differentiable brain simulator that combines accurate brain simulation with powerful gradient-based optimization. We leverage this capability of BrainPy across different brain scales. At the single-neuron level, we implement differentiable neuron models and employ gradient methods to optimize their fit to electrophysiological data. On the network level, we incorporate connectomic data to construct biologically constrained network models. Finally, to replicate animal behavior, we train these models on cognitive tasks using gradient-based learning rules. Experiments demonstrate that our approach achieves superior performance and speed in fitting generalized leaky integrate-and-fire and Hodgkin-Huxley single neuron models. Additionally, training a biologically-informed network of excitatory and inhibitory spiking neurons on working memory tasks successfully replicates observed neural activity and synaptic weight distributions. Overall, our differentiable multi-scale simulation approach offers a promising tool to bridge neuroscience data across electrophysiological, anatomical, and behavioral scales.
Submitted 1 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis
Authors:
Yuxiang Hu,
Haowei Yang,
Ting Xu,
Shuyao He,
Jiajie Yuan,
Haozhang Deng
Abstract:
The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used, so automatic segmentation of brain tumors in MRI images is needed. This project builds a U-Net-based segmentation algorithm for MRI. A residual network is combined with a module that enhances context information, and a void space convolution pooling pyramid (atrous spatial pyramid pooling) is added to the network. The method was experimentally verified on the brain glioma MRI dataset provided by The Cancer Imaging Archive. A multi-scale segmentation method based on a weighted least squares filter was used to complete the 3D reconstruction of brain tumors, further improving the accuracy of the three-dimensional reconstruction. Experiments show that the local texture features obtained by the proposed algorithm are similar to those obtained by laser scanning. The improved U-Net-based algorithm achieves an accuracy of 0.9851. This approach significantly enhances the precision of image segmentation and boosts the efficiency of image classification.
Submitted 23 May, 2024;
originally announced June 2024.
-
Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints
Authors:
Ran Song,
Shizhu He,
Shengxiang Gao,
Li Cai,
Kang Liu,
Zhengtao Yu,
Jun Zhao
Abstract:
Multilingual Knowledge Graph Completion (mKGC) aims to answer queries like (h, r, ?) in different languages by inferring the tail entity t, thereby improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual PLMs contain extensive knowledge of different languages, their pretraining tasks cannot be directly aligned with the mKGC task. Moreover, most currently available KGs and PLMs exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly for low-resource languages. To overcome these problems, this paper introduces global and local knowledge constraints for mKGC. The former constrains the reasoning over answer entities, while the latter enhances the representation of query contexts. The proposed method helps the pretrained model adapt better to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, respectively, which indicates that it significantly improves mKGC.
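Hits@1 and Hits@10 measure the fraction of test queries (h, r, ?) whose gold tail entity appears among the model's top-k ranked candidates. A minimal sketch follows. The input representation (a list of 1-based ranks of the gold entity, one per query) is an assumption. The sketch does not model evaluation details the abstract leaves unstated, such as whether filtered ranking is used.

```python
from typing import Sequence

def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold entity is ranked within the top k.

    `ranks` holds the 1-based rank of the correct tail entity for each query
    (rank 1 means the model's top prediction was correct).
    """
    if not ranks:
        raise ValueError("need at least one query")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based")
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks for five queries (illustration only, not the paper's data).
ranks = [1, 3, 12, 1, 7]
print(hits_at_k(ranks, 1))   # 0.4
print(hits_at_k(ranks, 10))  # 0.8
```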
Submitted 26 June, 2024;
originally announced June 2024.
-
Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model
Authors:
Fei Xia,
Yixuan Weng,
Shizhu He,
Kang Liu,
Jun Zhao
Abstract:
Taxonomies, which organize domain concepts into hierarchical structures, are crucial for building knowledge systems and downstream applications. As domain knowledge evolves, taxonomies need to be continuously updated to include new concepts. Previous approaches have mainly focused on adding concepts to the leaf nodes of the existing hierarchical tree, which does not fully utilize the taxonomy's knowledge and is unable to update the original taxonomy structure (usually involving non-leaf nodes). In this paper, we propose a two-stage method called ATTEMPT for taxonomy completion. Our method inserts new concepts into the correct position by finding a parent node and labeling child nodes. Specifically, by combining local nodes with prompts to generate natural sentences, we take advantage of pre-trained language models for hypernym/hyponym recognition. Experimental results on two public datasets (covering six domains) show that ATTEMPT performs best on both taxonomy completion and extension tasks, surpassing existing methods.
Submitted 25 June, 2024;
originally announced June 2024.
-
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Authors:
Henghui Ding,
Chang Liu,
Yunchao Wei,
Nikhila Ravi,
Shuting He,
Song Bai,
Philip Torr,
Deshui Miao,
Xin Li,
Zhenyu He,
Yaowei Wang,
Ming-Hsuan Yang,
Zhensong Xu,
Jiangtao Yao,
Cheng**g Wu,
Ting Liu,
Luoqi Liu,
Xinyu Liu,
**g Zhang,
Kexin Zhang,
Yuting Yang,
Licheng Jiao,
Shuyuan Yang,
Mingqi Gao,
**gnan Luo
, et al. (12 additional authors not shown)
Abstract:
The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation track, based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural-language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total; 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total; 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.
Submitted 24 June, 2024;
originally announced June 2024.
-
Foliation of area minimizing hypersurfaces in asymptotically flat manifolds and Schoen's conjecture
Authors:
Shihang He,
Yuguang Shi,
Haobin Yu
Abstract:
In this paper, we demonstrate that any asymptotically flat manifold $(M^n, g)$ with $4\leq n\leq 7$ can be foliated by a family of area-minimizing hypersurfaces, each of which is asymptotic to Cartesian coordinate hyperplanes defined at an end of $(M^n, g)$. As an application of this foliation, we show that for any asymptotically flat manifold $(M^n, g)$ with $4\leq n\leq 7$, nonnegative scalar curvature, and positive mass, the solution of the free boundary problem for area-minimizing hypersurfaces in the coordinate cylinder $C_{R_i}$ in $(M^n, g)$ either does not exist or drifts to infinity of $(M^n, g)$ as $R_i$ tends to infinity. Additionally, we introduce the concept of a globally minimizing hypersurface in $(M^n, g)$ and verify a version of the Schoen Conjecture.
Submitted 23 June, 2024;
originally announced June 2024.
-
What Matters in Transformers? Not All Attention is Needed
Authors:
Shwai He,
Guoheng Sun,
Zheyu Shen,
Ang Li
Abstract:
Scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks. However, this scaling also introduces redundant structures, posing challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different structures, such as MLP and Attention layers, is under-explored. In this work, we investigate the varying redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. This metric operates on the premise that redundant structures produce outputs highly similar to their inputs. Surprisingly, while attention layers are essential for transformers and distinguish them from other mainstream architectures, we found that a large proportion of attention layers exhibit excessively high similarity and can be safely pruned without degrading performance, leading to reduced memory and computation costs. Additionally, we further propose a method that jointly drops Attention and MLP layers, achieving improved performance and dropping ratios. Extensive experiments demonstrate the effectiveness of our methods, e.g., Llama-3-70B maintains comparable performance even after pruning half of the attention layers. Our findings provide valuable insights for future network architecture design. The code will be released at: https://github.com/Shwai-He/LLM-Drop.
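The premise of the similarity-based metric is that a redundant layer's output stays close to its input. A minimal sketch follows. It assumes the score is the mean cosine similarity between each token's input and output hidden state, and that layers with the highest similarity are dropped first. The exact formulation and calibration data in the paper may differ, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def layer_similarity(inputs: List[Vector], outputs: List[Vector]) -> float:
    """Mean cosine similarity between each token's input and output hidden state."""
    return sum(cosine(x, y) for x, y in zip(inputs, outputs)) / len(inputs)

def layers_to_drop(similarities: List[float], num_drop: int) -> List[int]:
    """Indices of the `num_drop` layers whose outputs are most similar to their inputs."""
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return sorted(order[:num_drop])

# Toy hidden states for two tokens (dimension 3); hypothetical values.
x = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
near_identity = [[1.0, 0.1, 2.0], [0.5, 1.0, 0.1]]  # output barely changes
transformed = [[0.0, 2.0, -1.0], [1.0, -0.5, 0.3]]  # output changes a lot

sims = [layer_similarity(x, transformed), layer_similarity(x, near_identity)]
print(layers_to_drop(sims, 1))  # [1]: the near-identity layer is dropped first
```

In practice the hidden states would come from running a calibration set through the model. The ranking step then mirrors the idea of pruning attention layers whose outputs add little beyond their inputs.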
Submitted 22 June, 2024;
originally announced June 2024.
-
QCD Phase Diagram at finite Magnetic Field and Chemical Potential: A Holographic Approach Using Machine Learning
Authors:
Rong-Gen Cai,
Song He,
Li Li,
Hong-An Zeng
Abstract:
By leveraging neural networks, we address the inverse problem of constructing a quantitative 2+1-flavor holographic QCD model based on state-of-the-art lattice QCD data. Our model demonstrates quantitative agreement with the latest lattice QCD results. We construct the full phase diagram at finite magnetic field $B$, baryon chemical potential $μ_B$, and temperature $T$. We uncover a rich phase structure with a first-order phase transition surface and a critical endpoint line within the 3-dimensional phase diagram. The critical endpoint at vanishing chemical potential aligns with current speculations in the lattice QCD literature. In particular, for large magnetic fields, we find two critical endpoints in the $T$-$μ_B$ plane. The critical exponents of the critical endpoints obey scaling relations and depend on the background magnetic field. Moreover, they exhibit deviations from mean-field theory, highlighting the distinctive features of our holographic approach.
Submitted 18 June, 2024;
originally announced June 2024.
-
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
Authors:
Huanxuan Liao,
Yao Xu,
Shizhu He,
Yuanzhe Zhang,
Yanchao Hao,
Sheng** Liu,
Kang Liu,
Jun Zhao
Abstract:
Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Unlike LLMs, humans acquire skills and complete tasks not merely through repeated practice but also by understanding and following instructional guidelines. This paper is dedicated to simulating human learning to address the shortcomings of instance training, focusing on instruction learning to enhance cross-task generalization. Within this context, we introduce Task Adapters Generation from Instructions (TAGI), which automatically constructs the task-specific model by generating its parameters from the given task instructions, without retraining for unseen tasks. Specifically, we utilize knowledge distillation to enhance the consistency between TAGI developed through Learning with Instruction and task-specific models developed through Training with Instance, by aligning the labels, output logits, and adapter parameters between them. TAGI is endowed with cross-task generalization capabilities through a two-stage training process that includes hypernetwork pretraining and finetuning. We evaluate TAGI on the Super-Natural Instructions and P3 datasets. The experimental results demonstrate that TAGI can match or even outperform traditional meta-trained models and other hypernetwork models, while significantly reducing computational requirements.
Submitted 18 June, 2024;
originally announced June 2024.
-
Adaptive Uncertainty Quantification for Trajectory Prediction Under Distributional Shift
Authors:
Huiqun Huang,
Sihong He,
Fei Miao
Abstract:
Trajectory prediction models that can infer both finite future trajectories and their associated uncertainties for target vehicles in an online setting (e.g., real-world application scenarios) are crucial for safe and robust navigation and path planning of autonomous vehicles. However, most existing trajectory prediction models have neither treated reducing the uncertainty as an objective during the training stage nor provided reliable uncertainty quantification during the inference stage under potential distribution shift. Therefore, in this paper, we propose the Conformal Uncertainty Quantification under Distribution Shift framework, CUQDS, to quantify the uncertainty of the trajectories predicted by existing trajectory prediction models under potential data distribution shift, while improving the prediction accuracy of the models and reducing the estimated uncertainty during the training stage. Specifically, CUQDS includes 1) a learning-based Gaussian process regression module that models the output distribution of the base model (any existing trajectory prediction or time series forecasting neural network) and reduces the estimated uncertainty via an additional loss term, and 2) a statistics-based Conformal P control module that calibrates the estimated uncertainty from the Gaussian process regression module in an online setting under potential distribution shift between training and testing data.
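The abstract does not spell out the Conformal P control module. One plausible reading, stated here as an assumption, is the proportional ("P") update from conformal PID control, also known as quantile tracking. After each online observation, the interval radius q is raised when the target fell outside the interval and lowered otherwise, q ← q + η(err − α). Under this rule the long-run miscoverage rate tracks the target α even when the error distribution shifts. The sketch below covers only this calibration step, on a scalar prediction error. The Gaussian process module is not modeled, and `eta`, `alpha`, `q0`, and the simulated data are illustrative choices, not the paper's.

```python
import random
from typing import List, Sequence, Tuple

def p_control_intervals(predictions: Sequence[float], targets: Sequence[float],
                        alpha: float = 0.1, eta: float = 0.05,
                        q0: float = 1.0) -> Tuple[List[float], float]:
    """Online interval radii via proportional (quantile-tracking) updates.

    At step t the interval is [pred_t - q_t, pred_t + q_t]. After observing the
    target, q_{t+1} = q_t + eta * (err_t - alpha), where err_t = 1 on a miss.
    Returns the radii used at each step and the empirical miscoverage rate.
    """
    q = q0
    radii, misses = [], 0
    for pred, y in zip(predictions, targets):
        radii.append(q)
        err = 1 if abs(y - pred) > q else 0
        misses += err
        q = q + eta * (err - alpha)
    return radii, misses / len(targets)

# Simulated distribution shift: prediction errors become noisier halfway through.
rng = random.Random(0)
preds = [0.0] * 4000
ys = [rng.gauss(0, 0.5) if t < 2000 else rng.gauss(0, 2.0) for t in range(4000)]
radii, miscoverage = p_control_intervals(preds, ys)
print(round(miscoverage, 3))    # close to alpha = 0.1
print(radii[1999] < radii[-1])  # True: the interval widened after the shift
```

The update telescopes, giving average miscoverage minus α equal to (q_T − q_0)/(ηT). The coverage error therefore shrinks as more online observations arrive, without any assumption that test data match the training distribution.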
Submitted 17 June, 2024;
originally announced June 2024.
-
Optimizing Large Model Training through Overlapped Activation Recomputation
Authors:
** Chen,
Wenjie Zhang,
Shuibing He,
Yingjie Gu,
Zhuwei Peng,
Kexin Huang,
Xuan Zhan,
Weijian Chen,
Yi Zheng,
Zhefeng Wang,
Yanlong Yin,
Gang Chen
Abstract:
Large model training uses recomputation to alleviate memory pressure and pipelining to exploit data, tensor, and device parallelism. Existing recomputation approaches may incur up to 40% overhead when training real-world models, e.g., the GPT model with 22B parameters, because they are executed on demand in the critical training path. In this paper, we design a new recomputation framework, Lynx, that reduces this overhead by overlapping recomputation with the communication occurring in training pipelines. It consists of an optimal scheduling algorithm (OPT) and a heuristic-based scheduling algorithm (HEU). OPT achieves a global optimum but suffers from a long search time. HEU is based on our observation that large DNN models contain identical structures, so the same scheduling policy can be applied to all of them. HEU achieves a local optimum but reduces the search time by 99% compared to OPT. Our comprehensive evaluation using GPT models with 1.3B-20B parameters shows that both OPT and HEU outperform state-of-the-art recomputation approaches (e.g., Megatron-LM and Checkmate) by 1.02-1.53x. HEU achieves performance similar to OPT with an average search time of 0.16s.
Submitted 27 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Supergluon scattering in AdS: constructibility, spinning amplitudes, and new structures
Authors:
Qu Cao,
Song He,
Xiang Li,
Yichao Tang
Abstract:
We elaborate on a new recursive method proposed in arXiv:2312.15484 for computing tree-level $n$-point supergluon amplitudes as well as those with one gluon, i.e., spinning amplitudes, in ${\rm AdS}_5 \times S^3$. We present an improved proof of the so-called "constructibility" of supergluon and spinning amplitudes based on their factorizations and flat-space limit, which allows us to determine these amplitudes in Mellin space for all $n$. We present explicit and remarkably simple expressions for supergluon amplitudes up to $n=7$ and spinning amplitudes up to $n=6$, which can be viewed as AdS generalizations of the recently proposed scalar-scaffolded gluon amplitudes. We then reveal a series of hidden structures of these AdS amplitudes, including (1) an understanding of general pole structures, especially the precise truncation on descendant poles, (2) a derivation of simple "Feynman rules" for the all-$n$ amplitudes with the simplest R-symmetry structures, and (3) certain universal behavior analogous to the soft/collinear limit of flat-space amplitudes.
Submitted 12 June, 2024;
originally announced June 2024.
-
Massive 1D Dirac Line, Solitons and Reversible Manipulation on the Surface of a Prototype Obstructed Atomic Insulator, Silicon
Authors:
Zhongkai Liu,
Peng Deng,
Yuanfeng Xu,
Haifeng Yang,
Ding Pei,
Cheng Chen,
Shanmei He,
Defa Liu,
Sung-Kwan Mo,
Timur Kim,
Cephise Cacho,
Hong Yao,
Zhi-Da Song,
Xi Chen,
Zhong Wang,
Binghai Yan,
Lexian Yang,
Bogdan A. Bernevig,
Yulin Chen
Abstract:
Topologically trivial insulators can be classified into atomic insulators (AIs) and obstructed atomic insulators (OAIs), depending on whether or not the Wannier charge centers are localized at spatial positions occupied by atoms. An OAI can possess unusual properties such as surface states along certain crystalline surfaces, which advantageously appear in materials with much larger bulk energy gaps than topological insulators, making them more attractive for potential applications. In this work, we show that silicon (Si), a well-known crystal, is a model OAI, which naturally explains some of Si's unusual properties such as its famous (111) surface states. On this surface, using angle-resolved photoemission spectroscopy (ARPES), we reveal sharp quasi-1D massive Dirac line dispersions; using scanning tunneling microscopy/spectroscopy (STM/STS), we also observe topological solitons at the interface of the two atomic chains. Remarkably, we show that the different chain domains can be reversibly switched at the nanometer scale, suggesting potential applications in ultra-high-density storage devices.
Submitted 12 June, 2024;
originally announced June 2024.
-
Wearable Device-Based Real-Time Monitoring of Physiological Signals: Evaluating Cognitive Load Across Different Tasks
Authors:
Ling He,
Yanxin Chen,
Wenqi Wang,
Shuting He,
Xiaoqiang Hu
Abstract:
This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution (1-second interval) cognitive load assessment on electroencephalogram (EEG) data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students. By jointly analyzing these two critical physiological indicators, the research delves into their application value in assessing cognitive load among secondary vocational students and their utility across various tasks. The study designed two experiments to validate the efficacy of the proposed approach: Initially, a random forest classification model, developed using the N-BACK task, enabled the precise decoding of physiological signal characteristics in secondary vocational students under different levels of cognitive load, achieving a classification accuracy of 97%. Subsequently, this classification model was applied in a cross-task experiment involving the National Computer Rank Examination (Level-1), demonstrating the method's significant applicability and cross-task transferability in diverse learning contexts. Conducted with high portability, this research holds substantial theoretical and practical significance for optimizing teaching resource allocation in secondary vocational education, as well as for cognitive load assessment methods and monitoring. Currently, the research findings are undergoing trial implementation in the school.
Submitted 3 July, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Weighted KL-Divergence for Document Ranking Model Refinement
Authors:
Yingrui Yang,
Yifan Qiao,
Shanxiu He,
Tao Yang
Abstract:
Transformer-based retrieval and reranking models for text document search are often refined through knowledge distillation together with contrastive learning. A tight distribution matching between the teacher and student models can be hard as over-calibration may degrade training effectiveness when a teacher does not perform well. This paper contrastively reweights KL divergence terms to prioritize the alignment between a student and a teacher model for proper separation of positive and negative documents. This paper analyzes and evaluates the proposed loss function on the MS MARCO and BEIR datasets to demonstrate its effectiveness in improving the relevance of tested student models.
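For reference, the unweighted distillation loss is the KL divergence between the teacher's and the student's softmax distributions over one query's candidate documents, KL(P_t‖P_s) = Σ_i p_t,i log(p_t,i / p_s,i). The sketch below makes each term's weight explicit so that reweighting is easy to see. The particular weights used here (doubling the positive document's term) are a hypothetical illustration of per-term reweighting. They are not the contrastive weighting scheme the paper proposes.

```python
import math
from typing import List, Sequence

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_kl(teacher_scores: Sequence[float], student_scores: Sequence[float],
                weights: Sequence[float]) -> float:
    """Sum_i w_i * p_t[i] * log(p_t[i] / p_s[i]) over one query's candidate list.

    With all weights equal to 1 this is the ordinary KL(P_teacher || P_student).
    """
    p_t = softmax(teacher_scores)
    p_s = softmax(student_scores)
    return sum(w * pt * math.log(pt / ps) for w, pt, ps in zip(weights, p_t, p_s) if pt > 0)

# One query: index 0 is the positive document, the rest are negatives (toy scores).
teacher = [4.0, 1.0, 0.5, 0.2]
student = [2.0, 1.5, 0.3, 0.1]
plain = weighted_kl(teacher, student, [1.0] * 4)
reweighted = weighted_kl(teacher, student, [2.0, 1.0, 1.0, 1.0])  # hypothetical weights
print(plain >= 0.0)        # True: unweighted KL is non-negative
print(reweighted > plain)  # True: the under-fit positive term now counts double
```

With arbitrary weights the loss is no longer a true divergence and can even be negative. Any practical weighting scheme therefore needs to be chosen so that minimizing it still pulls the student toward the teacher's separation of positives and negatives.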
Submitted 9 June, 2024;
originally announced June 2024.
-
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Authors:
Chengyuan Deng,
Yiqun Duan,
Xin **,
Heng Chang,
Yijun Tian,
Han Liu,
Henry Peng Zou,
Yiqiao **,
Yijia Xiao,
Yichen Wang,
Shenghao Wu,
Zongxing Xie,
Kuofeng Gao,
Sihong He,
Jun Zhuang,
Lu Cheng,
Haohan Wang
Abstract:
Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores the importance of integrating ethical standards and societal values into the development of LLMs, thereby guiding the creation of responsible and ethically aligned language models.
Submitted 8 June, 2024;
originally announced June 2024.
-
Holographic stress tensor correlators on higher genus Riemann surfaces
Authors:
Song He,
Yun-ze Li,
Yunfei Xie
Abstract:
In this work, we present a comprehensive study of holographic stress tensor correlators on general Riemann surfaces, extending beyond the previously well-studied torus cases to explore higher genus conformal field theories (CFTs) within the framework of the Anti-de Sitter/conformal field theory (AdS/CFT) correspondence. We develop a methodological approach to compute holographic stress tensor correlators, employing the Schottky uniformization technique to address the handlebody solutions for higher genus Riemann surfaces. Through rigorous calculations, we derive four-point stress tensor correlators, alongside recurrence relations for higher-point correlators, within the $\mathrm{AdS}_3/\mathrm{CFT}_2$ context. Additionally, our research delves into the holography of cutoff $\mathrm{AdS}_3$ spaces, offering novel insights into the lower-point correlators of the $T\bar{T}$-deformed theories on higher genus Riemann surfaces up to the first deformation order.
Submitted 6 June, 2024;
originally announced June 2024.
-
On universal splittings of tree-level particle and string scattering amplitudes
Authors:
Qu Cao,
** Dong,
Song He,
Canxin Shi,
Fanky Zhu
Abstract:
In this paper, we study the newly discovered universal splitting behavior for tree-level scattering amplitudes of particles and strings~\cite{Cao:2024gln}: when a set of Mandelstam variables (and Lorentz products involving polarizations for gluons/gravitons) vanish, the $n$-point amplitude factorizes as the product of two lower-point {\it currents} with $n{+}3$ external legs in total. We refer to any such subspace of the kinematic space of $n$ massless momenta as ``2-split kinematics", where the scattering potential for string amplitudes and the corresponding scattering equations for particle amplitudes nicely split into two parts. Based on these, we provide a systematic and detailed study of the splitting behavior for essentially all ingredients which appear as integrands for open- and closed-string amplitudes as well as Cachazo-He-Yuan (CHY) formulas, including Parke-Taylor factors, correlators in superstring and bosonic string theories, and CHY integrands for a variety of amplitudes of scalars, gluons and gravitons. These results then immediately lead to the splitting behavior of string and particle amplitudes in a wide range of theories, including bi-adjoint $φ^3$ (with string extension known as $Z$ and $J$ integrals), non-linear sigma model, Dirac-Born-Infeld, the special Galileon, \textit{etc.}, as well as Yang-Mills and Einstein gravity (with bosonic and superstring extensions). Our results imply and extend some other factorization behavior of tree amplitudes considered recently, including smooth splittings~\cite{Cachazo:2021wsz} and factorizations near zeros~\cite{Arkani-Hamed:2023swr}, to all these theories. A special case of splitting also yields soft theorems for gluons/gravitons as well as analogous soft behavior for Goldstone particles near their Adler zeros.
Submitted 6 June, 2024;
originally announced June 2024.
-
Enhancing the Resilience of Graph Neural Networks to Topological Perturbations in Sparse Graphs
Authors:
Shuqi He,
Jun Zhuang,
Ding Wang,
Luyao Peng,
Jun Song
Abstract:
Graph neural networks (GNNs) have been extensively employed in node classification. Nevertheless, recent studies indicate that GNNs are vulnerable to topological perturbations, such as adversarial attacks and edge disruptions. Considerable efforts have been devoted to mitigating these challenges. For example, pioneering Bayesian methodologies, including GraphSS and LlnDT, incorporate Bayesian label transitions and topology-based label sampling to strengthen the robustness of GNNs. However, GraphSS is hindered by slow convergence, while LlnDT faces challenges in sparse graphs. To overcome these limitations, we propose a novel label inference framework, TraTopo, which combines topology-driven label propagation, Bayesian label transitions, and link analysis via random walks. TraTopo significantly surpasses its predecessors on sparse graphs by utilizing random walk sampling, specifically targeting isolated nodes for link prediction, thus enhancing its effectiveness in topological sampling contexts. Additionally, TraTopo employs a shortest-path strategy to refine link prediction, thereby reducing predictive overhead and improving label inference accuracy. Empirical evaluations highlight TraTopo's superiority in node classification, significantly exceeding contemporary GCN models in accuracy.
Submitted 5 June, 2024;
originally announced June 2024.
-
Loki: Low-Rank Keys for Efficient Sparse Attention
Authors:
Prajwal Singhania,
Siddharth Singh,
Shwai He,
Soheil Feizi,
Abhinav Bhatele
Abstract:
Inference on large language models can be expensive in terms of the compute and memory costs involved, especially when long sequence lengths are used. In particular, the self-attention mechanism used in such models contributes significantly to these costs, which has resulted in several recent works that propose sparse attention approximations for inference. In this work, we propose to approximate the self-attention computation by focusing on the dimensionality of key vectors computed in the attention block. Our analysis reveals that the key vectors lie in a significantly lower-dimensional space, consistently across several datasets and models. Exploiting this observation, we propose Loki, a novel sparse attention method that ranks and selects tokens in the KV-cache based on attention scores computed in low-dimensional space. Our evaluations show that Loki is able to maintain the efficacy of the models better than other popular approximation methods, while speeding up the attention computation due to reduced data movement (load/store) and compute costs.
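A minimal pure-Python sketch of the idea described above, under stated assumptions: `basis` is a hypothetical set of orthonormal directions (for example, principal components of keys estimated offline), approximate scores are computed in that reduced space to pick the top-`k` tokens, and exact scaled dot-product attention is then computed only over those tokens. This is not the authors' implementation; in practice one would presumably store the reduced keys rather than project them on every call.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(vec, basis):
    """Coordinates of vec in a low-dimensional orthonormal basis."""
    return [dot(vec, b) for b in basis]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def low_rank_topk_attention(query, keys, values, basis, k):
    """Select top-k tokens with low-rank scores, then attend exactly over them."""
    # Step 1: approximate attention scores in the reduced key space.
    q_low = project(query, basis)
    approx = [dot(q_low, project(key, basis)) for key in keys]
    # Step 2: keep the k tokens with the largest approximate scores.
    top = sorted(range(len(keys)), key=lambda i: approx[i], reverse=True)[:k]
    # Step 3: exact scaled dot-product attention over the selected tokens only.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in top])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out, top


keys = [[1, 0, 0, 0], [0, 2, 0, 0], [3, 0, 0, 0], [0, 0, 0, 0]]
values = [[1.0], [2.0], [3.0], [4.0]]
query = [1, 1, 0, 0]
basis = [[1, 0, 0, 0], [0, 1, 0, 0]]  # these keys happen to lie in this 2-D subspace
out, top = low_rank_topk_attention(query, keys, values, basis, k=2)
print(top)  # [2, 1]
```

With `k` equal to the number of tokens the result matches full attention; smaller `k` trades accuracy for fewer exact dot products and less data movement.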
Submitted 4 June, 2024;
originally announced June 2024.
-
Demystifying the Compression of Mixture-of-Experts Through a Unified Framework
Authors:
Shwai He,
Daize Dong,
Liang Ding,
Ang Li
Abstract:
Scaling large language models has revolutionized performance across diverse domains, yet the continual growth in model size poses significant challenges for real-world deployment. The Mixture of Experts (MoE) approach addresses this by dynamically selecting and activating only a subset of experts, significantly reducing computational costs while maintaining high performance. However, MoE introduces potential redundancy (e.g., parameters) and extra costs (e.g., communication overhead). Despite numerous compression techniques developed for mitigating the redundancy in dense models, the compression of MoE remains under-explored. We first bridge this gap with a cutting-edge unified framework that not only seamlessly integrates mainstream compression methods but also helps systematically understand MoE compression. This framework approaches compression from two perspectives: Expert Slimming, which compresses individual experts, and Expert Trimming, which removes structured modules. Within this framework, we explore the optimization space unexplored by existing methods, and further introduce aggressive Expert Trimming techniques, i.e., Layer Drop and Block Drop, to eliminate redundancy at larger scales. Based on these insights, we present a comprehensive recipe to guide practitioners in compressing MoE effectively. Extensive experimental results demonstrate the effectiveness of the compression methods under our framework and the proposed recipe, achieving a 6.05x speedup and only 20.0GB memory usage while maintaining over 92% of performance on Mixtral-8x7B. Code is released at \url{https://github.com/DaizeDong/Unified-MoE-Compression}.
Submitted 24 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation
Authors:
Xueying Zeng,
Baixiang Huang,
Yu Luo,
Guangyu Wei,
Songyan He,
Yushuang Shao
Abstract:
Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary artery disease more accurately. In order to combine the advantages of both the base model and the domain-specific model, and to achieve high-precision, fully automatic segmentation and detection with a limited number of training samples, we propose a novel architecture, SAM-VMNet, which combines the powerful feature extraction capability of MedSAM with the linear-complexity advantage of the visual state-space model of VM-UNet. This gives it faster inference and stronger data processing capability than the Vision Transformer, and achieves higher segmentation accuracy and stability for CTA images. Experimental results show that the SAM-VMNet architecture performs strongly in the CTA image segmentation task, with a segmentation accuracy of up to 98.32% and a sensitivity of up to 99.33%, significantly outperforming other existing models and showing stronger domain adaptability. Comprehensive evaluation of the CTA image segmentation task shows that SAM-VMNet accurately extracts the vascular trunks and capillaries, demonstrating its great potential and wide range of application scenarios for the vascular segmentation task, and also laying a solid foundation for further stenosis detection.
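The two figures quoted above are easiest to interpret with their definitions in hand. The sketch below assumes the standard pixel-wise definitions on flattened binary masks (accuracy = (TP + TN) / all pixels, sensitivity = TP / (TP + FN)); the abstract does not state which definitions the authors used, and the toy masks are made up for illustration.

```python
def confusion_counts(pred, truth):
    """Count (TP, FP, TN, FN) over flattened binary masks (1 = vessel)."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(pred, truth):
    """Fraction of all pixels labeled correctly."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(pred, truth):
    """Fraction of true vessel pixels that were detected (recall)."""
    tp, _, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn) if tp + fn else 0.0


pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(accuracy(pred, truth), round(sensitivity(pred, truth), 4))  # 0.6 0.6667
```

In vessel segmentation, background pixels usually dominate, so pixel accuracy can stay high even when thin vessels are missed; sensitivity is the figure that directly reflects how many vessel pixels are recovered.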
Submitted 1 June, 2024;
originally announced June 2024.
-
All-Loop Geometry for Four-Point Correlation Function
Authors:
Song He,
Yu-tin Huang,
Chia-Kai Kuo
Abstract:
In this letter, we consider a positive geometry conjectured to encode the loop integrand of four-point stress-energy correlators in planar $\mathcal{N}=4$ super Yang-Mills. Beginning with four lines in twistor space, we characterize a positive subspace to which an $\ell$-loop geometry is attached. The loop geometry then consists of $\ell$ lines in twistor space satisfying positivity conditions among themselves and with respect to the base. Consequently, the \textit{loop geometry} can be viewed as a fibration over a \textit{tree geometry}. The fibration naturally dissects the base into chambers, in each of which the degree-$4 \ell$ loop form is unique and distinct. Interestingly, up to three loops, the chambers are simply organized by the six orderings of $x^2_{12}x^2_{34}$, $x^2_{14}x^2_{23}$ and $x^2_{13}x^2_{24}$. We explicitly verify our conjecture by computing the loop forms in terms of a basis of planar conformal integrals up to $\ell=3$, which indeed yield the correct loop integrands for the four-point correlator.
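To make the "six orderings" concrete: three quantities can be ordered in 3! = 6 ways, and the sketch below labels a kinematic point by the ordering of the three products. The short names `s`, `t`, `u` and the dictionary input format are hypothetical conveniences; ties between products (chamber boundaries) are ignored, and the sketch says nothing about the loop forms themselves.

```python
from itertools import permutations


def cross_products(x2):
    """Return the three products, given x2[(i, j)] = x_ij^2 for i < j."""
    return {
        "s": x2[(1, 2)] * x2[(3, 4)],
        "t": x2[(1, 4)] * x2[(2, 3)],
        "u": x2[(1, 3)] * x2[(2, 4)],
    }


def chamber_label(x2):
    """Label a point by the ordering of the three products, smallest first."""
    products = cross_products(x2)
    return tuple(sorted(products, key=products.get))


# All 3! = 6 possible orderings, i.e. the candidate chamber labels.
ALL_CHAMBERS = list(permutations("stu"))

example = {(1, 2): 1.0, (3, 4): 1.0, (1, 4): 2.0, (2, 3): 3.0, (1, 3): 1.0, (2, 4): 2.0}
print(chamber_label(example))  # ('s', 'u', 't')
```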
Submitted 30 May, 2024;
originally announced May 2024.
-
Hierarchical Structure and Self-gravity in the Maddalena Giant Molecular Cloud
Authors:
Renjie Shen,
Yuehui Ma,
Hongchi Wang,
Suziye He,
Miaomiao Zhang
Abstract:
In this work, we present the data from the Milky Way Imaging Scroll Painting (MWISP) project for the Maddalena giant molecular cloud (GMC). We decompose the 13CO emission datacube of the observed region into hierarchical substructures using a modified Dendrogram algorithm. We investigate the statistical properties of these substructures and examine the role that self-gravity plays on various spatial scales. The statistics of the mass (M), radius (R), velocity dispersion (σv), virial parameter (αvir), and sonic Mach number of the substructures are presented. The radius and mass distributions and the σv-R scaling relationship of the substructures resemble those reported in previous studies that use non-hierarchical algorithms to identify the entities. We find that for the hierarchical substructures αvir decreases as the radius or mass of the substructures increases. The majority of the substructures in the quiescent region of Maddalena GMC are not gravitationally bound (αvir > 2), while most of the substructures in the star-forming regions are gravitationally bound (αvir < 2). Furthermore, we find that self-gravity plays an important role on scales of 0.8-4 pc in the IRAS 06453 star-forming region, while it is not an important factor on scales below 5 pc in the non-star-forming region.
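For reference, a minimal sketch of the boundedness classification, assuming the commonly used uniform-sphere definition αvir = 5σv²R/(GM) (the abstract does not state the exact form the authors adopt) and the αvir < 2 threshold quoted above. Function and constant names are hypothetical.

```python
# Gravitational constant in pc (km/s)^2 / M_sun.
G_PC_KMS2_PER_MSUN = 4.30091e-3


def virial_parameter(sigma_v_kms, radius_pc, mass_msun):
    """alpha_vir = 5 * sigma_v^2 * R / (G * M), uniform-sphere definition."""
    return 5.0 * sigma_v_kms ** 2 * radius_pc / (G_PC_KMS2_PER_MSUN * mass_msun)


def is_gravitationally_bound(alpha_vir, threshold=2.0):
    """Apply the alpha_vir < 2 criterion used in the abstract."""
    return alpha_vir < threshold


alpha = virial_parameter(sigma_v_kms=1.0, radius_pc=1.0, mass_msun=1000.0)
print(round(alpha, 2), is_gravitationally_bound(alpha))  # 1.16 True
```

Because αvir scales as σv²R/M, a structure of fixed size and linewidth becomes more strongly bound as its mass grows, which is consistent with the trend of αvir decreasing with mass reported above.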
Submitted 31 May, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
Authors:
Han Wang,
Sihong He,
Zhili Zhang,
Fei Miao,
James Anderson
Abstract:
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or ``similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy, which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(ε^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
Submitted 29 May, 2024;
originally announced May 2024.
-
Are Large Language Models Chameleons?
Authors:
Mingmeng Geng,
Sihong He,
Roberto Trotta
Abstract:
Do large language models (LLMs) have their own worldviews and personality tendencies? Simulations in which an LLM was asked to answer subjective questions were conducted more than 1 million times. Comparison of the responses from different LLMs with real data from the European Social Survey (ESS) suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases. Methods for measuring the difference between LLMs and survey data are discussed, such as calculating weighted means and a newly proposed measure inspired by Jaccard similarity. We conclude that it is important to analyze the robustness and variability of prompts before using LLMs to model individual decisions or collective behavior, as their imitation abilities are approximate at best.
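The abstract mentions weighted means and a measure inspired by Jaccard similarity. The sketch below shows only the standard building blocks, not the authors' new measure: the set Jaccard index, its common weighted generalization for comparing two answer distributions (sum of element-wise minima over sum of maxima), and a weighted mean such as one computed with survey weights. The example inputs are made up.

```python
def jaccard(a, b):
    """Standard Jaccard index |A & B| / |A | B| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def weighted_jaccard(p, q):
    """Weighted Jaccard similarity of two non-negative vectors (e.g. answer shares)."""
    num = sum(min(x, y) for x, y in zip(p, q))
    den = sum(max(x, y) for x, y in zip(p, q))
    return num / den if den else 1.0


def weighted_mean(values, weights):
    """Weighted mean, e.g. of answer codes using survey weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5
print(weighted_mean([1, 2, 3], [1, 1, 2]))  # 2.25
```

A weighted mean compresses each distribution to one number, whereas the weighted Jaccard similarity compares the full answer distributions, so the two can disagree about how close an LLM is to survey data.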
Submitted 29 May, 2024;
originally announced May 2024.
-
Uniform Inviscid Damping and Inviscid Limit of the 2D Navier-Stokes equation with Navier Boundary Conditions
Authors:
Jacob Bedrossian,
Siming He,
Sameer Iyer,
Fei Wang
Abstract:
We consider the 2D, incompressible Navier-Stokes equations near the Couette flow, $ω^{(NS)} = 1 + εω$, set on the channel $\mathbb{T} \times [-1, 1]$, supplemented with Navier boundary conditions on the perturbation, $ω|_{y = \pm 1} = 0$. We are simultaneously interested in two asymptotic regimes that are classical in hydrodynamic stability: the long time, $t \rightarrow \infty$, stability of background shear flows, and the inviscid limit, $ν\rightarrow 0$ in the presence of boundaries. Given small ($ε\ll 1$, but independent of $ν$) Gevrey 2- datum, $ω_0^{(ν)}(x, y)$, that is supported away from the boundaries $y = \pm 1$, we prove the following results: \begin{align*} & \|ω^{(ν)}(t) - \frac{1}{2π}\int ω^{(ν)}(t) dx \|_{L^2} \lesssim εe^{-δν^{1/3} t}, & \text{(Enhanced Dissipation)} \\ & \langle t \rangle \|u_1^{(ν)}(t) - \frac{1}{2π} \int u_1^{(ν)}(t) dx\|_{L^2} + \langle t \rangle^2 \|u_2^{(ν)}(t)\|_{L^2} \lesssim εe^{-δν^{1/3} t}, & \text{(Inviscid Damping)} \\ &\| ω^{(ν)} - ω^{(0)} \|_{L^\infty} \lesssim ενt^{3+η}, \quad\quad t \lesssim ν^{-1/(3+η)} & \text{(Long-time Inviscid Limit)} \end{align*} This is the first nonlinear asymptotic stability result of its type, which combines three important physical phenomena at the nonlinear level: inviscid damping, enhanced dissipation, and long-time inviscid limit in the presence of boundaries. The techniques we develop represent a major departure from prior works on nonlinear inviscid damping as physical space techniques necessarily play a central role. In this paper, we focus on the primary nonlinear result, while tools for handling the linearized parabolic and elliptic equations are developed in our separate, companion work.
Submitted 29 May, 2024;
originally announced May 2024.
-
Pseudo-Gevrey Smoothing for the Passive Scalar Equations near Couette
Authors:
Jacob Bedrossian,
Siming He,
Sameer Iyer,
Fei Wang
Abstract:
In this article, we study the regularity theory for two linear equations that are important in fluid dynamics: the passive scalar equation for (time-varying) shear flows close to Couette in $\mathbb T \times [-1,1]$ with vanishing diffusivity $ν\to 0$ and the Poisson equation with right-hand side behaving in similar function spaces to such a passive scalar. The primary motivation for this work is to develop some of the main technical tools required for our treatment of the (nonlinear) 2D Navier-Stokes equations, carried out in our companion work. Both equations are studied with homogeneous Dirichlet conditions (the analogue of a Navier slip-type boundary condition) and the initial condition is taken to be compactly supported away from the walls. We develop smoothing estimates with the following three features:
[1] Uniform-in-$ν$ regularity is with respect to $\partial_x$ and a time-dependent adapted vector-field $Γ$ which approximately commutes with the passive scalar equation (as opposed to `flat' derivatives), and a scaled gradient $\sqrt{ν} \nabla$;
[2] $(\partial_x, Γ)$-regularity estimates are performed in Gevrey spaces with regularity that depends on the spatial coordinate, $y$ (what we refer to as `pseudo-Gevrey');
[3] The regularity of these pseudo-Gevrey spaces degenerates to finite regularity near the center of the channel and hence standard Gevrey product rules and other amenable properties do not hold.
Nonlinear analysis in such a delicate functional setting is one of the key ingredients to our companion paper, \cite{BHIW24a}, which proves the full nonlinear asymptotic stability of the Couette flow with slip boundary conditions. The present article introduces new estimates for the associated linear problems in these degenerate pseudo-Gevrey spaces, which is of independent interest.
Submitted 29 May, 2024;
originally announced May 2024.
-
Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks
Authors:
Yafeng Yan,
Shuyao He,
Zhou Yu,
Jiajie Yuan,
Ziang Liu,
Yan Chen
Abstract:
To address the limitations of traditional medical decision systems in processing large-scale heterogeneous medical data and delivering highly personalized recommendations, this paper introduces a personalized medical decision algorithm based on graph neural networks (GNNs). This research integrates graph neural network technology into the medical and health field, aiming to build a high-precision representation model of patient health status by mining the complex associations among patients' clinical characteristics, genetic information, and lifestyle habits. In this study, medical data are preprocessed into a graph structure, where nodes represent different data entities (such as patients, diseases, and genes) and edges represent interactions or relationships between entities. The core of the algorithm is a novel multi-scale fusion mechanism that combines patients' historical medical records, physiological indicators, and genetic characteristics to dynamically adjust the attention allocation strategy of the graph neural network, enabling highly customized analysis of individual cases. In the experiments, this study validated the algorithm on several publicly available medical datasets; the results show that, compared with traditional machine learning methods and a single graph neural network model, the proposed personalized medical decision algorithm performs significantly better in disease prediction accuracy, treatment effect evaluation, and patient risk stratification.
Submitted 23 May, 2024;
originally announced May 2024.
-
Stability Analysis of Biochemical Reaction Networks Linearly Conjugated to complex balanced Systems with Time Delays Added
Authors:
Xiaoyu Zhang,
Shibo He,
Chuanhou Gao,
Denis Dochain
Abstract:
Linear conjugacy offers a new perspective for broadening the scope of stable biochemical reaction networks to systems linearly conjugated to the well-established complex balanced mass action systems ($\ell$cCBMASs). This paper addresses the challenge posed by time delays, which can disrupt the linear conjugacy relationship and complicate stability analysis for delayed versions of $\ell$cCBMASs (D$\ell$cCBMASs). First, we develop Lyapunov functionals tailored to some D$\ell$cCBMASs by using the parameter relationships that persist under time delays. Subsequently, we redivide the phase space into several invariant sets of trajectories and further investigate the existence and uniqueness of equilibria in each newly defined invariant set. This enables us to determine the local asymptotic stability of some D$\ell$cCBMASs within an updated framework. Furthermore, illustrative examples are provided to demonstrate the practical implications of our approach.
Submitted 24 May, 2024;
originally announced May 2024.
-
BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection
Authors:
Yuwei Niu,
Shuo He,
Qi Wei,
Feng Liu,
Lei Feng
Abstract:
Multimodal contrastive learning methods (e.g., CLIP) have shown impressive zero-shot classification performance due to their strong capacity for joint representation learning over visual and textual modalities. However, recent research has revealed that multimodal contrastive learning on pre-training data poisoned with a small proportion of maliciously backdoored samples can produce a backdoored CLIP model that inserted triggers can attack in downstream tasks with a high success rate. To defend against backdoor attacks on CLIP, existing defense methods focus on either the pre-training stage or the fine-tuning stage, which unfortunately incurs high computational costs due to numerous parameter updates. In this paper, we provide the first attempt at a computationally efficient backdoor detection method to defend against backdoored CLIP in the inference stage. We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts. Motivated by this observation, we propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting. Specifically, we first prompt a language model (e.g., GPT-4) to produce class-related description texts (benign) and class-perturbed random texts (malignant) by specially designed instructions. Then, the distribution difference in cosine similarity between images and the two types of class description texts can be used as the criterion to detect backdoor samples. Extensive experiments validate that our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods, in terms of both effectiveness and efficiency.
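A minimal sketch of the detection criterion as described above, under simplifying assumptions: embeddings are small hand-written vectors rather than CLIP outputs, the benign and malignant text embeddings belong to a single class, the "distribution difference" is reduced to a difference of mean cosine similarities, and the `threshold` value is hypothetical. It is not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_gap(image_emb, benign_embs, malignant_embs):
    """Mean similarity to benign texts minus mean similarity to malignant texts."""
    benign = sum(cosine(image_emb, t) for t in benign_embs) / len(benign_embs)
    malignant = sum(cosine(image_emb, t) for t in malignant_embs) / len(malignant_embs)
    return benign - malignant


def is_suspected_backdoor(image_emb, benign_embs, malignant_embs, threshold):
    """Flag images whose similarity barely reacts to the text perturbation."""
    return contrastive_gap(image_emb, benign_embs, malignant_embs) < threshold


benign = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]  # class-related descriptions
malignant = [[0.0, 1.0, 0.0]]                # class-perturbed random text
clean = [1.0, 0.0, 0.0]
backdoored = [0.1, 0.0, 1.0]                 # dominated by a trigger direction
print(is_suspected_backdoor(clean, benign, malignant, threshold=0.5))       # False
print(is_suspected_backdoor(backdoored, benign, malignant, threshold=0.5))  # True
```

Only forward passes (embeddings and cosine similarities) are needed, which is consistent with the abstract's point that test-time detection avoids the parameter updates of training-stage defenses.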
Submitted 24 May, 2024;
originally announced May 2024.
-
From NeRFs to Gaussian Splats, and Back
Authors:
Siming He,
Zach Osman,
Pratik Chaudhari
Abstract:
For robotics applications where there is a limited number of (typically ego-centric) views, parametric representations such as neural radiance fields (NeRFs) generalize better than non-parametric ones such as Gaussian splatting (GS) to views that are very different from those in the training data; GS however can render much faster than NeRFs. We develop a procedure to convert back and forth between the two. Our approach achieves the best of both NeRFs (superior PSNR, SSIM, and LPIPS on dissimilar views, and a compact representation) and GS (real-time rendering and the ability to easily modify the representation); the computational cost of these conversions is minor compared to training the two from scratch.
Submitted 10 June, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Investigate the efficiency of incompressible flow simulations on CPUs and GPUs with BSAMR
Authors:
Dewen Liu,
Shuai He,
Haoran Cheng,
Yadong Zeng
Abstract:
Adaptive mesh refinement (AMR) is a classical technique for locally refining the mesh only where needed, thus effectively reducing computational costs for HPC-based physics simulations. Although AMR has been used for many years, little reproducible research discusses the impact of software-based parameters on block-structured AMR (BSAMR) efficiency and how to choose them. This article primarily conducts parametric studies to investigate the computational efficiency of incompressible flow simulations on a block-structured adaptive mesh. The parameters include refining block size, refining frequency, maximum level, and cycling method. A new projection skipping (PS) method is proposed, which provides insight into when and where the projections on coarser levels can safely be omitted. We conduct extensive tests on different CPUs/GPUs for various 2D/3D incompressible flow cases, including bubble, Rayleigh-Taylor (RT) instability, and Taylor-Green vortex cases. Several valuable empirical conclusions are obtained to help guide simulations with BSAMR. The code and all profiling data are available on GitHub.
Submitted 11 May, 2024;
originally announced May 2024.
-
PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition
Authors:
Shenglin He,
Xiaoyang Qu,
Jiguang Wan,
Guokuan Li,
Changsheng Xie,
Jianzong Wang
Abstract:
Recognizing human actions from point cloud sequences has attracted tremendous attention from both academia and industry due to its wide applications. However, most previous studies on point cloud action recognition require complex networks to extract intra-frame spatial features and inter-frame temporal features, resulting in an excessive number of redundant computations. This leads to high latency, rendering them impractical for real-world applications. To address this problem, we propose a Plane-Fit Redundancy Encoding point cloud sequence network named PRENet. The key idea of our approach is to use plane fitting to mitigate spatial redundancy within the sequence while concurrently encoding the temporal redundancy of the entire sequence, minimizing redundant computations. Specifically, our network comprises two principal modules: a Plane-Fit Embedding module and a Spatio-Temporal Consistency Encoding module. The Plane-Fit Embedding module capitalizes on the observation that successive point cloud frames exhibit unique geometric features in physical space, allowing spatially encoded data to be reused for temporal stream encoding. The Spatio-Temporal Consistency Encoding module combines the temporal structure of the temporally redundant part with its corresponding spatial arrangement, thereby enhancing recognition accuracy. We conducted extensive experiments to verify the effectiveness of our network. The results show that our method achieves almost identical recognition accuracy while being nearly four times faster than other state-of-the-art methods.
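Plane fitting is the geometric primitive the Plane-Fit Embedding module builds on. The sketch below fits a least-squares plane to a local patch of points via SVD and reports per-point residuals; it is a generic illustration under that assumption, not PRENet's embedding, and `fit_plane` is a hypothetical name.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) array of points.

    Returns (centroid, unit normal, residuals), where residuals are the signed
    distances of each point to the fitted plane.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The right-singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    residuals = centered @ normal
    return centroid, normal, residuals

# Points sampled from the plane z = 0.5x + 0.2y + 1: residuals should be ~0.
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(50, 2))
pts = np.column_stack([xy, 0.5 * xy[:, 0] + 0.2 * xy[:, 1] + 1.0])
c, n, r = fit_plane(pts)
print(np.round(n, 3))
```

A patch whose residuals are small can be summarized by its centroid and normal instead of all of its points, which is one way to picture the spatial redundancy reduction the abstract describes.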
Submitted 11 May, 2024;
originally announced May 2024.
-
Time-dependent Flows and Their Applications in Parabolic-parabolic Patlak-Keller-Segel Systems Part II: Shear Flows
Authors:
Siming He
Abstract:
In this study, we investigate the behavior of three-dimensional parabolic-parabolic Patlak-Keller-Segel (PKS) systems in the presence of ambient shear flows. Our findings demonstrate that when the total mass of the cell density is below a specific threshold, the solution remains globally regular as long as the flow is sufficiently strong. The primary difficulty in our analysis stems from the fast creation of chemical gradients due to strong shear advection.
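For reference, one commonly used form of the parabolic-parabolic PKS system with an ambient flow is shown below, where $n$ is the cell density and $c$ the chemical concentration, assuming a shear flow of amplitude $A$ along $x_1$ with profile $U(x_2)$. The paper's exact domain, normalization, and any degradation term in the $c$-equation may differ.

```latex
\begin{aligned}
\partial_t n + A\,U(x_2)\,\partial_{x_1} n + \nabla\cdot\left(n\nabla c\right) &= \Delta n,\\
\partial_t c + A\,U(x_2)\,\partial_{x_1} c &= \Delta c + n .
\end{aligned}
```

In this notation, "sufficiently strong" flow corresponds to large $A$, and the mass threshold refers to $\|n\|_{L^1}$, which is conserved by the first equation.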
Submitted 9 May, 2024;
originally announced May 2024.
-
Geometric formulation of generalized root-$T\bar{T}$ deformations
Authors:
H. Babaei-Aghbolagh,
Song He,
Tommaso Morone,
Hao Ouyang,
Roberto Tateo
Abstract:
We develop a generic geometric formalism that incorporates both $T\bar{T}$-like and root-$T\bar{T}$-like deformations in arbitrary dimensions. This framework applies to a wide family of stress-energy tensor perturbations and encompasses various well-known field theories. Building upon the recently proposed correspondence between Ricci-based gravity and $T\bar{T}$-like deformations, we further extend this duality to include root-$T\bar{T}$-like perturbations. This extension broadens the potential applications of our approach and contributes to a deeper exploration of the interplay between stress tensor perturbations and gravitational dynamics. Among the other original results detailed in this article, we also obtain a deformation of the flat Jackiw-Teitelboim gravity action.
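As a reference point for the two deformation families, the standard two-dimensional flow equations are reproduced below in one common normalization; prefactor conventions vary across the literature, and the paper's $d$-dimensional geometric formulation generalizes these rather than being shown here.

```latex
\frac{\partial \mathcal{L}^{(\lambda)}}{\partial \lambda}
  = \frac{1}{8}\left(T^{\mu\nu}T_{\mu\nu} - \left(T^{\mu}{}_{\mu}\right)^{2}\right),
\qquad
\frac{\partial \mathcal{L}^{(\gamma)}}{\partial \gamma}
  = \frac{1}{\sqrt{2}}\sqrt{T^{\mu\nu}T_{\mu\nu} - \frac{1}{2}\left(T^{\mu}{}_{\mu}\right)^{2}} .
```

Here $T_{\mu\nu}$ is the stress-energy tensor of the theory at the current value of the parameter. In two dimensions the $T\bar{T}$ coupling $\lambda$ has dimensions of length squared (an irrelevant deformation), whereas the root-$T\bar{T}$ coupling $\gamma$ is dimensionless (a marginal deformation).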
Submitted 6 May, 2024;
originally announced May 2024.
-
Time-dependent Flows and Their Applications in Parabolic-parabolic Patlak-Keller-Segel Systems Part I: Alternating Flows
Authors:
Siming He
Abstract:
We consider the three-dimensional parabolic-parabolic Patlak-Keller-Segel equations (PKS) subject to ambient flows. Without the ambient fluid flow, the equations are supercritical in three dimensions and admit finite-time blow-up solutions with arbitrarily small $L^1$-mass. In this study, we show that a family of time-dependent alternating shear flows, inspired by the clever ideas of Tarek Elgindi, can suppress chemotactic blow-up in these systems.
Submitted 9 May, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
Superconductivity of Bulk Abnormal Magic-stoichiometric Na3Cl Salt Crystals at Normal Pressure
Authors:
Shuqiang He,
Yi-Feng Zheng,
Guosheng Shi,
Yi-Jie Xiang,
Meihui Xiao,
Qituan Zhang,
Yue-Yu Zhang,
Haiping Fang
Abstract:
Identifying new materials with superconducting properties is a central pursuit of superconductivity research. Here, based on first-principles calculations, we show that the simplest salt in daily use can be made a superconductor at normal pressure simply by adjusting its Na:Cl stoichiometry to Na3Cl. This stable bulk crystal with the abnormal 3:1 Na-Cl stoichiometry, the first 'magic' ratio, contains metallic Na atoms in its core together with hybridized ionic and metallic bonding, which facilitates the electron-phonon coupling responsible for superconductivity, with a critical temperature Tc of 0.13 K. The flat bands and van Hove singularities near the Fermi level produce a large density of states, similar to H3S and LaH10, which favors the emergence of superconductivity. This crystal with abnormal magic Na-Cl stoichiometry is a precisely tunable, three-dimensional bulk superconductor based purely on sodium and chlorine, making it an ideal material for designing and understanding abnormal stoichiometric crystals. The method used to construct this bulk abnormal crystal may generalize to almost all elements and could yield insights into the physics of other conventional superconductors and even high-critical-temperature superconductors.
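First-principles $T_c$ values are often obtained from the electron-phonon coupling constant $\lambda$ and the logarithmic average phonon frequency $\omega_{\log}$ via the McMillan-Allen-Dynes formula. The abstract does not state which formula or which Coulomb pseudopotential $\mu^*$ was used, so the sketch below, with hypothetical inputs, only shows how such an estimate is computed; it does not reproduce the paper's 0.13 K.

```python
import math

def allen_dynes_tc(lam, omega_log_k, mu_star=0.1):
    """McMillan-Allen-Dynes estimate of Tc in kelvin, without the f1/f2 strong-coupling factors.

    lam: electron-phonon coupling constant.
    omega_log_k: logarithmic average phonon frequency, in kelvin.
    mu_star: Coulomb pseudopotential (values around 0.1 to 0.15 are typical choices).
    """
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0:
        return 0.0  # the formula predicts no superconductivity in this regime
    return (omega_log_k / 1.2) * math.exp(-1.04 * (1.0 + lam) / denom)

# Hypothetical inputs for illustration only (not values from the paper):
for lam in (0.3, 0.5, 1.0):
    print(lam, round(allen_dynes_tc(lam, omega_log_k=200.0), 3))
```

Because $T_c$ depends exponentially on $\lambda$, small changes in the computed coupling shift a sub-kelvin estimate substantially.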
Submitted 17 April, 2024;
originally announced May 2024.
-
Constrained Reinforcement Learning Under Model Mismatch
Authors:
Zhongchang Sun,
Sihong He,
Fei Miao,
Shaofeng Zou
Abstract:
Existing studies on constrained reinforcement learning (RL) can obtain a well-performing policy in the training environment. However, when deployed in a real environment, such a policy may easily violate constraints that were satisfied during training, because there may be a model mismatch between the training and real environments. To address this challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a good policy that optimizes the reward and at the same time satisfies the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large or continuous state spaces and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during training. We demonstrate the effectiveness of our algorithm on a set of RL tasks with constraints.
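One standard way to write "optimize the reward and satisfy the constraint under model mismatch" is as a max-min problem over an uncertainty set $\mathcal{P}$ of transition kernels around the training model. The notation below (reward $r$, constraint cost $c$, budget $d$, discount $\gamma$) is assumed for illustration and may not match the paper's.

```latex
\max_{\pi}\ \min_{P \in \mathcal{P}} V_r^{\pi}(P)
\quad \text{s.t.} \quad
\max_{P \in \mathcal{P}} V_c^{\pi}(P) \le d,
\qquad
V_g^{\pi}(P) = \mathbb{E}_{\pi,P}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, g(s_t, a_t)\right],\quad g \in \{r, c\}.
```

Because the constraint is imposed on the worst case over $\mathcal{P}$, it holds for every model in the set, including the deployment environment whenever that environment lies in $\mathcal{P}$.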
Submitted 3 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Note on holographic torus stress tensor correlators in $AdS_3$ gravity
Authors:
Song He,
Yi Li,
Yun-Ze Li,
Yunda Zhang
Abstract:
In the AdS$_3$/CFT$_2$ framework, the Euclidean BTZ black hole corresponds to the dominant high-temperature phase of its dual field theory. We first employ perturbative methods to solve the Einstein equations as boundary value problems, obtaining low-point correlators of the energy-momentum tensor operator. Using operator equations established in our previous work, we further compute arbitrary high-point correlators of the energy-momentum tensor operator in the high-temperature phase, together with recursion relations for these high-point functions. We also employ the Chern-Simons formalism and obtain consistent results. Further, using the cut-off AdS/$T\bar{T}$-deformed CFT duality, we calculate the energy-momentum tensor correlators, contributing to a more comprehensive understanding of the system's dynamics. Finally, the stress tensor correlators enable us to determine the corresponding KdV operator correlators at low temperature.
Submitted 25 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
A Comprehensive Survey of Dynamic Graph Neural Networks: Models, Frameworks, Benchmarks, Experiments and Challenges
Authors:
ZhengZhao Feng,
Rui Wang,
TianXing Wang,
Mingli Song,
Sai Wu,
Shuibing He
Abstract:
Dynamic Graph Neural Networks (GNNs) combine temporal information with GNNs to capture structural, temporal, and contextual relationships in dynamic graphs simultaneously, leading to enhanced performance in various applications. As the demand for dynamic GNNs continues to grow, numerous models and frameworks have emerged to cater to different application needs. There is a pressing need for a comprehensive survey that evaluates the performance, strengths, and limitations of various approaches in this domain. This paper aims to fill this gap by offering a thorough comparative analysis and experimental evaluation of dynamic GNNs. It covers 81 dynamic GNN models with a novel taxonomy, 12 dynamic GNN training frameworks, and commonly used benchmarks. We also report experimental results from testing nine representative dynamic GNN models and three frameworks on six standard graph datasets. The evaluation metrics focus on convergence accuracy, training efficiency, and GPU memory usage, enabling a thorough comparison of performance across models and frameworks. From the analysis and evaluation results, we identify key challenges and offer principles for future research to enhance the design of models and frameworks in the dynamic GNN field.
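To illustrate what "combine temporal information with GNNs" can mean for discrete-time dynamic graphs, here is a minimal NumPy sketch of one common design pattern: a graph convolution applied to each snapshot, followed by a recurrent update across snapshots. It uses random, untrained weights and a simple Elman-style recurrence (many models use a GRU or LSTM instead); the function names are hypothetical, and it does not correspond to any specific model in the survey.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, x, w):
    """One graph convolution: ReLU(A_norm X W)."""
    return np.maximum(normalized_adjacency(adj) @ x @ w, 0.0)

def dynamic_gnn(snapshots, features, w_gcn, w_in, w_rec):
    """Run a GCN on each snapshot, then an Elman-style recurrence over time.

    snapshots: list of (N, N) adjacency matrices; features: list of (N, F) arrays.
    Returns the final (N, H) node states.
    """
    n = snapshots[0].shape[0]
    h = np.zeros((n, w_rec.shape[0]))
    for adj, x in zip(snapshots, features):
        z = gcn_layer(adj, x, w_gcn)           # structural encoding of this snapshot
        h = np.tanh(z @ w_in + h @ w_rec)      # temporal update of each node's state
    return h

rng = np.random.default_rng(0)
N, F, G, H, T = 5, 3, 4, 6, 3
snaps = [(rng.random((N, N)) > 0.6).astype(float) for _ in range(T)]
snaps = [np.triu(a, 1) + np.triu(a, 1).T for a in snaps]  # undirected, no self-loops
feats = [rng.normal(size=(N, F)) for _ in range(T)]
out = dynamic_gnn(snaps, feats, rng.normal(size=(F, G)),
                  rng.normal(size=(G, H)), rng.normal(size=(H, H)))
print(out.shape)
```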
Submitted 1 May, 2024;
originally announced May 2024.
-
Research on Intelligent Aided Diagnosis System of Medical Image Based on Computer Deep Learning
Authors:
Jiajie Yuan,
Linxiao Wu,
Yulu Gong,
Zhou Yu,
Ziang Liu,
Shuyao He
Abstract:
This paper combines the Struts and Hibernate frameworks, using a DAO (Data Access Object) layer to store and access data. A dual-mode humidity medical image library suitable for deep networks is then established, and a dual-mode medical image assisted diagnosis method based on these images is proposed. Across the feature extraction methods tested, the best configuration achieves an area under the receiver operating characteristic curve (AUROC) of 0.9985, a recall of 0.9814, and an accuracy of 0.9833. The method is practical and can be applied to clinical diagnosis. Through the system, each outpatient physician can quickly register or log in to the platform and upload images, thus obtaining more accurate images. Image segmentation can guide doctors in clinical departments, and the images are then analyzed to determine the location and nature of the tumor so that treatment can be targeted.
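The three reported metrics can be computed from binary labels and model scores as follows. This is a standard, library-free sketch (AUROC via the Mann-Whitney pairwise-ranking formulation, with ties counted as half); the 0.5 decision threshold and the toy data are assumptions, not the paper's.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall_and_accuracy(labels, scores, threshold=0.5):
    """Recall = TP / (TP + FN); accuracy = fraction of correct thresholded predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    recall = tp / sum(labels)
    accuracy = sum(y == p for y, p in zip(labels, preds)) / len(labels)
    return recall, accuracy

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))                 # 0.75
print(recall_and_accuracy(labels, scores))   # (0.5, 0.75)
```

AUROC is threshold-free, while recall and accuracy depend on the chosen threshold, which is why a report can pair a very high AUROC with somewhat lower recall.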
Submitted 29 April, 2024;
originally announced April 2024.
-
Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning
Authors:
Hao Liu,
Yi Shen,
Wen**g Zhou,
Yuelin Zou,
Chang Zhou,
Shuyao He
Abstract:
To address the frequent deceleration of unmanned vehicles when approaching obstacles, this article uses a Deep Q-Network (DQN) and its extension, the Double Deep Q-Network (DDQN), to develop a local navigation system that adapts to obstacles while maintaining optimal speed planning. By integrating improved reward functions and obstacle angle determination methods, the system demonstrates significant enhancements in maneuvering capabilities without frequent decelerations. Experiments conducted in simulated environments with varying obstacle densities confirm the effectiveness of the proposed method in achieving more stable and efficient path planning.
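The difference between the DQN and Double DQN (DDQN) bootstrap targets can be shown in a few lines: DDQN selects the next action with the online network but evaluates it with the target network, which reduces the overestimation caused by taking a max over noisy estimates. The arrays below stand in for network outputs; the paper's reward function, action set, and architecture are not reproduced.

```python
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """DQN: r + gamma * max_a Q_target(s', a), with no bootstrap at terminal states."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def ddqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: action chosen by the online net, its value read from the target net."""
    best = q_next_online.argmax(axis=1)
    chosen = q_next_target[np.arange(len(best)), best]
    return reward + gamma * (1.0 - done) * chosen

# Two transitions, three discrete actions (e.g. hypothetical speed levels).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
q_online = np.array([[1.0, 3.0, 2.0], [0.5, 0.2, 0.1]])
q_target = np.array([[2.0, 1.0, 4.0], [9.0, 9.0, 9.0]])
print(dqn_target(reward, done, q_target, gamma=0.9))             # [4.6 0. ]
print(ddqn_target(reward, done, q_online, q_target, gamma=0.9))  # [1.9 0. ]
```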
Submitted 26 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Authors:
Xiaohong Liu,
Xiongkuo Min,
Guangtao Zhai,
Chunyi Li,
Tengchuan Kou,
Wei Sun,
Haoning Wu,
Yixuan Gao,
Yuqin Cao,
Zicheng Zhang,
Xiele Wu,
Radu Timofte,
Fei Peng,
Huiyuan Fu,
Anlong Ming,
Chuanming Wang,
Huadong Ma,
Shuai He,
Zifei Dou,
Shu Chen,
Huacong Zhang,
Haiyi Xie,
Chengwei Wang,
Baoying Chen,
Jishen Zeng
, et al. (89 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in the field of image and video processing, namely Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). It is divided into an image track and a video track. The image track uses the AIGIQA-20K dataset, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track had a total of 318 registered participants. A total of 1,646 submissions were received in the development phase, and 221 submissions in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB dataset, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants registered in the video track. A total of 991 submissions were received in the development phase, and 185 submissions in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on AIGC.
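The abstract does not name the evaluation metric, but quality-assessment benchmarks commonly score predictions by their rank correlation (SRCC) and linear correlation (PLCC) with human mean opinion scores (MOS). Assuming that setup, a NumPy sketch (ties in the ranks are not averaged here, for brevity; the function names are hypothetical):

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation between predicted scores and MOS."""
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks (assumes no ties)."""
    def rank(v):
        return np.argsort(np.argsort(np.asarray(v))).astype(float)
    return plcc(rank(pred), rank(mos))

pred = [1.0, 2.0, 3.0, 4.0]
mos = [1.0, 4.0, 9.0, 16.0]   # monotone but nonlinear in pred
print(srcc(pred, mos), plcc(pred, mos))
```

Here SRCC is 1 because the ordering is perfect, while PLCC is below 1 because the relation is not linear; this is why the two are usually reported together.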
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources
Authors:
Shizhao He,
Jungang Ge,
Ying-Chang Liang,
Dusit Niyato
Abstract:
The space-air-ground integrated network (SAGIN) is a pivotal architecture for supporting ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given that operators are rational, the configuration of resources and services in SAGIN should account for both the overall system performance and the individual benefits of operators. Motivated by emerging symbiotic communication, which facilitates mutual benefits across different radio systems, in this paper we investigate resource and service sharing in SAGIN from a symbiotic communication perspective. In particular, we consider a SAGIN consisting of a ground network operator (GNO) and a satellite network operator (SNO). We aim to maximize the weighted sum rate (WSR) of the whole SAGIN by jointly optimizing the user association, resource allocation, and beamforming. In addition, we introduce a sharing coefficient to characterize the revenue of operators. Operators may suffer revenue loss when only the WSR is maximized. In pursuit of mutual benefits, we propose a mutual benefit constraint (MBC) to ensure that each operator obtains revenue gains. We then develop a centralized algorithm based on the successive convex approximation (SCA) method. Since the centralized algorithm is difficult to implement, we also propose a distributed algorithm based on Lagrangian dual decomposition and the consensus alternating direction method of multipliers (ADMM). Finally, extensive numerical simulations demonstrate the effectiveness of the two proposed algorithms and show that the distributed algorithm can approach the performance of the centralized one.
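The weighted sum rate objective can be illustrated for a simple single-antenna interference model. This sketch assumes per-transmitter powers, a gain matrix `gains[k, j]` from transmitter j to receiver k, and a noise power; it omits the paper's beamforming, user association, and sharing coefficient, and the function name is hypothetical.

```python
import numpy as np

def weighted_sum_rate(weights, powers, gains, noise_power):
    """WSR = sum_k w_k * log2(1 + SINR_k), with
    SINR_k = p_k g_kk / (noise_power + sum_{j != k} p_j g_kj).

    gains[k, j] is the channel power gain from transmitter j to receiver k.
    """
    weights, powers, gains = map(np.asarray, (weights, powers, gains))
    received = gains * powers[None, :]            # received[k, j] = p_j * g_kj
    signal = np.diag(received)
    interference = received.sum(axis=1) - signal
    sinr = signal / (noise_power + interference)
    return float(np.sum(weights * np.log2(1.0 + sinr)))

# One user, no interference, SNR = 3  ->  log2(1 + 3) = 2 bit/s/Hz.
print(weighted_sum_rate([1.0], [3.0], [[1.0]], noise_power=1.0))  # 2.0
```

The weights let the optimizer prioritize some users (for example, those of one operator) over others, which is where operator-level fairness constraints such as the MBC come into play.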
Submitted 25 April, 2024;
originally announced April 2024.
-
MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images
Authors:
Zhiwei Wang,
Ying Zhou,
Shiquan He,
Ting Li,
Fan Huang,
Qiang Ding,
Xinxia Feng,
Mei Liu,
Qiang Li
Abstract:
Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using the estimated depth and pose, and then minimizing the difference between the warped and target images. However, the endoscope's built-in light causes significant brightness fluctuations, which make the photometric constraint unreliable. Previous efforts only mitigate this by relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward-backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image's phase-frequency into the intermediate warped image to avoid structure loss, and stabilizes training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. Comprehensive experimental results on four endoscopic datasets demonstrate that MonoPCC is highly robust to brightness inconsistency and outperforms other state-of-the-art methods, reducing the absolute relative error by at least 7.27%, 9.38%, 9.90%, and 3.17% on the four datasets, respectively.
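The phase-frequency transplant can be read as a Fourier amplitude/phase swap. A minimal NumPy sketch under that interpretation keeps the warped image's amplitude spectrum and replaces its phase with the source image's; MonoPCC's exact formulation (for example, whether only certain frequency bands are swapped) may differ, and `transplant_phase` is a hypothetical name.

```python
import numpy as np

def transplant_phase(warped, source):
    """Combine the amplitude spectrum of `warped` with the phase spectrum of `source`.

    Phase largely encodes structure (edges, layout), while amplitude carries much of
    the intensity and contrast, so the result roughly keeps the source's structure with
    the warped image's brightness. Inputs are real 2-D arrays of the same shape.
    """
    f_warped = np.fft.fft2(warped)
    f_source = np.fft.fft2(source)
    combined = np.abs(f_warped) * np.exp(1j * np.angle(f_source))
    # For real inputs the combined spectrum is Hermitian, so the inverse is real up to rounding.
    return np.real(np.fft.ifft2(combined))

rng = np.random.default_rng(0)
img = rng.random((8, 8))
# Transplanting an image's phase into itself must return the image unchanged.
print(np.allclose(transplant_phase(img, img), img))  # True
```

A globally brightened copy such as `2 * img` shares the phase of `img`, so transplanting `img`'s phase into it returns `2 * img`: the swap preserves brightness from the warped side while pinning structure to the source.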
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning
Authors:
Sunan He,
Yuxiang Nie,
Zhixuan Chen,
Zhiyuan Cai,
Hongmei Wang,
Shu Yang,
Hao Chen
Abstract:
The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets. Based on the constructed dataset, we developed MedDr, a generalist foundation model for healthcare capable of handling diverse medical data modalities, including radiology, pathology, dermatology, retinography, and endoscopy. Moreover, during inference, we propose a simple but effective retrieval-augmented medical diagnosis strategy, which enhances the model's generalization ability. Extensive experiments on visual question answering, medical report generation, and medical image diagnosis demonstrate the superiority of our method.
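The abstract does not describe the retrieval-augmented diagnosis strategy in detail. As a hypothetical illustration of the general idea only, the sketch below retrieves the labels of the k most similar reference cases by cosine similarity of embeddings, which could then be supplied to a model as context; the embedding source, `k`, the labels, and the function name are all assumptions.

```python
import numpy as np

def retrieve_labels(query, bank, bank_labels, k=3):
    """Return the labels of the k reference embeddings most cosine-similar to `query`.

    query: (D,) embedding; bank: (M, D) embeddings of labeled reference cases.
    """
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    top = np.argsort(-sims)[:k]      # indices of the k highest similarities
    return [bank_labels[i] for i in top]

# Toy 2-D embeddings with hypothetical labels.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["finding_A", "finding_A", "finding_B", "finding_B"]
print(retrieve_labels(np.array([1.0, 0.05]), bank, labels, k=2))  # ['finding_A', 'finding_A']
```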
Submitted 23 April, 2024;
originally announced April 2024.
-
Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering
Authors:
Yao Xu,
Shizhu He,
Jiabei Chen,
Zihao Wang,
Yangqiu Song,
Hanghang Tong,
Kang Liu,
Jun Zhao
Abstract:
To address the issues of insufficient knowledge and the tendency of Large Language Models (LLMs) to hallucinate, numerous studies have endeavored to integrate LLMs with Knowledge Graphs (KGs). However, all these methods are evaluated on conventional Knowledge Graph Question Answering (KGQA) with complete KGs, where the factual triples involved in each question are entirely covered by the given KG. In this setting, the LLM mainly acts as an agent that finds answer entities by exploring the KG, rather than effectively integrating internal and external knowledge sources. In real-world scenarios, however, KGs are often too incomplete to cover all the knowledge required to answer questions. To simulate real-world scenarios and evaluate the ability of LLMs to integrate internal and external knowledge, in this paper we propose leveraging LLMs for QA under Incomplete Knowledge Graph (IKGQA), where the given KG does not include all the factual triples involved in each question. To handle IKGQA, we propose a training-free method called Generate-on-Graph (GoG) that can generate new factual triples while exploring KGs. Specifically, we propose a selecting-generating-answering framework, which not only treats the LLM as an agent to explore KGs, but also treats it as a KG to generate new facts based on the explored subgraph and its inherent knowledge. Experimental results on two datasets demonstrate that GoG can solve IKGQA to a certain extent, while almost all previous methods perform poorly on IKGQA.
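The selecting-generating-answering idea can be illustrated with a toy loop over a dictionary-backed KG. Everything below is hypothetical: `generate_triple` is a stand-in for an LLM call (here a fixed lookup table playing the role of the model's internal knowledge), and the relation path is given in advance, whereas GoG prompts the LLM to choose and generate at each step.

```python
# Toy incomplete KG: (head, relation) -> tail. The fact ("Paris", "country") is missing.
KG = {("Eiffel Tower", "located_in"): "Paris"}

# Stand-in for the LLM's internal knowledge (hypothetical; a real system would prompt a model).
LLM_MEMORY = {("Paris", "country"): "France"}

def generate_triple(head, relation):
    """Hypothetical 'LLM as KG' step: propose a missing tail from internal knowledge."""
    return LLM_MEMORY.get((head, relation))

def answer(start, relation_path):
    """Follow a relation path; when the KG lacks an edge, generate it and add it to the KG."""
    entity = start
    for relation in relation_path:
        tail = KG.get((entity, relation))              # select: explore the given KG
        if tail is None:
            tail = generate_triple(entity, relation)   # generate: fill the gap
            if tail is None:
                return None                            # neither source knows the fact
            KG[(entity, relation)] = tail              # keep the generated triple
        entity = tail
    return entity                                      # answer

print(answer("Eiffel Tower", ["located_in", "country"]))  # France
```

Storing generated triples back into the KG lets later exploration steps reuse them, which is the sense in which the LLM acts as a KG rather than only as an explorer.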
Submitted 23 April, 2024;
originally announced April 2024.