Search | arXiv e-print repository

Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

Authors: Lin Zuo, Kunshan Yang, Xianlong Tian, Kunbin He, Yongqi Ding, Mengmeng **g

Abstract: Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from the lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize self-saliency and thereby improve the discriminability of the representations of flexible objects. Specifically, on the one hand, we propose to maximize the channel-aware saliency by extracting the weights of neighboring nodes, which adapts to the shape and size variations of flexible objects. On the other hand, we maximize the spatial-aware saliency based on clustering to aggregate neighborhood information for the centroid nodes, which introduces local context information into representation learning. To thoroughly evaluate flexible object recognition, we propose, for the first time, the Flexible Dataset (FDA), which consists of various images of flexible objects collected from real-world scenarios or online. Extensive experiments on our Flexible Dataset demonstrate the effectiveness of our method in enhancing the discrimination of flexible objects.
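The spatial-aware saliency step is described as clustering nodes and aggregating neighborhood information onto centroid nodes. As a rough illustration only, the hypothetical `aggregate_to_centroids` below performs one k-means-style step on node feature vectors: each node is assigned to its nearest centroid under squared Euclidean distance, and each centroid becomes the mean of its members. The distance, the plain averaging, and the function names are assumptions, not FViG's actual operator.

```python
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aggregate_to_centroids(nodes: List[Vector], centroids: List[Vector]) -> List[Vector]:
    """One k-means-style step: assign nodes to the nearest centroid, then average members."""
    dim = len(nodes[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for node in nodes:
        # Index of the closest centroid; ties go to the lowest index.
        nearest = min(range(len(centroids)), key=lambda k: squared_distance(node, centroids[k]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += node[d]
    updated = []
    for k, centroid in enumerate(centroids):
        if counts[k] == 0:
            # A centroid with no members keeps its previous features.
            updated.append(list(centroid))
        else:
            updated.append([s / counts[k] for s in sums[k]])
    return updated
```

For example, `aggregate_to_centroids([[0, 0], [0, 2], [10, 10], [10, 12]], [[1, 1], [9, 9]])` returns `[[0.0, 1.0], [10.0, 11.0]]`.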

Submitted 6 June, 2024; originally announced June 2024.

Comments: under review

arXiv:2406.14264 [pdf, other]

Zero-Shot Image Denoising for High-Resolution Electron Microscopy

Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, **gyi Yu, Yuyao Zhang

Abstract: High-resolution electron microscopy (HREM) is a powerful imaging technique for directly visualizing a broad range of materials in real space. However, denoising HREM images is challenging due to the ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we propose a super-resolution (SR) based self-supervised training strategy that incorporates the Random Sub-sampler module. The Random Sub-sampler is designed to generate an approximately infinite number of noisy pairs from a single noisy image, serving as an effective data augmentation for zero-shot denoising. Noise2SR trains the network with paired noisy images of different resolutions, following an SR strategy. The SR-based training allows the network to use more pixels for supervision, and the random sub-sampling compels the network to learn continuous signals, enhancing robustness. Meanwhile, we mitigate the uncertainty caused by random sampling by adopting minimum mean squared error (MMSE) estimation for the denoised results. By integrating this training strategy with the proposed designs, Noise2SR achieves superior denoising performance using a single noisy HREM image. We evaluate the performance of Noise2SR on both simulated and real HREM denoising tasks. It outperforms state-of-the-art ZS-SSL methods and achieves denoising performance comparable to that of supervised methods.
The success of Noise2SR suggests its potential for improving the SNR of images in material imaging domains.
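The Random Sub-sampler turns one noisy image into many training inputs. A minimal sketch, assuming a 2×2 cell and one randomly chosen pixel per cell (both assumptions; the paper's exact sampling scheme may differ), is shown below together with MMSE-style averaging of several stochastic predictions. `random_subsample` and `mmse_average` are hypothetical names.

```python
import random
from typing import List

Image = List[List[float]]


def random_subsample(image: Image, rng: random.Random, cell: int = 2) -> Image:
    """Build a low-resolution image by picking one random pixel per cell x cell block."""
    rows, cols = len(image) // cell, len(image[0]) // cell
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Random offset inside the current block.
            di, dj = rng.randrange(cell), rng.randrange(cell)
            row.append(image[i * cell + di][j * cell + dj])
        out.append(row)
    return out


def mmse_average(predictions: List[Image]) -> Image:
    """Pixel-wise mean of several stochastic predictions (an MMSE-style estimate)."""
    n = len(predictions)
    h, w = len(predictions[0]), len(predictions[0][0])
    return [[sum(p[i][j] for p in predictions) / n for j in range(w)] for i in range(h)]
```

Each call with a fresh random draw yields a different low-resolution input, which, in the spirit of the SR-based pairing described above, can be paired with the full-resolution noisy image as the target.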

Submitted 20 June, 2024; originally announced June 2024.

Comments: 12 pages, 12 figures

arXiv:2406.13340 [pdf, other]

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-oriented large language models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech. Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses. We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation. SD-Eval focuses on paralinguistic and environmental information and includes 7,303 utterances, amounting to 8.76 hours of speech data. The data is aggregated from eight public datasets, representing four perspectives: emotion, accent, age, and background sound. To assess the SD-Eval benchmark dataset, we implement three different models and construct a training set following a process similar to that used for SD-Eval. The training set contains 1,052.72 hours of speech data and 724.4k utterances. We also conduct a comprehensive evaluation of the generated responses using objective evaluation methods (e.g., BLEU and ROUGE), subjective evaluations, and LLM-based metrics.
Models conditioned with paralinguistic and environmental information outperform their counterparts in both objective and subjective measures. Moreover, experiments demonstrate that LLM-based metrics show a higher correlation with human evaluation than traditional metrics do. We open-source SD-Eval at https://github.com/amphionspace/SD-Eval.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11890 [pdf, other]

Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Authors: Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li

Abstract: Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across tasks. These methods generally assume that the selection process captures similarities between the exemplar and the target instance; however, it remains unknown what kinds of similarities are captured and which are vital to performing ICL. To investigate this question, we analyze the working mechanisms of learning-based demonstration selection methods and empirically identify two important factors related to similarity measurement: 1) The ability to integrate different levels of task-agnostic text similarities between the inputs of exemplars and test cases enhances generalization across different tasks. 2) Incorporating task-specific labels when measuring similarities significantly improves performance on each specific task. We validate these two findings through extensive quantitative and qualitative analyses across ten datasets and various LLMs. Based on our findings, we introduce two effective yet simplified exemplar selection methods catering to task-agnostic and task-specific demands, eliminating the costly LLM inference overhead.
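The first finding, integrating several levels of task-agnostic text similarity, can be pictured with a toy selector. In the hypothetical sketch below, the "levels" are unigram and bigram Jaccard overlap between whitespace tokens, averaged into one score, and the top-k pool entries are returned. The choice of n-gram Jaccard, equal weighting, and all names are assumptions; this is not either of the paper's simplified selection methods.

```python
from typing import List, Sequence, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Lower-cased whitespace n-grams of a string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Intersection over union; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def multi_level_similarity(x: str, y: str, levels: Sequence[int] = (1, 2)) -> float:
    """Average n-gram Jaccard similarity over several n-gram sizes."""
    return sum(jaccard(ngrams(x, n), ngrams(y, n)) for n in levels) / len(levels)


def select_exemplars(query: str, pool: List[str], k: int) -> List[str]:
    """Top-k pool entries by similarity to the query; ties keep pool order."""
    ranked = sorted(pool, key=lambda ex: multi_level_similarity(query, ex), reverse=True)
    return ranked[:k]
```

Python's `sorted` remains stable with `reverse=True`, so equally similar exemplars keep their original pool order.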

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08782 [pdf, other]

Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, which introduce large computational costs. To address these problems, we propose a hybrid spatial-spectral denoising network (HSSD), in which we design a novel hybrid dual-path network inspired by the characteristics of CNNs and Transformers, capturing both local and non-local spatial details while suppressing noise efficiently. Furthermore, to reduce computational complexity, we adopt a simple but effective decoupling strategy that disentangles the learning of spatial and spectral channels, where a multilayer perceptron with few parameters is utilized to learn the global correlations among spectra. Experiments on synthetic and real data demonstrate that our proposed method outperforms state-of-the-art methods in spatial and spectral reconstruction. The code and details are available at https://github.com/HLImg/HSSD.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08343 [pdf, other]

Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by developing a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin.
Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0.
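For reference, the Lorenz96 system is commonly written as dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices. The sketch below integrates it with a classical fourth-order Runge-Kutta step on an ordinary digital computer, illustrating the kind of continuous-time dynamics a neural ODE is trained to reproduce; it is not the memristive solver, and the forcing F = 8 and the step size are conventional choices assumed here.

```python
from typing import List

Vector = List[float]


def lorenz96(x: Vector, forcing: float) -> Vector:
    """Right-hand side dx/dt of Lorenz96; negative indices wrap around cyclically."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def rk4_step(x: Vector, forcing: float, dt: float) -> Vector:
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base: Vector, slope: Vector, scale: float) -> Vector:
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = lorenz96(x, forcing)
    k2 = lorenz96(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(x0: Vector, forcing: float = 8.0, dt: float = 0.01, steps: int = 100) -> Vector:
    """Advance the state `steps` times; the system needs at least 4 components."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, forcing, dt)
    return x
```

Setting every component to F gives a fixed point, so `integrate([8.0] * 5)` returns the input unchanged; a small perturbation of one component is the usual way to start a non-trivial trajectory.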

Submitted 12 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2405.18955 [pdf, other]

RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

Authors: **zhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning Zhang

Abstract: Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modalities, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between the RGB and T modalities while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Attention (GSMA) module is proposed to extract and combine multi-scale RGB and thermal features. Then, the extracted multi-modal features are directly integrated with a multi-level path aggregation neck, which significantly improves the fusion effect and efficiency. Meanwhile, multi-modal object detection often adopts union annotations for both modalities. This kind of supervision is insufficient and unfair, since objects observed in one modality may not be visible in the other. To solve this issue, Multi-modal Supervision (MS) is proposed to sufficiently supervise RGB-T object detection. Comprehensive experiments on two challenging benchmarks, KAIST and DroneVehicle, demonstrate that the proposed model achieves state-of-the-art accuracy while maintaining competitive efficiency.
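The "group shuffled" part of GSMA suggests mixing information across channel groups. Assuming it involves the standard channel-shuffle permutation popularized by ShuffleNet (view the channels as a groups × per-group matrix, transpose, flatten), the sketch below applies that permutation to a list of channel labels; it is not the GSMA module itself.

```python
from typing import List, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: List[T], groups: int) -> List[T]:
    """Interleave channels taken from `groups` equal-sized, contiguous groups."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Viewed as a (groups, per_group) matrix, read it column by column.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]
```

With RGB channels first and thermal channels second, `groups=2` interleaves the two modalities: `channel_shuffle(["r0", "r1", "t0", "t1"], 2)` gives `["r0", "t0", "r1", "t1"]`.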

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18203 [pdf, other]

IAPT: Instruction-Aware Prompt Tuning for Large Language Models

Authors: Wei Zhu, Aaron Xuxiang Tian, Congrui Yin, Yuan Ni, Xiaoling Wang, Guotong Xie

Abstract: Soft prompt tuning is a widely studied parameter-efficient fine-tuning method. However, it has a clear drawback: many soft tokens must be inserted into the input sequences to guarantee downstream performance. As a result, soft prompt tuning receives less attention than low-rank adaptation (LoRA) in the era of large language models (LLMs). In this work, we propose a novel prompt tuning method, Instruction-Aware Prompt Tuning (IAPT), that requires only four soft tokens. First, we install a parameter-efficient soft prompt generator at each Transformer layer to generate idiosyncratic soft prompts for each input instruction. The generated soft prompts can be seen as a semantic summary of the input instructions and can effectively guide the output generation. Second, the soft prompt generators are modules with a bottleneck architecture consisting of a self-attention pooling operation, two linear projections, and an activation function. Pilot experiments show that prompt generators at different Transformer layers require different activation functions. Thus, we propose to learn the idiosyncratic activation functions for prompt generators automatically with the help of rational functions. We have conducted experiments on various tasks, and the experimental results demonstrate that (a) our IAPT method can outperform recent baselines with comparable numbers of tunable parameters, and (b) our IAPT method is more efficient than LoRA under the single-backbone multi-tenant setting.
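The generator is described as a bottleneck: self-attention pooling, two linear projections, and an activation learned via rational functions. The single-layer sketch below follows that description. The pooling query vector, the "safe" rational form P(x) / (1 + sum_k |b_k x^k|) used by Padé activation units, reshaping the up-projection output into four soft tokens, and all names are assumptions, and the parameters are fixed toy values rather than trained ones.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def attention_pool(hidden: List[Vector], query: Vector) -> Vector:
    """Softmax-weighted average of token states, scored by dot product with a query."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(hidden[0])
    return [sum(w * state[d] for w, state in zip(weights, hidden)) / total for d in range(dim)]


def rational(x: float, p: Vector, q: Vector) -> float:
    """P(x) / (1 + sum_k |q_k * x^(k+1)|); the denominator never drops below 1."""
    numerator = sum(c * x ** k for k, c in enumerate(p))
    denominator = 1.0 + sum(abs(c * x ** (k + 1)) for k, c in enumerate(q))
    return numerator / denominator


def generate_soft_prompts(hidden: List[Vector], query: Vector, w_down: Matrix, w_up: Matrix,
                          p: Vector, q: Vector, num_tokens: int = 4) -> List[Vector]:
    """Pool, down-project, apply the rational activation, up-project, split into tokens."""
    pooled = attention_pool(hidden, query)
    bottleneck = [rational(z, p, q) for z in matvec(w_down, pooled)]
    flat = matvec(w_up, bottleneck)
    if len(flat) % num_tokens != 0:
        raise ValueError("up-projection size must be divisible by num_tokens")
    dim = len(flat) // num_tokens
    return [flat[t * dim:(t + 1) * dim] for t in range(num_tokens)]
```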

Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by ACL-2024

arXiv:2405.16837 [pdf, ps, other]

Enhancing Accuracy in Generative Models via Knowledge Transfer

Authors: Xinyu Tian, Xiaotong Shen

Abstract: This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source task. Building on the "Shared Embedding" concept, which bridges the source and target tasks, we introduce a novel framework for transfer learning under distribution metrics such as the Kullback-Leibler divergence. This framework underscores the importance of leveraging inherent similarities between diverse tasks despite their distinct data distributions. Our theory suggests that the shared structures can augment the generation accuracy for a target task, contingent on the ability of a source model to identify shared structures and on effective knowledge transfer from source to target learning. To demonstrate the practical utility of this framework, we explore the theoretical implications for two specific generative models: diffusion and normalizing flows. The results show enhanced performance in both models over their non-transfer counterparts, indicating advancements for diffusion models and providing fresh insights into normalizing flows in transfer and non-transfer settings. These results highlight the significant contribution of knowledge transfer in boosting the generation capabilities of these models.
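For discrete distributions, the Kullback-Leibler divergence named above is KL(p || q) = sum_i p_i log(p_i / q_i). The short sketch below (natural logarithm; terms with p_i = 0 contribute zero by convention) illustrates the metric only and says nothing about the paper's transfer-learning analysis.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total
```

The divergence is asymmetric: `kl_divergence([1.0, 0.0], [0.5, 0.5])` is log 2, while swapping the arguments gives infinity.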

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.13350 [pdf, other]

Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages

Authors: Corinne Aars, Lauren Adams, Xiaokan Tian, Zhaoyu Wang, Colton Wismer, Jason Wu, Pablo Rivas, Korn Sooksatra, Matthew Fendt

Abstract: This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented with sample translations, suggest the model can improve accessibility to sacred texts. It effectively handles the distinctive biblical lexicon and structure, thus bridging the linguistic divide. The study also discusses the model's limitations and suggests pathways for future enhancements, focusing on expanding access to sacred literature across linguistic boundaries.
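ByT5 operates directly on UTF-8 bytes rather than a learned subword vocabulary, which is what lets it handle character-based and morphologically rich scripts without out-of-vocabulary tokens. The sketch below assumes the id layout of the public ByT5 tokenizer as we understand it (ids 0 to 2 reserved for padding, end-of-sequence, and unknown, so byte b maps to id b + 3, with EOS id 1 appended); check it against the tokenizer you actually use. `byte_encode` and `byte_decode` are hypothetical names.

```python
from typing import List

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
OFFSET = 3  # assumed: byte values 0..255 are shifted past the three special ids


def byte_encode(text: str) -> List[int]:
    """UTF-8 bytes shifted by OFFSET, followed by the end-of-sequence id."""
    return [b + OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byte_decode(ids: List[int]) -> str:
    """Drop special ids, undo the offset, and decode the bytes as UTF-8."""
    data = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return data.decode("utf-8", errors="replace")
```

A non-ASCII character expands into several ids: `byte_encode("é")` gives `[198, 172, 1]`, because "é" is the two UTF-8 bytes 195 and 169.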

Submitted 30 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: LXAI Workshop at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

ACM Class: I.2.7

arXiv:2405.11485 [pdf]

doi 10.1038/s41467-024-48636-z

Evidence for Multiferroicity in Single-Layer CuCrSe$_2$

Authors: Zhenyu Sun, Yueqi Su, Aomiao Zhi, Zhicheng Gao, Xu Han, Kang Wu, Lihong Bao, Yuan Huang, Youguo Shi, Xuedong Bai, Peng Cheng, Lan Chen, Kehui Wu, Xuezeng Tian, Changzheng Wu, Baojie Feng

Abstract: Multiferroic materials, which simultaneously exhibit ferroelectricity and magnetism, have attracted substantial attention due to their fascinating physical properties and potential technological applications. With the trend toward device miniaturization, there is an increasing demand for the persistence of multiferroicity in single-layer materials at elevated temperatures. Here, we report high-temperature multiferroicity in single-layer CuCrSe$_2$, which hosts room-temperature ferroelectricity and 120 K ferromagnetism. Notably, the ferromagnetic coupling in single-layer CuCrSe$_2$ is enhanced by the ferroelectricity-induced orbital shift of Cr atoms, which is distinct from both type-I and type-II multiferroicity. These findings are supported by a combination of second-harmonic generation, piezo-response force microscopy, scanning transmission electron microscopy, magnetic, and Hall measurements. Our research not only provides an exemplary platform for delving into intrinsic magnetoelectric interactions at the single-layer limit but also sheds light on the potential development of electronic and spintronic devices utilizing two-dimensional multiferroics.

Submitted 19 May, 2024; originally announced May 2024.

Journal ref: Nature Communications 15, 4252 (2024)

arXiv:2405.05702 [pdf, other]

NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap

Authors: Mingrui Li, **gwei Huang, Lei Sun, Aaron Xuxiang Tian, Tianchen Deng, Hongyu Wang

Abstract: SLAM systems based on Gaussian Splatting have garnered attention due to their capabilities for rapid real-time rendering and high-fidelity mapping. However, current Gaussian Splatting SLAM systems usually struggle with large scene representation and lack effective loop closure detection. To address these issues, we introduce NGM-SLAM, the first 3DGS-based SLAM system that utilizes neural radiance field submaps for progressive scene expression, effectively integrating the strengths of neural radiance fields and 3D Gaussian Splatting. We utilize neural radiance field submaps as supervision and achieve high-quality scene expression and online loop closure adjustments through Gaussian rendering of fused submaps. Our results on multiple real-world scenes and large-scale scene datasets demonstrate that our method achieves accurate hole filling and high-quality scene expression, supports monocular, stereo, and RGB-D inputs, and achieves state-of-the-art scene reconstruction and tracking performance.
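Gaussian Splatting renders each pixel by alpha-compositing the depth-sorted Gaussians that overlap it, front to back: C = sum_i c_i a_i prod_{j<i} (1 - a_j), where c_i is a color and a_i an opacity. The sketch below shows only that blending rule for scalar colors, with an early-termination threshold chosen arbitrarily; projecting 3D Gaussians to the image plane and NGM-SLAM's submap supervision are not modeled.

```python
from typing import List, Tuple


def composite(splats: List[Tuple[float, float]], min_transmittance: float = 1e-4) -> float:
    """Front-to-back alpha compositing of (color, alpha) pairs sorted near to far."""
    color = 0.0
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for c, alpha in splats:
        color += c * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # remaining splats would contribute (almost) nothing
    return color
```

Order matters: `composite([(1.0, 0.5), (0.0, 0.5)])` is 0.5, while reversing the two splats gives 0.25.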

Submitted 28 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2405.02807 [pdf]

Kinematic analysis of structural mechanics based on convolutional neural network

Authors: Leye Zhang, Xiangxiang Tian, Hongjun Zhang

Abstract: We attempt to use a convolutional neural network (CNN) to perform kinematic analysis of plane bar structures. Using the 3ds Max animation software and the OpenCV module, we build our own image dataset of geometrically stable and geometrically unstable systems. We construct and train a CNN model based on the TensorFlow and Keras deep learning frameworks. The model achieves 100% accuracy on the training, validation, and test sets. The accuracy on an additional test set is 93.7%, indicating that a CNN can learn and master the relevant knowledge of kinematic analysis in structural mechanics. In the future, the generalization ability of the model can be improved by increasing the diversity of the dataset, giving it the potential to surpass human experts on complex structures. CNNs have some practical value in the kinematic analysis of structural mechanics. Using visualization techniques, we reveal how the CNN learns and recognizes structural features. Using a pre-trained VGG16 model for feature extraction and fine-tuning, we found that its generalization ability is inferior to that of the self-built model.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 9 pages, 13 figures

arXiv:2405.01189 [pdf, other]

Gradient-Congruity Guided Federated Sparse Training

Authors: Chris Xing Tian, Yibing Liu, Haoliang Li, Ray C. C. Cheung, Shiqi Wang

Abstract: Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs on resource-constrained devices, and poor generalization performance due to the heterogeneity of data across edge clients and the presence of out-of-distribution data. In this paper, we propose Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into the federated learning framework to address these issues. Our method leverages the idea that neurons whose associated gradients point in directions that conflict with the global model contain irrelevant or less generalizable information for other clients and can be pruned during the sparse training process. Conversely, neurons whose associated gradients have consistent directions can be grown with higher priority. In this way, FedSGC can greatly reduce local computation and communication overheads while enhancing the generalization abilities of FL. We evaluate our method in challenging non-i.i.d. settings and show that it achieves accuracy competitive with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.
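The pruning and growing rule can be pictured as a per-neuron agreement test between a client's local gradient and the global update direction. In the hypothetical sketch below, congruity is measured by the cosine similarity of each neuron's gradient vector with the global direction: neurons with negative similarity become pruning candidates, and those with the highest positive similarity get growth priority. The cosine criterion, the zero threshold, and the function names are assumptions rather than FedSGC's exact procedure.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def congruity_split(local_grads: Dict[str, List[float]],
                    global_dir: Dict[str, List[float]],
                    grow_k: int) -> Dict[str, List[str]]:
    """Neurons whose local gradient conflicts with the global direction are pruning
    candidates; the grow_k most congruent neurons are growth candidates."""
    scores = {name: cosine(g, global_dir[name]) for name, g in local_grads.items()}
    prune = sorted(name for name, s in scores.items() if s < 0)
    congruent = sorted((name for name, s in scores.items() if s > 0),
                       key=lambda name: scores[name], reverse=True)
    return {"prune": prune, "grow": congruent[:grow_k]}
```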

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.17890 [pdf, other]

DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, **gyi Yu, Yuyao Zhang

Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging reconstruction tasks. However, the unsupervised nature of INR architectures imposes limited constraints on the solution space, particularly for the highly ill-posed reconstruction tasks posed by LACT and ultra-SVCT. In this study, we introduce the Diffusion Prior Driven Neural Representation (DPER), an advanced unsupervised framework designed to address exceptionally ill-posed CT reconstruction inverse problems. DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems. The two sub-problems are addressed by an INR reconstruction scheme and a pre-trained score-based diffusion model, respectively. This combination first preserves the implicit local image consistency prior from the INR. Additionally, it effectively augments the feasibility of the solution space for the inverse problem through the generative diffusion model, resulting in increased stability and precision of the solutions. We conduct comprehensive experiments to evaluate the performance of DPER on LACT and ultra-SVCT reconstruction with two public datasets (AAPM and LIDC).
The results show that our method outperforms state-of-the-art reconstruction methods on in-domain datasets, while achieving significant performance improvements on out-of-domain datasets.
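HQS introduces an auxiliary variable z and alternates a data-fidelity update with a prior (denoising) update. The toy sketch below assumes a diagonal forward operator given by a 0/1 mask, as when some measurements are missing, so the data-fidelity step minimizing sum_i m_i (x_i - y_i)^2 + mu sum_i (x_i - z_i)^2 has the closed form x_i = (m_i y_i + mu z_i) / (m_i + mu). A 3-point moving average stands in for the pre-trained diffusion model, and plain pixel values stand in for the INR. It shows the splitting structure only, not DPER.

```python
from typing import List

Vector = List[float]


def data_step(y: Vector, mask: List[int], z: Vector, mu: float) -> Vector:
    """Closed-form minimizer of sum_i m_i (x_i - y_i)^2 + mu * sum_i (x_i - z_i)^2."""
    return [(m * yi + mu * zi) / (m + mu) for yi, m, zi in zip(y, mask, z)]


def smooth(x: Vector) -> Vector:
    """Stand-in prior step: 3-point moving average with replicated edges."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(x))]


def hqs(y: Vector, mask: List[int], mu: float = 1.0, iterations: int = 50) -> Vector:
    """Alternate the data-fidelity and prior sub-problems; mu must be positive."""
    z = [0.0] * len(y)
    x = list(z)
    for _ in range(iterations):
        x = data_step(y, mask, z, mu)  # data-fidelity sub-problem
        z = smooth(x)  # prior sub-problem
    return x
```

For example, `hqs([1.0, 0.0, 1.0], [1, 0, 1], iterations=200)` fills the unobserved middle entry from its neighbors and approaches `[1.0, 1.0, 1.0]`. Practical HQS schemes typically increase mu over the iterations; it is kept fixed here for simplicity.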

Submitted 27 April, 2024; originally announced April 2024.

Comments: 15 pages, 10 figures

ACM Class: I.2.10; I.4.5

arXiv:2404.12826 [pdf, ps, other]

Semi-harmonious and harmonious quasi-projection pairs on Hilbert $C^*$-modules

Authors: Xiaoyi Tian, Qingxiang Xu, Chunhong Fu

Abstract: For each adjointable idempotent $Q$ on a Hilbert $C^*$-module $H$, a specific projection $m(Q)$, called the matched projection of $Q$, was recently introduced in connection with the characterization of the minimum value among all the distances from projections to $Q$. Inspired by the relationship between $m(Q)$ and $Q$, the notion of a quasi-projection pair $(P,Q)$ was also recently introduced, where $P$ is a projection on $H$ satisfying $Q^*=(2P-I)Q(2P-I)$, in which $Q^*$ is the adjoint operator of the idempotent $Q$ and $I$ is the identity operator on $H$. Some fundamental issues concerning quasi-projection pairs, such as their block matrix representations and the $C^*$-morphisms associated with them, are worth investigating. This paper makes some detailed preparations for that purpose. Two objects, called the semi-harmonious quasi-projection pair and the harmonious quasi-projection pair, are introduced and systematically studied in the general setting of adjointable operators on Hilbert $C^*$-modules. Some applications concerning the common similarity of operators and a norm equation associated with the Friedrichs angle are also dealt with. Furthermore, many examples are provided to illustrate the non-triviality of the associated characterizations.
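
The defining identity $Q^*=(2P-I)Q(2P-I)$ can be checked numerically in the simplest setting. The following sketch works on the Hilbert space $\mathbb{C}^2$ with real matrix entries, so the adjoint is just the transpose. The particular pair below was chosen for illustration and does not come from the paper. $Q$ is a non-self-adjoint idempotent, and $P=(U+I)/2$ is built from a symmetry $U$ (with $U=U^T$ and $U^2=I$), which makes $P$ a projection.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def allclose(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))


def is_quasi_projection_pair(p, q):
    """Check Q* = (2P - I) Q (2P - I) for real 2x2 matrices, where the adjoint is the transpose."""
    u = [[2 * p[i][j] - (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]
    return allclose(matmul(matmul(u, q), u), transpose(q))


a = 1.0
Q = [[1.0, a], [0.0, 0.0]]               # idempotent (Q Q = Q) but not self-adjoint
c = 1.0 / math.sqrt(1.0 + a * a)
s = a * c
U = [[c, s], [s, -c]]                    # symmetry: U = U^T and U U = I
P = [[(U[i][j] + (1.0 if i == j else 0.0)) / 2 for j in range(2)] for i in range(2)]
```

Working out $UQU$ by hand for $U=\begin{pmatrix}c&s\\ s&-c\end{pmatrix}$ shows that the identity holds exactly when $s=ac$ and $c^2(1+a^2)=1$, which is how `c` and `s` are chosen above. By contrast, the coordinate projection $\mathrm{diag}(1,0)$ does not form a quasi-projection pair with this $Q$.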

Submitted 15 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2305.12984

MSC Class: 46L08; 47A05

arXiv:2404.12759

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

Authors: Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, **** Cai, Yang Zhang, Shouda Liu

Abstract: In recent years, quantization has emerged as one of the most promising compression technologies for deploying efficient large models in various real-time applications. Because the storage and IO of weights account for the vast majority of the overhead inside a large model, weight-only quantization can lead to large gains. However, existing quantization schemes suffer from significant accuracy degradation at very low bit-widths, or require additional computational overhead when deployed, which makes them difficult to apply to large-scale industrial applications. In this paper, we propose decoupleQ, which achieves a substantial increase in model accuracy, especially at very low bit-widths. decoupleQ abandons the traditional heuristic quantization paradigm and decouples the model parameters into integer and floating-point parts, thus transforming the quantization problem into a traditional constrained mathematical optimization problem, which is then solved alternately by off-the-shelf optimization methods. Quantization via decoupleQ is linear and uniform, which makes it more hardware-friendly than non-uniform counterparts and allows the idea to be migrated to high-bit quantization to enhance its robustness. For 2-bit quantization of large speech models at ByteDance, our method has achieved online accuracy close to that of fp16/bf16. The code is available at https://github.com/bytedance/decoupleQ
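
The idea of splitting a weight vector into an integer part and a floating-point part, then solving for them alternately, can be illustrated on a single vector. The following is a minimal sketch under assumptions that are not taken from the paper: the affine model $w \approx s\,q + z$ with integer $q \in [0, 2^b-1]$, and plain weight-reconstruction MSE as the objective. decoupleQ's actual objective and constraints may differ. With $(s, z)$ fixed, the integer step is a clamped rounding. With $q$ fixed, the float step is ordinary least squares. Each step therefore cannot increase the error.

```python
def affine_quantize(w, bits=2, iters=20):
    """Alternately fit integer codes q and a float scale/offset (s, z) so that w ~ s * q + z."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    # Min-max initialisation of the floating-point part.
    s = (hi - lo) / qmax if hi > lo else 1.0
    z = lo
    n = len(w)
    q = [0] * n
    for _ in range(iters):
        # Integer step: the nearest feasible integer minimises (w_i - s*q_i - z)^2 for s > 0.
        q = [min(qmax, max(0, round((wi - z) / s))) for wi in w]
        # Float step: least-squares regression of w on q gives the optimal (s, z) for fixed q.
        mq, mw = sum(q) / n, sum(w) / n
        var_q = sum((qi - mq) ** 2 for qi in q)
        if var_q == 0:  # every weight maps to one level; keep the current (s, z)
            break
        s = sum((qi - mq) * (wi - mw) for qi, wi in zip(q, w)) / var_q
        z = mw - s * mq
    return q, s, z


def reconstruction_mse(w, q, s, z):
    return sum((wi - (s * qi + z)) ** 2 for wi, qi in zip(w, q)) / len(w)
```

Because `q` is a non-decreasing function of `w`, the regression slope `s` stays positive, so the rounding step never divides by zero. The resulting quantizer is uniform (equally spaced levels), which is the property the abstract credits for hardware friendliness.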

Submitted 19 April, 2024; originally announced April 2024.

Comments: quantization for deep models

arXiv:2404.09079

Compactness results for a Dirichlet energy of nonlocal gradient with applications

Authors: Zhaolong Han, Tadele Mengesha, Xiaochuan Tian

Abstract: We prove two compactness results for function spaces with finite Dirichlet energy of half-space nonlocal gradients. In each of these results, we provide sufficient conditions on a sequence of kernel functions that guarantee the asymptotic compact embedding of the associated nonlocal function spaces into the class of square-integrable functions. Moreover, we demonstrate that the sequence of nonlocal function spaces converges in an appropriate sense to a limiting function space. As an application, we prove uniform Poincaré-type inequalities for a sequence of half-space gradient operators. We also apply the compactness result to demonstrate the convergence of appropriately parameterized nonlocal heterogeneous anisotropic diffusion problems, and we construct asymptotically compatible schemes for this type of problem. Another application concerns the convergence and robust discretization of a nonlocal optimal control problem.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.03523

Integrating Generative AI into Financial Market Prediction for Improved Decision Making

Authors: Chang Che, Zengyi Huang, Chen Li, Haotian Zheng, Xinyu Tian

Abstract: This study provides an in-depth analysis of the model architecture and key technologies of generative artificial intelligence, combined with specific application cases, and uses conditional generative adversarial networks (cGAN) and time-series analysis methods to simulate and predict dynamic changes in financial markets. The results show that the cGAN model can effectively capture the complexity of financial market data, and that the deviation between its predictions and actual market performance is minimal, indicating a high degree of accuracy.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03433

The operator distances from projections to an idempotent

Authors: Xiaofeng Zhang, Xiaoyi Tian, Qingxiang Xu

Abstract: The main purpose of this paper is to give a full characterization of the operator distances from projections to an idempotent, including the minimum value, the maximum value and the intermediate values. Let $H$ be a Hilbert space and $\mathbb{B}(H)$ be the set of bounded linear operators on $H$. Given an arbitrary idempotent $Q\in \mathbb{B}(H)$, it is proved that $$\|m(Q)-Q\|\le \|P-Q\|\le \|I-m(Q)-Q\|$$ for every projection $P$ on $H$, in which $I$ is the identity operator on $H$ and $m(Q)$ is a specific projection called the matched projection of $Q$. When $Q\in\mathbb{B}(H)$ is a non-projection idempotent, it is proved that for every number $\alpha$ in the interval $\left[\|m(Q)-Q\|,\|I-m(Q)-Q\|\right]$, there exists a projection $P\in \mathbb{B}(H)$ such that $\|P-Q\|=\alpha$. Two uniqueness problems concerning the projections that attain the minimum value or the maximum value are also dealt with.

Submitted 4 April, 2024; originally announced April 2024.

MSC Class: 47A05

arXiv:2404.02082

WcDT: World-centric Diffusion Transformer for Traffic Scene Generation

Authors: Chen Yang, Aaron Xuxiang Tian, Dong Chen, Tianyu Shi, Arsalan Heydarian

Abstract: In this paper, we introduce a novel approach for autonomous driving trajectory generation that harnesses the complementary strengths of diffusion probabilistic models (a.k.a. diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance scene diversity and stochasticity, the historical trajectory data are first preprocessed and encoded into a latent space using Denoising Diffusion Probabilistic Models (DDPM) enhanced with Diffusion with Transformer (DiT) blocks. Then, the latent features, historical trajectories, HD map features, and historical traffic signal information are fused with various transformer-based encoders. The encoded traffic scenes are then decoded by a trajectory decoder to generate multimodal future trajectories. Comprehensive experimental results show that the proposed approach achieves superior performance in generating both realistic and diverse trajectories, demonstrating its potential for integration into autonomous driving simulation systems.
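
The DDPM component rests on the standard forward (noising) process $q(x_t \mid x_0)=\mathcal{N}(\sqrt{\bar\alpha_t}\,x_0,(1-\bar\alpha_t)I)$ with $\bar\alpha_t=\prod_{s\le t}(1-\beta_s)$. The sketch below shows that generic process applied to a flattened trajectory. The linear $\beta$ schedule (1e-4 to 0.02 over 1000 steps) and the flattening of $(x, y)$ waypoints into one vector are assumptions made for illustration. They are not details taken from WcDT, and the DiT denoiser is not shown.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in a single step."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * xi + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for xi in x0]


# Usage: a short trajectory flattened as [x1, y1, x2, y2, ...] (hypothetical values).
alpha_bars = linear_alpha_bars()
trajectory = [0.0, 0.0, 1.0, 0.2, 2.0, 0.5]
noisy = q_sample(trajectory, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

Sampling $x_t$ directly from $x_0$ avoids running $t$ sequential noising steps. The learned part of a DDPM is the reverse (denoising) network, which in WcDT is built from DiT blocks.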

Submitted 2 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures

arXiv:2404.00730

Long-range dipole-dipole exchange-induced atomic grating

Authors: Xuan-Qian Bao, Xue-Dong Tian, Dong-Xiao Li, Yi-Mou Liu

Abstract: We propose a theoretical scheme for a dipole exchange-induced grating (DEIG) based on a hybrid system consisting of an ultracold rubidium ($^{87}$Rb) atomic ensemble and movable Rydberg spin atoms. The optical response of the grating appears as a superposition of three- and four-level configurations, similar to the cooperative optical nonlinear effect caused by the dipole blockade effect. However, such a Rydberg atomic grating responds uniquely to the spatial positions of the spin atoms, offering a novel way to dynamically control electromagnetically induced gratings (EIG) other than through the input probe intensity.

Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.16187

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

Authors: Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, Yvette Graham

Abstract: Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in the era of large language models. Low-rank adaptation (LoRA) has demonstrated commendable performance as a popular and representative method. However, it is implemented with a fixed intrinsic rank that might not be the ideal setting for downstream tasks. Recognizing the need for more flexible downstream task adaptation, we extend the methodology of LoRA to an innovative approach we call allocating low-rank adaptation (ALoRA), which enables dynamic adjustment of the intrinsic rank during the adaptation process. First, we propose a novel method, AB-LoRA, that can effectively estimate the importance score of each LoRA rank. Second, guided by AB-LoRA, we gradually prune abundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules that need higher ranks. We have conducted experiments on various tasks, and the experimental results demonstrate that our ALoRA method can outperform recent baselines with a comparable number of tunable parameters.
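
LoRA adds a trainable low-rank update $BA$ to a frozen weight $W$, and ALoRA changes how many ranks each module receives. The plain-Python sketch below illustrates both pieces: a LoRA forward pass in which each rank carries an on/off gate (so a rank can be pruned without reshaping tensors), and a budget-preserving reallocation step. The importance scores are treated as given inputs because AB-LoRA's estimator is not reproduced. The rule "move the pruned ranks to the module with the highest mean score" is a hypothetical stand-in for the paper's allocation policy.

```python
def lora_forward(x, W, A, B, gates, alpha=1.0):
    """y = W x + (alpha / r) * sum_k gates[k] * B[:, k] * (A[k, :] . x).

    W: d_out x d_in (frozen), A: r x d_in, B: d_out x r, gates: r values in {0, 1}.
    """
    r = len(gates)
    base = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]  # one value per rank
    return [base[i] + (alpha / r) * sum(gates[k] * B[i][k] * ax[k] for k in range(r))
            for i in range(len(W))]


def reallocate_ranks(scores, num_prune):
    """scores: {module: [importance of each rank]}. Returns new rank counts per module.

    Prunes the num_prune globally least important ranks, then (hypothetical rule)
    gives the freed budget to the module with the highest mean importance.
    """
    flat = sorted((s, m, k) for m, ranks in scores.items() for k, s in enumerate(ranks))
    counts = {m: len(ranks) for m, ranks in scores.items()}
    for _, m, _ in flat[:num_prune]:
        counts[m] -= 1
    means = {m: sum(ranks) / len(ranks) for m, ranks in scores.items()}
    best = max(means, key=means.get)
    counts[best] += num_prune
    return counts
```

The total number of ranks stays fixed, which matches the abstract's framing of comparing methods at a comparable number of tunable parameters.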

Submitted 15 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted by NAACL-2024

arXiv:2403.14057

Exploring Fermi Surface Nesting and the Nature of Heavy Quasiparticles in the Spin-Triplet Superconductor Candidate CeRh$_2$As$_2$

Authors: Bo Chen, Hao Liu, Qi-Yi Wu, Chen Zhang, Xue-Qing Ye, Yin-Zou Zhao, Jiao-Jiao Song, Xin-Yi Tian, Ba-Lei Tan, Zheng-Tai Liu, Mao Ye, Zhen-Hua Chen, Yao-Bo Huang, Da-Wei Shen, Ya-Hua Yuan, Jun He, Yu-Xia Duan, Jian-Qiao Meng

Abstract: In this study, we investigate the electronic structure of the spin-triplet superconductor candidate CeRh$_2$As$_2$ using high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations. Notably, Fermi surface nesting hints at connections to magnetic excitations or quadrupole density wave phenomena, which may shed light on the superconducting mechanism. The measured band structures reveal primarily localized 4f electrons with minor itinerant contributions. Additionally, a transition from localized to itinerant behavior and significant c-f hybridization anisotropy underscore the role of f electrons in shaping the electronic properties. These findings deepen our understanding of the unconventional superconductivity and magnetism of CeRh$_2$As$_2$. Further exploration promises advances in superconductivity research.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures

arXiv:2403.11405

A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm

Authors: Jun Lei, Yuxi Zhou, Xue Tian, Qinghao Zhao, Qi Zhang, Shijia Geng, Qingbo Wu, Shenda Hong

Abstract: Atrial fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues, so early detection of AF is crucial. Existing algorithms can only distinguish "AF rhythm in AF patients" from "sinus rhythm in normal individuals". However, AF patients do not always exhibit AF rhythm, which makes diagnosis challenging when the AF rhythm is absent. To address this, this paper proposes a novel artificial intelligence (AI) algorithm that distinguishes "sinus rhythm in AF patients" from "sinus rhythm in normal individuals" at the beat level. We introduce beat-level risk interpreters and trend risk interpreters, addressing the interpretability issues of deep learning models and the difficulty of explaining AF risk trends. Additionally, a beat-level information fusion decision is presented to enhance model accuracy. The experimental results demonstrate that the average AUC for single beats used as testing data from the CPSC 2021 dataset is 0.7314. By employing 150 beats in the information fusion decision algorithm, the average AUC reaches 0.7591. Compared with previous segment-level algorithms, our method uses beats as input, which reduces data dimensionality and makes the model more lightweight, facilitating deployment on portable medical devices. Furthermore, we draw new and interesting findings through average beat analysis and subgroup analysis, considering varying risk levels.
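
The abstract reports AUC both for single beats and after fusing 150 beats, but it does not state the fusion rule. The Python sketch below pairs an assumed fusion rule (the mean of beat-level risk probabilities, which is a hypothetical choice) with the standard rank-based AUC. That AUC is the Mann-Whitney probability that a randomly chosen positive recording outscores a randomly chosen negative one, with ties counted as one half.

```python
def fuse_beats(beat_probs):
    """Recording-level AF risk as the mean of beat-level probabilities (assumed fusion rule)."""
    return sum(beat_probs) / len(beat_probs)


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with hypothetical beat-level outputs for four recordings (label 1 = AF patient).
recordings = [[0.7, 0.9, 0.6], [0.8, 0.5], [0.2, 0.4, 0.3], [0.1, 0.3]]
labels = [1, 1, 0, 0]
fused = [fuse_beats(r) for r in recordings]
```

Averaging many beats reduces the variance of the per-recording score, which is one plausible reason why fusing more beats could raise AUC. The paper's fusion method may be different.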

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.09996

MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings

Authors: Yu Du, Yu Song, Ce Guo, Xiao**g Tian, Dong Liu, Ming Cong

Abstract: Because complex die castings have intricate spatial structures and diverse geometric features, achieving high-precision and robust point cloud registration for them has been a significant challenge in the die-casting industry. Existing point cloud registration methods primarily optimize network models using well-established high-quality datasets, often neglecting practical application in real scenarios. To address this gap, this paper proposes a high-precision adaptive registration method called Multiscale Efficient Deep Closest Point (MEDPNet) and introduces a die-casting point cloud dataset, DieCastCloud, specifically designed to tackle the challenges of point cloud registration in the die-casting industry. MEDPNet performs coarse registration of die-casting point cloud data using the Efficient-DCP method, followed by precise registration using the Multiscale feature fusion dual-channel registration (MDR) method. We enhance the modeling capability and computational efficiency of the model by replacing the Transformer attention mechanism in DCP with Efficient Attention and by implementing a collaborative scale mechanism through a combination of serial and parallel blocks. Additionally, we propose the MDR method, which uses multilayer perceptrons (MLP), the Normal Distributions Transform (NDT), and Iterative Closest Point (ICP) to achieve learnable adaptive fusion, enabling high-precision, scalable, and noise-resistant global point cloud registration. Our proposed method demonstrates excellent performance compared with state-of-the-art geometric and learning-based registration methods when applied to complex die-casting point cloud data.
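
ICP, one of the components that MDR fuses, alternates two steps: matching each source point to its nearest target point, and solving in closed form for the rigid transform that best aligns the matched pairs. The Python sketch below is a plain point-to-point ICP, restricted to 2D for brevity. It is not MEDPNet or MDR. In 2D, the least-squares rotation angle is $\theta=\operatorname{atan2}(\sum a\times b,\ \sum a\cdot b)$ over the centred point pairs, and the translation is then $t=\bar d - R\bar s$.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src -> dst."""
    n = len(src)
    scx, scy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dcx, dcy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - scx, py - scy, qx - dcx, qy - dcy
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dcx - (c * scx - s * scy), dcy - (s * scx + c * scy)


def apply_rigid(theta, tx, ty, pts):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def icp_2d(src, dst, iters=30):
    """Point-to-point ICP with brute-force nearest-neighbour matching."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iters):
        moved = apply_rigid(theta, tx, ty, src)
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) for p in moved]
        # Re-estimate the full transform from the original source to the current matches.
        theta, tx, ty = best_rigid_2d(src, matched)
    return theta, tx, ty


# Usage: an L-shaped cloud moved by a small known rigid motion (hypothetical values).
src = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
dst = apply_rigid(0.05, 0.1, -0.05, src)
theta, tx, ty = icp_2d(src, dst)
```

ICP converges only from a reasonable starting pose, because nearest-neighbour matches are wrong when the clouds are far apart. This is why pipelines such as the one described above run a coarse registration stage (Efficient-DCP) before fine alignment.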

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09412

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

Authors: Yinan Deng, Jiahui Wang, **gyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

Abstract: Environment representations endowed with sophisticated semantics are pivotal for seamless interaction between robots and humans, enabling robots to carry out various tasks effectively. Open-vocabulary maps, powered by Visual-Language models (VLMs), possess inherent advantages, including zero-shot learning and support for open-set classes. However, existing open-vocabulary maps are primarily designed for small-scale environments, such as desktops or rooms, and are typically geared towards limited-area tasks involving robotic indoor navigation or in-place manipulation. They are difficult to generalize directly to outdoor environments characterized by numerous objects and complex tasks, owing to limitations in both understanding level and map structure. In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments. OpenGraph first extracts instances and their captions from visual images and encodes the captions to enhance textual reasoning. Subsequently, it achieves 3D incremental object-centric mapping with feature embedding by projecting images onto LiDAR point clouds. Finally, the environment is segmented based on lane graph connectivity to construct a hierarchical graph. Validation results on the public dataset SemanticKITTI demonstrate that OpenGraph achieves the highest segmentation and query accuracy. The source code of OpenGraph is publicly available at https://github.com/BIT-DYN/OpenGraph.
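
Associating image features with LiDAR points is commonly done by projecting each 3D point into the image with a pinhole camera model, $u = f_x X/Z + c_x$ and $v = f_y Y/Z + c_y$, and reading off the feature at that pixel. The Python sketch below shows that step. It assumes the points are already expressed in the camera frame (that is, LiDAR-to-camera extrinsics have been applied) and uses hypothetical intrinsics. OpenGraph's actual projection and fusion details may differ.

```python
import math


def project_points(points_cam, fx, fy, cx, cy, width, height):
    """Pinhole projection of camera-frame points (z forward) to integer pixel coordinates.

    Returns (point_index, u, v) for points in front of the camera that land inside the image.
    """
    hits = []
    for i, (x, y, z) in enumerate(points_cam):
        if z <= 0:  # behind the camera
            continue
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            hits.append((i, u, v))
    return hits


def attach_features(points_cam, feature_map, fx, fy, cx, cy):
    """Give each visible point the feature stored at the pixel it projects to (row v, column u)."""
    height, width = len(feature_map), len(feature_map[0])
    return {i: feature_map[v][u]
            for i, u, v in project_points(points_cam, fx, fy, cx, cy, width, height)}
```

`math.floor` is used rather than `int` so that slightly negative coordinates are rejected instead of being truncated toward pixel 0. Points that fall outside the image or behind the camera simply receive no feature.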

Submitted 28 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.05801

Enhancing Multi-Hop Knowledge Graph Reasoning through Reward Shaping Techniques

Authors: Chen Li, Haotian Zheng, Yi** Sun, Cangqing Wang, Liqiang Yu, Che Chang, Xinyu Tian, Bo Liu

Abstract: In the realm of computational knowledge representation, Knowledge Graph Reasoning (KG-R) is at the forefront of enabling sophisticated inference across many domains. This research employs reinforcement learning (RL) strategies, notably the REINFORCE algorithm, to navigate the intricacies of multi-hop KG-R. It critically addresses the prevalent challenges introduced by the inherent incompleteness of Knowledge Graphs (KGs), which frequently leads to erroneous inferences that manifest as both false negatives and misleading positives. By partitioning the Unified Medical Language System (UMLS) benchmark dataset into rich and sparse subsets, we investigate the efficacy of pre-trained BERT embeddings and Prompt Learning methodologies for refining the reward shaping process. This approach not only enhances the precision of multi-hop KG-R but also sets a new precedent for future research in the field, aiming to improve the robustness and accuracy of knowledge inference within complex KG frameworks. Our work contributes a novel perspective to the discourse on KG reasoning, offering a methodological advancement that aligns with the academic rigor and scholarly aspirations of the Natural journal and promises to invigorate further advancements in computational knowledge representation.
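
In multi-hop KG reasoning with REINFORCE, an agent walks relation edges, and the episode's terminal reward says whether the correct entity was reached. A common shaping form, assumed here rather than stated in the abstract, keeps the binary hit reward $R_b$ and adds a soft plausibility score $f\in[0,1]$ for misses: $R = R_b + (1-R_b)\,f$. In this paper's setting $f$ would come from pre-trained BERT embeddings or prompt learning. Here it is just an input. The Python sketch also gives the single-episode REINFORCE surrogate loss, whose gradient is the REINFORCE estimator.

```python
def shaped_reward(hit, soft_score):
    """Binary hit reward plus a soft score for misses: R = R_b + (1 - R_b) * f, f in [0, 1]."""
    rb = 1.0 if hit else 0.0
    return rb + (1.0 - rb) * soft_score


def reinforce_loss(log_probs, reward, baseline=0.0):
    """Surrogate -(R - b) * sum_t log pi(a_t | s_t) for one episode with a terminal reward.

    Minimising it by gradient descent follows the REINFORCE policy-gradient estimate;
    the baseline b reduces variance without biasing the gradient.
    """
    return -(reward - baseline) * sum(log_probs)


# Usage: a two-hop path that missed the target but ended at a plausible entity (hypothetical).
r = shaped_reward(hit=False, soft_score=0.3)
loss = reinforce_loss([-0.5, -0.5], r)
```

Shaping matters most when the KG is incomplete. A path that ends at a true but unrecorded answer would otherwise receive zero reward and be treated as a false negative.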

Submitted 9 March, 2024; originally announced March 2024.

Comments: This paper has been accepted by the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT 2024)

arXiv:2402.14430

Robust Training of Federated Models with Extremely Label Deficiency

Authors: Yonggang Zhang, Zhiqin Yang, Xinmei Tian, Nannan Wang, Tongliang Liu, Bo Han

Abstract: Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. Advanced FSSL methods predominantly focus on training a single model on each client. However, this approach could lead to a discrepancy between the objective functions of labeled and unlabeled data, resulting in gradient conflicts. To alleviate gradient conflicts, we propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data. In particular, Twin-sight concurrently trains a supervised model with a supervised objective function and an unsupervised model with an unsupervised objective function. To enhance the synergy between these two models, Twin-sight introduces a neighbourhood-preserving constraint, which encourages the preservation of the neighbourhood relationships among data features extracted by both models. Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings, demonstrating its efficacy.
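
The abstract does not give the exact form of the neighbourhood-preserving constraint. One simple way to express "both models should induce the same neighbourhood structure over a batch" is to penalise differences between the two models' pairwise cosine-similarity matrices. The Python sketch below implements that assumed formulation, which is not necessarily the one used in Twin-sight. Because the loss compares similarities rather than raw features, the two models remain free to use different feature scales.

```python
import math


def cosine_sim_matrix(feats):
    """Pairwise cosine similarities between feature vectors (zero vectors get norm 1)."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in feats]
    return [[sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(feats, norms)]
            for fi, ni in zip(feats, norms)]


def neighbourhood_loss(feats_sup, feats_unsup):
    """Mean squared difference between the two models' similarity matrices over one batch.

    Zero when the supervised and unsupervised models induce the same neighbourhood structure.
    """
    a, b = cosine_sim_matrix(feats_sup), cosine_sim_matrix(feats_unsup)
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

In a training loop, this term would be added to each model's own objective so that neither model drifts into a feature geometry the other disagrees with.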

Submitted 22 February, 2024; originally announced February 2024.

Comments: ICLR 2024, 22 pages

arXiv:2402.14155

Can Similarity-Based Domain-Ordering Reduce Catastrophic Forgetting for Intent Recognition?

Authors: Amogh Mannekote, Xiaoyi Tian, Kristy Elizabeth Boyer, Bonnie J. Dorr

Abstract: Task-oriented dialogue systems are expected to handle a constantly expanding set of intents and domains even after they have been deployed, in order to support more and more functionality. To live up to this expectation, it is critical to mitigate the catastrophic forgetting (CF) problem that occurs in continual learning (CL) settings for tasks such as intent recognition. While existing dialogue systems research has explored replay-based and regularization-based methods to this end, the effect of domain ordering on the CL performance of intent recognition models remains unexplored. If understood well, domain ordering has the potential to be an orthogonal technique that can be leveraged alongside existing techniques such as experience replay. Our work fills this gap by comparing the impact of three domain-ordering strategies (min-sum path, max-sum path, random) on the CL performance of a generative intent recognition model. Our findings reveal that the min-sum path strategy outperforms the others in reducing catastrophic forgetting when training the 220M-parameter T5-Base model. However, this advantage diminishes with the larger 770M-parameter T5-Large model. These results underscore the potential of domain ordering as a complementary strategy for mitigating catastrophic forgetting in continually learning intent recognition models, particularly in resource-constrained scenarios.
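
A min-sum (or max-sum) path ordering can be read as choosing the sequence of domains whose consecutive pairwise weights sum to the smallest (or largest) total. The Python sketch below computes such orderings by exhaustive search over permutations, which is only feasible for a handful of domains. The abstract does not say how the pairwise weights are defined. The matrix `w` (for example, similarity scores between domains) is an assumed input, and the sketch is not the paper's implementation.

```python
from itertools import permutations


def path_cost(order, w):
    """Sum of pairwise weights between consecutive domains in the given order."""
    return sum(w[a][b] for a, b in zip(order, order[1:]))


def best_order(w, minimize=True):
    """Exhaustive search for the min-sum (or max-sum) path through all domains.

    Ties are broken by the first order encountered in lexicographic permutation order.
    """
    pick = min if minimize else max
    return pick(permutations(range(len(w))), key=lambda order: path_cost(order, w))


# Usage with a hypothetical symmetric weight matrix over three domains.
w = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]
min_sum = best_order(w)                   # (0, 1, 2), cost 1 + 2 = 3
max_sum = best_order(w, minimize=False)   # (0, 2, 1), cost 5 + 2 = 7
```

The random strategy in the abstract corresponds to shuffling `range(len(w))`. For more than roughly ten domains, the factorial search would need to be replaced by a travelling-salesman-style heuristic.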

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12289

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

Authors: Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

Abstract: A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system that leverages Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and their heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that combines the strengths of DriveVLM with the traditional autonomous driving pipeline. Experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the efficacy of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions. Finally, we deploy DriveVLM-Dual on a production vehicle, verifying its effectiveness in real-world autonomous driving environments.

Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Project Page: https://tsinghua-mars-lab.github.io/DriveVLM/

arXiv:2402.11778

Towards Theoretical Understandings of Self-Consuming Generative Models

Authors: Shi Fu, Sen Zhang, Yingjie Wang, Xinmei Tian, Dacheng Tao

Abstract: This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric and non-parametric models. Specifically, we derive bounds on the total variation (TV) distance between the synthetic data distributions produced by future models and the original real data distribution under various mixed training scenarios for diffusion models with a one-hidden-layer neural network score function. Our analysis demonstrates that this distance can be effectively controlled provided that the mixed training dataset sizes or the proportions of real data are large enough. Interestingly, we further unveil a phase transition induced by expanding synthetic data amounts, proving theoretically that while the TV distance initially increases, it declines beyond a threshold point. Finally, we present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
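
The self-consuming loop itself is easy to state procedurally: each generation fits a model to a mix of real data and samples from the previous generation's model. The Python sketch below simulates that loop with a deliberately simple toy, a 1D Gaussian fitted by sample mean and standard deviation. This toy is an assumption for illustration. It does not reproduce the paper's diffusion-model or kernel-density analysis or its TV bounds, but it makes the knobs named in the abstract concrete (dataset size `n_total` and real-data proportion `real_fraction`).

```python
import random
import statistics


def self_consuming_loop(real, generations, n_total, real_fraction, seed=0):
    """Toy self-consuming loop with a 1D Gaussian 'generative model'.

    Each generation is fitted on round(real_fraction * n_total) real samples
    (drawn with replacement) plus synthetic samples from the previous model.
    Returns the (mean, stdev) history, starting with the fit on the real data.
    """
    rng = random.Random(seed)
    mu, sigma = statistics.fmean(real), statistics.stdev(real)
    history = [(mu, sigma)]
    n_real = round(real_fraction * n_total)
    for _ in range(generations):
        data = [rng.choice(real) for _ in range(n_real)]
        data += [rng.gauss(mu, sigma) for _ in range(n_total - n_real)]
        mu, sigma = statistics.fmean(data), statistics.stdev(data)
        history.append((mu, sigma))
    return history


# Usage: compare a half-real loop against a fully synthetic one (hypothetical settings).
data_rng = random.Random(1)
real = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
mixed = self_consuming_loop(real, generations=10, n_total=200, real_fraction=0.5)
synthetic_only = self_consuming_loop(real, generations=10, n_total=200, real_fraction=0.0)
```

With `real_fraction=0.0`, estimation errors compound across generations as a random walk. Mixing in real data pulls each fit back toward the real distribution, which is the intuition behind the abstract's condition on the proportion of real data.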

Submitted 24 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024
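For the self-consuming-loop paper above, the TV distance it bounds can be made concrete on a finite support. This is a minimal sketch, not the paper's diffusion-model analysis: it assumes discrete distributions and models one round of "mixed training" as a plain convex combination of real and synthetic distributions (the helper `mix` is hypothetical). In that toy setting the TV distance to the real data scales exactly by (1 - real fraction), which illustrates why a large enough proportion of real data keeps the distance under control.

```python
import numpy as np

def total_variation(p, q):
    """TV distance between two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must have the same shape")
    return 0.5 * np.abs(p - q).sum()

def mix(real, synthetic, real_fraction):
    """Hypothetical toy model of a mixed training distribution (convex combination)."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    return real_fraction * real + (1.0 - real_fraction) * synthetic

real = np.array([0.5, 0.3, 0.2])
synthetic = np.array([0.2, 0.3, 0.5])
for lam in (0.0, 0.5, 0.9):
    # TV(mix, real) = (1 - lam) * TV(synthetic, real) in this toy model
    print(lam, total_variation(mix(real, synthetic, lam), real))
```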

arXiv:2402.07749 [pdf, ps, other]

Asymptotically compatible schemes for nonlinear variational models via Gamma-convergence and applications to nonlocal problems

Authors: Qiang Du, James M. Scott, Xiaochuan Tian

Abstract: We present a study on asymptotically compatible Galerkin discretizations for a class of parametrized nonlinear variational problems. The abstract analytical framework is based on variational convergence, or Gamma-convergence. We demonstrate the broad applicability of the theoretical framework by developing asymptotically compatible finite element discretizations of some representative nonlinear nonlocal variational problems on a bounded domain. These include nonlocal nonlinear problems with classically-defined, local boundary constraints through heterogeneous localization at the boundary, as well as nonlocal problems posed on parameter-dependent domains.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07011 [pdf, other]

FedImpro: Measuring and Improving Client Update in Federated Learning

Authors: Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu

Abstract: Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. First, we analyze the generalization contribution of local training and conclude that this generalization contribution is bounded by the conditional Wasserstein distance between the data distributions of different clients. Then, we propose FedImpro to construct similar conditional distributions for local training. Specifically, FedImpro decouples the model into high-level and low-level components, and trains the high-level portion on reconstructed feature distributions. This approach enhances the generalization contribution and reduces the dissimilarity of gradients in FL. Experimental results show that FedImpro can help FL defend against data heterogeneity and enhance the generalization performance of the model.

Submitted 14 March, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2401.12264 [pdf, other]

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

Authors: Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

Abstract: There has been a long-standing quest for a unified audio-visual-text model to enable various multimodal understanding tasks, which mimics the listening, seeing and reading processes of human beings. Humans tend to represent knowledge using two separate systems: one for representing verbal (textual) information and one for representing non-verbal (visual and auditory) information. These two systems can operate independently but can also interact with each other. Motivated by this understanding of human cognition, in this paper, we introduce CoAVT -- a novel cognition-inspired Correlated Audio-Visual-Text pre-training model to connect the three modalities. It contains a joint audio-visual encoder that learns to encode audio-visual synchronization information together with the audio and visual content for non-verbal information, and a text encoder to handle textual input for verbal information. To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text. Additionally, to leverage the correspondences between audio and vision with language respectively, we also establish the audio-text and visual-text bi-modal alignments upon the foundational audiovisual-text tri-modal alignment to enhance the multimodal representation learning. Finally, we jointly optimize the CoAVT model with three multimodal objectives: contrastive loss, matching loss and language modeling loss.
Extensive experiments show that CoAVT can learn strong multimodal correlations and generalize to various downstream tasks. CoAVT establishes new state-of-the-art performance on the text-video retrieval task on AudioCaps for both zero-shot and fine-tuning settings, and on audio-visual event classification and audio-visual retrieval tasks on AudioSet and VGGSound.

Submitted 21 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.
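The CoAVT entry above lists a contrastive loss among its three objectives. As a rough illustration only, here is a generic symmetric InfoNCE-style (CLIP-style) contrastive loss between two batches of embeddings, written with NumPy. It is an assumption that CoAVT's loss has this shape; the function names and the default temperature are hypothetical choices, not taken from the paper.

```python
import numpy as np

def _log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(a, b, temperature=0.07):
    """Generic InfoNCE-style loss; row i of `a` and row i of `b` form a positive pair."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalize embeddings
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                     # pairwise cosine similarities
    idx = np.arange(len(a))
    loss_ab = -_log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -_log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

aligned = symmetric_contrastive_loss(np.eye(4), np.eye(4))
shuffled = symmetric_contrastive_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
print(aligned, shuffled)
```

Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily; averaging both directions keeps the loss symmetric in the two modalities.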

arXiv:2401.04129 [pdf, ps, other]

Gradient stability of Caffarelli-Kohn-Nirenberg inequality involving weighted p-Laplace

Authors: Shengbing Deng, Xingliang Tian

Abstract: The best constant and extremal functions of the following Caffarelli-Kohn-Nirenberg inequality are well known: \[ \int_{\mathbb{R}^N}|\nabla u|^p\frac{\mathrm{d}x}{|x|^{\mu}}\geq \mathcal{S} \left(\int_{\mathbb{R}^N}|u|^r\frac{\mathrm{d}x}{|x|^s} \right)^{\frac{p}{r}}, \quad \mbox{for all}\quad u\in C^\infty_c(\mathbb{R}^N), \] where $1<p<p+\mu<N$, $\frac{\mu}{p}\leq \frac{s}{r}<\frac{\mu}{p}+1$, and $r=\frac{p(N-s)}{N-p-\mu}$. An important task is to investigate the stability of extremals for this inequality. First, we classify the solutions of the linearized problem related to the extremals, which shows that the extremals are non-degenerate. Then we investigate the gradient-type remainder term of the above inequality by using a spectral estimate combined with a compactness argument, which partially extends the work of Wei and Wu [Math. Ann., 2022] to the general $p$-Laplace case, and also the work of Figalli and Zhang [Duke Math. J., 2022] to a weighted case.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 38 pages. Any suggestions and comments are welcome! arXiv admin note: text overlap with arXiv:2308.04111
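The parameter range in the Caffarelli-Kohn-Nirenberg entry above couples $N$, $p$, $\mu$, $s$ and the derived exponent $r$, so it is easy to pick an inadmissible combination. The following sketch (the function name `ckn_exponent` is hypothetical) computes $r=\frac{p(N-s)}{N-p-\mu}$ and checks the stated constraints using exact rational arithmetic, so that the non-strict bound $\frac{\mu}{p}\leq\frac{s}{r}$ is not disturbed by floating-point rounding.

```python
from fractions import Fraction

def ckn_exponent(N, p, mu, s):
    """Return r = p(N - s) / (N - p - mu), checking the parameter range from the abstract."""
    N, p, mu, s = (Fraction(v) for v in (N, p, mu, s))
    if not (1 < p < p + mu < N):
        raise ValueError("need 1 < p < p + mu < N")
    r = p * (N - s) / (N - p - mu)   # denominator is positive by the check above
    if r <= 0:
        raise ValueError("r must be positive")
    if not (mu / p <= s / r < mu / p + 1):
        raise ValueError("need mu/p <= s/r < mu/p + 1")
    return r

# N = 5, p = 2, mu = 1, s = 2 gives r = 2 * 3 / 2 = 3 and s/r = 2/3, inside [1/2, 3/2).
print(ckn_exponent(5, 2, 1, 2))
```

Note that the strict inequality $p<p+\mu$ forces $\mu>0$, so the unweighted Sobolev case $\mu=s=0$ is excluded by this range.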

arXiv:2401.02777 [pdf, other]

From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

Authors: Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It entails a comprehensive agent construction scenario, including phases like Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues. Our preliminary evaluations in a real estate sales context suggest that RAISE has some advantages over traditional agents, indicating its potential for broader applications. This work contributes to the AI field by providing a robust framework for developing more context-aware and versatile conversational agents.

Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.02034 [pdf, other]

Text2MDT: Extracting Medical Decision Trees from Medical Texts

Authors: Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, ** Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

Abstract: Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical for building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelines and textbooks. We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts. We investigate two different methods for the Text2MDT task: (a) an end-to-end framework that relies only on instruction tuning of a GPT-style large language model (LLM) to generate all the node information and tree structures; (b) a pipeline framework that decomposes the Text2MDT task into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a) the end-to-end method based on LLMs (7B parameters or larger) shows promising results and outperforms the pipeline methods; (b) the chain-of-thought (COT) prompting method \cite{Wei2022ChainOT} can improve the performance of the fine-tuned LLMs on the Text2MDT test set; (c) the lightweight pipelined method based on encoder-based pretrained models can perform comparably with LLMs while having model complexity two orders of magnitude smaller. Our Text2MDT dataset is open-sourced at https://tianchi.aliyun.com/dataset/95414, and the source code is open-sourced at https://github.com/michael-wzhu/text2dt.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00464 [pdf, ps, other]

A note on the $L^p$-Sobolev inequality

Authors: Shengbing Deng, Xingliang Tian

Abstract: The usual Sobolev inequality in $\mathbb{R}^N$ asserts that $\|\nabla u\|_{L^p(\mathbb{R}^N)} \geq \mathcal{S}\|u\|_{L^{p^*}(\mathbb{R}^N)}$ for $1<p<N$ and $p^*=\frac{pN}{N-p}$, with $\mathcal{S}$ being the sharp constant. This note is concerned, instead, with functions restricted to a bounded domain $\Omega\subset \mathbb{R}^N$. Based on the recent work of Figalli and Zhang [Duke Math. J., 2022], a remainder term with a weak norm is established: \[ \frac{\|\nabla u\|_{L^p(\Omega)}}{\|u\|_{L^{p^*}(\Omega)}} -\mathcal{S} \geq \mathcal{C} \left(\frac{\|u\|_{L^{\bar{p}}_w(\Omega)}} {\|u\|_{L^{p^*}(\Omega)}}\right)^{\max\{2,p\}},\quad \forall u\in C^\infty_0(\Omega)\setminus\{0\}, \] for some $\mathcal{C}=\mathcal{C}(N,p,\Omega)>0$, where $\bar{p}=p^*(p-1)/p$ and $\|\cdot\|_{L^{\bar{p}}_w(\Omega)}$ denotes the weak $L^{\bar{p}}$-norm. Furthermore, the weak norm cannot be replaced by the strong norm. This result answers the long-standing open problem raised by Bianchi and Egnell [J. Funct. Anal., 1991].

Submitted 31 December, 2023; originally announced January 2024.
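The $L^p$-Sobolev note above involves three exponents derived from $N$ and $p$: the critical exponent $p^*=\frac{pN}{N-p}$, the weak-norm exponent $\bar{p}=p^*(p-1)/p$ (which simplifies to $\frac{N(p-1)}{N-p}$), and the power $\max\{2,p\}$ on the remainder. This small sketch (the helper name is hypothetical) computes them exactly for given $N$ and $p$.

```python
from fractions import Fraction

def sobolev_exponents(N, p):
    """Return (p_star, p_bar, power) for the remainder inequality in the note."""
    N, p = Fraction(N), Fraction(p)
    if not (1 < p < N):
        raise ValueError("need 1 < p < N")
    p_star = p * N / (N - p)        # critical Sobolev exponent p^*
    p_bar = p_star * (p - 1) / p    # exponent of the weak norm, equals N(p-1)/(N-p)
    power = max(Fraction(2), p)     # exponent max{2, p} on the remainder term
    return p_star, p_bar, power

# N = 3, p = 2: p^* = 6, p_bar = 3, power = 2
print(sobolev_exponents(3, 2))
```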

arXiv:2312.17461 [pdf, ps, other]

Gaussian radial basis functions collocation for fractional PDEs: methodology and error analysis

Authors: Xiaochuan Tian, Yixuan Wu, Yanzhi Zhang

Abstract: The paper introduces a new meshfree pseudospectral method based on Gaussian radial basis function (RBF) collocation to solve fractional Poisson equations. Hypergeometric functions are used to represent the fractional Laplacian of Gaussian RBFs, enabling efficient computation of stiffness matrix entries. Unlike existing RBF-based methods, our approach ensures a Toeplitz structure in the stiffness matrix with equally spaced RBF centers, enabling efficient matrix-vector multiplications using fast Fourier transforms. We conduct a comprehensive study on shape parameter selection, addressing challenges related to ill-conditioning and numerical stability. The main contributions of our work include rigorous stability analysis and error estimates of the Gaussian RBF collocation method, which, to the best of our knowledge, represent a first attempt at the rigorous analysis of RBF-based methods for fractional PDEs. We conduct numerical experiments to validate our analysis and provide practical insights for implementation.

Submitted 28 December, 2023; originally announced December 2023.
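The Gaussian RBF entry above relies on the stiffness matrix being Toeplitz when centers are equally spaced: each entry depends only on $x_i - x_j$, hence only on $|i-j|$. The sketch below shows the FFT matrix-vector product that this structure allows, via the standard embedding of a symmetric Toeplitz matrix into a $2n \times 2n$ circulant. As a stand-in assumption, the entries are the Gaussian kernel values themselves rather than the fractional Laplacian of the Gaussian (which the paper evaluates with hypergeometric functions); only the first column `c` would change.

```python
import numpy as np

def gaussian_kernel_column(centers, eps):
    """First column of K[i, j] = exp(-(eps * (x_i - x_j))**2) for equally spaced centers."""
    h = centers[1] - centers[0]          # uniform spacing between RBF centers
    k = np.arange(len(centers))
    return np.exp(-(eps * h * k) ** 2)   # entry depends only on |i - j|

def symmetric_toeplitz_matvec(c, x):
    """Return T @ x for real x and T[i, j] = c[|i - j|], in O(n log n) via circulant embedding."""
    n = len(c)
    v = np.concatenate([c, [0.0], c[:0:-1]])   # first column of a 2n x 2n circulant
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(v) * np.fft.fft(x_pad))
    return y[:n].real                          # top block of the circulant product

centers = np.linspace(0.0, 1.0, 8)
c = gaussian_kernel_column(centers, eps=3.0)
x = np.arange(8, dtype=float)
T = c[np.abs(np.subtract.outer(np.arange(8), np.arange(8)))]   # dense matrix for checking
print(np.allclose(symmetric_toeplitz_matvec(c, x), T @ x))
```

The dense product costs $O(n^2)$ and requires storing all $n^2$ entries, whereas the circulant route stores only the first column and costs $O(n \log n)$, which is the efficiency gain the abstract attributes to the Toeplitz structure.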

arXiv:2312.13305 [pdf, other]

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

Authors: Tao Zhang, Xingye Tian, Yikang Zhou, Shun** Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

Abstract: We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS). Unlike previous methods that model video segmentation in an end-to-end manner, our approach decouples video segmentation into three cascaded sub-tasks: segmentation, tracking, and refinement. This decoupling design allows for simpler and more effective modeling of the spatio-temporal representations of objects, especially in complex scenes and long videos. Accordingly, we introduce two novel components: the referring tracker and the temporal refiner. These components track objects frame by frame and model spatio-temporal representations based on pre-aligned features. To improve the tracking capability of DVIS, we propose a denoising training strategy and introduce contrastive learning, resulting in a more robust framework named DVIS++. Furthermore, we evaluate DVIS++ in various settings, including open vocabulary and using a frozen pre-trained backbone. By integrating CLIP with DVIS++, we present OV-DVIS++, the first open-vocabulary universal video segmentation framework. We conduct extensive experiments on six mainstream benchmarks, including the VIS, VSS, and VPS datasets. Using a unified architecture, DVIS++ significantly outperforms state-of-the-art specialized methods on these benchmarks in both closed- and open-vocabulary settings.
Code: https://github.com/zhang-tao-whu/DVIS_Plus

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11413 [pdf, other]

DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

Authors: Xiao Tian, Rachael Hwee Ling Sim, Jue Fan, Bryan Kian Hsiang Low

Abstract: Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions. With the rising interest in personal data ownership and data protection regulations, model owners will likely have to fulfil more data deletion requests. This raises issues that have not been addressed by existing works: Are the data valuation scores still fair with deletions? Must the scores be expensively recomputed? The answer is no. To avoid recomputations, we propose using our data valuation framework DeRDaVa upfront for valuing each data source's contribution to preserving robust model performance after anticipated data deletions. DeRDaVa can be efficiently approximated and will assign higher values to data that are more useful or less likely to be deleted. We further generalize DeRDaVa to Risk-DeRDaVa to cater to risk-averse/risk-seeking model owners who are concerned with the worst-/best-case model utility. We also empirically demonstrate the practicality of our solutions.

Submitted 21 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10315 [pdf, ps, other]

A neural network kernel decomposition for learning multiple steady states in parameterized dynamical systems

Authors: Yimeng Zhang, Alexander Cloninger, Bo Li, Xiaochuan Tian

Abstract: We develop a machine learning approach to identifying parameters with steady-state solutions, locating such solutions, and determining their linear stability for systems of ordinary differential equations and dynamical systems with parameters. Our approach begins with the construction of target functions that can be used to identify parameters with steady-state solutions and the linear stability of such solutions. We design a parameter-solution neural network (PSNN) that couples a parameter neural network and a solution neural network to approximate the target function, and develop efficient algorithms to train the PSNN and to locate steady-state solutions. We also present a theory of approximation of the target function by our PSNN based on the neural network kernel decomposition. Numerical results show that our approach is robust in identifying the phase boundaries separating different regions in the parameter space corresponding to no solution or different numbers of solutions and in classifying the stability of solutions. These numerical results also validate our analysis. Although this study focuses primarily on steady states of parameterized dynamical systems, our approach is generally applicable to finding solutions of parameterized nonlinear systems of algebraic equations. Some potential improvements and future work are discussed.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.07514 [pdf, other]

Integrated and Lightweight Design of Electro-hydraulic Ankle Prosthesis

Authors: Yi Wei, Xingjian Wang, Xinyu Tian, Shao** Wang, Rujun Jia

Abstract: For lower limb amputees, an active ankle joint prosthesis can provide basic mobility functions. This study focuses on an ankle joint prosthesis system based on the principle of electro-hydraulic actuation. By analyzing the characteristics of human gait cycles and the mechanics of ankle joint movement, a lightweight and integrated ankle joint prosthesis is designed, considering the requirements for normal ankle joint kinematics and dynamics. The components of the prosthesis are optimized through simulation and iterative improvements, while ensuring tight integration within minimal space. The integrated lightweight prosthesis components are designed and verified in simulation. This research addresses the conflict between the high output capability required of prosthetic devices and their constraints on volume and weight.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 8 pages, 21 figures, conference

arXiv:2312.07257 [pdf, ps, other]

The generalized polar decomposition, the weak complementarity and the parallel sum for adjointable operators on Hilbert $C^*$-modules

Authors: Xiaofeng Zhang, Xiaoyi Tian, Qingxiang Xu

Abstract: This paper deals mainly with some aspects of the adjointable operators on Hilbert $C^*$-modules. A new tool called the generalized polar decomposition for each adjointable operator is introduced and clarified. As an application, the general theory of the weakly complementable operators is set up in the framework of Hilbert $C^*$-modules. It is proved that there exists an operator equation which has a unique solution, whereas this unique solution fails to be the reduced solution. Some investigations are also carried out in the Hilbert space case. It is proved that there exist a closed subspace $M$ of a certain Hilbert space $K$ and an operator $T\in \mathbb{B}(K)$ such that $T$ is $(M,M)$-weakly complementable, whereas $T$ fails to be $(M,M)$-complementable. The solvability of the equation $$A:B=X^*AX+(I-X)^*B(I-X) \quad (X\in\mathbb{B}(H))$$ is also dealt with in the Hilbert space case, where $A,B\in \mathbb{B}(H)$ are two general positive operators, and $A:B$ denotes their parallel sum. Among other things, it is shown that there exist certain positive operators $A$ and $B$ on the Hilbert space $\ell^2(\mathbb{N})\oplus \ell^2(\mathbb{N})$ such that the above equation has no solution.

Submitted 24 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: This version is accepted for publication in Banach Journal of Mathematical Analysis

MSC Class: 46L08; 47A05
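For contrast with the infinite-dimensional counterexample in the Hilbert $C^*$-module entry above, here is a finite-dimensional check. Assuming $A$ and $B$ are positive semidefinite matrices with $A+B$ invertible, the parallel sum is $A:B = A(A+B)^{-1}B$, and $X=(A+B)^{-1}B$ satisfies $X^*AX+(I-X)^*B(I-X)=A:B$: writing $S=A+B$, the two terms are $BS^{-1}(A:B)$ and $AS^{-1}(A:B)$, which sum to $A:B$. So in this restricted setting the equation is always solvable; the paper's non-solvable example is posed on $\ell^2(\mathbb{N})\oplus \ell^2(\mathbb{N})$. The helper names below are hypothetical.

```python
import numpy as np

def parallel_sum(A, B):
    """A:B = A (A + B)^{-1} B, assuming A + B is invertible (finite dimensions)."""
    return A @ np.linalg.solve(A + B, B)

def candidate_solution(A, B):
    """X = (A + B)^{-1} B, a solution of X*AX + (I - X)*B(I - X) = A:B when A + B is invertible."""
    return np.linalg.solve(A + B, B)

rng = np.random.default_rng(0)
M1, M2 = rng.standard_normal((2, 4, 4))
A, B = M1 @ M1.T, M2 @ M2.T          # random positive semidefinite matrices
X = candidate_solution(A, B)
I = np.eye(4)
lhs = parallel_sum(A, B)
rhs = X.conj().T @ A @ X + (I - X).conj().T @ B @ (I - X)
print(np.allclose(lhs, rhs))
```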

arXiv:2312.04877 [pdf, other]

Generating Explanations to Understand and Repair Embedding-based Entity Alignment

Authors: Xiaobin Tian, Zequn Sun, Wei Hu

Abstract: Entity alignment (EA) seeks identical entities in different knowledge graphs, a long-standing task in database research. Recent work leverages deep learning to embed entities in vector space and align them via nearest neighbor search. Although embedding-based EA has achieved marked success in recent years, it lacks explanations for alignment decisions. In this paper, we present the first framework that can generate explanations for understanding and repairing embedding-based EA results. Given an EA pair produced by an embedding model, we first compare its neighbor entities and relations to build a matching subgraph as a local explanation. We then construct an alignment dependency graph to understand the pair from an abstract perspective. Finally, we repair the pair by resolving three types of alignment conflicts based on dependency graphs. Experiments on a variety of EA datasets demonstrate the effectiveness, generalization, and robustness of our framework in explaining and repairing embedding-based EA results.

Submitted 21 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted in the 40th IEEE International Conference on Data Engineering (ICDE 2024)
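The entity-alignment entry above describes embedding-based EA as nearest neighbor search in a shared vector space. The sketch below shows that generic step with cosine similarity, plus a hypothetical check for one simple kind of conflict: a target entity claimed by more than one source entity. This conflict check is only an illustration and is not claimed to be one of the three conflict types the paper resolves.

```python
from collections import defaultdict
import numpy as np

def align_by_nearest_neighbor(src_emb, tgt_emb):
    """Match each source entity to its most cosine-similar target entity."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                        # cosine similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)

def one_to_many_conflicts(matches):
    """Hypothetical check: targets claimed by more than one source entity."""
    claimed = defaultdict(list)
    for s, t in enumerate(matches):
        claimed[int(t)].append(s)
    return {t: srcs for t, srcs in claimed.items() if len(srcs) > 1}

src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt = np.array([[0.0, 2.0], [3.0, 0.1], [1.0, 1.1]])
matches, scores = align_by_nearest_neighbor(src, tgt)
print(matches, one_to_many_conflicts(matches))
```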

arXiv:2312.01598 [pdf, other]

Good Questions Help Zero-Shot Image Reasoning

Authors: Kaiwen Yang, Tao Shen, Xinmei Tian, Xiubo Geng, Chongyang Tao, Dacheng Tao, Tianyi Zhou

Abstract: Aligning the recent large language models (LLMs) with computer vision models leads to large vision-language models (LVLMs), which have paved the way for zero-shot image reasoning tasks. However, LVLMs are usually trained on short high-level captions only referring to sparse focus regions in images. Such "tunnel vision" limits LVLMs' ability to explore other relevant contexts in complex scenes. To address this challenge, we introduce Question-Driven Visual Exploration (QVix), a novel prompting strategy that enhances the exploratory capabilities of LVLMs in zero-shot reasoning tasks. QVix leverages LLMs' strong language prior to generate input-exploratory questions with more details than the original query, guiding LVLMs to explore visual content more comprehensively and uncover subtle or peripheral details. QVix enables a wider exploration of visual scenes, improving the LVLMs' reasoning accuracy and depth in tasks such as visual question answering and visual entailment. Our evaluations on various challenging zero-shot vision-language benchmarks, including ScienceQA and fine-grained visual classification, demonstrate that QVix significantly outperforms existing methods, highlighting its effectiveness in bridging the gap between complex visual data and LVLMs' exploratory abilities.

Submitted 8 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

arXiv:2312.01233 [pdf, ps, other]

doi 10.1016/j.laa.2024.02.010

The Frobenius distances from projections to an idempotent matrix

Authors: Xiaoyi Tian, Qingxiang Xu, Chunhong Fu

Abstract: For each pair of matrices $A$ and $B$ with the same order, let $\|A-B\|_F$ denote their Frobenius distance. This paper deals mainly with the Frobenius distances from projections to an idempotent matrix. For every idempotent $Q\in \mathbb{C}^{n\times n}$, a projection $m(Q)$ called the matched projection can be induced. It is proved that $m(Q)$ is the unique projection whose Frobenius distance away from $Q$ takes the minimum value among all the Frobenius distances from projections to $Q$, while $I_n-m(Q)$ is the unique projection whose Frobenius distance away from $Q$ takes the maximum value. Furthermore, it is proved that for every number $α$ between the minimum value and the maximum value, there exists a projection $P$ whose Frobenius distance away from $Q$ takes the value $α$. Based on the above characterization of the minimum distance, some Frobenius norm upper bounds and lower bounds of $\|P-Q\|_F$ are derived under the condition of $PQ=Q$ on a projection $P$ and an idempotent $Q$.

Submitted 17 December, 2023; v1 submitted 2 December, 2023; originally announced December 2023.

Journal ref: Linear Algebra Appl. 688 (2024), 21--43
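The Frobenius-distance entry above works with idempotents $Q$ ($Q^2=Q$) and projections $P$; we assume "projection" means an orthogonal projection ($P=P^2=P^*$), as is usual in this literature. The sketch below only sets up the objects involved: it checks these defining properties, the condition $PQ=Q$, and the distance $\|P-Q\|_F$ on a small non-orthogonal idempotent. It does not construct the matched projection $m(Q)$, whose definition is not given in the abstract.

```python
import numpy as np

def frobenius_distance(A, B):
    """||A - B||_F."""
    return np.linalg.norm(A - B, ord="fro")

def is_idempotent(Q, tol=1e-12):
    return np.allclose(Q @ Q, Q, atol=tol)

def is_projection(P, tol=1e-12):
    # Orthogonal projection (assumed meaning): P^2 = P and P = P^*.
    return is_idempotent(P, tol) and np.allclose(P, P.conj().T, atol=tol)

Q = np.array([[1.0, 1.0], [0.0, 0.0]])   # idempotent but not self-adjoint
P = np.array([[1.0, 0.0], [0.0, 0.0]])   # projection satisfying PQ = Q
I = np.eye(2)
print(is_idempotent(Q), is_projection(Q), np.allclose(P @ Q, Q))
print(frobenius_distance(P, Q), frobenius_distance(I - P, Q))
```

Here $\|P-Q\|_F=1$ and $\|(I-P)-Q\|_F=\sqrt{3}$, so the two complementary projections sit at very different distances from the same idempotent.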

arXiv:2312.01165 [pdf, other]

Data-driven optimal control with neural network modeling of gradient flows

Authors: Xu** Tian, Baskar Ganapathysubramanian, Hailiang Liu

Abstract: Extracting physical laws from observation data is a central challenge in many diverse areas of science and engineering. We propose Optimal Control Neural Networks (OCN) to learn the laws of vector fields in dynamical systems, with no assumption on their analytical form, given data consisting of sampled trajectories. The OCN framework consists of a neural network representation and an optimal control formulation. We provide error bounds for both the solution and the vector field. The bounds are shown to depend on both the training error and the time step between the observation data. We also demonstrate the effectiveness of OCN, as well as its generalization ability, by testing on several canonical systems, including the chaotic Lorenz system.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 28 pages, 8 figures

MSC Class: 93C15; 49K15

arXiv:2311.16494 [pdf, other]

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

Authors: Xinyu Tian, Shu Zou, Zhaoyuan Yang, **g Zhang

Abstract: Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts. We address this issue with Attribute-Guided Prompt Tuning (ArGue), making three key contributions. 1) In contrast to the conventional approach of directly prepending soft prompts to class names, we align the model with primitive visual attributes generated by Large Language Models (LLMs). We posit that a model's ability to express high confidence in these attributes signifies its capacity to discern the correct class rationales. 2) We introduce attribute sampling to eliminate disadvantageous attributes, so that only semantically meaningful attributes are preserved. 3) We propose negative prompting, explicitly enumerating class-agnostic attributes to activate spurious correlations and encourage the model to generate highly orthogonal probability distributions in relation to these negative features. In experiments, our method significantly outperforms current state-of-the-art prompt tuning methods on both novel class prediction and out-of-distribution generalization tasks.

Submitted 12 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted to CVPR2024

Showing 1–50 of 433 results for author: Tian, X