Search | arXiv e-print repository

DGNN: Decoupled Graph Neural Networks with Structural Consistency between Attribute and Graph Embedding Representations

Authors: Jinlu Wang, Jipeng Guo, Yanfeng Sun, Junbin Gao, Shaofan Wang, Yachao Yang, Baocai Yin

Abstract: Graph neural networks (GNNs) demonstrate a robust capability for representation learning on graphs with complex structures, showcasing superior performance in various applications. The majority of existing GNNs employ a graph convolution operation by using both attribute and structure information through coupled learning. In essence, GNNs, from an optimization perspective, seek to learn a consensus and compromise embedding representation that balances attribute and graph information, selectively exploring and retaining valid information. To obtain a more comprehensive embedding representation of nodes, a novel GNN framework, dubbed Decoupled Graph Neural Networks (DGNN), is introduced. DGNN explores distinctive embedding representations from the attribute and graph spaces by decoupled terms. Considering that the semantic graph, constructed from the attribute feature space, consists of different node connection information and provides enhancement for the topological graph, both topological and semantic graphs are combined for the embedding representation learning. Further, structural consistency among attribute embedding and graph embeddings is promoted to effectively remove redundant information and establish a soft connection. This involves promoting factor sharing for adjacency reconstruction matrices, facilitating the exploration of a consensus and high-level correlation. Finally, a more powerful and complete representation is achieved through the concatenation of these embeddings. Experimental results conducted on several graph benchmark datasets verify its superiority in the node classification task.

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.14580 [pdf, other]

Design Your Own Universe: A Physics-Informed Agnostic Method for Enhancing Graph Neural Networks

Authors: Dai Shi, Andi Han, Lequan Lin, Yi Guo, Zhiyong Wang, Junbin Gao

Abstract: Physics-informed Graph Neural Networks have achieved remarkable performance in learning through graph-structured data by mitigating common GNN challenges such as over-smoothing, over-squashing, and heterophily adaption. Despite these advancements, the development of a simple yet effective paradigm that appropriately integrates previous methods for handling all these challenges is still underway. In this paper, we draw an analogy between the propagation of GNNs and particle systems in physics, proposing a model-agnostic enhancement framework. This framework enriches the graph structure by introducing additional nodes and rewiring connections with both positive and negative weights, guided by node labeling information. We theoretically verify that GNNs enhanced through our approach can effectively circumvent the over-smoothing issue and exhibit robustness against over-squashing. Moreover, we conduct a spectral analysis on the rewired graph to demonstrate that the corresponding GNNs can fit both homophilic and heterophilic graphs. Empirical validations on benchmarks for homophilic, heterophilic graphs, and long-term graph datasets show that GNNs enhanced by our method significantly outperform their original counterparts.

Submitted 12 June, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.13986 [pdf, other]

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

Authors: Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao

Abstract: Large language models (LLMs) often generate convincing, fluent explanations. However, different from humans, they often generate inconsistent explanations on different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" but meanwhile answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that they allow a human to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative explanation consistency improvement on four finetuning datasets, and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at https://github.com/yandachen/explanation-consistency-finetuning .

Submitted 25 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.08678

arXiv:2401.13366 [pdf, other]

Mitigating System Bias in Resource Constrained Asynchronous Federated Learning Systems

Authors: Jikun Gao, Ioannis Mavromatis, Peizheng Li, Pietro Carnelli, Aftab Khan

Abstract: Federated learning (FL) systems face performance challenges in dealing with heterogeneous devices and non-identically distributed data across clients. We propose a dynamic global model aggregation method within Asynchronous Federated Learning (AFL) deployments to address these issues. Our aggregation method scores and adjusts the weighting of client model updates based on their upload frequency to accommodate differences in device capabilities. Additionally, we immediately provide an updated global model to clients after they upload their local models to reduce idle time and improve training efficiency. We evaluate our approach within an AFL deployment consisting of 10 simulated clients with heterogeneous compute constraints and non-IID data. The simulation results, using the FashionMNIST dataset, demonstrate over 10% and 19% improvement in global model accuracy compared to state-of-the-art methods PAPAYA and FedAsync, respectively. Our dynamic aggregation method allows reliable global model training despite limited client resources and statistical data heterogeneity. This improves robustness and scalability for real-world FL deployments.

Submitted 1 February, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures. This work has been accepted by PerCom PerconAI workshop 2024

arXiv:2401.12881 [pdf, other]

Computing Diameter+2 in Truly Subquadratic Time for Unit-Disk Graphs

Authors: Hsien-Chih Chang, Jie Gao, Hung Le

Abstract: Finding the diameter of a graph in general cannot be done in truly subquadratic time assuming the Strong Exponential Time Hypothesis (SETH), even when the underlying graph is unweighted and sparse. When restricting to concrete classes of graphs and assuming SETH, planar graphs and minor-free graphs admit truly subquadratic algorithms, while geometric intersection graphs of unit balls, congruent equilateral triangles, and unit segments do not. Unit-disk graphs are one of the major open cases where the complexity of diameter computation remains unknown. More generally, it is conjectured that a truly subquadratic time algorithm exists for pseudo-disk graphs. In this paper, we show a truly subquadratic algorithm of running time $\tilde{O}(n^{2-1/18})$, for finding the diameter in a unit-disk graph, whose output differs from the optimal solution by at most 2. This is the first algorithm that provides an additive guarantee in distortion, independent of the size or the diameter of the graph. Our algorithm requires two important technical elements. First, we show that for the intersection graph of pseudo-disks, the graph VC-dimension, either of $k$-hop balls or the distance encoding vectors, is 4. This contrasts with the VC-dimension of the pseudo-disks themselves as geometric ranges (which is known to be 3). Second, we introduce a clique-based $r$-clustering for geometric intersection graphs, which is an analog of the $r$-division construction for planar graphs. We also showcase the new techniques by establishing new results for distance oracles for unit-disk graphs with subquadratic storage and $O(1)$ query time. The results naturally extend to unit $L_1$ or $L_\infty$-disks and fat pseudo-disks of similar size. Last, if the pseudo-disks additionally have bounded ply, we have a truly subquadratic algorithm to find the exact diameter.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 28 pages, 7 figures

arXiv:2401.12813 [pdf, other]

Bayesian parameter estimation of massive black hole binaries with TianQin-LISA

Authors: Jie Gao, Yi-Ming Hu, En-Kun Li, Jian-dong Zhang, Jianwei Mei

Abstract: This paper analyses the impact of various parameter changes on the estimation of parameters for massive black hole binary (MBHB) systems using a Bayesian inference technique. Several designed MBHB systems were chosen for comparison with a fiducial system to explore the influence of parameters such as sky location, inclination angle, anti-spin, large mass ratio and light mass. The two reported MBHB candidates, OJ287 and Tick-Tock, are also considered. The study found that the network of TianQin and LISA can break certain degeneracies among different parameters, improving the estimation of parameters, particularly for extrinsic parameters. Meanwhile, the degeneracies between different intrinsic parameters are highly sensitive to the value of the parameters. Additionally, small inclination angles and limited detection of the inspiral phase can introduce significant bias in the estimation of parameters. The presence of instrument noise will also introduce bias and worsen the precision. The paper concludes that the network of TianQin and LISA can significantly improve the estimation of extrinsic parameters by about one order of magnitude while yielding slight improvements in the intrinsic parameters. Moreover, parameter estimation can still be subject to biases even with a sufficiently high signal-to-noise ratio if the detected signal does not encompass all stages of the inspiral, merger, and ringdown.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 17 pages, 10 figures

arXiv:2401.12137 [pdf, ps, other]

Generalized Minkowski formulas and rigidity results for anisotropic capillary hypersurfaces

Authors: Jinyu Gao, Guanghan Li

Abstract: We show a generalization of the Hsiung-Minkowski integral formula for anisotropic capillary hypersurfaces in the half-space, which includes the weighted Hsiung-Minkowski formula and the classical anisotropic Minkowski identity for closed hypersurfaces as special cases. As applications, we prove some anisotropic Alexandrov-type theorems and rigidity results for anisotropic capillary hypersurfaces. In particular, the uniqueness of the solution to the anisotropic Orlicz-Christoffel-Minkowski problem is obtained.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 16 pages

MSC Class: 53A10; 53C24; 53C40

arXiv:2401.10820 [pdf, other]

Help Me Reflect: Leveraging Self-Reflection Interface Nudges to Enhance Deliberativeness on Online Deliberation Platforms

Authors: Shun Yi Yeo, Gionnieve Lim, Jie Gao, Weiyu Zhang, Simon Tangi Perrault

Abstract: The deliberative potential of online platforms has been widely examined. However, little is known about how various interface-based reflection nudges impact the quality of deliberation. This paper presents two user studies with 12 and 120 participants, respectively, to investigate the impacts of different reflective nudges on the quality of deliberation. In the first study, we examined five distinct reflective nudges: persona, temporal prompts, analogies and metaphors, cultural prompts and storytelling. Persona, temporal prompts, and storytelling emerged as the preferred nudges for implementation on online deliberation platforms. In the second study, we assess the impacts of these preferred reflectors more thoroughly. Results revealed a significant positive impact of these reflectors on deliberative quality. Specifically, persona promotes a deliberative environment for balanced and opinionated viewpoints while temporal prompts promote more individualised viewpoints. Our findings suggest that the choice of reflectors can significantly influence the dynamics and shape the nature of online discussions.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.10530 [pdf, other]

doi 10.1109/TGRS.2024.3356492

NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images

Authors: Junyu Gao, Liangliang Zhao, Xuelong Li

Abstract: Object counting is a hot topic in computer vision, which aims to estimate the number of objects in a given image. However, most methods only count objects of a single category for an image, which cannot be applied to scenes that need to count objects with multiple categories simultaneously, especially in aerial scenes. To this end, this paper introduces a Multi-category Object Counting (MOC) task to estimate the numbers of different objects (cars, buildings, ships, etc.) in an aerial image. Considering the absence of a dataset for this task, a large-scale Dataset (NWPU-MOC) is collected, consisting of 3,416 scenes with a resolution of 1024 $\times$ 1024 pixels, and well-annotated using 14 fine-grained object categories. Besides, each scene contains RGB and Near Infrared (NIR) images, of which the NIR spectrum can provide richer characterization information compared with only the RGB spectrum. Based on NWPU-MOC, the paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse the features of RGB and NIR and subsequently regress multi-channel density maps corresponding to each object category. In addition, to model the dependency between different channels in the density map with each object category, a spatial contrast loss is designed as a penalty for overlapping predictions at the same spatial position. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared with some mainstream counting algorithms. The dataset, code and models are publicly available at https://github.com/lyongo/NWPU-MOC.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.08119 [pdf, other]

SpecSTG: A Fast Spectral Diffusion Framework for Probabilistic Spatio-Temporal Traffic Forecasting

Authors: Lequan Lin, Dai Shi, Andi Han, Junbin Gao

Abstract: Traffic forecasting, a crucial application of spatio-temporal graph (STG) learning, has traditionally relied on deterministic models for accurate point estimations. Yet, these models fall short of identifying latent risks of unexpected volatility in future observations. To address this gap, probabilistic methods, especially variants of diffusion models, have emerged as uncertainty-aware solutions. However, existing diffusion methods typically focus on generating separate future time series for individual sensors in the traffic network, resulting in insufficient involvement of spatial network characteristics in the probabilistic learning process. To better leverage spatial dependencies and systematic patterns inherent in traffic data, we propose SpecSTG, a novel spectral diffusion framework. Our method generates the Fourier representation of future time series, transforming the learning process into the spectral domain enriched with spatial information. Additionally, our approach incorporates a fast spectral graph convolution designed for Fourier input, alleviating the computational burden associated with existing models. Numerical experiments show that SpecSTG achieves outstanding performance with traffic flow and traffic speed datasets compared to state-of-the-art baselines. The source code for SpecSTG is available at https://anonymous.4open.science/r/SpecSTG.

Submitted 23 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.08045 [pdf, other]

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

Authors: Xu Yan, Haiming Zhang, Yingjie Cai, Jingming Guo, Weichao Qiu, Bin Gao, Kaiqiang Zhou, Yue Zhao, Huan Jin, Jiantao Gao, Zhen Li, Lihui Jiang, Wei Zhang, Hongbo Zhang, Dengxin Dai, Bingbing Liu

Abstract: The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. To empower researchers, we have built and maintained https://github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Github Repo: https://github.com/zhanghm1995/Forge_VFM4AD

arXiv:2401.06374 [pdf, other]

SamLP: A Customized Segment Anything Model for License Plate Detection

Authors: Haoxuan Ding, Junyu Gao, Yuan Yuan, Qi Wang

Abstract: With the emergence of foundation models, this novel paradigm of deep learning has encouraged many powerful achievements in natural language processing and computer vision. Foundation models have many advantages, such as excellent feature extraction power, strong generalization ability, and great few-shot and zero-shot learning capacity, which are beneficial to vision tasks. As the unique identity of a vehicle, license plates (LPs) have diverse styles and appearances across countries and regions, and even different types of vehicles have different LPs. However, recent deep learning based license plate detectors are mainly trained on specific datasets, and these limited datasets constrain the effectiveness and robustness of LP detectors. To alleviate the negative impact of limited data, an attempt to exploit the advantages of foundation models is implemented in this paper. We customize a vision foundation model, i.e. the Segment Anything Model (SAM), for the LP detection task and propose the first LP detector based on a vision foundation model, named SamLP. Specifically, we design a Low-Rank Adaptation (LoRA) fine-tuning strategy to inject extra parameters into SAM and transfer SAM to the LP detection task. We then further propose a promptable fine-tuning step to provide SamLP with promptable segmentation capacity. The experiments show that our proposed SamLP achieves promising detection performance compared to other LP detectors. Meanwhile, the proposed SamLP has great few-shot and zero-shot learning ability, which shows the potential of transferring vision foundation models. The code is available at https://github.com/Dinghaoxuan/SamLP

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05561 [pdf, other]

TrustLLM: Trustworthiness in Large Language Models

Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang , et al. (45 additional authors not shown)

Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: This work is still under work and we welcome your contribution

arXiv:2401.03778 [pdf, other]

Evolved Massive Stars at Low-metallicity VI. Mass-Loss Rate of Red Supergiant Stars in the Large Magellanic Cloud

Authors: Jing Wen, Jian Gao, Ming Yang, Bingqiu Chen, Yi Ren, Tianding Wang, Biwei Jiang

Abstract: Mass loss is a crucial process that affects the observational properties, evolution path and fate of highly evolved stars. However, the mechanism of mass loss is still unclear, and the mass-loss rate (MLR) of red supergiant stars (RSGs) requires further research and precise evaluation. To address this, we utilized an updated and complete sample of RSGs in the Large Magellanic Cloud (LMC) and employed the 2-DUST radiation transfer model and spectral energy distribution (SED) fitting approach to determine the dust-production rates (DPRs) and dust properties of the RSGs. We have fitted 4,714 selected RSGs with over 100,000 theoretical templates of evolved stars. Our results show that the DPR range of RSGs in the LMC is $10^{-11}\, \rm{M_{\odot}\, yr^{-1}}$ to $10^{-7}\, \rm{M_{\odot}\, yr^{-1}}$, and the total DPR of all RSGs is 1.14 $\times 10^{-6} \, \rm{M_{\odot} \, yr^{-1}}$. We find that $63.3\%$ of RSGs are oxygen-rich, and they account for $97.2\%$ of the total DPR. The optically thin RSGs, which comprise $30.6\%$ of our sample, contribute only $0.1\%$ of the total DPR, while carbon-rich RSGs ($6.1\%$) produce $2.7\%$ of the total DPR. Overall, 208 RSGs contributed $76.6\%$ of the total DPR. We have established a new relationship between the MLR and luminosity of RSGs in the LMC, which exhibits a positive trend and a clear turning point at $\log{L/L_{\odot}} \approx 4.4$.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted for publication in AJ

arXiv:2401.03568 [pdf, other]

Agent AI: Surveying the Horizons of Multimodal Interaction

Authors: Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

Abstract: Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.

Submitted 25 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.02992 [pdf]

Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis

Authors: Jiahui Peng, **g Gao, Xin Tong, **g Guo, Hang Yang, Jianchuan Qi, Ruiqiao Li, Nan Li, Ming Xu

Abstract: In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2401.02781 [pdf, other]

doi 10.1103/PhysRevLett.132.261903

Simultaneous Determination of Fragmentation Functions and Test on Momentum Sum Rule

Authors: Jun Gao, ChongYang Liu, XiaoMin Shen, Hongxi Xing, Yuxiang Zhao

Abstract: We perform a simultaneous global analysis of hadron fragmentation functions (FFs) to various charged hadrons at next-to-leading order in QCD. The world data set includes results from electron-positron single-inclusive annihilation, semi-inclusive deep inelastic scattering, as well as proton-proton collisions including jet fragmentation measurements which lead to strong constraints on the gluon fragmentations. By carefully selecting hadron kinematics to ensure the validity of QCD factorization and the convergence of perturbative calculations, we achieve a satisfying best fit with $χ^2/$d.o.f.$=0.90$, in the simultaneous extraction of FFs for light charged hadrons ($π^{\pm}$, $K^{\pm}$ and $p/\bar{p}$). The total momentum of $u$, $d$ quarks and gluon carried by light charged hadrons have been determined precisely. That urges future precision measurements on fragmentation to neutral hadrons, which are crucial for the test of fundamental sum rules in QCD fragmentation.

Submitted 29 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: published version; link to FF grids provided

Journal ref: Phys.Rev.Lett. 132 (2024) 26, 261903

arXiv:2401.02458 [pdf, other]

Data-Centric Foundation Models in Computational Healthcare: A Survey

Authors: Yunkun Zhang, ** Gao, Zheling Tan, Lingfeng Zhou, Kexin Ding, Mu Zhou, Shaoting Zhang, Dequan Wang

Abstract: The advent of foundation models (FMs) as an emerging suite of AI techniques has struck a wave of opportunities in computational healthcare. The interactive nature of these models, guided by pre-training data and human instructions, has ignited a data-centric AI paradigm that emphasizes better data characterization, quality, and scale. In healthcare AI, obtaining and processing high-quality clinical data records has been a longstanding challenge, ranging from data quantity, annotation, patient privacy, and ethics. In this survey, we investigate a wide range of data-centric approaches in the FM era (from model pre-training to inference) towards improving the healthcare workflow. We discuss key perspectives in AI security, assessment, and alignment with human values. Finally, we offer a promising outlook of FM-based analytics to enhance the performance of patient outcome and clinical workflow in the evolving landscape of healthcare and medicine. We provide an up-to-date list of healthcare-related foundation models and datasets at https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare .

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2401.02099 [pdf]

Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

Authors: Zeyu Li, Suncheng Xiang, Tong Yu, **gsheng Gao, Jiacheng Ruan, Yan** Hu, Ting Liu, Yuzhuo Fu

Abstract: The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audio data and predict the vessel type. The current UATR dataset exhibits shortcomings in both duration and sample quantity. In this paper, we propose Oceanship, a large-scale and diverse underwater audio dataset. This dataset comprises 15 categories, spans a total duration of 121 hours, and includes comprehensive annotation information such as coordinates, velocity, vessel types, and timestamps. We compiled the dataset by crawling and organizing original communication data from the Ocean Communication Network (ONC) database between 2021 and 2022. While audio retrieval tasks are well-established in general audio classification, they have not been explored in the context of underwater audio recognition. Leveraging the Oceanship dataset, we introduce a baseline model named Oceannet for underwater audio retrieval. This model achieves a recall at 1 (R@1) accuracy of 67.11% and a recall at 5 (R@5) accuracy of 99.13% on the Deepship dataset.

Submitted 10 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by ICIC 2024

arXiv:2401.01600 [pdf, other]

PLLaMa: An Open-source Large Language Model for Plant Science

Authors: Xianjun Yang, Junfeng Gao, Wenxin Xue, Erik Alexandersson

Abstract: Large Language Models (LLMs) have exhibited remarkable capabilities in understanding and interacting with natural language across various sectors. However, their effectiveness is limited in specialized areas requiring high accuracy, such as plant science, due to a lack of specific expertise in these fields. This paper introduces PLLaMa, an open-source language model that evolved from LLaMa-2. It's enhanced with a comprehensive database, comprising more than 1.5 million scholarly articles in plant science. This development significantly enriches PLLaMa with extensive knowledge and proficiency in plant and agricultural sciences. Our initial tests, involving specific datasets related to plants and agriculture, show that PLLaMa substantially improves its understanding of plant science-related topics. Moreover, we have formed an international panel of professionals, including plant scientists, agricultural engineers, and plant breeders. This team plays a crucial role in verifying the accuracy of PLLaMa's responses to various academic inquiries, ensuring its effective and reliable application in the field. To support further research and development, we have made the model's checkpoints and source codes accessible to the scientific community. These resources are available for download at https://github.com/Xianjun-Yang/PLLaMa.

Submitted 3 January, 2024; originally announced January 2024.

Comments: Work in progress

arXiv:2312.17493 [pdf, other]

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Authors: Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Matt White, Meikang Qiu

Abstract: The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.

Submitted 2 June, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 21 pages, 1 figure, 19 tables

arXiv:2312.17030 [pdf, other]

Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation

Authors: Jiacheng Ruan, **gsheng Gao, Mingye Xie, Suncheng Xiang

Abstract: Recently, Visual Transformer (ViT) has been extensively used in medical image segmentation (MIS) due to applying self-attention mechanism in the spatial domain to modeling global knowledge. However, many studies have focused on improving models in the spatial domain while neglecting the importance of frequency domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet) based on the U-shape architecture by replacing self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform on the three axes of the input features and assigns the external weight in the frequency domain, which is generated by our External Weights Generator. Then, an inverse Fourier transform is performed to change the features back to the spatial domain. We evaluate our model on four datasets, including Synapse, ACDC, ISIC17 and ISIC18 datasets, and our approach demonstrates competitive performance, owing to its effective utilization of frequency domain information.

Submitted 28 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2210.14007

arXiv:2312.15224 [pdf, other]

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

Authors: Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang

Abstract: AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed where players could communicate with natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while keeping real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction, a lightweight LLM, referred to as Fast Mind, for generating macro actions, and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.

Submitted 9 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: This paper is accepted by AAMAS 2024. More demonstrations can be seen on our website https://sites.google.com/view/overcooked-hla/

arXiv:2312.14455 [pdf, other]

doi 10.1103/PhysRevX.14.011046

Evidence for an Excitonic Insulator State in Ta$_2$Pd$_3$Te$_5$

Authors: Jierui Huang, Bei Jiang, **gyu Yao, Dayu Yan, Xincheng Lei, Jiacheng Gao, Zhaopeng Guo, Feng **, Yupeng Li, Zhenyu Yuan, Congcong Chai, Haohao Sheng, Mojun Pan, Famin Chen, Junde Liu, Shunye Gao, Gexing Qu, Bo Liu, Zhicheng Jiang, Zhengtai Liu, Xiaoyan Ma, Shiming Zhou, Yaobo Huang, Chenxia Yun, Qingming Zhang , et al. (8 additional authors not shown)

Abstract: The excitonic insulator (EI) is an exotic ground state of narrow-gap semiconductors and semimetals arising from spontaneous condensation of electron-hole pairs bound by attractive Coulomb interaction. Despite research on EIs dating back to half a century ago, their existence in real materials remains a subject of ongoing debate. In this study, through systematic experimental and theoretical investigations, we provide evidence for the existence of an EI ground state in a van der Waals compound Ta$_2$Pd$_3$Te$_5$. Density-functional-theory calculations suggest that it is a semimetal with a small band overlap, whereas various experiments exhibit an insulating ground state with a clear band gap. Upon incorporating electron-hole Coulomb interaction into our calculations, we obtain an EI phase where the electronic symmetry breaking opens a many-body gap. Angle-resolved photoemission spectroscopy measurements exhibit that the band gap is closed with a significant change in the dispersions as the number of thermally excited charge carriers becomes sufficiently large in both equilibrium and nonequilibrium states. Structural measurements reveal a slight breaking of crystal symmetry with exceptionally small lattice distortion in the insulating state, which cannot account for the significant gap opening. Therefore, we attribute the insulating ground state with a gap opening in Ta$_2$Pd$_3$Te$_5$ to exciton condensation, where the coupling to the symmetry-breaking electronic state induces a subtle change in the crystal structure.

Submitted 14 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 10 pages, 5 figures

Journal ref: Phys. Rev. X 14, 011046, 2024

arXiv:2312.13877 [pdf, other]

A complete continuous-variable quantum computation architecture: from cluster state generation to fault-tolerant accomplishment

Authors: Peilin Du, **g Zhang, Tiancai Zhang, Rongguo Yang, Jiangrui Gao

Abstract: Continuous-variable measurement-based quantum computation, which requires deterministically generated large-scale cluster state, is a promising candidate for practical, scalable, universal, and fault-tolerant quantum computation. In this work, a complete architecture including cluster state preparation, gate implementations, and error correction, is demonstrated. First, a scheme for generating two-dimensional large-scale continuous-variable cluster state by multiplexing both the temporal and spatial domains is proposed. Then, the corresponding gate implementations for universal quantum computation by gate teleportation are discussed and the actual gate noise from the generated cluster state and Gottesman-Kitaev-Preskill (GKP) state are considered. After that, the quantum error correction can be further achieved by utilizing the square-lattice GKP code. Finally, a fault-tolerant quantum computation can be realized by introducing bias into the square-lattice GKP code (to protect against phase-flips) and concatenating a classical repetition code (to handle the residual bit-flip errors), with a squeezing threshold of 12.3 dB. Our work provides a possible option for a complete fault-tolerant quantum computation architecture in the future.

Submitted 31 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 12 pages, 12 figures

arXiv:2312.13752 [pdf]

doi 10.1016/j.media.2024.103253

Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Wei** Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, **yu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis , et al. (16 additional authors not shown)

Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans were publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients.
The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.

Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 19 pages

arXiv:2312.13183 [pdf, other]

A framework for stable spectral methods in $d$-dimensional unit balls

Authors: **g Gao, Arieh Iserles

Abstract: The subject of this paper is the design of efficient and stable spectral methods for time-dependent partial differential equations in unit balls. We commence by sketching the desired features of a spectral method, which is defined by a choice of an orthonormal basis acting in the spatial domain. We continue by considering in detail the choice of a $W$-function basis in a disc in $\mathbb{R}^2$. This is a nontrivial issue because of a clash between two objectives: skew symmetry of the differentiation matrix (which ensures inter alia that the method is stable) and the correct behaviour at the origin. We resolve it by representing the underlying space as an affine space and splitting the underlying functions. This is generalised to any dimension $d \geq 2$ in a natural manner and the paper is concluded with numerical examples that demonstrate how our choice of basis attains the best outcome out of a number of alternatives.

Submitted 20 December, 2023; originally announced December 2023.

MSC Class: 65M70; 42C05

arXiv:2312.12970 [pdf, other]

D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer

Authors: Junjie Gao, Pengfei Wang, Qiujie Dong, Qiong Zeng, Shiqing Xin, Caiming Zhang

Abstract: Establishing accurate and representative matches is a crucial step in addressing the point cloud registration problem. A commonly employed approach involves detecting keypoints with salient geometric features and subsequently mapping these keypoints from one frame of the point cloud to another. However, methods within this category are hampered by the repeatability of the sampled keypoints. In this paper, we introduce a saliency-guided transformer, referred to as D3Former, which entails the joint learning of repeatable Dense Detectors and feature-enhanced Descriptors. The model comprises a Feature Enhancement Descriptor Learning (FEDL) module and a Repetitive Keypoints Detector Learning (RKDL) module. The FEDL module utilizes a region attention mechanism to enhance feature distinctiveness, while the RKDL module focuses on detecting repeatable keypoints to enhance matching capabilities. Extensive experimental results on challenging indoor and outdoor benchmarks demonstrate that our proposed method consistently outperforms state-of-the-art point cloud matching methods. Notably, tests on 3DLoMatch, even with a low overlap ratio, show that our method consistently outperforms recently published approaches such as RoReg and RoITr. For instance, with the number of extracted keypoints reduced to 250, the registration recall scores for RoReg, RoITr, and our method are 64.3%, 73.6%, and 76.5%, respectively.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 15 pages, 6 figures

arXiv:2312.12907 [pdf, ps, other]

doi 10.1103/PhysRevD.109.092001

Solar neutrino measurements using the full data period of Super-Kamiokande-IV

Authors: Super-Kamiokande Collaboration: K. Abe, C. Bronner, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, S. Imaizumi, K. Iyogi, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, Y. Kato, Y. Kishimoto, S. Miki, S. Mine, M. Miura, T. Mochizuki, S. Moriyama, Y. Nagao, M. Nakahata, et al. (305 additional authors not shown)

Abstract: An analysis of solar neutrino data from the fourth phase of Super-Kamiokande (SK-IV) from October 2008 to May 2018 is performed and the results are presented. The observation time of the data set of SK-IV corresponds to $2970$ days and the total live time for all four phases is $5805$ days. For more precise solar neutrino measurements, several improvements are applied in this analysis: lowering the data acquisition threshold in May 2015, further reduction of the spallation background using neutron clustering events, precise energy reconstruction considering the time variation of the PMT gain. The observed number of solar neutrino events in $3.49$--$19.49$ MeV electron kinetic energy region during SK-IV is $65,443^{+390}_{-388}\,(\mathrm{stat.})\pm 925\,(\mathrm{syst.})$ events. Corresponding $\mathrm{^{8}B}$ solar neutrino flux is $(2.314 \pm 0.014\, \rm{(stat.)} \pm 0.040 \, \rm{(syst.)}) \times 10^{6}~\mathrm{cm^{-2}\,s^{-1}}$, assuming a pure electron-neutrino flavor component without neutrino oscillations. The flux combined with all SK phases up to SK-IV is $(2.336 \pm 0.011\, \rm{(stat.)} \pm 0.043 \, \rm{(syst.)}) \times 10^{6}~\mathrm{cm^{-2}\,s^{-1}}$. Based on the neutrino oscillation analysis from all solar experiments, including the SK $5805$ days data set, the best-fit neutrino oscillation parameters are $\rm{sin^{2} θ_{12,\,solar}} = 0.306 \pm 0.013 $ and $Δm^{2}_{21,\,\mathrm{solar}} = (6.10^{+ 0.95}_{-0.81}) \times 10^{-5}~\rm{eV}^{2}$, with a deviation of about 1.5$σ$ from the $Δm^{2}_{21}$ parameter obtained by KamLAND.
The best-fit neutrino oscillation parameters obtained from all solar experiments and KamLAND are $\sin^{2} θ_{12,\,\mathrm{global}} = 0.307 \pm 0.012 $ and $Δm^{2}_{21,\,\mathrm{global}} = (7.50^{+ 0.19}_{-0.18}) \times 10^{-5}~\rm{eV}^{2}$.

Submitted 20 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 47 pages, 61 figures

Journal ref: Phys. Rev. D 109, 092001 (2024)

arXiv:2312.11829 [pdf, other]

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Authors: Haiming Zhang, Xu Yan, Dongfeng Bai, Jiantao Gao, Pan Wang, Bingbing Liu, Shuguang Cui, Zhen Li

Abstract: 3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images. However, image-based scene perception encounters significant challenges in achieving accurate prediction due to the absence of geometric priors. In this paper, we address this issue by exploring cross-modal knowledge distillation in this task, i.e., we leverage a stronger multi-modal model to guide the visual model during training. In practice, we observe that directly applying features or logits alignment, proposed and widely used in bird's-eye view (BEV) perception, does not yield satisfactory results. To overcome this problem, we introduce RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction. By employing differentiable volume rendering, we generate depth and semantic maps in perspective views and propose two novel consistency criteria between the rendered outputs of teacher and student models. Specifically, the depth consistency loss aligns the termination distributions of the rendered rays, while the semantic consistency loss mimics the intra-segment similarity guided by vision foundation models (VLMs). Experimental results on the nuScenes dataset demonstrate the effectiveness of our proposed method in improving various 3D occupancy prediction approaches, e.g., our proposed methodology enhances our baseline by 2.2% in the metric of mIoU and achieves 50% in Occ3D benchmark.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.11569 [pdf]

doi 10.5281/zenodo.10500601

Application of AI in Nutrition

Authors: Ritu Ramakrishnan, Tianxiang Xing, Tianfeng Chen, Ming-Hao Lee, Jinzhu Gao

Abstract: In healthcare, artificial intelligence (AI) has been changing the way doctors and health experts take care of people. This paper will cover how AI is making major changes in the healthcare system, especially with nutrition. Various machine learning and deep learning algorithms have been developed to extract valuable information from healthcare data, which helps doctors, nutritionists, and health experts make better decisions and make our lifestyles healthier. This paper provides an overview of the current state of AI applications in healthcare with a focus on the utilization of AI-driven recommender systems in nutrition. It will discuss the positive outcomes and challenges that arise when AI is used in this field. This paper addresses the challenges of developing AI recommender systems in healthcare, providing a well-rounded perspective on the complexities. Real-world examples and research findings are presented to underscore the tangible and significant impact AI recommender systems have in the field of healthcare, particularly in nutrition. The ongoing efforts of applying AI in nutrition lay the groundwork for a future where personalized recommendations play a pivotal role in guiding individuals toward healthier lifestyles.

Submitted 17 December, 2023; originally announced December 2023.

Journal ref: Journal of Advances in Information Science and Technology, Volume 1, Issue 1, 2023, Pages 7-12

arXiv:2312.11460 [pdf, other]

Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

Authors: Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, Jiangmiao Pang

Abstract: Robust locomotion control depends on accurate state estimations. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain frictions and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introduce Hybrid Internal Model (HIM) to estimate them according to the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and implicit stability representation, corresponding to two primary goals for locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: It only needs the robot's proprioceptions, i.e., those from joint encoders and IMU as observations. It innovatively maintains consistent observations between simulation reference and reality, which avoids information loss in mimicking learning. It exploits batch-level information that is more robust to noise and keeps better sample efficiency. It only requires 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during the training process, revealing remarkable open-world generalizability.

Submitted 1 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Use 1 hour to train a quadruped robot capable of traversing any terrain under any disturbances in the open world, Project Page: https://github.com/OpenRobotLab/HIMLoco

arXiv:2312.11370 [pdf, other]

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Authors: Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong

Abstract: Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been largely focused on text-based mathematical problems, with limited investigation in problems involving geometric information. Addressing this gap, we aim to enable LLMs to solve geometric problems by understanding image input. We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships. To overcome these challenges, we take advantage of the unique characteristics of geometric problems (such as unique geometric logical form, and geometric scalability) and the capacity of the textual LLMs to build an enriched multimodal geometry dataset based on existing data. The augmented dataset, Geo170K, contains more than 170K geometric image-caption and question-answer pairs. Utilizing our constructed Geo170K dataset, we develop G-LLaVA, which demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 10 pages

arXiv:2312.11051 [pdf, other]

doi 10.1109/LRA.2023.3325715

Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking

Authors: Shihao Feng, Pengpeng Liang, Jin Gao, Erkang Cheng

Abstract: Point cloud-based 3D object tracking is an important task in autonomous driving. Though great advances regarding Siamese-based 3D tracking have been made recently, it remains challenging to learn the correlation between the template and search branches effectively with the sparse LiDAR point cloud data. Instead of performing correlation of the two branches at just one point in the network, in this paper, we present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage based on sparse pillars. More specifically, in each stage, self-attention is first applied to each branch separately to capture the non-local context information. Then, cross-attention is used to inject the template information into the search area. This strategy allows the feature learning of the search area to be aware of the template while keeping the individual characteristics of the template intact. To enable the network to easily preserve the information learned at different stages and ease the optimization, for the search area, we densely connect the initial input sparse pillars and the output of each stage to all subsequent stages and the target localization network, which converts pillars to bird's eye view (BEV) feature maps and predicts the state of the target with a small densely connected convolution network. Deep supervision is added to each stage to further boost the performance as well. The proposed algorithm is evaluated on the popular KITTI, nuScenes, and Waymo datasets, and the experimental results show that our method achieves promising performance compared with the state-of-the-art. An ablation study that shows the effectiveness of each component is provided as well. Code is available at https://github.com/liangp/MCSTN-3DSOT.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Preprint version for IEEE Robotics and Automation Letters (RAL)

Journal ref: IEEE Robotics and Automation Letters (RAL), vol. 8, no. 12, pp. 8066-8073, 2023

arXiv:2312.10704 [pdf, ps, other]

A m-weak group inverse for rectangular matrices

Authors: Jiale Gao, Kezheng Zuo, Qing-wen Wang

Abstract: The purpose of this paper is to extend the definition of the m-weak group inverse from a square matrix to a rectangular matrix, called the W-weighted m-weak group inverse. This new generalized inverse is also a generalization of the weak group inverse, generalized group inverse, Drazin inverse, weighted weak group inverse and W-weighted Drazin inverse. Furthermore, we discuss some properties, characterizations and representations of the W-weighted m-weak group inverse, as well as its applications in solving the matrix equation. Some of the results available in the paper not only recover a few results of the m-weak group inverse, weighted weak group, etc., but also some of them are novel for these generalized inverses.

Submitted 17 December, 2023; originally announced December 2023.

Comments: 25 pages, 0 figure

MSC Class: 15A09; 15A24

arXiv:2312.09685 [pdf, other]

When Contracts Meets Crypto: Exploring Developers' Struggles with Ethereum Cryptographic APIs

Authors: Jiashuo Zhang, Jiachi Chen, Zhiyuan Wan, Ting Chen, Jianbo Gao, Zhong Chen

Abstract: To empower smart contracts with the promising capabilities of cryptography, Ethereum officially introduced a set of cryptographic APIs that facilitate basic cryptographic operations within smart contracts, such as elliptic curve operations. However, since developers are not necessarily cryptography experts, requiring them to directly interact with these basic APIs has caused real-world security issues and potential usability challenges. To guide future research and solutions to these challenges, we conduct the first empirical study on Ethereum cryptographic practices. Through the analysis of 91,484,856 Ethereum transactions, 500 crypto-related contracts, and 483 StackExchange posts, we provide the first in-depth look at cryptographic tasks developers need to accomplish and identify five categories of obstacles they encounter. Furthermore, we conduct an online survey with 78 smart contract practitioners to explore their perspectives on these obstacles and elicit the underlying reasons. We find that more than half of practitioners face more challenges in cryptographic tasks compared to general business logic in smart contracts. Their feedback highlights the gap between low-level cryptographic APIs and high-level tasks they need to accomplish, emphasizing the need for improved cryptographic APIs, task-based templates, and effective assistance tools. Based on these findings, we provide practical implications for further improvements and outline future research directions.

Submitted 15 December, 2023; originally announced December 2023.

Comments: To appear at ICSE'24

arXiv:2312.08212 [pdf, other]

LAMM: Label Alignment for Multi-Modal Prompt Learning

Authors: Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang, Zefang Yu, Ke Ji, Mingye Xie, Ting Liu, Yuzhuo Fu

Abstract: With the success of pre-trained visual-language (VL) models such as CLIP in visual representation tasks, transferring pre-trained models to downstream tasks has become a crucial paradigm. Recently, the prompt tuning paradigm, which draws inspiration from natural language processing (NLP), has made significant progress in the VL field. However, preceding methods mainly focus on constructing prompt templates for text and visual inputs, neglecting the gap in class label representations between the VL models and downstream tasks. To address this challenge, we introduce an innovative label alignment method named LAMM, which can dynamically adjust the category embeddings of downstream datasets through end-to-end training. Moreover, to achieve a more appropriate label distribution, we propose a hierarchical loss, encompassing the alignment of the parameter space, feature space, and logits space. We conduct experiments on 11 downstream vision datasets and demonstrate that our method significantly improves the performance of existing multi-modal prompt learning models in few-shot scenarios, exhibiting an average accuracy improvement of 2.31% compared to the state-of-the-art methods on 16 shots. Moreover, our methodology exhibits preeminence in continual learning compared to other prompt tuning methods. Importantly, our method is synergistic with existing prompt tuning methods and can boost the performance on top of them. Our code and dataset will be publicly available at https://github.com/gaojingsheng/LAMM.

Submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted at AAAI 2024 Main Conference

arXiv:2312.07485 [pdf, other]

MinD-3D: Reconstruct High-quality 3D objects in Human Brain

Authors: Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu

Abstract: In this paper, we introduce Recon3DMind, an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals, marking a significant advancement in the fields of cognitive neuroscience and computer vision. To support this pioneering task, we present the fMRI-Shape dataset, which includes data from 14 participants and features 360-degree videos of 3D objects to enable comprehensive fMRI signal capture across various settings, thereby laying a foundation for future research. Furthermore, we propose MinD-3D, a novel and effective three-stage framework specifically designed to decode the brain's 3D visual information from fMRI signals, demonstrating the feasibility of this challenging task. The framework begins by extracting and aggregating features from fMRI frames through a neuro-fusion encoder, subsequently employs a feature bridge diffusion model to generate visual features, and ultimately recovers the 3D object via a generative transformer decoder. We assess the performance of MinD-3D using a suite of semantic and structural metrics and analyze the correlation between the features extracted by our model and the visual regions of interest (ROIs) in fMRI signals. Our findings indicate that MinD-3D not only reconstructs 3D objects with high semantic relevance and spatial similarity but also significantly enhances our understanding of the human brain's capabilities in processing 3D visual information. Project page at: https://jianxgao.github.io/MinD-3D.

Submitted 21 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 26 pages, 13 figures

arXiv:2312.07255 [pdf, other]

GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction

Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Suncheng Xiang, Zefang Yu, Ting Liu, Yuzhuo Fu

Abstract: The Parameter-Efficient Fine-Tuning (PEFT) method, which adjusts or introduces fewer trainable parameters to calibrate pre-trained models on downstream tasks, has become a recent research interest. However, existing PEFT methods within the traditional fine-tuning framework have two main shortcomings: 1) They overlook the explicit association between trainable parameters and downstream task knowledge. 2) They neglect the interaction between the intrinsic task-agnostic knowledge of pre-trained models and the task-specific knowledge in downstream tasks. To address this gap, we propose a novel fine-tuning framework, named GIST, in a plug-and-play manner. Specifically, our framework first introduces a trainable token, called the Gist token, when applying PEFT methods on downstream tasks. This token serves as an aggregator of the task-specific knowledge learned by the PEFT methods and forms an explicit association with downstream knowledge. Furthermore, to facilitate explicit interaction between task-agnostic and task-specific knowledge, we introduce the concept of Knowledge Interaction via a Bidirectional Kullback-Leibler Divergence objective. As a result, PEFT methods within our framework can make the pre-trained model understand downstream tasks more comprehensively by leveraging the knowledge interaction. Extensive experiments demonstrate the universality and scalability of our framework. Notably, on the VTAB-1K benchmark, we employ the Adapter (a prevalent PEFT method) within our GIST framework and achieve a performance boost of 2.25%, with an increase of only 0.8K parameters. The code will be released.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 17 pages, 8 figures, 22 tables, Work in progress
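
Editorial sketch for the entry above: the abstract names a Bidirectional Kullback-Leibler Divergence objective but gives no formula. The Python below shows one common reading, the sum of KL(p || q) and KL(q || p) over two softmax distributions. The function names, the use of softmax over logits, and the `eps` smoothing term are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is a hypothetical smoothing constant that avoids log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(logits_a, logits_b):
    """Symmetric objective: KL(p || q) + KL(q || p)."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Unlike one-directional KL, this quantity does not depend on argument order, which is one plausible reason to prefer it when neither distribution is treated as the fixed target.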

arXiv:2312.05758 [pdf, other]

CLeaRForecast: Contrastive Learning of High-Purity Representations for Time Series Forecasting

Authors: Jiaxin Gao, Yuxiao Hu, Qinglong Cao, Siqi Dai, Yuntian Chen

Abstract: Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet, these methodologies disregard the inherent high-impact noise embedded within time series data, resulting in representation inaccuracies and seriously degrading the forecasting performance. To address this issue, we propose CLeaRForecast, a novel contrastive learning framework to learn high-purity time series representations with proposed sample, feature, and architecture purifying methods. More specifically, to avoid the additional noise introduced by transforming the original samples (series), transformations are applied separately to the trend and periodic parts to provide better positive samples with noticeably less noise. Moreover, we introduce a channel-independent training scheme to mitigate noise originating from unrelated variables in the multivariate series. By employing a streamlined deep-learning backbone and a comprehensive global contrastive loss function, we prevent noise introduction due to redundant or uneven learning of periodicity and trend. Experimental results show the superior performance of CLeaRForecast in various downstream TSF tasks.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.04837 [pdf, other]

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Authors: Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

Abstract: Instruction-following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also for practical applications that require precise within-image reasoning. We build Localized Visual Commonsense models, which allow users to specify (multiple) regions as input. We train our model by sampling localized commonsense knowledge from a large language model (LLM): specifically, we prompt an LLM to collect commonsense knowledge given a global literal image description and a local literal region description automatically generated by a set of VL models. With a separately trained critic model that selects high-quality examples, we find that training on the localized commonsense corpus can successfully distill existing VL models to support a reference-as-input interface. Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.

Submitted 12 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2312.04420 [pdf, other]

Finite-Temperature Simulations of Quantum Lattice Models with Stochastic Matrix Product States

Authors: Jianxin Gao, Yuan Gao, Qiaoyi Li, Wei Li

Abstract: In this work, we develop a stochastic matrix product state (stoMPS) approach that combines the MPS technique and Monte Carlo samplings and can be applied to simulate quantum lattice models down to low temperature. In particular, we exploit a procedure to unbiasedly sample the local tensors in the matrix product states, each of which has one physical index of dimension $d$ and two geometric indices of dimension $D$, and find the results can be continuously improved by enlarging $D$. We benchmark the methods on small system sizes and then compare the results to those obtained with minimally entangled typical thermal states, finding that stoMPS has overall better performance with finite $D$. We further exploit the MPS sampling to simulate long spin chains, as well as the triangular and square lattices with cylinder circumference $W$ up to 4. Our results showcase the accuracy and effectiveness of stochastic tensor networks in finite-temperature simulations.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04119 [pdf, other]

A brief introduction to a framework named Multilevel Guidance-Exploration Network

Authors: Guoqing Yang, Zhiming Luo, Jianzhe Gao, Yingxin Lai, Kun Yang, Yifan He, Shaozi Li

Abstract: Human behavior anomaly detection aims to identify unusual human actions, playing a crucial role in intelligent surveillance and other areas. The current mainstream methods still adopt reconstruction or future frame prediction techniques. However, reconstructing or predicting low-level pixel features easily enables the network to achieve overly strong generalization ability, allowing anomalies to be reconstructed or predicted as effectively as normal data. Unlike these methods, and inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network (MGENet), which detects anomalies through the difference in high-level representation between the Guidance and Exploration networks. Specifically, we first utilize a pre-trained Normalizing Flow that takes skeletal keypoints as input to guide an RGB encoder, which takes unmasked RGB frames as input, to explore motion latent features. Then, the RGB encoder guides the mask encoder, which takes masked RGB frames as input, to explore the latent appearance feature. Additionally, we design a Behavior-Scene Matching Module (BSMM) to detect scene-related behavioral anomalies. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the ShanghaiTech and UBnormal datasets.

Submitted 9 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: More reasonable

arXiv:2312.02949 [pdf, other]

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Authors: Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

Abstract: With the recent significant advancements in large multi-modal models (LMMs), the importance of their grounding capability in visual chat is increasingly recognized. Despite recent efforts to enable LMMs to support grounding, their capabilities for grounding and chat are usually separate, and their chat performance drops dramatically when asked to ground. The problem is the lack of a dataset for grounded visual chat (GVC). Existing grounding datasets only contain short captions. To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities. To better evaluate the GVC capabilities, we have introduced a benchmark called Grounding-Bench. Additionally, we have proposed a model design that can support GVC and various types of visual prompts by connecting segmentation models with language models. Experimental results demonstrate that our model outperforms other LMMs on Grounding-Bench. Furthermore, our model achieves competitive performance on classic grounding benchmarks like RefCOCO/+/g and Flickr30K Entities. Our code will be released at https://github.com/UX-Decoder/LLaVA-Grounding .

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.02573 [pdf, other]

UTBoost: A Tree-boosting based System for Uplift Modeling

Authors: Junjie Gao, Xiangyu Zheng, DongDong Wang, Zhixiang Huang, Bangqi Zheng, Kai Yang

Abstract: Uplift modeling refers to the set of machine learning techniques that a manager may use to estimate customer uplift, that is, the net effect of an action on some customer outcome. By identifying the subset of customers for whom a treatment will have the greatest effect, uplift models assist decision-makers in optimizing resource allocations and maximizing overall returns. Accurately estimating customer uplift poses practical challenges, as it requires assessing the difference between two mutually exclusive outcomes for each individual. In this paper, we propose two innovative adaptations of the well-established Gradient Boosting Decision Trees (GBDT) algorithm, which learn the causal effect in a sequential way and overcome the counter-factual nature. The two approaches innovate on existing techniques in terms of the ensemble learning method and the learning objectives, respectively. Experiments on large-scale datasets demonstrate the usefulness of the proposed methods, which often yield remarkable improvements over base models. To facilitate the application, we develop UTBoost, an end-to-end tree boosting system specifically designed for uplift modeling. The package is open source and has been optimized for training speed to meet the needs of real industrial applications.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 11 pages, 3 figures
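
Editorial sketch for the entry above: the abstract defines uplift as the net effect of an action on a customer outcome, and notes that the two outcomes for one individual are mutually exclusive. The simplest illustration of that definition is a group-level estimate, the mean outcome of treated customers minus the mean outcome of untreated ones. The Python below assumes randomized treatment assignment and uses a hypothetical function name; it is not the paper's GBDT-based method and does not use the UTBoost package.

```python
def average_uplift(outcomes, treated):
    """Difference in mean outcome between treated and control groups.

    outcomes: numeric outcomes, one per customer (e.g. 1 = converted).
    treated:  truthy flags, one per customer, marking who got the action.
    Assumes treatment was assigned at random (an assumption of this sketch).
    """
    treated_outcomes = [y for y, t in zip(outcomes, treated) if t]
    control_outcomes = [y for y, t in zip(outcomes, treated) if not t]
    if not treated_outcomes or not control_outcomes:
        raise ValueError("both a treated and a control group are required")
    mean_treated = sum(treated_outcomes) / len(treated_outcomes)
    mean_control = sum(control_outcomes) / len(control_outcomes)
    return mean_treated - mean_control
```

Because no customer is observed both with and without the action, this estimate only recovers an average effect; per-customer uplift needs a model, which is the problem the abstract addresses.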

arXiv:2312.02298 [pdf, other]

MoE-AMC: Enhancing Automatic Modulation Classification Performance Using Mixture-of-Experts

Authors: Jiaxin Gao, Qinglong Cao, Yuntian Chen

Abstract: Automatic Modulation Classification (AMC) plays a vital role in time series analysis, such as signal classification and identification within wireless communications. Deep learning-based AMC models have demonstrated significant potential in this domain. However, current AMC models inadequately consider the disparities in handling signals under conditions of low and high Signal-to-Noise Ratio (SNR), resulting in an unevenness in their performance. In this study, we propose MoE-AMC, a novel Mixture-of-Experts (MoE) based model specifically crafted to address AMC in a well-balanced manner across varying SNR conditions. Utilizing the MoE framework, MoE-AMC seamlessly combines the strengths of LSRM (a Transformer-based model) for handling low SNR signals and HSRM (a ResNet-based model) for high SNR signals. This integration empowers MoE-AMC to achieve leading performance in modulation classification, showcasing its efficacy in capturing distinctive signal features under diverse SNR scenarios. We conducted experiments using the RML2018.01a dataset, where MoE-AMC achieved an average classification accuracy of 71.76% across different SNR levels, surpassing the performance of previous SOTA models by nearly 10%. This study represents a pioneering application of MoE techniques in the realm of AMC, offering a promising avenue for elevating signal classification accuracy within wireless communication systems.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01771 [pdf, other]

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

Authors: Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

Abstract: In-context learning allows adapting a model to new tasks given a task description at test time. In this paper, we present IMProv - a generative model that is able to in-context learn visual tasks from multimodal prompts. Given a textual description of a visual task (e.g. "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input. We train a masked generative transformer on a new dataset of figures from computer vision papers and their associated captions, together with a captioned large-scale image-text dataset. During inference time, we prompt the model with text and/or image task example(s) and have the model inpaint the corresponding output. We show that training our model with text conditioning and scaling the dataset size improves in-context learning for computer vision tasks by over +10% AP for Foreground Segmentation, over +5% gains in AP for Single Object Detection, and almost 20% lower LPIPS in Colorization. Our empirical results suggest that vision and language prompts are complementary and it is advantageous to use both to achieve better in-context learning performance. Project page is available at https://jerryxu.net/IMProv .

Submitted 4 December, 2023; originally announced December 2023.

Comments: Project page: https://jerryxu.net/IMProv

arXiv:2312.01684 [pdf, other]

doi 10.1103/PhysRevA.108.022613

Orbital angular momentum-enhanced phase estimation using non-Gaussian state with photon loss

Authors: Yong-Jian Chen, **-Wei Gao, **-Xuan Han, Zhong-Hui Yuan, Ruo-Qi Li, Yong-Yuan Jiang, Jie Song

Abstract: This study investigates the use of orbital angular momentum (OAM) to enhance phase estimation in Mach-Zehnder interferometers (MZIs) by employing non-Gaussian states as input resources in the presence of noise. Our research demonstrates that non-Gaussian states, particularly the photon-subtraction-then-addition (PSA) state, exhibit the best sensitivity in the presence of symmetric noise. Additionally, a higher order of the Bose operators in the non-Gaussian states provides better sensitivity under symmetric noise. OAM can mitigate the deterioration caused by noise, making it possible to estimate small phase shifts theta close to 0. OAM also enhances the resolution and sensitivity of all input states and mitigates the deterioration caused by photon loss, enabling the sensitivity to approach the 1/N limit even under significant photon loss (e.g., 50% symmetric photon loss). These results hold promise for enhancing the sensitivity and robustness of quantum metrology, particularly in the presence of significant photon loss.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 14 pages, 18 figures

Journal ref: Phys. Rev. A 108, 022613 (2023)

arXiv:2312.01410 [pdf, other]

doi 10.1103/PhysRevD.109.103531

Coupled Dark Sector Models and Cosmological Tensions

Authors: Gang Liu, Jiaze Gao, Yufen Han, Yuhao Mu, Lixin Xu

Abstract: In this paper, we introduce two coupling models of early dark energy (EDE) and cold dark matter aimed at alleviating cosmological tensions. We utilize the EDE component in the coupling models to relieve the Hubble tension, while leveraging the interaction between dark matter and dark energy to alleviate the large-scale structure tension. The interaction is implemented in the form of pure momentum coupling and Yukawa coupling. We employed various cosmological datasets, including cosmic microwave background radiation, baryon acoustic oscillations, Type Ia supernovae, the local distance-ladder data (SH0ES), and the Dark Energy Survey Year-3 data, to analyze our models. We first exclude SH0ES data from the entire dataset to constrain the parameters of the novel models. We observe that the constraints on $H_0$ from the two coupling models are slightly higher than those from the $Λ$CDM model, but they exhibit a significant inconsistency with the SH0ES data, consistent with prior research findings in the EDE model. Subsequently, we incorporate SH0ES data to re-constrain the parameters of the various models; our findings reveal that both coupling models yield best-fit values for $H_0$ of approximately $72.23$ km/s/Mpc, effectively mitigating the Hubble tension. Similar to the EDE model, the coupling models yield $S_8$ values that still surpass the result of the $Λ$CDM model. Nevertheless, the best-fit values for $S_8$ obtained with the two new models are 0.8192 and 0.8177, respectively, which are lower than the 0.8316 achieved by the EDE model. Consequently, although our coupling models fail to fully resolve the large-scale structure tension, they partially mitigate the adverse effect of the original EDE model.

Submitted 23 April, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures. In this replacement, we have amalgamated the original content of this manuscript with that of a previous paper [arXiv:2310.09798]. arXiv admin note: substantial text overlap with arXiv:2310.09798

Journal ref: Phys. Rev. D 109, 103531 (2024)

arXiv:2312.01201 [pdf, other]

PAC Privacy Preserving Diffusion Models

Authors: Qipan Xu, Youlong Ding, Xinxi Zhang, Jie Gao, Hao Wang

Abstract: Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges arise, such as ensuring robust protection when privatizing specific data attributes, an area where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model that leverages diffusion principles and ensures Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating a private classifier guidance into the Langevin Sampling Process. Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior performance in privacy protection over existing leading private generative models according to benchmark tests.

Submitted 21 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Showing 201–250 of 2,116 results for author: Gao, J