Search | arXiv e-print repository

arXiv:2406.18100 [pdf, other]

Natural Language but Omitted? On the Ineffectiveness of Large Language Models' privacy policy from End-users' Perspective

Authors: Shuning Zhang, Haobin Xing, Xin Yi, Hewu Li

Abstract: LLMs driven products were increasingly prevalent in our daily lives, With a natural language based interaction style, people may potentially leak their personal private information. Thus, privacy policy and user agreement played an important role in regulating and alerting people. However, there lacked the work examining the reading of LLM's privacy policy. Thus, we conducted the first user study… ▽ More LLMs driven products were increasingly prevalent in our daily lives, With a natural language based interaction style, people may potentially leak their personal private information. Thus, privacy policy and user agreement played an important role in regulating and alerting people. However, there lacked the work examining the reading of LLM's privacy policy. Thus, we conducted the first user study to let participants read the privacy policy and user agreement with two different styles (a cursory and detailed style). We found users lack important information upon cursory reading and even detailed reading. Besides, their privacy concerns was not solved even upon detailed reading. We provided four design implications based on the findings. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.16986 [pdf, ps, other]

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

Authors: Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce… ▽ More In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce the effectiveness of unlearning. Addressing these limitations, we introduce Mini-Unlearning, a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction map**. Our method, Mini-Unlearning, utilizes a minimal subset of historical gradients and leverages this contraction map** to facilitate scalable, efficient unlearning. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security, offering a promising solution for applications requiring robust unlearning capabilities. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14230 [pdf, other]

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

Authors: Han Jiang, Xiaoyuan Yi, Zhihua Wei, Shu Wang, Xing Xie

Abstract: Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs,… ▽ More Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs, but they suffer from evaluation chronoeffect, that is, as models rapidly evolve, existing data becomes leaked or undemanding, overestimating ever-develo** LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach that dynamically probes the underlying moral baselines of LLMs. Distinct from previous adaptive testing methods that rely on static datasets with limited difficulty, GETA incorporates an iteratively-updated item generator which infers each LLM's moral boundaries and generates difficulty-tailored testing items, accurately reflecting the true alignment extent. This process theoretically learns a joint distribution of item and model response, with item difficulty and value conformity as latent variables, where the generator co-evolves with the LLM, addressing chronoeffect. We evaluate various popular LLMs with diverse capabilities and demonstrate that GETA can create difficulty-matching testing items and more accurately assess LLMs' values, better consistent with their performance on unseen OOD and i.i.d. items, laying the groundwork for future evaluation paradigms. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.12193 [pdf, other]

Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

Authors: Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li

Abstract: Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers withi… ▽ More Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet irrelevant features and maintain consistency between predicted and ground-truth labels in labeled data, simultaneously. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods. △ Less

Submitted 25 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.06367 [pdf, other]

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

Authors: Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang

Abstract: Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-vi… ▽ More Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1\times$ of the model size. △ Less

Submitted 20 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.01282 [pdf, other]

Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE

Authors: Jiaxu Liu, ** Yi, Sihao Wu, Xiangyu Yin, Tianle Zhang, Xiaowei Huang, Shi **

Abstract: While Hyperbolic Graph Neural Network (HGNN) has recently emerged as a powerful tool dealing with hierarchical graph data, the limitations of scalability and efficiency hinder itself from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting… ▽ More While Hyperbolic Graph Neural Network (HGNN) has recently emerged as a powerful tool dealing with hierarchical graph data, the limitations of scalability and efficiency hinder itself from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting node-wise attention undertake the role of diffusivity within the Hyperbolic Neural PDE (HPDE). By introducing theoretical principles \textit{e.g.,} field and flow, gradient, divergence, and diffusivity on a non-Euclidean manifold for HPDE integration, we discuss both implicit and explicit discretization schemes to formulate numerical HPDE solvers. Further, we propose the Hyperbolic Graph Diffusion Equation (HGDE) -- a flexible vector flow function that can be integrated to obtain expressive hyperbolic node embeddings. By analyzing potential energy decay of embeddings, we demonstrate that HGDE is capable of modeling both low- and high-order proximity with the benefit of local-global diffusivity functions. Experiments on node classification and link prediction and image-text classification tasks verify the superiority of the proposed method, which consistently outperforms various competitive models by a significant margin. △ Less

Submitted 7 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: The short version of this work will appear in the Proceedings of the 2024 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024)

arXiv:2405.12604 [pdf, other]

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Authors: Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, ** Yi, Xiaowei Huang

Abstract: With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based \textbf{sentinel} model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, ef… ▽ More With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based \textbf{sentinel} model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, effectively reducing toxicity in responses from target LLMs. The sentinel model naturally overcomes the \textit{parameter inefficiency} and \textit{limited model accessibility} for fine-tuning large target models. We employ an interleaved training regimen using Proximal Policy Optimization (PPO) to optimize both red team and sentinel models dynamically, incorporating a value head-sharing mechanism inspired by the multi-agent centralized critic to manage the complex interplay between agents. Our extensive experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs, even when dealing with larger models like \texttt{Llama-2}, \texttt{GPT-3.5} and \texttt{Stable-Diffusion}, highlighting the potential of our framework in enhancing safety and robustness in various applications. △ Less

Submitted 17 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: Preprint, 10 pages main with 10 pages appendix

arXiv:2405.10481 [pdf, other]

Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

Authors: Yuqing Lan, Zhenghao Liu, Yu Gu, Xiaoyuan Yi, Xiaohua Li, Liner Yang, Ge Yu

Abstract: Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the… ▽ More Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the edges of the reasoning graph, which misleads the semantic representations of other nodes and amplifies the noise signals. To mitigate the propagation of noisy semantic information, we introduce a Confidential Graph Attention Network (CO-GAT), which proposes a node masking mechanism for modeling the nodes. Specifically, CO-GAT calculates the node confidence score by estimating the relevance between the claim and evidence pieces. Then, the node masking mechanism uses the node confidence scores to control the noise information flow from the vanilla node to the other graph nodes. CO-GAT achieves a 73.59% FEVER score on the FEVER dataset and shows the generalization ability by broadening the effectiveness to the science-specific domain. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 12pages

arXiv:2405.09055 [pdf, other]

A safety realignment framework via subspace-oriented model fusion for large language models

Authors: Xin Yi, Shunfan Zheng, Linlin Wang, Xiaoling Wang, Liang He

Abstract: The current safeguard mechanisms for large language models (LLMs) are indeed susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine-tuning on apparently benign data for downstream tasks can jeopardize safety. One potential solution is to conduct safety fine-tuning subsequent to downstream fine-tuning. However, there's a risk of catastrophic forgetting during saf… ▽ More The current safeguard mechanisms for large language models (LLMs) are indeed susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine-tuning on apparently benign data for downstream tasks can jeopardize safety. One potential solution is to conduct safety fine-tuning subsequent to downstream fine-tuning. However, there's a risk of catastrophic forgetting during safety fine-tuning, where LLMs may regain safety measures but lose the task-specific knowledge acquired during downstream fine-tuning. In this paper, we introduce a safety realignment framework through subspace-oriented model fusion (SOMF), aiming to combine the safeguard capabilities of initially aligned model and the current fine-tuned model into a realigned model. Our approach begins by disentangling all task vectors from the weights of each fine-tuned model. We then identify safety-related regions within these vectors by subspace masking techniques. Finally, we explore the fusion of the initial safely aligned LLM with all task vectors based on the identified safety subspace. We validate that our safety realignment framework satisfies the safety requirements of a single fine-tuned model as well as multiple models during their fusion. Our findings confirm that SOMF preserves safety without notably compromising performance on downstream tasks, including instruction following in Chinese, English, and Hindi, as well as problem-solving capabilities in Code and Math. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.03372 [pdf, other]

Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G

Authors: Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, Honggang Zhang

Abstract: In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, cos… ▽ More In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, costly communication overheads, severe computing resource consumption, and data heterogeneity across network nodes. These obstacles hinder the applications of ubiquitous computing capabilities of 6G networks, especially in light of the trend of escalating model parameters and training data volumes. To address these challenges effectively, this paper introduces "Snake Learning", a cost-effective distributed learning framework. Specifically, Snake Learning respects the heterogeneity of inter-node computing capability and local data distribution in 6G networks, and sequentially trains the designated part of model layers on individual nodes. This layer-by-layer serpentine update mechanism contributes to significantly reducing the requirements for storage, memory and communication during the model training phase, and demonstrates superior adaptability and efficiency for both Computer Vision (CV) training and Large Language Model (LLM) fine-tuning tasks across homogeneous and heterogeneous data distributions. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures

arXiv:2405.02676 [pdf, other]

Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

Authors: Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu

Abstract: Hand manipulating objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and t… ▽ More Hand manipulating objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physical-plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that without involving any heuristic physical rules, this work still successfully involves physics in the reconstruction of hand-object interactions which are complex motions hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC. △ Less

Submitted 4 May, 2024; originally announced May 2024.

Comments: SIGGRAPH 2024 Conference Track

ACM Class: I.5.4

arXiv:2405.00699 [pdf, other]

Direct Training Needs Regularisation: Anytime Optimal Inference Spiking Neural Network

Authors: Dengyu Wu, Yi Qi, Kaiwen Cai, Gaojie **, ** Yi, Xiaowei Huang

Abstract: Spiking Neural Network (SNN) is acknowledged as the next generation of Artificial Neural Network (ANN) and hold great promise in effectively processing spatial-temporal information. However, the choice of timestep becomes crucial as it significantly impacts the accuracy of the neural network training. Specifically, a smaller timestep indicates better performance in efficient computing, resulting i… ▽ More Spiking Neural Network (SNN) is acknowledged as the next generation of Artificial Neural Network (ANN) and hold great promise in effectively processing spatial-temporal information. However, the choice of timestep becomes crucial as it significantly impacts the accuracy of the neural network training. Specifically, a smaller timestep indicates better performance in efficient computing, resulting in reduced latency and operations. While, using a small timestep may lead to low accuracy due to insufficient information presentation with few spikes. This observation motivates us to develop an SNN that is more reliable for adaptive timestep by introducing a novel regularisation technique, namely Spatial-Temporal Regulariser (STR). Our approach regulates the ratio between the strength of spikes and membrane potential at each timestep. This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN. Through extensive experiments on frame-based and event-based datasets, our method, in combination with cutoff based on softmax output, achieves state-of-the-art performance in terms of both latency and accuracy. Notably, with STR and cutoff, SNN achieves 2.14 to 2.89 faster in inference compared to the pre-configured timestep with near-zero accuracy drop of 0.50% to 0.64% over the event-based datasets. Code available: https://github.com/Dengyu-Wu/AOI-SNN-Regularisation △ Less

Submitted 15 April, 2024; originally announced May 2024.

arXiv:2404.19619 [pdf, other]

Physical Non-inertial Poser (PNP): Modeling Non-inertial Effects in Sparse-inertial Human Motion Capture

Authors: Xinyu Yi, Yuxiao Zhou, Feng Xu

Abstract: Existing inertial motion capture techniques use the human root coordinate frame to estimate local poses and treat it as an inertial frame by default. We argue that when the root has linear acceleration or rotation, the root frame should be considered non-inertial theoretically. In this paper, we model the fictitious forces that are non-neglectable in a non-inertial frame by an auto-regressive esti… ▽ More Existing inertial motion capture techniques use the human root coordinate frame to estimate local poses and treat it as an inertial frame by default. We argue that when the root has linear acceleration or rotation, the root frame should be considered non-inertial theoretically. In this paper, we model the fictitious forces that are non-neglectable in a non-inertial frame by an auto-regressive estimator delicately designed following physics. With the fictitious forces, the force-related IMU measurement (accelerations) can be correctly compensated in the non-inertial frame and thus Newton's laws of motion are satisfied. In this case, the relationship between the accelerations and body motions is deterministic and learnable, and we train a neural network to model it for better motion capture. Furthermore, to train the neural network with synthetic data, we develop an IMU synthesis by simulation strategy to better model the noise model of IMU hardware and allow parameter tuning to fit different hardware. This strategy not only establishes the network training with synthetic data but also enables calibration error modeling to handle bad motion capture calibration, increasing the robustness of the system. Code is available at https://xinyu-yi.github.io/PNP/. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: Accepted by SIGGRAPH 2024 Project Page: https://xinyu-yi.github.io/PNP/

arXiv:2404.16067 [pdf, other]

Layout2Rendering: AI-aided Greenspace design

Authors: Ran Chen, Zeke Lian, Yueheng He, Xiao Ling, Fuyu Yang, Xueqi Yao, Xingjian Yi, **g Zhao

Abstract: In traditional human living environment landscape design, the establishment of three-dimensional models is an essential step for designers to intuitively present the spatial relationships of design elements, as well as a foundation for conducting landscape analysis on the site. Rapidly and effectively generating beautiful and realistic landscape spaces is a significant challenge faced by designers… ▽ More In traditional human living environment landscape design, the establishment of three-dimensional models is an essential step for designers to intuitively present the spatial relationships of design elements, as well as a foundation for conducting landscape analysis on the site. Rapidly and effectively generating beautiful and realistic landscape spaces is a significant challenge faced by designers. Although generative design has been widely applied in related fields, they mostly generate three-dimensional models through the restriction of indicator parameters. However, the elements of landscape design are complex and have unique requirements, making it difficult to generate designs from the perspective of indicator limitations. To address these issues, this study proposes a park space generative design system based on deep learning technology. This system generates design plans based on the topological relationships of landscape elements, then vectorizes the plan element information, and uses Grasshopper to generate three-dimensional models while synchronously fine-tuning parameters, rapidly completing the entire process from basic site conditions to model effect analysis. Experimental results show that: (1) the system, with the aid of AI-assisted technology, can rapidly generate space green space schemes that meet the designer's perspective based on site conditions; (2) this study has vectorized and three-dimensionalized various types of landscape design elements based on semantic information; (3) the analysis and visualization module constructed in this study can perform landscape analysis on the generated three-dimensional models and produce node effect diagrams, allowing users to modify the design in real time based on the effects, thus enhancing the system's interactivity. △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: 14 pages,8 figures

arXiv:2404.12744 [pdf, other]

Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches

Authors: Pablo Biedma, Xiaoyuan Yi, Linus Huang, Maosong Sun, Xing Xie

Abstract: Recent advancements in Large Language Models (LLMs) have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values becomes crucial for assessing and mitigating their risks. Despite extensive investigation into LLMs' values, previous studies heavily rely on human-oriented value systems in social sciences. Then, a natural question arises: Do LLMs… ▽ More Recent advancements in Large Language Models (LLMs) have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values becomes crucial for assessing and mitigating their risks. Despite extensive investigation into LLMs' values, previous studies heavily rely on human-oriented value systems in social sciences. Then, a natural question arises: Do LLMs possess unique values beyond those of humans? Delving into it, this work proposes a novel framework, ValueLex, to reconstruct LLMs' unique value system from scratch, leveraging psychological methodologies from human personality/value research. Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs, synthesizing a taxonomy that culminates in a comprehensive value framework via factor analysis and semantic clustering. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system. Based on this system, we further develop tailored projective tests to evaluate and analyze the value inclinations of LLMs across different model sizes, training methods, and data sources. Our framework fosters an interdisciplinary paradigm of understanding LLMs, paving the way for future AI alignment and regulation. △ Less

Submitted 10 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: 16 pages, work in progress

arXiv:2404.10928 [pdf, other]

GPU-Based Parallel Computing Methods for Medical Photoacoustic Image Reconstruction

Authors: Xinyao Yi, Yuxin Qiao

Abstract: Recent years have witnessed a rapid advancement in GPU technology, establishing it as a formidable high-performance parallel computing technology with superior floating-point computational capabilities compared to traditional CPUs. This paper explores the application of this technology in the field of photoacoustic imaging, an emerging non-destructive testing technique in biomedical engineering ch… ▽ More Recent years have witnessed a rapid advancement in GPU technology, establishing it as a formidable high-performance parallel computing technology with superior floating-point computational capabilities compared to traditional CPUs. This paper explores the application of this technology in the field of photoacoustic imaging, an emerging non-destructive testing technique in biomedical engineering characterized by its high contrast, resolution, and penetration depth. We conduct a data parallelism analysis targeting the computationally intensive image reconstruction segment of photoacoustic imaging. By parallelizing the serial code for iterative reconstruction and optimizing memory access, we achieve significant improvements in processing speed. Our experiments compare the imaging speeds of vascular images reconstructed using CPUs and GPUs, with the results visualized using Matlab. The findings demonstrate that, while maintaining data accuracy, GPU parallel computing methods can markedly accelerate photoacoustic image reconstruction. This acceleration has the potential to facilitate the broader adoption of photoacoustic imaging in applications such as hemodynamic monitoring, clinical disease diagnosis, and drug development. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.04562 [pdf, other]

Diffusion Time-step Curriculum for One Image to 3D Generation

Authors: Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang

Abstract: Score distillation sampling~(SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a \textbf{single} image. It leverages pre-trained 2D diffusion models as teacher to guide the reconstruction of student 3D models. Despite their remarkable success, SDS-based methods often encounter geometric artifacts and texture saturation. We find out the crux is t… ▽ More Score distillation sampling~(SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a \textbf{single} image. It leverages pre-trained 2D diffusion models as teacher to guide the reconstruction of student 3D models. Despite their remarkable success, SDS-based methods often encounter geometric artifacts and texture saturation. We find out the crux is the overlooked indiscriminate treatment of diffusion time-steps during optimization: it unreasonably treats the student-teacher knowledge distillation to be equal at all time-steps and thus entangles coarse-grained and fine-grained modeling. Therefore, we propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), which involves both the teacher and student models collaborating with the time-step curriculum in a coarse-to-fine manner. Extensive experiments on NeRF4, RealFusion15, GSO and Level50 benchmark demonstrate that DTC123 can produce multi-view consistent, high-quality, and diverse 3D assets. Codes and more generation demos will be released in https://github.com/yxymessi/DTC123. △ Less

Submitted 2 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2404.02988 [pdf, other]

Risk-averse Learning with Non-Stationary Distributions

Authors: Siyi Wang, Zifan Wang, Xinlei Yi, Michael M. Zavlanos, Karl H. Johansson, Sandra Hirche

Abstract: Considering non-stationary environments in online optimization enables decision-maker to effectively adapt to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost chan… ▽ More Considering non-stationary environments in online optimization enables decision-maker to effectively adapt to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time. We minimize risk-averse objective function using the Conditional Value at Risk (CVaR) as risk measure. Due to the difficulty in obtaining the exact CVaR gradient, we employ a zeroth-order optimization approach that queries the cost function values multiple times at each iteration and estimates the CVaR gradient using the sampled values. To facilitate the regret analysis, we use a variation metric based on Wasserstein distance to capture time-varying distributions. Given that the distribution variation is sub-linear in the total number of episodes, we show that our designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions. Moreover, theoretical results suggest that increasing the number of samples leads to a reduction in the dynamic regret bounds until the sampling number reaches a specific limit. Finally, we provide numerical experiments of dynamic pricing in a parking lot to illustrate the efficacy of the designed algorithm. △ Less

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.00245 [pdf, other]

Aligning Large Language Models with Recommendation Knowledge

Authors: Yuwei Cao, Nikhil Mehta, Xinyang Yi, Raghunandan Keshavan, Lukasz Heldt, Lichan Hong, Ed H. Chi, Maheswaran Sathiamoorthy

Abstract: Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions… ▽ More Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions inherent in recommendation tasks. We propose bridging the knowledge gap and equip** LLMs with recommendation-specific knowledge to address this. Operations such as Masked Item Modeling (MIM) and Bayesian Personalized Ranking (BPR) have found success in conventional recommender systems. Inspired by this, we simulate these operations through natural language to generate auxiliary-task data samples that encode item correlations and user preferences. Fine-tuning LLMs on such auxiliary-task data samples and incorporating more informative recommendation-task data samples facilitates the injection of recommendation-specific knowledge into LLMs. Extensive experiments across retrieval, ranking, and rating prediction tasks on LLMs such as FLAN-T5-Base and FLAN-T5-XL show the effectiveness of our technique in domains such as Amazon Toys & Games, Beauty, and Sports & Outdoors. Notably, our method outperforms conventional and LLM-based baselines, including the current SOTA, by significant margins in retrieval, showcasing its potential for enhancing recommendation quality. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to the NAACL 2024 Findings

arXiv:2403.19206 [pdf, other]

CogniDot: Vasoactivity-based Cognitive Load Monitoring with a Miniature On-skin Sensor

Authors: Hongbo Lan, Yanrong Li, Shixuan Li, Xin Yi, Tengxiang Zhang

Abstract: Vascular activities offer valuable signatures for psychological monitoring applications. We present CogniDot, an affordable, miniature skin sensor placed on the temporal area on the head that senses cognitive loads with a single-pixel color sensor. With its energy-efficient design, bio-compatible adhesive, and compact size (22mm diameter, 8.5mm thickness), it is ideal for long-term monitoring of m… ▽ More Vascular activities offer valuable signatures for psychological monitoring applications. We present CogniDot, an affordable, miniature skin sensor placed on the temporal area on the head that senses cognitive loads with a single-pixel color sensor. With its energy-efficient design, bio-compatible adhesive, and compact size (22mm diameter, 8.5mm thickness), it is ideal for long-term monitoring of mind status. We showed in detail the hardware design of our sensor. The user study results with 12 participants show that CogniDot can accurately differentiate between three levels of cognitive loads with a within-user accuracy of 97%. We also discuss its potential for broader applications. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19093 [pdf, other]

doi 10.1109/IROS55552.2023.10341360

Task2Morph: Differentiable Task-inspired Framework for Contact-Aware Robot Design

Authors: Yishuai Cai, Shaowu Yang, Minglong Li, Xinglin Chen, Yunxin Mao, Xiaodong Yi, Wen**g Yang

Abstract: Optimizing the morphologies and the controllers that adapt to various tasks is a critical issue in the field of robot design, aka. embodied intelligence. Previous works typically model it as a joint optimization problem and use search-based methods to find the optimal solution in the morphology space. However, they ignore the implicit knowledge of task-to-morphology map** which can directly insp… ▽ More Optimizing the morphologies and the controllers that adapt to various tasks is a critical issue in the field of robot design, aka. embodied intelligence. Previous works typically model it as a joint optimization problem and use search-based methods to find the optimal solution in the morphology space. However, they ignore the implicit knowledge of task-to-morphology map** which can directly inspire robot design. For example, flip** heavier boxes tends to require more muscular robot arms. This paper proposes a novel and general differentiable task-inspired framework for contact-aware robot design called Task2Morph. We abstract task features highly related to task performance and use them to build a task-to-morphology map**. Further, we embed the map** into a differentiable robot design process, where the gradient information is leveraged for both the map** learning and the whole optimization. The experiments are conducted on three scenarios, and the results validate that Task2Morph outperforms DiffHand, which lacks a task-inspired morphology module, in terms of efficiency and effectiveness. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: 9 pages, 10 figures, published to IROS

Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023: 452-459

arXiv:2403.18795 [pdf, other]

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

Authors: Qiuhong Shen, Zike Wu, Xuanyu Yi, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang

Abstract: We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed. Existing methods for single-image 3D reconstruction are primarily based on Score Distillation Sampling (SDS) with Neural 3D representations. Despite promising results, these approaches encounter practical limitations due to lengthy optimizations and significant memory consumption. In this wor… ▽ More We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed. Existing methods for single-image 3D reconstruction are primarily based on Score Distillation Sampling (SDS) with Neural 3D representations. Despite promising results, these approaches encounter practical limitations due to lengthy optimizations and significant memory consumption. In this work, we introduce Gamba, an end-to-end 3D reconstruction model from a single-view image, emphasizing two main insights: (1) Efficient Backbone Design: introducing a Mamba-based GambaFormer network to model 3D Gaussian Splatting (3DGS) reconstruction as sequential prediction with linear scalability of token length, thereby accommodating a substantial number of Gaussians; (2) Robust Gaussian Constraints: deriving radial mask constraints from multi-view masks to eliminate the need for warmup supervision of 3D point clouds in training. We trained Gamba on Objaverse and assessed it against existing optimization-based and feed-forward 3D reconstruction approaches on the GSO Dataset, among which Gamba is the only end-to-end trained single-view reconstruction model with 3DGS. Experimental results demonstrate its competitive generation capabilities both qualitatively and quantitatively and highlight its remarkable speed: Gamba completes reconstruction within 0.05 seconds on a single NVIDIA A100 GPU, which is about $1,000\times$ faster than optimization-based methods. Please see our project page at https://florinshen.github.io/gamba-project. △ Less

Submitted 24 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: project page: https://florinshen.github.io/gamba-project

arXiv:2403.16387 [pdf, other]

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

Authors: Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Abstract: Image fusion aims to combine information from different source images to create a comprehensively representative image. Existing fusion methods are typically helpless in dealing with degradations in low-quality source images and non-interactive to multiple subjective and objective needs. To solve them, we introduce a novel approach that leverages semantic text guidance image fusion model for degra… ▽ More Image fusion aims to combine information from different source images to create a comprehensively representative image. Existing fusion methods are typically helpless in dealing with degradations in low-quality source images and non-interactive to multiple subjective and objective needs. To solve them, we introduce a novel approach that leverages semantic text guidance image fusion model for degradation-aware and interactive image fusion task, termed as Text-IF. It innovatively extends the classical image fusion to the text guided image fusion along with the ability to harmoniously address the degradation and interaction issues during fusion. Through the text semantic encoder and semantic interaction fusion decoder, Text-IF is accessible to the all-in-one infrared and visible image degradation-aware processing and the interactive flexible fusion outcomes. In this way, Text-IF achieves not only multi-modal image fusion, but also multi-modal information fusion. Extensive experiments prove that our proposed text guided image fusion strategy has obvious advantages over SOTA methods in the image fusion performance and degradation treatment. The code is available at https://github.com/XunpengYi/Text-IF. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.11868 [pdf, other]

View-Consistent 3D Editing with Gaussian Splatting

Authors: Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

Abstract: The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance imag… ▽ More The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. Further code and video results are released at http://yuxuanw.me/vcedit/. △ Less

Submitted 20 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 25 pages

arXiv:2403.04204 [pdf, other]

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

Authors: Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, **g Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie

Abstract: Big models have achieved revolutionary breakthroughs in the field of AI, but they might also pose potential concerns. Addressing such concerns, alignment technologies were introduced to make these models conform to human preferences and values. Despite considerable advancements in the past year, various challenges lie in establishing the optimal alignment strategy, such as data cost and scalable o… ▽ More Big models have achieved revolutionary breakthroughs in the field of AI, but they might also pose potential concerns. Addressing such concerns, alignment technologies were introduced to make these models conform to human preferences and values. Despite considerable advancements in the past year, various challenges lie in establishing the optimal alignment strategy, such as data cost and scalable oversight, and how to align remains an open question. In this survey paper, we comprehensively investigate value alignment approaches. We first unpack the historical context of alignment tracing back to the 1920s (where it comes from), then delve into the mathematical essence of alignment (what it is), shedding light on the inherent challenges. Following this foundation, we provide a detailed examination of existing alignment methods, which fall into three categories: Reinforcement Learning, Supervised Fine-Tuning, and In-context Learning, and demonstrate their intrinsic connections, strengths, and limitations, hel** readers better understand this research area. In addition, two emerging topics, personal alignment, and multimodal alignment, are also discussed as novel frontiers in this field. Looking forward, we discuss potential alignment paradigms and how they could handle remaining challenges, prospecting where future alignment will go. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 23 pages, 7 figures

arXiv:2403.03419 [pdf, other]

Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization

Authors: Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu

Abstract: Large language models (LLMs) have revolutionized the role of AI, yet also pose potential risks of propagating unethical content. Alignment technologies have been introduced to steer LLMs towards human preference, gaining increasing attention. Despite notable breakthroughs in this direction, existing methods heavily rely on high-quality positive-negative training pairs, suffering from noisy labels… ▽ More Large language models (LLMs) have revolutionized the role of AI, yet also pose potential risks of propagating unethical content. Alignment technologies have been introduced to steer LLMs towards human preference, gaining increasing attention. Despite notable breakthroughs in this direction, existing methods heavily rely on high-quality positive-negative training pairs, suffering from noisy labels and the marginal distinction between preferred and dispreferred response data. Given recent LLMs' proficiency in generating helpful responses, this work pivots towards a new research focus: achieving alignment using solely human-annotated negative samples, preserving helpfulness while reducing harmfulness. For this purpose, we propose Distributional Dispreference Optimization (D$^2$O), which maximizes the discrepancy between the generated responses and the dispreferred ones to effectively eschew harmful information. We theoretically demonstrate that D$^2$O is equivalent to learning a distributional instead of instance-level preference model reflecting human dispreference against the distribution of negative responses. Besides, D$^2$O integrates an implicit Jeffrey Divergence regularization to balance the exploitation and exploration of reference policies and converges to a non-negative one during training. Extensive experiments demonstrate that our method achieves comparable generation quality and surpasses the latest baselines in producing less harmful and more informative responses with better training stability and faster convergence. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00839 [pdf, other]

ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph

Authors: Xukun Liu, Zhiyuan Peng, Xiaoyuan Yi, Xing Xie, Lirong Xiang, Yuchen Liu, Dongkuan Xu

Abstract: While achieving remarkable progress in a broad range of tasks, large language models (LLMs) remain significantly limited in properly using massive external tools. Existing in-context learning approaches simply format tools into a list of plain text descriptions and input them to LLMs, from which, LLMs generate a sequence of tool calls to solve problems step by step. Such a paradigm ignores the int… ▽ More While achieving remarkable progress in a broad range of tasks, large language models (LLMs) remain significantly limited in properly using massive external tools. Existing in-context learning approaches simply format tools into a list of plain text descriptions and input them to LLMs, from which, LLMs generate a sequence of tool calls to solve problems step by step. Such a paradigm ignores the intrinsic dependency between tools and offloads all reasoning loads to LLMs, making them restricted to a limited number of specifically designed tools. It thus remains challenging for LLMs to operate on a library of massive tools, casting a great limitation when confronted with real-world scenarios. This paper proposes ToolNet, a plug-and-play framework that scales up the number of tools to thousands with a moderate increase in token consumption. ToolNet organizes tools into a directed graph. Each node represents a tool, and weighted edges denote tool transition. Starting from an initial tool node, an LLM navigates in the graph by iteratively choosing the next one from its successors until the task is resolved. Extensive experiments show that ToolNet can achieve impressive results in challenging multi-hop tool learning datasets and is resilient to tool failures. △ Less

Submitted 28 February, 2024; originally announced March 2024.

arXiv:2402.15592 [pdf, other]

Neural optimal controller for stochastic systems via pathwise HJB operator

Authors: Zhe Jiao, Xiaoyan Luo, Xinlei Yi

Abstract: The aim of this work is to develop deep learning-based algorithms for high-dimensional stochastic control problems based on physics-informed learning and dynamic programming. Unlike classical deep learning-based methods relying on a probabilistic representation of the solution to the Hamilton--Jacobi--Bellman (HJB) equation, we introduce a pathwise operator associated with the HJB equation so that… ▽ More The aim of this work is to develop deep learning-based algorithms for high-dimensional stochastic control problems based on physics-informed learning and dynamic programming. Unlike classical deep learning-based methods relying on a probabilistic representation of the solution to the Hamilton--Jacobi--Bellman (HJB) equation, we introduce a pathwise operator associated with the HJB equation so that we can define a problem of physics-informed learning. According to whether the optimal control has an explicit representation, two numerical methods are proposed to solve the physics-informed learning problem. We provide an error analysis on how the truncation, approximation and optimization errors affect the accuracy of these methods. Numerical results on various applications are presented to illustrate the performance of the proposed algorithms. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: 20 pages

arXiv:2402.15202 [pdf, other]

Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models

Authors: Xin Yi, Linlin Wang, Xiaoling Wang, Liang He

Abstract: Impressive results have been achieved in natural language processing (NLP) tasks through the training of large language models (LLMs). However, these models occasionally produce toxic content such as insults, threats, and profanity in response to certain prompts, thereby constraining their practical utility. To tackle this issue, various finetuning-based and decoding-based approaches have been uti… ▽ More Impressive results have been achieved in natural language processing (NLP) tasks through the training of large language models (LLMs). However, these models occasionally produce toxic content such as insults, threats, and profanity in response to certain prompts, thereby constraining their practical utility. To tackle this issue, various finetuning-based and decoding-based approaches have been utilized to mitigate toxicity. However, these methods typically necessitate additional costs such as high-quality training data or auxiliary models. In this paper, we propose fine-grained detoxification via instance-level prefixes (FGDILP) to mitigate toxic text without additional cost. Specifically, FGDILP contrasts the contextualized representation in attention space using a positive prefix-prepended prompt against multiple negative prefix-prepended prompts at the instance level. This allows for constructing fine-grained subtoxicity vectors, which enables collaborative detoxification by fusing them to correct the normal generation process when provided with a raw prompt. We validate that FGDILP enables controlled text generation with regard to toxicity at both the utterance and context levels. Our method surpasses prompt-based baselines in detoxification, although at a slight cost to generation fluency and diversity. △ Less

Submitted 25 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2401.15371

LegalDuet: Learning Effective Representations for Legal Judgment Prediction through a Dual-View Legal Clue Reasoning

Authors: Pengjie Liu, Zhenghao Liu, Xiaoyuan Yi, Liner Yang, Shuo Wang, Yu Gu, Ge Yu, Xing Xie, Shuang-hua Yang

Abstract: Most existing Legal Judgment Prediction (LJP) models focus on discovering the legal triggers in the criminal fact description. However, in real-world scenarios, a professional judge not only needs to assimilate the law case experience that thrives on past sentenced legal judgments but also depends on the professional legal grounded reasoning that learned from professional legal knowledge. In this… ▽ More Most existing Legal Judgment Prediction (LJP) models focus on discovering the legal triggers in the criminal fact description. However, in real-world scenarios, a professional judge not only needs to assimilate the law case experience that thrives on past sentenced legal judgments but also depends on the professional legal grounded reasoning that learned from professional legal knowledge. In this paper, we propose a LegalDuet model, which pretrains language models to learn a tailored embedding space for making legal judgments. It proposes a dual-view legal clue reasoning mechanism, which derives from two reasoning chains of judges: 1) Law Case Reasoning, which makes legal judgments according to the judgment experiences learned from analogy/confusing legal cases; 2) Legal Ground Reasoning, which lies in matching the legal clues between criminal cases and legal decisions. Our experiments show that LegalDuet achieves state-of-the-art performance on the CAIL2018 dataset and outperforms baselines with about 4% improvements on average. Our dual-view reasoning based pretraining can capture critical legal clues to learn a tailored embedding space to distinguish criminal cases. It reduces LegalDuet's uncertainty during prediction and brings pretraining advances to the confusing/low frequent charges. All codes are available at https://github.com/NEUIR/LegalDuet. △ Less

Submitted 27 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: we will update this paper and revise this paper in the near future

arXiv:2401.09071 [pdf, other]

Rethinking Spectral Graph Neural Networks with Spatially Adaptive Filtering

Authors: **gwei Guo, Kaizhu Huang, ** Yi, Zixian Su, Rui Zhang

Abstract: Whilst spectral Graph Neural Networks (GNNs) are theoretically well-founded in the spectral domain, their practical reliance on polynomial approximation implies a profound linkage to the spatial domain. As previous studies rarely examine spectral GNNs from the spatial perspective, their spatial-domain interpretability remains elusive, e.g., what information is essentially encoded by spectral GNNs… ▽ More Whilst spectral Graph Neural Networks (GNNs) are theoretically well-founded in the spectral domain, their practical reliance on polynomial approximation implies a profound linkage to the spatial domain. As previous studies rarely examine spectral GNNs from the spatial perspective, their spatial-domain interpretability remains elusive, e.g., what information is essentially encoded by spectral GNNs in the spatial domain? In this paper, to answer this question, we investigate the theoretical connection between spectral filtering and spatial aggregation, unveiling an intrinsic interaction that spectral filtering implicitly leads the original graph to an adapted new graph, explicitly computed for spatial aggregation. Both theoretical and empirical investigations reveal that the adapted new graph not only exhibits non-locality but also accommodates signed edge weights to reflect label consistency among nodes. These findings thus highlight the interpretable role of spectral GNNs in the spatial domain and inspire us to rethink graph spectral filters beyond the fixed-order polynomials, which neglect global information. Built upon the theoretical findings, we revisit the state-of-the-art spectral GNNs and propose a novel Spatially Adaptive Filtering (SAF) framework, which leverages the adapted new graph by spectral filtering for an auxiliary non-local aggregation. Notably, our SAF comprehensively models both node similarity and dissimilarity from a global perspective, therefore alleviating persistent deficiencies of GNNs related to long-range dependencies and graph heterophily. Extensive experiments over 13 node classification benchmarks demonstrate the superiority of our proposed framework to the state-of-the-art methods. △ Less

Submitted 22 May, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.09050 [pdf, other]

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Authors: Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang

Abstract: Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but are vulnerable to geometry collapse and poor textures yet. To solve this issue, we first deeply analyze the SDS and find that its distillation sampling process indeed corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajec… ▽ More Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but are vulnerable to geometry collapse and poor textures yet. To solve this issue, we first deeply analyze the SDS and find that its distillation sampling process indeed corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample which then serves as a guidance to optimize a 3D model. However, the randomness in SDE sampling often leads to a diverse and unpredictable sample which is not always less noisy, and thus is not a consistently correct guidance, explaining the vulnerability of SDS. Since for any SDE, there always exists an ordinary differential equation (ODE) whose trajectory sampling can deterministically and consistently converge to the desired target point as the SDE, we propose a novel and effective "Consistent3D" method that explores the ODE deterministic sampling prior for text-to-3D generation. Specifically, at each training iteration, given a rendered image by a 3D model, we first estimate its desired 3D score function by a pre-trained 2D diffusion model, and build an ODE for trajectory sampling. Next, we design a consistency distillation sampling loss which samples along the ODE trajectory to generate two adjacent samples and uses the less noisy sample to guide another more noisy one for distilling the deterministic prior into the 3D model. Experimental results show the efficacy of our Consistent3D in generating high-fidelity and diverse 3D objects and large-scale scenes, as shown in Fig. 1. The codes are available at https://github.com/sail-sg/Consistent3D. △ Less

Submitted 12 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted to CVPR 2024

arXiv:2401.08615 [pdf, other]

Online Anomaly Detection over Live Social Video Streaming

Authors: Chengkun He, Xiangmin Zhou, Chen Wang, Iqbal Gondal, Jie Shao, Xun Yi

Abstract: Social video anomaly is an observation in video streams that does not conform to a common pattern of dataset's behaviour. Social video anomaly detection plays a critical role in applications from e-commerce to e-learning. Traditionally, anomaly detection techniques are applied to find anomalies in video broadcasting. However, they neglect the live social video streams which contain interactive tal… ▽ More Social video anomaly is an observation in video streams that does not conform to a common pattern of dataset's behaviour. Social video anomaly detection plays a critical role in applications from e-commerce to e-learning. Traditionally, anomaly detection techniques are applied to find anomalies in video broadcasting. However, they neglect the live social video streams which contain interactive talk, speech, or lecture with audience. In this paper, we propose a generic framework for effectively online detecting Anomalies Over social Video LIve Streaming (AOVLIS). Specifically, we propose a novel deep neural network model called Coupling Long Short-Term Memory (CLSTM) that adaptively captures the history behaviours of the presenters and audience, and their mutual interactions to predict their behaviour at next time point over streams. Then we well integrate the CLSTM with a decoder layer, and propose a new reconstruction error-based scoring function $RE_{IA}$ to calculate the anomaly score of each video segment for anomaly detection. After that, we propose a novel model update scheme that incrementally maintains CLSTM and decoder. Moreover, we design a novel upper bound and ADaptive Optimisation Strategy (ADOS) for improving the efficiency of our solution. Extensive experiments are conducted to prove the superiority of AOVLIS. △ Less

Submitted 1 December, 2023; originally announced January 2024.

arXiv:2401.08089 [pdf, other]

A Study on Training and Develo** Large Language Models for Behavior Tree Generation

Authors: Fu Li, Xueying Wang, Bin Li, Yunlong Wu, Yanzhen Wang, Xiaodong Yi

Abstract: This paper presents an innovative exploration of the application potential of large language models (LLM) in addressing the challenging task of automatically generating behavior trees (BTs) for complex tasks. The conventional manual BT generation method is inefficient and heavily reliant on domain expertise. On the other hand, existing automatic BT generation technologies encounter bottlenecks rel… ▽ More This paper presents an innovative exploration of the application potential of large language models (LLM) in addressing the challenging task of automatically generating behavior trees (BTs) for complex tasks. The conventional manual BT generation method is inefficient and heavily reliant on domain expertise. On the other hand, existing automatic BT generation technologies encounter bottlenecks related to task complexity, model adaptability, and reliability. In order to overcome these challenges, we propose a novel methodology that leverages the robust representation and reasoning abilities of LLMs. The core contribution of this paper lies in the design of a BT generation framework based on LLM, which encompasses the entire process, from data synthesis and model training to application develo** and data verification. Synthetic data is introduced to train the BT generation model (BTGen model), enhancing its understanding and adaptability to various complex tasks, thereby significantly improving its overall performance. In order to ensure the effectiveness and executability of the generated BTs, we emphasize the importance of data verification and introduce a multilevel verification strategy. Additionally, we explore a range of agent design and development schemes with LLM as the central element. We hope that the work in this paper may provide a reference for the researchers who are interested in BT generation based on LLMs. △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.02116 [pdf, other]

doi 10.1145/3639269

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Authors: Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, Charles Xie

Abstract: High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data scales up, in-memory indexes pose a significant challenge due to the substantial increase in main memory requirements. A potential solution involves leveraging disk-based implementation, which stores and searches vector data on high-performance devi… ▽ More High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data scales up, in-memory indexes pose a significant challenge due to the substantial increase in main memory requirements. A potential solution involves leveraging disk-based implementation, which stores and searches vector data on high-performance devices like NVMe SSDs. However, implementing HVSS for data segments proves to be intricate in vector databases where a single machine comprises multiple segments for system scalability. In this context, each segment operates with limited memory and disk space, necessitating a delicate balance between accuracy, efficiency, and space cost. Existing disk-based methods fall short as they do not holistically address all these requirements simultaneously. In this paper, we present Starling, an I/O-efficient disk-resident graph index framework that optimizes data layout and search strategy within the segment. It has two primary components: (1) a data layout incorporating an in-memory navigation graph and a reordered disk-based graph with enhanced locality, reducing the search path length and minimizing disk bandwidth wastage; and (2) a block search strategy designed to minimize costly disk I/O operations during vector query execution. Through extensive experiments, we validate the effectiveness, efficiency, and scalability of Starling. On a data segment with 2GB memory and 10GB disk capacity, Starling can accommodate up to 33 million vectors in 128 dimensions, offering HVSS with over 0.9 average precision and top-10 recall rate, and latency under 1 millisecond. The results showcase Starling's superior performance, exhibiting 43.9$\times$ higher throughput with 98% lower query latency compared to state-of-the-art methods while maintaining the same level of accuracy. △ Less

Submitted 2 March, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: This paper has been accepted by SIGMOD 2024

arXiv:2312.11523 [pdf, other]

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer

Authors: Xinpeng Wang, Xiaoyuan Yi, Han Jiang, Shanlin Zhou, Zhihua Wei, Xing Xie

Abstract: Warning: this paper includes model outputs showing offensive content. Recent large-scale Visual-Language Generative Models (VLGMs) have achieved unprecedented improvement in multimodal image/text generation. However, these models might also generate toxic content, e.g., offensive text and pornography images, raising significant ethical risks. Despite exhaustive studies on toxic degeneration of lan… ▽ More Warning: this paper includes model outputs showing offensive content. Recent large-scale Visual-Language Generative Models (VLGMs) have achieved unprecedented improvement in multimodal image/text generation. However, these models might also generate toxic content, e.g., offensive text and pornography images, raising significant ethical risks. Despite exhaustive studies on toxic degeneration of language models, this problem remains largely unexplored within the context of visual-language generation. This work delves into the propensity for toxicity generation and susceptibility to toxic data across various VLGMs. For this purpose, we built ToViLaG, a dataset comprising 32K co-toxic/mono-toxic text-image pairs and 1K innocuous but evocative text that tends to stimulate toxicity. Furthermore, we propose WInToRe, a novel toxicity metric tailored to visual-language generation, which theoretically reflects different aspects of toxicity considering both input and output. On such a basis, we benchmarked the toxicity of a diverse spectrum of VLGMs and discovered that some models do more evil than expected while some are more vulnerable to infection, underscoring the necessity of VLGMs detoxification. Therefore, we develop an innovative bottleneck-based detoxification method. Our method could reduce toxicity while maintaining comparable generation quality, providing a promising initial solution to this line of research. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted by EMNLP 2023 (Main Conference), Oral Presentation

arXiv:2312.10674 [pdf]

A Framework of Full-Process Generation Design for Park Green Spaces Based on Remote Sensing Segmentation-GAN-Diffusion

Authors: Ran Chen, Xingjian Yi, **g Zhao, Yueheng He, Bainian Chen, Xueqi Yao, Fangjun Liu, Haoran Li, Zeke Lian

Abstract: The development of generative design driven by artificial intelligence algorithms is speedy. There are two research gaps in the current research: 1) Most studies only focus on the relationship between design elements and pay little attention to the external information of the site; 2) GAN and other traditional generative algorithms generate results with low resolution and insufficient details. To… ▽ More The development of generative design driven by artificial intelligence algorithms is speedy. There are two research gaps in the current research: 1) Most studies only focus on the relationship between design elements and pay little attention to the external information of the site; 2) GAN and other traditional generative algorithms generate results with low resolution and insufficient details. To address these two problems, we integrate GAN, Stable diffusion multimodal large-scale image pre-training model to construct a full-process park generative design method: 1) First, construct a high-precision remote sensing object extraction system for automated extraction of urban environmental information; 2) Secondly, use GAN to construct a park design generation system based on the external environment, which can quickly infer and generate design schemes from urban environmental information; 3) Finally, introduce Stable Diffusion to optimize the design plan, fill in details, and expand the resolution of the plan by 64 times. This method can achieve a fully unmanned design automation workflow. The research results show that: 1) The relationship between the inside and outside of the site will affect the algorithm generation results. 2) Compared with traditional GAN algorithms, Stable diffusion significantly improve the information richness of the generated results. △ Less

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.09041 [pdf, other]

doi 10.1145/3543507.3583324

Graph Neural Networks with Diverse Spectral Filtering

Authors: **gwei Guo, Kaizhu Huang, ** Yi, Rui Zhang

Abstract: Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph machine learning, with polynomial filters applied for graph convolutions, where all nodes share the identical filter weights to mine their local contexts. Despite the success, existing spectral GNNs usually fail to deal with complex networks (e.g., WWW) due to such homogeneous spectral filtering setting that ignores th… ▽ More Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph machine learning, with polynomial filters applied for graph convolutions, where all nodes share the identical filter weights to mine their local contexts. Despite the success, existing spectral GNNs usually fail to deal with complex networks (e.g., WWW) due to such homogeneous spectral filtering setting that ignores the regional heterogeneity as typically seen in real-world networks. To tackle this issue, we propose a novel diverse spectral filtering (DSF) framework, which automatically learns node-specific filter weights to exploit the varying local structure properly. Particularly, the diverse filter weights consist of two components -- A global one shared among all nodes, and a local one that varies along network edges to reflect node difference arising from distinct graph parts -- to balance between local and global information. As such, not only can the global graph characteristics be captured, but also the diverse local patterns can be mined with awareness of different node positions. Interestingly, we formulate a novel optimization problem to assist in learning diverse filters, which also enables us to enhance any spectral GNNs with our DSF framework. We showcase the proposed framework on three state-of-the-arts including GPR-GNN, BernNet, and JacobiConv. Extensive experiments over 10 benchmark datasets demonstrate that our framework can consistently boost model performance by up to 4.92% in node classification tasks, producing diverse filters with enhanced interpretability. Code is available at \url{https://github.com/**gweio/DSF}. △ Less

Submitted 23 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted by Proceedings of the ACM Web Conference 2023 (WWW '23)

Journal ref: Proceedings of the ACM Web Conference 2023

arXiv:2311.16421 [pdf, other]

CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models

Authors: Yuhang Wang, Yanxu Zhu, Chao Kong, Shuyu Wei, Xiaoyuan Yi, Xing Xie, Jitao Sang

Abstract: As the scaling of Large Language Models (LLMs) has dramatically enhanced their capabilities, there has been a growing focus on the alignment problem to ensure their responsible and ethical use. While existing alignment efforts predominantly concentrate on universal values such as the HHH principle, the aspect of culture, which is inherently pluralistic and diverse, has not received adequate attent… ▽ More As the scaling of Large Language Models (LLMs) has dramatically enhanced their capabilities, there has been a growing focus on the alignment problem to ensure their responsible and ethical use. While existing alignment efforts predominantly concentrate on universal values such as the HHH principle, the aspect of culture, which is inherently pluralistic and diverse, has not received adequate attention. This work introduces a new benchmark, CDEval, aimed at evaluating the cultural dimensions of LLMs. CDEval is constructed by incorporating both GPT-4's automated generation and human verification, covering six cultural dimensions across seven domains. Our comprehensive experiments provide intriguing insights into the culture of mainstream LLMs, highlighting both consistencies and variations across different dimensions and domains. The findings underscore the importance of integrating cultural considerations in LLM development, particularly for applications in diverse cultural settings. Through CDEval, we aim to broaden the horizon of LLM alignment research by including cultural dimensions, thus providing a more holistic framework for the future development and evaluation of LLMs. This benchmark serves as a valuable resource for cultural studies in LLMs, paving the way for more culturally aware and sensitive models. △ Less

Submitted 20 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted by the Cross-Cultural Considerations in NLP Workshop @ ACL 2024

arXiv:2311.10779 [pdf, other]

Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations

Authors: **g Yao, Wei Xu, Jianxun Lian, Xiting Wang, Xiaoyuan Yi, Xing Xie

Abstract: The significant progress of large language models (LLMs) provides a promising opportunity to build human-like systems for various practical applications. However, when applied to specific task domains, an LLM pre-trained on a general-purpose corpus may exhibit a deficit or inadequacy in two types of domain-specific knowledge. One is a comprehensive set of domain data that is typically large-scale… ▽ More The significant progress of large language models (LLMs) provides a promising opportunity to build human-like systems for various practical applications. However, when applied to specific task domains, an LLM pre-trained on a general-purpose corpus may exhibit a deficit or inadequacy in two types of domain-specific knowledge. One is a comprehensive set of domain data that is typically large-scale and continuously evolving. The other is specific working patterns of this domain reflected in the data. The absence or inadequacy of such knowledge impacts the performance of the LLM. In this paper, we propose a general paradigm that augments LLMs with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE. This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way. Then, the extracted knowledge is incorporated through prompts, without any computational cost of model fine-tuning. We instantiate the general paradigm on a widespread application, i.e. recommender systems, where critical item attributes and collaborative filtering signals are incorporated. Experimental results demonstrate that DOKE can substantially improve the performance of LLMs in specific domains. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.10766 [pdf, other]

Value FULCRA: Map** Large Language Models to the Multidimensional Spectrum of Basic Human Values

Authors: **g Yao, Xiaoyuan Yi, Xiting Wang, Yifan Gong, Xing Xie

Abstract: The rapid advancement of Large Language Models (LLMs) has attracted much attention to value alignment for their responsible development. However, how to define values in this context remains a largely unexplored question. Existing work mainly follows the Helpful, Honest, Harmless principle and specifies values as risk criteria formulated in the AI community, e.g., fairness and privacy protection,… ▽ More The rapid advancement of Large Language Models (LLMs) has attracted much attention to value alignment for their responsible development. However, how to define values in this context remains a largely unexplored question. Existing work mainly follows the Helpful, Honest, Harmless principle and specifies values as risk criteria formulated in the AI community, e.g., fairness and privacy protection, suffering from poor clarity, adaptability and transparency. Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and introduces a value space spanned by basic value dimensions. All LLMs' behaviors can be mapped into the space by identifying the underlying values, possessing the potential to address the three challenges. To foster future research, we apply the representative Schwartz's Theory of Basic Values as an initialized example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs. Our extensive analysis of FULCRA reveals the underlying relation between basic values and LLMs' behaviors, demonstrating that our approach not only covers existing mainstream risks but also anticipates possibly unidentified ones. Additionally, we present an initial implementation of the basic value evaluation and alignment, paving the way for future research in this line. △ Less

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.05126 [pdf, other]

Exploring and Analyzing the Effect of Avatar's Realism on Anxiety of English as Second Language (ESL) Speakers

Authors: Tianqi Liu, Joshua Rafael Sanchez, Yuntao Wang, Xin Yi, Yuanchun Shi

Abstract: The emergence of virtual avatars provides innovative opportunities for remote conferencing, education, and more. Our study investigates how the realism of avatars, used by native English speakers, impacts the anxiety levels of English as a Second Language (ESL) speakers during interactions. ESL participants engaged in conversations with native English speakers represented through cartoonish avatar… ▽ More The emergence of virtual avatars provides innovative opportunities for remote conferencing, education, and more. Our study investigates how the realism of avatars, used by native English speakers, impacts the anxiety levels of English as a Second Language (ESL) speakers during interactions. ESL participants engaged in conversations with native English speakers represented through cartoonish avatars, realistic-like avatars, or actual video streams. We measured both the ESL speakers' self-reported anxiety and their physiological indicators of anxiety. Our findings show that interactions with native speakers using cartoonish avatars or direct video lead to reduced anxiety levels among ESL participants. However, interactions with avatars that closely resemble humans heightened these anxieties. These insights are critically important for the design and application of virtual avatars, especially in addressing cross-cultural communication barriers and enhancing user experience. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: 13 pages, 7 figures, 8 tables

arXiv:2311.01957 [pdf, ps, other]

Distributed online constrained convex optimization with event-triggered communication

Authors: Kunpeng Zhang, Xinlei Yi, Yuzhe Li, Ming Cao, Tianyou Chai, Tao Yang

Abstract: This paper focuses on the distributed online convex optimization problem with time-varying inequality constraints over a network of agents, where each agent collaborates with its neighboring agents to minimize the cumulative network-wide loss over time. To reduce communication overhead between the agents, we propose a distributed event-triggered online primal-dual algorithm over a time-varying dir… ▽ More This paper focuses on the distributed online convex optimization problem with time-varying inequality constraints over a network of agents, where each agent collaborates with its neighboring agents to minimize the cumulative network-wide loss over time. To reduce communication overhead between the agents, we propose a distributed event-triggered online primal-dual algorithm over a time-varying directed graph. With several classes of appropriately chose decreasing parameter sequences and non-increasing event-triggered threshold sequences, we establish dynamic network regret and network cumulative constraint violation bounds. Finally, a numerical simulation example is provided to verify the theoretical results. △ Less

Submitted 2 May, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 12 pages, 3 figures

arXiv:2311.01920 [pdf, other]

ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language

Authors: Yuan Tian, Weiwei Cui, Dazhen Deng, Xin**g Yi, Yurun Yang, Haidong Zhang, Yingcai Wu

Abstract: The use of natural language interfaces (NLIs) for the creation of charts is becoming increasingly popular due to the intuitiveness of natural language interactions. One key challenge in this approach is to accurately capture user intents and transform them to proper chart specifications. This obstructs the wide use of NLI in chart generation, as users' natural language inputs are generally abstrac… ▽ More The use of natural language interfaces (NLIs) for the creation of charts is becoming increasingly popular due to the intuitiveness of natural language interactions. One key challenge in this approach is to accurately capture user intents and transform them to proper chart specifications. This obstructs the wide use of NLI in chart generation, as users' natural language inputs are generally abstract (i.e., ambiguous or under-specified), without a clear specification of visual encodings. Recently, pre-trained large language models (LLMs) have exhibited superior performance in understanding and generating natural language, demonstrating great potential for downstream tasks. Inspired by this major trend, we propose ChartGPT, generating charts from abstract natural language inputs. However, LLMs are struggling to address complex logic problems. To enable the model to accurately specify the complex parameters and perform operations in chart generation, we decompose the generation process into a step-by-step reasoning pipeline, so that the model only needs to reason a single and specific sub-task during each run. Moreover, LLMs are pre-trained on general datasets, which might be biased for the task of chart generation. To provide adequate visualization knowledge, we create a dataset consisting of abstract utterances and charts and improve model performance through fine-tuning. We further design an interactive interface for ChartGPT that allows users to check and modify the intermediate outputs of each step. The effectiveness of the proposed system is evaluated through quantitative evaluations and a user study. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.17551 [pdf, other]

Unpacking the Ethical Value Alignment in Big Models

Authors: Xiaoyuan Yi, **g Yao, Xiting Wang, Xing Xie

Abstract: Big models have greatly advanced AI's ability to understand, generate, and manipulate information and content, enabling numerous applications. However, as these models become increasingly integrated into everyday life, their inherent ethical values and potential biases pose unforeseen risks to society. This paper provides an overview of the risks and challenges associated with big models, surveys… ▽ More Big models have greatly advanced AI's ability to understand, generate, and manipulate information and content, enabling numerous applications. However, as these models become increasingly integrated into everyday life, their inherent ethical values and potential biases pose unforeseen risks to society. This paper provides an overview of the risks and challenges associated with big models, surveys existing AI ethics guidelines, and examines the ethical implications arising from the limitations of these models. Taking a normative ethics perspective, we propose a reassessment of recent normative guidelines, highlighting the importance of collaborative efforts in academia to establish a unified and universal AI ethics framework. Furthermore, we investigate the moral inclinations of current mainstream LLMs using the Moral Foundation theory, analyze existing alignment algorithms, and outline the unique challenges encountered in aligning ethical values within them. To address these challenges, we introduce a novel conceptual paradigm for aligning the ethical values of big models and discuss promising research directions for alignment criteria, evaluation, and method, representing an initial step towards the interdisciplinary construction of the ethically aligned AI This paper is a modified English version of our Chinese paper https://crad.ict.ac.cn/cn/article/doi/10.7544/issn1000-1239.202330553, intended to help non-Chinese native speakers better understand our work. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.11053 [pdf, other]

Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning

Authors: Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu

Abstract: Large Language Models (LLMs) have made unprecedented breakthroughs, yet their increasing integration into everyday life might raise societal risks due to generated unethical content. Despite extensive study on specific issues like bias, the intrinsic values of LLMs remain largely unexplored from a moral philosophy perspective. This work delves into ethical values utilizing Moral Foundation Theory.… ▽ More Large Language Models (LLMs) have made unprecedented breakthroughs, yet their increasing integration into everyday life might raise societal risks due to generated unethical content. Despite extensive study on specific issues like bias, the intrinsic values of LLMs remain largely unexplored from a moral philosophy perspective. This work delves into ethical values utilizing Moral Foundation Theory. Moving beyond conventional discriminative evaluations with poor reliability, we propose DeNEVIL, a novel prompt generation algorithm tailored to dynamically exploit LLMs' value vulnerabilities and elicit the violation of ethics in a generative manner, revealing their underlying value inclinations. On such a basis, we construct MoralPrompt, a high-quality dataset comprising 2,397 prompts covering 500+ value principles, and then benchmark the intrinsic values across a spectrum of LLMs. We discovered that most models are essentially misaligned, necessitating further ethical value alignment. In response, we develop VILMO, an in-context alignment method that substantially enhances the value compliance of LLM outputs by learning to generate appropriate value instructions, outperforming existing competitors. Our methods are suitable for black-box and open-source models, offering a promising initial step in studying the ethical values of LLMs. △ Less

Submitted 4 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted by ICLR 2024

arXiv:2310.02027 [pdf, other]

DeepHGCN: Recipe for Efficient and Scalable Deep Hyperbolic Graph Convolutional Networks

Authors: Jiaxu Liu, ** Yi, Xiaowei Huang

Abstract: Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, develo** a hyperbolic therapy… ▽ More Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, develo** a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) techniques such as hyperbolic residual connections and regularization for both weights and features facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants. △ Less

Submitted 28 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: 14 pages including appendix and reference

arXiv:2309.09276 [pdf, other]

MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification

Authors: Junjie Zhu, Yiying Li, Chun** Qiu, Ke Yang, Naiyang Guan, Xiaodong Yi

Abstract: Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. Recently, a work called PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in t… ▽ More Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. Recently, a work called PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in the remote sensing domain. In order to tackle these issues, we turn to the recently proposed parameter-efficient tuning methods, such as VPT, which updates only the newly added prompt parameters while kee** the pre-trained backbone frozen. Inspired by VPT, we propose the Meta Visual Prompt Tuning (MVP) method. Specifically, we integrate the VPT method into the meta-learning framework and tailor it to the remote sensing domain, resulting in an efficient framework for Few-Shot Remote Sensing Scene Classification (FS-RSSC). Furthermore, we introduce a novel data augmentation strategy based on patch embedding recombination to enhance the representation and diversity of scenes for classification purposes. Experiment results on the FS-RSSC benchmark demonstrate the superior performance of the proposed MVP over existing methods in various settings, such as various-way-various-shot, various-way-one-shot, and cross-domain adaptation. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: SUBMIT TO IEEE TRANSACTIONS

arXiv:2309.04885 [pdf, other]

Symplectic Structure-Aware Hamiltonian (Graph) Embeddings

Authors: Jiaxu Liu, ** Yi, Tianle Zhang, Xiaowei Huang

Abstract: In traditional Graph Neural Networks (GNNs), the assumption of a fixed embedding manifold often limits their adaptability to diverse graph geometries. Recently, Hamiltonian system-inspired GNNs have been proposed to address the dynamic nature of such embeddings by incorporating physical laws into node feature updates. We present Symplectic Structure-Aware Hamiltonian GNN (SAH-GNN), a novel approac… ▽ More In traditional Graph Neural Networks (GNNs), the assumption of a fixed embedding manifold often limits their adaptability to diverse graph geometries. Recently, Hamiltonian system-inspired GNNs have been proposed to address the dynamic nature of such embeddings by incorporating physical laws into node feature updates. We present Symplectic Structure-Aware Hamiltonian GNN (SAH-GNN), a novel approach that generalizes Hamiltonian dynamics for more flexible node feature updates. Unlike existing Hamiltonian approaches, SAH-GNN employs Riemannian optimization on the symplectic Stiefel manifold to adaptively learn the underlying symplectic structure, circumventing the limitations of existing Hamiltonian GNNs that rely on a pre-defined form of standard symplectic structure. This innovation allows SAH-GNN to automatically adapt to various graph datasets without extensive hyperparameter tuning. Moreover, it conserves energy during training meaning the implicit Hamiltonian system is physically meaningful. Finally, we empirically validate SAH-GNN's superiority and adaptability in node classification tasks across multiple types of graph datasets. △ Less

Submitted 1 December, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: 5 pages main content with 5 pages appendix

arXiv:2309.00310 [pdf, other]

Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture

Authors: Shaohua Pan, Qi Ma, Xinyu Yi, Weifeng Hu, Xiong Wang, Xingkang Zhou, Jijunnan Li, Feng Xu

Abstract: Either RGB images or inertial signals have been used for the task of motion capture (mocap), but combining them together is a new and interesting topic. We believe that the combination is complementary and able to solve the inherent difficulties of using one modality input, including occlusions, extreme lighting/texture, and out-of-view for visual mocap and global drifts for inertial mocap. To thi… ▽ More Either RGB images or inertial signals have been used for the task of motion capture (mocap), but combining them together is a new and interesting topic. We believe that the combination is complementary and able to solve the inherent difficulties of using one modality input, including occlusions, extreme lighting/texture, and out-of-view for visual mocap and global drifts for inertial mocap. To this end, we propose a method that fuses monocular images and sparse IMUs for real-time human motion capture. Our method contains a dual coordinate strategy to fully explore the IMU signals with different goals in motion capture. To be specific, besides one branch transforming the IMU signals to the camera coordinate system to combine with the image information, there is another branch to learn from the IMU signals in the body root coordinate system to better estimate body poses. Furthermore, a hidden state feedback mechanism is proposed for both two branches to compensate for their own drawbacks in extreme input cases. Thus our method can easily switch between the two kinds of signals or combine them in different cases to achieve a robust mocap. %The two divided parts can help each other for better mocap results under different conditions. Quantitative and qualitative results demonstrate that by delicately designing the fusion method, our technique significantly outperforms the state-of-the-art vision, IMU, and combined methods on both global orientation and local pose estimation. Our codes are available for research at https://shaohua-pan.github.io/robustcap-page/. △ Less

Submitted 1 September, 2023; originally announced September 2023.

Comments: Accepted by SIGGRAPH ASIA 2023. Project page: https://shaohua-pan.github.io/robustcap-page/

Showing 1–50 of 220 results for author: Yi, X