Showing 1–50 of 439 results for author: Ye, Y

Searching in archive cs.
  1. arXiv:2406.18572  [pdf, other]

    cs.CV cs.LG

    GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

    Authors: Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng

    Abstract: This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  2. arXiv:2406.18051  [pdf, other]

    cs.CV

    ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

    Authors: Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun

    Abstract: Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose significant challenges for deployment in resource-constrained environments. This paper introduces ViT-1.58b, a novel 1.58-bit quantized ViT model designed to dr…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17642  [pdf, other]

    cs.CL cs.AI

    Banishing LLM Hallucinations Requires Rethinking Generalization

    Authors: Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

    Abstract: Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional a…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16793  [pdf, other]

    cs.LG cs.AI

    Adam-mini: Use Fewer Learning Rates To Gain More

    Authors: Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun

    Abstract: We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle…

    Submitted 1 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.15796  [pdf, other]

    cs.CL

    Rethinking Entity-level Unlearning for Large Language Models

    Authors: Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin

    Abstract: Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Work in progress

  6. arXiv:2406.12753  [pdf, other]

    cs.CL cs.AI

    OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

    Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, et al. (3 additional authors not shown)

    Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoni…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 44 pages

  7. arXiv:2406.12433  [pdf, other]

    cs.IR

    LLM-enhanced Reranking in Recommender Systems

    Authors: Jingtong Gao, Bo Chen, Xiangyu Zhao, Weiwen Liu, Xiangyang Li, Yichao Wang, Zijian Zhang, Wanyu Wang, Yuyang Ye, Shanru Lin, Huifeng Guo, Ruiming Tang

    Abstract: Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms. Traditional reranking models have focused predominantly on accuracy, but modern applications demand consideration of additional criteria such as diversity and fairness. Existing reranking approaches often fail to harmonize these diverse criteria effectively at th…

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.09905  [pdf, other]

    cs.CV cs.GR

    Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

    Authors: Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

    Abstract: We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" dev…

    Submitted 14 June, 2024; originally announced June 2024.

  9. arXiv:2406.01326  [pdf, other]

    cs.CV

    TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

    Authors: Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Houqiang Li, Can Huang

    Abstract: Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy me…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 20 pages, 8 figures

  10. arXiv:2405.14869  [pdf, other]

    cs.CV cs.AI cs.GR

    PuzzleAvatar: Assembling 3D Avatars from Personal Albums

    Authors: Yuliang Xiu, Yufei Ye, Zhen Liu, Dimitrios Tzionas, Michael J. Black

    Abstract: Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters, struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar i…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: video: https://youtu.be/0hpXH2tVPk4

  11. arXiv:2405.14206  [pdf, other]

    cs.CV

    LG-VQ: Language-Guided Codebook Learning

    Authors: Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, Linfeng Luo

    Abstract: Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (e.g., image), resulting in suboptimal per…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: None

  12. arXiv:2405.11985  [pdf, other]

    cs.CV

    MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

    Authors: Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang

    Abstract: Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding. Nonetheless, most existing TEC-VQA benchmarks have focused on high-resource languages like English and Chinese. Despite pioneering wo…

    Submitted 11 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  13. arXiv:2405.06646  [pdf, other]

    cs.GR cs.CV

    On-the-fly Learning to Transfer Motion Style with Diffusion Models: A Semantic Guidance Approach

    Authors: Lei Hu, Zihao Zhang, Yongjing Ye, Yiwen Xu, Shihong Xia

    Abstract: In recent years, the emergence of generative models has spurred development of human motion generation, among which the generation of stylized human motion has consistently been a focal point of research. The conventional approach for stylized human motion generation involves transferring the style from given style examples to new motions. Despite decades of research in human motion style transfer…

    Submitted 20 March, 2024; originally announced May 2024.

    Comments: 23 pages

    MSC Class: 68U05 ACM Class: I.3.0

  14. arXiv:2405.06468  [pdf, other]

    cs.CV cs.CL

    Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification

    Authors: Yaoqin Ye, Junjie Zhang, Hongwei Shi

    Abstract: The task of medical image recognition is notably complicated by the presence of varied and multiple pathological indications, presenting a unique challenge in multi-label classification with unseen labels. This complexity underlines the need for computer-aided diagnosis methods employing multi-label zero-shot learning. Recent advancements in pre-trained vision-language models (VLMs) have showcased…

    Submitted 10 May, 2024; originally announced May 2024.

  15. arXiv:2405.04289  [pdf, ps, other]

    cs.NE

    Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

    Authors: Chenlin Zhou, Han Zhang, Liutao Yu, Yumin Ye, Zhaokun Zhou, Liwei Huang, Zhengyu Ma, Xiaopeng Fan, Huihui Zhou, Yonghong Tian

    Abstract: Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks (ANNs), in virtue of their high biological plausibility, rich spatial-temporal dynamics, and event-driven computation. The direct training algorithms based on the surrogate gradient method provide sufficient flexibility to design novel SNN architectures and explore the spatial-temporal dynam…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 29 pages

  16. arXiv:2405.01510  [pdf, other]

    cs.SI cs.DB

    Reverse Influential Community Search Over Social Networks (Technical Report)

    Authors: Qi Wen, Nan Zhang, Yutong Ye, Xiang Lian, Mingsong Chen

    Abstract: As an important fundamental task of numerous real-world applications such as social network analysis and online advertising/marketing, several prior works studied influential community search, which retrieves a community with high structural cohesiveness and maximum influences on other users in social networks. However, previous works usually considered the influences of the community on arbitrary…

    Submitted 7 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  17. arXiv:2404.18144  [pdf, other]

    cs.LG cs.AI cs.HC

    Generative AI for Visualization: State of the Art and Future Directions

    Authors: Yilin Ye, Jianing Hao, Yihan Hou, Zhan Wang, Shishi Xiao, Yuyu Luo, Wei Zeng

    Abstract: Generative AI (GenAI) has witnessed remarkable progress in recent years and demonstrated impressive performance in various generation tasks in different domains such as computer vision and computational design. Many researchers have attempted to integrate GenAI into visualization framework, leveraging the superior generative capacity for different operations. Concurrently, recent major breakthroug…

    Submitted 28 April, 2024; originally announced April 2024.

  18. arXiv:2404.17186  [pdf, other]

    cs.CV cs.AI cs.LG

    MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information

    Authors: Jiajun Liang, Baoquan Zhang, Yunming Ye, Xutao Li, Chuyao Luo, Xukai Fu

    Abstract: The accurate detection of Mesoscale Convective Systems (MCS) is crucial for meteorological monitoring due to their potential to cause significant destruction through severe weather phenomena such as hail, thunderstorms, and heavy rainfall. However, the existing methods for MCS detection mostly targets on single-frame detection, which just considers the static characteristics and ignores the tempor…

    Submitted 26 April, 2024; originally announced April 2024.

  19. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  20. arXiv:2404.16336  [pdf, other]

    cs.LG cs.CV

    FedStyle: Style-Based Federated Learning Crowdsourcing Framework for Art Commissions

    Authors: Changjuan Ran, Yeting Guo, Fang Liu, Shenglan Cui, Yunfan Ye

    Abstract: The unique artistic style is crucial to artists' occupational competitiveness, yet prevailing Art Commission Platforms rarely support style-based retrieval. Meanwhile, the fast-growing generative AI techniques aggravate artists' concerns about releasing personal artworks to public platforms. To achieve artistic style-based retrieval without exposing personal artworks, we propose FedStyle, a style-…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to ICME 2024

  21. arXiv:2404.15675  [pdf, other]

    cs.IR

    Hi-Gen: Generative Retrieval For Large-Scale Personalized E-commerce Search

    Authors: Yanjing Wu, Yinfu Feng, Jian Wang, Wenji Zhou, Yunan Ye, Rong Xiao

    Abstract: Leveraging generative retrieval (GR) techniques to enhance search systems is an emerging methodology that has shown promising results in recent years. In GR, a text-to-text model maps string queries directly to relevant document identifiers (docIDs), so it dramatically simplifies the whole retrieval process. However, when applying most GR models in large-scale E-commerce for personalized item sear…

    Submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.15380  [pdf, other]

    cs.LG cs.AI

    ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model

    Authors: Yuanshao Zhu, James Jianqiao Yu, Xiangyu Zhao, Qidong Liu, Yongchao Ye, Wei Chen, Zijian Zhang, Xuetao Wei, Yuxuan Liang

    Abstract: Generating trajectory data is among promising solutions to addressing privacy concerns, collection costs, and proprietary restrictions usually associated with human mobility analyses. However, existing trajectory generation methods are still in their infancy due to the inherent diversity and unpredictability of human activities, grappling with issues such as fidelity, flexibility, and generalizabi…

    Submitted 23 April, 2024; originally announced April 2024.

  23. arXiv:2404.12925  [pdf, other]

    cs.CV

    A Hybrid Generative and Discriminative PointNet on Unordered Point Sets

    Authors: Yang Ye, Shihao Ji

    Abstract: As point cloud provides a natural and flexible representation usable in myriad applications (e.g., robotics and self-driving cars), the ability to synthesize point clouds for analysis becomes crucial. Recently, Xie et al. propose a generative model for unordered point sets in the form of an energy-based model (EBM). Despite the model achieving an impressive performance for point cloud generation,…

    Submitted 19 April, 2024; originally announced April 2024.

  24. arXiv:2404.12383  [pdf, ps, other]

    cs.CV

    G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

    Authors: Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani

    Abstract: We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field f…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: accepted to CVPR2024; project page at https://judyye.github.io/ghop-www

  25. arXiv:2404.10512  [pdf]

    cs.LG

    Four-hour thunderstorm nowcasting using deep diffusion models of satellite

    Authors: Kuai Dai, Xutao Li, Junying Fang, Yunming Ye, Demin Yu, Di Xian, Danyu Qin

    Abstract: Convection (thunderstorm) develops rapidly within hours and is highly destructive, posing a significant challenge for nowcasting and resulting in substantial losses to nature and society. After the emergence of artificial intelligence (AI)-based methods, convection nowcasting has experienced rapid advancements, with its performance surpassing that of physics-based numerical weather prediction and…

    Submitted 16 April, 2024; originally announced April 2024.

  26. arXiv:2404.09150  [pdf, other]

    cs.RO cs.GR

    Learning Cross-hand Policies for High-DOF Reaching and Grasping

    Authors: Qijin She, Shishun Zhang, Yunfan Ye, Min Liu, Ruizhen Hu, Kai Xu

    Abstract: Reaching-and-grasping is a fundamental skill for robotic manipulation, but existing methods usually train models on a specific gripper and cannot be reused on another gripper without retraining. In this paper, we propose a novel method that can learn a unified policy model that can be easily transferred to different dexterous grippers. Our method consists of two stages: a gripper-agnostic policy m…

    Submitted 14 April, 2024; originally announced April 2024.

  27. arXiv:2404.08660  [pdf, other]

    cs.IR cs.LG

    How Does Message Passing Improve Collaborative Filtering?

    Authors: Mingxuan Ju, William Shiao, Zhichun Guo, Yanfang Ye, Yozen Liu, Neil Shah, Tong Zhao

    Abstract: Collaborative filtering (CF) has exhibited prominent results for recommender systems and been broadly utilized for real-world applications. A branch of research enhances CF methods by message passing used in graph neural networks, due to its strong capabilities of extracting knowledge from graph-structured data, like user-item bipartite graphs that naturally exist in CF. They assume that message p…

    Submitted 27 March, 2024; originally announced April 2024.

  28. arXiv:2404.07962  [pdf, other]

    cs.CV cs.LG

    Live and Learn: Continual Action Clustering with Incremental Views

    Authors: Xiaoqiang Yan, Yingtao Gan, Yiqiao Mao, Yangdong Ye, Hui Yu

    Abstract: Multi-view action clustering leverages the complementary information from different camera views to enhance the clustering performance. Although existing approaches have achieved significant progress, they assume all camera views are available in advance, which is impractical when the camera view is incremental over time. Besides, learning the invariant information among multiple camera views is s…

    Submitted 22 March, 2024; originally announced April 2024.

  29. arXiv:2404.03080  [pdf, other]

    cs.CL cs.AI

    Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

    Authors: Yanpeng Ye, Jie Ren, Shaozhou Wang, Yuwei Wan, Haofen Wang, Imran Razzak, Tong Xie, Wenjie Zhang

    Abstract: Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges for efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. Addressing these challenges, the integration of artificial intelligence with materials science…

    Submitted 4 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, 3 tables

  30. arXiv:2404.00929  [pdf, other]

    cs.CL cs.AI

    A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

    Authors: Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu

    Abstract: Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inh…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  31. arXiv:2404.00838  [pdf, other]

    cs.CV

    3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching

    Authors: Yibin Ye, Xichao Teng, Shuo Chen, Yijie Bian, Tao Tan, Zhang Li

    Abstract: Optical-SAR image matching is a fundamental task for image fusion and visual navigation. However, all large-scale open SAR dataset for methods development are collected from single platform, resulting in limited satellite types and spatial resolutions. Since images captured by different sensors vary significantly in both geometric and radiometric appearance, existing methods may fail to match corr…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 20pages 17 figures

  32. arXiv:2403.19098  [pdf, other]

    cs.CV

    GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving

    Authors: Yunpeng Zhang, Deheng Qian, Ding Li, Yifeng Pan, Yong Chen, Zhenbao Liang, Zhiyao Zhang, Shurui Zhang, Hongxu Li, Maolei Fu, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du

    Abstract: Modeling complicated interactions among the ego-vehicle, road agents, and map elements has been a crucial part for safety-critical autonomous driving. Previous works on end-to-end autonomous driving rely on the attention mechanism for handling heterogeneous interactions, which fails to capture the geometric priors and is also computationally intensive. In this paper, we propose the Interaction Sce…

    Submitted 6 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: project page: https://github.com/zhangyp15/GraphAD

  33. arXiv:2403.18393  [pdf, other]

    cs.LG

    Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering

    Authors: Long Shi, Lei Cao, Yunshan Ye, Yu Zhao, Badong Chen

    Abstract: In the context of multi-view clustering, graph learning is recognized as a crucial technique, which generally involves constructing an adaptive neighbor graph based on probabilistic neighbors, and then learning a consensus graph to for clustering. However, they are confronted with two limitations. Firstly, they often rely on Euclidean distance to measure similarity when constructing the adaptive n…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  34. arXiv:2403.18202  [pdf, other]

    cs.SE

    TGMM: Combining Parse Tree with GPU for Scalable Multilingual and Multi-Granularity Code Clone Detection

    Authors: Yuhang Ye, Yuekun Wang, Yinxing Xue, Yueming Wu, Yang Liu

    Abstract: The rapid evolution of programming languages and software systems has necessitated the implementation of multilingual and scalable clone detection tools. However, it is difficult to achieve the above requirements at the same time. Most existing tools only focus on one challenge. In this work, we propose TGMM, a tree and GPU-based tool for multilingual and multi-granularity code clone detection. By…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 14 pages, 7 figures

  35. arXiv:2403.16558  [pdf, other]

    cs.CV

    Elysium: Exploring Object-level Perception in Videos via MLLM

    Authors: Han Wang, Yanjie Wang, Yongjie Ye, Yuxiang Nie, Can Huang

    Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied. This lack of exploration is primarily due to two key challenges. Firstly, extensive pretraining on large-scale video datasets is required to equip MLLMs with the capability to perceive objects acr…

    Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  36. arXiv:2403.16477  [pdf, other]

    cs.IT eess.SP

    Safeguarding Next Generation Multiple Access Using Physical Layer Security Techniques: A Tutorial

    Authors: Lu Lv, Dongyang Xu, Rose Qingyang Hu, Yinghui Ye, Long Yang, Xianfu Lei, Xianbin Wang, Dong In Kim, Arumugam Nallanathan

    Abstract: Driven by the ever-increasing requirements of ultra-high spectral efficiency, ultra-low latency, and massive connectivity, the forefront of wireless research calls for the design of advanced next generation multiple access schemes to facilitate provisioning of these stringent demands. This inspires the embrace of non-orthogonal multiple access (NOMA) in future wireless communication networks. Neve…

    Submitted 21 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Invited paper by Proceedings of the IEEE

  37. arXiv:2403.15834  [pdf, other]

    cs.RO cs.AI

    ARO: Large Language Model Supervised Robotics Text2Skill Autonomous Learning

    Authors: Yiwen Chen, Yuyao Ye, Ziyi Chen, Chuheng Zhang, Marcelo H. Ang

    Abstract: Robotics learning highly relies on human expertise and efforts, such as demonstrations, design of reward functions in reinforcement learning, performance evaluation using human feedback, etc. However, reliance on human assistance can lead to expensive learning costs and make skill learning difficult to scale. In this work, we introduce the Large Language Model Supervised Robotics Text2Skill Autono…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 6 pages, 2 figures

  38. arXiv:2403.15681  [pdf, other]

    cs.IT cs.LG

    Differentiable Information Bottleneck for Deterministic Multi-view Clustering

    Authors: Xiaoqiang Yan, Zhixiang Jin, Fengshou Han, Yangdong Ye

    Abstract: In recent several years, the information bottleneck (IB) principle provides an information-theoretic framework for deep multi-view clustering (MVC) by compressing multi-view observations while preserving the relevant information of multiple views. Although existing IB-based deep MVC methods have achieved huge success, they rely on variational approximation and distribution assumption to estimate t…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures, cvpr 2024

  39. arXiv:2403.15433  [pdf, other]

    eess.SP cs.AI cs.LG eess.IV

    HyPer-EP: Meta-Learning Hybrid Personalized Models for Cardiac Electrophysiology

    Authors: Xiajun Jiang, Sumeet Vadhavkar, Yubo Ye, Maryam Toloubidokhti, Ryan Missel, Linwei Wang

    Abstract: Personalized virtual heart models have demonstrated increasing potential for clinical use, although the estimation of their parameters given patient-specific data remain a challenge. Traditional physics-based modeling approaches are computationally costly and often neglect the inherent structural errors in these models due to model simplifications and assumptions. Modern deep learning approaches,…

    Submitted 14 March, 2024; originally announced March 2024.

  40. arXiv:2403.13372  [pdf, other]

    cs.CL cs.AI

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

    Authors: Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, Yongqiang Ma

    Abstract: Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks. However, it requires non-trivial efforts to implement these methods on different models. We present LlamaFactory, a unified framework that integrates a suite of cutting-edge efficient training methods. It provides a solution for flexibly customizing the fine-tuning of 100+ LLMs without the need for coding…

    Submitted 27 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 13 pages, accepted to ACL 2024 System Demonstration Track

  41. arXiv:2403.13219  [pdf, other]

    cs.LG math.OC

    Diffusion Model for Data-Driven Black-Box Optimization

    Authors: Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang

    Abstract: Generative AI has redefined artificial intelligence, enabling the creation of innovative content and customized solutions that drive business practices into a new era of efficiency and creativity. In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables. Consider the practical scen…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2307.07055

  42. arXiv:2403.10071  [pdf, other]

    cs.CV

    Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

    Authors: Baoquan Zhang, Huaibin Wang, Luo Chuyao, Xutao Li, Liang Guotao, Yunming Ye, Xiaochen Qi, Yao He

    Abstract: Vector-Quantized Image Modeling (VQIM) is a fundamental research problem in image synthesis, which aims to represent an image with a discrete token sequence. Existing studies effectively address this problem by learning a discrete codebook from scratch and in a code-independent manner to quantize continuous representations into discrete tokens. However, learning a codebook from scratch and in a co…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  43. arXiv:2403.09732  [pdf, other]

    cs.CL cs.AI

    PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency

    Authors: Zhishuai Li, Xiang Wang, Jingjing Zhao, Sun Yang, Guoqing Du, Xiaoru Hu, Bin Zhang, Yuxiao Ye, Ziyue Li, Rui Zhao, Hangyu Mao

    Abstract: Recent advancements in Text-to-SQL (Text2SQL) emphasize stimulating the large language models (LLM) on in-context learning, achieving significant results. Nevertheless, they face challenges when dealing with verbose database information and complex user intentions. This paper presents a two-stage framework to enhance the performance of current LLM-based natural language to SQL systems. We first in…

    Submitted 1 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  44. arXiv:2403.08820  [pdf, other]

    cs.LG cs.AI cs.SI

    Diet-ODIN: A Novel Framework for Opioid Misuse Detection with Interpretable Dietary Patterns

    Authors: Zheyuan Zhang, Zehong Wang, Shifu Hou, Evan Hall, Landon Bachman, Vincent Galassi, Jasmine White, Nitesh V. Chawla, Chuxu Zhang, Yanfang Ye

    Abstract: The opioid crisis has been one of the most critical society concerns in the United States. Although the medication assisted treatment (MAT) is recognized as the most effective treatment for opioid misuse and addiction, the various side effects can trigger opioid relapse. In addition to MAT, the dietary nutrition intervention has been demonstrated its importance in opioid misuse prevention and reco…

    Submitted 21 February, 2024; originally announced March 2024.

  45. arXiv:2403.08194  [pdf, other]

    cs.LG stat.ML

    Unsupervised Learning of Hybrid Latent Dynamics: A Learn-to-Identify Framework

    Authors: Yubo Ye, Sumeet Vadhavkar, Xiajun Jiang, Ryan Missel, Huafeng Liu, Linwei Wang

    Abstract: Modern applications increasingly require unsupervised learning of latent dynamics from high-dimensional time-series. This presents a significant challenge of identifiability: many abstract latent representations may reconstruct observations, yet do they guarantee an adequate identification of the governing dynamics? This paper investigates this challenge from two angles: the use of physics inducti…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Under Review

  46. arXiv:2403.08193  [pdf, other]

    cs.LG cs.AR cs.ET

    Learning-driven Physically-aware Large-scale Circuit Gate Sizing

    Authors: Yuyang Ye, Peng Xu, Lizheng Ren, Tinghuan Chen, Hao Yan, Bei Yu, Longxing Shi

    Abstract: Gate sizing plays an important role in timing optimization after physical design. Existing machine learning-based gate sizing works cannot optimize timing on multiple timing paths simultaneously and neglect the physical constraint on layouts. They cause sub-optimal sizing solutions and low-efficiency issues when compared with commercial gate sizing tools. In this work, we propose a learning-driven…

    Submitted 12 March, 2024; originally announced March 2024.

  47. arXiv:2403.07942  [pdf, other]

    cs.CR cs.CV

    Attacking Transformers with Feature Diversity Adversarial Perturbation

    Authors: Chenxing Gao, Hang Zhou, Junqing Yu, YuTeng Ye, Jiale Cai, Junle Wang, Wei Yang

    Abstract: Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing ViT adversarial attackers rely on labels to calculate the gradient for perturbation, and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box…

    Submitted 9 March, 2024; originally announced March 2024.

  48. arXiv:2403.02951  [pdf, other]

    cs.CL cs.AI

    Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

    Authors: Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang, Chi Harold Liu, Rui Zhao, Ziyue Li, Hangyu Mao

    Abstract: Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL pr…

    Submitted 6 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 26 pages, 6 figures, 14 tables

  49. arXiv:2402.18099  [pdf, other]

    cs.CL cs.AI

    Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models

    Authors: Derong Xu, Ziheng Zhang, Zhihong Zhu, Zhenxi Lin, Qidong Liu, Xian Wu, Tong Xu, Wanyu Wang, Yuyang Ye, Xiangyu Zhao, Yefeng Zheng, Enhong Chen

    Abstract: Model editing aims to precisely alter the behaviors of large language models (LLMs) in relation to specific knowledge, while leaving unrelated knowledge intact. This approach has proven effective in addressing issues of hallucination and outdated information in LLMs. However, the potential of using model editing to modify knowledge in the medical field remains largely unexplored, even though resol…

    Submitted 4 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  50. arXiv:2402.16667  [pdf, other]

    cs.CL cs.AI

    RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation

    Authors: Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

    Abstract: Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging. However, their utilization in the domain of code documentation generation remains underexplored. To this end, we introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code docu…

    Submitted 26 February, 2024; originally announced February 2024.

    ACM Class: I.2.7; F.2.2