Showing 1–50 of 508 results for author: He, L

Searching in archive cs.
  1. arXiv:2407.00711  [pdf, other]

    cs.CE

    Beyond the Yield Barrier: Variational Importance Sampling Yield Analysis

    Authors: Yanfang Liu, Lei He, Wei W. Xing

    Abstract: Optimal mean shift vector (OMSV)-based importance sampling methods have long been prevalent in yield estimation and optimization as an industry standard. However, most OMSV-based methods are designed heuristically without a rigorous understanding of their limitations. To this end, we propose VIS, the first variational analysis framework for yield problems, enabling a systematic refinement for OMSV…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 2024 43rd ACM/IEEE International Conference on Computer-Aided Design (ICCAD)

    MSC Class: 68U07 ACM Class: J.6

  2. arXiv:2407.00091  [pdf, other]

    cs.IR cs.HC cs.LG

    Learning to Rank for Maps at Airbnb

    Authors: Malay Haldar, Hongwei Zhang, Kedar Bellare, Sherry Chen, Soumyadip Banerjee, Xiaotang Wang, Mustafa Abdool, Huiji Gao, Pavan Tapadia, Liwei He, Sanjeev Katariya

    Abstract: As a two-sided marketplace, Airbnb brings together hosts who own listings for rent with prospective guests from around the globe. Results from a guest's search for listings are displayed primarily through two interfaces: (1) as a list of rectangular cards that contain on them the listing image, price, rating, and other details, referred to as list-results (2) as oval pins on a map showing the list…

    Submitted 25 June, 2024; originally announced July 2024.

  3. arXiv:2407.00024  [pdf, other]

    cs.CV cs.AI cs.MM

    LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

    Authors: Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari

    Abstract: Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection con…

    Submitted 8 May, 2024; originally announced July 2024.

  4. arXiv:2406.18521  [pdf, other]

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  5. arXiv:2406.18051  [pdf, other]

    cs.CV

    ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

    Authors: Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun

    Abstract: Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose significant challenges for deployment in resource-constrained environments. This paper introduces ViT-1.58b, a novel 1.58-bit quantized ViT model designed to dr…

    Submitted 26 June, 2024; originally announced June 2024.

  6. arXiv:2406.16536  [pdf, other]

    cs.CL

    C-LLM: Learn to Check Chinese Spelling Errors Character by Character

    Authors: Kunting Li, Yong Hu, Liang He, Fandong Meng, Jie Zhou

    Abstract: Chinese Spell Checking (CSC) aims to detect and correct spelling errors in sentences. Although Large Language Models (LLMs) exhibit robust capabilities and are widely applied in various tasks, their performance on CSC is often unsatisfactory. We find that LLMs fail to meet the Chinese character-level constraints of the CSC task, namely equal length and phonetic similarity, leading to a performance…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.14598  [pdf, other]

    cs.AI

    SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

    Authors: Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

    Abstract: Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and are over-representing some fine-grained topics…

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.14526  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Fantastic Copyrighted Beasts and How (Not) to Generate Them

    Authors: Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson

    Abstract: Recent studies show that image and video generation models can be prompted to reproduce copyrighted content from their training data, raising serious legal concerns around copyright infringement. Copyrighted characters, in particular, pose a difficult challenge for image generation services, with at least one lawsuit already awarding damages based on the generation of these characters. Yet, little…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.12548  [pdf, other]

    cs.CL

    P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts

    Authors: Yuhao Dan, Jie Zhou, Qin Chen, Junfeng Tian, Liang He

    Abstract: Personalized large language models (LLMs) have attracted great attention in many applications, such as intelligent education and emotional support. Most work focuses on controlling the character settings based on the profile (e.g., age, skill, experience, and so on). Conversely, the psychological theory-based personality traits with implicit expression and behavior are not well modeled, limiting t…

    Submitted 18 June, 2024; originally announced June 2024.

  10. arXiv:2406.11441  [pdf, other]

    cs.CV

    SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

    Authors: Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuo** Zhang, Weinan Chen, Yisheng Guan, Hong Zhang

    Abstract: A large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, resulting in a challenging problem in efficiently segmenting a point cloud. Most existing research mainly focuses on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context.…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.09095  [pdf, other]

    cs.CL

    Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

    Authors: Yuhao Dan, Junfeng Tian, Jie Zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He

    Abstract: Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is…

    Submitted 13 June, 2024; originally announced June 2024.

  12. arXiv:2406.07147  [pdf]

    cs.HC cs.AI cs.CY

    Wearable Device-Based Physiological Signal Monitoring: An Assessment Study of Cognitive Load Across Tasks

    Authors: Ling He, Yanxin Chen, Wenqi Wang, Shuting He, Xiaoqiang Hu

    Abstract: This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution cognitive load assessment on EEG data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students (SVS). By jointly analyzing these two critical physiological indicators, the research delves into their application value in assessing cognitive load among…

    Submitted 11 June, 2024; originally announced June 2024.

  13. arXiv:2406.05540  [pdf, other]

    q-bio.QM cs.AI cs.CL cs.LG

    A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

    Authors: Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, Yu Guang Wang

    Abstract: The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have…

    Submitted 8 June, 2024; originally announced June 2024.

  14. arXiv:2406.00976  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

    Authors: Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu

    Abstract: While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio wavef…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 (main)

  15. arXiv:2406.00037  [pdf, other]

    cs.CL cs.AI

    Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering

    Authors: Hongyu Yang, Liyang He, Min Hou, Shuanghong Shen, Rui Li, Jiahui Hou, Jianhui Ma, Junda Zhao

    Abstract: Code Community Question Answering (CCQA) seeks to tackle programming-related issues, thereby boosting productivity in both software engineering and academic research. Recent advancements in Reinforcement Learning from Human Feedback (RLHF) have transformed the fine-tuning process of Large Language Models (LLMs) to produce responses that closely mimic human behavior. Leveraging LLMs with RLHF for p…

    Submitted 27 May, 2024; originally announced June 2024.

  16. arXiv:2405.20614  [pdf, other]

    cs.CV

    EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening

    Authors: Junming Ren, Zhoujian Xiao, Yujia Zhang, Yujie Yang, Ling He, Ezra Yoon, Stephen Temitayo Bello, Xi Chen, Dapeng Wu, Micky Tortorella, Jufang He

    Abstract: In preclinical translational studies, drug candidates with remarkable anti-epileptic efficacy demonstrate long-term suppression of spontaneous recurrent seizures (SRSs), particularly convulsive seizures (CSs), in mouse models of chronic epilepsy. However, the current methods for monitoring CSs have limitations in terms of invasiveness, specific laboratory settings, high cost, and complex opera…

    Submitted 31 May, 2024; originally announced May 2024.

  17. arXiv:2405.19524  [pdf, other]

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape…

    Submitted 29 May, 2024; originally announced May 2024.

  18. arXiv:2405.18653  [pdf, other]

    cs.CL

    Recent Advances of Foundation Language Models-based Continual Learning: A Survey

    Authors: Yutao Yang, Jie Zhou, Xuanwen Ding, Tianyu Huai, Shunyu Liu, Qin Chen, Liang He, Yuan Xie

    Abstract: Recently, foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV). Unlike traditional neural network models, foundation LMs obtain a great ability for transfer learning by acquiring rich commonsense knowledge through pre-training on extensive unsupervised datasets with a vast number of parameters. However, t…

    Submitted 28 May, 2024; originally announced May 2024.

  19. arXiv:2405.18187  [pdf, other]

    cs.LG

    AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

    Authors: Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

    Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function and why IQL can utilize weighted regression for policy extraction. IDQL reinterprets IQL as an actor-critic method and gets weights of implicit pol…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 3 figures, 4 tables

  20. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  21. arXiv:2405.14334  [pdf, other]

    cs.CV

    Hierarchical Salient Patch Identification for Interpretable Fundus Disease Localization

    Authors: Yitao Peng, Lianghua He, Die Hu

    Abstract: With the widespread application of deep learning technology in medical image analysis, how to effectively explain model decisions and improve diagnosis accuracy has become an urgent problem that needs to be solved. Attribution methods have become a key tool to help doctors better understand the diagnostic basis of models, and they are used to explain and localize diseases in medical images. Howeve…

    Submitted 23 May, 2024; originally announced May 2024.

  22. arXiv:2405.13190  [pdf, other]

    cs.LG cs.AI

    Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation

    Authors: Haoteng Tang, Guodong Liu, Siyuan Dai, Kai Ye, Kun Zhao, Wenlu Wang, Carl Yang, Lifang He, Alex Leow, Paul Thompson, Heng Huang, Liang Zhan

    Abstract: The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal fun…

    Submitted 21 May, 2024; originally announced May 2024.

  23. arXiv:2405.12100  [pdf, other]

    cs.CL

    DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction

    Authors: Hao Chen, Biaojie Zeng, Xin Lin, Liang He, Aimin Zhou

    Abstract: Math word problems correction (MWPC) is a novel task dedicated to rectifying reasoning errors in the process of solving mathematical problems. In this paper, leveraging the advancements in large language models (LLMs), we address two key objectives: (1) Distinguishing between mathematical reasoning and error correction; (2) Exploring strategies to enhance the error correction capabilities of LLMs i…

    Submitted 20 May, 2024; originally announced May 2024.

  24. arXiv:2405.11459  [pdf, other]

    eess.SP cs.CL q-bio.NC

    Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

    Authors: Hui Zheng, Hai-Teng Wang, Wei-Bang Jiang, Zhong-Tao Chen, Li He, Pei-Yang Lin, Peng-Hu Wei, Guo-Guang Zhao, Yun-Zhe Liu

    Abstract: Invasive brain-computer interfaces have garnered significant attention due to their high performance. The current intracranial stereoElectroEncephaloGraphy (sEEG) foundation models typically build univariate representations based on a single channel. Some of them further use Transformer to model the relationship among channels. However, due to the locality and specificity of brain computation, the…

    Submitted 19 May, 2024; originally announced May 2024.

  25. arXiv:2405.09215  [pdf, other]

    cs.CV cs.AI

    Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

    Authors: Wanting Xu, Yang Liu, Langping He, Xucheng Huang, Ling Jiang

    Abstract: We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, emp…

    Submitted 20 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  26. arXiv:2405.09055  [pdf, other]

    cs.CL

    A safety realignment framework via subspace-oriented model fusion for large language models

    Authors: Xin Yi, Shunfan Zheng, Linlin Wang, Xiaoling Wang, Liang He

    Abstract: The current safeguard mechanisms for large language models (LLMs) are indeed susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine-tuning on apparently benign data for downstream tasks can jeopardize safety. One potential solution is to conduct safety fine-tuning subsequent to downstream fine-tuning. However, there's a risk of catastrophic forgetting during saf…

    Submitted 14 May, 2024; originally announced May 2024.

  27. arXiv:2405.09045  [pdf, other]

    cs.CV

    AMSNet: Netlist Dataset for AMS Circuits

    Authors: Zhuofu Tao, Yichen Shi, Yiru Huo, Rui Ye, Zonghang Li, Li Huang, Chen Wu, Na Bai, Zhiping Yu, Ting-Jung Lin, Lei He

    Abstract: Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention. The advent of multimodal large language models (MLLMs) has unveiled significant potential across various fields, suggesting their applicability in streamlining large-scale AMS IC design as well. A bottleneck in employing MLLMs for automatic AMS circuit generation is the absence of a comprehens…

    Submitted 14 May, 2024; originally announced May 2024.

  28. arXiv:2405.05722  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

    Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

    Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities. Inspired by covariant theory in physics, we address this problem by exploring the mathematical relationships between SO(3)-invariant and…

    Submitted 18 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  29. arXiv:2405.05496  [pdf, other]

    cs.CL

    Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis

    Authors: Xuanwen Ding, Jie Zhou, Liang Dou, Qin Chen, Yuanbin Wu, Chengcai Chen, Liang He

    Abstract: Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis, which aims to extract the aspects and predict their sentiments. Most existing studies focus on improving the performance of the target domain by fine-tuning domain-specific models (trained on source domains) based on the target domain dataset. Few works propose continual learning tasks for ABSA, which aim to lear…

    Submitted 8 May, 2024; originally announced May 2024.

  30. arXiv:2405.05131  [pdf, other]

    cs.RO

    DenserRadar: A 4D millimeter-wave radar point cloud detector based on dense LiDAR point clouds

    Authors: Zeyu Han, Junkai Jiang, Xiaokang Ding, Qingwen Meng, Shaobing Xu, Lei He, Jianqiang Wang

    Abstract: The 4D millimeter-wave (mmWave) radar, with its robustness in extreme environments, extensive detection range, and capabilities for measuring velocity and elevation, has demonstrated significant potential for enhancing the perception abilities of autonomous driving systems in corner-case scenarios. Nevertheless, the inherent sparsity and noise of 4D mmWave radar point clouds restrict its further d…

    Submitted 8 May, 2024; originally announced May 2024.

  31. arXiv:2405.04041  [pdf, other]

    cs.AI cs.CV

    Feature Map Convergence Evaluation for Functional Module

    Authors: Ludan Zhang, Chaoyi Chen, Lei He, Keqiang Li

    Abstract: Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as a black box through end-to-end training, lacking independent evaluation of functional modules, which poses difficulties for interpretability and optimization. Pioneer…

    Submitted 7 May, 2024; originally announced May 2024.

  32. arXiv:2405.03098  [pdf, other]

    cs.CL

    FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models

    Authors: Yanhong Bai, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xingjiao Wu, Liang He

    Abstract: Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are applied. Traditional methods, which rely on embedding spaces or are based on probability metrics, fall short in revealing the nuanced and implicit biases present in various contexts. To address this challenge, we propose the…

    Submitted 5 May, 2024; originally announced May 2024.

  33. arXiv:2405.00077  [pdf, other]

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp…

    Submitted 30 April, 2024; originally announced May 2024.

  34. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  35. arXiv:2404.13816  [pdf, other]

    cs.CV

    Neural Radiance Field in Autonomous Driving: A Survey

    Authors: Lei He, Leheng Li, Wenchao Sun, Zeyu Han, Yichen Liu, Sifa Zheng, Jianqiang Wang, Keqiang Li

    Abstract: Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuou…

    Submitted 26 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  36. arXiv:2404.11326  [pdf, other]

    cs.CV

    Single-temporal Supervised Remote Change Detection for Domain Generalization

    Authors: Qiangang Du, Jinlong Peng, Xu Chen, Qingdong He, Liren He, Qiang Nie, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Change detection is widely applied in remote sensing image analysis. Existing methods require training models separately for each dataset, which leads to poor domain generalization. Moreover, these methods rely heavily on large amounts of high-quality pair-labelled data for training, which is expensive and impractical. In this paper, we propose a multimodal contrastive learning (ChangeCLIP) based…

    Submitted 23 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  37. arXiv:2404.09699  [pdf, other]

    cs.GT

    Generative AI for Game Theory-based Mobile Networking

    Authors: Long He, Geng Sun, Dusit Niyato, Hongyang Du, Fang Mei, Jiawen Kang, Mérouane Debbah, Zhu Han

    Abstract: With the continuous advancement of network technology, various emerging complex networking optimization problems have opened up a wide range of applications utilizing game theory. However, since game theory is a mathematical framework, game theory-based solutions often require the experience and knowledge of human experts. Recently, the remarkable advantages exhibited by generative artificial inte…

    Submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.09509  [pdf, other]

    cs.CV

    Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder

    Authors: Chong Peng, Liqiang He, Dan Su

    Abstract: Today, there have been many achievements in learning the association between voice and face. However, most previous work models rely on cosine similarity or L2 distance to evaluate the likeness of voices and faces following contrastive learning, subsequently applied to retrieval and matching tasks. This method only considers the embeddings as high-dimensional vectors, utilizing a minimal scope of…

    Submitted 15 April, 2024; originally announced April 2024.

  39. arXiv:2404.06690  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  40. arXiv:2404.03974  [pdf, other]

    cs.GT

    Game-theoretic Distributed Learning Approach for Heterogeneous-cost Task Allocation with Budget Constraints

    Authors: Weiyi Yang, Xiaolu Liu, Lei He, Yonghao Du, Yingwu Chen

    Abstract: This paper investigates heterogeneous-cost task allocation with budget constraints (HCTAB), wherein heterogeneity is manifested through the varying capabilities and costs associated with different agents for task execution. Different from the centralized optimization-based method, the HCTAB problem is solved using a fully distributed framework, and a coalition formation game is introduced to provi…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 15 pages, 5 figures

  41. arXiv:2404.02508  [pdf, other]

    cs.CV cs.AI cs.LG

    VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments

    Authors: Bufang Yang, Lixing He, Kaiwei Liu, Zhenyu Yan

    Abstract: Individuals with visual impairments, encompassing both partial and total difficulties in visual perception, are referred to as visually impaired (VI) people. An estimated 2.2 billion individuals worldwide are affected by visual impairments. Recent advancements in multi-modal large language models (MLLMs) have showcased their extraordinary capabilities across various domains. It is desirable to hel…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys 2024)

  42. arXiv:2404.02166  [pdf, other]

    cs.IT

    An Online Joint Optimization Approach for QoE Maximization in UAV-Enabled Mobile Edge Computing

    Authors: Long He, Geng Sun, Zemin Sun, Pengfei Wang, Jiahui Li, Shuang Liang, Dusit Niyato

    Abstract: Given flexible mobility, rapid deployment, and low cost, unmanned aerial vehicle (UAV)-enabled mobile edge computing (MEC) shows great potential to compensate for the lack of terrestrial edge computing coverage. However, limited battery capacity, computing and spectrum resources also pose serious challenges for UAV-enabled MEC, which shorten the service time of UAVs and degrade the quality of expe…

    Submitted 23 March, 2024; originally announced April 2024.

  43. arXiv:2404.01099  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety

    Authors: Luxi He, Mengzhou Xia, Peter Henderson

    Abstract: Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking. Some have found that just further fine-tuning an aligned model with benign data (i.e., data without harmful content) surprisingly leads to substantial degradation in safety. We delve into the data-centric aspects of why benign fine-tuning inadvertently contributes to jailbreaking. Firs…

    Submitted 1 April, 2024; originally announced April 2024.

  44. arXiv:2404.01065  [pdf, other]

    cs.CV

    T-Mamba: Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation

    Authors: Jing Hao, Lei He, Kuo Feng Hung

    Abstract: Efficient tooth segmentation in three-dimensional (3D) imaging, critical for orthodontic diagnosis, remains challenging due to noise, low contrast, and artifacts in CBCT images. Both convolutional neural networks (CNNs) and transformers have emerged as popular architectures for image segmentation. However, their efficacy in handling long-range dependencies is limited due to inherent locality or co…

    Submitted 1 April, 2024; originally announced April 2024.

  45. arXiv:2403.19622  [pdf, other]

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  46. arXiv:2403.18243  [pdf, other]

    cs.AI

    Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check

    Authors: Linhao Ye, Zhikai Lei, Jianghao Yin, Qin Chen, Jie Zhou, Liang He

    Abstract: Retrieval-Augmented Generation (RAG) aims to generate more reliable and accurate responses, by augmenting large language models (LLMs) with the external vast and dynamic knowledge. Most previous work focuses on using RAG for single-round question answering, while how to adapt RAG to the complex conversational setting wherein the question is interdependent on the preceding context is not well studi…

    Submitted 27 March, 2024; originally announced March 2024.

  47. arXiv:2403.17299  [pdf, other]

    cs.CL q-bio.NC

    Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs

    Authors: Linyang He, Peili Chen, Ercong Nie, Yuanning Li, Jonathan R. Brennan

    Abstract: Inspired by cognitive neuroscience studies, we introduce a novel 'decoding probing' method that uses minimal pairs benchmark (BLiMP) to probe internal linguistic characteristics in neural language models layer by layer. By treating the language model as the 'brain' and its representations as 'neural activations', we decode grammaticality labels of minimal pairs from the intermediate layers' repres…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  48. arXiv:2403.15696  [pdf, other]

    cs.AI cs.CL

    MixRED: A Mix-lingual Relation Extraction Dataset

    Authors: Lingxing Kong, Yougang Chu, Zheng Ma, Jianbing Zhang, Liang He, Jiajun Chen

    Abstract: Relation extraction is a critical task in the field of natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or cross-lingual enhancement for relation extraction. Yet, there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix con…

    Submitted 22 March, 2024; originally announced March 2024.

  49. arXiv:2403.13248  [pdf, other]

    cs.CV

    Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

    Authors: Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun

    Abstract: Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority bein…

    Submitted 22 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  50. arXiv:2403.11561  [pdf, other]

    cs.CV

    Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

    Authors: Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination. Consequently, the model becomes unable to reconstruct genu…

    Submitted 18 March, 2024; originally announced March 2024.