
Showing 1–50 of 342 results for author: Chen, N

Searching in archive cs.
  1. arXiv:2407.00981

    cs.HC cs.CL

    VisEval: A Benchmark for Data Visualization in the Era of Large Language Models

    Authors: Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, Yuqing Yang

    Abstract: Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However,…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.16020

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  3. arXiv:2406.11021

    cs.CV

    α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion

    Authors: Sanbao Su, Nuo Chen, Felix Juefei-Xu, Chen Feng, Fei Miao

    Abstract: In the realm of autonomous vehicle (AV) perception, comprehending 3D scenes is paramount for tasks such as planning and mapping. Semantic scene completion (SSC) aims to infer scene geometry and semantics from limited observations. While camera-based SSC has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address thi…

    Submitted 21 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  4. arXiv:2406.09383

    cs.CV

    Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

    Authors: Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

    Abstract: Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capab…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024

  5. arXiv:2406.05358

    cs.LG math.OC

    Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management

    Authors: Huiling Meng, Ningyuan Chen, Xuefeng Gao

    Abstract: Intensity control is a type of continuous-time dynamic optimization problem with many important applications in Operations Research, including queueing and revenue management. In this study, we adapt the reinforcement learning framework to intensity control using choice-based network revenue management as a case study, which is a classical problem in revenue management that features a large state…

    Submitted 8 June, 2024; originally announced June 2024.

  6. arXiv:2406.03052

    cs.LG

    Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections

    Authors: Zihan Luo, Hong Huang, Yongkang Zhou, Ji** Zhang, Nuo Chen

    Abstract: Despite the remarkable capabilities demonstrated by Graph Neural Networks (GNNs) in graph-related tasks, recent research has revealed the fairness vulnerabilities in GNNs when facing malicious adversarial attacks. However, all existing fairness attacks require manipulating the connectivity between existing nodes, which may be prohibited in reality. To this end, we introduce a Node Injection-based…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 21 pages

  7. arXiv:2406.02963

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  8. arXiv:2406.02921

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Text Injection for Neural Contextual Biasing

    Authors: Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran

    Abstract: Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance contextual ASR. CTI leverages not only the paired speech-text data, but also a much larger corpus of unpaired text to optimize the ASR model and it…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

    Journal ref: Interspeech 2024, Kos Island, Greece

  9. arXiv:2406.02426

    math.OC cs.LG

    Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls

    Authors: Tianyu Wang, Ningyuan Chen, Chun Wang

    Abstract: In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision that minimizes some operational costs. A prevalent issue here is covariate shift, where the marginal distribution of the new covariate differs from…

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2405.17463

    cs.GT cs.LG

    No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling

    Authors: Ningyuan Chen, Xuefeng Gao, Yi Xiong

    Abstract: When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of the existence of each other and use multi-armed bandit algorithms to choose the actions, which is referred to as the "blindfolded game" in this paper. We show that when the players use Thompson sampling, the game dynamics converges to the Nash equilibrium under a mild assumption on the…

    Submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.15329

    cs.CL

    Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework

    Authors: Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

    Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final ev…

    Submitted 14 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  12. arXiv:2405.12038

    cs.LG cs.IR

    Adaptive Convolutional Forecasting Network Based on Time Series Feature-Driven

    Authors: Dandan Zhang, Zhiqiang Zhang, Nanguang Chen, Yun Wang

    Abstract: Time series data in real-world scenarios contain a substantial amount of nonlinear information, which significantly interferes with the training process of models, leading to decreased prediction performance. Therefore, during the time series forecasting process, extracting the local and global time series patterns and understanding the potential nonlinear features among different time observation…

    Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  13. arXiv:2405.10496

    cs.IT eess.SP

    Electromagnetic Information Theory for Holographic MIMO Communications

    Authors: Li Wei, Tierui Gong, Chongwen Huang, Zhaoyang Zhang, Wei E. I. Sha, Zhi Ning Chen, Linglong Dai, Merouane Debbah, Chau Yuen

    Abstract: Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enabling higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far it…

    Submitted 25 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  14. arXiv:2405.08460

    cs.CL cs.AI

    Evaluating LLMs at Evaluating Temporal Generalization

    Authors: Chenghao Zhu, Nuo Chen, Yufei Gao, Benyou Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Preprint

  15. arXiv:2405.03138

    cs.CL

    CRAFT: Extracting and Tuning Cultural Instructions from the Wild

    Authors: Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

    Abstract: Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally related concepts and reasoning remains limited. Meanwhile, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introd…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages

  16. arXiv:2405.03110

    cs.IR

    Vector Quantization for Recommender Systems: A Review and Outlook

    Authors: Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts…

    Submitted 5 May, 2024; originally announced May 2024.

  17. arXiv:2405.02630

    quant-ph cs.DC cs.SE

    cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK

    Authors: Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Wille, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

    Abstract: This paper investigates the application of Quantum Support Vector Machines (QSVMs) with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially leveraging the cuTensorNet library. We present a simulation workflow that substantially diminishes computational overhead, as evidenced by our experiments, from exponential to quadratic cost. While state vector simulatio…

    Submitted 8 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 14 figures

  18. arXiv:2404.18946

    physics.optics cs.IR eess.IV

    Align-Free Multi-Plane Phase Retrieval

    Authors: Jiabao Wang, Yang Wu, Jun Wang, Ni Chen

    Abstract: The multi-plane phase retrieval method provides a budget-friendly and effective way to perform phase imaging, yet it often encounters alignment challenges due to shifts along the optical axis in experiments. Traditional methods, such as employing beamsplitters instead of mechanical stage movements or adjusting focus using tunable light sources, add complexity to the setup required for multi-plane…

    Submitted 29 April, 2024; originally announced April 2024.

  19. arXiv:2404.17454

    cs.LG cs.AI q-bio.QM

    Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

    Authors: Kaichen Xu, Yueyang Ding, Suyang Hou, Weiqiang Zhan, Nisang Chen, Jun Wang, Xiaobo Sun

    Abstract: Fine-grained anomalous cell detection from affected tissues is critical for clinical diagnosis and pathological research. Single-cell sequencing data provide unprecedented opportunities for this task. However, current anomaly detection methods struggle to handle domain shifts prevalent in multi-sample and multi-domain single-cell sequencing data, leading to suboptimal performance. Moreover, these…

    Submitted 29 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 17 pages, 2 figures. Accepted by IJCAI 2024

  20. arXiv:2404.11932

    cs.CL cs.AI

    CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

    Authors: Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

    Abstract: Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance discrepancy mainly stems from the imbalanced distribution of training data across languages during pre-training and instruction tuning stages. To address this p…

    Submitted 12 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages

  21. arXiv:2404.09754

    cs.CL

    Resilience of Large Language Models for Noisy Instructions

    Authors: Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

    Abstract: In the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks. Nonetheless, the resilience of LLMs when handling text containing inherent errors, stemming from human interactions and collaborative systems, has not been thoroughly explored. Our study investigates…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 12 pages

  22. arXiv:2404.07471

    cs.SE cs.AI cs.CL

    Structure-aware Fine-tuning for Code Pre-trained Models

    Authors: Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao

    Abstract: Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to enhance the absorption of structural knowledge when fine-tuning CodePTMs still remains a significant challenge. To fill this gap, in this paper, we present Stru…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by COLING 2024

  23. arXiv:2404.06762

    cs.CL cs.HC

    Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

    Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

    Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantl…

    Submitted 10 April, 2024; originally announced April 2024.

  24. arXiv:2404.06749

    cs.LG

    CGNSDE: Conditional Gaussian Neural Stochastic Differential Equation for Modeling Complex Systems and Data Assimilation

    Authors: Chuanqi Chen, Nan Chen, **-Long Wu

    Abstract: A new knowledge-based and machine learning hybrid modeling approach, called conditional Gaussian neural stochastic differential equation (CGNSDE), is developed to facilitate modeling complex dynamical systems and implementing analytic formulae of the associated data assimilation (DA). In contrast to the standard neural network predictive models, the CGNSDE is designed to effectively tackle both fo…

    Submitted 10 April, 2024; originally announced April 2024.

  25. arXiv:2404.03429

    cs.CL

    Scaffolding Language Learning via Multi-modal Tutoring Systems with Pedagogical Instructions

    Authors: Zhengyuan Liu, Stella Xin Yin, Carolyn Lee, Nancy F. Chen

    Abstract: Intelligent tutoring systems (ITSs) that imitate human tutors and aim to provide immediate and customized instructions or feedback to learners have shown their effectiveness in education. With the emergence of generative artificial intelligence, large language models (LLMs) further enable these systems to conduct complex and coherent conversational interactions. These systems would be of great help in lang…

    Submitted 4 April, 2024; originally announced April 2024.

  26. arXiv:2404.03253

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Li** Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  27. arXiv:2403.19149

    cs.LG q-bio.NC

    Topological Cycle Graph Attention Network for Brain Functional Connectivity

    Authors: **ghan Huang, Nanguang Chen, Anqi Qiu

    Abstract: In this study, we introduce a novel Topological Cycle Graph Attention Network (CycGAT), designed to delineate a functional backbone within the brain functional graph (key pathways essential for signal transmission) from non-essential, redundant connections that form cycles around this core structure. We first introduce a cycle incidence matrix that establishes an independent cycle basis within a graph, ma…

    Submitted 28 March, 2024; originally announced March 2024.

  28. arXiv:2403.18462

    cs.IR

    Decoy Effect In Search Interaction: Understanding User Behavior and Measuring System Vulnerability

    Authors: Nuo Chen, Jiqun Liu, Hanpei Fang, Yuankai Luo, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: This study examines the decoy effect's underexplored influence on user search interactions and methods for measuring information retrieval (IR) systems' vulnerability to this effect. It explores how decoy results alter users' interactions on search engine result pages, focusing on metrics like click-through likelihood, browsing time, and perceived document usefulness. By analyzing user interaction…

    Submitted 27 March, 2024; originally announced March 2024.

  29. arXiv:2403.17111

    cs.RO

    Vision-Based Dexterous Motion Planning by Dynamic Movement Primitives with Human Hand Demonstration

    Authors: Nuo Chen, Ya-Jun Pan

    Abstract: This paper proposes a vision-based framework for a 7-degree-of-freedom robotic manipulator, with the primary objective of facilitating its capacity to acquire information from human hand demonstrations for the execution of dexterous pick-and-place tasks. Most existing works only focus on the position demonstration without considering the orientations. In this paper, by employing a single depth cam…

    Submitted 25 March, 2024; originally announced March 2024.

  30. arXiv:2403.15239

    cs.RO cs.LG

    Guided Decoding for Robot Motion Generation and Adaption

    Authors: Nutan Chen, Elie Aljalbout, Botond Cseke, Patrick van der Smagt

    Abstract: We address motion generation for high-DoF robot arms in complex settings with obstacles, via points, etc. A significant advancement in this domain is achieved by integrating Learning from Demonstration (LfD) into the motion generation process. This integration facilitates rapid adaptation to new tasks and optimizes the utilization of accumulated expertise by allowing robots to learn and generalize…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages

  31. arXiv:2403.14356

    cs.LG cs.SE

    DomainLab: A modular Python package for domain generalization in deep learning

    Authors: Xudong Sun, Carla Feistner, Alexej Gossmann, George Schwarz, Rao Muhammad Umer, Lisa Beer, Patrick Rockenschaub, Rahul Babu Shrestha, Armin Gruber, Nutan Chen, Sayedali Shetab Boushehri, Florian Buettner, Carsten Marr

    Abstract: Poor generalization performance caused by distribution shifts in unseen domains often hinders the trustworthy deployment of deep neural networks. Many domain generalization techniques address this problem by adding domain-invariant regularization loss terms during training. However, there is a lack of modular software that allows users to combine the advantages of different methods with minimal…

    Submitted 21 March, 2024; originally announced March 2024.

  32. arXiv:2403.13728

    cs.LG cs.AI

    M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling

    Authors: Xudong Sun, Nutan Chen, Alexej Gossmann, Yu Xing, Carla Feistner, Emilio Dorigatt, Felix Drost, Daniele Scarcella, Lisa Beer, Carsten Marr

    Abstract: We address the online combinatorial choice of weight multipliers for multi-objective optimization of many loss terms parameterized by neural networks via a probabilistic graphical model (PGM) for the joint model parameter and multiplier evolution process, with a hypervolume-based likelihood promoting multi-objective descent. The corresponding parameter and multiplier estimation as a sequential decisi…

    Submitted 10 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  33. arXiv:2403.11123

    cs.CL

    Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking

    Authors: Taha Aksu, Nancy F. Chen

    Abstract: Current metrics for evaluating Dialogue State Tracking (DST) systems exhibit three primary limitations. They: i) erroneously presume a uniform distribution of slots throughout the dialog, ii) neglect to assign partial scores for individual turns, and iii) frequently overestimate or underestimate performance by repeatedly counting the models' successful or failed predictions. To address these shortcomi…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to COLING 2024

  34. arXiv:2403.06687

    cs.LG cs.CV

    Advancing Graph Neural Networks with HL-HGAT: A Hodge-Laplacian and Attention Mechanism Approach for Heterogeneous Graph-Structured Data

    Authors: **ghan Huang, Qiufeng Chen, Yijun Bian, Pengli Zhu, Nanguang Chen, Moo K. Chung, Anqi Qiu

    Abstract: Graph neural networks (GNNs) have proven effective in capturing relationships among nodes in a graph. This study introduces a novel perspective by considering a graph as a simplicial complex, encompassing nodes, edges, triangles, and $k$-simplices, enabling the definition of graph-structured data on any $k$-simplices. Our contribution is the Hodge-Laplacian heterogeneous graph attention network (H…

    Submitted 22 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  35. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  36. arXiv:2403.03640

    cs.CL cs.AI

    Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

    Authors: Xidong Wang, Nuo Chen, Junyin Chen, Yan Hu, Yidong Wang, Xiangbo Wu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang

    Abstract: Despite the vast repository of global medical knowledge predominantly being in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources. To extend the reach of medical AI advancements to a broader population, we aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6…

    Submitted 28 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Preprint

  37. arXiv:2402.17168

    cs.AI cs.CL

    Benchmarking Data Science Agents

    Authors: Yuge Zhang, Qiyang Jiang, Xingyu Han, Nan Chen, Yuqing Yang, Kan Ren

    Abstract: In the era of data-driven decision-making, the complexity of data analysis necessitates advanced expertise and tools of data science, presenting significant challenges even for specialists. Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of re…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Source code and data are available at https://github.com/MetaCopilot/dseval

  38. arXiv:2402.16029

    cs.CL

    GraphWiz: An Instruction-Following Language Model for Graph Problems

    Authors: Nuo Chen, Yuhan Li, Jianheng Tang, Jia Li

    Abstract: Large language models (LLMs) have achieved impressive success across several fields, but their proficiency in understanding and resolving complex graph problems is less explored. To bridge this gap, we introduce GraphInstruct, a novel and comprehensive instruction-tuning dataset designed to equip language models with the ability to tackle a broad spectrum of graph problems using explicit reasoning…

    Submitted 2 July, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: 27 pages, 15 tables

  39. arXiv:2402.12710

    stat.ME cs.LG stat.ML

    Integrating Active Learning in Causal Inference with Interference: A Novel Approach in Online Experiments

    Authors: Hongtao Zhu, Sizhe Zhang, Yang Su, Zhenyu Zhao, Nan Chen

    Abstract: In the domain of causal inference research, the prevalent potential outcomes framework, notably the Rubin Causal Model (RCM), often overlooks individual interference and assumes independent treatment effects. This assumption, however, is frequently misaligned with the intricate realities of real-world scenarios, where interference is not merely a possibility but a common occurrence. Our research e…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: conference paper

  40. arXiv:2402.11975

    cs.CL

    Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations

    Authors: Nuo Chen, Hongguang Li, Juhua Huang, Baoyuan Wang, Jia Li

    Abstract: Existing retrieval-based methods have made significant strides in maintaining long-term conversations. However, these approaches face challenges in memory database management and accurate memory retrieval, hindering their efficacy in dynamic, real-world interactions. This study introduces a novel framework, COmpressive Memory-Enhanced Dialogue sYstems (COMEDY), which eschews traditional retrieval…

    Submitted 1 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 17 pages, 5 figures

  41. arXiv:2402.00658

    cs.AI cs.CL

    Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

    Authors: Fangkai Jiao, Chengwei Qin, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation. However, recent studies have raised concerns regarding hallucinations and flaws in their reasoning process. Substantial efforts are being made to improve the reliability and faithfulness of the generated rationales. Some approaches model reasoning a…

    Submitted 15 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures

  42. arXiv:2401.17919

    cs.CL cs.LG

    LOCOST: State-Space Models for Long Document Abstractive Summarization

    Authors: Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, Vincent Guigue, Alberto Lumbreras, Laure Soulier, Patrick Gallinari

    Abstract: State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-a…

    Submitted 25 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: 9 pages, 5 figures, 7 tables, EACL 2024 conference

  43. arXiv:2401.11759  [pdf, other]

    cs.DC

    Integrated Sensing, Communication, and Computing: An Information-oriented Resource Transaction Mechanism

    Authors: Ning Chen, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Jie Yang, Yifeng Zhao, Lianfen Huang

    Abstract: Information acquisition from target perception represents the key enabling technology of the Internet of Automatic Vehicles (IoAV), which is essential for the decision-making and control operation of connected automatic vehicles (CAVs). Exploring target information involves multiple operations on data, e.g., wireless sensing (for data acquisition), communication (for data transmission), and comput…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures, 2 tables

  44. arXiv:2401.11447  [pdf, other]

    cs.LG q-bio.QM

    Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

    Authors: Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance the adherence of patients to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom s…

    Submitted 28 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Frontiers in Pharmacology, research topic: Methods and Metrics to Measure Medication Adherence

  45. arXiv:2401.05384  [pdf, other]

    math.HO cs.AI

    From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting

    Authors: Nuo Chen, Hongguang Li, Baoyuan Wang, Jia Li

    Abstract: This paper investigates the performance of Large Language Models (LLMs) and Tool-augmented LLMs in tackling complex mathematical reasoning tasks. We introduce IMP-TIP: Improving Math Reasoning with Tool-augmented Interleaf Prompting, a framework that combines the strengths of both LLMs and Tool-augmented LLMs. IMP-TIP follows the "From Good to Great" concept, collecting multiple potential solutio…

    Submitted 18 December, 2023; originally announced January 2024.

    Comments: 16 pages

  46. arXiv:2401.02668  [pdf, other]

    cs.DC cs.LG

    Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence

    Authors: Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang

    Abstract: The high-performance generative artificial intelligence (GAI) represents the latest evolution of computational intelligence, while the blessing of future 6G networks also makes edge intelligence (EI) full of development potential. The inevitable encounter between GAI and EI can unleash new opportunities, where GAI's pre-training based on massive computing resources and large-scale unlabeled corpor…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 11 pages, 8 figures, and 5 tables

  47. arXiv:2401.02662  [pdf, other]

    cs.NI eess.SP

    GainNet: Coordinates the Odd Couple of Generative AI and 6G Networks

    Authors: Ning Chen, Jie Yang, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani

    Abstract: The rapid expansion of AI-generated content (AIGC) reflects the iteration from assistive AI towards generative AI (GAI) with creativity. Meanwhile, the 6G networks will also evolve from the Internet-of-everything to the Internet-of-intelligence with hybrid heterogeneous network architectures. In the future, the interplay between GAI and the 6G will lead to new opportunities, where GAI can learn th…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 10 pages, 5 figures, 1 table

  48. arXiv:2312.17674  [pdf, other]

    cs.DC

    QoE-oriented Dependent Task Scheduling under Multi-dimensional QoS Constraints over Distributed Networks

    Authors: Xuwei Fan, Zhipeng Cheng, Ning Chen, Lianfen Huang, Xianbin Wang

    Abstract: Task scheduling as an effective strategy can improve application performance on computing resource-limited devices over distributed networks. However, existing evaluation mechanisms fail to depict the complexity of diverse applications, which involve dependencies among tasks, computing resource requirements, and multi-dimensional quality of service (QoS) constraints. Furthermore, traditional QoS-o…

    Submitted 29 December, 2023; originally announced December 2023.

  49. arXiv:2312.12153  [pdf, other]

    cs.SD eess.AS

    Noise robust distillation of self-supervised speech models via correlation metrics

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

    Abstract: Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Te…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 6 pages

  50. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.