Showing 1–50 of 873 results for author: Feng, Y

  1. arXiv:2407.01029  [pdf, other]

    cs.CV

    EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

    Authors: Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan

    Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  2. arXiv:2407.00435  [pdf, other]

    cs.GR

    RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering

    Authors: Weikai Lin, Yu Feng, Yuhao Zhu

    Abstract: Point-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, emerges as a promising class of rendering techniques, which are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging. This paper proposes RTGS, a PBNR system that for the firs…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: I.3; I.2

  3. arXiv:2406.18547  [pdf]

    eess.IV cs.CV

    Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data

    Authors: Yinqiu Feng, Bo Zhang, Lingxi Xiao, Yutian Yang, Tana Gegen, Zexi Chen

    Abstract: In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator networ…

    Submitted 22 May, 2024; originally announced June 2024.

  4. arXiv:2406.18518  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal…

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.17233  [pdf, other]

    cs.SE cs.CL

    Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement

    Authors: Yunlong Feng, Yang Xu, Dechuan Teng, Honglin Mu, Xiao Xu, Libo Qin, Wanxiang Che, Qingfu Zhu

    Abstract: Decompilation transforms compiled code back into a high-level programming language for analysis when source code is unavailable. Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training. Based on the characteristics of the decompilation task, we propose two methods: (1) Without fine-tuning, the Self-Con…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under Review

  6. arXiv:2406.16982  [pdf]

    cs.LG cs.AI

    Research on Disease Prediction Model Construction Based on Computer AI deep Learning Technology

    Authors: Yang Lin, Muqing Li, Ziyi Zhu, Yinqiu Feng, Lingxi Xiao, Zexi Chen

    Abstract: The prediction of disease risk factors can screen vulnerable groups for effective prevention and treatment, so as to reduce their morbidity and mortality. Machine learning has a great demand for high-quality labeling information, and labeling noise in medical big data poses a great challenge to efficient disease risk warning methods. Therefore, this project intends to study the robust learning alg…

    Submitted 23 June, 2024; originally announced June 2024.

  7. arXiv:2406.16981  [pdf]

    eess.IV cs.AI cs.LG eess.SP

    Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

    Authors: Lingxi Xiao, **xin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

    Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional itera…

    Submitted 23 June, 2024; originally announced June 2024.

  8. arXiv:2406.15778  [pdf, other]

    cs.CV cs.AI

    ObjectNLQ @ Ego4D Episodic Memory Challenge 2024

    Authors: Yisen Feng, Haoyu Zhang, Yuquan Xie, Zaijing Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The solution for the Natural Language Query track and Goal Step track at CVPR EgoVis Workshop 2024

  9. arXiv:2406.15771  [pdf, other]

    cs.CV cs.AI

    HCQA @ Ego4D EgoSchema Challenge 2024

    Authors: Haoyu Zhang, Yuquan Xie, Yisen Feng, Zaijing Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our champion solution for Ego4D EgoSchema Challenge in CVPR 2024. To deeply integrate the powerful egocentric captioning model and question reasoning model, we propose a novel Hierarchical Comprehension scheme for egocentric video Question Answering, named HCQA. It consists of three stages: Fine-grained Caption Generation, Context-driven Summarization, and Inference-guid…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The champion solution for Ego4D EgoSchema Challenge in CVPR EgoVis Workshop 2024

  10. arXiv:2406.15695  [pdf, other]

    cs.CL

    SS-Bench: A Benchmark for Social Story Generation and Evaluation

    Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu

    Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timelin…

    Submitted 21 June, 2024; originally announced June 2024.

  11. arXiv:2406.14503  [pdf, other]

    cs.CL

    Overview of the CAIL 2023 Argument Mining Track

    Authors: Jingcong Liang, Junlong Wang, Xinyu Zhai, Yungui Zhuang, Yiyang Zheng, Xin Xu, Xiandong Ran, Xiaozheng Dong, Honghui Rong, Yanlun Liu, Hao Chen, Yuhan Wei, Donghai Li, Jiajie Peng, Xuanjing Huang, Chongde Shi, Yansong Feng, Yun Song, Zhongyu Wei

    Abstract: We give a detailed overview of the CAIL 2023 Argument Mining Track, one of the Chinese AI and Law Challenge (CAIL) 2023 tracks. The main goal of the track is to identify and extract interacting argument pairs in trial dialogs. It mainly uses summarized judgment documents but can also refer to trial recordings. The track consists of two stages, and we introduce the tasks designed for each stage; we…

    Submitted 20 June, 2024; originally announced June 2024.

  12. arXiv:2406.13317  [pdf, other]

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft…

    Submitted 19 June, 2024; originally announced June 2024.

  13. arXiv:2406.12835  [pdf, other]

    cs.LG cs.AI cs.IR cs.SI

    Influence Maximization via Graph Neural Bandits

    Authors: Yuting Feng, Vincent Y. F. Tan, Bogdan Cautis

    Abstract: We consider a ubiquitous scenario in the study of Influence Maximization (IM), in which there is limited knowledge about the topology of the diffusion network. We set the IM problem in a multi-round diffusion campaign, aiming to maximize the number of distinct users that are influenced. Leveraging the capability of bandit algorithms to effectively balance the objectives of exploration and exploita…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at the 2024 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)

  14. arXiv:2406.11238  [pdf, other]

    cs.CL

    What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling

    Authors: Yutong Hu, Quzhe Huang, Kangcheng Luo, Yansong Feng

    Abstract: As the context length that large language models can handle continues to increase, these models demonstrate an enhanced ability to utilize distant information for tasks such as language modeling. This capability contrasts with human reading and writing habits, where it is uncommon to remember and use particularly distant information, except in cases of foreshadowing. In this paper, we aim to explo…

    Submitted 17 June, 2024; originally announced June 2024.

  15. arXiv:2406.10910  [pdf, ps, other]

    cs.IT eess.SP

    Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications

    Authors: Yannan Chen, Yi Feng, Xiaoyang Li, Licheng Zhao, Kaiming Shen

    Abstract: This paper concerns the coordinate multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-…

    Submitted 16 June, 2024; originally announced June 2024.

  16. arXiv:2406.10605  [pdf, other]

    cs.LG cs.GT

    Last-iterate Convergence Separation between Extra-gradient and Optimism in Constrained Periodic Games

    Authors: Yi Feng, ** Li, Ioannis Panageas, Xiao Wang

    Abstract: Last-iterate behaviors of learning algorithms in repeated two-player zero-sum games have been extensively studied due to their wide applications in machine learning and related tasks. Typical algorithms that exhibit the last-iterate convergence property include optimistic and extra-gradient methods. However, most existing results establish these properties under the assumption that the game is tim…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted for UAI 2024

  17. arXiv:2406.10603  [pdf, other]

    cs.GT

    Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg

    Authors: Yi Feng, Georgios Piliouras, Xiao Wang

    Abstract: We investigate the accuracy of prediction in deterministic learning dynamics of zero-sum games with random initializations, specifically focusing on observer uncertainty and its relationship to the evolution of covariances. Zero-sum games are a prominent field of interest in machine learning due to their various applications. Concurrently, the accuracy of prediction in dynamical systems from mecha…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted for ICML 2024

  18. arXiv:2406.10447  [pdf, other]

    cs.CV

    The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences

    Authors: Bria Long, Violet Xiang, Stefan Stojanov, Robert Z. Sparks, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank

    Abstract: Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This ''data gap'' is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their ''training data'' -- is a key ingredient fo…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, 4 tables and SI. Submitted to NeurIPS Datasets and Benchmarks

  19. arXiv:2406.09834  [pdf, other]

    cs.SE

    How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study

    Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

    Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs,…

    Submitted 14 June, 2024; originally announced June 2024.

  20. arXiv:2406.08897  [pdf, other]

    cs.LG

    Motif-driven Subgraph Structure Learning for Graph Classification

    Authors: Zhiyao Zhou, Sheng Zhou, Bochao Mao, Jiawei Chen, Qingyun Sun, Yan Feng, Chun Chen, Can Wang

    Abstract: To mitigate the suboptimal nature of graph structure, Graph Structure Learning (GSL) has emerged as a promising approach to improve graph structure and boost performance in downstream tasks. Despite the proposal of numerous GSL methods, the progresses in this field mostly concentrated on node-level tasks, while graph-level tasks (e.g., graph classification) remain largely unexplored. Notably, appl…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures

  21. arXiv:2406.07913  [pdf, other]

    cs.CL cs.IR

    DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning

    Authors: Yuxi Feng, Raymond Li, Zhenan Fan, Giuseppe Carenini, Mohammadreza Pourreza, Weiwei Zhang, Yong Zhang

    Abstract: While in-context Learning (ICL) has proven to be an effective technique to improve the performance of Large Language Models (LLMs) in a variety of complex tasks, notably in translating natural language questions into Structured Query Language (NL2SQL), the question of how to select the most beneficial demonstration examples remains an open research problem. While prior works often adapted off-the-…

    Submitted 12 June, 2024; originally announced June 2024.

  22. arXiv:2406.07547  [pdf, other]

    cs.CV

    Zero-shot Image Editing with Reference Imitation

    Authors: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

    Abstract: Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe how the edited image should look like. In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. Concretely, to edit an image region of interest, users are free to dire…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: https://xavierchen34.github.io/MimicBrush-Page

  23. arXiv:2406.07515  [pdf, other]

    cs.LG cs.AI stat.ML

    Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

    Authors: Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe

    Abstract: Synthesized data from generative models is increasingly considered as an alternative to human-annotated data for fine-tuning Large Language Models. This raises concerns about model collapse: a drop in performance of models fine-tuned on generated data. Considering that it is easier for both humans and machines to tell between good and bad examples than to generate high-quality samples, we investig…

    Submitted 11 June, 2024; originally announced June 2024.

  24. arXiv:2406.07330  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    CTC-based Non-autoregressive Textless Speech-to-Speech Translation

    Authors: Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng

    Abstract: Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investig…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

    ACM Class: I.2.7

  25. arXiv:2406.07289  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

    Authors: Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 main conference. Project Page: https://ictnlp.github.io/ComSpeech-Site/

    ACM Class: I.2.7

  26. arXiv:2406.06937  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

    Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

    Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization betwee…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

  27. arXiv:2406.06910  [pdf, other]

    cs.CL

    Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models

    Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations. While they excel at determining policies,…

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 8 figures, 7 tables. v2 of arXiv:2402.13036

  28. arXiv:2406.06852  [pdf, other]

    cs.CR cs.AI cs.CL

    A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

    Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan

    Abstract: The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire tra…

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.06808  [pdf, ps, other]

    cs.DS cs.LG

    Fast White-Box Adversarial Streaming Without a Random Oracle

    Authors: Ying Feng, Aayush Jain, David P. Woodruff

    Abstract: Recently, the question of adversarially robust streaming, where the stream is allowed to depend on the randomness of the streaming algorithm, has gained a lot of attention. In this work, we consider a strong white-box adversarial model (Ajtai et al. PODS 2022), in which the adversary has access to all past random coins and the parameters used by the streaming algorithm. We focus on the sparse reco…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  30. arXiv:2406.06633  [pdf, other]

    cs.LG

    PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

    Authors: Xiaoqi Qiu, Yongjie Wang, Xu Guo, Zhiwei Zeng, Yue Yu, Yuhong Feng, Chunyan Miao

    Abstract: Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes. Training with CAD enhances model robustness against spurious features that happen to correlate with labels by spreading the causal relationships across different classes. Yet, recent research reveals that training wit…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 main conference

    MSC Class: 68T50 ACM Class: I.2; I.2.7

  31. arXiv:2406.04669  [pdf, other]

    cs.CL

    DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization

    Authors: Chengang Hu, Xiao Liu, Yansong Feng

    Abstract: Most of the existing compositional generalization datasets are synthetically-generated, resulting in a lack of natural language variation. While there have been recent attempts to introduce non-synthetic datasets for compositional generalization, they suffer from either limited data scale or a lack of diversity in the forms of combinations. To better investigate compositional generalization with m…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: EMNLP 2023 long paper

  32. arXiv:2406.03878  [pdf, other]

    cs.CL

    Decoder-only Streaming Transformer for Simultaneous Translation

    Authors: Shoutao Guo, Shaolei Zhang, Yang Feng

    Abstract: Simultaneous Machine Translation (SiMT) generates translation while reading source tokens, essentially producing the target prefix based on the source prefix. To achieve good performance, it leverages the relationship between source and target prefixes to exact a policy to guide the generation of translations. Although existing SiMT methods primarily focus on the Encoder-Decoder architecture, we e…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024. 14 pages, 10 Tables, 5 Figures

  33. arXiv:2406.03049  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

    Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

  34. arXiv:2406.02056  [pdf, other]

    cs.LG cs.NE

    CAP: A Context-Aware Neural Predictor for NAS

    Authors: Han Ji, Yuqi Feng, Yanan Sun

    Abstract: Neural predictors are effective in boosting the time-consuming performance evaluation stage in neural architecture search (NAS), owing to their direct estimation of unseen architectures. Despite the effectiveness, training a powerful neural predictor with fewer annotated architectures remains a huge challenge. In this paper, we propose a context-aware neural predictor (CAP) which only needs a few…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI24

  35. arXiv:2406.00584  [pdf, other]

    cs.DB cs.AI

    A Blueprint Architecture of Compound AI Systems for Enterprise

    Authors: Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng, Hannah Kim, Chen Shen, ** Wang, Estevam Hruschka

    Abstract: Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for use in production use cases. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we intr…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Compound AI Systems Workshop at the Data+AI Summit 2024

  36. CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems

    Authors: Yanlin Feng, Sajjadur Rahman, Aaron Feng, Vincent Chen, Eser Kandogan

    Abstract: Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks via interactions with tools and data retrievers have garnered significant interest within database and AI communities. While these systems have the potential to supplement typical analysis workflows of data analysts in enterprise data platforms, unfortunately, CASs are subject to the same data discovery c…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI '24), June 14, 2024, Santiago, AA, Chile

  37. arXiv:2406.00016  [pdf]

    cs.CL

    Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data

    Authors: Lingxi Xiao, Muqing Li, Yinqiu Feng, Meiqi Wang, Ziyi Zhu, Zexi Chen

    Abstract: The research explores the utilization of a deep learning model employing an attention mechanism in medical text mining. It targets the challenge of analyzing unstructured text information within medical data. This research seeks to enhance the model's capability to identify essential medical information by incorporating deep learning and attention mechanisms. This paper reviews the basic principle…

    Submitted 22 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.11704 by other authors

  38. arXiv:2405.20334  [pdf, other]

    cs.CV cs.GR

    VividDream: Generating 3D Scene with Ambient Dynamics

    Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

    Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://vivid-dream-4d.github.io

  39. arXiv:2405.19290  [pdf, other]

    cs.CL

    Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation

    Authors: Langlin Huang, Yang Feng

    Abstract: Subword tokenization is a common method for vocabulary building in Neural Machine Translation (NMT) models. However, increasingly complex tasks have revealed its disadvantages. First, a vocabulary cannot be modified once it is learned, making it hard to adapt to new words. Second, in multilingual translation, the imbalance in data volumes across different languages spreads to the vocabulary, exace…

    Submitted 7 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL2024 Findings

  40. arXiv:2405.17660  [pdf, other]

    cs.CV

    LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking

    Authors: Shaohua Dong, Yunhe Feng, Qing Yang, Yuewei Lin, Heng Fan

    Abstract: High-performance Transformer trackers have shown excellent results, yet they often bear a heavy computational load. Observing that a smaller input can immediately and conveniently reduce computations without changing the model, an easy solution is to adopt the low-resolution input for efficient Transformer tracking. Albeit faster, this hurts tracking accuracy much due to information loss in low re…

    Submitted 27 May, 2024; originally announced May 2024.

  41. arXiv:2405.16960  [pdf, other]

    cs.CV cs.RO

    DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation

    Authors: Mengtan Zhang, Yi Feng, Qijun Chen, Rui Fan

    Abstract: There has been a recent surge of interest in learning to perceive depth from monocular videos in an unsupervised fashion. A key challenge in this field is achieving robust and accurate depth estimation in challenging scenarios, particularly in regions with weak textures or where dynamic objects are present. This study makes three major contributions by delving deeply into dense correspondence prio…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures

  42. arXiv:2405.16571  [pdf, other]

    cs.CL

    A Preliminary Empirical Study on Prompt-based Unsupervised Keyphrase Extraction

    Authors: Mingyang Song, Yi Feng, Liping Jing

    Abstract: Pre-trained large language models can perform natural language processing downstream tasks by conditioning on human-designed prompts. However, a prompt-based approach often requires "prompt engineering" to design different prompts, primarily hand-crafted through laborious trial and error, requiring human intervention and expertise. It is a challenging problem when constructing a prompt-based keyph…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: work in progress

  43. arXiv:2405.16533  [pdf, other]

    cs.CL

    Chain of Tools: Large Language Model is an Automatic Multi-tool Learner

    Authors: Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, Zhaochun Ren

    Abstract: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, empowering them to solve practical tasks. Existing work typically empowers LLMs as tool users with a manually designed workflow, where the LLM plans a series of tools in a step-by-step manner, and sequentially executes each tool to obtain intermediate results until deriving the…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Work in progress

  44. arXiv:2405.15165  [pdf, other]

    cs.CL cs.AI cs.SE

    A Solution-based LLM API-using Methodology for Academic Information Seeking

    Authors: Yuanchun Wang, Jifan Yu, Zijun Yao, Jing Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, Jinkai Zhang, Jingyao Zhang, Bowen Huang, Yuanyao Li, Huihui Yuan, Lei Hou, Juanzi Li, Jie Tang

    Abstract: Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts. However, current LLM API-using methods struggle with complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as t…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures

  45. arXiv:2405.15056  [pdf, other]

    cs.LG cs.CV cs.GR

    ElastoGen: 4D Generative Elastodynamics

    Authors: Yutao Feng, Yintong Shang, Xiang Feng, Lei Lan, Shandian Zhe, Tianjia Shao, Hongzhi Wu, Kun Zhou, Hao Su, Chenfanfu Jiang, Yin Yang

    Abstract: We present ElastoGen, a knowledge-driven model that generates physically accurate and coherent 4D elastodynamics. Instead of relying on petabyte-scale data-driven learning, ElastoGen leverages the principles of physics-in-the-loop and learns from established physical knowledge, such as partial differential equations and their numerical solutions. The core idea of ElastoGen is converting the global…

    Submitted 23 May, 2024; originally announced May 2024.

  46. arXiv:2405.11430  [pdf, other]

    cs.CL

    MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

    Authors: Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

    Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this draws into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and fo…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP

  47. arXiv:2405.10989  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    Learnable Privacy Neurons Localization in Language Models

    Authors: Ruizhe Chen, Tianxiang Hu, Yang Feng, Zuozhu Liu

    Abstract: Concerns regarding Large Language Models (LLMs) to memorize and disclose private information, particularly Personally Identifiable Information (PII), become prominent within the community. Many efforts have been made to mitigate the privacy risks. However, the mechanism through which LLMs memorize PII remains poorly understood. To bridge this gap, we introduce a pioneering method for pinpointing P…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ACL 2024 main conference

  48. arXiv:2405.10626  [pdf, other]

    cs.CL

    Dynamic data sampler for cross-language transfer learning in large language models

    Authors: Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou

    Abstract: Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpus and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by ICASSP 2024

  49. arXiv:2405.08245  [pdf]

    cs.CV cs.AI

    Progressive enhancement and restoration for mural images under low-light and defected conditions based on multi-receptive field strategy

    Authors: Xiameng Wei, Binbin Fan, Ying Wang, Yanxiang Feng, Laiyi Fu

    Abstract: Ancient murals are valuable cultural heritage with great archaeological value. They provide insights into ancient religions, ceremonies, folklore, among other things through their content. However, due to long-term oxidation and inadequate protection, ancient murals have suffered continuous damage, including peeling and mold etc. Additionally, since ancient murals were typically painted indoors, t…

    Submitted 13 May, 2024; originally announced May 2024.

  50. arXiv:2405.07908  [pdf, other]

    cs.RO

    Collaborative Planar Pushing of Polytopic Objects with Multiple Robots in Complex Scenes

    Authors: Zili Tang, Yuming Feng, Meng Guo

    Abstract: Pushing is a simple yet effective skill for robots to interact with and further change the environment. Related work has been mostly focused on utilizing it as a non-prehensile manipulation primitive for a robotic manipulator. However, it can also be beneficial for low-cost mobile robots that are not equipped with a manipulator. This work tackles the general problem of controlling a team of mobile…

    Submitted 1 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Robotics: Science and Systems (RSS) 2024. Videos are available at https://zilitang.github.io/Collaborative-Pushing