Showing 1–50 of 159 results for author: Duan, N

Searching in archive cs.
  1. arXiv:2406.01391  [pdf, other]

    astro-ph.IM cs.DL

    Knowledge Graph in Astronomical Research with Large Language Models: Quantifying Driving Forces in Interdisciplinary Scientific Discovery

    Authors: Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai

    Abstract: Identifying and predicting the factors that contribute to the success of interdisciplinary research is crucial for advancing scientific discovery. However, there is a lack of methods to quantify the integration of new ideas and technological advancements in astronomical research and how these new technologies drive further scientific breakthroughs. Large language models, with their ability to extr…

    Submitted 15 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: An interactive version of the knowledge graph is made publicly available at https://astrokg.github.io/. Accepted to IJCAI 2024 AI4Research Workshop. Comments are welcome

  2. arXiv:2404.07965  [pdf, other]

    cs.CL cs.AI

    Rho-1: Not All Tokens Are What You Need

    Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

    Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights,…

    Submitted 23 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: First two authors equal contribution

  3. arXiv:2404.03118  [pdf, other]

    cs.CV

    LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

    Authors: Gabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar, Anahita Bhiwandiwalla, Shao-Yen Tseng, Matthew Lyle Olson, Yaniv Gurwicz, Chenfei Wu, Nan Duan, Vasudev Lal

    Abstract: In the rapidly evolving landscape of artificial intelligence, multi-modal large language models are emerging as a significant area of interest. These models, which combine various forms of data input, are becoming increasingly popular. However, understanding their internal mechanisms remains a complex task. Numerous advancements have been made in the field of explainability tools and mechanisms, y…

    Submitted 24 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  4. arXiv:2404.01067  [pdf, other]

    cs.CL

    Exploring the Mystery of Influential Data for Mathematical Reasoning

    Authors: Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

    Abstract: Selecting influential data for fine-tuning on downstream tasks is a key factor for both performance and computation efficiency. Recent works have shown that training with only limited data can show a superior performance on general tasks. However, the feasibility on mathematical reasoning tasks has not been validated. To go further, there exist two open questions for mathematical reasoning: how to…

    Submitted 1 April, 2024; originally announced April 2024.

  5. arXiv:2403.03788  [pdf, other]

    cs.CL

    PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

    Authors: Zekai Zhang, Yiduo Guo, Yaobo Liang, Dongyan Zhao, Nan Duan

    Abstract: The growing dependence on Large Language Models (LLMs) for finishing user instructions necessitates a comprehensive understanding of their robustness to complex task completion in real-world situations. To address this critical need, we propose the PowerPoint Task Completion Robustness benchmark (PPTC-R) to measure LLMs' robustness to the user PPT task instruction and software version. Specificall…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: LLM evaluation, Multi-turn, Multi-language, Multi-modal benchmark

  6. arXiv:2403.02333  [pdf, other]

    cs.CL cs.AI

    Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning

    Authors: Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, Weizhu Chen

    Abstract: Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets. Addressing this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from…

    Submitted 7 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: In progress

  7. arXiv:2402.10534  [pdf, other]

    cs.CV

    Using Left and Right Brains Together: Towards Vision and Language Planning

    Authors: Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang

    Abstract: Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision making capabilities on a variety of tasks. However, they inherently operate planning within the language space, lacking the vision and spatial imagination ability. In contrast, humans utilize both left and right hemispheres of the brain for language and visual planning during the thinking pro…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 19 pages, 13 figures

  8. arXiv:2401.17093  [pdf, other]

    cs.CV cs.CL

    StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

    Authors: Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

    Abstract: To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natura…

    Submitted 30 January, 2024; originally announced January 2024.

  9. arXiv:2401.09454  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Voila-A: Aligning Vision-Language Models with User's Gaze Attention

    Authors: Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

    Abstract: In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper…

    Submitted 22 December, 2023; originally announced January 2024.

  10. arXiv:2401.07663  [pdf, other]

    cs.SE

    Selene: Pioneering Automated Proof in Software Verification

    Authors: Lichen Zhang, Shuai Lu, Nan Duan

    Abstract: Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof be…

    Submitted 5 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  11. arXiv:2312.02143  [pdf, other]

    cs.CL cs.AI

    Competition-Level Problems are Effective LLM Evaluators

    Authors: Yiming Huang, Zhenghao Lin, Xiao Liu, Yeyun Gong, Shuai Lu, Fangyu Lei, Yaobo Liang, Yelong Shen, Chen Lin, Nan Duan, Weizhu Chen

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and the potential data contamination problem recently. This paper aims to evaluate the reasoning capacities of LLMs, specifically in solving recent competition-level programming problems in Codeforces, which are expert-crafted and unique, requiring deep understanding…

    Submitted 4 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  12. arXiv:2311.01767  [pdf, other]

    cs.CL

    PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion

    Authors: Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, Nan Duan

    Abstract: Recent evaluations of Large Language Models (LLMs) have centered around testing their zero-shot/few-shot capabilities for basic natural language tasks and their ability to translate instructions into tool APIs. However, the evaluation of LLMs utilizing complex tools to finish multi-turn, multi-modal instructions in a complex multi-modal environment has not been investigated. To address this gap, w…

    Submitted 7 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: LLM evaluation, PPT task completion

  13. arXiv:2310.08185  [pdf, other]

    cs.CL cs.AI

    EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

    Authors: Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

    Abstract: Plan-and-Write is a common hierarchical approach in long-form narrative text generation, which first creates a plan to guide the narrative writing. Following this approach, several studies rely on simply prompting large language models for planning, which often yields suboptimal results. In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narr…

    Submitted 12 October, 2023; originally announced October 2023.

  14. arXiv:2309.17452  [pdf, other]

    cs.CL cs.AI

    ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

    Authors: Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

    Abstract: Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers)…

    Submitted 21 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ICLR 2024; First two authors equal contribution

  15. arXiv:2309.17272  [pdf, other]

    cs.CL cs.AI cs.SE

    Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency

    Authors: Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, Nan Duan

    Abstract: Large language models (LLMs) have exhibited remarkable ability in code generation. However, generating the correct solution in a single attempt still remains a challenge. Prior works utilize verification properties in software engineering to verify and re-rank solutions in a majority voting manner. But the assumption behind them that generated verification properties have better qualities than sol…

    Submitted 20 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Preprint version

  16. arXiv:2309.09506  [pdf, other]

    cs.CV cs.CL

    LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

    Authors: Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan

    Abstract: Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception. Existing methods primarily treat layout generation as a numerical optimization task, focusing on quantitative aspects while overlooking the semantic information of layout, such as the relationship between each layout element. In this paper, we propose LayoutNUWA, the first m…

    Submitted 19 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

  17. arXiv:2308.13785  [pdf, other]

    cs.CV

    ORES: Open-vocabulary Responsible Visual Synthesis

    Authors: Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

    Abstract: Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and usage scenarios. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avo…

    Submitted 26 August, 2023; originally announced August 2023.

  18. arXiv:2308.10032  [pdf, other]

    cs.CL

    GameEval: Evaluating LLMs on Conversational Games

    Authors: Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

    Abstract: The rapid advancements in large language models (LLMs) have presented challenges in evaluating those models. Existing evaluation methods are either reference-based or preference-based, which inevitably need human intervention or introduce test bias caused by evaluator models. In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcomin…

    Submitted 19 August, 2023; originally announced August 2023.

  19. arXiv:2308.08089  [pdf, other]

    cs.CV

    DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

    Authors: Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan

    Abstract: Controllable video generation has gained significant attention in recent years. However, two main limitations persist: Firstly, most existing works focus on either text, image, or trajectory-based control, leading to an inability to achieve fine-grained control in videos. Secondly, trajectory control research is still in its early stages, with most experiments being conducted on simple datasets li…

    Submitted 15 August, 2023; originally announced August 2023.

  20. arXiv:2306.15604  [pdf, other]

    cs.CL cs.SE

    Constructing Multilingual Code Search Dataset Using Neural Machine Translation

    Authors: Ryo Sekizawa, Nan Duan, Shuai Lu, Hitomi Yanaka

    Abstract: Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only in English. In this research, we create a multilingual code search dataset in four natural and four programming languages using a neural machine translation mo…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings of the ACL2023 Student Research Workshop (SRW)

  21. arXiv:2306.15255  [pdf, other]

    cs.CV cs.CL

    GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

    Authors: Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou

    Abstract: In this report, we present our champion solution for Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2023. Essentially, to accurately ground in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 4 tables, the champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023

  22. arXiv:2306.14893  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    LongCoder: A Long-Range Pre-trained Language Model for Code Completion

    Authors: Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley

    Abstract: In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens - bridge tokens and memory tokens - to improve performance and efficiency. Bridge tokens are inserted…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  23. arXiv:2306.09212  [pdf, other]

    cs.CL

    CMMLU: Measuring massive multitask language understanding in Chinese

    Authors: Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin

    Abstract: As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging. This paper aims to bridge this gap by introducing CMMLU, a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities. We conduct a thorough evaluation of 18 advanced multilingu…

    Submitted 17 January, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  24. arXiv:2306.00103  [pdf, other]

    cs.CV cs.CL cs.LG

    ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

    Authors: Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

    Abstract: Two-Tower Vision-Language (VL) models have shown promising improvements on various downstream VL tasks. Although the most advanced work improves performance by building bridges between encoders, it suffers from ineffective layer-by-layer utilization of uni-modal representations and cannot flexibly exploit different levels of uni-modal semantic knowledge. In this work, we propose ManagerTower, a no…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 Main Conference, Oral

  25. arXiv:2305.15294  [pdf, other]

    cs.CL

    Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

    Authors: Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen

    Abstract: Large language models are powerful text processors and reasoners, but are still subject to limitations including outdated knowledge and hallucinations, which necessitates connecting them to the world. Retrieval-augmented large language models have raised extensive attention for grounding model generation on external knowledge. However, retrievers struggle to capture relevance, especially for queri…

    Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Preprint

  26. arXiv:2305.14766  [pdf, other]

    cs.CL

    Allies: Prompting Large Language Model with Beam Search

    Authors: Hao Sun, Xiao Liu, Yeyun Gong, Yan Zhang, Daxin Jiang, Linjun Yang, Nan Duan

    Abstract: With the advance of large language models (LLMs), the research field of LLM applications has become more and more popular, and the idea of constructing pipelines to accomplish complex tasks by stacking LLM API calls has come true. However, this kind of method faces two limitations: narrow information coverage and low fault tolerance. In this work, we propose a novel method called ALLIES. Given an input qu…

    Submitted 19 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP2023

  27. arXiv:2305.14283  [pdf, other]

    cs.CL

    Query Rewriting for Retrieval-Augmented Large Language Models

    Authors: Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

    Abstract: Large Language Models (LLMs) play powerful, black-box readers in the retrieve-then-read pipeline, making remarkable progress in knowledge-intensive tasks. This work introduces a new framework, Rewrite-Retrieve-Read instead of the previous retrieve-then-read for the retrieval-augmented LLMs from the perspective of the query rewriting. Unlike prior studies focusing on adapting either the retriever o…

    Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP2023

  28. arXiv:2305.13071  [pdf, other]

    cs.CL

    Machine-Created Universal Language for Cross-lingual Transfer

    Authors: Yaobo Liang, Quanzhi Zhu, Junhe Zhao, Nan Duan

    Abstract: There are two primary approaches to addressing cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of various languages, and translate-test, which explicitly translates different languages into an intermediate language, such as English. Translate-test offers better interpretability compared to multilingual pre-training. However, it has lower perfor…

    Submitted 16 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Journal ref: AAAI 2024

  29. arXiv:2305.11738  [pdf, other]

    cs.CL cs.AI

    CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

    Authors: Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

    Abstract: Recent developments in large language models (LLMs) have been impressive. However, these models sometimes show inconsistencies and problematic behavior, such as hallucinating facts, generating flawed code, or creating offensive and toxic content. Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checkin…

    Submitted 21 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: ICLR 2024

  30. arXiv:2305.09515  [pdf, other]

    cs.CL

    AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

    Authors: Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen

    Abstract: Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency in comparison to images, and the majority of existing language models are trained…

    Submitted 13 December, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted by NIPS 2023

  31. arXiv:2305.06647  [pdf, other]

    cs.CL

    PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization

    Authors: Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

    Abstract: Based on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful by improving the factuality, stability, and overall performance. This work proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on n-grams, which can be applied to zero-shot summarization with pre-training. PROM adds an indicator layer to…

    Submitted 28 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted by COLING2024

  32. arXiv:2305.05383  [pdf, other]

    cs.PL cs.AI cs.CL cs.SE

    Code Execution with Pre-trained Language Models

    Authors: Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan

    Abstract: Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, we investigate how well pre-trained models can understand and perform code execution. We develop a mutation-based data augmentati…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to the Findings of ACL 2023

  33. arXiv:2304.11657  [pdf, other]

    cs.CL cs.AI

    Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models

    Authors: Jiashuo Sun, Yi Luo, Yeyun Gong, Chen Lin, Yelong Shen, Jian Guo, Nan Duan

    Abstract: Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations. However, the reasoning chains of demonstrations generated by LLMs are prone to errors, which can subsequently lead to incorrect reasoning during inference. Furthermore, inappropriate exemplars (overly simplistic or comple…

    Submitted 15 March, 2024; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted by NAACL 2024 Findings

  34. arXiv:2304.10464  [pdf, other]

    cs.CL

    Learning to Plan with Natural Language

    Authors: Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan

    Abstract: Large Language Models (LLMs) have shown remarkable performance in various basic natural language tasks. For completing the complex task, we still need a plan for the task to guide LLMs to generate the specific solutions step by step. LLMs can directly generate task plans, but these plans may still contain factual errors or are incomplete. A high-quality task plan contains correct step-by-step solu…

    Submitted 12 December, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Large Language Model, Learning from feedback, Planning and Reasoning

  35. arXiv:2304.08103  [pdf, other]

    cs.CL cs.HC

    Low-code LLM: Graphical User Interface over Large Language Models

    Authors: Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

    Abstract: Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. Through visual interaction with a graphica…

    Submitted 1 April, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: Accepted as a Demo Track paper at NAACL 2024

  36. arXiv:2304.06364  [pdf, other]

    cs.CL cs.AI

    AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

    Authors: Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan

    Abstract: Evaluating the general abilities of foundation models to tackle human-level tasks is a vital aspect of their development and application in the pursuit of Artificial General Intelligence (AGI). Traditional benchmarks, which rely on artificial datasets, may not accurately represent human-level capabilities. In this paper, we introduce AGIEval, a novel benchmark specifically designed to assess found…

    Submitted 18 September, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 19 pages

  37. arXiv:2304.01196  [pdf, other]

    cs.CL cs.AI

    Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data

    Authors: Canwen Xu, Daya Guo, Nan Duan, Julian McAuley

    Abstract: Chat models, such as ChatGPT, have shown impressive capabilities and have been rapidly adopted across numerous domains. However, these models are only accessible through a restricted API, creating barriers for new research and progress in the field. We propose a pipeline that can automatically generate a high-quality multi-turn chat corpus by leveraging ChatGPT to engage in a conversation with its…

    Submitted 2 December, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Baize v2; EMNLP 2023

  38. arXiv:2303.16854  [pdf, other]

    cs.CL

    AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

    Authors: Xingwei He, Zhenghao Lin, Yeyun Gong, A-Long Jin, Hang Zhang, Chen Lin, Jian Jiao, Siu Ming Yiu, Nan Duan, Weizhu Chen

    Abstract: Many natural language processing (NLP) tasks rely on labeled data to train machine learning models with high performance. However, data annotation is time-consuming and expensive, especially when the task involves a large amount of data or requires specialized domains. Recently, GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. In this pape…

    Submitted 5 April, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to NAACL 2024

  39. arXiv:2303.16434  [pdf, other]

    cs.AI cs.CL

    TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    Authors: Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

    Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still…

    Submitted 28 March, 2023; originally announced March 2023.

  40. arXiv:2303.12346  [pdf, other]

    cs.CV cs.AI

    NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

    Authors: Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

    Abstract: In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation. Most current work generates long videos segment by segment sequentially, which normally leads to the gap between training on short videos and inferring long videos, and the sequential generation is inefficient. Instead, our approach adopts a "coarse-to-fine" process, in which the…

    Submitted 22 March, 2023; originally announced March 2023.

  41. arXiv:2303.04671  [pdf, other]

    cs.CV

    Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    Authors: Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

    Abstract: ChatGPT is attracting a cross-field interest as it provides a language interface with remarkable conversational competency and reasoning capabilities across many domains. However, since ChatGPT is trained with languages, it is currently not capable of processing or generating images from the visual world. At the same time, Visual Foundation Models, such as Visual Transformers or Stable Diffusion,…

    Submitted 8 March, 2023; originally announced March 2023.

  42. arXiv:2302.10781  [pdf, other]

    cs.CV

    Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

    Authors: Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

    Abstract: 3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: 10 pages, 7 figures

  43. arXiv:2302.01626  [pdf, other]

    cs.CL cs.IR

    Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval

    Authors: Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

    Abstract: Recently multi-lingual pre-trained language models (PLM) such as mBERT and XLM-R have achieved impressive strides in cross-lingual dense retrieval. Despite its successes, they are general-purpose PLM while the multilingual PLM tailored for cross-lingual retrieval is still unexplored. Motivated by an observation that the sentences in parallel documents are approximately in the same order, which is…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: Published at ICLR 2023

  44. arXiv:2302.00618  [pdf, other]

    cs.CL

    Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models

    Authors: Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen

    Abstract: Large language models can perform various reasoning tasks by using chain-of-thought prompting, which guides them to find answers through step-by-step demonstrations. However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly. We introduce Synthetic prompting, a method that leverages a few handcrafted examples to prompt the mod…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: Preprint

  45. arXiv:2212.11685  [pdf, other]

    cs.CL cs.LG

    Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

    Authors: Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, Weizhu Chen

    Abstract: In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE. GENIE is a large-scale pretrained diffusion language model that consists of an encoder and a diffusion-based decoder, which can generate text by gradually transforming a random noise sequence into a coherent text sequence. To pre-train GENIE on a large-scale language corpus…

    Submitted 17 February, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Previous version title -> GENIE: Large Scale Pre-training for Text Generation with Diffusion Model

  46. arXiv:2212.09114  [pdf, other]

    cs.CL

    CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion

    Authors: Xingwei He, Yeyun Gong, A-Long Jin, Hang Zhang, Anlei Dong, Jian Jiao, Siu Ming Yiu, Nan Duan

    Abstract: The dual-encoder has become the de facto architecture for dense retrieval. Typically, it computes the latent representations of the query and document independently, thus failing to fully capture the interactions between the query and document. To alleviate this, recent research has focused on obtaining query-informed document representations. During training, it expands the document with a real q…

    Submitted 29 October, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted to EMNLP 2023

  47. arXiv:2212.07841  [pdf, other]

    cs.CL cs.IR

    MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers

    Authors: Kun Zhou, Xiao Liu, Yeyun Gong, Wayne Xin Zhao, Daxin Jiang, Nan Duan, Ji-Rong Wen

    Abstract: Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval methods for parameter initialization, and recent studies are exploring more effective pre-training tasks for further improving the quality of dense vectors. Although various novel and effective tasks have been proposed, their different input formats and learning objectives make them hard to be integrated for jo…

    Submitted 19 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted by ECML-PKDD 2023, 16 pages

  48. arXiv:2212.05225  [pdf, other]

    cs.IR cs.CL

    LEAD: Liberal Feature-based Distillation for Dense Retrieval

    Authors: Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

    Abstract: Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional methods include response-based methods and feature-based methods. Response-based methods are widely used but suffer from lower upper limits of performance due to their ignorance of intermediate signals, while feature-based methods have constraints on vocabularies,…

    Submitted 11 December, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: Accepted by WSDM 2024

  49. arXiv:2211.15518  [pdf, other]

    cs.CV

    ReCo: Region-Controlled Text-to-Image Generation

    Authors: Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

    Abstract: Recently, large-scale text-to-image (T2I) models have shown impressive performance in generating high-fidelity images, but with limited controllability, e.g., precisely specifying the content in a specific region with a free-form text description. In this paper, we propose an effective technique for such regional control in T2I generation. We augment T2I models' inputs with an extra set of positio…

    Submitted 23 November, 2022; originally announced November 2022.

  50. arXiv:2211.15395  [pdf, other]

    cs.CL cs.LG

    CodeExp: Explanatory Code Document Generation

    Authors: Haotian Cui, Chenglong Wang, Junjie Huang, Jeevana Priya Inala, Todd Mytkowicz, Bo Wang, Jianfeng Gao, Nan Duan

    Abstract: Developing models that can automatically generate detailed code explanation can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code that do not capture implementation-level choices essential for these scenarios. To fill in this gap, we propose the code explanation generation task. We first…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted in Findings of EMNLP 2022

    ACM Class: I.2.2; I.2.7