Skip to main content

Showing 1–50 of 149 results for author: Dou, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19853  [pdf, other

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou , et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.19760  [pdf, other

    cs.IR cs.CL

    Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation

    Authors: Chenlong Deng, Kelong Mao, Zhicheng Dou

    Abstract: Legal case retrieval for sourcing similar cases is critical in upholding judicial fairness. Different from general web search, legal case retrieval involves processing lengthy, complex, and highly specialized legal documents. Existing methods in this domain often overlook the incorporation of legal expert knowledge, which is crucial for accurately understanding and modeling legal cases, leading to… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.18676  [pdf, other

    cs.CL cs.AI cs.LG

    Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation

    Authors: Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen

    Abstract: Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the diverse LLMs' knowledge preferences inevitably poses an inevitable challenge in develo** a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align div… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Work in progress

  4. arXiv:2406.17988  [pdf, other

    cs.CV

    DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

    Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wen** Wang, Lingjie Liu

    Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 3 tables

  5. arXiv:2406.16332  [pdf, other

    cs.IR cs.CL

    DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task

    Authors: Wenhan Liu, Yutao Zhu, Zhicheng Dou

    Abstract: Recently, there has been increasing interest in applying large language models (LLMs) as zero-shot passage rankers. However, few studies have explored how to select appropriate in-context demonstrations for the passage ranking task, which is the focus of this paper. Previous studies mainly apply a demonstration retriever to retrieve demonstrations and use top-$k$ demonstrations for in-context lear… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16213  [pdf, other

    cs.LG

    Provable Statistical Rates for Consistency Diffusion Models

    Authors: Zehao Dou, Minshuo Chen, Mengdi Wang, Zhuoran Yang

    Abstract: Diffusion models have revolutionized various application domains, including computer vision and audio generation. Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved. In response, consistency models have been developed to merge multiple steps in the sampling process, thereby significantly boosting the s… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 28 pages, 2 figures

  7. arXiv:2406.12566  [pdf, other

    cs.CL

    RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation

    Authors: Shuting Wang, Xin Yu, Mang Wang, Weipeng Chen, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, it is prevalent that users issue broad, open-ended queries with diverse sub-intents, for which they desire rich and long-form answers covering multiple relevant asp… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  9. arXiv:2406.10367  [pdf, other

    cs.LG

    Disentangled Hyperbolic Representation Learning for Heterogeneous Graphs

    Authors: Qijie Bai, Changli Nie, Haiwei Zhang, Zhicheng Dou, Xiaojie Yuan

    Abstract: Heterogeneous graphs have attracted a lot of research interests recently due to the success for representing complex real-world systems. However, existing methods have two pain points in embedding them into low-dimensional spaces: the mixing of structural and semantic information, and the distributional mismatch between data and embedding spaces. These two challenges require representation methods… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  10. arXiv:2406.05654  [pdf, other

    cs.CL cs.IR

    DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

    Authors: Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in kee** up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, ye… ▽ More

    Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  11. arXiv:2406.01495  [pdf, other

    cs.CL

    Reflection-Reinforced Self-Training for Language Agents

    Authors: Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, Nanyun Peng

    Abstract: Self-training can potentially improve the performance of language agents without relying on demonstrations from humans or stronger models. The general process involves generating samples from a model, evaluating their quality, and updating the model by training on high-quality samples. However, self-training can face limitations because achieving good performance requires a good amount of high-qua… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  12. arXiv:2405.19670  [pdf, other

    cs.CL

    One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

    Authors: Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen

    Abstract: Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) for generating more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved information or directly fine-tune LLMs to adapt to RAG scenarios. Although fine-tuning can yield better performance, it often compromises the LLMs' general genera… ▽ More

    Submitted 8 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: working in progress, repo: https://github.com/DaoD/SPRING/

  13. arXiv:2405.19315  [pdf, other

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource… ▽ More

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  14. arXiv:2405.16888  [pdf, other

    cs.GR cs.CV

    Part123: Part-aware 3D Reconstruction from a Single-view Image

    Authors: Anran Liu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Zhiyang Dou, Hao-Xiang Guo, ** Luo, Wen** Wang

    Abstract: Recently, the emergence of diffusion models has opened up new opportunities for single-view reconstruction. However, all the existing methods represent the target object as a closed mesh devoid of any structural information, thus neglecting the part-based structure, which is crucial for many downstream applications, of the reconstructed shape. Moreover, the generated meshes usually suffer from lar… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGGRAPH 2024 (conference track),webpage: https://liuar0512.github.io/part123_official_page/

  15. arXiv:2405.16802  [pdf, other

    cs.CL cs.LG

    AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation

    Authors: Jianqiao Lu, Zhiyang Dou, Hongru Wang, Zeyu Cao, Jianbo Dai, Yingjia Wan, Yinya Huang, Zhijiang Guo

    Abstract: In this work, we propose a novel method named \textbf{Auto}mated Process Labeling via \textbf{C}onfidence \textbf{V}ariation (\textbf{\textsc{AutoCV}}) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic proce… ▽ More

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: 20 pages, 1 figure, 13 tables

  16. arXiv:2405.16635  [pdf, other

    cs.CL

    Compressing Lengthy Context With UltraGist

    Authors: Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

    Abstract: Compressing lengthy context is a critical but technically challenging problem. In this paper, we propose a new method called UltraGist, which is distinguished for its high-quality compression of lengthy context due to the innovative design of the compression and learning algorithm. UltraGist brings forth the following important benefits. Firstly, it notably contributes to the flexibility of compre… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  17. arXiv:2405.15318  [pdf, other

    cs.CL cs.AI

    Are Long-LLMs A Necessity For Long-Context Tasks?

    Authors: Hong** Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Yujia Zhou, Xu Chen, Zhicheng Dou

    Abstract: The learning and deployment of long-LLMs remains a challenging problem despite recent progresses. In this work, we argue that the long-LLMs are not a necessity to solve long-context tasks, as common long-context tasks are short-context solvable, i.e. they can be solved by purely working with oracle short-contexts within the long-context tasks' inputs. On top of this argument, we propose a framewor… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 18 pages

  18. arXiv:2405.13576  [pdf, other

    cs.CL cs.IR

    FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research

    Authors: Jiajie **, Yutao Zhu, Xinyu Yang, Chenghao Zhang, Zhicheng Dou

    Abstract: With the advent of Large Language Models (LLMs), the potential of Retrieval Augmented Generation (RAG) techniques have garnered considerable research attention. Numerous novel algorithms and models have been introduced to enhance various aspects of RAG systems. However, the absence of a standardized framework for implementation, coupled with the inherently intricate RAG process, makes it challengi… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 8 pages

  19. arXiv:2405.05001  [pdf, other

    cs.CV

    HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution

    Authors: Shu-Chuan Chu, Zhi-Chao Dou, Jeng-Shyang Pan, Shaowei Weng, Junbao Li

    Abstract: Transformer-based methods have demonstrated excellent performance on super-resolution visual tasks, surpassing conventional convolutional neural networks. However, existing work typically restricts self-attention computation to non-overlap** windows to save computational costs. This means that Transformer-based networks can only use input information from a limited spatial range. Therefore, a no… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 12 pages, 10 figures, conference

  20. arXiv:2404.19553  [pdf, other

    cs.CL

    Extending Llama-3's Context Ten-Fold Overnight

    Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hong** Qian, Qiwei Ye, Zhicheng Dou

    Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, which takes 8 hours on one 8xA800 (80G) GPU machine. The resulted model exhibits superior performances across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  21. arXiv:2404.17779  [pdf, other

    cs.CL

    Medical Vision-Language Pre-Training for Brain Abnormalities

    Authors: Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang

    Abstract: Vision-language models have become increasingly powerful for tasks that require an understanding of both visual and linguistic elements, bridging the gap between these modalities. In the context of multimodal clinical AI, there is a growing need for models that possess domain-specific knowledge, as existing models often lack the expertise required for medical applications. In this paper, we take b… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  22. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  23. arXiv:2404.14851  [pdf, other

    cs.IR cs.AI cs.CL

    From Matching to Generation: A Survey on Generative Information Retrieval

    Authors: Xiaoxi Li, Jiajie **, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou

    Abstract: Information Retrieval (IR) systems are crucial tools for users to access information, widely applied in scenarios like search engines, question answering, and recommendation systems. Traditional IR methods, based on similarity matching to return ranked lists of documents, have been reliable means of information acquisition, dominating the IR field for years. With the advancement of pre-trained lan… ▽ More

    Submitted 15 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  24. arXiv:2404.13874  [pdf, other

    cs.CL cs.CV

    VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

    Authors: Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng

    Abstract: Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucina… ▽ More

    Submitted 5 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Findings

  25. arXiv:2404.13556  [pdf, other

    cs.IR cs.CL

    ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval

    Authors: Kelong Mao, Chenlong Deng, Haonan Chen, Fengran Mo, Zheng Liu, Tetsuya Sakai, Zhicheng Dou

    Abstract: Conversational search requires accurate interpretation of user intent from complex multi-turn contexts. This paper presents ChatRetriever, which inherits the strong generalization capability of large language models to robustly represent complex conversational sessions for dense retrieval. To achieve this, we propose a simple and effective dual-learning approach that adapts LLM for retrieval via c… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  26. arXiv:2404.09790  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, **hua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  27. arXiv:2403.13307  [pdf, other

    cs.CV

    LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment

    Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yu**g Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma

    Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descript… ▽ More

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  28. An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models

    Authors: Qi Liu, Gang Guo, Jiaxin Mao, Zhicheng Dou, Ji-Rong Wen, Hao Jiang, Xinyu Zhang, Zhao Cao

    Abstract: With the development of pre-trained language models, the dense retrieval models have become promising alternatives to the traditional retrieval models that rely on exact match and sparse bag-of-words representations. Different from most dense retrieval models using a bi-encoder to encode each query or document into a dense vector, the recently proposed late-interaction multi-vector models (i.e., C… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM Transactions on Information Systems

  29. arXiv:2402.14690  [pdf, other

    cs.CL

    UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models

    Authors: Zhaoheng Huang, Zhicheng Dou, Yutao Zhu, Ji-rong Wen

    Abstract: Large language models (LLMs) may generate text that lacks consistency with human knowledge, leading to factual inaccuracies or \textit{hallucination}. Existing research for evaluating the factuality of LLMs involves extracting fact claims using an LLM and verifying them against a predefined fact source. However, these evaluation metrics are task-specific, and not scalable, and the substitutability… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: under review

  30. arXiv:2402.12774  [pdf, other

    cs.IR

    Interpreting Conversational Dense Retrieval by Rewriting-Enhanced Inversion of Session Embedding

    Authors: Yiruo Cheng, Kelong Mao, Zhicheng Dou

    Abstract: Conversational dense retrieval has shown to be effective in conversational search. However, a major limitation of conversational dense retrieval is their lack of interpretability, hindering intuitive understanding of model behaviors for targeted improvements. This paper presents CONVINV, a simple yet effective approach to shed light on interpretable conversational dense retrieval models. CONVINV t… ▽ More

    Submitted 1 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024. Repo: https://github.com/Ariya12138/ConvInv

  31. arXiv:2402.12174  [pdf, other

    cs.CL

    BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence

    Authors: Jiajie **, Yutao Zhu, Yujia Zhou, Zhicheng Dou

    Abstract: Retrieval-augmented large language models (LLMs) have demonstrated efficacy in knowledge-intensive tasks such as open-domain QA, addressing inherent challenges in knowledge update and factual inadequacy. However, inconsistencies between retrieval knowledge and the necessary knowledge for LLMs, leading to a decline in LLM's answer quality. This paper introduces BIDER, an approach that refines retri… ▽ More

    Submitted 30 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings

  32. arXiv:2402.12052  [pdf, other

    cs.CL

    Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs

    Authors: Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen

    Abstract: The integration of large language models (LLMs) and search engines represents a significant evolution in knowledge acquisition methodologies. However, determining the knowledge that an LLM already possesses and the knowledge that requires the help of a search engine remains an unresolved issue. Most existing methods solve this problem through the results of preliminary answers or reasoning done by… ▽ More

    Submitted 30 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 main conference. Repo: https://github.com/plageon/SlimPLM

  33. arXiv:2402.11626  [pdf, other

    cs.CL cs.IR

    Metacognitive Retrieval-Augmented Large Language Models

    Authors: Yujia Zhou, Zheng Liu, Jiajie **, Jian-Yun Nie, Zhicheng Dou

    Abstract: Retrieval-augmented generation have become central in natural language processing due to their efficacy in generating factual content. While traditional methods employ single-time retrieval, more recent approaches have shifted towards multi-time retrieval for multi-hop reasoning tasks. However, these strategies are bound by predefined reasoning steps, potentially leading to inaccuracies in respons… ▽ More

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW 2024

  34. arXiv:2402.10548  [pdf, other

    cs.IR

    Cognitive Personalized Search Integrating Large Language Models with an Efficient Memory Mechanism

    Authors: Yujia Zhou, Qiannan Zhu, Jiajie **, Zhicheng Dou

    Abstract: Traditional search engines usually provide identical search results for all users, overlooking individual preferences. To counter this limitation, personalized search has been developed to re-rank results based on user preferences derived from query logs. Deep learning-based personalized search methods have shown promise, but they rely heavily on abundant training data, making them susceptible to… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW 2024

  35. arXiv:2402.09760  [pdf, other

    cs.CL cs.AI cs.IR

    Grounding Language Model with Chunking-Free In-Context Retrieval

    Authors: Hong** Qian, Zheng Liu, Kelong Mao, Yujia Zhou, Zhicheng Dou

    Abstract: This paper presents a novel Chunking-Free In-Context (CFIC) retrieval approach, specifically tailored for Retrieval-Augmented Generation (RAG) systems. Traditional RAG systems often struggle with grounding responses using precise evidence text due to the challenges of processing lengthy documents and filtering out irrelevant content. Commonly employed solutions, such as document chunking and adapt… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  36. arXiv:2402.07092  [pdf, other

    cs.CL cs.IR

    Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation

    Authors: Haonan Chen, Zhicheng Dou, Kelong Mao, Jiongnan Liu, Ziliang Zhao

    Abstract: Conversational search utilizes muli-turn natural language contexts to retrieve relevant passages. Existing conversational dense retrieval models mostly view a conversation as a fixed sequence of questions and responses, overlooking the severe data sparsity problem -- that is, users can perform a conversation in various ways, and these alternate conversations are unrecorded. Consequently, they ofte… ▽ More

    Submitted 3 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  37. arXiv:2402.07076  [pdf, other

    cs.IR cs.AI

    Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training

    Authors: Haonan Chen, Zhicheng Dou, Xuetong Hao, Yunhao Tao, Shiren Song, Zhenli Sheng

    Abstract: Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. However, despite their widespread use, the task of identifying appropriate company customers for a specific target solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to… ▽ More

    Submitted 6 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: KDD 2024, ADS Track

  38. arXiv:2402.01176  [pdf, other

    cs.CL cs.IR

    CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks

    Authors: Xiaoxi Li, Zhicheng Dou, Yujia Zhou, Fangchao Liu

    Abstract: Large language models (LLMs) have gained significant attention in various fields but prone to hallucination, especially in knowledge-intensive (KI) tasks. To address this, retrieval-augmented generation (RAG) has emerged as a popular solution to enhance factual accuracy. However, traditional retrieval modules often rely on large document index and disconnect with generative tasks. With the advent… ▽ More

    Submitted 21 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  39. arXiv:2401.12946  [pdf, other

    cs.CV cs.CG cs.GR

    Coverage Axis++: Efficient Inner Point Selection for 3D Shape Skeletonization

    Authors: Zimeng Wang, Zhiyang Dou, Rui Xu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Shiqing Xin, Taku Komura, Xiaoming Yuan, Wen** Wang

    Abstract: We introduce Coverage Axis++, a novel and efficient approach to 3D shape skeletonization. The current state-of-the-art approaches for this task often rely on the watertightness of the input or suffer from substantial computational costs, thereby limiting their practicality. To address this challenge, Coverage Axis++ proposes a heuristic algorithm to select skeletal points, offering a high-accuracy… ▽ More

    Submitted 10 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: SGP2024. Project Page: https://frank-zy-dou.github.io/projects/CoverageAxis++/index.html

  40. arXiv:2401.12445  [pdf, other

    cs.IR

    Session-level Normalization and Click-through Data Enhancement for Session-based Evaluation

    Authors: Haonan Chen, Zhicheng Dou, Jiaxin Mao

    Abstract: Since a user usually has to issue a sequence of queries and examine multiple documents to resolve a complex information need in a search session, researchers have paid much attention to evaluating search systems at the session level rather than the single-query level. Most existing session-level metrics evaluate each query separately and then aggregate the query-level scores using a session-level… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

  41. arXiv:2401.08046  [pdf, other

    cs.CL cs.AI

    Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis

    Authors: Zhicheng Dou, Yuchen Guo, Ching-Chun Chang, Huy H. Nguyen, Isao Echizen

    Abstract: The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly impacted the academic and broader community. While these models offer numerous advantages in terms of revolutionizing work and study methods, they have also garnered significant attention due to their potential negative consequences. One example is generating academic… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

  42. arXiv:2401.06532  [pdf, other

    cs.CL cs.IR

    INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning

    Authors: Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zheng Liu, Ji-Rong Wen, Zhicheng Dou

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks. Despite this, their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language. While prompt-based methods can provide task descriptions to LLMs, they often fall short in facilitating a compr… ▽ More

    Submitted 28 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL 2024 main conference. Repo: https://github.com/DaoD/INTERS

  43. arXiv:2401.03462  [pdf, other

    cs.CL cs.AI

    Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

    Authors: Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

    Abstract: The utilization of long contexts poses a big challenge for LLMs due to their limited context window size. Although the context window can be extended through fine-tuning, it will result in a considerable cost at both training and inference time, and exert an unfavorable impact to the LLM's original capabilities. In this work, we propose a new method called Activation Beacon, which condenses LLM's… ▽ More

    Submitted 2 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  44. arXiv:2312.11036  [pdf, other

    cs.IR cs.AI cs.CL

    UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models

    Authors: Xiaoxi Li, Yujia Zhou, Zhicheng Dou

    Abstract: Generative information retrieval, encompassing two major tasks of Generative Document Retrieval (GDR) and Grounded Answer Generation (GAR), has gained significant attention in the area of information retrieval and natural language processing. Existing methods for GDR and GAR rely on separate retrieval and reader modules, which hinder simultaneous optimization. To overcome this, we present \textbf{… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

  45. arXiv:2312.05295  [pdf, other

    cs.CV

    Disentangled Clothed Avatar Generation from Text Descriptions

    Authors: Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Xin Li, Wen** Wang, Rong Xie, Li Song

    Abstract: In this paper, we introduced a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have yielded diverse human avatars from text prompts, these methods typically combine all elements-clothes, hair, and body-into a single 3D representation. Suc… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Project page: https://shanemankiw.github.io/SO-SMPL/

  46. arXiv:2312.03628  [pdf, other

    cs.CV

    Boosting Segment Anything Model Towards Open-Vocabulary Learning

    Authors: Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian

    Abstract: The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting. Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics. In this paper, we present Sambor to seamlessly integrate SAM with the open-vocabulary object… ▽ More

    Submitted 6 December, 2023; originally announced December 2023.

  47. arXiv:2312.02256  [pdf, other

    cs.CV cs.AI cs.GR

    EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation

    Authors: Wenyang Zhou, Zhiyang Dou, Zeyu Cao, Zhouyingcheng Liao, **gbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wen** Wang, Lingjie Liu

    Abstract: We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a la… ▽ More

    Submitted 14 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project Page: https://frank-zy-dou.github.io/projects/EMDM/index.html

  48. arXiv:2311.17135  [pdf, other

    cs.CV cs.GR

    TLControl: Trajectory and Language Control for Human Motion Synthesis

    Authors: Weilin Wan, Zhiyang Dou, Taku Komura, Wen** Wang, Dinesh Jayaraman, Lingjie Liu

    Abstract: Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion s… ▽ More

    Submitted 12 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  49. arXiv:2311.17050  [pdf, other

    cs.CV cs.GR

    Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

    Authors: Zhengming Yu, Zhiyang Dou, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wen** Wang

    Abstract: We present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Previous methods explored shape generation with different representations and they suffer from limited topologies and poor geometry details. To generate high-quality surfaces of arbitrary topologies, we use the Unsigned Distance Field (UDF) as our surface representa… ▽ More

    Submitted 22 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Page: https://yzmblog.github.io/projects/SurfD/

  50. arXiv:2311.01620  [pdf, other

    cs.CV cs.CL

    ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos

    Authors: Te-Lin Wu, Zi-Yi Dou, Qingyuan Hu, Yu Hou, Nischal Reddy Chandra, Marjorie Freedman, Ralph M. Weischedel, Nanyun Peng

    Abstract: Multimodal counterfactual reasoning is a vital yet challenging ability for AI systems. It involves predicting the outcomes of hypothetical circumstances based on vision and language inputs, which enables AI models to learn from failures and explore hypothetical scenarios. Despite its importance, there are only a few datasets targeting the counterfactual reasoning abilities of multimodal models. Am… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)