Showing 1–50 of 78 results for author: Fei, H

Searching in archive cs.
  1. arXiv:2406.19389  [pdf, other]

    cs.CV

    OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

    Authors: Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan

    Abstract: Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal models exhibit powerful vision-based conversation and reasoning capabilities but lack pixel-level understanding and have difficulty accepting visual p…

    Submitted 27 June, 2024; originally announced June 2024.

  2. Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

    Authors: Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

    Abstract: While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning, under-modeling of temporal dynamics, detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-tempo…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE TPAMI 2024

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  3. arXiv:2406.15177  [pdf, other]

    cs.MM

    EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot

    Authors: Hao Fei, Han Zhang, Bin Wang, Lizi Liao, Qian Liu, Erik Cambria

    Abstract: This paper introduces EmpathyEar, a pioneering open-source, avatar-based multimodal empathetic chatbot, to fill the gap in traditional text-only empathetic response generation (ERG) systems. Leveraging the advancements of a large language model, combined with multimodal encoders and generators, EmpathyEar supports user inputs in any combination of text, sound, and vision, and produces multimodal e…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Demonstration Paper

  4. arXiv:2406.05127  [pdf, other]

    cs.CV

    Towards Semantic Equivalence of Tokenization in Multimodal LLM

    Authors: Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in processing vision-language tasks. One crux of MLLMs lies in vision tokenization, which involves efficiently transforming input visual signals into feature representations that are most beneficial for LLMs. However, existing vision tokenizers, essential for semantic alignment between vision and language, r…

    Submitted 27 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Technical Report. The project page: https://chocowu.github.io/SeTok-web/

  5. arXiv:2406.03701  [pdf, other]

    cs.MM

    Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction

    Authors: Meishan Zhang, Hao Fei, Bin Wang, Shengqiong Wu, Yixin Cao, Fei Li, Min Zhang

    Abstract: In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this work for the first time introduces the concept of grounded Multimodal Universal Information Extraction (MUIE), providing a unified task framework to…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  6. arXiv:2405.18357  [pdf, other]

    cs.CL

    Faithful Logical Reasoning via Symbolic Chain-of-Thought

    Authors: Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, Wynne Hsu

    Abstract: While the recent Chain-of-Thought (CoT) technique enhances the reasoning ability of large language models (LLMs) with the theory of mind, it may still struggle with logical reasoning that relies heavily on symbolic expressions and rigid deduction rules. To strengthen the logical reasoning capability of LLMs, we propose a novel Symbolic Chain-of-Thought, namely SymbCoT, a fully LLM-based frame…

    Submitted 11 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 (main proceeding)

  7. arXiv:2405.16759  [pdf, other]

    cs.CV cs.LG

    Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

    Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

    Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm…

    Submitted 26 May, 2024; originally announced May 2024.

  8. arXiv:2405.15452  [pdf, other]

    cs.CL cs.AI cs.LG

    Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top

    Authors: Keyuan Cheng, Muhammad Asif Ali, Shu Yang, Gang Lin, Yuxuan Zhai, Haoyang Fei, Ke Xu, Lu Yu, Lijie Hu, Di Wang

    Abstract: Multi-hop Question Answering (MQA) under knowledge editing (KE) is a key challenge in Large Language Models (LLMs). While best-performing solutions in this domain use a plan-and-solve paradigm to split a question into sub-questions followed by response generation, we claim that this approach is sub-optimal as it fails for hard-to-decompose questions, and it does not explicitly cater to correlated…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 18 pages

  9. arXiv:2405.12564  [pdf, other]

    q-bio.QM cs.CL cs.MM

    ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

    Authors: Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

    Abstract: Language Models (LMs) excel in understanding textual descriptions of proteins, as evidenced by biomedical question-answering tasks. However, their capability falters with raw protein data, such as amino acid sequences, due to a lack of pretraining on such data. Conversely, Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to pro…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: ACL 2024, 9 pages

  10. arXiv:2404.00492  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-hop Question Answering under Temporal Knowledge Editing

    Authors: Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan Zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang

    Abstract: Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. However, existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. To address this limitation, we propose a novel framework, namely TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 23 pages

  11. arXiv:2403.15776  [pdf, other]

    cs.CL cs.AI

    Modeling Unified Semantic Discourse Structure for High-quality Headline Generation

    Authors: Minghui Xu, Hao Fei, Fei Li, Shengqiong Wu, Rui Sun, Chong Teng, Donghong Ji

    Abstract: Headline generation aims to summarize a long document with a short, catchy title that reflects the main idea. This requires accurately capturing the core document semantics, which is challenging due to the lengthy and background information-rich nature of the texts. In this work, we propose using a unified semantic discourse structure (S3) to represent document semantics, achieved by combining do…

    Submitted 23 March, 2024; originally announced March 2024.

  12. arXiv:2403.03605  [pdf, other]

    cs.CE

    Multi-time-step coupling of peridynamics and classical continuum mechanics for dynamic brittle fracture

    Authors: Zhong Jiandong, Han Fei, Du Zongliang, Guo Xu

    Abstract: Peridynamics (PD), as a nonlocal theory, is well-suited for solving problems with discontinuities, such as cracks. However, the nonlocal effect of peridynamics makes it computationally expensive for dynamic fracture problems in large-scale engineering applications. As an alternative, this study proposes a multi-time-step (MTS) coupling model of PD and classical continuum mechanics (CCM) based on t…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 36 pages, 17 figures, 81 conferences

  13. arXiv:2403.03486  [pdf, other]

    cs.CR

    PhenoAuth: A Novel PUF-Phenotype-based Authentication Protocol for IoT Devices

    Authors: Hongming Fei, Owen Millwood, Gope Prosanta, Jack Miskelly, Biplab Sikdar

    Abstract: Physical Unclonable Functions (PUFs) have been shown to be a highly promising solution for enabling high-security systems tailored for low-power devices. Commonly, PUFs are utilised to generate cryptographic keys on-the-fly, replacing the need to store keys in vulnerable, non-volatile memories. Due to the physical nature of PUFs, environmental variations cause noise, manifesting themselves as erro…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

    MSC Class: 68M25

    ACM Class: I.2.8

  14. arXiv:2403.00464  [pdf, other]

    cs.CR cs.AR

    Attacking Delay-based PUFs with Minimal Adversary Model

    Authors: Hongming Fei, Owen Millwood, Prosanta Gope, Jack Miskelly, Biplab Sikdar

    Abstract: Physically Unclonable Functions (PUFs) provide a streamlined solution for lightweight device authentication. Delay-based Arbiter PUFs, with their ease of implementation and vast challenge space, have received significant attention; however, they are not immune to modelling attacks that exploit correlations between their inputs and outputs. Research is therefore polarized between developing modelli…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures, journal

    MSC Class: 68M25

    ACM Class: I.2.8

  15. arXiv:2402.11435  [pdf, other]

    cs.CV

    Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

    Authors: Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang

    Abstract: Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and handling text-based tasks. Many efforts are being made to transfer these attributes to the video modality, yielding models termed Video-LLMs. However, existing Video-LLMs can only capture the coarse-grained semantics and are unable to effectively handle tasks related to comprehension or localization of specific video segmen…

    Submitted 2 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  16. arXiv:2402.01182  [pdf, other]

    cs.CL

    In-Context Learning for Few-Shot Nested Named Entity Recognition

    Authors: Meishan Zhang, Bin Wang, Hao Fei, Min Zhang

    Abstract: In nested named entity recognition (NER), entities are nested within each other and thus require more data annotation. This leads to the development of few-shot nested NER, where the prevalence of pretrained language models with in-context learning (ICL) offers promising solutions. In this work, we introduce an effective and innovative ICL framework for the setting of few-shot nested…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 5 figures

    Journal ref: ICASSP 2024

  17. arXiv:2401.15603  [pdf, other]

    cs.LG cs.SI

    Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction

    Authors: Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, Tat-Seng Chua

    Abstract: In recent years, spectral graph neural networks, characterized by polynomial filters, have garnered increasing attention and have achieved remarkable performance in tasks such as node classification. These models typically assume that the eigenvalues of the normalized Laplacian matrix are distinct from each other, thus expecting a polynomial filter to have a high fitting ability. However, this paper…

    Submitted 18 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI-24

  18. arXiv:2401.10404  [pdf, other]

    cs.CV

    Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

    Authors: Xin Yuan, Jinoo Baek, Keyang Xu, Omer Tov, Hongliang Fei

    Abstract: We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach that leverages the readily learned capacity of a pixel-level image diffusion model to capture spatial information for video generation. To accomplish this goal, we design an efficient architecture by inflating the weights of the text-to-image SR model into our video generation framework. Additionally, we i…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: WACV'24 workshop

  19. arXiv:2312.15291  [pdf, other]

    cs.CL

    Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought

    Authors: Li Zheng, Hao Fei, Fei Li, Bobo Li, Lizi Liao, Donghong Ji, Chong Teng

    Abstract: With the proliferation of dialogic data across the Internet, the Dialogue Commonsense Multi-choice Question Answering (DC-MCQ) task has emerged as a response to the challenge of comprehending user queries and intentions. Although prevailing methodologies exhibit effectiveness in addressing single-choice questions, they encounter difficulties in handling multi-choice queries due to the heightened i…

    Submitted 26 December, 2023; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by the 38th Annual AAAI Conference on Artificial Intelligence (AAAI'24, FEBRUARY 20-27, 2024, VANCOUVER, CANADA)

  20. arXiv:2312.11974  [pdf, other]

    cs.SD cs.HC eess.AS

    Ms-senet: Enhancing Speech Emotion Recognition Through Multi-scale Feature Fusion With Squeeze-and-excitation Blocks

    Authors: Mengbo Li, Yuanzhong Zheng, Dichucheng Li, Yulun Wu, Yaoxuan Wang, Haojun Fei

    Abstract: Speech Emotion Recognition (SER) has become a growing focus of research in human-computer interaction. Spatiotemporal features play a crucial role in SER, yet current research lacks comprehensive spatiotemporal feature learning. This paper addresses this gap with a novel approach: we employ a Convolutional Neural Network (CNN) with varying kernel sizes for spatial…

    Submitted 24 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

  21. arXiv:2311.18651  [pdf, other]

    cs.CV

    LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

    Authors: Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

    Abstract: Recent advances in Large Multimodal Models (LMMs) have enabled various applications in human-machine interaction. However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene. Existing works seek…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Project Page: https://ll3da.github.io/

  22. arXiv:2311.12890  [pdf, other]

    cs.CV

    De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

    Authors: Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks. Unlike end-to-end models that need task-specific data, it can perform visual processing and reasoning in an unsupervised manner. Current visual programming methods generate programs in a single pass for each task where the ability to evaluat…

    Submitted 25 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  23. arXiv:2310.12798  [pdf, other]

    cs.CL cs.MM

    MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

    Authors: Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

    Abstract: Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception, a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an…

    Submitted 18 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: EMNLP main conference. 9 pages

  24. arXiv:2309.17205  [pdf, other]

    cs.CV

    Towards Complex-query Referring Image Segmentation: A Novel Benchmark

    Authors: Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann

    Abstract: Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms. However, there has been a lack of research investigating how existing algorithms should be benchmarked with complex language queries, which include more informative descriptions of surrounding objects and backgrounds (e.g., "the black car." vs. "t…

    Submitted 29 September, 2023; originally announced September 2023.

  25. arXiv:2309.11368  [pdf, other]

    cs.RO cs.AI

    Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition

    Authors: Haolin Fei, Stefano Tedeschi, Yanpei Huang, Andrew Kennedy, Ziwei Wang

    Abstract: Human-robot collaboration has benefited users with higher efficiency in interactive tasks. Nevertheless, most collaborative schemes rely on complicated human-machine interfaces, which might lack the requisite intuitiveness compared with natural limb control. We also expect to understand human intent with low training data requirements. In response to these challenges, this paper introduces an…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2309.05519  [pdf, other]

    cs.AI cs.CL cs.LG

    NExT-GPT: Any-to-Any Multimodal LLM

    Authors: Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua

    Abstract: While Multimodal Large Language Models (MM-LLMs) have recently made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering conte…

    Submitted 25 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: ICML 2024 (Oral)

  27. arXiv:2308.13812  [pdf, other]

    cs.AI cs.CV

    Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

    Authors: Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua

    Abstract: Text-to-video (T2V) synthesis has gained increasing attention in the community, in which the recently emerged diffusion models (DMs) have promisingly shown stronger performance than the past approaches. While existing state-of-the-art DMs are competent to achieve high-resolution video generation, they may largely suffer from key limitations (e.g., action occurrence disorders, crude video motions)…

    Submitted 19 March, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: CVPR 2024

  28. arXiv:2308.10025  [pdf, other]

    cs.CL

    I3: Intent-Introspective Retrieval Conditioned on Instructions

    Authors: Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, Tat-Seng Chua, Siliang Tang

    Abstract: Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents. To address this challenge, in this work we leverage instructions to flexibly describe retrieval intents and introduce I3, a unified retrieval system that performs Intent-Introspective…

    Submitted 25 April, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted by SIGIR 2024

  29. arXiv:2308.05095  [pdf, other]

    cs.CV cs.AI

    LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

    Authors: Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua

    Abstract: In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it possible to generate rich kinds of novel photorealistic images. However, current models still face misalignment issues (e.g., problematic spatial relation understanding and numeration failure) in complex natural scenes, which impedes high-faithfulness text-to-image generation. Although recent efforts…

    Submitted 12 August, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  30. arXiv:2308.05081  [pdf, other]

    cs.CV cs.CL

    Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling

    Authors: Yu Zhao, Hao Fei, Yixin Cao, Bobo Li, Meishan Zhang, Jianguo Wei, Min Zhang, Tat-Seng Chua

    Abstract: Video Semantic Role Labeling (VidSRL) aims to detect the salient events from given videos, by recognizing the predicate-argument event structures and the interrelationships between events. While recent endeavors have put forth methods for VidSRL, they are mostly subject to two key drawbacks, including the lack of fine-grained spatial scene perception and the insufficient modeling of video tempo…

    Submitted 12 August, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  31. arXiv:2308.04502  [pdf, other]

    cs.CL

    Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition

    Authors: Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li

    Abstract: Enabling machines to understand human emotions in multimodal contexts under dialogue scenarios has been a hot research topic, formalized as the task of multimodal emotion analysis in conversation (MM-ERC). MM-ERC has received consistent attention in recent years, where a diverse range of methods has been proposed for securing better task performance. Most existing works treat MM-ERC as a standard m…

    Submitted 12 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  32. arXiv:2308.04498  [pdf, other]

    cs.CL

    DialogRE^C+: An Extension of DialogRE to Investigate How Much Coreference Helps Relation Extraction in Dialogs

    Authors: Yiyun Xiong, Mengwei Dai, Fei Li, Hao Fei, Bobo Li, Shengqiong Wu, Donghong Ji, Chong Teng

    Abstract: Dialogue relation extraction (DRE), which identifies the relations between argument pairs in dialogue text, suffers greatly from the frequent occurrence of personal pronouns, or entity and speaker coreference. This work introduces a new benchmark dataset DialogRE^C+, introducing coreference resolution into the DRE scenario. With the aid of high-quality coreference knowledge, the reasoning of argument r…

    Submitted 12 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by NLPCC 2023

  33. arXiv:2308.04424  [pdf, other]

    cs.CL

    A Bi-directional Multi-hop Inference Model for Joint Dialog Sentiment Classification and Act Recognition

    Authors: Li Zheng, Fei Li, Yuyang Chai, Chong Teng, Donghong Ji

    Abstract: The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and a…

    Submitted 12 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by NLPCC 2023

  34. arXiv:2308.01846  [pdf, other]

    cs.CL

    XNLP: An Interactive Demonstration System for Universal Structured NLP

    Authors: Hao Fei, Meishan Zhang, Min Zhang, Tat-Seng Chua

    Abstract: Structured Natural Language Processing (XNLP) is an important subset of NLP that entails understanding the underlying semantic or syntactic structure of texts, which serves as a foundational component for many downstream applications. Despite certain recent efforts to explore universal solutions for specific categories of XNLP tasks, a comprehensive and effective approach for unifying all XNLP tas…

    Submitted 21 June, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: ACL 2024 Demonstration Paper

  35. arXiv:2307.05174  [pdf, other]

    cs.CL

    Mao-Zedong At SemEval-2023 Task 4: Label Represention Multi-Head Attention Model With Contrastive Learning-Enhanced Nearest Neighbor Mechanism For Multi-Label Text Classification

    Authors: Che Zhang, **'an Liu, Zhenyang Xiao, Haojun Fei

    Abstract: The study of human values is essential in both practical and theoretical domains. With the development of computational linguistics, the creation of large-scale datasets has made it possible to automatically recognize human values accurately. SemEval 2023 Task 4 (Kiesel et al., 2023) provides a set of arguments and 20 types of human values that are implicitly expressed in each argument. In this paper,…

    Submitted 11 July, 2023; originally announced July 2023.

  36. arXiv:2306.03975  [pdf, other]

    cs.CL

    Revisiting Conversation Discourse for Dialogue Disentanglement

    Authors: Bobo Li, Hao Fei, Fei Li, Shengqiong Wu, Lizi Liao, Yinwei Wei, Tat-Seng Chua, Donghong Ji

    Abstract: Dialogue disentanglement aims to separate chronologically ordered utterances into several independent sessions. Conversation utterances are essentially organized and described by the underlying discourse, and thus dialogue disentanglement requires the full understanding and harnessing of the intrinsic discourse attribute. In this paper, we propose enhancing dialogue disentanglement by taking ful…

    Submitted 10 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: under review

  37. arXiv:2306.03974  [pdf, other]

    cs.CL

    TKDP: Threefold Knowledge-enriched Deep Prompt Tuning for Few-shot Named Entity Recognition

    Authors: Jiang Liu, Hao Fei, Fei Li, Jingye Li, Bobo Li, Liang Zhao, Chong Teng, Donghong Ji

    Abstract: Few-shot named entity recognition (NER) exploits limited annotated instances to identify named mentions. Effectively transferring the internal or external resources thus becomes the key to few-shot NER. While the existing prompt tuning methods have shown remarkable few-shot performances, they still fail to make full use of knowledge. In this work, we investigate the integration of rich knowledge t…

    Submitted 10 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: under review

  38. arXiv:2306.03969  [pdf, other]

    cs.CL

    ECQED: Emotion-Cause Quadruple Extraction in Dialogs

    Authors: Li Zheng, Donghong Ji, Fei Li, Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Chong Teng

    Abstract: The existing emotion-cause pair extraction (ECPE) task, unfortunately, ignores extracting the emotion type and cause type, although such fine-grained meta-information can be practically useful in real-world applications, e.g., chatbots and empathetic dialog generation. In addition, current ECPE is limited to single text pieces, neglecting studies at the dialog level that should hav…

    Submitted 10 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: under review

  39. arXiv:2305.12260  [pdf, other]

    cs.CV cs.CL

    Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment

    Authors: Shengqiong Wu, Hao Fei, Wei Ji, Tat-Seng Chua

    Abstract: Unpaired cross-lingual image captioning has long suffered from irrelevancy and disfluency issues, due to the inconsistencies of the semantic scene and syntax attributes during transfer. In this work, we propose to address the above problems by incorporating the scene graph (SG) structures and the syntactic constituency (SC) trees. Our captioner contains the semantic structure-guided image-to-pivot…

    Submitted 25 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  40. arXiv:2305.12258  [pdf, other]

    cs.CL

    Constructing Code-mixed Universal Dependency Forest for Unbiased Cross-lingual Relation Extraction

    Authors: Hao Fei, Meishan Zhang, Min Zhang, Tat-Seng Chua

    Abstract: Recent efforts on cross-lingual relation extraction (XRE) aggressively leverage the language-consistent structural features from the universal dependency (UD) resource, while they may largely suffer from biased transfer (e.g., either target-biased or source-biased) due to the inevitable linguistic disparity between languages. In this work, we investigate an unbiased UD-based XRE transfer by constr…

    Submitted 4 June, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  41. arXiv:2305.12256  [pdf, other]

    cs.CL

    Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination

    Authors: Hao Fei, Qian Liu, Meishan Zhang, Min Zhang, Tat-Seng Chua

    Abstract: In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text image pairs, and tested with only source-text inputs. First, we represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic unde…

    Submitted 25 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  42. arXiv:2305.11768

    cs.CV cs.CL

    Generating Visual Spatial Description via Holistic 3D Scene Understanding

    Authors: Yu Zhao, Hao Fei, Wei Ji, Jianguo Wei, Meishan Zhang, Min Zhang, Tat-Seng Chua

    Abstract: Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images. Existing VSD work merely models the 2D geometrical vision features, thus inevitably falling prey to the problem of skewed spatial understanding of target objects. In this work, we investigate the incorporation of 3D scene features for VSD. With an external 3D scene extrac…

    Submitted 25 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  43. arXiv:2305.11719

    cs.CV cs.CL

    Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

    Authors: Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua

    Abstract: Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation. To combat that, we propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input…

    Submitted 25 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  44. arXiv:2305.11255

    cs.CL

    Reasoning Implicit Sentiment with Chain-of-Thought Prompting

    Authors: Hao Fei, Bobo Li, Qian Liu, Lidong Bing, Fei Li, Tat-Seng Chua

    Abstract: While sentiment analysis systems try to determine the sentiment polarities of given targets based on the key opinion expressions in input texts, in implicit sentiment analysis (ISA) the opinion cues come in an implicit and obscure manner. Thus detecting implicit sentiment requires the common-sense and multi-hop reasoning ability to infer the latent intent of opinion. Inspired by the recent chain-o…

    Submitted 8 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Short Paper

  45. arXiv:2305.03741

    cs.LG cs.AI

    AmGCL: Feature Imputation of Attribute Missing Graph via Self-supervised Contrastive Learning

    Authors: Xiaochuan Zhang, Mengran Li, Ye Wang, Haojun Fei

    Abstract: Attribute graphs are ubiquitous in multimedia applications, and graph representation learning (GRL) has been successful in analyzing attribute graph data. However, incomplete graph data and missing node attributes can have a negative impact on media knowledge discovery. Existing methods for handling attribute missing graph have limited assumptions or fail to capture complex attribute-graph depende…

    Submitted 5 May, 2023; originally announced May 2023.

  46. arXiv:2305.01278

    cs.CV cs.AI cs.CL

    VPGTrans: Transfer Visual Prompt Generator across LLMs

    Authors: Ao Zhang, Hao Fei, Yuan Yao, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua

    Abstract: While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm. However, further tuning the VPG part of the MLLM still suffers from indispensable computational costs, i.e., requiring thousands of GPU h…

    Submitted 23 October, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: Project Website: https://vpgtrans.github.io; Code: https://github.com/VPGTrans/VPGTrans; NeurIPS 2023

  47. On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training

    Authors: Hao Fei, Tat-Seng Chua, Chenliang Li, Donghong Ji, Meishan Zhang, Yafeng Ren

    Abstract: Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind the social media texts or reviews, which has been a fundamental application to the real-world society. Since the early 2010s, ABSA has achieved extraordinarily high accuracy with various deep neural models. However, existing ABSA models with…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted in ACM Transactions on Information Systems

    Journal ref: [J]. ACM Transactions on Information Systems, 2022, 41(2): 1-32

  48. arXiv:2304.06248

    cs.CL

    LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

    Authors: Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Min Zhang, Tat-Seng Chua

    Abstract: Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential by the latest study, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, a type of effective feature which has been extensively utilized in IE community, should also be beneficial to UI…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2022 conference paper

  49. arXiv:2211.05705

    cs.CL

    DiaASQ : A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis

    Authors: Bobo Li, Hao Fei, Fei Li, Yuhan Wu, Jinsong Zhang, Shengqiong Wu, Jingye Li, Yijiang Liu, Lizi Liao, Tat-Seng Chua, Donghong Ji

    Abstract: The rapid development of aspect-based sentiment analysis (ABSA) within recent decades shows great potential for real-world society. The current ABSA works, however, are mostly limited to the scenario of a single text piece, leaving the study in dialogue contexts unexplored. To bridge the gap between fine-grained sentiment analysis and conversational opinion mining, in this work, we introduce a nov…

    Submitted 22 May, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: Accepted to Findings of ACL 2023

  50. arXiv:2210.16541

    cs.CL

    Entity-centered Cross-document Relation Extraction

    Authors: Fengqi Wang, Fei Li, Hao Fei, Jingye Li, Shengqiong Wu, Fangfang Su, Wenxuan Shi, Donghong Ji, Bo Cai

    Abstract: Relation Extraction (RE) is a fundamental task of information extraction, which has attracted a large amount of research attention. Previous studies focus on extracting the relations within a sentence or document, while currently researchers begin to explore cross-document RE. However, current cross-document RE methods directly utilize text snippets surrounding target entities in multiple given do…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: This paper was accepted by EMNLP 2022 conference