Showing 1–50 of 147 results for author: Shang, J

Searching in archive cs.
  1. arXiv:2407.01065  [pdf, other]

    cs.LG

    Improve ROI with Causal Learning and Conformal Prediction

    Authors: Meng Ai, Zhuo Chen, Jibin Wang, Jing Shang, Tao Tao, Zhen Li

    Abstract: In the commercial sphere, such as operations and maintenance, advertising, and marketing recommendations, intelligent decision-making utilizing data mining and neural network technologies is crucial, especially in resource allocation to optimize ROI. This study delves into the Cost-aware Binary Treatment Assignment Problem (C-BTAP) across different industries, with a focus on the state-of-the-art…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ICDE 2024; Link: https://icde2024.github.io/papers.html

  2. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.18312  [pdf, other]

    cs.CL cs.AI

    AI-native Memory: A Pathway from LLMs Towards AGI

    Authors: Jingbo Shang, Zai Zheng, Xiang Ying, Felix Tao, Mindverse Team

    Abstract: Large language models (LLMs) have demonstrated the world with the sparks of artificial general intelligence (AGI). One opinion, especially from some startups working on LLMs, argues that an LLM with nearly unlimited context length can realize AGI. However, they might be too optimistic about the long-context capability of (existing) LLMs -- (1) Recent literature has shown that their effective conte…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.13236  [pdf, other]

    cs.CL cs.AI

    Data Contamination Can Cross Language Barriers

    Authors: Feng Yao, Yufan Zhuang, Zihao Sun, Sunan Xu, Animesh Kumar, Jingbo Shang

    Abstract: The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingua…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  5. arXiv:2406.11115  [pdf, other]

    cs.CL

    Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

    Authors: Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang

    Abstract: For extremely weak-supervised text classification, pioneer research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot gene…

    Submitted 16 June, 2024; originally announced June 2024.

  6. arXiv:2406.06567  [pdf, other]

    cs.LG cs.AI cs.CL

    DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

    Authors: Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun

    Abstract: Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate subst…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures, 3 tables

  7. arXiv:2406.04460  [pdf, other]

    cs.CL

    Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs

    Authors: Shang Zhou, Feng Yao, Chengyu Dong, Zihan Wang, Jingbo Shang

    Abstract: Controlling the attribute intensity of text generation is crucial across scenarios (e.g., writing conciseness, chatting emotion, and explanation clarity). The remarkable capabilities of large language models (LLMs) have revolutionized text generation, prompting us to explore such smooth control of LLM generation. Specifically, we propose metrics to assess the range, calibration, and consist…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  8. arXiv:2406.00226  [pdf, other]

    cs.CL

    Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction

    Authors: William Hogan, Jingbo Shang

    Abstract: Recent research efforts have explored the potential of leveraging natural language inference (NLI) techniques to enhance relation extraction (RE). In this vein, we introduce MetaEntail-RE, a novel adaptation method that harnesses NLI principles to enhance RE performance. Our approach follows past works by verbalizing relation classes into class-indicative hypotheses, aligning a traditionally multi…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 17 pages, 1 figure

    ACM Class: I.2.7

  9. arXiv:2405.13397  [pdf, other]

    cs.CV

    Multi Player Tracking in Ice Hockey with Homographic Projections

    Authors: Harish Prakash, Jia Cheng Shang, Ken M. Nsiempba, Yuhao Chen, David A. Clausi, John S. Zelek

    Abstract: Multi Object Tracking (MOT) in ice hockey pursues the combined task of localizing and associating players across a given sequence to maintain their identities. Tracking players from monocular broadcast feeds is an important computer vision problem offering various downstream analytics and enhanced viewership experience. However, existing trackers encounter significant difficulties in dealing with…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted at the Conference on Robots and Vision (CRV), 2024

  10. arXiv:2405.07726  [pdf, other]

    cs.CL

    Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing

    Authors: Letian Peng, Jingbo Shang

    Abstract: Persona-driven role-playing (PRP) aims to build AI characters that can respond to user queries by faithfully sticking with all persona statements. Unfortunately, existing faithfulness criteria for PRP are limited to coarse-grained LLM-based scoring without a clear definition or formulation. This paper presents a pioneering exploration to quantify PRP faithfulness as a fine-grained and explainable…

    Submitted 13 May, 2024; originally announced May 2024.

  11. arXiv:2405.04086  [pdf, other]

    cs.CL

    Optimizing Language Model's Reasoning Abilities with Weak Supervision

    Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

    Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w…

    Submitted 7 May, 2024; originally announced May 2024.

  12. arXiv:2404.15889  [pdf, other]

    cs.CV cs.GR

    Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control

    Authors: Linzi Qu, Jiaxiang Shang, Hui Ye, Xiaoguang Han, Hongbo Fu

    Abstract: Geometry- and appearance-controlled full-body human image generation is an interesting but challenging task. Existing solutions are either unconditional or dependent on coarse conditions (e.g., pose, text), thus lacking explicit geometry and appearance control of body and garment. Sketching offers such editing ability and has been adopted in various sketch-based face generation and editing solutio…

    Submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.14372  [pdf, other]

    cs.CL cs.AI

    Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph

    Authors: Xiaochen Kev Gao, Feng Yao, Kewen Zhao, Beilei He, Animesh Kumar, Vish Krishnan, Jingbo Shang

    Abstract: Model scaling is becoming the default choice for many language tasks due to the success of large language models (LLMs). However, it can fall short in specific scenarios where simple customized methods excel. In this paper, we delve into the patent approval prediction task and unveil that simple domain-specific graph methods outperform enlarging the model, using the intrinsic dependencies within…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 Pages, Under Review

  14. arXiv:2404.10877  [pdf, other]

    cs.CL

    Incubating Text Classifiers Following User Instruction with Nothing but LLM

    Authors: Letian Peng, Jingbo Shang

    Abstract: In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a small text classifier without any human annotation or raw corpus. Compared with pioneer attempts, our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes (e.g., "TED Talk given by Educator" and "Other"). Spec…

    Submitted 20 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  15. arXiv:2404.07382  [pdf, other]

    cs.AI cs.LO

    Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

    Authors: Chenyang An, Zhibo Chen, Qihao Ye, Emily First, Letian Peng, Jiayun Zhang, Zihan Wang, Sorin Lerner, Jingbo Shang

    Abstract: Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its traini…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Submitted to ACL on Feb.15th 2024

  16. arXiv:2404.02931  [pdf, other]

    cs.CL cs.AI

    READ: Improving Relation Extraction from an ADversarial Perspective

    Authors: Dawei Li, William Hogan, Jingbo Shang

    Abstract: Recent works in relation extraction (RE) have achieved promising benchmark accuracy; however, our adversarial attack experiments show that these works excessively rely on entities, making their generalization capability questionable. To address this issue, we propose an adversarial training method specifically designed for RE. Our approach introduces both sequence- and token-level perturbations to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by findings of NAACL 2024

  17. arXiv:2404.00457  [pdf, other]

    cs.CL

    MetaIE: Distilling a Meta Model from LLM for All Kinds of Information Extraction Tasks

    Authors: Letian Peng, Zilong Wang, Feng Yao, Zihan Wang, Jingbo Shang

    Abstract: Information extraction (IE) is a fundamental area in natural language processing where prompting large language models (LLMs), even with in-context examples, cannot defeat small LMs tuned on very small IE datasets. We observe that IE tasks, such as named entity recognition and relation extraction, all focus on extracting important information, which can be formalized as a label-to-span matching. I…

    Submitted 30 March, 2024; originally announced April 2024.

  18. arXiv:2404.00439  [pdf, other]

    cs.CL

    DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering

    Authors: Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala

    Abstract: The application of natural language processing models to PDF documents is pivotal for various business applications yet the challenge of training models for this purpose persists in businesses due to specific hurdles. These include the complexity of working with PDF formats that necessitate parsing text and layout information for curating training data and the lack of privacy-preserving annotation…

    Submitted 30 March, 2024; originally announced April 2024.

  19. arXiv:2403.20046  [pdf, other]

    cs.CL

    Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning

    Authors: Yongqi Tong, Dawei Li, Sizhe Wang, Yujia Wang, Fei Teng, Jingbo Shang

    Abstract: Recent works have shown the benefits to LLMs from fine-tuning golden-standard Chain-of-Thought (CoT) rationales or using them as correct examples in few-shot prompting. While humans can indeed imitate correct examples, learning from our mistakes is another vital aspect of human cognition. Hence, a question naturally arises: can LLMs learn and benefit from their mistakes, especially for the…

    Submitted 7 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) - Main Conference

  20. arXiv:2402.16906  [pdf, other]

    cs.SE cs.AI cs.CL

    Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

    Authors: Li Zhong, Zilong Wang, Jingbo Shang

    Abstract: Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs. However, these works consider the generated programs as an indivisible entity, which falls short for LLMs in debugging the programs, especially when the programs con…

    Submitted 6 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Preprint

  21. arXiv:2402.14158  [pdf, other]

    cs.CL

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    Authors: Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

    Abstract: Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models still struggle with learning how to robustly use new tools from only a few demonstrations. In this work we introduce a self-verification method which distinguish…

    Submitted 13 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  22. arXiv:2402.10430  [pdf, other]

    cs.CL

    Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models

    Authors: Dheeraj Mekala, Alex Nguyen, Jingbo Shang

    Abstract: Instruction-tuning language models has become a crucial step in aligning them for general use. Typically, this process involves extensive training on large datasets, incurring high training costs. In this paper, we introduce a novel training data selection based on the learning percentage of the samples. We assert that current language models possess the capability to autonomously select high-qual…

    Submitted 15 February, 2024; originally announced February 2024.

  23. arXiv:2402.09642  [pdf, other]

    cs.CL

    Answer is All You Need: Instruction-following Text Embedding via Answering the Question

    Authors: Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang

    Abstract: This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite its tremendous potential to deploy user-oriented embeddings, none of previous approaches provides a concrete solution for it. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the repres…

    Submitted 14 February, 2024; originally announced February 2024.

  24. arXiv:2402.04624  [pdf, other]

    cs.CL

    MEMORYLLM: Towards Self-Updatable Large Language Models

    Authors: Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian McAuley

    Abstract: Existing Large Language Models (LLMs) usually remain static after deployment, which might make it hard to inject new knowledge into the model. We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently. To this end, we introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memo…

    Submitted 26 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 13 pages, 9 figures

  25. arXiv:2402.03774  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning a Decision Tree Algorithm with Transformers

    Authors: Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

    Abstract: Decision trees are renowned for their interpretability capability to achieve high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms, where they partition the data at every node in a tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not bring global generalization. To ad…

    Submitted 6 February, 2024; originally announced February 2024.

  26. arXiv:2402.02658  [pdf, other]

    cs.AI cs.CL cs.LG

    Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision

    Authors: Zihan Wang, Yunxuan Li, Yuexin Wu, Liangchen Luo, Le Hou, Hongkun Yu, Jingbo Shang

    Abstract: Process supervision, using a trained verifier to evaluate the intermediate steps generated by reasoner, has demonstrated significant improvements in multi-step problem solving. In this paper, to avoid expensive human annotation effort on the verifier training data, we introduce Model-induced Process Supervision (MiPS), a novel method for automating data curation. MiPS annotates an intermediate ste…

    Submitted 4 February, 2024; originally announced February 2024.

  27. arXiv:2402.01801  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Models for Time Series: A Survey

    Authors: Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang

    Abstract: Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, image and graphics, LLMs present a significant potential for analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the vari…

    Submitted 6 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: GitHub repository: https://github.com/xiyuanzh/awesome-llm-time-series

  28. arXiv:2401.04398  [pdf, other]

    cs.CL

    Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

    Authors: Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires the extraction of underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and its similar approaches inco…

    Submitted 18 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  29. arXiv:2401.01003  [pdf, other]

    cs.CV eess.IV

    Rink-Agnostic Hockey Rink Registration

    Authors: Jia Cheng Shang, Yuhao Chen, Mohammad Javad Shafiee, David A. Clausi

    Abstract: Hockey rink registration is a useful tool for aiding and automating sports analysis. When combined with player tracking, it can provide location information of players on the rink by estimating a homography matrix that can warp broadcast video frames onto an overhead template of the rink, or vice versa. However, most existing techniques require accurate ground truth information, which can take man…

    Submitted 8 September, 2023; originally announced January 2024.

  30. arXiv:2312.03291  [pdf, other]

    cs.LG cs.AI

    OMNIINPUT: A Model-centric Evaluation Framework through Output Distribution

    Authors: Weitang Liu, Ying Wai Li, Tianle Wang, Yi-Zhuang You, Jingbo Shang

    Abstract: We propose a novel model-centric evaluation framework, OmniInput, to evaluate the quality of an AI/ML model's predictions on all possible inputs (including human-unrecognizable ones), which is crucial for AI safety and reliability. Unlike traditional data-centric evaluation based on pre-defined test sets, the test set in OmniInput is self-constructed by the model itself and the model quality is ev…

    Submitted 5 December, 2023; originally announced December 2023.

  31. arXiv:2312.00293  [pdf]

    cs.CL

    PsyAttention: Psychological Attention Model for Personality Detection

    Authors: Baohua Zhang, Yongyi Huang, Wenyao Cui, Huaping Zhang, Jianyun Shang

    Abstract: Work on personality detection has tended to incorporate psychological features from different personality models, such as BigFive and MBTI. There are more than 900 psychological features, each of which is helpful for personality detection. However, when used in combination, the application of different calculation standards among these features may result in interference between features calculate…

    Submitted 30 November, 2023; originally announced December 2023.

  32. arXiv:2311.06968  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Physics-Informed Data Denoising for Real-Life Sensing Systems

    Authors: Xiyuan Zhang, Xiaohan Fu, Diyan Teng, Chengyu Dong, Keerthivasan Vijayakumar, Jiayun Zhang, Ranak Roy Chowdhury, Junsheng Han, Dezhi Hong, Rashmi Kulkarni, Jingbo Shang, Rajesh Gupta

    Abstract: Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically r…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: SenSys 2023

  33. arXiv:2311.03319  [pdf, other]

    cs.CL cs.AI

    DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

    Authors: Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin Wang, Xueqi Wang, William Hogan, Jingbo Shang

    Abstract: In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks. However, ICL requires high-quality annotated demonstrations which might not be available in real-world scenarios. To overcome this limitation, we propose Data Augmentation for In-Context Learning (DAIL). DAIL leverages the intui…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Course project for DSC 253 (Advanced Data-Driven Text Mining) at UCSD

  34. arXiv:2311.02861  [pdf, other]

    cs.CL

    Less than One-shot: Named Entity Recognition via Extremely Weak Supervision

    Authors: Letian Peng, Zihan Wang, Jingbo Shang

    Abstract: We study the named entity recognition (NER) problem under the extremely weak supervision (XWS) setting, where only one example entity per type is given in a context-free way. While one can see that XWS is lighter than one-shot in terms of the amount of supervision, we propose a novel method X-NER that can outperform the state-of-the-art one-shot NER methods. We first mine entity spans that are sim…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: Accepted to Findings of EMNLP 2023

  35. arXiv:2311.01751  [pdf, other]

    cs.CL

    EmojiLM: Modeling the New Emoji Language

    Authors: Letian Peng, Zilong Wang, Hang Liu, Zihan Wang, Jingbo Shang

    Abstract: With the rapid development of the internet, online social media welcomes people with different backgrounds through its diverse content. The increasing usage of emoji becomes a noticeable trend thanks to emoji's rich information beyond cultural or linguistic borders. However, the current study on emojis is limited to single emoji prediction and there are limited data resources available for further…

    Submitted 3 November, 2023; originally announced November 2023.

  36. arXiv:2310.17389  [pdf, other]

    cs.CL cs.AI

    ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

    Authors: Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang

    Abstract: Despite remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored.…

    Submitted 26 October, 2023; originally announced October 2023.

    Journal ref: EMNLP findings 2023

  37. arXiv:2310.12342  [pdf, other]

    cs.CL cs.AI

    Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking

    Authors: Yongqi Tong, Yifan Wang, Dawei Li, Sizhe Wang, Zi Lin, Simeng Han, Jingbo Shang

    Abstract: Chain-of-Thought (CoT) prompting and its variants explore equipping large language models (LLMs) with high-level reasoning abilities by emulating human-like linear cognition and logic. However, the human mind is complicated and mixed with both linear and nonlinear thinking. In this work, we propose Inferential Exclusion Prompting (IEP), a novel prompting that combines the…

    Submitted 14 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  38. arXiv:2310.07347  [pdf, other]

    cs.CL cs.AI cs.LG

    Fast-ELECTRA for Efficient Pre-training

    Authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

    Abstract: ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model. Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model. Notably, this model, which is jointly trained with the main model, only serves to assist the training of the main model and is discarded post-t…

    Submitted 11 October, 2023; originally announced October 2023.

  39. arXiv:2310.04815  [pdf, other]

    cs.LG

    Critique Ability of Large Language Models

    Authors: Liangchen Luo, Zi Lin, Yinxiao Liu, Lei Shu, Yun Zhu, Jingbo Shang, Lei Meng

    Abstract: Critical thinking is essential for rational decision-making and problem-solving. This skill hinges on the ability to provide precise and reasoned critiques and is a hallmark of human intelligence. In the era of large language models (LLMs), this study explores the ability of LLMs to deliver accurate critiques across various tasks. We are interested in this topic as a capable critic model could not…

    Submitted 7 October, 2023; originally announced October 2023.

  40. arXiv:2310.03182  [pdf, other]

    cs.CV cs.CL cs.LG

    Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models

    Authors: An Yan, Yu Wang, Yiwu Zhong, Zexue He, Petros Karypis, Zihan Wang, Chengyu Dong, Amilcare Gentili, Chun-Nan Hsu, Jingbo Shang, Julian McAuley

    Abstract: Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 18 pages, 12 figures

  41. arXiv:2308.03685  [pdf, other]

    cs.CV

    Learning Concise and Descriptive Attributes for Visual Recognition

    Authors: An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley

    Abstract: Recent advances in foundation models present new opportunities for interpretable visual recognition -- one can first query Large Language Models (LLMs) to obtain a set of attributes that describe each class, then apply vision-language models to classify images via these attributes. Pioneering work shows that querying thousands of attributes can achieve performance competitive with image features.…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  42. arXiv:2307.12847  [pdf, other]

    cs.CY cs.CR

    Securing Bystander Privacy in Mixed Reality While Protecting the User Experience

    Authors: Matthew Corbett, Brendan David-John, Jiacheng Shang, Y. Charlie Hu, Bo Ji

    Abstract: The modern Mixed Reality devices that make the Metaverse viable require vast information about the physical world and can also violate the privacy of unsuspecting or unwilling bystanders in their vicinity. In this article, we provide an introduction to the problem, existing solutions, and avenues for future research.

    Submitted 12 November, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 13 pages, 1 Figure

  43. arXiv:2307.07099  [pdf, other]

    cs.CL

    Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation

    Authors: Letian Peng, Yuwei Zhang, Jingbo Shang

    Abstract: Prompting large language models (LLMs) for data augmentation has recently become a common practice in few-shot NLP tasks. In this paper, we propose Chain-of-Thought Attribute Manipulation (CoTAM), a novel approach that generates new data from existing examples by only tweaking in the user-provided, task-specific attribute, e.g., sentiment polarity or topic in movie reviews. Instead of conventional…

    Submitted 21 May, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

  44. arXiv:2307.01933  [pdf, other]

    cs.AI cs.CG cs.CL cs.SC

    Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs

    Authors: Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, Wei Wang

    Abstract: Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric repres…

    Submitted 4 July, 2023; originally announced July 2023.

    Journal ref: ACL 2023

  45. arXiv:2307.01849  [pdf, other]

    cs.RO cs.CV cs.LG

    Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning

    Authors: Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo

    Abstract: Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavioral cloning in a sequence modeling fashion, benefiting from their exceptional capabilities in modeling complex data distributions. The standard diffusion-based policy iteratively generates action sequences from random noise conditioned on the input states.…

    Submitted 11 January, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: 15 pages, 13 figures. Code, pretrained checkpoints, and datasets are available at https://github.com/LostXine/crossway_diffusion Video demo is at https://youtu.be/9deKHueZBuk

  46. arXiv:2306.01016  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

    Authors: Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, Xian Li

    Abstract: Information extraction, e.g., attribute value extraction, has been extensively studied and formulated based only on text. However, many attributes can benefit from image-based extraction, like color, shape, pattern, among others. The visual modality has long been underutilized, mainly due to multimodal annotation difficulty. In this paper, we aim to patch the visual modality to the textual-establi…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ACL 2023 Findings

  47. arXiv:2306.00975  [pdf, other]

    cs.LG cs.CV cs.RO

    Active Vision Reinforcement Learning under Limited Visual Observability

    Authors: Jinghuan Shang, Michael S. Ryoo

    Abstract: In this work, we investigate Active Vision Reinforcement Learning (ActiveVision-RL), where an embodied agent simultaneously learns action policy for the task while also controlling its visual observations in partially observable environments. We denote the former as motor policy and the latter as sensory policy. For example, humans solve real world tasks by hand manipulation (motor policy) togethe…

    Submitted 5 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023. Project page at https://elicassion.github.io/sugarl/sugarl.html Code at https://github.com/elicassion/sugarl Environment library at https://github.com/elicassion/active-gym

  48. arXiv:2305.18350  [pdf, other]

    cs.LG cs.CL cs.IR

    Towards Open-World Product Attribute Mining: A Lightly-Supervised Approach

    Authors: Liyan Xu, Chenwei Zhang, Xian Li, Jingbo Shang, Jinho D. Choi

    Abstract: We present a new task setting for attribute mining on e-commerce products, serving as a practical solution to extract open-world attributes without extensive human intervention. Our supervision comes from a high-quality seed attribute set bootstrapped from existing resources, and we aim to expand the attribute vocabulary of existing seed types, and also to discover any new attribute types automati…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  49. arXiv:2305.14871  [pdf, other]

    cs.CL

    ClusterLLM: Large Language Models as a Guide for Text Clustering

    Authors: Yuwei Zhang, Zihan Wang, Jingbo Shang

    Abstract: We introduce ClusterLLM, a novel text clustering framework that leverages feedback from an instruction-tuned large language model, such as ChatGPT. Compared with traditional unsupervised methods that build upon "small" embedders, ClusterLLM exhibits two intriguing advantages: (1) it enjoys the emergent capability of LLM even if its embeddings are inaccessible; and (2) it understands the user's pr…

    Submitted 3 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP2023(main)

  50. arXiv:2305.14828  [pdf, other]

    cs.CL cs.CV

    Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation

    Authors: Prashant Krishnan, Zilong Wang, Yangkun Wang, Jingbo Shang

    Abstract: Recent advances of incorporating layout information, typically bounding box coordinates, into pre-trained language models have achieved significant performance in entity recognition from document images. Using coordinates can easily model the absolute position of each token, but they might be sensitive to manipulations in document images (e.g., shifting, rotation or scaling), especially when the t…

    Submitted 23 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.