Showing 1–50 of 188 results for author: Ta, C

  1. arXiv:2407.00218  [pdf, other]

    eess.SY cs.RO

    Resilient Estimator-based Control Barrier Functions for Dynamical Systems with Disturbances and Noise

    Authors: Chuyuan Tao, Wenbin Wan, Junjie Gao, Bihao Mo, Hunmin Kim, Naira Hovakimyan

    Abstract: Control Barrier Function (CBF) is an emerging method that guarantees safety in path planning problems by generating a control command to ensure the forward invariance of a safety set. Most developments to date assume the availability of correct state measurements and the absence of disturbances on the system. However, if the system incurs disturbances and is subject to noise, the CBF cannot guar…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2406.18049  [pdf]

    cs.CL cs.AI

    Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources

    Authors: Yiming Li, Deepthi Viswaroopan, William He, Jianfu Li, Xu Zuo, Hua Xu, Cui Tao

    Abstract: Adverse event (AE) extraction following COVID-19 vaccines from text data is crucial for monitoring and analyzing the safety profiles of immunizations. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel in understanding contextual infor…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.13035  [pdf, other]

    cs.CL

    D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

    Authors: Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

    Abstract: Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which prioritize less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discrimi…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.11278  [pdf, other]

    cs.CL

    Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs

    Authors: Duygu Nur Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Chenyang Tao, Anil Ramakrishna, Dimitrios Dimitriadis, Salman Avestimehr

    Abstract: In this work, we introduce the Learnable Response Scoring Function (LARS) for Uncertainty Estimation (UE) in generative Large Language Models (LLMs). Current scoring functions for probability-based UE, such as length-normalized scoring and semantic contribution-based weighting, are designed to solve specific aspects of the problem but exhibit limitations, including the inability to handle biased p…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.04342  [pdf, other]

    cs.CV

    Learning 1D Causal Visual Representation with De-focus Attention Networks

    Authors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai

    Abstract: Modality differences have led to the development of heterogeneous architectures for vision and language models. While images typically require 2D non-causal modeling, texts utilize 1D causal modeling. This distinction poses significant challenges in constructing unified multi-modal models. This paper explores the feasibility of representing images using 1D causal modeling. We identify an "over-foc…

    Submitted 6 June, 2024; originally announced June 2024.

  6. arXiv:2405.19730  [pdf]

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat…

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  7. arXiv:2404.17513  [pdf, other]

    cs.CL cs.AI

    A Comprehensive Evaluation on Event Reasoning of Large Language Models

    Authors: Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Xiaoying Bai, Yue Fang, Haiyan Zhao, Jia Li, Chongyang Tao

    Abstract: Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown. To mitigate this disparity, we comprehensively evaluate the abil…

    Submitted 26 April, 2024; originally announced April 2024.

  8. arXiv:2404.10429  [pdf, other]

    cs.AI

    MEEL: Multi-Modal Event Evolution Learning

    Authors: Zhengwei Tao, Zhi Jin, Junqiang Huang, Xiancai Chen, Xiaoying Bai, Haiyan Zhao, Yifan Zhang, Chongyang Tao

    Abstract: Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a wide range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in such ability. The disparity stems from the fact that existing models are insufficient to…

    Submitted 16 April, 2024; originally announced April 2024.

  9. arXiv:2404.07748  [pdf, other]

    cs.CV cs.LG

    3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

    Authors: Xuanming Cao, Chengyu Tao, Juan Du

    Abstract: The surface quality inspection of manufacturing parts based on 3D point cloud data has attracted increasing attention in recent years. The reason is that the 3D point cloud can capture the entire surface of manufacturing parts, unlike the previous practices that focus on some key product characteristics. However, achieving accurate 3D anomaly detection is challenging, due to the complex surfaces o…

    Submitted 11 April, 2024; originally announced April 2024.

  10. arXiv:2404.05415  [pdf]

    cs.CL cs.AI

    Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations

    Authors: Yiming Li, Xueqing Peng, Jianfu Li, Xu Zuo, Suyuan Peng, Donghong Pei, Cui Tao, Hua Xu, Na Hong

    Abstract: In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPT) present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to compare the performance of GPT with t…

    Submitted 14 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  11. arXiv:2404.02657  [pdf, other]

    cs.CL cs.AI

    Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

    Authors: Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

    Abstract: Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking prope…

    Submitted 16 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: working on progress

  12. Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments

    Authors: Hieu Nguyen, Cong-Hoang Ta, Phuong-Thuy Le-Nguyen, Minh-Triet Tran, Trung-Nghia Le

    Abstract: This paper presents a simple yet efficient ensemble learning framework for Vietnamese scene text spotting. Leveraging the power of ensemble learning, which combines multiple models to yield more accurate predictions, our approach aims to significantly enhance the performance of scene text spotting in challenging urban settings. Through experimental evaluations on the VinText dataset, our proposed…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: RIVF 2023

    Journal ref: In 2023 RIVF International Conference on Computing and Communication Technologies (RIVF) (pp. 177-182). IEEE

  13. arXiv:2404.00133  [pdf, other]

    cs.RO

    An Optimization-Based Planner with B-spline Parameterized Continuous-Time Reference Signals

    Authors: Chuyuan Tao, Sheng Cheng, Yang Zhao, Fanxin Wang, Naira Hovakimyan

    Abstract: For the cascaded planning and control modules implemented for robot navigation, the frequency gap between the planner and controller has received limited attention. In this study, we introduce a novel B-spline parameterized optimization-based planner (BSPOP) designed to address the frequency gap challenge with limited onboard computational power in robots. The proposed planner generates continuous…

    Submitted 29 March, 2024; originally announced April 2024.

  14. arXiv:2403.19432  [pdf, other]

    cs.CL cs.AI

    Uncovering Misattributed Suicide Causes through Annotation Inconsistency Detection in Death Investigation Notes

    Authors: Song Wang, Yiliang Zhou, Ziqiang Han, Cui Tao, Yunyu Xiao, Ying Ding, Joydeep Ghosh, Yifan Peng

    Abstract: Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) data is widely used for discovering the patterns and causes of death. Recent studies have suggested annotation inconsistencies within the NVDRS and their potential impact on erroneous suicide-cause attributions. We present an empirical Natural Language Processing (NLP) approa…

    Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 19 pages, 6 figures

  15. arXiv:2403.18593  [pdf, other]

    cs.CV cs.AI

    Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding

    Authors: Run Shao, Zhaoyang Zhang, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li

    Abstract: The tokenizer, as one of the fundamental components of large models, has long been overlooked or even misunderstood in visual tasks. One key factor of the great comprehension power of the large language model is that natural language tokenizers utilize meaningful words or subwords as the basic elements of language. In contrast, mainstream visual tokenizers, represented by patch-based methods such…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 20 pages, 8 figures, 6 tables

  16. arXiv:2403.10758  [pdf]

    cs.CL

    Rules still work for Open Information Extraction

    Authors: Jialin Hua, Liangqing Luo, Weiying **, Yan Liao, Chunhai Tao, Xuewen Lub

    Abstract: Open information extraction (OIE) aims to extract surface relations and their corresponding arguments from natural language text, irrespective of domain. This paper presents an innovative OIE model, APRCOIE, tailored for Chinese text. Diverging from previous models, our model generates extraction patterns autonomously. The model defines a new pattern form for Chinese OIE and proposes an automated…

    Submitted 15 March, 2024; originally announced March 2024.

  17. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  18. arXiv:2403.04945  [pdf, other]

    cs.CL cs.LG eess.SP

    MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

    Authors: Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Zhenwu Peng, Jie Fu, Rossella Arcucci, Huaxiu Yao, Mi Zhang

    Abstract: Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the…

    Submitted 18 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Under review

  19. arXiv:2403.01169  [pdf, other]

    cs.CV

    Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

    Authors: Chenchen Tao, Chong Wang, Yuexian Zou, Xiaohao Peng, Jiafei Wu, Jiangbo Qian

    Abstract: Most models for weakly supervised video anomaly detection (WS-VAD) rely on multiple instance learning, aiming to distinguish normal and abnormal snippets without specifying the type of anomaly. The ambiguous nature of anomaly definitions across contexts introduces bias in detecting abnormal and normal snippets within the abnormal bag. Taking the first step to show the model why it is anomalous, a…

    Submitted 2 March, 2024; originally announced March 2024.

  20. arXiv:2402.18458  [pdf, other]

    cs.CL

    Meta-Task Prompting Elicits Embedding from Large Language Models

    Authors: Yibin Lei, Di Wu, Tianyi Zhou, Tao Shen, Yu Cao, Chongyang Tao, Andrew Yates

    Abstract: In this work, we introduce a new unsupervised embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning or task-specific engineering. Leveraging meta-task prompting, MetaEOL guides LLMs to produce embeddings through a series of carefully designed prompts…

    Submitted 28 February, 2024; originally announced February 2024.

  21. arXiv:2402.16242  [pdf, other]

    cs.CV cs.AI

    HSONet: A Siamese foreground association-driven hard case sample optimization network for high-resolution remote sensing image change detection

    Authors: Chao Tao, Dongsheng Kuang, Zhenyang Huang, Chengli Peng, Haifeng Li

    Abstract: In the later training stages, further improvement of the model's ability to determine changes relies on how well the change detection (CD) model learns hard cases; however, there are two additional challenges to learning hard case samples: (1) change labels are limited and tend to point only to foreground targets, yet hard case samples are prevalent in the background, which leads to optimizing th…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 17 figures, 8 tables, 18 pages

  22. arXiv:2402.16117  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

    Authors: Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

    Abstract: Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various…

    Submitted 25 February, 2024; originally announced February 2024.

  23. arXiv:2402.13116  [pdf, other]

    cs.CL

    A Survey on Knowledge Distillation of Large Language Models

    Authors: Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou

    Abstract: In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a crucial role in both compressing these models and facilitating their self-improvement by employi…

    Submitted 8 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 44 pages

  24. arXiv:2402.11756  [pdf, other]

    cs.CL cs.LG

    MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs

    Authors: Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr

    Abstract: Generative Large Language Models (LLMs) are widely utilized for their excellence in various tasks. However, their tendency to produce inaccurate or misleading outputs poses a potential risk, particularly in high-stakes environments. Therefore, estimating the correctness of generative LLM outputs is an important task for enhanced reliability. Uncertainty Estimation (UE) in generative LLMs is an evo…

    Submitted 8 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  25. arXiv:2401.17515  [pdf, other]

    cs.CV

    Towards Image Semantics and Syntax Sequence Learning

    Authors: Chun Tao, Timur Ibrayev, Kaushik Roy

    Abstract: Convolutional neural networks and vision transformers have achieved outstanding performance in machine perception, particularly for image classification. Although these image classifiers excel at predicting image-level class labels, they may not discriminate missing or shifted parts within an object. As a result, they may fail to detect corrupted images that involve missing or disarrayed semantic…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 21 pages, 22 figures, 5 tables

  26. arXiv:2401.07103  [pdf, other]

    cs.CL

    Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

    Authors: Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma

    Abstract: In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent tax…

    Submitted 12 June, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

    Comments: 21 pages, 5 figures

  27. Improving Graph Convolutional Networks with Transformer Layer in social-based items recommendation

    Authors: Thi Linh Hoang, Tuan Dung Pham, Viet Cuong Ta

    Abstract: In this work, we have proposed an approach for improving the GCN for predicting ratings in social networks. Our model is expanded from the standard model with several layers of transformer architecture. The main focus of the paper is on the encoder architecture for node embedding in the network. Using the embedding layer from the graph-based convolution layer, the attention mechanism could rearran…

    Submitted 12 January, 2024; originally announced January 2024.

  28. arXiv:2312.16221  [pdf, other]

    cs.CV

    STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation

    Authors: Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Dripta S. Raychaudhuri, Hannah Dela Cruz, M. Salman Asif, Amit K. Roy-Chowdhury

    Abstract: The capability to accurately estimate 3D human poses is crucial for diverse fields such as action recognition, gait recognition, and virtual/augmented reality. However, a persistent and significant challenge within this field is the accurate prediction of human poses under conditions of severe occlusion. Traditional image-based estimators struggle with heavy occlusions due to a lack of temporal co…

    Submitted 13 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  29. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  30. arXiv:2312.01598  [pdf, other]

    cs.CV

    Good Questions Help Zero-Shot Image Reasoning

    Authors: Kaiwen Yang, Tao Shen, Xinmei Tian, Xiubo Geng, Chongyang Tao, Dacheng Tao, Tianyi Zhou

    Abstract: Aligning the recent large language models (LLMs) with computer vision models leads to large vision-language models (LVLMs), which have paved the way for zero-shot image reasoning tasks. However, LVLMs are usually trained on short high-level captions only referring to sparse focus regions in images. Such a "tunnel vision" limits LVLMs to exploring other relevant contexts in complex scenes. To add…

    Submitted 8 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  31. arXiv:2311.08734  [pdf, other]

    cs.CL

    Thread of Thought Unraveling Chaotic Contexts

    Authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen

    Abstract: Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In r…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 5 tables

  32. arXiv:2311.05077  [pdf, other]

    cs.CV

    POISE: Pose Guided Human Silhouette Extraction under Occlusions

    Authors: Arindam Dutta, Rohit Lal, Dripta S. Raychaudhuri, Calvin Khang Ta, Amit K. Roy-Chowdhury

    Abstract: Human silhouette extraction is a fundamental task in computer vision with applications in various downstream tasks. However, occlusions pose a significant challenge, leading to incomplete and distorted silhouettes. To address this challenge, we introduce POISE: Pose Guided Human Silhouette Extraction under Occlusions, a novel self-supervised fusion framework that enhances accuracy and robustness i…

    Submitted 8 November, 2023; originally announced November 2023.

    Journal ref: Winter Conference on Applications of Computer Vision, 2024

  33. arXiv:2310.10219  [pdf, other]

    cs.CV cs.AI

    Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model

    Authors: Chao Tao, Aoran Hu, Rong Xiao, Haifeng Li, Yuze Wang

    Abstract: Data-driven deep learning methods have shown great potential in cropland mapping. However, due to multiple factors such as attributes of cropland (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands under different scenes demonstrate a great domain gap. This makes it difficult for models trained in the specific scenes to directly generalize to oth…

    Submitted 16 October, 2023; originally announced October 2023.

  34. arXiv:2310.09748  [pdf, other]

    cs.SE cs.CL

    Large Language Model-Aware In-Context Learning for Code Generation

    Authors: Jia Li, Ge Li, Chongyang Tao, Jia Li, Huangzhao Zhang, Fang Liu, Zhi Jin

    Abstract: Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. LLMs take a prompt consisting of requirement-code examples and a new requirement as input, and output new programs. Existing studies have found that ICL is highly dominated by the examples, which has motivated research on example selection. However, existing approaches randomly select examples or on…

    Submitted 15 October, 2023; originally announced October 2023.

  35. AE-GPT: Using Large Language Models to Extract Adverse Events from Surveillance Reports-A Use Case with Influenza Vaccine Adverse Events

    Authors: Yiming Li, Jianfu Li, Jianping He, Cui Tao

    Abstract: Though vaccines are instrumental in global health, mitigating infectious diseases and pandemic outbreaks, they can occasionally lead to adverse events (AEs). Recently, Large Language Models (LLMs) have shown promise in effectively identifying and cataloging AEs within clinical reports. Utilizing data from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016, this study particularly…

    Submitted 27 September, 2023; originally announced September 2023.

  36. Distill Knowledge in Multi-task Reinforcement Learning with Optimal-Transport Regularization

    Authors: Bang Giang Le, Viet Cuong Ta

    Abstract: In multi-task reinforcement learning, it is possible to improve the data efficiency of training agents by transferring knowledge from other different but related tasks. Because the experiences from different tasks are usually biased toward the specific task goals, traditional methods rely on Kullback-Leibler regularization to stabilize the transfer of knowledge from one task to the others. In this…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: 6 pages

    Journal ref: 2022 14th International Conference on Knowledge and Systems Engineering (KSE), Nha Trang, Vietnam, 2022, pp. 1-6

  37. arXiv:2309.11157  [pdf, other]

    cs.CV

    Learning Deformable 3D Graph Similarity to Track Plant Cells in Unregistered Time Lapse Images

    Authors: Md Shazid Islam, Arindam Dutta, Calvin-Khang Ta, Kevin Rodriguez, Christian Michael, Mark Alber, G. Venugopala Reddy, Amit K. Roy-Chowdhury

    Abstract: Tracking of plant cells in images obtained by microscope is a challenging problem due to biological phenomena such as the large number of cells, non-uniform growth of different layers of the tightly packed plant cells, and cell division. Moreover, noisy images in deeper layers of the tissue and unavoidable systemic errors inherent in the imaging process further complicate the problem. In this pa…

    Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  38. Self-explainable Graph Neural Network for Alzheimer's Disease And Related Dementias Risk Prediction

    Authors: Xinyue Hu, Zenan Sun, Yi Nian, Yichen Wang, Yifang Dang, Fang Li, Jingna Feng, Evan Yu, Cui Tao

    Abstract: Background: Alzheimer's disease and related dementias (ADRD) ranks as the sixth leading cause of death in the US, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additio…

    Submitted 10 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

  39. arXiv:2309.06275  [pdf, other]

    cs.CL

    Re-Reading Improves Reasoning in Large Language Models

    Authors: Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong Long, Jian-guang Lou

    Abstract: To enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs), we introduce a simple, yet general and effective prompting method, Re2, i.e., Re-Reading the question as input. Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, Re2 shifts the focus to the input by processing…

    Submitted 29 February, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 25 pages

  40. arXiv:2308.13954  [pdf, other]

    cs.CV

    Prior-guided Source-free Domain Adaptation for Human Pose Estimation

    Authors: Dripta S. Raychaudhuri, Calvin-Khang Ta, Arindam Dutta, Rohit Lal, Amit K. Roy-Chowdhury

    Abstract: Domain adaptation methods for 2D human pose estimation typically require continuous access to the source data during adaptation, which can be challenging due to privacy, memory, or computational constraints. To address this limitation, we focus on the task of source-free domain adaptation for pose estimation, where a source model must adapt to a new target domain using only unlabeled target data.…

    Submitted 26 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  41. arXiv:2308.13754  [pdf, other]

    cs.SE cs.CL cs.IR

    ZC3: Zero-Shot Cross-Language Code Clone Detection

    Authors: Jia Li, Chongyang Tao, Zhi Jin, Fang Liu, Jia Li, Ge Li

    Abstract: Developers introduce code clones to improve programming productivity. Many existing studies have achieved impressive performance in monolingual code clone detection. However, during software development, more and more developers write semantically equivalent programs with different languages to support different platforms and help developers translate projects from one language to another. Conside…

    Submitted 7 September, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted by the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

  42. arXiv:2308.11651  [pdf, other]

    eess.SP cs.AI cs.LG

    Distributionally Robust Cross Subject EEG Decoding

    Authors: Tiehang Duan, Zhenyi Wang, Gianfranco Doretto, Fang Li, Cui Tao, Donald Adjeroh

    Abstract: Recently, deep learning has been shown to be effective for Electroencephalography (EEG) decoding tasks. Yet, its performance can be negatively influenced by two key factors: 1) the high variance and different types of corruption that are inherent in the signal, 2) the EEG datasets are usually relatively small given the acquisition cost, annotation cost and amount of effort needed. Data augmentation app…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ECAI 2023

  43. arXiv:2308.09583  [pdf, other]

    cs.CL cs.AI cs.LG

    WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

    Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

    Abstract: Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: LLM, Mathematical Reasoning

  44. arXiv:2307.15411  [pdf, other]

    cs.CL

    Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning

    Authors: Xindi Wang, Yufei Wang, Can Xu, Xiubo Geng, Bowen Zhang, Chongyang Tao, Frank Rudzicz, Robert E. Mercer, Daxin Jiang

    Abstract: Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained. However, despite the success of LLMs, there has been little understanding of how ICL learns the knowledge from the given prompts. In this paper, to make progress toward understanding the learning behavio…

    Submitted 1 August, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: accepted to ECAI 2023 (camera-ready)

  45. arXiv:2306.15868  [pdf, other]

    cs.LG cs.CV eess.IV

    GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation

    Authors: Zhaoyang Zhang, Zhen Ren, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li

    Abstract: Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task to extract image features from a large number of unlabeled images that are beneficial for downstream tasks. However, existing instance discrimination based SSCL suffer from two limitations whe…

    Submitted 27 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: 14 pages, 10 figures, 4 tables

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing 2023

  46. arXiv:2306.14262  [pdf, other]

    cs.CV

    A Spectral Perspective towards Understanding and Improving Adversarial Robustness

    Authors: Binxiao Huang, Rui Lin, Chaofan Tao, Ngai Wong

    Abstract: Deep neural networks (DNNs) are incredibly vulnerable to crafted, imperceptible adversarial perturbations. While adversarial training (AT) has proven to be an effective defense approach, the AT mechanism for robustness improvement is not fully understood. This work investigates AT from a spectral perspective, adding new insights to the design of effective defenses. In particular, we show that AT i…

    Submitted 25 June, 2023; originally announced June 2023.

  47. arXiv:2306.11825  [pdf, other]

    cs.CL

    DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models

    Authors: Sidi Lu, Wenbo Zhao, Chenyang Tao, Arpit Gupta, Shanchan Wu, Tagyoung Chung, Nanyun Peng

    Abstract: NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. It is designed to avoid catastrophic forgetting while achieving guaranteed convergence to an entropy-maximized closed-form optimal solution with reasonable modeling capacity. Despite the success, several challenges arise when applying NADO to a wide range of scenarios. Vanilla NADO suffers…

    Submitted 6 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2024 (Poster). Work was done during an Amazon Internship Program

  48. Advancing Biomedicine with Graph Representation Learning: Recent Progress, Challenges, and Future Directions

    Authors: Fang Li, Yi Nian, Zenan Sun, Cui Tao

    Abstract: Graph representation learning (GRL) has emerged as a pivotal field that has contributed significantly to breakthroughs in various fields, including biomedicine. The objective of this survey is to review the latest advancements in GRL methods and their applications in the biomedical field. We also highlight key challenges currently faced by GRL and outline potential directions for future research.

    Submitted 20 June, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: Accepted by 2023 IMIA Yearbook of Medical Informatics

    Journal ref: Yearb Med Inform. 2023 Aug;32(1):215-224. Epub 2023 Dec 26

  49. arXiv:2306.08568  [pdf, other]

    cs.CL cs.AI

    WizardCoder: Empowering Code Large Language Models with Evol-Instruct

    Authors: Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

    Abstract: Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code.…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Large Language model, Code Generation, Code LLMs

  50. arXiv:2306.05423  [pdf, other]

    cs.CV

    ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

    Authors: Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu

    Abstract: Image recognition and generation have long been developed independently of each other. With the recent trend towards general-purpose representation learning, the development of general representations for both recognition and generation tasks is also promoted. However, preliminary attempts mainly focus on generation performance, but are still inferior on recognition tasks. These methods are modele…

    Submitted 2 April, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted by ICLR2024