Showing 1–50 of 89 results for author: Tao, Z

  1. arXiv:2407.00118  [pdf, other]

    cs.LG cs.AI

    From Efficient Multimodal Models to World Models: A Survey

    Authors: Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang

    Abstract: Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful large language models with multimodal learning to perform complex tasks across different data modalities. This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence and as a pathway to world models. We provide an overview of…

    Submitted 27 June, 2024; originally announced July 2024.

  2. arXiv:2406.19973  [pdf, other]

    cs.CV cs.LG

    STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical

    Authors: Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, Zhiqiang Tao

    Abstract: Large Vision-Language Models (LVLMs) have shown significant potential in assisting medical diagnosis by leveraging extensive biomedical datasets. However, the advancement of medical image understanding and reasoning critically depends on building high-quality visual instruction data, which is costly and labor-intensive to obtain, particularly in the medical domain. To mitigate this data-starving i…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures

  3. arXiv:2406.17974  [pdf, other]

    cs.CL cs.CV

    Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

    Authors: Xuyang Wu, Yuan Wang, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

    Abstract: Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, and age. In this paper, we empirically investigate \emph{visual fairness} in several mainstream…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16473  [pdf, other]

    cs.CV cs.AI

    Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, Ziheng Zhou, Shuyong Gao, Wenqiang Zhang

    Abstract: The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16459  [pdf, other]

    cs.CV

    Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

    Authors: Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang

    Abstract: The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.09056  [pdf, other]

    cs.CL cs.AI

    CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts

    Authors: Zhen Tao, Zhiyu Li, Dinghao Xi, Wei Xu

    Abstract: The proliferation of large language models (LLMs) has significantly enhanced text generation capabilities across various industries. However, these models' ability to generate human-like text poses substantial challenges in discerning between human and AI authorship. Despite the effectiveness of existing AI-generated text detectors, their development is hindered by the lack of comprehensive, publi…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 32 pages

  7. arXiv:2406.06792  [pdf, other]

    cs.LG cs.AI

    Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness

    Authors: Dingrong Wang, Hitesh Sapkota, Zhiqiang Tao, Qi Yu

    Abstract: Prior work on neural architecture search (NAS) for adversarial robustness has discovered that a lightweight and adversarially robust neural network architecture could exist in a non-robust large teacher network, generally disclosed by heuristic rules through statistical analysis and neural architecture search. However, heuristi…

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 17 pages

  8. arXiv:2406.01559  [pdf, other]

    cs.CV

    Prototypical Transformer as Unified Motion Learners

    Authors: Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

    Abstract: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature moti…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 21 pages, 10 figures

  9. arXiv:2405.18769  [pdf, other]

    cs.CV

    OUS: Scene-Guided Dynamic Facial Expression Recognition

    Authors: Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, Yating Li, Wenqiang Zhang

    Abstract: Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing DFER methods tend to consider the scene as noise that needs to be filtered ou…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures, 6 tables

    ACM Class: I.4; I.5.1

  10. arXiv:2405.11265  [pdf, other]

    cs.CL cs.AI

    EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models

    Authors: Yu Huang, Liang Guo, Wanqian Guo, Zhe Tao, Yang Lv, Zhihao Sun, Dongfang Zhao

    Abstract: In the field of environmental science, it is crucial to have robust evaluation metrics for large language models to ensure their efficacy and accuracy. We propose EnviroExam, a comprehensive evaluation method designed to assess the knowledge of large language models in the field of environmental science. EnviroExam is based on the curricula of top international universities, covering undergraduate…

    Submitted 18 May, 2024; originally announced May 2024.

  11. arXiv:2405.09045  [pdf, other]

    cs.CV

    AMSNet: Netlist Dataset for AMS Circuits

    Authors: Zhuofu Tao, Yichen Shi, Yiru Huo, Rui Ye, Zonghang Li, Li Huang, Chen Wu, Na Bai, Zhiping Yu, Ting-Jung Lin, Lei He

    Abstract: Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention. The advent of multimodal large language models (MLLMs) has unveiled significant potential across various fields, suggesting their applicability in streamlining large-scale AMS IC design as well. A bottleneck in employing MLLMs for automatic AMS circuit generation is the absence of a comprehens…

    Submitted 14 May, 2024; originally announced May 2024.

  12. arXiv:2405.00711  [pdf, other]

    cs.CL cs.AI cs.CY

    Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities

    Authors: Xiaomin Yu, Yezhaohui Wang, Yanfang Chen, Zhen Tao, Dinghao Xi, Shichao Song, Simin Niu, Zhiyu Li

    Abstract: In recent years, generative artificial intelligence models, represented by Large Language Models (LLMs) and Diffusion Models (DMs), have revolutionized content production methods. This artificial intelligence-generated content (AIGC) has become deeply embedded in various aspects of daily life and work. However, these technologies have also led to the emergence of Fake Artificial Intelligence Gen…

    Submitted 3 May, 2024; v1 submitted 25 April, 2024; originally announced May 2024.

  13. arXiv:2404.17513  [pdf, other]

    cs.CL cs.AI

    A Comprehensive Evaluation on Event Reasoning of Large Language Models

    Authors: Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Xiaoying Bai, Yue Fang, Haiyan Zhao, Jia Li, Chongyang Tao

    Abstract: Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown. To mitigate this disparity, we comprehensively evaluate the abil…

    Submitted 26 April, 2024; originally announced April 2024.

  14. arXiv:2404.14387  [pdf, other]

    cs.CL cs.AI

    A Survey on Self-Evolution of Large Language Models

    Authors: Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, Jingren Zhou

    Abstract: Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. However, current LLMs that learn from human or external model supervision are costly and may face performance ceilings as task complexity and diversity increase. To address this issue, self-evolution approaches that enable LLM to autonomously acquire, refine, and learn from experiences ge…

    Submitted 3 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.11978  [pdf, other]

    cs.CL

    EVIT: Event-Oriented Instruction Tuning for Event Reasoning

    Authors: Zhengwei Tao, Xiancai Chen, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Yiwei Lou

    Abstract: Events refer to specific occurrences, incidents, or happenings that take place under a particular background. Event reasoning aims to infer events according to certain relations and predict future events. The cutting-edge techniques for event reasoning play a crucial role in various natural language processing applications. Large language models (LLMs) have made significant advancements in event r…

    Submitted 18 April, 2024; originally announced April 2024.

  16. arXiv:2404.10429  [pdf, other]

    cs.AI

    MEEL: Multi-Modal Event Evolution Learning

    Authors: Zhengwei Tao, Zhi Jin, Junqiang Huang, Xiancai Chen, Xiaoying Bai, Haiyan Zhao, Yifan Zhang, Chongyang Tao

    Abstract: Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a wide range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in such ability. The disparity stems from the fact that existing models are insufficient to…

    Submitted 16 April, 2024; originally announced April 2024.

  17. arXiv:2404.08564  [pdf, ps, other]

    cs.LG

    Federated Distillation: A Survey

    Authors: Lin Li, Jianping Gou, Baosheng Yu, Lan Du, Zhang Yi, Dacheng Tao

    Abstract: Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients. Despite its promise, FL encounters challenges such as high communication costs for large-scale models and the necessity for uniform model architectures across all clients and the server. These challenges severely restrict the practical applications of FL. To address these l…

    Submitted 1 April, 2024; originally announced April 2024.

  18. arXiv:2404.07677  [pdf, other]

    cs.CL cs.AI

    ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs

    Authors: Lei Sun, Zhengwei Tao, Youdi Li, Hiroshi Arakawa

    Abstract: The integration of Large Language Models (LLMs) and knowledge graphs (KGs) has achieved remarkable success in various natural language processing tasks. However, existing methodologies that integrate LLMs and KGs often navigate the task-solving process solely based on the LLM's analysis of the question, overlooking the rich cognitive potential inherent in the vast knowledge encapsulated in KGs. To…

    Submitted 4 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: LLM+KG

  19. arXiv:2404.03192  [pdf, other]

    cs.IR cs.CL

    Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers

    Authors: Yuan Wang, Xuyang Wu, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

    Abstract: The integration of Large Language Models (LLMs) in information retrieval has raised a critical reevaluation of fairness in the text-ranking models. LLMs, such as GPT models and Llama2, have shown effectiveness in natural language understanding tasks, and prior works (e.g., RankGPT) have also demonstrated that the LLMs exhibit better performance than the traditional ranking models in the ranking ta…

    Submitted 25 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024 Main Conference

  20. arXiv:2403.17998  [pdf, other]

    cs.CV

    Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

    Authors: Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao

    Abstract: The increasing prevalence of video clips has sparked growing interest in text-video retrieval. Recent advances focus on establishing a joint embedding space for text and video, relying on consistent embedding representations to compute similarity. However, the text content in existing datasets is generally short and concise, making it hard to fully describe the redundant semantics of a video. Corr…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024, code and model are available at https://github.com/Jiamian-Wang/T-MASS-text-video-retrieval

  21. arXiv:2403.16067  [pdf, other]

    cs.CV cs.AI

    Robust Diffusion Models for Adversarial Purification

    Authors: Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

    Abstract: Diffusion model (DM) based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models themselves are not robust to adversarial attacks either. Additionally, the diffusion process can easily destroy semantic information and generate a high quality image but totally different f…

    Submitted 24 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  22. arXiv:2403.15769  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    FusionINN: Decomposable Image Fusion for Brain Tumor Monitoring

    Authors: Nishant Kumar, Ziyan Tao, Jaikirat Singh, Yang Li, Peiwen Sun, Binghui Zhao, Stefan Gumhold

    Abstract: Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, solely relying on fused images may be insufficient for making diagnostic decisions, as the fusion mechanism blends features from source images, thereby making it difficult to interpret the underlying tumor pathology. We introduce FusionINN, a novel…

    Submitted 10 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted at IJCAI Workshop 2024. Source code available at https://github.com/nish03/FusionINN

  23. arXiv:2403.11299  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

    Authors: Guohao Sun, Can Qin, Jiamian Wang, Zeyuan Chen, Ran Xu, Zhiqiang Tao

    Abstract: Recent advancements in the vision-language model have shown notable generalization in vision-language tasks after visual instruction tuning. However, bridging the gap between the pre-trained vision encoder and the large language models becomes the whole network's bottleneck. To improve cross-modality alignment, existing works usually consider more visual instruction data covering a broader range o…

    Submitted 17 March, 2024; originally announced March 2024.

  24. arXiv:2403.05808  [pdf, other]

    cs.CV eess.IV

    Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

    Authors: Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haorang Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang

    Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation inf…

    Submitted 9 March, 2024; originally announced March 2024.

  25. arXiv:2403.04294  [pdf, other]

    cs.CV

    A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP

    Authors: Zeng Tao, Yan Wang, Junxiong Lin, Haoran Wang, Xinji Mai, Jiawen Yu, Xuan Tong, Ziheng Zhou, Shaoqi Yan, Qing Zhao, Liyuan Han, Wenqiang Zhang

    Abstract: The performance of CLIP in dynamic facial expression recognition (DFER) task doesn't yield exceptional results as observed in other CLIP-based classification tasks. While CLIP's primary objective is to achieve alignment between images and text in the feature space, DFER poses challenges due to the abstract nature of text and the dynamic nature of video, making label representation limited and perf…

    Submitted 7 March, 2024; originally announced March 2024.

  26. arXiv:2402.15220  [pdf, other]

    cs.LG cs.CL

    ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

    Authors: Lu Ye, Ze Tao, Yong Huang, Yang Li

    Abstract: Self-attention is an essential component of large language models (LLMs) but a significant source of inference latency for long sequences. In multi-tenant LLM serving scenarios, the compute and memory operation cost of self-attention can be optimized by using the probability that multiple LLM requests have shared system prompts in prefixes. In this paper, we introduce ChunkAttention, a prefix-awar…

    Submitted 22 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  27. arXiv:2402.14544  [pdf, other]

    cs.CR cs.SE

    A New Hope: Contextual Privacy Policies for Mobile Applications and An Approach Toward Automated Generation

    Authors: Shidong Pan, Zhen Tao, Thong Hoang, Dawen Zhang, Tianshi Li, Zhenchang Xing, Sherry Xu, Mark Staples, Thierry Rakotoarivelo, David Lo

    Abstract: Privacy policies have emerged as the predominant approach to conveying privacy notices to mobile application users. In an effort to enhance both readability and user engagement, the concept of contextual privacy policies (CPPs) has been proposed by researchers. The aim of CPPs is to fragment privacy policies into concise snippets, displaying them only within the corresponding contexts within the a…

    Submitted 10 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: USENIX Security 2024. arXiv admin note: text overlap with arXiv:2307.01691

  28. arXiv:2402.09782  [pdf, other]

    cs.LG cs.AI

    MC-DBN: A Deep Belief Network-Based Model for Modality Completion

    Authors: Zihong Luo, Zheng Tao, Yuxuan Huang, Kexin He, Chengzhi Liu

    Abstract: Recent advancements in multi-modal artificial intelligence (AI) have revolutionized the fields of stock market forecasting and heart rate monitoring. Utilizing diverse data sources can substantially improve prediction accuracy. Nonetheless, additional data may not always align with the original dataset. Interpolation methods are commonly utilized for handling missing values in modal data, though t…

    Submitted 20 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Journal ref: International Conference on Computer Supported Cooperative Work in Design 2024

  29. arXiv:2402.05423  [pdf, other]

    cs.CV

    MTSA-SNN: A Multi-modal Time Series Analysis Model Based on Spiking Neural Network

    Authors: Chengzhi Liu, Zheng Tao, Zihong Luo, Chenghao Liu

    Abstract: Time series analysis and modelling constitute a crucial research area. Traditional artificial neural networks struggle with complex, non-stationary time series data due to high computational complexity, limited ability to capture temporal information, and difficulty in handling event-driven data. To address these challenges, we propose a Multi-modal Time Series Analysis Model Based on Spiking Neur…

    Submitted 4 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 6 pages, 6 figures, published to International Conference on Computer Supported Cooperative Work in Design

  30. arXiv:2401.12578  [pdf, other]

    cs.CR

    ToDA: Target-oriented Diffusion Attacker against Recommendation System

    Authors: Xiaohao Liu, Zhulin Tao, Ting Jiang, He Chang, Yunshan Ma, Xianglin Huang, Xiang Wang

    Abstract: Recommendation systems (RS) have become indispensable tools for web services to address information overload, thus enhancing user experiences and bolstering platforms' revenues. However, with their increasing ubiquity, security concerns have also emerged. Because RS are publicly accessible, they are susceptible to specific malicious attacks where adversaries can manipulate user profiles, leading to…

    Submitted 16 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  31. arXiv:2401.07711  [pdf, other]

    cs.LG stat.ML

    Efficient Nonparametric Tensor Decomposition for Binary and Count Data

    Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao

    Abstract: In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-l…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: AAAI-24

  32. arXiv:2401.05707  [pdf, other]

    cs.CL

    CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer

    Authors: Zhen Tao, Dinghao Xi, Zhiyu Li, Liumin Tang, Wei Xu

    Abstract: Text style transfer is increasingly prominent in online entertainment and social media. However, existing research mainly concentrates on style transfer within individual English sentences, while ignoring the complexity of long Chinese texts, which limits the wider applicability of style transfer in the digital media realm. To bridge this gap, we propose a Chinese Article-style Transfer framework (CAT…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 9 pages

  33. arXiv:2401.05695  [pdf, other]

    cs.CL

    Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback

    Authors: Chengfeng Dou, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhenwei Tao

    Abstract: The use of large language models in medical dialogue generation has garnered significant attention, with a focus on improving response quality and fluency. While previous studies have made progress in optimizing model performance for single-round medical Q&A tasks, there is a need to enhance the model's capability for multi-round conversations to avoid logical inconsistencies. To address this, we…

    Submitted 11 January, 2024; originally announced January 2024.

  34. arXiv:2401.04750  [pdf]

    cs.CV

    DedustNet: A Frequency-dominated Swin Transformer-based Wavelet Network for Agricultural Dust Removal

    Authors: Shengli Zhang, Zhiyong Tao, Sen Lin

    Abstract: While dust significantly affects the environmental perception of automated agricultural machines, the existing deep learning-based methods for dust removal require further research and improvement in this area to improve the performance and reliability of automated agricultural machines in agriculture. We propose an end-to-end trainable learning network (DedustNet) to solve the real-world agricult…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2401.04550

  35. arXiv:2401.04550  [pdf]

    cs.CV

    WaveletFormerNet: A Transformer-based Wavelet Network for Real-world Non-homogeneous and Dense Fog Removal

    Authors: Shengli Zhang, Zhiyong Tao, Sen Lin

    Abstract: Although deep convolutional neural networks have achieved remarkable success in removing synthetic fog, it is essential to be able to process images taken in complex foggy conditions, such as dense or non-homogeneous fog, in the real world. However, the haze distribution in the real world is complex, and downsampling can lead to color distortion or loss of detail in the output results as the resol…

    Submitted 9 January, 2024; originally announced January 2024.

  36. arXiv:2311.17451  [pdf, other]

    cs.NI cs.LG

    Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler

    Authors: Zhenyu Tao, Wei Xu, Yongming Huang, Xiaoyun Wang, Xiaohu You

    Abstract: Digital twin, which enables emulation, evaluation, and optimization of physical entities through synchronized digital replicas, has gained increasing attention as a promising technology for intricate wireless networks. For 6G, numerous innovative wireless technologies and network architectures have posed new challenges in establishing wireless network digital twins. To tackle these challenges, art…

    Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: This article has been accepted by IEEE Wireless Communication (Mobile AI-Generated Content in 6G Era special issue)

  37. arXiv:2311.08732  [pdf, other]

    cs.CL

    Enhancing Emergency Decision-making with Knowledge Graphs and Large Language Models

    Authors: Minze Chen, Zhenxiang Tao, Weitong Tang, Tingxin Qin, Rui Yang, Chunli Zhu

    Abstract: Emergency management urgently requires comprehensive knowledge while having a high possibility to go beyond individuals' cognitive scope. Therefore, artificial intelligence (AI) supported decision-making under that circumstance is of vital importance. Recently emerging large language models (LLMs) provide a new direction for enhancing targeted machine intelligence. However, the utilization of LLM dire…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 26 pages, 6 figures

  38. arXiv:2310.20357  [pdf, other]

    cs.AI cs.MM

    Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model

    Authors: Yongqiang Zhao, Zhenyu Li, Zhi Jin, Feng Zhang, Haiyan Zhao, Chengfeng Dou, Zhengwei Tao, Xinhai Xu, Donghong Liu

    Abstract: The Multi-Modal Large Language Model (MLLM) refers to an extension of the Large Language Model (LLM) equipped with the capability to receive and infer multi-modal data. Spatial awareness stands as one of the crucial abilities of MLLM, encompassing diverse skills related to understanding spatial relationships among objects and between objects and the scene area. Industries such as autonomous drivin…

    Submitted 31 October, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

  39. arXiv:2310.18770  [pdf, other]

    cs.IR cs.MM

    Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction

    Authors: Yunshan Ma, Xiaohao Liu, Yinwei Wei, Zhulin Tao, Xiang Wang, Tat-Seng Chua

    Abstract: Automatic bundle construction is a crucial prerequisite step in various bundle-aware online services. Previous approaches are mostly designed to model the bundling strategy of existing bundles. However, it is hard to acquire a large-scale, well-curated bundle dataset, especially for those platforms that have not offered bundle services before. Even for platforms with mature bundle services, there are…

    Submitted 28 October, 2023; originally announced October 2023.

    ACM Class: H.3.0

    Journal ref: WSDM 2024

  40. arXiv:2310.09299  [pdf, other]

    cs.LG eess.SP eess.SY

    Digital Twin Assisted Deep Reinforcement Learning for Online Admission Control in Sliced Network

    Authors: Zhenyu Tao, Wei Xu, Xiaohu You

    Abstract: The proliferation of diverse wireless services in 5G and beyond has led to the emergence of network slicing technologies. Among these, admission control plays a crucial role in achieving service-oriented optimization goals through the selective acceptance of service requests. Although deep reinforcement learning (DRL) forms the foundation in many admission control approaches thanks to its effectiv…

    Submitted 21 November, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: 13 pages, 8 figures

  41. arXiv:2310.05347  [pdf]

    cs.CV

    Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition

    Authors: Guiyu Zhang, Qunbo Lv, Zui Tao, Baoyu Zhu, Zheng Tan, Yuan Ma

    Abstract: Infrared small target detection plays an important role in the remote sensing fields. Therefore, many detection algorithms have been proposed, in which the infrared patch-tensor (IPT) model has become a mainstream tool due to its excellent performance. However, most IPT-based methods face great challenges, such as inaccurate measure of the tensor low-rankness and poor robustness to complex scenes,…

    Submitted 8 October, 2023; originally announced October 2023.

  42. arXiv:2309.11378  [pdf, other]

    cs.LG cs.AI

    Preconditioned Federated Learning

    Authors: Zeyi Tao, Jindi Wu, Qun Li

    Abstract: Federated Learning (FL) is a distributed machine learning approach that enables model training in a communication-efficient and privacy-preserving manner. The standard optimization method in FL is Federated Averaging (FedAvg), which performs multiple local SGD steps between communication rounds. FedAvg has been considered to lack algorithm adaptivity compared to modern first-order adaptive optimizat…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: preprint

  43. arXiv:2307.01691  [pdf, other]

    cs.CR cs.SE

    SeePrivacy: Automated Contextual Privacy Policy Generation for Mobile Applications

    Authors: Shidong Pan, Zhen Tao, Thong Hoang, Dawen Zhang, Zhenchang Xing, Xiwei Xu, Mark Staples, David Lo

    Abstract: Privacy policies have become the most critical approach to safeguarding individuals' privacy and digital security. To enhance their presentation and readability, researchers propose the concept of contextual privacy policies (CPPs), aiming to fragment policies into shorter snippets and display them only in corresponding contexts. In this paper, we propose a novel multi-modal framework, namely SeeP…

    Submitted 9 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

  44. arXiv:2306.01176  [pdf, other]

    cs.CV

    Cooperative Hardware-Prompt Learning for Snapshot Compressive Imaging

    Authors: Jiamian Wang, Zongliang Wu, Yulun Zhang, Xin Yuan, Tao Lin, Zhiqiang Tao

    Abstract: Snapshot compressive imaging emerges as a promising technology for acquiring real-world hyperspectral signals. It uses an optical encoder to compressively produce a 2D measurement, after which the 3D hyperspectral data can be retrieved by training a deep reconstruction network. Existing reconstruction models are trained with a single hardware instance, whose performance is vulnerable to…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 11 figures, 4 tables

  45. arXiv:2306.00638  [pdf, other]

    stat.ML cs.DC cs.LG

    Byzantine-Robust Clustered Federated Learning

    Authors: Zhixu Tao, Kun Yang, Sanjeev R. Kulkarni

    Abstract: This paper focuses on the problem of adversarial attacks from Byzantine machines in a Federated Learning setting where non-Byzantine machines can be partitioned into disjoint clusters. In this setting, non-Byzantine machines in the same cluster have the same underlying data distribution, and different clusters of non-Byzantine machines have different learning tasks. Byzantine machines can adversar…

    Submitted 1 June, 2023; originally announced June 2023.

  46. arXiv:2305.15268  [pdf, other]

    cs.CL cs.AI

    EvEval: A Comprehensive Evaluation of Event Semantics for Large Language Models

    Authors: Zhengwei Tao, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Yanlin Feng, Jia Li, Wenpeng Hu

    Abstract: Events serve as fundamental units of occurrence within various contexts. The processing of event semantics in textual information forms the basis of numerous natural language processing (NLP) applications. Recent studies have begun leveraging large language models (LLMs) to address event semantic processing. However, the extent to which LLMs can effectively tackle these challenges remains uncertain. F…

    Submitted 24 May, 2023; originally announced May 2023.

  47. arXiv:2305.11508  [pdf, other]

    cs.CL cs.AI

    PlugMed: Improving Specificity in Patient-Centered Medical Dialogue Generation using In-Context Learning

    Authors: Chengfeng Dou, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Zhenwei Tao, Yongqiang Zhao

    Abstract: Patient-centered medical dialogue systems strive to offer diagnostic interpretation services to users with limited medical knowledge, emphasizing the importance of providing responses specific to the patients. It is difficult for large language models (LLMs) to guarantee the specificity of responses in spite of their promising performance even in some tasks in med…

    Submitted 18 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023 Findings

    ACM Class: I.2.7

  48. arXiv:2305.05949  [pdf, other]

    cs.SE

    Scalable and Precise Application-Centered Call Graph Construction for Python

    Authors: Kaifeng Huang, Yixuan Yan, Bihuan Chen, Zixin Tao, Yulei Sui, Xin Peng

    Abstract: Call graph construction is the foundation of inter-procedural static analysis. PyCG is the state-of-the-art approach for constructing call graphs for Python programs. Unfortunately, PyCG does not scale to large programs when adapted to whole-program analysis where application and dependent libraries are both analyzed. Moreover, PyCG is flow-insensitive and does not fully support Python's features,…

    Submitted 25 March, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 13 pages

  49. arXiv:2304.12085  [pdf, other]

    physics.soc-ph cs.DB

    Dangoron: Network Construction on Large-scale Time Series Data across Sliding Windows

    Authors: Yunlong Xu, Peizhen Yang, Zhengbin Tao

    Abstract: Complex networks represent system dynamics through the interactions of a set of anomalous time series. Consider the problem of computing correlations for highly correlated pairs of time series across sliding windows. Efficiently computing and updating the correlation matrix for user-defined sliding periods and thresholds enables large-scale time series network dynamics analysis. We introduce Dango…

    Submitted 11 April, 2023; originally announced April 2023.

  50. Architecture-Preserving Provable Repair of Deep Neural Networks

    Authors: Zhe Tao, Stephanie Nawas, Jacqueline Mitchell, Aditya V. Thakur

    Abstract: Deep neural networks (DNNs) are becoming increasingly important components of software, and are considered the state-of-the-art solution for a number of problems, such as image recognition. However, DNNs are far from infallible, and incorrect behavior of DNNs can have disastrous real-world consequences. This paper addresses the problem of architecture-preserving V-polytope provable repair of DNNs.…

    Submitted 16 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: Accepted paper at PLDI 2023. Tool is available at https://github.com/95616ARG/APRNN/