-
Emergent Crowd Grou** via Heuristic Self-Organization
Authors:
Xiao-Cheng Liao,
Wei-Neng Chen,
Xiang-Ling Chen,
Yi Mei
Abstract:
Modeling crowds has many important applications in games and computer animation. Inspired by the emergent following effect in real-life crowd scenarios, in this work, we develop a method for implicitly grou** moving agents. We achieve this by analyzing local information around each agent and rotating its preferred velocity accordingly. Each agent could automatically form an implicit group with i…
▽ More
Modeling crowds has many important applications in games and computer animation. Inspired by the emergent following effect in real-life crowd scenarios, in this work, we develop a method for implicitly grou** moving agents. We achieve this by analyzing local information around each agent and rotating its preferred velocity accordingly. Each agent could automatically form an implicit group with its neighboring agents that have similar directions. In contrast to an explicit group, there are no strict boundaries for an implicit group. If an agent's direction deviates from its group as a result of positional changes, it will autonomously exit the group or join another implicitly formed neighboring group. This implicit grou** is autonomously emergent among agents rather than deliberately controlled by the algorithm. The proposed method is compared with many crowd simulation models, and the experimental results indicate that our approach achieves the lowest congestion levels in some classic scenarios. In addition, we demonstrate that adjusting the preferred velocity of agents can actually reduce the dissimilarity between their actual velocity and the original preferred velocity. Our work is available online.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images
Authors:
Yuanyuan Peng,
Aidi Lin,
Meng Wang,
Tian Lin,
Ke Zou,
Yinglin Cheng,
Tingkun Shi,
Xulong Liao,
Lixia Feng,
Zhen Liang,
Xinjian Chen,
Huazhu Fu,
Haoyu Chen
Abstract:
Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE…
▽ More
Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RETFound and UIOS, and got further improvement with thresholding strategy to 98.44%. In the external test sets obtained from other OCT devices, FMUE achieved an accuracy of 88.75% and 92.73% before and after thresholding. Our model is superior to two ophthalmologists with a higher F1 score (95.17% vs. 61.93% &71.72%). Besides, our model correctly predicts high uncertainty scores for samples with ambiguous features, of non-target-category diseases, or with low-quality to prompt manual checks and prevent misdiagnosis. FMUE provides a trustworthy method for automatic retinal anomalies detection in the real-world clinical open set environment.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition
Authors:
Xingming Liao,
Nankai Lin,
Haowen Li,
Lianglun Cheng,
Zhuowei Wang,
Chong Chen
Abstract:
Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition. Compared to Flat Named Entity Recognition (FNER), annotated resources are scarce in the corpus for NNER. Data augmentation is an effective approach to address the insufficient annotated corpus. However, there is a significant lack of exploration in data augmentation methods for NNER. Due to the presence of…
▽ More
Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition. Compared to Flat Named Entity Recognition (FNER), annotated resources are scarce in the corpus for NNER. Data augmentation is an effective approach to address the insufficient annotated corpus. However, there is a significant lack of exploration in data augmentation methods for NNER. Due to the presence of nested entities in NNER, existing data augmentation methods cannot be directly applied to NNER tasks. Therefore, in this work, we focus on data augmentation for NNER and resort to more expressive structures, Composited-Nested-Label Classification (CNLC) in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using the Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for a more efficient selection of generated data. Experimental results demonstrate that this approach results in improvements in ACE2004 and ACE2005 and alleviates the impact of sample imbalance.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Investigating the translation capabilities of Large Language Models trained on parallel data only
Authors:
Javier GarcĂa Gilabert,
Carlos Escolano,
Aleix Sant Savall,
Francesca De Luca Fornaciari,
Audrey Mash,
Xixian Liao,
Maite Melero
Abstract:
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this…
▽ More
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different elements of the prompt, and their cross-lingual representation space.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory
Authors:
Zhou Yang,
Zhaochun Ren,
Chenglong Ye,
Yufeng Wang,
Haizhou Sun,
Chao Chen,
Xiaofei Zhu,
Yunbing Wu,
Xiangwen Liao
Abstract:
In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performanc…
▽ More
In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performance from the perspective of prototype theory and propose a method to address this issue. Specifically, we conduct extensive pilot experiments and find that ICL conforms to the prototype theory on fine-grained emotion recognition. Based on this theory, we uncover the following deficiencies in ICL: (1) It relies on prototypes (example-label pairs) that are semantically similar but emotionally inaccurate to predict emotions. (2) It is prone to interference from irrelevant categories, affecting the accuracy and robustness of the predictions. To address these issues, we propose an Emotion Context Learning method (E-ICL) on fine-grained emotion recognition. E-ICL relies on more emotionally accurate prototypes to predict categories by referring to emotionally similar examples with dynamic labels. Simultaneously, E-ICL employs an exclusionary emotion prediction strategy to avoid interference from irrelevant categories, thereby increasing its accuracy and robustness. Note that the entire process is accomplished with the assistance of a plug-and-play emotion auxiliary model, without additional training. Experiments on the fine-grained emotion datasets EDOS, Empathetic-Dialogues, EmpatheticIntent, and GoEmotions show that E-ICL achieves superior emotion prediction performance. Furthermore, even when the emotion auxiliary model used is lower than 10% of the LLMs, E-ICL can still boost the performance of LLMs by over 4% on multiple datasets.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction
Authors:
Qinhua Jiang,
Xishun Liao,
Yaofa Gong,
Jiaqi Ma
Abstract:
Work zone is one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of its impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict th…
▽ More
Work zone is one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of its impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict the traffic speed and incident likelihood during planned work zone events. The proposed model transforms traffic patterns into 2D space-time images for both model input and output and employs an attention-based multi-context convolutional encoder-decoder architecture to capture the spatial-temporal dependencies between work zone events and traffic variations. Trained and validated on four years of archived work zone traffic data from Maryland, USA, the model demonstrates superior performance over baseline models in predicting traffic speed, incident likelihood, and inferred traffic attributes such as queue length and congestion timings (i.e., start time and duration). Specifically, the proposed model outperforms the baseline models by reducing the prediction error of traffic speed by 5% to 34%, queue length by 11% to 29%, congestion timing by 6% to 17%, and increasing the accuracy of incident predictions by 5% to 7%. Consequently, this model offers substantial promise for enhancing the planning and traffic management of work zones.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…
▽ More
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
△ Less
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
Full-Stack Allreduce on Multi-Rail Networks
Authors:
Enda Yu,
Dezun Dong,
Xiangke Liao
Abstract:
The high communication costs impede scalability in distributed systems. Multimodal models like Sora exacerbate this issue by requiring more resources than current networks can support. However, existing network architectures fail to address this gap. In this paper, we provide full-stack support for allreduce on multi-rail networks, aiming to overcome the scalability limitations of large-scale netw…
▽ More
The high communication costs impede scalability in distributed systems. Multimodal models like Sora exacerbate this issue by requiring more resources than current networks can support. However, existing network architectures fail to address this gap. In this paper, we provide full-stack support for allreduce on multi-rail networks, aiming to overcome the scalability limitations of large-scale networks by facilitating collaborative data transfer across various networks. To achieve this, we propose the Nezha system, which integrates TCP, in-network computing protocol SHARP, and RDMA-based protocol GLEX. To maximize data transfer rates, Nezha incorporates a load balancing data allocation scheme based on cost feedback and combines exception handling to achieve reliable data transmission. Our experiments on a six-node cluster demonstrate that Nezha significantly enhances allreduce performance by 58\% to 87\% in homogeneous dual-rail configurations and offers considerable acceleration in heterogeneous settings, contingent on the performance variance among networks.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis
Authors:
Xishun Liao,
Brian Yueshuai He,
Qinhua Jiang,
Chenchen Kuai,
Jiaqi Ma
Abstract:
Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying se…
▽ More
Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying semantic interdependency among activities. Moreover, they are also constrained by the data source. These two factors thereby limit their realism and adaptability, respectively. Meanwhile, traditional activity-based models (ABMs) in transportation modeling rely on rigid assumptions and are costly and time-consuming to calibrate, making them difficult to adapt and scale to new regions, especially those regions with limited amount of required conventional travel data. To address these limitations, we develop a novel generative deep learning approach for human mobility modeling and synthesis, using ubiquitous and open-source data. Additionally, the model can be fine-tuned with local data, enabling adaptable and accurate representations of mobility patterns across different regions. The model is evaluated on a nationwide dataset of the United States, where it demonstrates superior performance in generating activity chains that closely follow ground truth distributions. Further tests using state- or city-specific datasets from California, Washington, and Mexico City confirm its transferability. This innovative approach offers substantial potential to advance mobility modeling research, especially in generating human activity chains as input for downstream activity-based mobility simulation models and providing enhanced tools for urban planners and policymakers.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Semantic Trajectory Data Mining with LLM-Informed POI Classification
Authors:
Yifan Liu,
Chenchen Kuai,
Haoxuan Ma,
Xishun Liao,
Brian Yueshuai He,
Jiaqi Ma
Abstract:
Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly en…
▽ More
Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require the integrity of datasets to do the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POI with activity types and then uses a Bayesian-based algorithm to infer activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories
Authors:
Zongmei Chen,
Xin Liao,
Xiaoshuai Wu,
Yanxiang Chen
Abstract:
The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing methods for detecting deepfakes primarily focus on uncompressed videos, such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, these methods experience a decrease in detection performance and are less suitable f…
▽ More
The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing methods for detecting deepfakes primarily focus on uncompressed videos, such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, these methods experience a decrease in detection performance and are less suitable for real-world scenarios. In this paper, we propose a deepfake video detection method based on 3D spatiotemporal trajectories. Specifically, we utilize a robust 3D model to construct spatiotemporal motion features, integrating feature details from both 2D and 3D frames to mitigate the influence of large head rotation angles or insufficient lighting within frames. Furthermore, we separate facial expressions from head movements and design a sequential analysis method based on phase space motion trajectories to explore the feature differences between genuine and fake faces in deepfake videos. We conduct extensive experiments to validate the performance of our proposed method on several compressed deepfake benchmarks. The robustness of the well-designed features is verified by calculating the consistent distribution of facial landmarks before and after video compression.Our method yields satisfactory results and showcases its potential for practical applications.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
SafePaint: Anti-forensic Image Inpainting with Domain Adaptation
Authors:
Dunyun Chen,
Xin Liao,
Xiaoshuai Wu,
Shiwei Chen
Abstract:
Existing image inpainting methods have achieved remarkable accomplishments in generating visually appealing results, often accompanied by a trend toward creating more intricate structural textures. However, while these models excel at creating more realistic image content, they often leave noticeable traces of tampering, posing a significant threat to security. In this work, we take the anti-foren…
▽ More
Existing image inpainting methods have achieved remarkable accomplishments in generating visually appealing results, often accompanied by a trend toward creating more intricate structural textures. However, while these models excel at creating more realistic image content, they often leave noticeable traces of tampering, posing a significant threat to security. In this work, we take the anti-forensic capabilities into consideration, firstly proposing an end-to-end training framework for anti-forensic image inpainting named SafePaint. Specifically, we innovatively formulated image inpainting as two major tasks: semantically plausible content completion and region-wise optimization. The former is similar to current inpainting methods that aim to restore the missing regions of corrupted images. The latter, through domain adaptation, endeavors to reconcile the discrepancies between the inpainted region and the unaltered area to achieve anti-forensic goals. Through comprehensive theoretical analysis, we validate the effectiveness of domain adaptation for anti-forensic performance. Furthermore, we meticulously crafted a region-wise separated attention (RWSA) module, which not only aligns with our objective of anti-forensics but also enhances the performance of the model. Extensive qualitative and quantitative evaluations show our approach achieves comparable results to existing image inpainting methods while offering anti-forensic capabilities not available in other methods.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
Authors:
Xiaoshuai Wu,
Xin Liao,
Bo Ou,
Yuling Liu,
Zheng Qin
Abstract:
AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images…
▽ More
AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.
△ Less
Submitted 27 April, 2024;
originally announced April 2024.
-
Uncertainty Quantification on Graph Learning: A Survey
Authors:
Chao Chen,
Chenghua Guo,
Rui Xu,
Xiangwen Liao,
Xi Zhang,
Sihong Xie,
Hui Xiong,
Philip Yu
Abstract:
Graphical models, including Graph Neural Networks (GNNs) and Probabilistic Graphical Models (PGMs), have demonstrated their exceptional capabilities across numerous fields. These models necessitate effective uncertainty quantification to ensure reliable decision-making amid the challenges posed by model training discrepancies and unpredictable testing scenarios. This survey examines recent works t…
▽ More
Graphical models, including Graph Neural Networks (GNNs) and Probabilistic Graphical Models (PGMs), have demonstrated their exceptional capabilities across numerous fields. These models necessitate effective uncertainty quantification to ensure reliable decision-making amid the challenges posed by model training discrepancies and unpredictable testing scenarios. This survey examines recent works that address uncertainty quantification within the model architectures, training, and inference of GNNs and PGMs. We aim to provide an overview of the current landscape of uncertainty in graphical models by organizing the recent methods into uncertainty representation and handling. By summarizing state-of-the-art methods, this survey seeks to deepen the understanding of uncertainty quantification in graphical models, thereby increasing their effectiveness and safety in critical applications.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Learning Traffic Signal Control via Genetic Programming
Authors:
Xiao-Cheng Liao,
Yi Mei,
Mengjie Zhang
Abstract:
The control of traffic signals is crucial for improving transportation efficiency. Recently, learning-based methods, especially Deep Reinforcement Learning (DRL), garnered substantial success in the quest for more efficient traffic signal control strategies. However, the design of rewards in DRL highly demands domain knowledge to converge to an effective policy, and the final policy also presents…
▽ More
The control of traffic signals is crucial for improving transportation efficiency. Recently, learning-based methods, especially Deep Reinforcement Learning (DRL), garnered substantial success in the quest for more efficient traffic signal control strategies. However, the design of rewards in DRL highly demands domain knowledge to converge to an effective policy, and the final policy also presents difficulties in terms of explainability. In this work, a new learning-based method for signal control in complex intersections is proposed. In our approach, we design a concept of phase urgency for each signal phase. During signal transitions, the traffic light control strategy selects the next phase to be activated based on the phase urgency. We then proposed to represent the urgency function as an explainable tree structure. The urgency function can calculate the phase urgency for a specific phase based on the current road conditions. Genetic programming is adopted to perform gradient-free optimization of the urgency function. We test our algorithm on multiple public traffic signal control datasets. The experimental results indicate that the tree-shaped urgency function evolved by genetic programming outperforms the baselines, including a state-of-the-art method in the transportation field and a well-known DRL-based method.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Authors:
Xinting Liao,
Weiming Liu,
Chaochao Chen,
Pengyang Zhou,
Fengyuan Yu,
Huabin Zhu,
Binhui Yao,
Tao Wang,
Xiaolin Zheng,
Yanchao Tan
Abstract:
Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) incon…
▽ More
Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) inconsistent representation spaces among local models. The former indicates that representation collapse in local model will subsequently impact the global model and other local models. The latter means that clients model data representation with inconsistent parameters due to the deficiency of supervision signals. In this work, we propose FedU2 which enhances generating uniform and unified representation in FUSL with non-IID data. Specifically, FedU2 consists of flexible uniform regularizer (FUR) and efficient unified aggregator (EUA). FUR in each client avoids representation collapse via dispersing samples uniformly, and EUA in server promotes unified representation by constraining consistent client model updating. To extensively validate the performance of FedU2, we conduct both cross-device and cross-silo evaluation experiments on two benchmark datasets, i.e., CIFAR10 and CIFAR100.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification
Authors:
Lanfeng Zhong,
Xin Liao,
Shaoting Zhang,
Xiaofan Zhang,
Guotai Wang
Abstract:
Despite that deep learning methods have achieved remarkable performance in pathology image classification, they heavily rely on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human annotation-free method for pathology image classification by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo labels of the training s…
▽ More
Despite that deep learning methods have achieved remarkable performance in pathology image classification, they heavily rely on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human annotation-free method for pathology image classification by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo labels of the training set are obtained by utilizing the zero-shot inference capabilities of VLM, which may contain a lot of noise due to the domain shift between the pre-training data and the target dataset. To address this issue, we introduce VLM-CPL, a novel approach based on consensus pseudo labels that integrates two noisy label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of VLM, we obtain feature-based pseudo labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the consensus between the two types of pseudo labels. By rejecting low-quality pseudo labels, we further propose High-confidence Cross Supervision (HCS) to learn from samples with reliable pseudo labels and the remaining unlabeled samples. Experimental results showed that our method obtained an accuracy of 87.1% and 95.1% on the HPH and LC25K datasets, respectively, and it largely outperformed existing zero-shot classification and noisy label learning methods. The code is available at https://github.com/lanfz2000/VLM-CPL.
△ Less
Submitted 23 March, 2024;
originally announced March 2024.
-
Towards Fair and Efficient Learning-based Congestion Control
Authors:
Xudong Liao,
Han Tian,
Chaoliang Zeng,
Xinchen Wan,
Kai Chen
Abstract:
Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance over traditional TCP schemes. However, they fail to provide consistently good convergence properties, including {\em fairness}, {\em fast convergence} and {\em stability}, due to the mismatch between their objective functions and these properties. Despite being intuiti…
▽ More
Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance over traditional TCP schemes. However, they fail to provide consistently good convergence properties, including {\em fairness}, {\em fast convergence} and {\em stability}, due to the mismatch between their objective functions and these properties. Despite being intuitive, integrating these properties into existing learning-based CC is challenging, because: 1) their training environments are designed for the performance optimization of single flow but incapable of cooperative multi-flow optimization, and 2) there is no directly measurable metric to represent these properties into the training objective function.
We present Astraea, a new learning-based congestion control that ensures fast convergence to fairness with stability. At the heart of Astraea is a multi-agent deep reinforcement learning framework that explicitly optimizes these convergence properties during the training process by enabling the learning of interactive policy between multiple competing flows, while maintaining high performance. We further build a faithful multi-flow environment that emulates the competing behaviors of concurrent flows, explicitly expressing convergence properties to enable their optimization during training. We have fully implemented Astraea and our comprehensive experiments show that Astraea can quickly converge to fairness point and exhibit better stability than its counterparts. For example, \sys achieves near-optimal bandwidth sharing (i.e., fairness) when multiple flows compete for the same bottleneck, delivers up to 8.4$\times$ faster convergence speed and 2.8$\times$ smaller throughput deviation, while achieving comparable or even better performance over prior solutions.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
Authors:
Jianheng Huang,
Leyang Cui,
Ante Wang,
Chengyi Yang,
Xinting Liao,
Linfeng Song,
Junfeng Yao,
**song Su
Abstract:
Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly-released LLM checkpoint, the availability of the original training data may be non-existent. To addr…
▽ More
Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly-released LLM checkpoint, the availability of the original training data may be non-existent. To address this challenge, we propose a framework called Self-Synthesized Rehearsal (SSR) that uses the LLM to generate synthetic instances for rehearsal. Concretely, we first employ the base LLM for in-context learning to generate synthetic instances. Subsequently, we utilize the latest LLM to refine the instance outputs based on the synthetic inputs, preserving its acquired ability. Finally, we select diverse high-quality synthetic instances for rehearsal in future stages. Experimental results demonstrate that SSR achieves superior or comparable performance compared to conventional rehearsal-based approaches while being more data-efficient. Besides, SSR effectively preserves the generalization capabilities of LLMs in general domains.
△ Less
Submitted 25 May, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
An Iterative Associative Memory Model for Empathetic Response Generation
Authors:
Zhou Yang,
Zhaochun Ren,
Yufeng Wang,
Chao Chen,
Haizhou Sun,
Xiaofei Zhu,
Xiangwen Liao
Abstract:
Empathetic response generation aims to comprehend the cognitive and emotional states in dialogue utterances and generate proper responses. Psychological theories posit that comprehending emotional and cognitive states necessitates iteratively capturing and understanding associated words across dialogue utterances. However, existing approaches regard dialogue utterances as either a long sequence or…
▽ More
Empathetic response generation aims to comprehend the cognitive and emotional states in dialogue utterances and generate proper responses. Psychological theories posit that comprehending emotional and cognitive states necessitates iteratively capturing and understanding associated words across dialogue utterances. However, existing approaches regard dialogue utterances as either a long sequence or independent utterances for comprehension, which are prone to overlook the associated words between them. To address this issue, we propose an Iterative Associative Memory Model (IAMM) for empathetic response generation. Specifically, we employ a novel second-order interaction attention mechanism to iteratively capture vital associated words between dialogue utterances and situations, dialogue history, and a memory module (for storing associated words), thereby accurately and nuancedly comprehending the utterances. We conduct experiments on the Empathetic-Dialogue dataset. Both automatic and human evaluations validate the efficacy of the model. Variant experiments on LLMs also demonstrate that attending to associated words improves empathetic comprehension and expression.
△ Less
Submitted 2 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Exploiting Emotion-Semantic Correlations for Empathetic Response Generation
Authors:
Zhou Yang,
Zhaochun Ren,
Yufeng Wang,
Xiaofei Zhu,
Zhihao Chen,
Tiecheng Cai,
Yunbing Wu,
Yisong Su,
Sibo Ju,
Xiangwen Liao
Abstract:
Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with…
▽ More
Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with other grammar semantic roles, i.e., words with semantic meanings, in grammar. Previous methods overlook these two characteristics, which easily lead to misunderstandings of emotions and neglect of key semantics. To address this issue, we propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks. ESCM constructs dynamic emotion-semantic vectors through the interaction of context and emotions. We introduce dependency trees to reflect the correlations between emotions and semantics. Based on dynamic emotion-semantic vectors and dependency trees, we propose a dynamic correlation graph convolutional network to guide the model in learning context meanings in dialogue and generating empathetic responses. Experimental results on the EMPATHETIC-DIALOGUES dataset show that ESCM understands semantics and emotions more accurately and expresses fluent and informative empathetic responses. Our analysis results also indicate that the correlations between emotions and semantics are frequently used in dialogues, which is of great significance for empathetic perception and expression.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models
Authors:
Zhou Yang,
Zhaochun Ren,
Wang Yufeng,
Shizhong Peng,
Haizhou Sun,
Xiaofei Zhu,
Xiangwen Liao
Abstract:
Empathetic response generation is increasingly significant in AI, necessitating nuanced emotional and cognitive understanding coupled with articulate response expression. Current large language models (LLMs) excel in response expression; however, they lack the ability to deeply understand emotional and cognitive nuances, particularly in pinpointing fine-grained emotions and their triggers. Convers…
▽ More
Empathetic response generation is increasingly significant in AI, necessitating nuanced emotional and cognitive understanding coupled with articulate response expression. Current large language models (LLMs) excel in response expression; however, they lack the ability to deeply understand emotional and cognitive nuances, particularly in pinpointing fine-grained emotions and their triggers. Conversely, small-scale empathetic models (SEMs) offer strength in fine-grained emotion detection and detailed emotion cause identification. To harness the complementary strengths of both LLMs and SEMs, we introduce a Hybrid Empathetic Framework (HEF). HEF regards SEMs as flexible plugins to improve LLM's nuanced emotional and cognitive understanding. Regarding emotional understanding, HEF implements a two-stage emotion prediction strategy, encouraging LLMs to prioritize primary emotions emphasized by SEMs, followed by other categories, substantially alleviates the difficulties for LLMs in fine-grained emotion detection. Regarding cognitive understanding, HEF employs an emotion cause perception strategy, prompting LLMs to focus on crucial emotion-eliciting words identified by SEMs, thus boosting LLMs' capabilities in identifying emotion causes. This collaborative approach enables LLMs to discern emotions more precisely and formulate empathetic responses. We validate HEF on the Empathetic-Dialogue dataset, and the findings indicate that our framework enhances the refined understanding of LLMs and their ability to convey empathetic responses.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces
Authors:
Juan Hu,
Xin Liao,
Difei Gao,
Satoshi Tsutsui,
Qian Wang,
Zheng Qin,
Mike Zheng Shou
Abstract:
Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areasthat vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address thislimitation, we propose Delocate, a novel Deepfake detection model that can both recognize andlocalize unknown domai…
▽ More
Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areasthat vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address thislimitation, we propose Delocate, a novel Deepfake detection model that can both recognize andlocalize unknown domain Deepfake videos. Ourmethod consists of two stages named recoveringand localization. In the recovering stage, the modelrandomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for realfaces and a poor recovery effect for fake faces. Inthe localization stage, the output of the recoveryphase and the forgery ground truth mask serve assupervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Ourextensive experiments on four widely used benchmark datasets demonstrate that Delocate not onlyexcels in localizing tampered areas but also enhances cross-domain detection performance.
△ Less
Submitted 9 May, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Authors:
Yintao Tai,
Xiyang Liao,
Alessandro Suglia,
Antonio Vergari
Abstract:
Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they can…
▽ More
Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they cannot be used for generative tasks such as free-form question answering. In this work, we introduce PIXAR, the first pixel-based autoregressive LLM that performs text generation. Consisting of only a decoder, PIXAR can perform free-form generative tasks while kee** the number of parameters on par with previous encoder-decoder models. Furthermore, we highlight the challenges of generating text as non-noisy images and show this is due to using a maximum likelihood objective. To overcome this problem, we propose an adversarial pretraining stage that improves the readability and accuracy of PIXAR by 8.1 on LAMBADA and 8.5 on bAbI -- making it comparable to GPT-2 on text generation tasks. This paves the way to build open-vocabulary LLMs that operate on perceptual input only and calls into question the necessity of the usual symbolic input representation, i.e., text as (sub)tokens.
△ Less
Submitted 23 February, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
Authors:
Zilong Lin,
Jian Cui,
Xiao**g Liao,
XiaoFeng Wang
Abstract:
The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic…
▽ More
The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study on 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today's public LLM services. Through examining 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime.
△ Less
Submitted 4 July, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Xorbits: Automating Operator Tiling for Distributed Data Science
Authors:
Weizheng Lu,
Kaisheng He,
Xuye Qin,
Chengjie Li,
Zhong Wang,
Tao Yuan,
Xia Liao,
Feng Zhang,
Yueguo Chen,
Xiaoyong Du
Abstract:
Data science pipelines commonly utilize dataframe and array operations for tasks such as data preprocessing, analysis, and machine learning. The most popular tools for these tasks are pandas and NumPy. However, these tools are limited to executing on a single node, making them unsuitable for processing large-scale data. Several systems have attempted to distribute data science applications to clus…
▽ More
Data science pipelines commonly utilize dataframe and array operations for tasks such as data preprocessing, analysis, and machine learning. The most popular tools for these tasks are pandas and NumPy. However, these tools are limited to executing on a single node, making them unsuitable for processing large-scale data. Several systems have attempted to distribute data science applications to clusters while maintaining interfaces similar to single-node libraries, enabling data scientists to scale their workloads without significant effort. However, existing systems often struggle with processing large datasets due to Out-of-Memory (OOM) problems caused by poor data partitioning. To overcome these challenges, we develop Xorbits, a high-performance, scalable data science framework specifically designed to distribute data science workloads across clusters while retaining familiar APIs. The key differentiator of Xorbits is its ability to dynamically switch between graph construction and graph execution. Xorbits has been successfully deployed in production environments with up to 5k CPU cores. Its applications span various domains, including user behavior analysis and recommendation systems in the e-commerce sector, as well as credit assessment and risk management in the finance industry. Users can easily scale their data science workloads by simply changing the import line of their pandas and NumPy code. Our experiments demonstrate that Xorbits can effectively process very large datasets without encountering OOM or data-skewing problems. Over the fastest state-of-the-art solutions, Xorbits achieves an impressive 2.66* speedup on average. In terms of API coverage, Xorbits attains a compatibility rate of 96.7%, surpassing the fastest framework by an impressive margin of 60 percentage points. Xorbits is available at https://github.com/xorbitsai/xorbits.
△ Less
Submitted 19 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality
Authors:
Junwen Yang,
Ang Li,
Xuewen Liao,
Christos Masouros,
A. L. Swindlehurst
Abstract:
This paper investigates block-level interference exploitation (IE) precoding for multi-user multiple-input single-output (MU-MISO) downlink systems. To overcome the need for symbol-level IE precoding to frequently update the precoding matrix, we propose to jointly optimize all the precoders or transmit signals within a transmission block. The resultant precoders only need to be updated once per bl…
▽ More
This paper investigates block-level interference exploitation (IE) precoding for multi-user multiple-input single-output (MU-MISO) downlink systems. To overcome the need for symbol-level IE precoding to frequently update the precoding matrix, we propose to jointly optimize all the precoders or transmit signals within a transmission block. The resultant precoders only need to be updated once per block, and while not necessarily constant over all the symbol slots, we refer to the technique as block-level slot-variant IE precoding. Through a careful examination of the optimal structure and the explicit duality inherent in block-level power minimization (PM) and signal-to-interference-plus-noise ratio (SINR) balancing (SB) problems, we discover that the joint optimization can be decomposed into subproblems with smaller variable sizes. As a step further, we propose block-level slot-invariant IE precoding by adding a structural constraint on the slot-variant IE precoding to maintain a constant precoder throughout the block. A novel linear precoder for IE is further presented, and we prove that the proposed slot-variant and slot-invariant IE precoding share an identical solution when the number of symbol slots does not exceed the number of users. Numerical simulations demonstrate that the proposed precoders achieve a significant complexity reduction compared against benchmark schemes, without sacrificing performance.
△ Less
Submitted 30 December, 2023;
originally announced January 2024.
-
Progressive Frequency-Aware Network for Laparoscopic Image Desmoking
Authors:
Jiale Zhang,
Wenfeng Huang,
Xiangyun Liao,
Qiong Wang
Abstract:
Laparoscopic surgery offers minimally invasive procedures with better patient outcomes, but smoke presence challenges visibility and safety. Existing learning-based methods demand large datasets and high computational resources. We propose the Progressive Frequency-Aware Network (PFAN), a lightweight GAN framework for laparoscopic image desmoking, combining the strengths of CNN and Transformer for…
▽ More
Laparoscopic surgery offers minimally invasive procedures with better patient outcomes, but smoke presence challenges visibility and safety. Existing learning-based methods demand large datasets and high computational resources. We propose the Progressive Frequency-Aware Network (PFAN), a lightweight GAN framework for laparoscopic image desmoking, combining the strengths of CNN and Transformer for progressive information extraction in the frequency domain. PFAN features CNN-based Multi-scale Bottleneck-Inverting (MBI) Blocks for capturing local high-frequency information and Locally-Enhanced Axial Attention Transformers (LAT) for efficiently handling global low-frequency information. PFAN efficiently desmokes laparoscopic images even with limited training data. Our method outperforms state-of-the-art approaches in PSNR, SSIM, CIEDE2000, and visual quality on the Cholec80 dataset and retains only 629K parameters. Our code and models are made publicly available at: https://github.com/jlzcode/PFAN.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
PR-NeuS: A Prior-based Residual Learning Paradigm for Fast Multi-view Neural Surface Reconstruction
Authors:
Jianyao Xu,
Qingshan Xu,
Xinyao Liao,
Wanjuan Su,
Chen Zhang,
Yew-Soon Ong,
Wenbing Tao
Abstract:
Neural surfaces learning has shown impressive performance in multi-view surface reconstruction. However, most existing methods use large multilayer perceptrons (MLPs) to train their models from scratch, resulting in hours of training for a single scene. Recently, how to accelerate the neural surfaces learning has received a lot of attention and remains an open problem. In this work, we propose a p…
▽ More
Neural surfaces learning has shown impressive performance in multi-view surface reconstruction. However, most existing methods use large multilayer perceptrons (MLPs) to train their models from scratch, resulting in hours of training for a single scene. Recently, how to accelerate the neural surfaces learning has received a lot of attention and remains an open problem. In this work, we propose a prior-based residual learning paradigm for fast multi-view neural surface reconstruction. This paradigm consists of two optimization stages. In the first stage, we propose to leverage generalization models to generate a basis signed distance function (SDF) field. This initial field can be quickly obtained by fusing multiple local SDF fields produced by generalization models. This provides a coarse global geometry prior. Based on this prior, in the second stage, a fast residual learning strategy based on hash-encoding networks is proposed to encode an offset SDF field for the basis SDF field. Moreover, we introduce a prior-guided sampling scheme to help the residual learning stage converge better, and thus recover finer structures. With our designed paradigm, experimental results show that our method only takes about 3 minutes to reconstruct the surface of a single scene, while achieving competitive surface quality. Our code will be released upon publication.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Federated Learning for Short Text Clustering
Authors:
Mengling Hu,
Chaochao Chen,
Weiming Liu,
Xinting Liao,
Xiaolin Zheng
Abstract:
Short text clustering has been popularly studied for its significance in mining valuable insights from many short texts. In this paper, we focus on the federated short text clustering (FSTC) problem, i.e., clustering short texts that are distributed in different clients, which is a realistic problem under privacy requirements. Compared with the centralized short text clustering problem that short…
▽ More
Short text clustering has been popularly studied for its significance in mining valuable insights from many short texts. In this paper, we focus on the federated short text clustering (FSTC) problem, i.e., clustering short texts that are distributed in different clients, which is a realistic problem under privacy requirements. Compared with the centralized short text clustering problem that short texts are stored on a central server, the FSTC problem has not been explored yet. To fill this gap, we propose a Federated Robust Short Text Clustering (FSTC) framework. FSTC includes two main modules, i.e., robust short text clustering module and federated cluster center aggregation module. The robust short text clustering module aims to train an effective short text clustering model with local data in each client. We innovatively combine optimal transport to generate pseudo-labels with Gaussian-uniform mixture model to ensure the reliability of the pseudo-supervised data. The federated cluster center aggregation module aims to exchange knowledge across clients without sharing local raw data in an efficient way. The server aggregates the local cluster centers from different clients and then sends the global centers back to all clients in each communication round. Our empirical studies on three short text clustering datasets demonstrate that FSTC significantly outperforms the federated short text clustering baselines.
△ Less
Submitted 23 November, 2023;
originally announced December 2023.
-
Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Appeals
Authors:
Zening Duan,
Anqi Shao,
Yicheng Hu,
Heysung Lee,
Xining Liao,
Yoo Ji Suh,
Jisoo Kim,
Kai-Cheng Yang,
Kai** Chen,
Sijia Yang
Abstract:
While researchers often study message features like moral content in text, such as party manifestos and social media, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies dev…
▽ More
While researchers often study message features like moral content in text, such as party manifestos and social media, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies developed for the original applications. In this paper, we present an approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings through nonlinear optimization. By harnessing semantic relationships encoded by embeddings, vec-tionaries improve the measurement of message features from text, especially those in short format, by expanding the applicability of original vocabularies to other contexts. Importantly, a vec-tionary can produce additional metrics to capture the valence and ambivalence of a message feature beyond its strength in texts. Using moral content in tweets as a case study, we illustrate the steps to construct the moral foundations vec-tionary, showcasing its ability to process texts missed by conventional dictionaries and word embedding methods and to produce measurements better aligned with crowdsourced human assessments. Furthermore, additional metrics from the vec-tionary unveiled unique insights that facilitated predicting outcomes such as message retransmission.
△ Less
Submitted 8 March, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Graph for Science: From API based Programming to Graph Engine based Programming for HPC
Authors:
Yu Zhang,
Zixiao Wang,
** Zhao,
Yuluo Guo,
Hui Yu,
Zhiying Huang,
Xuanhua Shi,
Xiaofei Liao
Abstract:
Modern scientific applications predominantly run on large-scale computing platforms, necessitating collaboration between scientific domain experts and high-performance computing (HPC) experts. While domain experts are often skilled in customizing domain-specific scientific computing routines, which often involves various matrix computations, HPC experts are essential for achieving efficient execut…
▽ More
Modern scientific applications predominantly run on large-scale computing platforms, necessitating collaboration between scientific domain experts and high-performance computing (HPC) experts. While domain experts are often skilled in customizing domain-specific scientific computing routines, which often involves various matrix computations, HPC experts are essential for achieving efficient execution of these computations on large-scale platforms. This process often involves utilizing complex parallel computing libraries tailored to specific matrix computation scenarios. However, the intricate programming procedure and the need for deep understanding in both application domains and HPC poses significant challenges to the widespread adoption of scientific computing. In this research, we observe that matrix computations can be transformed into equivalent graph representations, and that by utilizing graph processing engines, HPC experts can be freed from the burden of implementing efficient scientific computations. Based on this observation, we introduce a graph engine-based scientific computing (Graph for Science) paradigm, which provides a unified graph programming interface, enabling domain experts to promptly implement various types of matrix computations. The proposed paradigm leverages the underlying graph processing engine to achieve efficient execution, eliminating the needs for HPC expertise in programming large-scale scientific applications. Our results show that the graph engine-based scientific computing paradigm achieves performance comparable to the best-performing implementations based on existing parallel computing libraries and bespoke implementations. Importantly, the paradigm greatly simplifies the development of scientific computations on large-scale platforms, reducing the programming difficulty for scientists and facilitating broader adoption of scientific computing.
△ Less
Submitted 2 April, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
Learning Uniform Clusters on Hypersphere for Deep Graph-level Clustering
Authors:
Mengling Hu,
Chaochao Chen,
Weiming Liu,
Xinyi Zhang,
Xinting Liao,
Xiaolin Zheng
Abstract:
Graph clustering has been popularly studied in recent years. However, most existing graph clustering methods focus on node-level clustering, i.e., grou** nodes in a single graph into clusters. In contrast, graph-level clustering, i.e., grou** multiple graphs into clusters, remains largely unexplored. Graph-level clustering is critical in a variety of real-world applications, such as, propertie…
▽ More
Graph clustering has been popularly studied in recent years. However, most existing graph clustering methods focus on node-level clustering, i.e., grou** nodes in a single graph into clusters. In contrast, graph-level clustering, i.e., grou** multiple graphs into clusters, remains largely unexplored. Graph-level clustering is critical in a variety of real-world applications, such as, properties prediction of molecules and community analysis in social networks. However, graph-level clustering is challenging due to the insufficient discriminability of graph-level representations, and the insufficient discriminability makes deep clustering be more likely to obtain degenerate solutions (cluster collapse). To address the issue, we propose a novel deep graph-level clustering method called Uniform Deep Graph Clustering (UDGC). UDGC assigns instances evenly to different clusters and then scatters those clusters on unit hypersphere, leading to a more uniform cluster-level distribution and a slighter cluster collapse. Specifically, we first propose Augmentation-Consensus Optimal Transport (ACOT) for generating uniformly distributed and reliable pseudo labels for partitioning clusters. Then we adopt contrastive learning to scatter those clusters. Besides, we propose Center Alignment Optimal Transport (CAOT) for guiding the model to learn better parameters, which further promotes the cluster performance. Our empirical study on eight well-known datasets demonstrates that UDGC significantly outperforms the state-of-the-art models.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
Game Theoretic Application to Intersection Management: A Literature Review
Authors:
Ziye Qin,
Ang Ji,
Zhanbo Sun,
Guoyuan Wu,
Peng Hao,
Xishun Liao
Abstract:
The emergence of vehicle-to-everything (V2X) technology offers new insights into intersection management. This, however, has also presented new challenges, such as the need to understand and model the interactions of traffic participants, including their competition and cooperation behaviors. Game theory has been widely adopted to study rationally selfish or cooperative behaviors during interactio…
▽ More
The emergence of vehicle-to-everything (V2X) technology offers new insights into intersection management. This, however, has also presented new challenges, such as the need to understand and model the interactions of traffic participants, including their competition and cooperation behaviors. Game theory has been widely adopted to study rationally selfish or cooperative behaviors during interactions and has been applied to advanced intersection management. In this paper, we review the application of game theory to intersection management and sort out relevant studies under various levels of intelligence and connectivity. First, the problem of urban intersection management and its challenges are briefly introduced. The basic elements of game theory specifically for intersection applications are then summarized. Next, we present the game-theoretic models and solutions that have been applied to intersection management. Finally, the limitations and potential opportunities for subsequent studies within the game-theoretic application to intersection management are discussed.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese
Authors:
Yunke He,
Xixian Liao,
Jialing Liang,
Gemma Boleda
Abstract:
Different speakers often produce different names for the same object or entity (e.g., "woman" vs. "tourist" for a female tourist). The reasons behind variation in naming are not well understood. We create a Language and Vision dataset for Mandarin Chinese that provides an average of 20 names for 1319 naturalistic images, and investigate how familiarity with a given kind of object relates to the de…
▽ More
Different speakers often produce different names for the same object or entity (e.g., "woman" vs. "tourist" for a female tourist). The reasons behind variation in naming are not well understood. We create a Language and Vision dataset for Mandarin Chinese that provides an average of 20 names for 1319 naturalistic images, and investigate how familiarity with a given kind of object relates to the degree of naming variation it triggers across subjects. We propose that familiarity influences naming variation in two competing ways: increasing familiarity can either expand vocabulary, leading to higher variation, or promote convergence on conventional names, thereby reducing variation. We find evidence for both factors being at play. Our study illustrates how computational resources can be used to address research questions in Cognitive Science.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
HB-net: Holistic bursting cell cluster integrated network for occluded multi-objects recognition
Authors:
Xudong Gao,
Xiao Guang Gao,
Jia Rong,
Xiaowei Chen,
Xiang Liao,
Jun Chen
Abstract:
Within the realm of image recognition, a specific category of multi-label classification (MLC) challenges arises when objects within the visual field may occlude one another, demanding simultaneous identification of both occluded and occluding objects. Traditional convolutional neural networks (CNNs) can tackle these challenges; however, those models tend to be bulky and can only attain modest lev…
▽ More
Within the realm of image recognition, a specific category of multi-label classification (MLC) challenges arises when objects within the visual field may occlude one another, demanding simultaneous identification of both occluded and occluding objects. Traditional convolutional neural networks (CNNs) can tackle these challenges; however, those models tend to be bulky and can only attain modest levels of accuracy. Leveraging insights from cutting-edge neural science research, specifically the Holistic Bursting (HB) cell, this paper introduces a pioneering integrated network framework named HB-net. Built upon the foundation of HB cell clusters, HB-net is designed to address the intricate task of simultaneously recognizing multiple occluded objects within images. Various Bursting cell cluster structures are introduced, complemented by an evidence accumulation mechanism. Testing is conducted on multiple datasets comprising digits and letters. The results demonstrate that models incorporating the HB framework exhibit a significant $2.98\%$ enhancement in recognition accuracy compared to models without the HB framework ($1.0298$ times, $p=0.0499$). Although in high-noise settings, standard CNNs exhibit slightly greater robustness when compared to HB-net models, the models that combine the HB framework and EA mechanism achieve a comparable level of accuracy and resilience to ResNet50, despite having only three convolutional layers and approximately $1/30$ of the parameters. The findings of this study offer valuable insights for improving computer vision algorithms. The essential code is provided at https://github.com/d-lab438/hb-net.git.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation
Authors:
Yingwei Ma,
Yue Yu,
Shanshan Li,
Yu Jiang,
Yong Guo,
Yuanliang Zhang,
Yutao Xie,
Xiangke Liao
Abstract:
Large language models (LLMs) have showcased remarkable prowess in code generation. However, automated code generation is still challenging since it requires a high-level semantic map** between natural language requirements and codes. Most existing LLMs-based approaches for code generation rely on decoder-only causal language models often treate codes merely as plain text tokens, i.e., feeding th…
▽ More
Large language models (LLMs) have showcased remarkable prowess in code generation. However, automated code generation is still challenging since it requires a high-level semantic map** between natural language requirements and codes. Most existing LLMs-based approaches for code generation rely on decoder-only causal language models often treate codes merely as plain text tokens, i.e., feeding the requirements as a prompt input, and outputing code as flat sequence of tokens, potentially missing the rich semantic features inherent in source code. To bridge this gap, this paper proposes the "Semantic Chain-of-Thought" approach to intruduce semantic information of code, named SeCoT. Our motivation is that the semantic information of the source code (\eg data flow and control flow) describes more precise program execution behavior, intention and function. By guiding LLM consider and integrate semantic information, we can achieve a more granular understanding and representation of code, enhancing code generation accuracy. Meanwhile, while traditional techniques leveraging such semantic information require complex static or dynamic code analysis to obtain features such as data flow and control flow, SeCoT demonstrates that this process can be fully automated via the intrinsic capabilities of LLMs (i.e., in-context learning), while being generalizable and applicable to challenging domains. While SeCoT can be applied with different LLMs, this paper focuses on the powerful GPT-style models: ChatGPT(close-source model) and WizardCoder(open-source model). The experimental study on three popular DL benchmarks (i.e., HumanEval, HumanEval-ET and MBPP) shows that SeCoT can achieves state-of-the-art performance, greatly improving the potential for large models and code generation.
△ Less
Submitted 22 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Provable Tensor Completion with Graph Information
Authors:
Kaidong Wang,
Yao Wang,
Xiuwu Liao,
Shaojie Tang,
Can Yang,
Deyu Meng
Abstract:
Graphs, depicting the interrelations between variables, has been widely used as effective side information for accurate data recovery in various matrix/tensor recovery related applications. In this paper, we study the tensor completion problem with graph information. Current research on graph-regularized tensor completion tends to be task-specific, lacking generality and systematic approaches. Fur…
▽ More
Graphs, depicting the interrelations between variables, has been widely used as effective side information for accurate data recovery in various matrix/tensor recovery related applications. In this paper, we study the tensor completion problem with graph information. Current research on graph-regularized tensor completion tends to be task-specific, lacking generality and systematic approaches. Furthermore, a recovery theory to ensure performance remains absent. Moreover, these approaches overlook the dynamic aspects of graphs, treating them as static akin to matrices, even though graphs could exhibit dynamism in tensor-related scenarios. To confront these challenges, we introduce a pioneering framework in this paper that systematically formulates a novel model, theory, and algorithm for solving the dynamic graph regularized tensor completion problem. For the model, we establish a rigorous mathematical representation of the dynamic graph, based on which we derive a new tensor-oriented graph smoothness regularization. By integrating this regularization into a tensor decomposition model based on transformed t-SVD, we develop a comprehensive model simultaneously capturing the low-rank and similarity structure of the tensor. In terms of theory, we showcase the alignment between the proposed graph smoothness regularization and a weighted tensor nuclear norm. Subsequently, we establish assurances of statistical consistency for our model, effectively bridging a gap in the theoretical examination of the problem involving tensor recovery with graph information. In terms of the algorithm, we develop a solution of high effectiveness, accompanied by a guaranteed convergence, to address the resulting model. To showcase the prowess of our proposed model in contrast to established ones, we provide in-depth numerical experiments encompassing synthetic data as well as real-world datasets.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Real-time Learning of Driving Gap Preference for Personalized Adaptive Cruise Control
Authors:
Zhouqiao Zhao,
Xishun Liao,
Amr Abdelraouf,
Kyungtae Han,
Rohit Gupta,
Matthew J. Barth,
Guoyuan Wu
Abstract:
Advanced Driver Assistance Systems (ADAS) are increasingly important in improving driving safety and comfort, with Adaptive Cruise Control (ACC) being one of the most widely used. However, pre-defined ACC settings may not always align with driver's preferences and habits, leading to discomfort and potential safety issues. Personalized ACC (P-ACC) has been proposed to address this problem, but most…
▽ More
Advanced Driver Assistance Systems (ADAS) are increasingly important in improving driving safety and comfort, with Adaptive Cruise Control (ACC) being one of the most widely used. However, pre-defined ACC settings may not always align with driver's preferences and habits, leading to discomfort and potential safety issues. Personalized ACC (P-ACC) has been proposed to address this problem, but most existing research uses historical driving data to imitate behaviors that conform to driver preferences, neglecting real-time driver feedback. To bridge this gap, we propose a cloud-vehicle collaborative P-ACC framework that incorporates driver feedback adaptation in real time. The framework is divided into offline and online parts. The offline component records the driver's naturalistic car-following trajectory and uses inverse reinforcement learning (IRL) to train the model on the cloud. In the online component, driver feedback is used to update the driving gap preference in real time. The model is then retrained on the cloud with driver's takeover trajectories, achieving incremental learning to better match driver's preference. Human-in-the-loop (HuiL) simulation experiments demonstrate that our proposed method significantly reduces driver intervention in automatic control systems by up to 62.8%. By incorporating real-time driver feedback, our approach enhances the comfort and safety of P-ACC, providing a personalized and adaptable driving experience.
△ Less
Submitted 10 September, 2023;
originally announced September 2023.
-
Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data
Authors:
Xinting Liao,
Chaochao Chen,
Weiming Liu,
Pengyang Zhou,
Huabin Zhu,
Shuheng Shen,
Weiqiang Wang,
Mengling Hu,
Yanchao Tan,
Xiaolin Zheng
Abstract:
Federated learning (FL) is a distributed machine learning paradigm that needs collaboration between a server and a series of clients with decentralized data. To make FL effective in real-world applications, existing work devotes to improving the modeling of decentralized data with non-independent and identical distributions (non-IID). In non-IID settings, there are intra-client inconsistency that…
▽ More
Federated learning (FL) is a distributed machine learning paradigm that needs collaboration between a server and a series of clients with decentralized data. To make FL effective in real-world applications, existing work devotes to improving the modeling of decentralized data with non-independent and identical distributions (non-IID). In non-IID settings, there are intra-client inconsistency that comes from the imbalanced data modeling, and inter-client inconsistency among heterogeneous client distributions, which not only hinders sufficient representation of the minority data, but also brings discrepant model deviations. However, previous work overlooks to tackle the above two coupling inconsistencies together. In this work, we propose FedRANE, which consists of two main modules, i.e., local relational augmentation (LRA) and global Nash equilibrium (GNE), to resolve intra- and inter-client inconsistency simultaneously. Specifically, in each client, LRA mines the similarity relations among different data samples and enhances the minority sample representations with their neighbors using attentive message passing. In server, GNE reaches an agreement among inconsistent and discrepant model deviations from clients to server, which encourages the global model to update in the direction of global optimum without breaking down the clients optimization toward their local optimums. We conduct extensive experiments on four benchmark datasets to show the superiority of FedRANE in enhancing the performance of FL with non-IID data.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Map** Recovered Faces
Authors:
Juan Hu,
Xin Liao,
Difei Gao,
Satoshi Tsutsui,
Qian Wang,
Zheng Qin,
Mike Zheng Shou
Abstract:
The exploitation of Deepfake techniques for malicious intentions has driven significant research interest in Deepfake detection. Deepfake manipulations frequently introduce random tampered traces, leading to unpredictable outcomes in different facial regions. However, existing detection methods heavily rely on specific forgery indicators, and as the forgery mode improves, these traces become incre…
▽ More
The exploitation of Deepfake techniques for malicious intentions has driven significant research interest in Deepfake detection. Deepfake manipulations frequently introduce random tampered traces, leading to unpredictable outcomes in different facial regions. However, existing detection methods heavily rely on specific forgery indicators, and as the forgery mode improves, these traces become increasingly randomized, resulting in a decline in the detection performance of methods reliant on specific forgery traces. To address the limitation, we propose Recap, a novel Deepfake detection model that exposes unspecific facial part inconsistencies by recovering faces and enlarges the differences between real and fake by map** recovered faces. In the recovering stage, the model focuses on randomly masking regions of interest (ROIs) and reconstructing real faces without unpredictable tampered traces, resulting in a relatively good recovery effect for real faces while a poor recovery effect for fake faces. In the map** stage, the output of the recovery phase serves as supervision to guide the facial map** process. This map** process strategically emphasizes the map** of fake faces with poor recovery, leading to a further deterioration in their representation, while enhancing and refining the map** of real faces with good representation. As a result, this approach significantly amplifies the discrepancies between real and fake videos. Our extensive experiments on standard benchmarks demonstrate that Recap is effective in multiple scenarios.
△ Less
Submitted 19 August, 2023;
originally announced August 2023.
-
CTP-Net: Character Texture Perception Network for Document Image Forgery Localization
Authors:
Xin Liao,
Siliang Chen,
Jiaxin Chen,
Tianyi Wang,
Xiehua Li
Abstract:
Due to the progression of information technology in recent years, document images have been widely disseminated on social networks. With the help of powerful image editing tools, document images are easily forged without leaving visible manipulation traces, which leads to severe issues if significant information is falsified for malicious use. Therefore, the research of document image forensics is…
▽ More
Due to the progression of information technology in recent years, document images have been widely disseminated on social networks. With the help of powerful image editing tools, document images are easily forged without leaving visible manipulation traces, which leads to severe issues if significant information is falsified for malicious use. Therefore, the research of document image forensics is worth further exploring. In this paper, we propose a Character Texture Perception Network (CTP-Net) to localize the forged regions in document images. Specifically, considering the characters with semantics in a document image are highly vulnerable, capturing the forgery traces is the key to localize the forged regions. We design a Character Texture Stream (CTS) based on optical character recognition to capture features of text areas that are essential components of a document image. Meanwhile, texture features of the whole document image are exploited by an Image Texture Stream (ITS). Combining the features extracted from the CTS and the ITS, the CTP-Net can reveal more subtle forgery traces from document images. Moreover, to overcome the challenge caused by the lack of fake document images, we design a data generation strategy that is utilized to construct a Fake Chinese Trademark dataset (FCTM). Experimental results on different datasets demonstrate that the proposed CTP-Net is able to localize multi-scale forged areas in document images, and outperform the state-of-the-art forgery localization methods, even though post-processing operations are applied.
△ Less
Submitted 14 August, 2023; v1 submitted 4 August, 2023;
originally announced August 2023.
-
Erase and Repair: An Efficient Box-Free Removal Attack on High-Capacity Deep Hiding
Authors:
Hangcheng Liu,
Tao Xiang,
Shangwei Guo,
Han Li,
Tianwei Zhang,
Xiaofeng Liao
Abstract:
Deep hiding, embedding images with others using deep neural networks, has demonstrated impressive efficacy in increasing the message capacity and robustness of secret sharing. In this paper, we challenge the robustness of existing deep hiding schemes by preventing the recovery of secret images, building on our in-depth study of state-of-the-art deep hiding schemes and their vulnerabilities. Levera…
▽ More
Deep hiding, embedding images with others using deep neural networks, has demonstrated impressive efficacy in increasing the message capacity and robustness of secret sharing. In this paper, we challenge the robustness of existing deep hiding schemes by preventing the recovery of secret images, building on our in-depth study of state-of-the-art deep hiding schemes and their vulnerabilities. Leveraging our analysis, we first propose a simple box-free removal attack on deep hiding that does not require any prior knowledge of the deep hiding schemes.
To improve the removal performance on the deep hiding schemes that may be enhanced by adversarial training, we further design a more powerful removal attack, efficient box-free removal attack (EBRA), which employs image inpainting techniques to remove secret images from container images. In addition, to ensure the effectiveness of our attack and preserve the fidelity of the processed container images, we design an erasing phase based on the locality of deep hiding to remove secret information and then make full use of the visual information of container images to repair the erased visual content. Extensive evaluations show our method can completely remove secret images from container images with negligible impact on the quality of container images.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Random Sub-Samples Generation for Self-Supervised Real Image Denoising
Authors:
Yizhong Pan,
Xiao Liu,
Xiangyu Liao,
Yuanzhouhan Cao,
Chao Ren
Abstract:
With sufficient paired training samples, the supervised deep learning methods have attracted much attention in image denoising because of their superior performance. However, it is still very challenging to widely utilize the supervised methods in real cases due to the lack of paired noisy-clean images. Meanwhile, most self-supervised denoising methods are ineffective as well when applied to the r…
▽ More
With sufficient paired training samples, the supervised deep learning methods have attracted much attention in image denoising because of their superior performance. However, it is still very challenging to widely utilize the supervised methods in real cases due to the lack of paired noisy-clean images. Meanwhile, most self-supervised denoising methods are ineffective as well when applied to the real-world denoising tasks because of their strict assumptions in applications. For example, as a typical method for self-supervised denoising, the original blind spot network (BSN) assumes that the noise is pixel-wise independent, which is much different from the real cases. To solve this problem, we propose a novel self-supervised real image denoising framework named Sampling Difference As Perturbation (SDAP) based on Random Sub-samples Generation (RSG) with a cyclic sample difference loss. Specifically, we dig deeper into the properties of BSN to make it more suitable for real noise. Surprisingly, we find that adding an appropriate perturbation to the training images can effectively improve the performance of BSN. Further, we propose that the sampling difference can be considered as perturbation to achieve better results. Finally we propose a new BSN framework in combination with our RSG strategy. The results show that it significantly outperforms other state-of-the-art self-supervised denoising methods on real-world datasets. The code is available at https://github.com/p1y2z3/SDAP.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning
Authors:
Xinting Liao,
Weiming Liu,
Chaochao Chen,
Pengyang Zhou,
Huabin Zhu,
Yanchao Tan,
Jun Wang,
Yue Qi
Abstract:
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the abo…
▽ More
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies of four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID set.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Learning to Branch in Combinatorial Optimization with Graph Pointer Networks
Authors:
Rui Wang,
Zhiming Zhou,
Tao Zhang,
Ling Wang,
Xin Xu,
Xiangke Liao,
Kaiwen Li
Abstract:
Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effective…
▽ More
Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions. The model is trained to imitate the classic strong branching expert rule by a designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms the state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Authors:
Yuqi Wang,
Yuntao Chen,
Xingyu Liao,
Lue Fan,
Zhaoxiang Zhang
Abstract:
Comprehensive modeling of the surrounding 3D world is key to the success of autonomous driving. However, existing perception tasks like object detection, road structure segmentation, depth & elevation estimation, and open-set object localization each only focus on a small facet of the holistic 3D scene understanding task. This divide-and-conquer strategy simplifies the algorithm development proced…
▽ More
Comprehensive modeling of the surrounding 3D world is key to the success of autonomous driving. However, existing perception tasks like object detection, road structure segmentation, depth & elevation estimation, and open-set object localization each only focus on a small facet of the holistic 3D scene understanding task. This divide-and-conquer strategy simplifies the algorithm development procedure at the cost of losing an end-to-end unified solution to the problem. In this work, we address this limitation by studying camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding. To achieve this, we introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate spatiotemporal information from multi-frame and multi-view images in a coarse-to-fine scheme, integrating feature learning and scene representation into a unified occupancy representation. We have conducted extensive ablation studies to verify the effectiveness and efficiency of the proposed method. Our approach achieves new state-of-the-art results for camera-based semantic segmentation and panoptic segmentation on the nuScenes dataset. Furthermore, our method can be easily extended to dense occupancy prediction and has shown promising performance on the Occ3D benchmark. The code will be released at https://github.com/Robertwyq/PanoOcc.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
detrex: Benchmarking Detection Transformers
Authors:
Tianhe Ren,
Shilong Liu,
Feng Li,
Hao Zhang,
Ailing Zeng,
Jie Yang,
Xingyu Liao,
Ding Jia,
Hongyang Li,
He Cao,
Jianan Wang,
Zhaoyang Zeng,
Xianbiao Qi,
Yuhui Yuan,
Jianwei Yang,
Lei Zhang
Abstract:
The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight co…
▽ More
The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation. We conduct extensive experiments under detrex and perform a comprehensive benchmark for DETR-based models. Moreover, we enhance the performance of detection transformers through the refinement of training hyper-parameters, providing strong baselines for supported algorithms.We hope that detrex could offer research communities a standardized and unified platform to evaluate and compare different DETR-based models while fostering a deeper understanding and driving advancements in DETR-based instance recognition. Our code is available at https://github.com/IDEA-Research/detrex. The project is currently being actively developed. We encourage the community to use detrex codebase for further development and contributions.
△ Less
Submitted 13 June, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Semi-supervised Pathological Image Segmentation via Cross Distillation of Multiple Attentions
Authors:
Lanfeng Zhong,
Xin Liao,
Shaoting Zhang,
Guotai Wang
Abstract:
Segmentation of pathological images is a crucial step for accurate cancer diagnosis. However, acquiring dense annotations of such images for training is labor-intensive and time-consuming. To address this issue, Semi-Supervised Learning (SSL) has the potential for reducing the annotation cost, but it is challenged by a large number of unlabeled training images. In this paper, we propose a novel SS…
▽ More
Segmentation of pathological images is a crucial step for accurate cancer diagnosis. However, acquiring dense annotations of such images for training is labor-intensive and time-consuming. To address this issue, Semi-Supervised Learning (SSL) has the potential for reducing the annotation cost, but it is challenged by a large number of unlabeled training images. In this paper, we propose a novel SSL method based on Cross Distillation of Multiple Attentions (CDMA) to effectively leverage unlabeled images. Firstly, we propose a Multi-attention Tri-branch Network (MTNet) that consists of an encoder and a three-branch decoder, with each branch using a different attention mechanism that calibrates features in different aspects to generate diverse outputs. Secondly, we introduce Cross Decoder Knowledge Distillation (CDKD) between the three decoder branches, allowing them to learn from each other's soft labels to mitigate the negative impact of incorrect pseudo labels in training. Additionally, uncertainty minimization is applied to the average prediction of the three branches, which further regularizes predictions on unlabeled images and encourages inter-branch consistency. Our proposed CDMA was compared with eight state-of-the-art SSL methods on the public DigestPath dataset, and the experimental results showed that our method outperforms the other approaches under different annotation ratios. The code is available at \href{https://github.com/HiLab-git/CDMA}{https://github.com/HiLab-git/CDMA.}
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering
Authors:
Xiaolin Zheng,
Mengling Hu,
Weiming Liu,
Chaochao Chen,
Xinting Liao
Abstract:
Short text clustering is challenging since it takes imbalanced and noisy data as inputs. Existing approaches cannot solve this problem well, since (1) they are prone to obtain degenerate solutions especially on heavy imbalanced datasets, and (2) they are vulnerable to noises. To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced…
▽ More
Short text clustering is challenging since it takes imbalanced and noisy data as inputs. Existing approaches cannot solve this problem well, since (1) they are prone to obtain degenerate solutions especially on heavy imbalanced datasets, and (2) they are vulnerable to noises. To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data. RSTC includes two modules, i.e., pseudo-label generation module and robust representation learning module. The former generates pseudo-labels to provide supervision for the later, which contributes to more robust representations and correctly separated clusters. To provide robustness against the imbalance in data, we propose self-adaptive optimal transport in the pseudo-label generation module. To improve robustness against the noise in data, we further introduce both class-wise and instance-wise contrastive learning in the robust representation learning module. Our empirical studies on eight short text clustering datasets demonstrate that RSTC significantly outperforms the state-of-the-art models. The code is available at: https://github.com/hmllmh/RSTC.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.