Search | arXiv e-print repository

arXiv:2406.19631 [pdf, other]

Personalized Interpretation on Federated Learning: A Virtual Concepts approach

Authors: Peng Yan, Guodong Long, **g Jiang, Michael Blumenstein

Abstract: Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without consideration of interpreting non-IID across clients. This paper aims to design a novel FL method to robust and interpret the non-IID data across clients. Specifically, we interpret each client's dataset as a mixt… ▽ More Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without consideration of interpreting non-IID across clients. This paper aims to design a novel FL method to robust and interpret the non-IID data across clients. Specifically, we interpret each client's dataset as a mixture of conceptual vectors that each one represents an interpretable concept to end-users. These conceptual vectors could be pre-defined or refined in a human-in-the-loop process or be learnt via the optimization procedure of the federated learning system. In addition to the interpretability, the clarity of client-specific personalization could also be applied to enhance the robustness of the training process on FL system. The effectiveness of the proposed method have been validated on benchmark datasets. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.01961 [pdf, other]

Exploring Real World Map Change Generalization of Prior-Informed HD Map Prediction Models

Authors: Samuel M. Bateman, Ning Xu, H. Charles Zhao, Yael Ben Shalom, Vince Gong, Greg Long, Will Maddern

Abstract: Building and maintaining High-Definition (HD) maps represents a large barrier to autonomous vehicle deployment. This, along with advances in modern online map detection models, has sparked renewed interest in the online map** problem. However, effectively predicting online maps at a high enough quality to enable safe, driverless deployments remains a significant challenge. Recent work on these m… ▽ More Building and maintaining High-Definition (HD) maps represents a large barrier to autonomous vehicle deployment. This, along with advances in modern online map detection models, has sparked renewed interest in the online map** problem. However, effectively predicting online maps at a high enough quality to enable safe, driverless deployments remains a significant challenge. Recent work on these models proposes training robust online map** systems using low quality map priors with synthetic perturbations in an attempt to simulate out-of-date HD map priors. In this paper, we investigate how models trained on these synthetically perturbed map priors generalize to performance on deployment-scale, real world map changes. We present a large-scale experimental study to determine which synthetic perturbations are most useful in generalizing to real world HD map changes, evaluated using multiple years of real-world autonomous driving data. We show there is still a substantial sim2real gap between synthetic prior perturbations and observed real-world changes, which limits the utility of current prior-informed HD map prediction models. △ Less

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

arXiv:2406.00004 [pdf, other]

Navigating the Future of Federated Recommendation Systems with Foundation Models

Authors: Zhiwei Li, Guodong Long

Abstract: In recent years, the integration of federated learning (FL) and recommendation systems (RS), known as Federated Recommendation Systems (FRS), has attracted attention for preserving user privacy by kee** private data on client devices. However, FRS faces inherent limitations such as data heterogeneity and scarcity, due to the privacy requirements of FL and the typical data sparsity issues of RSs.… ▽ More In recent years, the integration of federated learning (FL) and recommendation systems (RS), known as Federated Recommendation Systems (FRS), has attracted attention for preserving user privacy by kee** private data on client devices. However, FRS faces inherent limitations such as data heterogeneity and scarcity, due to the privacy requirements of FL and the typical data sparsity issues of RSs. Models like ChatGPT are empowered by the concept of transfer learning and self-supervised learning, so they can be easily applied to the downstream tasks after fine-tuning or prompting. These models, so-called Foundation Models (FM), fouce on understanding the human's intent and perform following their designed roles in the specific tasks, which are widely recognized for producing high-quality content in the image and language domains. Thus, the achievements of FMs inspire the design of FRS and suggest a promising research direction: integrating foundation models to address the above limitations. In this study, we conduct a comprehensive review of FRSs with FMs. Specifically, we: 1) summarise the common approaches of current FRSs and FMs; 2) review the challenges posed by FRSs and FMs; 3) discuss potential future research directions; and 4) introduce some common benchmarks and evaluation metrics in the FRS field. We hope that this position paper provides the necessary background and guidance to explore this interesting and emerging topic. △ Less

Submitted 3 June, 2024; v1 submitted 12 May, 2024; originally announced June 2024.

Comments: 20 pages, position paper

arXiv:2405.20348 [pdf, other]

Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation Models

Authors: Shengchao Chen, Guodong Long, **g Jiang, Chengqi Zhang

Abstract: This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs, that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological d… ▽ More This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs, that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological data on devices while kee** high efficiency. Concretely, we introduce a lightweight personalized adapter into PLMs and endows it with weather pattern awareness. During communication between clients and the server, low-rank-based transmission is performed to effectively fuse the global knowledge among devices while maintaining high communication efficiency and ensuring privacy. Experiments on real-wold dataset show that LM-Weather outperforms the state-of-the-art results by a large margin across various tasks (e.g., forecasting and imputation at different scales). We provide extensive and in-depth analyses experiments, which verify that LM-Weather can (1) indeed leverage sequential knowledge from natural language to accurately handle meteorological sequence, (2) allows each devices obtain highly customized models under significant heterogeneity, and (3) generalize under data-limited and out-of-distribution (OOD) scenarios. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 42 pages, under review

arXiv:2405.18123 [pdf, other]

PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning

Authors: Martin Balla, George E. M. Long, James Goodman, Raluca D. Gaina, Diego Perez-Liebana

Abstract: Modern Tabletop Games present various interesting challenges for Multi-agent Reinforcement Learning. In this paper, we introduce PyTAG, a new framework that supports interacting with a large collection of games implemented in the Tabletop Games framework. In this work we highlight the challenges tabletop games provide, from a game-playing agent perspective, along with the opportunities they provid… ▽ More Modern Tabletop Games present various interesting challenges for Multi-agent Reinforcement Learning. In this paper, we introduce PyTAG, a new framework that supports interacting with a large collection of games implemented in the Tabletop Games framework. In this work we highlight the challenges tabletop games provide, from a game-playing agent perspective, along with the opportunities they provide for future research. Additionally, we highlight the technical challenges that involve training Reinforcement Learning agents on these games. To explore the Multi-agent setting provided by PyTAG we train the popular Proximal Policy Optimisation Reinforcement Learning algorithm using self-play on a subset of games and evaluate the trained policies against some simple agents and Monte-Carlo Tree Search implemented in the Tabletop Games framework. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16472 [pdf, other]

Multi-Level Additive Modeling for Structured Non-IID Federated Learning

Authors: Shutong Chen, Tianyi Zhou, Guodong Long, Jie Ma, **g Jiang, Chengqi Zhang

Abstract: The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important to improve knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some are client-specific. To capture and exploit this structure, we train models organized in a multi-… ▽ More The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important to improve knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some are client-specific. To capture and exploit this structure, we train models organized in a multi-level structure, called ``Multi-level Additive Models (MAM)'', for better knowledge-sharing across heterogeneous clients and their personalization. In federated MAM (FeMAM), each client is assigned to at most one model per level and its personalized prediction sums up the outputs of models assigned to it across all levels. For the top level, FeMAM trains one global model shared by all clients as FedAvg. For every mid-level, it learns multiple models each assigned to a subgroup of clients, as clustered FL. Every bottom-level model is trained for one client only. In the training objective, each model aims to minimize the residual of the additive predictions by the other models assigned to each client. To approximate the arbitrary structure of non-IID across clients, FeMAM introduces more flexibility and adaptivity to FL by incrementally adding new models to the prediction of each client and reassigning another if necessary, automatically optimizing the knowledge-sharing structure. Extensive experiments show that FeMAM surpasses existing clustered FL and personalized FL methods in various non-IID settings. Our code is available at https://github.com/shutong043/FeMAM. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.04840 [pdf, other]

Federated Adaptation for Foundation Model-based Recommendations

Authors: Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

Abstract: With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while… ▽ More With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while preserving privacy. This paper proposes a novel federated adaptation mechanism to enhance the foundation model-based recommendation system in a privacy-preserving manner. Specifically, each client will learn a lightweight personalized adapter using its private data. The adapter then collaborates with pre-trained foundation models to provide recommendation service efficiently with fine-grained manners. Importantly, users' private behavioral data remains secure as it is not shared with the server. This data localization-based privacy preservation is embodied via the federated learning framework. The model can ensure that shared knowledge is incorporated into all adapters while simultaneously preserving each user's personal preferences. Experimental results on four benchmark datasets demonstrate our method's superior performance. Implementation code is available to ease reproducibility. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted as a regular paper of IJCAI'24

arXiv:2404.10942 [pdf, other]

What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning

Authors: Zhihong Deng, **g Jiang, Guodong Long, Chengqi Zhang

Abstract: In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sou… ▽ More In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair. △ Less

Submitted 28 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 13 pages, 9 figures, accepted by IJCAI 2024

arXiv:2403.19499 [pdf, other]

Client-supervised Federated Learning: Towards One-model-for-all Personalization

Authors: Peng Yan, Guodong Long

Abstract: Personalized Federated Learning (PerFL) is a new machine learning paradigm that delivers personalized models for diverse clients under federated learning settings. Most PerFL methods require extra learning processes on a client to adapt a globally shared model to the client-specific personalized model using its own local data. However, the model adaptation process in PerFL is still an open challen… ▽ More Personalized Federated Learning (PerFL) is a new machine learning paradigm that delivers personalized models for diverse clients under federated learning settings. Most PerFL methods require extra learning processes on a client to adapt a globally shared model to the client-specific personalized model using its own local data. However, the model adaptation process in PerFL is still an open challenge in the stage of model deployment and test time. This work tackles the challenge by proposing a novel federated learning framework to learn only one robust global model to achieve competitive performance to those personalized models on unseen/test clients in the FL system. Specifically, we design a new Client-Supervised Federated Learning (FedCS) to unravel clients' bias on instances' latent representations so that the global model can learn both client-specific and client-agnostic knowledge. Experimental study shows that the FedCS can learn a robust FL global model for the changing data distributions of unseen/test clients. The FedCS's global model can be directly deployed to the test clients while achieving comparable performance to other personalized FL methods that require model adaptation. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19211 [pdf, other]

Dual-Personalizing Adapter for Federated Foundation Models

Authors: Yiyuan Yang, Guodong Long, Tao Shen, **g Jiang, Michael Blumenstein

Abstract: Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning large amounts of instruction data. Notably, federated foundation models emerge as a privacy preservation method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. T… ▽ More Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning large amounts of instruction data. Notably, federated foundation models emerge as a privacy preservation method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. To alleviate communication and computation overhead, parameter-efficient methods are introduced for efficiency, and some research adapted personalization methods to federated foundation models for better user preferences alignment. However, a critical gap in existing research is the neglect of test-time distribution shifts in real-world applications. Therefore, to bridge this gap, we propose a new setting, termed test-time personalization, which not only concentrates on the targeted local task but also extends to other tasks that exhibit test-time distribution shifts. To address challenges in this new setting, we explore a simple yet effective solution to learn a comprehensive foundation model. Specifically, a dual-personalizing adapter architecture (FedDPA) is proposed, comprising a global adapter and a local adapter for addressing test-time distribution shifts and personalization, respectively. Additionally, we introduce an instance-wise dynamic weighting mechanism to optimize the balance between the global and local adapters, enhancing overall performance. The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.03944 [pdf, other]

IMUSE: IMU-based Facial Expression Capture

Authors: Youjia Wang, Yiwen Wu, Hengan Zhou, Hongyang Lin, Xingyue Peng, Yingwenqi Jiang, Yingsheng Zhu, Guanpeng Long, Yatu Zhang, **gya Wang, Lan Xu, **gyi Yu

Abstract: For facial motion capture and analysis, the dominated solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) serve as potential rescues yet are mainly adopted for full-body motion capture. In this paper, we propose IMUSE to fill the gap, a novel path for facial expression capture using purely IMU signals, signi… ▽ More For facial motion capture and analysis, the dominated solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) serve as potential rescues yet are mainly adopted for full-body motion capture. In this paper, we propose IMUSE to fill the gap, a novel path for facial expression capture using purely IMU signals, significantly distant from previous visual solutions.The key design in our IMUSE is a trilogy. We first design micro-IMUs to suit facial capture, companion with an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. Such unique multi-modality brings huge potential for future directions like IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. The IMUSE framework empowers us to perform accurate facial capture in scenarios where visual methods falter and simultaneously safeguard user privacy. We conduct extensive experiments about both the IMU configuration and technical components to validate the effectiveness of our IMUSE approach. Notably, IMUSE enables various potential and novel applications, i.e., facial capture against occlusions or in a moving performance. We will release our dataset and implementations to enrich more possibilities of facial capture and analysis in our community. △ Less

Submitted 12 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: Go to IMUSE project page https://sites.google.com/view/projectpage-imuse and watch our video https://youtu.be/Rki9syHsvpc

arXiv:2312.03014 [pdf, other]

Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey

Authors: Shengchao Chen, Guodong Long, **g Jiang, Dikai Liu, Chengqi Zhang

Abstract: As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data… ▽ More As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data. Cutting-edge performance on specific tasks within narrower spatio-temporal scales has been achieved recently through DL. The rise of large models, specifically large language models (LLMs), has enabled fine-tuning processes that yield remarkable outcomes across various downstream tasks, thereby propelling the advancement of general AI. However, we are still navigating the initial stages of crafting general AI for weather and climate. In this survey, we offer an exhaustive, timely overview of state-of-the-art AI methodologies specifically engineered for weather and climate data, with a special focus on time series and text data. Our primary coverage encompasses four critical aspects: types of weather and climate data, principal model architectures, model scopes and applications, and datasets for weather and climate. Furthermore, in relation to the creation and application of foundation models for weather and climate data understanding, we delve into the field's prevailing challenges, offer crucial insights, and propose detailed avenues for future research. This comprehensive approach equips practitioners with the requisite knowledge to make substantial progress in this domain. Our survey encapsulates the most recent breakthroughs in research on large, data-driven models for weather and climate data understanding, emphasizing robust foundations, current advancements, practical applications, crucial resources, and prospective research opportunities. △ Less

Submitted 4 December, 2023; originally announced December 2023.

Comments: Ongoing work. Survey Paper. 35 pages, 2 figures, 4 tables. The first work to comprehensively and systematically summarize DL-based weather and climate data understanding, paving the way for the development of weather and climate foundation models

arXiv:2311.08734 [pdf, other]

Thread of Thought Unraveling Chaotic Contexts

Authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen

Abstract: Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In r… ▽ More Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In response to these challenges, we introduce the "Thread of Thought" (ThoT) strategy, which draws inspiration from human cognitive processes. ThoT systematically segments and analyzes extended contexts while adeptly selecting pertinent information. This strategy serves as a versatile "plug-and-play" module, seamlessly integrating with various LLMs and prompting techniques. In the experiments, we utilize the PopQA and EntityQ datasets, as well as a Multi-Turn Conversation Response dataset (MTCR) we collected, to illustrate that ThoT significantly improves reasoning performance compared to other prompting techniques. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: 11 pages, 7 figures, 5 tables

arXiv:2309.12529 [pdf, other]

Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution

Authors: Shuang Ao, Tianyi Zhou, Guodong Long, Xuan Song, **g Jiang

Abstract: Throughout long history, natural species have learned to survive by evolving their physical structures adaptive to the environment changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In thi… ▽ More Throughout long history, natural species have learned to survive by evolving their physical structures adaptive to the environment changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through ``morphology-environment co-evolution (MECE)'', in which the morphology keeps being updated to adapt to the changing environment, while the environment is modified progressively to bring new challenges and stimulate the improvement of the morphology. This leads to a curriculum to train generalizable RL, whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, which are solely based on the learning dynamics of the RL agent; (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE exhibit significantly better generalization performance in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between the morphology and environment is the key to the success. △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.06275 [pdf, other]

Re-Reading Improves Reasoning in Large Language Models

Authors: Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong Long, Jian-guang Lou

Abstract: To enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs), we introduce a simple, yet general and effective prompting method, Re2, i.e., \textbf{Re}-\textbf{Re}ading the question as input. Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, Re2 shifts the focus to the input by processing… ▽ More To enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs), we introduce a simple, yet general and effective prompting method, Re2, i.e., \textbf{Re}-\textbf{Re}ading the question as input. Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. Consequently, Re2 demonstrates strong generality and compatibility with most thought-eliciting prompting methods, including CoT. Crucially, Re2 facilitates a "bidirectional" encoding in unidirectional decoder-only LLMs because the first pass could provide global information for the second pass. We begin with a preliminary empirical study as the foundation of Re2, illustrating its potential to enable "bidirectional" attention mechanisms. We then evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality. Our findings indicate that, with the exception of a few scenarios on vanilla ChatGPT, Re2 consistently enhances the reasoning performance of LLMs through a simple re-reading strategy. Further analyses reveal Re2's adaptability, showing how it can be effectively integrated with different LLMs, thought-eliciting prompting, and ensemble strategies. Our code is available at \url{https://github.com/Tebmer/Rereading-LLM-Reasoning/} △ Less

Submitted 29 February, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 25 pages

arXiv:2307.09905 [pdf, other]

PyTAG: Challenges and Opportunities for Reinforcement Learning in Tabletop Games

Authors: Martin Balla, George E. M. Long, Dominik Jeurissen, James Goodman, Raluca D. Gaina, Diego Perez-Liebana

Abstract: In recent years, Game AI research has made important breakthroughs using Reinforcement Learning (RL). Despite this, RL for modern tabletop games has gained little to no attention, even when they offer a range of unique challenges compared to video games. To bridge this gap, we introduce PyTAG, a Python API for interacting with the Tabletop Games framework (TAG). TAG contains a growing set of more… ▽ More In recent years, Game AI research has made important breakthroughs using Reinforcement Learning (RL). Despite this, RL for modern tabletop games has gained little to no attention, even when they offer a range of unique challenges compared to video games. To bridge this gap, we introduce PyTAG, a Python API for interacting with the Tabletop Games framework (TAG). TAG contains a growing set of more than 20 modern tabletop games, with a common API for AI agents. We present techniques for training RL agents in these games and introduce baseline results after training Proximal Policy Optimisation algorithms on a subset of games. Finally, we discuss the unique challenges complex modern tabletop games provide, now open to RL research through PyTAG. △ Less

Submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted for Publication in: IEEE Conference on Games (2023)

arXiv:2307.04365 [pdf, other]

One-Shot Pruning for Fast-adapting Pre-trained Models on Devices

Authors: Haiyan Zhao, Guodong Long

Abstract: Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the model from scratch can pose a practical challenge given the limited resources of each downstream task or device. To tackle this issue, we present a scalable one-s… ▽ More Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the model from scratch can pose a practical challenge given the limited resources of each downstream task or device. To tackle this issue, we present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task. Specifically, we create a score mask using the pruned models of similar tasks to identify task-specific filters/nodes in the pre-trained model for the new task. Based on this mask, we conduct a single round of pruning to extract a suitably-sized sub-network that can quickly adapt to the new task with only a few training iterations. Our experimental analysis demonstrates the effectiveness of the proposed method on the convolutional neural networks (CNNs) and vision transformers (ViT) with various datasets. The proposed method consistently outperforms popular pruning baseline methods in terms of accuracy and efficiency when dealing with diverse downstream tasks with different memory constraints. △ Less

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2307.01452 [pdf, other]

Causal Reinforcement Learning: A Survey

Authors: Zhihong Deng, **g Jiang, Guodong Long, Chengqi Zhang

Abstract: Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through n… ▽ More Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field. △ Less

Submitted 20 November, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: 52 pages, 10 figures

arXiv:2306.03570 [pdf, other]

Personalization Disentanglement for Federated Learning: An explainable perspective

Authors: Peng Yan, Guodong Long

Abstract: Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL via explicit disentangling latent representations into two parts to capture the shared knowledge and client-specific personalization, which leads to more reliable and effective PFL. The disentangleme… ▽ More Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL via explicit disentangling latent representations into two parts to capture the shared knowledge and client-specific personalization, which leads to more reliable and effective PFL. The disentanglement is achieved by a novel Federated Dual Variational Autoencoder (FedDVA), which employs two encoders to infer the two types of representations. FedDVA can produce a better understanding of the trade-off between global knowledge sharing and local personalization in PFL. Moreover, it can be integrated with existing FL methods and turn them into personalized models for heterogeneous downstream tasks. Extensive experiments validate the advantages caused by disentanglement and show that models trained with disentangled representations substantially outperform those vanilla methods. △ Less

Submitted 13 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.01090 [pdf, other]

Improving the Robustness of Summarization Systems with Dual Augmentation

Authors: Xiuying Chen, Guodong Long, Chongyang Tao, Mingzhe Li, Xin Gao, Chengqi Zhang, Xiangliang Zhang

Abstract: A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. In this work, we first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. To create semantic-consistent substitutes, we propose a SummAttacker, which is an efficient approach to generati… ▽ More A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. In this work, we first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. To create semantic-consistent substitutes, we propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models. Experimental results show that state-of-the-art summarization models have a significant decrease in performance on adversarial and noisy test sets. Next, we analyze the vulnerability of the summarization systems and explore improving the robustness by data augmentation. Specifically, the first brittleness factor we found is the poor understanding of infrequent words in the input. Correspondingly, we feed the encoder with more diverse cases created by SummAttacker in the input space. The other factor is in the latent space, where the attacked inputs bring more variations to the hidden states. Hence, we construct adversarial decoder input and devise manifold softmixing operation in hidden space to introduce more diversity. Experimental results on Gigaword and CNN/DM datasets demonstrate that our approach achieves significant improvements over strong baselines and exhibits higher robustness on noisy, attacked, and clean datasets. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 10 pages, 6 figures, ACL 2023 main coference

arXiv:2305.18444 [pdf, other]

Continual Task Allocation in Meta-Policy Network via Sparse Prompting

Authors: Yijun Yang, Tianyi Zhou, **g Jiang, Guodong Long, Yuhui Shi

Abstract: How to train a generalizable meta-policy by continually learning a sequence of tasks? It is a natural human skill yet challenging to achieve by current reinforcement learning: the agent is expected to quickly adapt to new tasks (plasticity) meanwhile retaining the common knowledge from previous tasks (stability). We address it by "Continual Task Allocation via Sparse Prompting (CoTASP)", which lea… ▽ More How to train a generalizable meta-policy by continually learning a sequence of tasks? It is a natural human skill yet challenging to achieve by current reinforcement learning: the agent is expected to quickly adapt to new tasks (plasticity) meanwhile retaining the common knowledge from previous tasks (stability). We address it by "Continual Task Allocation via Sparse Prompting (CoTASP)", which learns over-complete dictionaries to produce sparse masks as prompts extracting a sub-network for each task from a meta-policy network. CoTASP trains a policy for each task by optimizing the prompts and the sub-network weights alternatively. The dictionary is then updated to align the optimized prompts with tasks' embedding, thereby capturing tasks' semantic correlations. Hence, relevant tasks share more neurons in the meta-policy network due to similar prompts while cross-task interference causing forgetting is effectively restrained. Given a meta-policy and dictionaries trained on previous tasks, new task adaptation reduces to highly efficient sparse prompting and sub-network finetuning. In experiments, CoTASP achieves a promising plasticity-stability trade-off without storing or replaying any past tasks' experiences. It outperforms existing continual and multi-task RL methods on all seen tasks, forgetting reduction, and generalization to unseen tasks. △ Less

Submitted 3 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: Accepted by ICML 2023

arXiv:2305.14244 [pdf, other]

Federated Prompt Learning for Weather Foundation Models on Devices

Authors: Shengchao Chen, Guodong Long, Tao Shen, **g Jiang, Chengqi Zhang

Abstract: On-device intelligence for weather forecasting uses local deep learning models to analyze weather patterns without centralized cloud computing, holds significance for supporting human activates. Federated Learning is a promising solution for such forecasting by enabling collaborative model training without sharing raw data. However, it faces three main challenges that hinder its reliability: (1) d… ▽ More On-device intelligence for weather forecasting uses local deep learning models to analyze weather patterns without centralized cloud computing, holds significance for supporting human activates. Federated Learning is a promising solution for such forecasting by enabling collaborative model training without sharing raw data. However, it faces three main challenges that hinder its reliability: (1) data heterogeneity among devices due to geographic differences; (2) data homogeneity within individual devices and (3) communication overload from sending large model parameters for collaboration. To address these challenges, this paper propose Federated Prompt Learning for Weather Foundation Models on Devices (FedPoD), which enables devices to obtain highly customized models while maintaining communication efficiency. Concretely, our Adaptive Prompt Tuning leverages lightweight prompts guide frozen foundation model to generate more precise predictions, also conducts prompt-based multi-level communication to encourage multi-source knowledge fusion and regulate optimization. Additionally, Dynamic Graph Modeling constructs graphs from prompts, prioritizing collaborative training among devices with similar data distributions to against heterogeneity. Extensive experiments demonstrates FedPoD leads the performance among state-of-the-art baselines across various setting in real-world on-device weather forecasting datasets. △ Less

Submitted 21 April, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by Main Track in IJCAI'24 (the 33rd International Joint Conference on Artificial Intelligence)

arXiv:2305.12650 [pdf, other]

When Federated Recommendation Meets Cold-Start Problem: Separating Item Attributes and User Interactions

Authors: Chunxu Zhang, Guodong Long, Tianyi Zhou, Zijian Zhang, Peng Yan, Bo Yang

Abstract: Federated recommendation system usually trains a global model on the server without direct access to users' private data on their own devices. However, this separation of the recommendation model and users' private data poses a challenge in providing quality service, particularly when it comes to new items, namely cold-start recommendations in federated settings. This paper introduces a novel meth… ▽ More Federated recommendation system usually trains a global model on the server without direct access to users' private data on their own devices. However, this separation of the recommendation model and users' private data poses a challenge in providing quality service, particularly when it comes to new items, namely cold-start recommendations in federated settings. This paper introduces a novel method called Item-aligned Federated Aggregation (IFedRec) to address this challenge. It is the first research work in federated recommendation to specifically study the cold-start scenario. The proposed method learns two sets of item representations by leveraging item attributes and interaction records simultaneously. Additionally, an item representation alignment mechanism is designed to align two item representations and learn the meta attribute network at the server within a federated learning framework. Experiments on four benchmark datasets demonstrate IFedRec's superior performance for cold-start scenarios. Furthermore, we also verify IFedRec owns good robustness when the system faces limited client participation and noise injection, which brings promising practical application potential in privacy-protection enhanced federated recommendation systems. The implementation code is available △ Less

Submitted 24 February, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: Accepted as a regular paper of WWW'24

arXiv:2305.07866 [pdf, other]

GPFedRec: Graph-guided Personalization for Federated Recommendation

Authors: Chunxu Zhang, Guodong Long, Tianyi Zhou, Zijjian Zhang, Peng Yan, Bo Yang

Abstract: The federated recommendation system is an emerging AI service architecture that provides recommendation services in a privacy-preserving manner. Using user-relation graphs to enhance federated recommendations is a promising topic. However, it is still an open challenge to construct the user-relation graph while preserving data locality-based privacy protection in federated settings. Inspired by a… ▽ More The federated recommendation system is an emerging AI service architecture that provides recommendation services in a privacy-preserving manner. Using user-relation graphs to enhance federated recommendations is a promising topic. However, it is still an open challenge to construct the user-relation graph while preserving data locality-based privacy protection in federated settings. Inspired by a simple motivation, similar users share a similar vision (embeddings) to the same item set, this paper proposes a novel Graph-guided Personalization for Federated Recommendation (GPFedRec). The proposed method constructs a user-relation graph from user-specific personalized item embeddings at the server without accessing the users' interaction records. The personalized item embedding is locally fine-tuned on each device, and then a user-relation graph will be constructed by measuring the similarity among client-specific item embeddings. Without accessing users' historical interactions, we embody the data locality-based privacy protection of vanilla federated learning. Furthermore, a graph-guided aggregation mechanism is designed to leverage the user-relation graph and federated optimization framework simultaneously. Extensive experiments on five benchmark datasets demonstrate GPFedRec's superior performance. The in-depth study validates that GPFedRec can generally improve existing federated recommendation methods as a plugin while kee** user privacy safe. Code is available to ease reproducibility △ Less

Submitted 18 June, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

Comments: Accepted as a regular paper of KDD'24

arXiv:2305.07402 [pdf, other]

Synergistic Interplay between Search and Large Language Models for Information Retrieval

Authors: Jiazhan Feng, Chongyang Tao, Xiubo Geng, Tao Shen, Can Xu, Guodong Long, Dongyan Zhao, Daxin Jiang

Abstract: Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language models (LLMs) has further revolutionized the IR field by enabling users to interact with search systems in natural languages. In this paper, we explore the advan… ▽ More Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language models (LLMs) has further revolutionized the IR field by enabling users to interact with search systems in natural languages. In this paper, we explore the advantages and disadvantages of LLMs and RMs, highlighting their respective strengths in understanding user-issued queries and retrieving up-to-date information. To leverage the benefits of both paradigms while circumventing their limitations, we propose InteR, a novel framework that facilitates information refinement through synergy between RMs and LLMs. InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections and enables LLMs to enhance prompt formulation using retrieved documents. This iterative refinement process augments the inputs of RMs and LLMs, leading to more accurate retrieval. Experiments on large-scale retrieval benchmarks involving web search and low-resource retrieval tasks demonstrate that InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods, even those using relevance judgment. Source code is available at https://github.com/Cyril-JZ/InteR △ Less

Submitted 12 December, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Pre-print. Work in progress

arXiv:2304.14233 [pdf, other]

Large Language Models are Strong Zero-Shot Retriever

Authors: Tao Shen, Guodong Long, Xiubo Geng, Chongyang Tao, Tianyi Zhou, Daxin Jiang

Abstract: In this work, we propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios. Our method, the Language language model as Retriever (LameR), is built upon no other neural models but an LLM, while breaking brute-force combinations of retrievers with LLMs and lifting the performance of zero-shot retrieval to be very competitive on benchmark datase… ▽ More In this work, we propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios. Our method, the Language language model as Retriever (LameR), is built upon no other neural models but an LLM, while breaking brute-force combinations of retrievers with LLMs and lifting the performance of zero-shot retrieval to be very competitive on benchmark datasets. Essentially, we propose to augment a query with its potential answers by prompting LLMs with a composition of the query and the query's in-domain candidates. The candidates, regardless of correct or wrong, are obtained by a vanilla retrieval procedure on the target collection. As a part of the prompts, they are likely to help LLM generate more precise answers by pattern imitation or candidate summarization. Even if all the candidates are wrong, the prompts at least make LLM aware of in-collection patterns and genres. Moreover, due to the low performance of a self-supervised retriever, the LLM-based query augmentation becomes less effective as the retriever bottlenecks the whole pipeline. Therefore, we propose to leverage a non-parametric lexicon-based method (e.g., BM25) as the retrieval module to capture query-document overlap in a literal fashion. As such, LameR makes the retrieval procedure transparent to the LLM, thus circumventing the performance bottleneck. △ Less

Submitted 1 August, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: Work in progress

arXiv:2304.04158 [pdf, other]

Does Continual Learning Equally Forget All Parameters?

Authors: Haiyan Zhao, Tianyi Zhou, Guodong Long, **g Jiang, Chengqi Zhang

Abstract: Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks. Although it can be alleviated by repeatedly replaying buffered data, the every-step replay is time-consuming. In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. Our proposed m… ▽ More Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks. Although it can be alleviated by repeatedly replaying buffered data, the every-step replay is time-consuming. In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. Our proposed metrics show that only a few modules are more task-specific and sensitively alter between tasks, while others can be shared across tasks as common knowledge. Hence, we attribute forgetting mainly to the former and find that finetuning them only on a small buffer at the end of any CL method can bring non-trivial improvement. Due to the small number of finetuned parameters, such ``Forgetting Prioritized Finetuning (FPF)'' is efficient in computation. We further propose a more efficient and simpler method that entirely removes the every-step replay and replaces them by only $k$-times of FPF periodically triggered during CL. Surprisingly, this ``$k$-FPF'' performs comparably to FPF and outperforms the SOTA CL methods but significantly reduces their computational overhead and cost. In experiments on several benchmarks of class- and domain-incremental CL, FPF consistently improves existing CL methods by a large margin, and $k$-FPF further excels in efficiency without degrading the accuracy. We also empirically studied the impact of buffer size, epochs per task, and finetuning modules on the cost and accuracy of our methods. △ Less

Submitted 9 April, 2023; originally announced April 2023.

arXiv:2302.02173 [pdf, other]

A Survey on Deep Learning based Time Series Analysis with Frequency Transformation

Authors: Kun Yi, Qi Zhang, Longbing Cao, Shou** Wang, Guodong Long, Liang Hu, Hui He, Zhendong Niu, Wei Fan, Hui Xiong

Abstract: Recently, frequency transformation (FT) has been increasingly incorporated into deep learning models to significantly enhance state-of-the-art accuracy and efficiency in time series analysis. The advantages of FT, such as high efficiency and a global view, have been rapidly explored and exploited in various time series tasks and applications, demonstrating the promising potential of FT as a new de… ▽ More Recently, frequency transformation (FT) has been increasingly incorporated into deep learning models to significantly enhance state-of-the-art accuracy and efficiency in time series analysis. The advantages of FT, such as high efficiency and a global view, have been rapidly explored and exploited in various time series tasks and applications, demonstrating the promising potential of FT as a new deep learning paradigm for time series analysis. Despite the growing attention and the proliferation of research in this emerging field, there is currently a lack of a systematic review and in-depth analysis of deep learning-based time series models with FT. It is also unclear why FT can enhance time series analysis and what its limitations in the field are. To address these gaps, we present a comprehensive review that systematically investigates and summarizes the recent research advancements in deep learning-based time series analysis with FT. Specifically, we explore the primary approaches used in current models that incorporate FT, the types of neural networks that leverage FT, and the representative FT-equipped models in deep time series analysis. We propose a novel taxonomy to categorize the existing methods in this field, providing a structured overview of the diverse approaches employed in incorporating FT into deep learning models for time series analysis. Finally, we highlight the advantages and limitations of FT for time series modeling and identify potential future research directions that can further contribute to the community of time series analysis. △ Less

Submitted 15 October, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

arXiv:2301.11560 [pdf, other]

Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks

Authors: Haiyan Zhao, Tianyi Zhou, Guodong Long, **g Jiang, Chengqi Zhang

Abstract: As a few large-scale pre-trained models become the major choices of various applications, new challenges arise for model pruning, e.g., can we avoid pruning the same model from scratch for every downstream task? How to reuse the pruning results of previous tasks to accelerate the pruning for a new task? To address these challenges, we create a small model for a new task from the pruned models of s… ▽ More As a few large-scale pre-trained models become the major choices of various applications, new challenges arise for model pruning, e.g., can we avoid pruning the same model from scratch for every downstream task? How to reuse the pruning results of previous tasks to accelerate the pruning for a new task? To address these challenges, we create a small model for a new task from the pruned models of similar tasks. We show that a few fine-tuning steps on this model suffice to produce a promising pruned-model for the new task. We study this ''meta-pruning'' from nearest tasks on two major classes of pre-trained models, convolutional neural network (CNN) and vision transformer (ViT), under a limited budget of pruning iterations. Our study begins by investigating the overlap of pruned models for similar tasks and how the overlap changes over different layers and blocks. Inspired by these discoveries, we develop a simple but effective ''Meta-Vote Pruning (MVP)'' method that significantly reduces the pruning iterations for a new task by initializing a sub-network from the pruned models of its nearest tasks. In experiments, we demonstrate MVP's advantages in accuracy, efficiency, and generalization through extensive empirical studies and comparisons with popular pruning methods over several datasets. △ Less

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2301.11367 [pdf, other]

Style-Aware Contrastive Learning for Multi-Style Image Captioning

Authors: Yucheng Zhou, Guodong Long

Abstract: Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder… ▽ More Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method. △ Less

Submitted 26 January, 2023; originally announced January 2023.

Comments: Findings of EACL 2023

arXiv:2301.11362 [pdf, other]

Improving Cross-modal Alignment for Text-Guided Image Inpainting

Authors: Yucheng Zhou, Guodong Long

Abstract: Text-guided image inpainting (TGII) aims to restore missing regions based on a given text in a damaged image. Existing methods are based on a strong vision encoder and a cross-modal fusion model to integrate cross-modal features. However, these methods allocate most of the computation to visual encoding, while light computation on modeling modality interactions. Moreover, they take cross-modal fus… ▽ More Text-guided image inpainting (TGII) aims to restore missing regions based on a given text in a damaged image. Existing methods are based on a strong vision encoder and a cross-modal fusion model to integrate cross-modal features. However, these methods allocate most of the computation to visual encoding, while light computation on modeling modality interactions. Moreover, they take cross-modal fusion for depth features, which ignores a fine-grained alignment between text and image. Recently, vision-language pre-trained models (VLPM), encapsulating rich cross-modal alignment knowledge, have advanced in most multimodal tasks. In this work, we propose a novel model for TGII by improving cross-modal alignment (CMA). CMA model consists of a VLPM as a vision-language encoder, an image generator and global-local discriminators. To explore cross-modal alignment knowledge for image restoration, we introduce cross-modal alignment distillation and in-sample distribution distillation. In addition, we employ adversarial training to enhance the model to fill the missing region in complicated structures effectively. Experiments are conducted on two popular vision-language datasets. Results show that our model achieves state-of-the-art performance compared with other strong competitors. △ Less

Submitted 26 January, 2023; originally announced January 2023.

Comments: EACL 2023

arXiv:2301.11357 [pdf, other]

Multimodal Event Transformer for Image-guided Story Ending Generation

Authors: Yucheng Zhou, Guodong Long

Abstract: Image-guided story ending generation (IgSEG) is to generate a story ending based on given story plots and ending image. Existing methods focus on cross-modal feature fusion but overlook reasoning and mining implicit information from story plots and ending image. To tackle this drawback, we propose a multimodal event transformer, an event-based reasoning framework for IgSEG. Specifically, we constr… ▽ More Image-guided story ending generation (IgSEG) is to generate a story ending based on given story plots and ending image. Existing methods focus on cross-modal feature fusion but overlook reasoning and mining implicit information from story plots and ending image. To tackle this drawback, we propose a multimodal event transformer, an event-based reasoning framework for IgSEG. Specifically, we construct visual and semantic event graphs from story plots and ending image, and leverage event-based reasoning to reason and mine implicit information in a single modality. Next, we connect visual and semantic event graphs and utilize cross-modal fusion to integrate different-modality features. In addition, we propose a multimodal injector to adaptive pass essential information to decoder. Besides, we present an incoherence detection to enhance the understanding context of a story plot and the robustness of graph modeling for our model. Experimental results show that our method achieves state-of-the-art performance for the image-guided story ending generation. △ Less

Submitted 26 January, 2023; originally announced January 2023.

Comments: EACL 2023

arXiv:2301.09152 [pdf, other]

Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data

Authors: Shengchao Chen, Guodong Long, Tao Shen, **g Jiang

Abstract: To tackle the global climate challenge, it urgently needs to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite urgency, heterogeneous meteorological sensors across countries and regions, inevitably causing multivariate heterogeneity and data exposure, become the main barrier. This paper develops a foundation model across regions capa… ▽ More To tackle the global climate challenge, it urgently needs to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite urgency, heterogeneous meteorological sensors across countries and regions, inevitably causing multivariate heterogeneity and data exposure, become the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach has been proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism has been adopted to satisfy low-resourced sensors' communication and computational constraints. The effectiveness of the proposed method has been demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series. △ Less

Submitted 27 May, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: Accepted by IJCAI'23 (32nd International Joint Conference on Artificial Intelligence)

arXiv:2301.09109 [pdf, other]

Federated Recommendation with Additive Personalization

Authors: Zhiwei Li, Guodong Long, Tianyi Zhou

Abstract: Building recommendation systems via federated learning (FL) is a new emerging challenge for advancing next-generation Internet service and privacy protection. Existing approaches train shared item embedding by FL while kee** the user embedding private on client side. However, item embedding identical for all clients cannot capture users' individual differences on perceiving the same item and thu… ▽ More Building recommendation systems via federated learning (FL) is a new emerging challenge for advancing next-generation Internet service and privacy protection. Existing approaches train shared item embedding by FL while kee** the user embedding private on client side. However, item embedding identical for all clients cannot capture users' individual differences on perceiving the same item and thus leads to poor personalization. Moreover, dense item embedding in FL results in expensive communication cost and latency. To address these challenges, we propose Federated Recommendation with Additive Personalization (FedRAP), which learns a global view of items via FL and a personalized view locally on each user. FedRAP enforces sparsity of the global view to save FL's communication cost and encourages difference between the two views through regularization. We propose an effective curriculum to learn the local and global views progressively with increasing regularization weights. To produce recommendations for an user, FedRAP adds the two views together to obtain a personalized item embedding. FedRAP achieves the best performance in FL setting on multiple benchmarks. It outperforms recent federated recommendation methods and several ablation study baselines. △ Less

Submitted 7 February, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: 9 pages, conference

arXiv:2301.08143 [pdf, other]

Dual Personalization on Federated Recommendation

Authors: Chunxu Zhang, Guodong Long, Tianyi Zhou, Peng Yan, Zijian Zhang, Chengqi Zhang, Bo Yang

Abstract: Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-u… ▽ More Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available. △ Less

Submitted 13 May, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Comments: Accepted as a regular paper of IJCAI23

arXiv:2212.10423 [pdf, other]

Fine-Grained Distillation for Long Document Retrieval

Authors: Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Guodong Long, Can Xu, Daxin Jiang

Abstract: Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their… ▽ More Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their structure heterogeneity and poses a granular-mismatch issue, leading to an inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces global-consistent representations crossing different fine granularity and then applies multi-granular aligned distillation merely during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, which show state-of-the-art performance. △ Less

Submitted 20 December, 2022; originally announced December 2022.

Comments: 13 pages, 5 figures, 5 tables

arXiv:2211.13009 [pdf, other]

Federated Learning on Non-IID Graphs via Structural Knowledge Sharing

Authors: Yue Tan, Yixin Liu, Guodong Long, **g Jiang, Qinghua Lu, Chengqi Zhang

Abstract: Graph neural networks (GNNs) have shown their superiority in modeling graph data. Owing to the advantages of federated learning, federated graph learning (FGL) enables clients to train strong GNN models in a distributed manner without sharing their private data. A core challenge in federated systems is the non-IID problem, which also widely exists in real-world graph data. For example, local data… ▽ More Graph neural networks (GNNs) have shown their superiority in modeling graph data. Owing to the advantages of federated learning, federated graph learning (FGL) enables clients to train strong GNN models in a distributed manner without sharing their private data. A core challenge in federated systems is the non-IID problem, which also widely exists in real-world graph data. For example, local data of clients may come from diverse datasets or even domains, e.g., social networks and molecules, increasing the difficulty for FGL methods to capture commonly shared knowledge and learn a generalized encoder. From real-world graph datasets, we observe that some structural properties are shared by various domains, presenting great potential for sharing structural knowledge in FGL. Inspired by this, we propose FedStar, an FGL framework that extracts and shares the common underlying structure information for inter-graph federated learning tasks. To explicitly extract the structure information rather than encoding them along with the node features, we define structure embeddings and encode them with an independent structure encoder. Then, the structure encoder is shared across clients while the feature-based knowledge is learned in a personalized way, making FedStar capable of capturing more structure-based domain-invariant information and avoiding feature misalignment issues. We perform extensive experiments over both cross-dataset and cross-domain non-IID FGL settings, demonstrating the superiority of FedStar. △ Less

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.08737 [pdf, other]

doi 10.1007/s11433-022-2057-y

Near-Term Quantum Computing Techniques: Variational Quantum Algorithms, Error Mitigation, Circuit Compilation, Benchmarking and Classical Simulation

Authors: He-Liang Huang, Xiao-Yue Xu, Chu Guo, Guo**g Tian, Shi-Jie Wei, Xiaoming Sun, Wan-Su Bao, Gui-Lu Long

Abstract: Quantum computing is a game-changing technology for global academia, research centers and industries including computational science, mathematics, finance, pharmaceutical, materials science, chemistry and cryptography. Although it has seen a major boost in the last decade, we are still a long way from reaching the maturity of a full-fledged quantum computer. That said, we will be in the Noisy-Inte… ▽ More Quantum computing is a game-changing technology for global academia, research centers and industries including computational science, mathematics, finance, pharmaceutical, materials science, chemistry and cryptography. Although it has seen a major boost in the last decade, we are still a long way from reaching the maturity of a full-fledged quantum computer. That said, we will be in the Noisy-Intermediate Scale Quantum (NISQ) era for a long time, working on dozens or even thousands of qubits quantum computing systems. An outstanding challenge, then, is to come up with an application that can reliably carry out a nontrivial task of interest on the near-term quantum devices with non-negligible quantum noise. To address this challenge, several near-term quantum computing techniques, including variational quantum algorithms, error mitigation, quantum circuit compilation and benchmarking protocols, have been proposed to characterize and mitigate errors, and to implement algorithms with a certain resistance to noise, so as to enhance the capabilities of near-term quantum devices and explore the boundaries of their ability to realize useful applications. Besides, the development of near-term quantum devices is inseparable from the efficient classical simulation, which plays a vital role in quantum algorithm design and verification, error-tolerant verification and other applications. This review will provide a thorough introduction of these near-term quantum computing techniques, report on their progress, and finally discuss the future prospect of these techniques, which we hope will motivate researchers to undertake additional studies in this field. △ Less

Submitted 27 December, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Please feel free to email He-Liang Huang with any comments, questions, suggestions or concerns

Journal ref: Sci. China-Phys. Mech. Astron. 66, 250302 (2023)

arXiv:2211.05987 [pdf, other]

CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification

Authors: Yang Li, Canran Xu, Guodong Long, Tao Shen, Chongyang Tao, **g Jiang

Abstract: Recently, prefix-tuning was proposed to efficiently adapt pre-trained language models to a broad spectrum of natural language classification tasks. It leverages soft prefix as task-specific indicators and language verbalizers as categorical-label mentions to narrow the formulation gap from pre-training language models. However, when the label space increases considerably (i.e., many-class classifi… ▽ More Recently, prefix-tuning was proposed to efficiently adapt pre-trained language models to a broad spectrum of natural language classification tasks. It leverages soft prefix as task-specific indicators and language verbalizers as categorical-label mentions to narrow the formulation gap from pre-training language models. However, when the label space increases considerably (i.e., many-class classification), such a tuning technique suffers from a verbalizer ambiguity problem since the many-class labels are represented by semantic-similar verbalizers in short language phrases. To overcome this, inspired by the human-decision process that the most ambiguous classes would be mulled over for each instance, we propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix), for many-class classification. Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification. We conduct experiments on many-class benchmark datasets in both the fully supervised setting and the few-shot setting, which indicates that our model outperforms former baselines. △ Less

Submitted 12 February, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: has been accepted by EACL 2024

arXiv:2210.15248 [pdf, ps, other]

Unsupervised Knowledge Graph Construction and Event-centric Knowledge Infusion for Scientific NLI

Authors: Chenglin Wang, Yucheng Zhou, Guodong Long, Xiaodong Wang, Xiaowei Xu

Abstract: With the advance of natural language inference (NLI), a rising demand for NLI is to handle scientific texts. Existing methods depend on pre-trained models (PTM) which lack domain-specific knowledge. To tackle this drawback, we introduce a scientific knowledge graph to generalize PTM to scientific domain. However, existing knowledge graph construction approaches suffer from some drawbacks, i.e., ex… ▽ More With the advance of natural language inference (NLI), a rising demand for NLI is to handle scientific texts. Existing methods depend on pre-trained models (PTM) which lack domain-specific knowledge. To tackle this drawback, we introduce a scientific knowledge graph to generalize PTM to scientific domain. However, existing knowledge graph construction approaches suffer from some drawbacks, i.e., expensive labeled data, failure to apply in other domains, long inference time and difficulty extending to large corpora. Therefore, we propose an unsupervised knowledge graph construction method to build a scientific knowledge graph (SKG) without any labeled data. Moreover, to alleviate noise effect from SKG and complement knowledge in sentences better, we propose an event-centric knowledge infusion method to integrate external knowledge into each event that is a fine-grained semantic unit in sentences. Experimental results show that our method achieves state-of-the-art performance and the effectiveness and reliability of SKG. △ Less

Submitted 27 October, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.01855 [pdf, other]

Multifaceted Hierarchical Report Identification for Non-Functional Bugs in Deep Learning Frameworks

Authors: Guoming Long, Tao Chen, Georgina Cosma

Abstract: Non-functional bugs (e.g., performance- or accuracy-related bugs) in Deep Learning (DL) frameworks can lead to some of the most devastating consequences. Reporting those bugs on a repository such as GitHub is a standard route to fix them. Yet, given the growing number of new GitHub reports for DL frameworks, it is intrinsically difficult for developers to distinguish those that reveal non-function… ▽ More Non-functional bugs (e.g., performance- or accuracy-related bugs) in Deep Learning (DL) frameworks can lead to some of the most devastating consequences. Reporting those bugs on a repository such as GitHub is a standard route to fix them. Yet, given the growing number of new GitHub reports for DL frameworks, it is intrinsically difficult for developers to distinguish those that reveal non-functional bugs among the others, and assign them to the right contributor for investigation in a timely manner. In this paper, we propose MHNurf - an end-to-end tool for automatically identifying non-functional bug related reports in DL frameworks. The core of MHNurf is a Multifaceted Hierarchical Attention Network (MHAN) that tackles three unaddressed challenges: (1) learning the semantic knowledge, but doing so by (2) considering the hierarchy (e.g., words/tokens in sentences/statements) and focusing on the important parts (i.e., words, tokens, sentences, and statements) of a GitHub report, while (3) independently extracting information from different types of features, i.e., content, comment, code, command, and label. To evaluate MHNurf, we leverage 3,721 GitHub reports from five DL frameworks for conducting experiments. The results show that MHNurf works the best with a combination of content, comment, and code, which considerably outperforms the classic HAN where only the content is used. MHNurf also produces significantly more accurate results than nine other state-of-the-art classifiers with strong statistical significance, i.e., up to 71% AUC improvement and has the best Scott-Knott rank on four frameworks while 2nd on the remaining one. To facilitate reproduction and promote future research, we have made our dataset, code, and detailed supplementary results publicly available at: https://github.com/ideas-labo/APSEC2022-MHNurf. △ Less

Submitted 4 October, 2022; originally announced October 2022.

Comments: Accepted at APSEC 2022

arXiv:2209.10083 [pdf, other]

Federated Learning from Pre-Trained Models: A Contrastive Learning Approach

Authors: Yue Tan, Guodong Long, Jie Ma, Lu Liu, Tianyi Zhou, **g Jiang

Abstract: Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework wh… ▽ More Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. In this work, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while kee** the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets. △ Less

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2206.08063 [pdf, other]

Towards Robust Ranker for Text Retrieval

Authors: Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Guodong Long, Binxing Jiao, Daxin Jiang

Abstract: A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives or/and serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a high-capable ranke… ▽ More A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives or/and serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a high-capable ranker. Thereby, we propose multiple retrievers as negative generators improve the ranker's robustness, where i) involving extensive out-of-distribution label noises renders the ranker against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging thus effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model. △ Less

Submitted 16 June, 2022; originally announced June 2022.

Comments: 11 pages of main content, 4 tables, 3 figures

arXiv:2205.11194 [pdf, other]

UnifieR: A Unified Retriever for Large-Scale Retrieval

Authors: Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Guodong Long, Kai Zhang, Daxin Jiang

Abstract: Large-scale retrieval is to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms… ▽ More Large-scale retrieval is to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms unveil the PLMs' representation capability in different granularities, i.e., global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representing views, we propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. A uni-retrieval scheme is further presented with even better retrieval quality. We lastly evaluate the model on BEIR benchmark to verify its transferability. △ Less

Submitted 4 June, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: To appear at KDD ADS 2023

arXiv:2205.10110 [pdf, other]

FedNoiL: A Simple Two-Level Sampling Method for Federated Learning with Noisy Labels

Authors: Zhuowei Wang, Tianyi Zhou, Guodong Long, Bo Han, **g Jiang

Abstract: Federated learning (FL) aims at training a global model on the server side while the training data are collected and located at the local devices. Hence, the labels in practice are usually annotated by clients of varying expertise or criteria and thus contain different amounts of noises. Local training on noisy labels can easily result in overfitting to noisy labels, which is devastating to the gl… ▽ More Federated learning (FL) aims at training a global model on the server side while the training data are collected and located at the local devices. Hence, the labels in practice are usually annotated by clients of varying expertise or criteria and thus contain different amounts of noises. Local training on noisy labels can easily result in overfitting to noisy labels, which is devastating to the global model through aggregation. Although recent robust FL methods take malicious clients into account, they have not addressed local noisy labels on each device and the impact to the global model. In this paper, we develop a simple two-level sampling method "FedNoiL" that (1) selects clients for more robust global aggregation on the server; and (2) selects clean labels and correct pseudo-labels at the client end for more robust local training. The sampling probabilities are built upon clean label detection by the global model. Moreover, we investigate different schedules changing the local epochs between aggregations over the course of FL, which notably improves the communication and computation efficiency in noisy label setting. In experiments with homogeneous/heterogeneous data distributions and noise ratios, we observed that direct combinations of SOTA FL methods with SOTA noisy-label learning methods can easily fail but our method consistently achieves better and robust performance. △ Less

Submitted 20 May, 2022; originally announced May 2022.

Comments: 12 pages

arXiv:2204.10562 [pdf, other]

doi 10.1109/INFOCOM48880.2022.9796787

Efficient Pipeline Planning for Expedited Distributed DNN Training

Authors: Ziyue Luo, Xiaodong Yi, Guo** Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin

Abstract: To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in pipeline. Earlier pipeline designs allow multiple versions of model parameters to co-exist (similar to asynchronous training), and cannot ensure the same model convergence and accuracy performance as without pipelini… ▽ More To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in pipeline. Earlier pipeline designs allow multiple versions of model parameters to co-exist (similar to asynchronous training), and cannot ensure the same model convergence and accuracy performance as without pipelining. Synchronous pipelining has recently been proposed which ensures model performance by enforcing a synchronization barrier between training iterations. Nonetheless, the synchronization barrier requires waiting for gradient aggregation from all microbatches and thus delays the training progress. Optimized pipeline planning is needed to minimize such wait and hence the training time, which has not been well studied in the literature. This paper designs efficient, near-optimal algorithms for expediting synchronous pipeline-parallel training of modern large DNNs over arbitrary inter-GPU connectivity. Our algorithm framework comprises two components: a pipeline partition and device map** algorithm, and a pipeline scheduler that decides processing order of microbatches over the partitions, which together minimize the per-iteration training time. We conduct thorough theoretical analysis, extensive testbed experiments and trace-driven simulation, and demonstrate our scheme can accelerate training up to 157% compared with state-of-the-art designs. △ Less

Submitted 22 April, 2022; originally announced April 2022.

Comments: INFOCOM 2022

arXiv:2204.07893 [pdf, other]

On Reporting Performance and Accuracy Bugs for Deep Learning Frameworks: An Exploratory Study from GitHub

Authors: Guoming Long, Tao Chen

Abstract: The tremendous success of Deep Learning (DL) has significantly boosted the number of open-sourced DL frameworks hosted on GitHub. Among others, performance and accuracy bugs are critical factors that affect the reputation of these DL frameworks, therefore understanding the practice of discovering and investigating them for DL is important. In this paper, we conduct an exploratory study on the natu… ▽ More The tremendous success of Deep Learning (DL) has significantly boosted the number of open-sourced DL frameworks hosted on GitHub. Among others, performance and accuracy bugs are critical factors that affect the reputation of these DL frameworks, therefore understanding the practice of discovering and investigating them for DL is important. In this paper, we conduct an exploratory study on the nature of reporting performance and accuracy bugs bugs for DL frameworks, aiming to improve our knowledge on this topic. Our study covers 10 most popular open-sourced DL frameworks on GitHub (e.g., TensorFlow, Keras, and PyTorch), based on which we sample 664 representative performance and accuracy bugs bug reports out of a total population of 22,522. Through systematic analysis of these samples, our key findings are: (1) low speed is the primary reason that a performance bug related report is submitted but we see no consistent pattern for accuracy related ones; (2) most of the reports are about issues encountered in the training stage; (3) only a small proportion of the reports provide insufficient information to investigate; (4) the majority of the performance and accuracy bugs bug reports (from 69% to 100%) are not related to the actual bug or regarded as unclassified; (5) around 50% of the performance and accuracy bug reports, which indeed reveal bugs, are not resolved by direct patches. Deriving from the above, we discuss a set of actionable implications to the researchers, maintainers, and report submitters on this subject. To promote open science, the labeled dataset has been made publicly available at https://tinyurl.com/4x3tap9w. △ Less

Submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted at EASE 2022

arXiv:2204.05126 [pdf, other]

General Hamiltonian Representation of ML Detection Relying on the Quantum Approximate Optimization Algorithm

Authors: **g**g Cui, Gui Lu Long, Lajos Hanzo

Abstract: The quantum approximate optimization algorithm (QAOA) conceived for solving combinatorial optimization problems has attracted significant interest since it can be run on the existing noisy intermediate-scale quantum (NISQ) devices. A primary step of using the QAOA is the efficient Hamiltonian construction based on different problem instances. Hence, we solve the maximum likelihood (ML) detection p… ▽ More The quantum approximate optimization algorithm (QAOA) conceived for solving combinatorial optimization problems has attracted significant interest since it can be run on the existing noisy intermediate-scale quantum (NISQ) devices. A primary step of using the QAOA is the efficient Hamiltonian construction based on different problem instances. Hence, we solve the maximum likelihood (ML) detection problem for general constellations by appropriately adapting the QAOA, which gives rise to a new paradigm in communication systems. We first transform the ML detection problem into a weighted minimum $N$-satisfiability (WMIN-$N$-SAT) problem, where we formulate the objective function of the WMIN-$N$-SAT as a pseudo Boolean function. Furthermore, we formalize the connection between the degree of the objective function and the Gray-labelled modulation constellations. Explicitly, we show a series of results exploring the connection between the coefficients of the monomials and the patterns of the associated constellation points, which substantially simplifies the objective function with respect to the problem Hamiltonian of the QAOA. In particular, for an M-ary Gray-mapped quadrature amplitude modulation (MQAM) constellation, we show that the specific qubits encoding the in-phase components and those encoding the quadrature components are independent in the quantum system of interest, which allows the in-phase and quadrature components to be detected separately using the QAOA. Furthermore, we characterize the degree of the objective function in the WMIN-$N$-SAT problem corresponding to the ML detection of multiple-input and multiple-output (MIMO) channels. Finally, we evaluate the approximation ratio of the QAOA for the ML detection problem of quadrature phase shift keying (QPSK) relying on QAOA circuits of different depths. △ Less

Submitted 11 April, 2022; originally announced April 2022.

arXiv:2203.02225 [pdf, other]

ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification

Authors: Yucheng Zhou, Tao Shen, Xiubo Geng, Guodong Long, Daxin Jiang

Abstract: Generating new events given context with correlated ones plays a crucial role in many event-centric reasoning tasks. Existing works either limit their scope to specific scenarios or overlook event-level correlations. In this paper, we propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for event-centric reasoning. To achieve this, we propose three novel event-cen… ▽ More Generating new events given context with correlated ones plays a crucial role in many event-centric reasoning tasks. Existing works either limit their scope to specific scenarios or overlook event-level correlations. In this paper, we propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for event-centric reasoning. To achieve this, we propose three novel event-centric objectives, i.e., whole event recovering, contrastive event-correlation encoding and prompt-based event locating, which highlight event-level correlations with effective training. The proposed ClarET is applicable to a wide range of event-centric reasoning scenarios, considering its versatility of (i) event-correlation types (e.g., causal, temporal, contrast), (ii) application formulations (i.e., generation and classification), and (iii) reasoning types (e.g., abductive, counterfactual and ending reasoning). Empirical fine-tuning results, as well as zero- and few-shot learning, on 9 benchmarks (5 generation and 4 classification tasks covering 4 reasoning types with diverse event correlations), verify its effectiveness and generalization ability. △ Less

Submitted 9 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: ACL 2022 camera-ready version

arXiv:2203.00829 [pdf, other]

Personalized Federated Learning With Graph

Authors: Fengwen Chen, Guodong Long, Zonghan Wu, Tianyi Zhou, **g Jiang

Abstract: Knowledge sharing and model personalization are two key components in the conceptual framework of personalized federated learning (PFL). Existing PFL methods focus on proposing new model personalization mechanisms while simply implementing knowledge sharing by aggregating models from all clients, regardless of their relation graph. This paper aims to enhance the knowledge-sharing process in PFL by… ▽ More Knowledge sharing and model personalization are two key components in the conceptual framework of personalized federated learning (PFL). Existing PFL methods focus on proposing new model personalization mechanisms while simply implementing knowledge sharing by aggregating models from all clients, regardless of their relation graph. This paper aims to enhance the knowledge-sharing process in PFL by leveraging the graph-based structural information among clients. We propose a novel structured federated learning (SFL) framework to learn both the global and personalized models simultaneously using client-wise relation graphs and clients' private data. We cast SFL with graph into a novel optimization problem that can model the client-wise complex relations and graph-based structural topology by a unified framework. Moreover, in addition to using an existing relation graph, SFL could be expanded to learn the hidden relations among clients. Experiments on traffic and image benchmark datasets can demonstrate the effectiveness of the proposed method. All implementation codes are available on Github △ Less

Submitted 30 April, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Showing 1–50 of 123 results for author: Long, G