Collaborative Performance Prediction for Large Language Models

Authors: Qiyuan Zhang, Fuyuan Lyu, Xue Liu, Chen Ma

Abstract: Comprehensively understanding and accurately predicting the performance of large language models across diverse downstream tasks has emerged as a pivotal challenge in NLP research. Pioneering work on downstream scaling laws demonstrated intrinsic similarities within model families and used such similarities for performance prediction. However, these works tend to overlook the similarities between model families and only consider design factors listed in the original scaling law. To overcome these limitations, we introduce a novel framework, Collaborative Performance Prediction (CPP), which significantly enhances prediction accuracy by leveraging the historical performance of various models on downstream tasks and other design factors for both model and task. We also collect collaborative data sourced from online platforms, containing both historical performance and additional design factors. With the support of the collaborative data, CPP not only surpasses traditional scaling laws in predicting the performance of scaled LLMs but also facilitates a detailed analysis of factor importance, an area previously overlooked.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.02609

Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation

Authors: Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang

Abstract: Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels are often noisy, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning: initialization, growth, and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. In extensive experiments, PLF outperforms current state-of-the-art methods, demonstrating its effectiveness in CTTA.

Submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.03335 by other authors

arXiv:2405.17054

Improving Data-aware and Parameter-aware Robustness for Continual Learning

Authors: Hanxi Xiao, Fan Lyu

Abstract: The goal of Continual Learning (CL) is to learn multiple new tasks sequentially while balancing the plasticity and stability of new and old knowledge. This paper finds that shortfalls in this balance arise from the ineffective handling of outliers, leading to abnormal gradients and unexpected model updates. To address this issue, we enhance the data-aware and parameter-aware robustness of CL, proposing a Robust Continual Learning (RCL) method. From the data perspective, we develop a contrastive loss based on the concepts of uniformity and alignment, forming a feature distribution that is more applicable to outliers. From the parameter perspective, we present a forward strategy for worst-case perturbation and apply robust gradient projection to the parameters. The experimental results on three benchmarks show that the proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results. The code is available at: https://github.com/HanxiXiao/RCL

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14602

Controllable Continual Test-Time Adaptation

Authors: Ziqi Shi, Fan Lyu, Ye Liu, Fanhua Shang, Fuyuan Hu, Wei Feng, Zhang Zhang, Liang Wang

Abstract: Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppressing domain shifts, which proves inadequate during the unsupervised test phase. In contrast, we introduce a novel approach that guides rather than suppresses these shifts. Specifically, we propose Controllable Continual Test-Time Adaptation (C-CoTTA), which explicitly prevents any single category from encroaching on others, thereby mitigating the mutual influence between categories caused by uncontrollable shifts. Moreover, our method reduces the sensitivity of the model to domain transformations, thereby minimizing the magnitude of category shifts. Extensive quantitative experiments demonstrate the effectiveness of our method, while qualitative analyses, such as t-SNE plots, confirm the theoretical validity of our approach.

Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.09133

Overcoming Domain Drift in Online Continual Learning

Authors: Fan Lyu, Daofeng Liu, Linglan Zhao, Zhang Zhang, Fanhua Shang, Fuyuan Hu, Wei Feng, Liang Wang

Abstract: Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential learning tasks may entail the gradual displacement of the decision boundaries in the learned feature space, rendering the learned knowledge susceptible to forgetting. To address this problem, we propose a novel rehearsal strategy, termed Drift-Reducing Rehearsal (DRR), to anchor the domain of old tasks and reduce the negative transfer effects. First, we propose to select memory for more representative samples guided by constructed centroids in a data stream. Then, to keep the model from domain chaos in drifting, we propose a two-level angular cross-task Contrastive Margin Loss (CML) that encourages intra-class and intra-task compactness and increases inter-class and inter-task discrepancy. Finally, to further suppress the continual domain drift, we present an optional Centroid Distillation Loss (CDL) on the rehearsal memory to anchor the knowledge in feature space for each previous old task. Extensive experimental results on four benchmark datasets validate that the proposed DRR can effectively mitigate the continual domain drift and achieve state-of-the-art (SOTA) performance in OCL.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.07200

Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective

Authors: Shaoxiang Qin, Fuyuan Lyu, Wenhui Peng, Dingyang Geng, Ju Wang, Naiping Gao, Xue Liu, Liangzhu Leon Wang

Abstract: In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness compared to Convolutional Neural Networks (CNNs). This paper presents clear empirical evidence through spectral analysis to elucidate the superiority of FNO over CNNs: FNO is significantly more capable of learning low frequencies. This empirical evidence also unveils FNO's distinct low-frequency bias, which limits FNO's effectiveness in learning high-frequency information from PDE data. To tackle this challenge, we introduce SpecBoost, an ensemble learning framework that employs multiple FNOs to better capture high-frequency information. Specifically, a secondary FNO is utilized to learn the overlooked high-frequency information from the prediction residual of the initial FNO. Experiments demonstrate that SpecBoost noticeably enhances FNO's prediction accuracy on diverse PDE applications, achieving up to a 71% improvement.

Submitted 10 April, 2024; originally announced April 2024.
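
The two-stage residual idea described in the abstract above can be illustrated with a toy sketch. This is not the authors' SpecBoost implementation: the two "models" here are simple sine-mode projections standing in for FNOs, and the names `fit_modes`, `predict`, and `mse`, the grid size, and the chosen mode sets are all assumptions made for illustration. The first model only sees low-frequency modes, and a second model is fitted to the residual of the first.

```python
import math

def fit_modes(xs, ys, modes):
    """Project ys onto sin(2*pi*k*x) for each k in modes (uniform grid on [0, 1))."""
    n = len(xs)
    return {
        k: 2.0 / n * sum(y * math.sin(2 * math.pi * k * x) for x, y in zip(xs, ys))
        for k in modes
    }

def predict(xs, coeffs):
    """Evaluate the fitted sine expansion at every grid point."""
    return [
        sum(c * math.sin(2 * math.pi * k * x) for k, c in coeffs.items())
        for x in xs
    ]

def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Toy target: one low-frequency and one high-frequency component.
n = 64
xs = [i / n for i in range(n)]
ys = [math.sin(2 * math.pi * x) + 0.3 * math.sin(2 * math.pi * 8 * x) for x in xs]

# Stage 1: a stand-in model that is restricted to low frequencies.
first_pred = predict(xs, fit_modes(xs, ys, modes=[1, 2]))

# Stage 2: a second stand-in model fitted to the residual of stage 1.
residual = [y - p for y, p in zip(ys, first_pred)]
second_pred = predict(xs, fit_modes(xs, residual, modes=[7, 8, 9]))

# Ensemble prediction: sum of both stages.
combined = [p + q for p, q in zip(first_pred, second_pred)]

print("stage-1 MSE:", mse(first_pred, ys))
print("combined MSE:", mse(combined, ys))
```

In this toy setting the second stage recovers the high-frequency component that the first stage cannot represent, so the combined error falls below the stage-1 error.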

arXiv:2403.17442

Touch the Core: Exploring Task Dependence Among Hybrid Targets for Recommendation

Authors: Xing Tang, Yang Qiao, Fuyuan Lyu, Dugang Liu, Xiuqiang He

Abstract: As user behaviors become complicated on business platforms, online recommendations focus more on how to touch the core conversions, which are highly related to the interests of platforms. These core conversions are usually continuous targets, such as watch time, revenue, and so on, whose predictions can be enhanced by previous discrete conversion actions. Therefore, multi-task learning (MTL) can be adopted as the paradigm to learn these hybrid targets. However, existing works mainly emphasize investigating the sequential dependence among discrete conversion actions, which neglects the complexity of dependence between discrete conversions and the final continuous conversion. Moreover, simultaneously optimizing hybrid tasks with stronger task dependence will suffer from volatile issues where the core regression task might have a larger influence on other tasks. In this paper, we study the MTL problem with hybrid targets for the first time and propose the model named Hybrid Targets Learning Network (HTLNet) to explore task dependence and enhance optimization. Specifically, we introduce label embedding for each task to explicitly transfer the label information among these tasks, which can effectively explore logical task dependence. We further design a gradient adjustment regime between the final regression task and other classification tasks to enhance the optimization. Extensive experiments on two offline public datasets and one real-world industrial dataset are conducted to validate the effectiveness of HTLNet. Moreover, online A/B tests on the financial recommender system also show that our model delivers superior results.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.12559

Confidence Self-Calibration for Multi-Label Class-Incremental Learning

Authors: Kaile Du, Yifan Zhou, Fan Lyu, Yuyang Li, Chen Lu, Guangcan Liu

Abstract: The partial label challenge in Multi-Label Class-Incremental Learning (MLCIL) arises when only the new classes are labeled during training, while past and future labels remain unavailable. This issue leads to a proliferation of false-positive errors due to erroneously high confidence multi-label predictions, exacerbating catastrophic forgetting within the disjoint label space. In this paper, we aim to refine multi-label confidence calibration in MLCIL and propose a Confidence Self-Calibration (CSC) approach. First, for label relationship calibration, we introduce a class-incremental graph convolutional network that bridges the isolated label spaces by constructing a learnable, dynamically extended label relationship graph. Then, for confidence calibration, we present a max-entropy regularization for each multi-label increment, facilitating confidence self-calibration through the penalization of over-confident output distributions. Our approach attains new state-of-the-art results in MLCIL tasks on both MS-COCO and PASCAL VOC datasets, with the calibration of label confidences confirmed through our methodology.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.18609

ICE-SEARCH: A Language Model-Driven Feature Selection Approach

Authors: Tianze Yang, Tianyi Yang, Fuyuan Lyu, Shaoshan Liu, Xue Liu

Abstract: This study unveils the In-Context Evolutionary Search (ICE-SEARCH) method, which is among the first works that meld large language models (LLMs) with evolutionary algorithms for feature selection (FS) tasks, and demonstrates its effectiveness in Medical Predictive Analytics (MPA) applications. ICE-SEARCH harnesses the crossover and mutation capabilities inherent in LLMs within an evolutionary framework, significantly improving FS through the model's comprehensive world knowledge and its adaptability to a variety of roles. Our evaluation of this methodology spans three crucial MPA tasks: stroke, cardiovascular disease, and diabetes, where ICE-SEARCH outperforms traditional FS methods in pinpointing essential features for medical applications. ICE-SEARCH achieves State-of-the-Art (SOTA) performance in stroke prediction and diabetes prediction; the Decision-Randomized ICE-SEARCH ranks as SOTA in cardiovascular disease prediction. The study emphasizes the critical role of incorporating domain-specific insights, illustrating ICE-SEARCH's robustness, generalizability, and convergence. This opens avenues for further research into comprehensive and intricate FS landscapes, marking a significant stride in the application of artificial intelligence in medical predictive analytics.

Submitted 8 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.08182

Variational Continual Test-Time Adaptation

Authors: Fan Lyu, Kaile Du, Yuyang Li, Hanyu Zhao, Zhang Zhang, Guangcan Liu, Liang Wang

Abstract: The prior drift is crucial in Continual Test-Time Adaptation (CTTA) methods that only use unlabeled test data, as it can cause significant error propagation. In this paper, we introduce VCoTTA, a variational Bayesian approach to measure uncertainties in CTTA. At the source stage, we transform a pre-trained deterministic model into a Bayesian Neural Network (BNN) via a variational warm-up strategy, injecting uncertainties into the model. At test time, we employ a mean-teacher update strategy using variational inference for the student model and exponential moving average for the teacher model. Our novel approach updates the student model by combining priors from both the source and teacher models. The evidence lower bound is formulated as the cross-entropy between the student and teacher models, along with the Kullback-Leibler (KL) divergence of the prior mixture. Experimental results on three datasets demonstrate the method's effectiveness in mitigating prior drift within the CTTA framework.

Submitted 12 February, 2024; originally announced February 2024.
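
The exponential-moving-average teacher update mentioned in the abstract above follows a widely used pattern that can be sketched in a few lines. This is a generic mean-teacher illustration, not the VCoTTA code: weights are plain lists of floats, and the function name `ema_update` and the `decay` values are assumptions chosen for the example.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

# Hypothetical two-parameter models; the student is held fixed for clarity.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)

print(teacher)
```

A decay close to 1 makes the teacher change slowly, which is the usual motivation for pairing a fast-moving student with a smoothed teacher.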

arXiv:2401.01054

Elastic Multi-Gradient Descent for Parallel Continual Learning

Authors: Fan Lyu, Wei Feng, Yuepan Li, Qing Sun, Fanhua Shang, Liang Wan, Liang Wang

Abstract: The goal of Continual Learning (CL) is to continuously learn from new data streams and accomplish the corresponding tasks. Previously studied CL assumes that data for different tasks are given in sequence, nose-to-tail, and thus belongs to Serial Continual Learning (SCL). This paper studies the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios, where a diverse set of tasks is encountered at different time points. PCL presents challenges due to the training of an unspecified number of tasks with varying learning progress, leading to the difficulty of guaranteeing effective model updates for all encountered tasks. In our previous conference work, we focused on measuring and reducing the discrepancy among gradients in a multi-objective optimization problem, which, however, may still contain negative transfers in every model update. To address this issue, in the dynamic multi-objective optimization problem, we introduce task-specific elastic factors to adjust the descent direction towards the Pareto front. The proposed method, called Elastic Multi-Gradient Descent (EMGD), ensures that each update follows an appropriate Pareto descent direction, minimizing any negative impact on previously learned tasks. To balance the training between old and new tasks, we also propose a memory editing mechanism guided by the gradient computed using EMGD. This editing process updates the stored data points, reducing interference in the Pareto descent direction from previous tasks. Experiments on public datasets validate the effectiveness of our EMGD in the PCL setting.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Submitted to IEEE TPAMI

arXiv:2311.03526

Towards Automated Negative Sampling in Implicit Recommendation

Authors: Fuyuan Lyu, Yaochen Hu, Xing Tang, Yingxue Zhang, Ruiming Tang, Xue Liu

Abstract: Negative sampling methods are vital in implicit recommendation models as they allow us to obtain negative instances from massive unlabeled data. Most existing approaches focus on sampling hard negative samples in various ways. These studies are orthogonal to the recommendation model and implicit datasets. However, such an idea contradicts the common belief in AutoML that the model and dataset should be matched. Empirical experiments suggest that the best-performing negative sampler depends on the implicit dataset and the specific recommendation model. Hence, we propose a hypothesis that the negative sampler should align with the capacity of the recommendation models as well as the statistics of the datasets to achieve optimal performance. A mismatch between these three would result in sub-optimal outcomes. An intuitive idea to address the mismatch problem is to exhaustively select the best-performing negative sampler given the model and dataset. However, such an approach is computationally expensive and time-consuming, leaving the problem unsolved. In this work, we propose the AutoSample framework that adaptively selects the best-performing negative sampler among candidates. Specifically, we propose a loss-to-instance approximation to transform the negative sampler search task into the learning task over a weighted sum, enabling end-to-end training of the model. We also design an adaptive search algorithm to extensively and efficiently explore the search space. A specific initialization approach is also used to better utilize the model parameters obtained during the search stage, which is similar to curriculum learning and leads to better performance and lower computation resource consumption. We evaluate the proposed framework on four benchmarks over three models. Extensive experiments demonstrate the effectiveness and efficiency of our proposed framework.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.20490

Long-Tailed Learning as Multi-Objective Optimization

Authors: Weiqi Li, Fan Lyu, Fanhua Shang, Liang Wan, Wei Feng

Abstract: Real-world data is extremely imbalanced and presents a long-tailed distribution, resulting in models that are biased towards classes with sufficient samples and perform poorly on rare classes. Recent methods propose to rebalance classes, but they face the seesaw dilemma (increasing performance on tail classes may decrease that of head classes, and vice versa). In this paper, we argue that the seesaw dilemma derives from the gradient imbalance of different classes, in which gradients of inappropriate classes are treated as important for updating and are thus prone to overcompensation or undercompensation on tail classes. To achieve ideal compensation, we formulate long-tailed recognition as a multi-objective optimization problem, which fairly respects the contributions of head and tail classes simultaneously. For efficiency, we propose a Gradient-Balancing Grouping (GBG) strategy to gather the classes with similar gradient directions, so that every update approximately follows a Pareto descent direction. Our GBG method drives classes with similar gradient directions to form a more representative gradient and provide ideal compensation to the tail classes. Moreover, we conduct extensive experiments on commonly used benchmarks in long-tailed learning and demonstrate the superiority of our method over existing SOTA methods.

Submitted 1 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: In submission

arXiv:2310.20268

Constructing Sample-to-Class Graph for Few-Shot Class-Incremental Learning

Authors: Fuyuan Hu, Jian Zhang, Fan Lyu, Linyan Li, Fenglei Xu

Abstract: Few-shot class-incremental learning (FSCIL) aims to build machine learning models that can continually learn new concepts from a few data samples, without forgetting knowledge of old classes. The challenge of FSCIL lies in the limited data of new classes, which not only leads to significant overfitting issues but also exacerbates the notorious catastrophic forgetting problem. As shown in early studies, building sample relationships is beneficial for learning from few-shot samples. In this paper, we extend the idea to the incremental scenario and propose a Sample-to-Class (S2C) graph learning method for FSCIL. Specifically, we propose a Sample-level Graph Network (SGN) that focuses on analyzing sample relationships within a single session. This network helps aggregate similar samples, ultimately leading to the extraction of more refined class-level features. Then, we present a Class-level Graph Network (CGN) that establishes connections across class-level features of both new and old classes. This network plays a crucial role in linking the knowledge between different sessions and helps improve overall learning in the FSCIL scenario. Moreover, we design a multi-stage strategy for training the S2C model, which mitigates the training challenges posed by limited data in the incremental process. The multi-stage training strategy is designed to build the S2C graph from base to few-shot stages, and improve the capacity via an extra pseudo-incremental stage. Experiments on three popular benchmark datasets show that our method clearly outperforms the baselines and sets new state-of-the-art results in FSCIL.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.19113

Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision

Authors: Jiayao Tan, Fan Lyu, Linyan Li, Fuyuan Hu, Tingliang Feng, Fenglei Xu, Rui Yao

Abstract: Vehicle-to-everything (V2X) perception is an innovative technology that enhances vehicle perception accuracy, thereby elevating the security and reliability of autonomous systems. However, existing V2X perception methods focus on static scenes from mainly vehicle-based vision, which is constrained by sensor capabilities and communication loads. To adapt V2X perception models to dynamic scenes, we propose to build V2X perception from road-to-vehicle vision and present the Adaptive Road-to-Vehicle Perception (AR2VP) method. In AR2VP, we leverage roadside units to offer stable, wide-range sensing capabilities and serve as communication hubs. AR2VP is devised to tackle both intra-scene and inter-scene changes. For the former, we construct a dynamic perception representing module, which efficiently integrates vehicle perceptions, enabling vehicles to capture a more comprehensive range of dynamic factors within the scene. Moreover, we introduce a road-to-vehicle perception compensating module, aimed at preserving the maximized roadside unit perception information in the presence of intra-scene changes. For inter-scene changes, we implement an experience replay mechanism leveraging the roadside unit's storage capacity to retain a subset of historical scene data, maintaining model robustness in response to inter-scene shifts. We conduct perception experiments on 3D object detection and segmentation, and the results show that AR2VP excels in both performance-bandwidth trade-offs and adaptability within dynamic environments.

Submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.15342

Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network

Authors: Fuyuan Lyu, Xing Tang, Dugang Liu, Chen Ma, Weihong Luo, Liang Chen, Xiuqiang He, Xue Liu

Abstract: Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.

Submitted 30 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 poster

arXiv:2306.13382 [pdf, other]

OptMSM: Optimizing Multi-Scenario Modeling for Click-Through Rate Prediction

Authors: Xing Tang, Yang Qiao, Yuwen Fu, Fuyuan Lyu, Dugang Liu, Xiuqiang He

Abstract: A large-scale industrial recommendation platform typically consists of multiple associated scenarios, requiring a unified click-through rate (CTR) prediction model to serve them simultaneously. Existing approaches for multi-scenario CTR prediction generally consist of two main modules: i) a scenario-aware learning module that learns a set of multi-functional representations with scenario-shared and scenario-specific information from input features, and ii) a scenario-specific prediction module that serves each scenario based on these representations. However, most of these approaches primarily focus on improving the former module and neglect the latter module. This can result in challenges such as increased model parameter size, training difficulty, and performance bottlenecks for each scenario. To address these issues, we propose a novel framework called OptMSM (Optimizing Multi-Scenario Modeling). First, we introduce a simplified yet effective scenario-enhanced learning module to alleviate the aforementioned challenges. Specifically, we partition the input features into scenario-specific and scenario-shared features, which are mapped to specific information embedding encodings and a set of shared information embeddings, respectively. By imposing an orthogonality constraint on the shared information embeddings to facilitate the disentanglement of shared information corresponding to each scenario, we combine them with the specific information embeddings to obtain multi-functional representations. Second, we introduce a scenario-specific hypernetwork in the scenario-specific prediction module to capture interactions within each scenario more effectively, thereby alleviating the performance bottlenecks. Finally, we conduct extensive offline experiments and an online A/B test to demonstrate the effectiveness of OptMSM.

Submitted 23 June, 2023; originally announced June 2023.

Comments: Accepted by ECML-PKDD 2023 Applied Data Science Track

arXiv:2306.00315 [pdf, other]

Explicit Feature Interaction-aware Uplift Network for Online Marketing

Authors: Dugang Liu, Xing Tang, Han Gao, Fuyuan Lyu, Xiuqiang He

Abstract: As a key component in online marketing, uplift modeling aims to accurately capture the degree to which different treatments, such as coupons or discounts, motivate different users, also known as the estimation of individual treatment effect (ITE). In an actual business scenario, the options for treatment may be numerous and complex, and there may be correlations between different treatments. In addition, each marketing instance may also have rich user and contextual features. However, existing methods still fall short in both fully exploiting treatment information and mining features that are sensitive to a particular treatment. In this paper, we propose an explicit feature interaction-aware uplift network (EFIN) to address these two problems. Our EFIN includes four customized modules: 1) a feature encoding module encodes not only the user and contextual features, but also the treatment features; 2) a self-interaction module aims to accurately model the user's natural response with all but the treatment features; 3) a treatment-aware interaction module accurately models the degree to which a particular treatment motivates a user through interactions between the treatment features and other features, i.e., ITE; and 4) an intervention constraint module is used to balance the ITE distribution of users between the control and treatment groups so that the model would still achieve an accurate uplift ranking on data collected from a non-random intervention marketing scenario. We conduct extensive experiments on two public datasets and one product dataset to verify the effectiveness of our EFIN. In addition, our EFIN has been deployed in a credit card bill payment scenario of a large online financial platform with a significant improvement.

Submitted 31 May, 2023; originally announced June 2023.

Comments: Accepted by SIGKDD 2023 Applied Data Science Track

arXiv:2303.13862 [pdf, other]

Two-level Graph Network for Few-Shot Class-Incremental Learning

Authors: Hao Chen, Linyan Li, Fan Lyu, Fuyuan Hu, Fenglei Xu

Abstract: Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points, without forgetting knowledge of old classes. The difficulty lies in that limited data from new classes not only leads to significant overfitting issues but also exacerbates the notorious catastrophic forgetting problems. However, existing FSCIL methods ignore the semantic relationships between sample-level and class-level. In this paper, we designed a two-level graph network for FSCIL named Sample-level and Class-level Graph Neural Network (SCGN). Specifically, a pseudo incremental learning paradigm is designed in SCGN, which synthesizes virtual few-shot tasks as new tasks to optimize SCGN model parameters in advance. Sample-level graph network uses the relationship of a few samples to aggregate similar samples and obtains refined class-level features. Class-level graph network aims to mitigate the semantic conflict between prototype features of new classes and old classes. SCGN builds two-level graph networks to guarantee the latent semantic of each few-shot class can be effectively represented in FSCIL. Experiments on three popular benchmark datasets show that our method significantly outperforms the baselines and sets new state-of-the-art results with remarkable advantages.

Submitted 24 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.06953 by other authors

arXiv:2303.02954 [pdf, other]

Centroid Distance Distillation for Effective Rehearsal in Continual Learning

Authors: Daofeng Liu, Fan Lyu, Linyan Li, Fuyuan Hu

Abstract: Rehearsal, retraining on a stored small data subset of old tasks, has been proven effective in solving catastrophic forgetting in continual learning. However, because the sampled data may have a large bias towards the original dataset, retraining on them is susceptible to driving continual domain drift of old tasks in feature space, resulting in forgetting. In this paper, we focus on tackling the continual domain drift problem with centroid distance distillation. First, we propose a centroid caching mechanism for sampling data points based on constructed centroids to reduce the sample bias in rehearsal. Then, we present a centroid distance distillation that only stores the centroid distance to reduce the continual domain drift. The experiments on four continual learning datasets show the superiority of the proposed method, and the continual domain drift can be reduced.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.02241 [pdf, other]

Feature Representation Learning for Click-through Rate Prediction: A Review and New Perspectives

Authors: Fuyuan Lyu, Xing Tang, Dugang Liu, Haolun Wu, Chen Ma, Xiuqiang He, Xue Liu

Abstract: Representation learning has been a critical topic in machine learning. In Click-through Rate Prediction, most features are represented as embedding vectors and learned simultaneously with other parameters in the model. With the development of CTR models, feature representation learning has become a trending topic and has been extensively studied by both industrial and academic researchers in recent years. This survey aims to summarize feature representation learning in a broader picture and pave the way for future research. To achieve such a goal, we first present a taxonomy of current research methods on feature representation learning following two main issues: (i) which feature to represent and (ii) how to represent these features. Then we give a detailed description of each method regarding these two issues. Finally, the review concludes with a discussion on the future directions of this field.

Submitted 4 February, 2023; originally announced February 2023.

Comments: Submitted to IJCAI 2023 Survey Track

arXiv:2301.10909 [pdf, other]

Optimizing Feature Set for Click-Through Rate Prediction

Authors: Fuyuan Lyu, Xing Tang, Dugang Liu, Liang Chen, Xiuqiang He, Xue Liu

Abstract: Click-through rate (CTR) prediction models transform features into latent vectors and enumerate possible feature interactions to improve performance based on the input feature set. Therefore, when selecting an optimal feature set, we should consider the influence of both feature and its interaction. However, most previous works focus on either feature field selection or only select feature interaction based on the fixed feature set to produce the feature set. The former restricts search space to the feature field, which is too coarse to determine subtle features. They also do not filter useless feature interactions, leading to higher computation costs and degraded model performance. The latter identifies useful feature interaction from all available features, resulting in many redundant features in the feature set. In this paper, we propose a novel method named OptFS to address these problems. To unify the selection of feature and its interaction, we decompose the selection of each feature interaction into the selection of two correlated features. Such a decomposition makes the model end-to-end trainable given various feature interaction operations. By adopting feature-level search space, we set a learnable gate to determine whether each feature should be within the feature set. Because of the large-scale search space, we develop a learning-by-continuation training scheme to learn such gates. Hence, OptFS generates the feature set only containing features which improve the final prediction results. Experimentally, we evaluate OptFS on three public datasets, demonstrating OptFS can optimize feature sets which enhance the model performance and further reduce both the storage and computational cost.

Submitted 26 March, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Accepted by WWW 2023 Research Tracks

arXiv:2212.14464 [pdf, other]

Result Diversification in Search and Recommendation: A Survey

Authors: Haolun Wu, Yansen Zhang, Chen Ma, Fuyuan Lyu, Bowei He, Bhaskar Mitra, Xue Liu

Abstract: Diversifying return results is an important research topic in retrieval systems in order to satisfy both the various interests of customers and the equal market exposure of providers. There has been growing attention on diversity-aware research during recent years, accompanied by a proliferation of literature on methods to promote diversity in search and recommendation. However, diversity-aware studies in retrieval systems lack a systematic organization and are rather fragmented. In this survey, we are the first to propose a unified taxonomy for classifying the metrics and approaches of diversification in both search and recommendation, which are two of the most extensively researched fields of retrieval systems. We begin the survey with a brief discussion of why diversity is important in retrieval systems, followed by a summary of the various diversity concerns in search and recommendation, highlighting their relationship and differences. For the survey's main body, we present a unified taxonomy of diversification metrics and approaches in retrieval systems, from both the search and recommendation perspectives. In the later part of the survey, we discuss the open research questions of diversity-aware research in search and recommendation in an effort to inspire future innovations and encourage the implementation of diversity in real-world systems.

Submitted 18 February, 2024; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: 20 pages

arXiv:2211.14763 [pdf, other]

Multi-Label Continual Learning using Augmented Graph Convolutional Network

Authors: Kaile Du, Fan Lyu, Linyan Li, Fuyuan Hu, Wei Feng, Fenglei Xu, Xuefeng Xi, Han**g Cheng

Abstract: Multi-Label Continual Learning (MLCL) builds a class-incremental framework in a sequential multi-label image recognition data stream. The critical challenges of MLCL are the construction of label relationships on past-missing and future-missing partial labels of training data and the catastrophic forgetting on old classes, resulting in poor generalization. To solve the problems, the study proposes an Augmented Graph Convolutional Network (AGCN++) that can construct the cross-task label relationships in MLCL and sustain catastrophic forgetting. First, we build an Augmented Correlation Matrix (ACM) across all seen classes, where the intra-task relationships derive from the hard label statistics. In contrast, the inter-task relationships leverage hard and soft labels from data and a constructed expert network. Then, we propose a novel partial label encoder (PLE) for MLCL, which can extract dynamic class representation for each partial label image as graph nodes and help generate soft labels to create a more convincing ACM and suppress forgetting. Last, to suppress the forgetting of label dependencies across old tasks, we propose a relationship-preserving constrainter to construct label relationships. The inter-class topology can be augmented automatically, which also yields effective class representations. The proposed method is evaluated using two multi-label image benchmarks. The experimental results show that the proposed way is effective for MLCL image recognition and can build convincing correlations across tasks even if the labels of previous tasks are missing.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2210.10581 [pdf, other]

CEntRE: A paragraph-level Chinese dataset for Relation Extraction among Enterprises

Authors: Peipei Liu, Hong Li, Zhiyu Wang, Yimo Ren, Jie Liu, Fei Lyu, Hongsong Zhu, Limin Sun

Abstract: Enterprise relation extraction aims to detect pairs of enterprise entities and identify the business relations between them from unstructured or semi-structured text data, and it is crucial for several real-world applications such as risk analysis, rating research and supply chain security. However, previous work mainly focuses on getting attribute information about enterprises like personnel and corporate business, and pays little attention to enterprise relation extraction. To encourage further progress in the research, we introduce CEntRE, a new dataset constructed from publicly available business news data with careful human annotation and intelligent data processing. Extensive experiments on CEntRE with six excellent models demonstrate the challenges of our proposed dataset.

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2209.12241 [pdf, other]

Exploring Example Influence in Continual Learning

Authors: Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan

Abstract: Continual Learning (CL) sequentially learns new tasks like human beings, with the goal to achieve better Stability (S, remembering past tasks) and Plasticity (P, adapting to new tasks). Due to the fact that past training data is not available, it is valuable to explore the influence difference on S and P among training examples, which may improve the learning pattern towards better SP. Inspired by Influence Function (IF), we first study example influence via adding perturbation to example weight and computing the influence derivation. To avoid the storage and calculation burden of Hessian inverse in neural networks, we propose a simple yet effective MetaSP algorithm to simulate the two key steps in the computation of IF and obtain the S- and P-aware example influence. Moreover, we propose to fuse two kinds of example influence by solving a dual-objective optimization problem, and obtain a fused influence towards SP Pareto optimality. The fused influence can be used to control the update of model and optimize the storage of rehearsal. Empirical results show that our algorithm significantly outperforms state-of-the-art methods on both task- and class-incremental benchmark CL datasets.

Submitted 25 September, 2022; originally announced September 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2208.04482 [pdf, other]

doi 10.1145/3511808.3557411

OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction

Authors: Fuyuan Lyu, Xing Tang, Hong Zhu, Huifeng Guo, Yingxue Zhang, Ruiming Tang, Xue Liu

Abstract: Learning embedding table plays a fundamental role in Click-through rate (CTR) prediction from the view of the model performance and memory usage. The embedding table is a two-dimensional tensor, with its axes indicating the number of feature values and the embedding dimension, respectively. To learn an efficient and effective embedding table, recent works either assign various embedding dimensions for feature fields and reduce the number of embeddings respectively or mask the embedding table parameters. However, all these existing works cannot get an optimal embedding table. On the one hand, various embedding dimensions still require a large amount of memory due to the vast number of features in the dataset. On the other hand, decreasing the number of embeddings usually suffers from performance degradation, which is intolerable in CTR prediction. Finally, pruning embedding parameters will lead to a sparse embedding table, which is hard to deploy. To this end, we propose an optimal embedding table learning framework OptEmbed, which provides a practical and general method to find an optimal embedding table for various base CTR models. Specifically, we propose pruning the redundant embeddings regarding corresponding features' importance by learnable pruning thresholds. Furthermore, we consider assigning various embedding dimensions as one single candidate architecture. To efficiently search the optimal embedding dimensions, we design a uniform embedding dimension sampling scheme to equally train all candidate architectures, meaning architecture-related parameters and learnable thresholds are trained simultaneously in one supernet. We then propose an evolution search method based on the supernet to find the optimal embedding dimensions for each field. Experiments on public datasets show that OptEmbed can learn a compact embedding table which can further improve the model performance.

Submitted 6 September, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: Accepted by CIKM 2022 Research Track

arXiv:2207.07840 [pdf, other]

Class-Incremental Lifelong Learning in Multi-Label Classification

Authors: Kaile Du, Linyan Li, Fan Lyu, Fuyuan Hu, Fenglei Xu

Abstract: Existing class-incremental lifelong learning studies consider only single-label data, which limits their adaptation to multi-label data. This paper studies Lifelong Multi-Label (LML) classification, which builds an online class-incremental classifier in a sequential multi-label classification data stream. Training on the data with Partial Labels in LML classification may result in more serious Catastrophic Forgetting in old classes. To solve the problem, the study proposes an Augmented Graph Convolutional Network (AGCN) with a built Augmented Correlation Matrix (ACM) across sequential partial-label tasks. The results of two benchmarks show that the method is effective for LML classification and reducing forgetting.

Submitted 16 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2203.05534

arXiv:2203.10480 [pdf, other]

Encoder-Decoder Architecture for Supervised Dynamic Graph Learning: A Survey

Authors: Yuecai Zhu, Fuyuan Lyu, Chengming Hu, Xi Chen, Xue Liu

Abstract: In recent years, the prevalent online services generate a sheer volume of user activity data. Service providers collect these data in order to perform client behavior analysis, and offer better and more customized services. The majority of these data can be modeled and stored as graphs, such as the social graph in Facebook and the user-video interaction graph in YouTube. These graphs need to evolve over time to capture the dynamics in the real world, leading to the invention of dynamic graphs. However, the temporal information embedded in the dynamic graphs brings new challenges in analyzing and deploying them. Event staleness, temporal information learning and explicit time dimension usage are some example challenges in dynamic graph learning. In order to offer a convenient reference to both the industry and academia, this survey presents the Three Stages Recurrent Temporal Learning Framework based on dynamic graph evolution theories, so as to interpret the learning of temporal information with a generalized framework. Under this framework, this survey categorizes and reviews different learnable encoder-decoder architectures for supervised dynamic graph learning. We believe that this survey could supply useful guidelines to researchers and engineers in finding suitable graph structures for their dynamic learning tasks.

Submitted 27 March, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: Optimize title for better visibility

arXiv:2203.05534 [pdf, other]

AGCN: Augmented Graph Convolutional Network for Lifelong Multi-label Image Recognition

Authors: Kaile Du, Fan Lyu, Fuyuan Hu, Linyan Li, Wei Feng, Fenglei Xu, Qiming Fu

Abstract: The Lifelong Multi-Label (LML) image recognition builds an online class-incremental classifier in a sequential multi-label image recognition data stream. The key challenges of LML image recognition are the construction of label relationships on Partial Labels of training data and the Catastrophic Forgetting on old classes, resulting in poor generalization. To solve the problems, the study proposes an Augmented Graph Convolutional Network (AGCN) model that can construct the label relationships across the sequential recognition tasks and sustain the catastrophic forgetting. First, we build an Augmented Correlation Matrix (ACM) across all seen classes, where the intra-task relationships derive from the hard label statistics while the inter-task relationships leverage both hard and soft labels from data and a constructed expert network. Then, based on the ACM, the proposed AGCN captures label dependencies with dynamic augmented structure and yields effective class representations. Last, to suppress the forgetting of label dependencies across old tasks, we propose a relationship-preserving loss as a constraint to the construction of label relationships. The proposed method is evaluated using two multi-label image benchmarks and the experimental results show that the proposed method is effective for LML image recognition and can build convincing correlation across tasks even if the labels of previous tasks are missing. Our code is available at https://github.com/Kaile-Du/AGCN.

Submitted 10 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted in ICME 2022

arXiv:2110.11159 [pdf, other]

Each Attribute Matters: Contrastive Attention for Sentence-based Image Editing

Authors: Liuqing Zhao, Fan Lyu, Fuyuan Hu, Kaizhu Huang, Fenglei Xu, Linyan Li

Abstract: Sentence-based Image Editing (SIE) aims to deploy natural language to edit an image. Offering potentials to reduce expensive manual editing, SIE has attracted much interest recently. However, existing methods can hardly produce accurate editing and even lead to failures in attribute editing when the query sentence contains multiple editable attributes. To cope with this problem, by focusing on enhancing the difference between attributes, this paper proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN), which is inspired by contrastive training. Specifically, we first design a novel contrastive attention module to enlarge the editing difference between random combinations of attributes which are formed during training. We then construct an attribute discriminator to ensure effective editing on each attribute. A series of experiments show that our method can generate very encouraging results in sentence-based image editing with multiple attributes on the CUB and COCO datasets. Our code is available at https://github.com/Zlq2021/CA-GAN

Submitted 21 October, 2021; originally announced October 2021.

Comments: Accepted by BMVC 2021

arXiv:2108.01265 [pdf, other]

Memorize, Factorize, or be Naïve: Learning Optimal Feature Interaction Methods for CTR Prediction

Authors: Fuyuan Lyu, Xing Tang, Huifeng Guo, Ruiming Tang, Xiuqiang He, Rui Zhang, Xue Liu

Abstract: Click-through rate prediction is one of the core tasks in commercial recommender systems. It aims to predict the probability of a user clicking a particular item given user and item features. As feature interactions bring in non-linearity, they are widely adopted to improve the performance of CTR prediction models. Therefore, effectively modelling feature interactions has attracted much attention in both the research and industry field. The current approaches can generally be categorized into three classes: (1) naïve methods, which do not model feature interactions and only use original features; (2) memorized methods, which memorize feature interactions by explicitly viewing them as new features and assigning trainable embeddings; (3) factorized methods, which learn latent vectors for original features and implicitly model feature interactions through factorization functions. Studies have shown that modelling feature interactions by one of these methods alone is suboptimal due to the unique characteristics of different feature interactions. To address this issue, we first propose a general framework called OptInter which finds the most suitable modelling method for each feature interaction. Different state-of-the-art deep CTR models can be viewed as instances of OptInter. To realize the functionality of OptInter, we also introduce a learning algorithm that automatically searches for the optimal modelling method. We conduct extensive experiments on four large datasets. Our experiments show that OptInter improves the best-performing state-of-the-art baseline deep CTR models by up to 2.21%. Compared to the memorized method, which also outperforms baselines, we reduce the number of parameters by up to 91%. In addition, we conduct several ablation studies to investigate the influence of different components of OptInter. Finally, we provide interpretable discussions on the results of OptInter.

Submitted 24 November, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Published in ICDE 2022

arXiv:2106.08605 [pdf, other]

doi 10.1109/TMM.2021.3089017

Disentangling Semantic-to-visual Confusion for Zero-shot Learning

Authors: Zihan Ye, Fuyuan Hu, Fan Lyu, Linyan Li, Kaizhu Huang

Abstract: Using generative models to synthesize visual features from semantic distribution is one of the most popular solutions to ZSL image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching discriminative representations. However, the traditional TL cannot search reliable unseen disentangled representations due to the unavailability of unseen classes in ZSL. To alleviate this drawback, we propose in this work a multi-modal triplet loss (MMTL) which utilizes multimodal information to search a disentangled representation space. As such, all classes can interplay, which benefits learning disentangled class representations in the searched space. Furthermore, we develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN) focusing on exploiting the disentangled representations in training, feature synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN could fit a more realistic distribution over both seen and unseen features. Extensive experiments show that our proposed model outperforms the state of the art on four benchmark datasets. Our code is available at https://github.com/FouriYe/DCRGAN-TMM.

Submitted 16 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE TRANSACTIONS ON MULTIMEDIA (TMM) in 2021

arXiv:2102.06528 [pdf, other]

A Tale of Two Countries: A Longitudinal Cross-Country Study of Mobile Users' Reactions to the COVID-19 Pandemic Through the Lens of App Popularity

Authors: Liu Wang, Haoyu Wang, Yi Wang, Gareth Tyson, Fei Lyu

Abstract: The ongoing COVID-19 pandemic has profoundly impacted people's lives around the world, including how they interact with mobile technologies. In this paper, we seek to develop an understanding of how the dynamic trajectory of a pandemic shapes mobile phone users' experiences. Through the lens of app popularity, we approach this goal from a cross-country perspective. We compile a dataset consisting of six-month daily snapshots of the most popular apps in the iOS App Store in China and the US, where the pandemic has exhibited distinct trajectories. Using this longitudinal dataset, our analysis provides detailed patterns of app ranking during the pandemic at both category and individual app levels. We reveal that app categories' rankings are correlated with the pandemic, contingent upon country-specific development trajectories. Our work offers rich insights into how COVID-19, a typical global public health crisis, has influenced people's day-to-day interaction with the Internet and mobile technologies.

Submitted 30 March, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

arXiv:2012.13662 [pdf, other]

Coarse to Fine: Multi-label Image Classification with Global/Local Attention

Authors: Fan Lyu, Fuyuan Hu, Victor S. Sheng, Zhengtian Wu, Qiming Fu, Baochuan Fu

Abstract: In daily life, the scenes around us usually carry multiple labels, especially in a smart city, where information about city operation must be recognized in order to respond and control. Great efforts have been made to recognize multi-label images using Deep Neural Networks. Since multi-label image classification is very complicated, people seek to use the attention mechanism to guide the classification process. However, conventional attention-based methods analyze images directly and aggressively, which makes it difficult for them to understand complicated scenes. In this paper, we propose a global/local attention method that can recognize an image from coarse to fine by mimicking how human beings observe images. Specifically, our global/local attention method first concentrates on the whole image, and then focuses on local specific objects in the image. We also propose a joint max-margin objective function, which enforces that the minimum score of positive labels should be larger than the maximum score of negative labels horizontally and vertically. This function can further improve our multi-label image classification method. We evaluate the effectiveness of our method on two popular multi-label image datasets (i.e., Pascal VOC and MS-COCO). Our experimental results show that our method outperforms state-of-the-art methods.

Submitted 25 December, 2020; originally announced December 2020.

Comments: Accepted by IEEE International Smart Cities Conference 2018

arXiv:2012.07236 [pdf, other]

Multi-Domain Multi-Task Rehearsal for Lifelong Learning

Authors: Fan Lyu, Shuai Wang, Wei Feng, Zihan Ye, Fuyuan Hu, Song Wang

Abstract: Rehearsal, seeking to remind the model by storing old knowledge in lifelong learning, is one of the most effective ways to mitigate catastrophic forgetting, i.e., biased forgetting of previous knowledge when moving to new tasks. However, the old tasks in most previous rehearsal-based methods suffer from unpredictable domain shift when training the new task. This is because these methods always ignore two significant factors. First, the Data Imbalance between the new task and old tasks makes the domain of old tasks prone to shift. Second, the Task Isolation among all tasks makes the domain shift toward unpredictable directions. To address the unpredictable domain shift, in this paper, we propose Multi-Domain Multi-Task (MDMT) rehearsal to train the old tasks and new task in parallel and equally to break the isolation among tasks. Specifically, a two-level angular margin loss is proposed to encourage the intra-class/task compactness and inter-class/task discrepancy, which keeps the model from domain chaos. In addition, to further address domain shift of the old tasks, we propose an optional episodic distillation loss on the memory to anchor the knowledge for each old task. Experiments on benchmark datasets validate that the proposed approach can effectively mitigate the unpredictable domain shift.

Submitted 13 December, 2020; originally announced December 2020.

Comments: Accepted by AAAI 2021

arXiv:2010.01471 [pdf, ps, other]

Deep Reinforcement Learning for Delay-Oriented IoT Task Scheduling in Space-Air-Ground Integrated Network

Authors: Conghao Zhou, Wen Wu, Hongli He, Peng Yang, Feng Lyu, Nan Cheng, Xuemin Shen

Abstract: In this paper, we investigate a computing task scheduling problem in space-air-ground integrated network (SAGIN) for delay-oriented Internet of Things (IoT) services. In the considered scenario, an unmanned aerial vehicle (UAV) collects computing tasks from IoT devices and then makes online offloading decisions, in which the tasks can be processed at the UAV or offloaded to the nearby base station or the remote satellite. Our objective is to design a task scheduling policy that minimizes offloading and computing delay of all tasks given the UAV energy capacity constraint. To this end, we first formulate the online scheduling problem as an energy-constrained Markov decision process (MDP). Then, considering the task arrival dynamics, we develop a novel deep risk-sensitive reinforcement learning algorithm. Specifically, the algorithm evaluates the risk, which measures the energy consumption that exceeds the constraint, for each state and searches the optimal parameter weighing the minimization of delay and risk while learning the optimal policy. Extensive simulation results demonstrate that the proposed algorithm can reduce the task processing delay by up to 30% compared to probabilistic configuration methods while satisfying the UAV energy capacity constraint.

Submitted 3 October, 2020; originally announced October 2020.

Comments: 14 pages, 8 figures

arXiv:2007.00857 [pdf, other]

doi 10.1109/TVT.2020.3000757

Efficient Hybrid Beamforming with Anti-Blockage Design for High-Speed Railway Communications

Authors: Meilin Gao, Bo Ai, Yong Niu, Wen Wu, Peng Yang, Feng Lyu, Xuemin Shen

Abstract: Future railway is expected to accommodate both train operation services and passenger broadband services. The millimeter wave (mmWave) communication is a promising technology in providing multi-gigabit data rates to onboard users. However, mmWave communications suffer from severe propagation attenuation and vulnerability to blockage, which can be very challenging in high-speed railway (HSR) scenarios. In this paper, we investigate efficient hybrid beamforming (HBF) design for train-to-ground communications. First, we develop a two-stage HBF algorithm in blockage-free scenarios. In the first stage, the minimum mean square error method is adopted for optimal hybrid beamformer design with low complexity and fast convergence; in the second stage, the orthogonal matching pursuit method is utilized to approximately recover the analog and digital beamformers. Second, in blocked scenarios, we design an anti-blockage scheme by adaptively invoking the proposed HBF algorithm, which can efficiently deal with random blockages. Extensive simulation results are presented to show the sum rate performance of the proposed algorithms under various configurations, including transmission power, velocity of the train, blockage probability, etc. It is demonstrated that the proposed anti-blockage algorithm can improve the effective rate by 20% in severely-blocked scenarios while maintaining low outage probability.

Submitted 1 July, 2020; originally announced July 2020.

Comments: 11 Pages, 9 Figures

Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020

arXiv:2005.09153 [pdf, other]

Associating Multi-Scale Receptive Fields for Fine-grained Recognition

Authors: Zihan Ye, Fuyuan Hu, Yin Liu, Fan Lyu, Pengqing Liu

Abstract: Extracting and fusing part features has become the key to fine-grained image recognition. Recently, the Non-local (NL) module has shown excellent improvement in image recognition. However, it lacks the mechanism to model the interactions between multi-scale part features, which is vital for fine-grained recognition. In this paper, we propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations. First, CNL computes correlations between features of a query layer and all response layers. Second, all response features are weighted according to the correlations and are added to the query features. Due to the interactions of cross-layer features, our model builds spatial dependencies among multi-level layers and learns more discriminative features. In addition, we can reduce the aggregation cost if we set a low-dimensional deep layer as the query layer. Experiments are conducted to show that our model achieves or surpasses state-of-the-art results on three benchmark datasets of fine-grained classification. Our codes can be found at github.com/FouriYe/CNL-ICIP2020.

Submitted 18 May, 2020; originally announced May 2020.

Comments: Accepted by ICIP2020

arXiv:2005.09034 [pdf, other]

Cross-filter compression for CNN inference acceleration

Authors: Fuyuan Lyu, Shien Zhu, Weichen Liu

Abstract: Convolutional neural networks demonstrate great capability for multiple tasks, such as image classification and many others. However, many resources are required to train a network. Hence much effort has been made to accelerate neural networks by reducing the precision of weights, activations, and gradients. However, these filter-wise quantification methods have a natural upper limit, caused by the size of the kernel. Meanwhile, with the popularity of small kernels, the natural limit decreases further. To address this issue, we propose a new cross-filter compression method that can provide $\sim32\times$ memory savings and $122\times$ speed up in convolution operations. In our method, all convolution filters are quantized to given bits and spatially adjacent filters share the same scaling factor. Our compression method, based on Binary-Weight and XNOR-Net separately, is evaluated on the CIFAR-10 and ImageNet datasets with widely used network structures, such as ResNet and VGG, and shows tolerable accuracy loss compared to state-of-the-art quantification methods.

Submitted 18 May, 2020; originally announced May 2020.

arXiv:2003.10663 [pdf, ps, other]

Modeling Cross-view Interaction Consistency for Paired Egocentric Interaction Recognition

Authors: Zhongguo Li, Fan Lyu, Wei Feng, Song Wang

Abstract: With the development of Augmented Reality (AR), egocentric action recognition (EAR) plays an important role in accurately understanding demands from the user. However, EAR is designed to help recognize human-machine interaction in a single egocentric view, and thus has difficulty capturing interactions between two face-to-face AR users. Paired egocentric interaction recognition (PEIR) is the task of collaboratively recognizing the interactions between two persons with the videos in their corresponding views. Unfortunately, existing PEIR methods always directly use a linear decision function to fuse the features extracted from two corresponding egocentric videos, which ignores the consistency of interaction in paired egocentric videos. The interactions in paired videos are consistent, and the features extracted from them are correlated with each other. On top of that, we propose to build the relevance between two views using bilinear pooling, which captures the consistency of two views at the feature level. Specifically, each neuron in the feature maps from one view connects to the neurons from another view, which guarantees the compact consistency between two views. Then all possible paired neurons are used for PEIR to exploit the consistent information they carry. To be efficient, we use compact bilinear pooling with Count Sketch to avoid directly computing the outer product in bilinear pooling. Experimental results on the PEV dataset show the superiority of the proposed methods on the PEIR task.

Submitted 24 March, 2020; originally announced March 2020.

Comments: ICME2020

arXiv:2003.08027 [pdf, other]

MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension

Authors: Shuai Wang, Fan Lyu, Wei Feng, Song Wang

Abstract: Referring expression comprehension (REC) aims to localize a text-related region in a given image by a referring expression in natural language. Existing methods focus on how to build convincing visual and language representations independently, which may significantly isolate visual and language information. In this paper, we argue that for REC the referring expression and the target region are semantically correlated and subject, location and relationship consistency exist between vision and language. On top of this, we propose a novel approach called MutAtt to construct mutual guidance between vision and language, which treats vision and language equally and thus yields compact information matching. Specifically, for each module of subject, location and relationship, MutAtt builds two kinds of attention-based mutual guidance strategies. One strategy is to generate vision-guided language embedding for the sake of matching relevant visual feature. The other reversely generates language-guided visual feature to match relevant language embedding. This mutual guidance strategy can effectively guarantee the vision-language consistency in three modules. Experiments on three popular REC datasets demonstrate that the proposed approach outperforms the current state-of-the-art methods.

Submitted 20 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

Comments: 6 pages, Accepted by ICME-2020

arXiv:1906.00777 [pdf, other]

doi 10.1109/TVT.2019.2925629

Multi-Drone 3D Trajectory Planning and Scheduling in Drone Assisted Radio Access Networks

Authors: Weisen Shi, Junling Li, Nan Cheng, Feng Lyu, Shan Zhang, Haibo Zhou, Xuemin Shen

Abstract: Drone base station (DBS) is a promising technique to extend wireless connections for uncovered users of terrestrial radio access networks (RAN). To improve user fairness and network performance, in this paper, we design 3D trajectories of multiple DBSs in the drone assisted radio access networks (DA-RAN) where DBSs fly over associated areas of interests (AoIs) and relay communications between the base station (BS) and users in AoIs. We formulate the multi-DBS 3D trajectory planning and scheduling as a mixed integer non-linear programming (MINLP) problem with the objective of minimizing the average DBS-to-user (D2U) pathloss. The 3D trajectory variations in both horizontal and vertical directions, as well as the state-of-the-art DBS-related channel models are considered in the formulation. To address the non-convexity and NP-hardness of the MINLP problem, we first decouple it into multiple integer linear programming (ILP) and quasi-convex sub-problems in which AoI association, D2U communication scheduling, horizontal trajectories and flying heights of DBSs are respectively optimized. Then, we design a multi-DBS 3D trajectory planning and scheduling algorithm to solve the sub-problems iteratively based on the block coordinate descent (BCD) method. A k-means-based initial trajectory generation and a search-based start slot scheduling are considered in the proposed algorithm to improve trajectory design performance and ensure inter-DBS distance constraint, respectively. Extensive simulations are conducted to investigate the impacts of DBS quantity, horizontal speed and initial trajectory on the trajectory planning results. Compared with the static DBS deployment, the proposed trajectory planning can achieve 10-15 dB reduction on average D2U pathloss, and reduce the D2U pathloss standard deviation by 68%, which indicate the improvements of network performance and user fairness.

Submitted 22 October, 2019; v1 submitted 29 May, 2019; originally announced June 2019.

Comments: Published on IEEE TVT

Journal ref: IEEE Transactions on Vehicular Technology, vol. 68, no. 8, pp. 8145-8158, Aug. 2019

arXiv:1906.00776 [pdf, other]

doi 10.1109/ICC.2019.8761719

3D Multi-Drone-Cell Trajectory Design for Efficient IoT Data Collection

Authors: Weisen Shi, Junling Li, Nan Cheng, Feng Lyu, Yanpeng Dai, Haibo Zhou, Xuemin Shen

Abstract: Drone cell (DC) is an emerging technique to offer flexible and cost-effective wireless connections to collect Internet-of-things (IoT) data in uncovered areas of terrestrial networks. The flying trajectory of DC significantly impacts the data collection performance. However, designing the trajectory is a challenging issue due to the complicated 3D mobility of DC, unique DC-to-ground (D2G) channel features, limited DC-to-BS (D2B) backhaul link quality, etc. In this paper, we propose a 3D DC trajectory design for the DC-assisted IoT data collection where multiple DCs periodically fly over IoT devices and relay the IoT data to the base stations (BSs). The trajectory design is formulated as a mixed integer non-linear programming (MINLP) problem to minimize the average user-to-DC (U2D) pathloss, considering the state-of-the-art practical D2G channel model. We decouple the MINLP problem into multiple quasi-convex or integer linear programming (ILP) sub-problems, which optimize the user association, user scheduling, horizontal trajectories and flying altitudes of DCs, respectively. Then, a 3D multi-DC trajectory design algorithm is developed to solve the MINLP problem, in which the sub-problems are optimized iteratively through the block coordinate descent (BCD) method. Compared with the static DC deployment, the proposed trajectory design can lower the average U2D pathloss by 10-15 dB, and reduce the standard deviation of U2D pathloss by 56%, which indicates the improvements in both link quality and user fairness.

Submitted 29 May, 2019; originally announced June 2019.

Comments: Accepted by IEEE ICC'19

Journal ref: ICC 2019 - 2019 IEEE International Conference on Communications (ICC)

arXiv:1904.06996 [pdf, other]

SR-GAN: Semantic Rectifying Generative Adversarial Network for Zero-shot Learning

Authors: Zihan Ye, Fan Lyu, Linyan Li, Qiming Fu, Jinchang Ren, Fuyuan Hu

Abstract: The existing Zero-Shot learning (ZSL) methods may suffer from the vague class attributes that are highly overlapped for different classes. Unlike these methods that ignore the discrimination among classes, in this paper, we propose to classify unseen images by rectifying the semantic space guided by the visual space. First, we pre-train a Semantic Rectifying Network (SRN) to rectify semantic space with a semantic loss and a rectifying loss. Then, a Semantic Rectifying Generative Adversarial Network (SR-GAN) is built to generate plausible visual features of unseen classes from both semantic features and rectified semantic features. To guarantee the effectiveness of rectified semantic features and synthetic visual features, pre-reconstruction and post-reconstruction networks are proposed, which keep the consistency between visual features and semantic features. Experimental results demonstrate that our approach significantly outperforms the state of the art on four benchmark datasets.

Submitted 15 April, 2019; originally announced April 2019.

Comments: ICME 2019 Oral

arXiv:1805.09583 [pdf, other]

Vehicular Communication Networks in Automated Driving Era

Authors: Shan Zhang, Jiayin Chen, Feng Lyu, Nan Cheng, Weisen Shi, Xuemin Shen

Abstract: Embedded with advanced sensors, cameras and processors, the emerging automated driving vehicles are capable of sensing the environment and conducting automobile operation, paving the way to modern intelligent transportation systems (ITS) with high safety and efficiency. On the other hand, vehicular communication networks (VCNs) connect vehicles, infrastructures, clouds, and all other devices with communication modules, whereby vehicles can obtain local and global information to make intelligent operation decisions. Although the sensing-based automated driving technologies and VCNs have been investigated independently, their interactions and mutual benefits are still underdeveloped. In this article, we argue that VCNs have attractive potentials to enhance the on-board sensing-based automated vehicles from different perspectives, such as driving safety, transportation efficiency, as well as customer experiences. A case study is conducted to demonstrate that the traffic jam can be relieved at intersections with automated driving vehicles coordinated with each other through VCNs. Furthermore, we highlight the critical yet interesting issues for future research, based on the specific requirements posed by automated driving on VCNs.

Submitted 24 May, 2018; originally announced May 2018.

Comments: 15 pages, 5 figures, IEEE Communications Magazine

Showing 1–46 of 46 results for author: Lyu, F