-
Quantifying Marketing Performance at Channel-Partner Level by Using Marketing Mix Modeling (MMM) and Shapley Value Regression
Authors:
Sean Tang,
Sriya Musunuru,
Baoshi Zong,
Brooks Thornton
Abstract:
This paper explores the application of Shapley Value Regression in dissecting marketing performance at channel-partner level, complementing channel-level Marketing Mix Modeling (MMM). Utilizing real-world data from the financial services industry, we demonstrate the practicality of Shapley Value Regression in evaluating individual partner contributions. Although structured in-field testing along w…
▽ More
This paper explores the application of Shapley Value Regression in dissecting marketing performance at channel-partner level, complementing channel-level Marketing Mix Modeling (MMM). Utilizing real-world data from the financial services industry, we demonstrate the practicality of Shapley Value Regression in evaluating individual partner contributions. Although structured in-field testing along with cooperative game theory is most accurate, it can often be highly complex and expensive to conduct. Shapley Value Regression is thus a more feasible approach to disentangle the influence of each marketing partner within a marketing channel. We also propose a simple method to derive adjusted coefficients of Shapley Value Regression and compare it with alternative approaches.
△ Less
Submitted 11 March, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Deep Federated Anomaly Detection for Multivariate Time Series Data
Authors:
Wei Zhu,
Dong** Song,
Yuncong Chen,
Wei Cheng,
Bo Zong,
Takehiko Mizoguchi,
Cristian Lumezanu,
Haifeng Chen,
Jiebo Luo
Abstract:
Despite the fact that many anomaly detection approaches have been developed for multivariate time series data, limited effort has been made on federated settings in which multivariate time series data are heterogeneously distributed among different edge devices while data sharing is prohibited. In this paper, we investigate the problem of federated unsupervised anomaly detection and present a Fede…
▽ More
Despite the fact that many anomaly detection approaches have been developed for multivariate time series data, limited effort has been made on federated settings in which multivariate time series data are heterogeneously distributed among different edge devices while data sharing is prohibited. In this paper, we investigate the problem of federated unsupervised anomaly detection and present a Federated Exemplar-based Deep Neural Network (Fed-ExDNN) to conduct anomaly detection for multivariate time series data on different edge devices. Specifically, we first design an Exemplar-based Deep Neural network (ExDNN) to learn local time series representations based on their compatibility with an exemplar module which consists of hidden parameters learned to capture varieties of normal patterns on each edge device. Next, a constrained clustering mechanism (FedCC) is employed on the centralized server to align and aggregate the parameters of different local exemplar modules to obtain a unified global exemplar module. Finally, the global exemplar module is deployed together with a shared feature encoder to each edge device and anomaly detection is conducted by examining the compatibility of testing data to the exemplar module. Fed-ExDNN captures local normal time series patterns with ExDNN and aggregates these patterns by FedCC, and thus can handle the heterogeneous data distributed over different edge devices simultaneously. Thoroughly empirical studies on six public datasets show that ExDNN and Fed-ExDNN can outperform state-of-the-art anomaly detection algorithms and federated learning techniques.
△ Less
Submitted 9 May, 2022;
originally announced May 2022.
-
Do Multi-Lingual Pre-trained Language Models Reveal Consistent Token Attributions in Different Languages?
Authors:
Junxiang Wang,
Xuchao Zhang,
Bo Zong,
Yanchi Liu,
Wei Cheng,
**gchao Ni,
Haifeng Chen,
Liang Zhao
Abstract:
During the past several years, a surge of multi-lingual Pre-trained Language Models (PLMs) has been proposed to achieve state-of-the-art performance in many cross-lingual downstream tasks. However, the understanding of why multi-lingual PLMs perform well is still an open domain. For example, it is unclear whether multi-Lingual PLMs reveal consistent token attributions in different languages. To ad…
▽ More
During the past several years, a surge of multi-lingual Pre-trained Language Models (PLMs) has been proposed to achieve state-of-the-art performance in many cross-lingual downstream tasks. However, the understanding of why multi-lingual PLMs perform well is still an open domain. For example, it is unclear whether multi-Lingual PLMs reveal consistent token attributions in different languages. To address this, in this paper, we propose a Cross-lingual Consistency of Token Attributions (CCTA) evaluation framework. Extensive experiments in three downstream tasks demonstrate that multi-lingual PLMs assign significantly different attributions to multi-lingual synonyms. Moreover, we have the following observations: 1) the Spanish achieves the most consistent token attributions in different languages when it is used for training PLMs; 2) the consistency of token attributions strongly correlates with performance in downstream tasks.
△ Less
Submitted 22 December, 2021;
originally announced December 2021.
-
Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-sentence Dependency Graph
Authors:
Liyan Xu,
Xuchao Zhang,
Bo Zong,
Yanchi Liu,
Wei Cheng,
**gchao Ni,
Haifeng Chen,
Liang Zhao,
**ho D. Choi
Abstract:
We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD), and the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to…
▽ More
We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD), and the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder that is only trained on English is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on-average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on the cross-linguistically consistent syntactic path.
△ Less
Submitted 15 March, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Merlion: A Machine Learning Library for Time Series
Authors:
Aadyot Bhatnagar,
Paul Kassianik,
Chenghao Liu,
Tian Lan,
Wenzhuo Yang,
Rowan Cassius,
Doyen Sahoo,
Devansh Arpit,
Sri Subramanian,
Gerald Woo,
Amrita Saha,
Arun Kumar Jagota,
Gokulakrishnan Gopalakrishnan,
Manpreet Singh,
K C Krithika,
Sukumar Maddineni,
Daeki Cho,
Bo Zong,
Yingbo Zhou,
Caiming Xiong,
Silvio Savarese,
Steven Hoi,
Huan Wang
Abstract:
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/post-processing layers. It has several modules to improve ease-of-use, including visualization, anomaly score calibration to improve in…
▽ More
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/post-processing layers. It has several modules to improve ease-of-use, including visualization, anomaly score calibration to improve interpetability, AutoML for hyperparameter tuning and model selection, and model ensembling. Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production. This library aims to provide engineers and researchers a one-stop solution to rapidly develop models for their specific time series needs and benchmark them across multiple time series datasets. In this technical report, we highlight Merlion's architecture and major functionalities, and we report benchmark numbers across different baseline models and ensembles.
△ Less
Submitted 19 September, 2021;
originally announced September 2021.
-
Unsupervised Document Embedding via Contrastive Augmentation
Authors:
Dongsheng Luo,
Wei Cheng,
**gchao Ni,
Wenchao Yu,
Xuchao Zhang,
Bo Zong,
Yanchi Liu,
Zhengzhang Chen,
Dong** Song,
Haifeng Chen,
Xiang Zhang
Abstract:
We present a contrasting learning approach with data augmentation techniques to learn document representations in an unsupervised manner. Inspired by recent contrastive self-supervised learning algorithms used for image and NLP pretraining, we hypothesize that high-quality document embedding should be invariant to diverse paraphrases that preserve the semantics of the original document. With diffe…
▽ More
We present a contrasting learning approach with data augmentation techniques to learn document representations in an unsupervised manner. Inspired by recent contrastive self-supervised learning algorithms used for image and NLP pretraining, we hypothesize that high-quality document embedding should be invariant to diverse paraphrases that preserve the semantics of the original document. With different backbones and contrastive learning frameworks, our study reveals the enormous benefits of contrastive augmentation for document representation learning with two additional insights: 1) including data augmentation in a contrastive way can substantially improve the embedding quality in unsupervised document representation learning, and 2) in general, stochastic augmentations generated by simple word-level manipulation work much better than sentence-level and document-level ones. We plug our method into a classifier and compare it with a broad range of baseline methods on six benchmark datasets. Our method can decrease the classification error rate by up to 6.4% over the SOTA approaches on the document classification task, matching or even surpassing fully-supervised methods.
△ Less
Submitted 26 March, 2021;
originally announced March 2021.
-
Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series
Authors:
Yinjun Wu,
**gchao Ni,
Wei Cheng,
Bo Zong,
Dong** Song,
Zhengzhang Chen,
Yanchi Liu,
Xuchao Zhang,
Haifeng Chen,
Susan Davidson
Abstract:
Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process MTS's individually, and do not leverage the dynamic distributions underlying the MTS's, leading to sub-optimal results when the sparsity is high. To address this chall…
▽ More
Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process MTS's individually, and do not leverage the dynamic distributions underlying the MTS's, leading to sub-optimal results when the sparsity is high. To address this challenge, we propose a novel generative model, which tracks the transition of latent clusters, instead of isolated feature representations, to achieve robust modeling. It is characterized by a newly designed dynamic Gaussian mixture distribution, which captures the dynamics of clustering structures, and is used for emitting timeseries. The generative model is parameterized by neural networks. A structured inference network is also designed for enabling inductive analysis. A gating mechanism is further introduced to dynamically tune the Gaussian mixture distributions. Extensive experimental results on a variety of real-life datasets demonstrate the effectiveness of our method.
△ Less
Submitted 2 March, 2021;
originally announced March 2021.
-
Learning to Drop: Robust Graph Neural Network via Topological Denoising
Authors:
Dongsheng Luo,
Wei Cheng,
Wenchao Yu,
Bo Zong,
**gchao Ni,
Haifeng Chen,
Xiang Zhang
Abstract:
Graph Neural Networks (GNNs) have shown to be powerful tools for graph analytics. The key idea is to recursively propagate and aggregate information along edges of the given graph. Despite their success, however, the existing GNNs are usually sensitive to the quality of the input graph. Real-world graphs are often noisy and contain task-irrelevant edges, which may lead to suboptimal generalization…
▽ More
Graph Neural Networks (GNNs) have shown to be powerful tools for graph analytics. The key idea is to recursively propagate and aggregate information along edges of the given graph. Despite their success, however, the existing GNNs are usually sensitive to the quality of the input graph. Real-world graphs are often noisy and contain task-irrelevant edges, which may lead to suboptimal generalization performance in the learned GNN models. In this paper, we propose PTDNet, a parameterized topological denoising network, to improve the robustness and generalization performance of GNNs by learning to drop task-irrelevant edges. PTDNet prunes task-irrelevant edges by penalizing the number of edges in the sparsified graph with parameterized networks. To take into consideration of the topology of the entire graph, the nuclear norm regularization is applied to impose the low-rank constraint on the resulting sparsified graph for better generalization. PTDNet can be used as a key component in GNN models to improve their performances on various tasks, such as node classification and link prediction. Experimental studies on both synthetic and benchmark datasets show that PTDNet can improve the performance of GNNs significantly and the performance gain becomes larger for more noisy datasets.
△ Less
Submitted 13 November, 2020;
originally announced November 2020.
-
Parameterized Explainer for Graph Neural Network
Authors:
Dongsheng Luo,
Wei Cheng,
Dongkuan Xu,
Wenchao Yu,
Bo Zong,
Haifeng Chen,
Xiang Zhang
Abstract:
Despite recent progress in Graph Neural Networks (GNNs), explaining predictions made by GNNs remains a challenging open problem. The leading method independently addresses the local explanations (i.e., important subgraph structure and node features) to interpret why a GNN model makes the prediction for a single instance, e.g. a node or a graph. As a result, the explanation generated is painstaking…
▽ More
Despite recent progress in Graph Neural Networks (GNNs), explaining predictions made by GNNs remains a challenging open problem. The leading method independently addresses the local explanations (i.e., important subgraph structure and node features) to interpret why a GNN model makes the prediction for a single instance, e.g. a node or a graph. As a result, the explanation generated is painstakingly customized for each instance. The unique explanation interpreting each instance independently is not sufficient to provide a global understanding of the learned GNN model, leading to a lack of generalizability and hindering it from being used in the inductive setting. Besides, as it is designed for explaining a single instance, it is challenging to explain a set of instances naturally (e.g., graphs of a given class). In this study, we address these key challenges and propose PGExplainer, a parameterized explainer for GNNs. PGExplainer adopts a deep neural network to parameterize the generation process of explanations, which enables PGExplainer a natural approach to explaining multiple instances collectively. Compared to the existing work, PGExplainer has better generalization ability and can be utilized in an inductive setting easily. Experiments on both synthetic and real-life datasets show highly competitive performance with up to 24.7\% relative improvement in AUC on explaining graph classification over the leading baseline.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
T$^2$-Net: A Semi-supervised Deep Model for Turbulence Forecasting
Authors:
Denghui Zhang,
Yanchi Liu,
Wei Cheng,
Bo Zong,
**gchao Ni,
Zhengzhang Chen,
Haifeng Chen,
Hui Xiong
Abstract:
Accurate air turbulence forecasting can help airlines avoid hazardous turbulence, guide the routes that keep passengers safe, maximize efficiency, and reduce costs. Traditional turbulence forecasting approaches heavily rely on painstakingly customized turbulence indexes, which are less effective in dynamic and complex weather conditions. The recent availability of high-resolution weather data and…
▽ More
Accurate air turbulence forecasting can help airlines avoid hazardous turbulence, guide the routes that keep passengers safe, maximize efficiency, and reduce costs. Traditional turbulence forecasting approaches heavily rely on painstakingly customized turbulence indexes, which are less effective in dynamic and complex weather conditions. The recent availability of high-resolution weather data and turbulence records allows more accurate forecasting of the turbulence in a data-driven way. However, it is a non-trivial task for develo** a machine learning based turbulence forecasting system due to two challenges: (1) Complex spatio-temporal correlations, turbulence is caused by air movement with complex spatio-temporal patterns, (2) Label scarcity, very limited turbulence labels can be obtained. To this end, in this paper, we develop a unified semi-supervised framework, T$^2$-Net, to address the above challenges. Specifically, we first build an encoder-decoder paradigm based on the convolutional LSTM to model the spatio-temporal correlations. Then, to tackle the label scarcity problem, we propose a novel Dual Label Guessing method to take advantage of massive unlabeled turbulence data. It integrates complementary signals from the main Turbulence Forecasting task and the auxiliary Turbulence Detection task to generate pseudo-labels, which are dynamically utilized as additional training data. Finally, extensive experimental results on a real-world turbulence dataset validate the superiority of our method on turbulence forecasting.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation
Authors:
Xin Dong,
**gchao Ni,
Wei Cheng,
Zhengzhang Chen,
Bo Zong,
Dong** Song,
Yanchi Liu,
Haifeng Chen,
Gerard de Melo
Abstract:
Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user or item into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that t…
▽ More
Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user or item into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item's reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.
△ Less
Submitted 18 December, 2019;
originally announced January 2020.
-
Potential Key Technologies for 6G Mobile Communications
Authors:
Yifei Yuan,
Yajun Zhao,
Baiqing Zong,
Sergio Parolari
Abstract:
The standard development of 5G wireless communication culminated between 2017 and 2019, followed by the worldwide deployment of 5G networks, which is expected to result in very high data rate for enhanced mobile broadband, support ultra-reliable and low-latency services and accommodate massive number of connections. Research attention is shifting to future generation of wireless communications, fo…
▽ More
The standard development of 5G wireless communication culminated between 2017 and 2019, followed by the worldwide deployment of 5G networks, which is expected to result in very high data rate for enhanced mobile broadband, support ultra-reliable and low-latency services and accommodate massive number of connections. Research attention is shifting to future generation of wireless communications, for instance, beyond 5G or 6G. Unlike previous papers, which discussed the use cases, deployment scenarios, or new network architectures of 6G in depth, this paper focuses on a few potential technologies for 6G wireless communications, all of which represent certain fundamental breakthrough at the physical layer-technical hardcore of any new generation of wireless communications. Some of them, such as holographic radio, terahertz communication, large intelligent surface, and orbital angular momentum, are of revolutionary nature and many related studies are still at their scientific exploration stage. Several technical areas, such as advanced channel coding and modulation, visible light communication, and advanced duplex, while having been studied, may find more opportunities in 6G.
△ Less
Submitted 2 March, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data
Authors:
Chuxu Zhang,
Dong** Song,
Yuncong Chen,
Xinyang Feng,
Cristian Lumezanu,
Wei Cheng,
**gchao Ni,
Bo Zong,
Haifeng Chen,
Nitesh V. Chawla
Abstract:
Nowadays, multivariate time series data are increasingly collected in various real world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it not only requires to capture the temporal dependen…
▽ More
Nowadays, multivariate time series data are increasingly collected in various real world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it not only requires to capture the temporal dependency in each time series, but also need encode the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED), to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses in different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations and an attention based Convolutional Long-Short Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.
△ Less
Submitted 19 November, 2018;
originally announced November 2018.
-
Behavior Query Discovery in System-Generated Temporal Graphs
Authors:
Bo Zong,
Xusheng Xiao,
Zhichun Li,
Zhenyu Wu,
Zhiyun Qian,
Xifeng Yan,
Ambuj K. Singh,
Guofei Jiang
Abstract:
Computer system monitoring generates huge amounts of logs that record the interaction of system entities. How to query such data to better understand system behaviors and identify potential system risks and malicious behaviors becomes a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal graphs…
▽ More
Computer system monitoring generates huge amounts of logs that record the interaction of system entities. How to query such data to better understand system behaviors and identify potential system risks and malicious behaviors becomes a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal graphs with nodes being system entities and edges being their interactions over time. Given the complexity of such graphs, it becomes time-consuming for system administrators to manually formulate useful queries in order to examine abnormal activities, attacks, and vulnerabilities in computer systems.
In this work, we investigate how to query temporal graphs and treat query formulation as a discriminative temporal graph pattern mining problem. We introduce TGMiner to mine discriminative patterns from system logs, and these patterns can be taken as templates for building more complex queries. TGMiner leverages temporal information in graphs to prune graph patterns that share similar growth trend without compromising pattern quality. Experimental results on real system data show that TGMiner is 6-32 times faster than baseline methods. The discovered patterns were verified by system experts; they achieved high precision (97%) and recall (91%).
△ Less
Submitted 19 November, 2015; v1 submitted 18 November, 2015;
originally announced November 2015.
-
Inferring the Underlying Structure of Information Cascades
Authors:
Bo Zong,
Yinghui Wu,
Ambuj K. Singh,
Xifeng Yan
Abstract:
In social networks, information and influence diffuse among users as cascades. While the importance of studying cascades has been recognized in various applications, it is difficult to observe the complete structure of cascades in practice. Moreover, much less is known on how to infer cascades based on partial observations. In this paper we study the cascade inference problem following the indepen…
▽ More
In social networks, information and influence diffuse among users as cascades. While the importance of studying cascades has been recognized in various applications, it is difficult to observe the complete structure of cascades in practice. Moreover, much less is known on how to infer cascades based on partial observations. In this paper we study the cascade inference problem following the independent cascade model, and provide a full treatment from complexity to algorithms: (a) We propose the idea of consistent trees as the inferred structures for cascades; these trees connect source nodes and observed nodes with paths satisfying the constraints from the observed temporal information. (b) We introduce metrics to measure the likelihood of consistent trees as inferred cascades, as well as several optimization problems for finding them. (c) We show that the decision problems for consistent trees are in general NP-complete, and that the optimization problems are hard to approximate. (d) We provide approximation algorithms with performance guarantees on the quality of the inferred cascades, as well as heuristics. We experimentally verify the efficiency and effectiveness of our inference algorithms, using real and synthetic data.
△ Less
Submitted 12 October, 2012;
originally announced October 2012.