Search | arXiv e-print repository

Investor Behavior Modeling by Analyzing Financial Advisor Notes: A Machine Learning Perspective

Authors: Cynthia Pagliaro, Dhagash Mehta, Han-Tai Shiao, Shaofei Wang, Luwei Xiong

Abstract: Modeling investor behavior is crucial to identifying behavioral coaching opportunities for financial advisors. With the help of natural language processing (NLP) we analyze an unstructured (textual) dataset of financial advisors' summary notes, taken after every investor conversation, to gain first ever insights into advisor-investor interactions. These insights are used to predict investor needs during adverse market conditions; thus allowing advisors to coach investors and help avoid inappropriate financial decision-making. First, we perform topic modeling to gain insight into the emerging topics and trends. Based on this insight, we construct a supervised classification model to predict the probability that an advised investor will require behavioral coaching during volatile market periods. To the best of our knowledge, ours is the first work on exploring the advisor-investor relationship using unstructured data. This work may have far-reaching implications for both traditional and emerging financial advisory service models like robo-advising.

Submitted 12 July, 2021; originally announced July 2021.

Comments: 8 pages, 2 column format, 7 figures+5 tables

arXiv:2106.13423 [pdf, other]

Federated Graph Classification over Non-IID Graphs

Authors: Han Xie, Jing Ma, Li Xiong, Carl Yang

Abstract: Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation towards such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe the gradients of GNNs to be rather fluctuating in GCFL which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.

Submitted 7 November, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2008.11922 [pdf, other]

Time-based Sequence Model for Personalization and Recommendation Systems

Authors: Tigran Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, Alisson Gusatti Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong

Abstract: In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and TSL attention-like mechanism with inner products in different vector spaces, that can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently treat sequences of user behavior of different length. We study the properties of our state-of-the-art model on statistically designed data set. Also, we show that it outperforms more complex models with longer sequence length on the Taobao User Behavior dataset.

Submitted 27 August, 2020; originally announced August 2020.

Comments: 17 pages, 7 figures

MSC Class: 68T05 ACM Class: I.2.6; I.5.0; H.3.3; H.3.4

arXiv:2006.11943 [pdf, other]

Spatio-Temporal Tensor Sketching via Adaptive Sampling

Authors: Jing Ma, Qiuchen Zhang, Joyce C. Ho, Li Xiong

Abstract: Mining massive spatio-temporal data can help a variety of real-world applications such as city capacity planning, event management, and social network analysis. The tensor representation can be used to capture the correlation between space and time and simultaneously exploit the latent structure of the spatial and temporal patterns in an unsupervised fashion. However, the increasing volume of spatio-temporal data has made it prohibitively expensive to store and analyze using tensor factorization. In this paper, we propose SkeTenSmooth, a novel tensor factorization framework that uses adaptive sampling to compress the tensor in a temporally streaming fashion and preserves the underlying global structure. SkeTenSmooth adaptively samples incoming tensor slices according to the detected data dynamics. Thus, the sketches are more representative and informative of the tensor dynamic patterns. In addition, we propose a robust tensor factorization method that can deal with the sketched tensor and recover the original patterns. Experiments on the New York City Yellow Taxi data show that SkeTenSmooth greatly reduces the memory cost and outperforms random sampling and fixed rate sampling method in terms of retaining the underlying patterns.

Submitted 21 June, 2020; originally announced June 2020.

arXiv:2004.04690 [pdf, other]

Orthogonal Over-Parameterized Training

Authors: Weiyang Liu, Rongmei Lin, Zhen Liu, James M. Rehg, Liam Paull, Li Xiong, Le Song, Adrian Weller

Abstract: The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve the empirical generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We consider multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing orthogonality-preserving gradient descent. For better scalability, we propose the stochastic OPT which performs orthogonal transformation stochastically for partial dimensions of neurons. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization. We provide some insights on why OPT yields better generalization. Extensive experiments validate the superiority of OPT over the standard training.

Submitted 4 June, 2021; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: CVPR 2021 Oral (43 Pages, Substantial Update from v3, Typos Fixed from v5)

arXiv:2003.03477 [pdf, other]

ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training

Authors: Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou

Abstract: Recommendation systems are often trained with a tremendous amount of data, and distributed training is the workhorse to shorten the training time. While the training throughput can be increased by simply adding more workers, it is also increasingly challenging to preserve the model quality. In this paper, we present ShadowSync, a distributed framework specifically tailored to modern scale recommendation system training. In contrast to previous works where synchronization happens as part of the training process, ShadowSync separates the synchronization from training and runs it in the background. Such isolation significantly reduces the synchronization overhead and increases the synchronization frequency, so that we are able to obtain both high throughput and excellent model quality when training at scale. The superiority of our procedure is confirmed by experiments on training deep neural networks for click-through-rate prediction tasks. Our framework is capable to express data parallelism and/or model parallelism, generic to host various types of synchronization algorithms, and readily applicable to large scale problems in other areas.

Submitted 23 February, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

arXiv:1912.04977 [pdf, other]

Advances and Open Problems in Federated Learning

Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

arXiv:1908.09888 [pdf, other]

doi 10.1145/3357384.3357878

Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis

Authors: Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C. Ho, Li Xiong, Xiaoqian Jiang

Abstract: Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted to concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results which could reveal sensitive patient information. Therefore, the challenge is how to jointly decompose the tensor under rigorous and principled privacy constraints, while still support the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHR. It embeds advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR database private but also collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact solves the heterogeneous patient population using a structured sparsity term. In our framework, each hospital decomposes its local tensors, and sends the updated intermediary results with output perturbation every several iterations to a semi-trusted server which generates the phenotypes. The evaluation on both real-world and synthetic datasets demonstrated that under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.

Submitted 1 November, 2019; v1 submitted 26 August, 2019; originally announced August 2019.

arXiv:1809.05822 [pdf, other]

doi 10.1145/3178876.3186146

Aesthetic-based Clothing Recommendation

Authors: Wenhui Yu, Huidi Zhang, Xiangnan He, Xu Chen, Li Xiong, Zheng Qin

Abstract: Recently, product images have gained increasing attention in clothing recommendation since the visual appearance of clothing products has a significant impact on consumers' decision. Most existing methods rely on conventional features to represent an image, such as the visual features extracted by convolutional neural networks (CNN features) and the scale-invariant feature transform algorithm (SIFT features), color histograms, and so on. Nevertheless, one important type of features, the aesthetic features, is seldom considered. It plays a vital role in clothing recommendation since a users' decision depends largely on whether the clothing is in line with her aesthetics, however the conventional image features cannot portray this directly. To bridge this gap, we propose to introduce the aesthetic information, which is highly relevant with user preference, into clothing recommender systems. To achieve this, we first present the aesthetic features extracted by a pre-trained neural network, which is a brain-inspired deep structure trained for the aesthetic assessment task. Considering that the aesthetic preference varies significantly from user to user and by time, we then propose a new tensor factorization model to incorporate the aesthetic features in a personalized manner. We conduct extensive experiments on real-world datasets, which demonstrate that our approach can capture the aesthetic preference of users and significantly outperform several state-of-the-art recommendation methods.

Submitted 16 September, 2018; originally announced September 2018.

Comments: WWW 2018

arXiv:1809.00338 [pdf, other]

Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition

Authors: Jian Zhao, Yu Cheng, Yi Cheng, Yang Yang, Haochong Lan, Fang Zhao, Lin Xiong, Yan Xu, Jianshu Li, Sugiri Pranata, Shengmei Shen, Junliang Xing, Hengzhu Liu, Shuicheng Yan, Jiashi Feng

Abstract: Despite the remarkable progress in face recognition related technologies, reliably recognizing faces across ages still remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intra-class variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features for recognition, or first synthesize a face that matches target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, we propose a deep Age-Invariant Model (AIM) for face recognition in the wild with three distinct novelties. First, AIM presents a novel unified deep architecture jointly performing cross-age face synthesis and recognition in a mutual boosting way. Second, AIM achieves continuous face rejuvenation/aging with remarkable photorealistic and identity-preserving properties, avoiding the requirement of paired data and the true age of testing samples. Third, we develop effective and novel training strategies for end-to-end learning the whole deep architecture, which generates powerful age-invariant face representations explicitly disentangled from the age variation. Moreover, we propose a new large-scale Cross-Age Face Recognition (CAFR) benchmark dataset to facilitate existing efforts and push the frontiers of age-invariant face recognition research.
Extensive experiments on both our CAFR and several other cross-age datasets (MORPH, CACD and FG-NET) demonstrate the superiority of the proposed AIM model over the state-of-the-arts. Benchmarking our model on one of the most popular unconstrained face recognition datasets IJB-C additionally verifies the promising generalizability of AIM in recognizing faces in the wild.

Submitted 3 October, 2018; v1 submitted 2 September, 2018; originally announced September 2018.

arXiv:1701.01917 [pdf, ps, other]

See the Near Future: A Short-Term Predictive Methodology to Traffic Load in ITS

Authors: Xun Zhou, Changle Li, Zhe Liu, Tom H. Luan, Zhifang Miao, Lina Zhu, Lei Xiong

Abstract: The Intelligent Transportation System (ITS) targets to a coordinated traffic system by applying the advanced wireless communication technologies for road traffic scheduling. Towards an accurate road traffic control, the short-term traffic forecasting to predict the road traffic at the particular site in a short period is often useful and important. In existing works, Seasonal Autoregressive Integrated Moving Average (SARIMA) model is a popular approach. The scheme however encounters two challenges: 1) the analysis on related data is insufficient whereas some important features of data may be neglected; and 2) with data presenting different features, it is unlikely to have one predictive model that can fit all situations. To tackle above issues, in this work, we develop a hybrid model to improve accuracy of SARIMA. In specific, we first explore the autocorrelation and distribution features existed in traffic flow to revise structure of the time series model. Based on the Gaussian distribution of traffic flow, a hybrid model with a Bayesian learning algorithm is developed which can effectively expand the application scenarios of SARIMA. We show the efficiency and accuracy of our proposal using both analysis and experimental studies. Using the real-world trace data, we show that the proposed predicting approach can achieve satisfactory performance in practice.

Submitted 8 January, 2017; originally announced January 2017.

arXiv:1202.3758 [pdf]

Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

Authors: Barnabas Poczos, Liang Xiong, Jeff Schneider

Abstract: Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real word images, and astronomical data sets.

Submitted 14 February, 2012; originally announced February 2012.

Report number: UAI-P-2011-PG-599-608

arXiv:1202.0302 [pdf, other]

Kernels on Sample Sets via Nonparametric Divergence Estimates

Authors: Danica J. Sutherland, Liang Xiong, Barnabás Póczos, Jeff Schneider

Abstract: Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set from an underlying feature distribution for that group. Our approach employs kernel machines with a kernel on i.i.d. sample sets of vectors. We define certain kernel functions on pairs of distributions, and then use a nonparametric estimator to consistently estimate those functions based on sample sets. The projection of the estimated Gram matrix to the cone of symmetric positive semi-definite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions. We present several numerical experiments both on real and simulated datasets to demonstrate the advantages of our new approach.

Submitted 14 January, 2021; v1 submitted 1 February, 2012; originally announced February 2012.

Showing 1–13 of 13 results for author: Xiong, L