-
Adaptive Reduced Rank Regression
Authors:
Qiong Wu,
Felix Ming Fai Wong,
Zhenming Liu,
Yanhua Li,
Varun Kanade
Abstract:
We study the low rank regression problem $\mathbf{y} = M\mathbf{x} + \epsilon$, where $\mathbf{x}$ and $\mathbf{y}$ are $d_1$- and $d_2$-dimensional vectors, respectively. We consider the extreme high-dimensional setting where the number of observations $n$ is less than $d_1 + d_2$. Existing algorithms are designed for settings where $n$ is typically as large as $\mathrm{rank}(M)(d_1+d_2)$. This work provides an efficient algorithm that involves only two SVDs, and establishes statistical guarantees on its performance. The algorithm decouples the problem by first estimating the precision matrix of the features, and then solving the matrix denoising problem. To complement the upper bound, we introduce new techniques for establishing lower bounds on the performance of any algorithm for this problem. Our preliminary experiments confirm that our algorithm often outperforms existing baselines, and is always at least competitive.
Submitted 23 October, 2020; v1 submitted 27 May, 2019;
originally announced May 2019.
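The decoupled, two-SVD structure described in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a truncated SVD of the feature matrix as a stand-in for the precision-matrix estimate and plain rank truncation as the denoising step. The function name `two_step_low_rank_regression` and the rank parameters `k1` and `k2` are hypothetical.

```python
import numpy as np


def two_step_low_rank_regression(X, Y, k1, k2):
    """Illustrative two-SVD estimate of M in y = M x + noise.

    X: (n, d1) feature matrix, Y: (n, d2) response matrix.
    k1: number of feature directions kept (assumed tuning parameter).
    k2: target rank of the estimate (assumed tuning parameter).
    Returns M_hat with shape (d2, d1).
    """
    n = X.shape[0]

    # SVD 1: keep the top-k1 directions of X. Inverting only these
    # singular values acts as a regularized inverse of the feature covariance.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V1, s1 = Vt[:k1].T, s[:k1]

    # W whitens the features: Z = X @ W satisfies Z.T @ Z / n = I.
    W = np.sqrt(n) * V1 / s1
    Z = X @ W

    # Least-squares fit of Y on the whitened features.
    N = Y.T @ Z / n

    # SVD 2: denoise by keeping the top-k2 singular directions of N.
    Un, sn, Vnt = np.linalg.svd(N, full_matrices=False)
    N_k = (Un[:, :k2] * sn[:k2]) @ Vnt[:k2]

    # Map the denoised fit back to the original feature coordinates.
    return N_k @ W.T
```

In the noiseless, well-conditioned case with `k1` equal to the feature dimension and `k2` equal to the true rank, this sketch recovers `M` up to numerical error; how to choose `k1` and `k2` when $n < d_1 + d_2$ is exactly what the paper's analysis addresses and is not covered here.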
-
Stock Market Prediction from WSJ: Text Mining via Sparse Matrix Factorization
Authors:
Felix Ming Fai Wong,
Zhenming Liu,
Mung Chiang
Abstract:
We revisit the problem of predicting directional movements of stock prices based on news articles: here our algorithm uses daily articles from The Wall Street Journal to predict the closing stock prices on the same day. We propose a unified latent space model to characterize the "co-movements" between stock prices and news articles. Unlike many existing approaches, our new model is able to simultaneously leverage the correlations: (a) among stock prices, (b) among news articles, and (c) between stock prices and news articles. Thus, our model is able to make daily predictions on more than 500 stocks (most of which are not even mentioned in any news article) while having low complexity. We carry out extensive backtesting on trading strategies based on our algorithm. The results show that our model has a substantially better accuracy rate (55.7%) than many widely used algorithms. The return (56%) and Sharpe ratio of a trading strategy based on our model are also much higher than those of baseline indices.
Submitted 27 June, 2014;
originally announced June 2014.
-
Mind Your Own Bandwidth: An Edge Solution to Peak-hour Broadband Congestion
Authors:
Felix Ming Fai Wong,
Carlee Joe-Wong,
Sangtae Ha,
Zhenming Liu,
Mung Chiang
Abstract:
Motivated by recent increases in network traffic, we propose a decentralized network edge-based solution to peak-hour broadband congestion that incentivizes users to moderate their bandwidth demands to their actual needs. Our solution is centered on smart home gateways that allocate bandwidth in a two-level hierarchy: first, a gateway purchases guaranteed bandwidth from the Internet Service Provider (ISP) with virtual credits. It then self-limits its bandwidth usage and distributes the bandwidth among its apps and devices according to their relative priorities. To this end, we design a credit allocation and redistribution mechanism for the first level, and implement our gateways on commodity wireless routers for the second level. We demonstrate our system's effectiveness and practicality with theoretical analysis, simulations and experiments on real traffic. Compared to a baseline equal sharing algorithm, our solution significantly improves users' overall satisfaction and yields a fair allocation of bandwidth across users.
Submitted 30 December, 2013;
originally announced December 2013.
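The second level of the hierarchy described in the abstract above, where a gateway distributes its bandwidth among apps and devices according to their relative priorities, can be illustrated with a short sketch. The abstract does not specify the allocation rule, so this example assumes weighted max-min fairness with per-app demand caps. The function name `allocate_by_priority` and the app names are hypothetical.

```python
def allocate_by_priority(total, demands, priorities):
    """Split `total` bandwidth among apps in proportion to priority.

    No app receives more than its demand; bandwidth left over by capped
    apps is re-shared among the rest (weighted max-min fairness, an
    assumed rule rather than the paper's mechanism).
    """
    alloc = {app: 0.0 for app in demands}
    active = {app for app in demands
              if demands[app] > 0 and priorities[app] > 0}
    remaining = float(total)

    while active and remaining > 1e-12:
        weight_sum = sum(priorities[app] for app in active)
        # Apps whose proportional share already covers their demand.
        satisfied = {app for app in active
                     if remaining * priorities[app] / weight_sum >= demands[app]}

        if not satisfied:
            # Everyone left is bandwidth-limited: split what remains by priority.
            for app in active:
                alloc[app] = remaining * priorities[app] / weight_sum
            break

        # Cap satisfied apps at their demand and re-share the rest.
        for app in satisfied:
            alloc[app] = float(demands[app])
            remaining -= demands[app]
        active -= satisfied

    return alloc
```

For example, with 10 units of bandwidth, demands of 8, 1, and 8 units, and priorities of 3, 1, and 1, the low-demand app is capped at 1 unit and the remaining 9 units are split 3:1 between the other two.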
-
Learning about social learning in MOOCs: From statistical analysis to generative model
Authors:
Christopher G. Brinton,
Mung Chiang,
Shaili Jain,
Henry Lam,
Zhenming Liu,
Felix Ming Fai Wong
Abstract:
We study user behavior in the courses offered by a major Massive Open Online Course (MOOC) provider during the summer of 2013. Since social learning is a key element of scalable education in MOOCs and is done via online discussion forums, our main focus is on understanding forum activities. Two salient features of MOOC forum activities drive our research: (1) High decline rate: for all courses studied, the volume of discussions in the forum declines continuously throughout the duration of the course. (2) High-volume, noisy discussions: at least 30% of the courses produce new discussion threads at rates that are infeasible for students or teaching staff to read through. Furthermore, a substantial portion of the discussions is not directly course-related.
We investigate factors that correlate with the decline of activity in the online discussion forums and find effective strategies to classify threads and rank their relevance. Specifically, we use linear regression models to analyze the time series of the count data for the forum activities and make a number of observations, e.g., the teaching staff's active participation in the discussion increases the discussion volume but does not slow down the decline rate. We then propose a unified generative model for the discussion threads, which allows us both to choose efficient thread classifiers and to design an effective algorithm for ranking thread relevance. Our ranking algorithm is further compared against two baseline algorithms, using human evaluation from Amazon Mechanical Turk.
The authors on this paper are listed in alphabetical order. For media and press coverage, please refer to us collectively, as "researchers from the EDGE Lab at Princeton University, together with collaborators at Boston University and Microsoft Corporation."
Submitted 19 December, 2013; v1 submitted 7 December, 2013;
originally announced December 2013.
-
Why Watching Movie Tweets Won't Tell the Whole Story?
Authors:
Felix Ming Fai Wong,
Soumya Sen,
Mung Chiang
Abstract:
Data from Online Social Networks (OSNs) are providing analysts with unprecedented access to public opinion on elections, news, movies, etc. However, caution must be taken to determine whether, and to what extent, the opinion extracted from OSN user data is indeed reflective of the opinion of the larger online population. In this work we study this issue in the context of movie reviews on Twitter and compare the opinion of Twitter users with that of the online population of IMDb and Rotten Tomatoes. We introduce new metrics to show that Twitter users can be characteristically different from general users, both in their ratings and in their relative preference for Oscar-nominated and non-nominated movies. Additionally, we investigate whether such data can truly predict a movie's box-office success.
Submitted 20 March, 2012;
originally announced March 2012.