Search | arXiv e-print repository

Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation

Authors: Hao Wang, Zhichao Chen, Yuan Shen, Jiajun Fan, Zhaoran Liu, Degui Yang, Xinggao Liu, Haoxuan Li

Abstract: Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In… ▽ More Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In this study, we propose Proximity-aware Counterfactual Regression (PCR) to exploit proximity for representation balancing within the HTE estimation context. Specifically, we introduce a local proximity preservation regularizer based on optimal transport to depict the local proximity in discrepancy calculation. Furthermore, to overcome the curse of dimensionality that renders the estimation of discrepancy ineffective, exacerbated by limited data availability for HTE estimation, we develop an informative subspace projector, which trades off minimal distance precision for improved sample complexity. Extensive experiments demonstrate that PCR accurately matches units across different treatment groups, effectively mitigates treatment selection bias, and significantly outperforms competitors. Code is available at https://anonymous.4open.science/status/ncr-B697. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Code is available at https://anonymous.4open.science/status/ncr-B697

arXiv:2404.18980 [pdf, other]

The Impact of COVID-19 on Co-authorship and Economics Scholars' Productivity

Authors: Hanqiao Zhang, Joy D. Xiuyao Yang

Abstract: The COVID-19 pandemic has disrupted traditional academic collaboration patterns, prompting a unique opportunity to analyze the influence of peer effects and coauthorship dynamics on research output. Using a novel dataset, this paper endeavors to make a first cut at investigating the role of peer effects on the productivity of economics scholars, measured by the number of publications, in both pre-… ▽ More The COVID-19 pandemic has disrupted traditional academic collaboration patterns, prompting a unique opportunity to analyze the influence of peer effects and coauthorship dynamics on research output. Using a novel dataset, this paper endeavors to make a first cut at investigating the role of peer effects on the productivity of economics scholars, measured by the number of publications, in both pre-pandemic and pandemic times. Results show that peer effect is significant for the pre-pandemic time but not for the pandemic time. The findings contribute to our understanding of how research collaboration influences knowledge production and may help guide policies aimed at fostering collaboration and enhancing research productivity in the academic community. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.01413 [pdf, other]

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration… ▽ More The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs. △ Less

Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2402.02399 [pdf, other]

FreDF: Learning to Forecast in Frequency Domain

Authors: Hao Wang, Licheng Pan, Zhichao Chen, Degui Yang, Sen Zhang, Yifei Yang, Xinggao Liu, Haoxuan Li, Dacheng Tao

Abstract: Time series modeling is uniquely challenged by the presence of autocorrelation in both historical and label sequences. Current research predominantly focuses on handling autocorrelation within the historical sequence but often neglects its presence in the label sequence. Specifically, emerging forecast models mainly conform to the direct forecast (DF) paradigm, generating multi-step forecasts unde… ▽ More Time series modeling is uniquely challenged by the presence of autocorrelation in both historical and label sequences. Current research predominantly focuses on handling autocorrelation within the historical sequence but often neglects its presence in the label sequence. Specifically, emerging forecast models mainly conform to the direct forecast (DF) paradigm, generating multi-step forecasts under the assumption of conditional independence within the label sequence. This assumption disregards the inherent autocorrelation in the label sequence, thereby limiting the performance of DF-based models. In response to this gap, we introduce the Frequency-enhanced Direct Forecast (FreDF), which bypasses the complexity of label autocorrelation by learning to forecast in the frequency domain. Our experiments demonstrate that FreDF substantially outperforms existing state-of-the-art methods including iTransformer and is compatible with a variety of forecast models. △ Less

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2312.08670 [pdf, other]

Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables… ▽ More In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables handle data from a holistic perspective, which cannot ensure the precision of causal effects in specific temporal and spatial dimensions. However, temporal and spatial dimensions are extremely critical in the logistics field, and this limitation may directly affect the precision of subsidy and pricing strategies. To address these issues, this study proposes a technique based on flexible temporal-spatial grid partitioning. Furthermore, based on the flexible grid partitioning technique, we further propose a continuous entropy balancing method in the temporal-spatial domain, which named TS-EBCT (Temporal-Spatial Entropy Balancing for Causal Continue Treatments). The method proposed in this paper has been tested on two simulation datasets and two real datasets, all of which have achieved excellent performance. In fact, after applying the TS-EBCT method to the intracity freight transportation field, the prediction accuracy of the causal effect has been significantly improved. It brings good business benefits to the company's subsidy and pricing strategies. △ Less

Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 10 pages;

arXiv:2311.12597 [pdf, other]

Optimal Functional Bilinear Regression with Two-way Functional Covariates via Reproducing Kernel Hilbert Space

Authors: Dan Yang, Jianlong Shao, Haipeng Shen, Dong Wang, Hongtu Zhu

Abstract: Traditional functional linear regression usually takes a one-dimensional functional predictor as input and estimates the continuous coefficient function. Modern applications often generate two-dimensional covariates, which become matrices when observed at grid points. To avoid the inefficiency of the classical method involving estimation of a two-dimensional coefficient function, we propose a func… ▽ More Traditional functional linear regression usually takes a one-dimensional functional predictor as input and estimates the continuous coefficient function. Modern applications often generate two-dimensional covariates, which become matrices when observed at grid points. To avoid the inefficiency of the classical method involving estimation of a two-dimensional coefficient function, we propose a functional bilinear regression model, and introduce an innovative three-term penalty to impose roughness penalty in the estimation. The proposed estimator exhibits minimax optimal property for prediction under the framework of reproducing kernel Hilbert space. An iterative generalized cross-validation approach is developed to choose tuning parameters, which significantly improves the computational efficiency over the traditional cross-validation approach. The statistical and computational advantages of the proposed method over existing methods are further demonstrated via simulated experiments, the Canadian weather data, and a biochemical long-range infrared light detection and ranging data. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: 48 pages, 19 figures

arXiv:2311.03967 [pdf, other]

CeCNN: Copula-enhanced convolutional neural networks in joint prediction of refraction error and axial length based on ultra-widefield fundus images

Authors: Chong Zhong, Yang Li, Danjuan Yang, Meiyan Li, Xingyao Zhou, Bo Fu, Catherine C. Liu, A. H. Welsh

Abstract: Ultra-widefield (UWF) fundus images are replacing traditional fundus images in screening, detection, prediction, and treatment of complications related to myopia because their much broader visual range is advantageous for highly myopic eyes. Spherical equivalent (SE) is extensively used as the main myopia outcome measure, and axial length (AL) has drawn increasing interest as an important ocular c… ▽ More Ultra-widefield (UWF) fundus images are replacing traditional fundus images in screening, detection, prediction, and treatment of complications related to myopia because their much broader visual range is advantageous for highly myopic eyes. Spherical equivalent (SE) is extensively used as the main myopia outcome measure, and axial length (AL) has drawn increasing interest as an important ocular component for assessing myopia. Cutting-edge studies show that SE and AL are strongly correlated. Using the joint information from SE and AL is potentially better than using either separately. In the deep learning community, though there is research on multiple-response tasks with a 3D image biomarker, dependence among responses is only sporadically taken into consideration. Inspired by the spirit that information extracted from the data by statistical methods can improve the prediction accuracy of deep learning models, we formulate a class of multivariate response regression models with a higher-order tensor biomarker, for the bivariate tasks of regression-classification and regression-regression. Specifically, we propose a copula-enhanced convolutional neural network (CeCNN) framework that incorporates the dependence between responses through a Gaussian copula (with parameters estimated from a warm-up CNN) and uses the induced copula-likelihood loss with the backbone CNNs. We establish the statistical framework and algorithms for the aforementioned two bivariate tasks. We show that the CeCNN has better prediction accuracy after adding the dependency information to the backbone models. The modeling and the proposed CeCNN algorithm are applicable beyond the UWF scenario and can be effective with other backbones beyond ResNet and LeNet. △ Less

Submitted 1 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2309.12673 [pdf, other]

On Sparse Modern Hopfield Model

Authors: Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu

Abstract: We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate… ▽ More We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate of the sparse entropic regularizer. Building upon this, we derive the sparse memory retrieval dynamics from the sparse energy function and show its one-step approximation is equivalent to the sparse-structured attention. Importantly, we provide a sparsity-dependent memory retrieval error bound which is provably tighter than its dense analog. The conditions for the benefits of sparsity to arise are therefore identified and discussed. In addition, we show that the sparse modern Hopfield model maintains the robust theoretical properties of its dense counterpart, including rapid fixed point convergence and exponential memory capacity. Empirically, we use both synthetic and real-world datasets to demonstrate that the sparse Hopfield model outperforms its dense counterpart in many situations. △ Less

Submitted 29 November, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 37 pages, accepted at NeurIPS 2023. [v2] updated to match with camera-ready version. Code is available at https://github.com/MAGICS-LAB/SparseModernHopfield

arXiv:2305.14765 [pdf, other]

Masked Bayesian Neural Networks : Theoretical Guarantee and its Posterior Inference

Authors: Insung Kong, Dongyoon Yang, Jong** Lee, Ilsang Ohn, Gyuseung Baek, Yongdai Kim

Abstract: Bayesian approaches for learning deep neural networks (BNN) have been received much attention and successfully applied to various applications. Particularly, BNNs have the merit of having better generalization ability as well as better uncertainty quantification. For the success of BNN, search an appropriate architecture of the neural networks is an important task, and various algorithms to find g… ▽ More Bayesian approaches for learning deep neural networks (BNN) have been received much attention and successfully applied to various applications. Particularly, BNNs have the merit of having better generalization ability as well as better uncertainty quantification. For the success of BNN, search an appropriate architecture of the neural networks is an important task, and various algorithms to find good sparse neural networks have been proposed. In this paper, we propose a new node-sparse BNN model which has good theoretical properties and is computationally feasible. We prove that the posterior concentration rate to the true model is near minimax optimal and adaptive to the smoothness of the true model. In particular the adaptiveness is the first of its kind for node-sparse BNNs. In addition, we develop a novel MCMC algorithm which makes the Bayesian inference of the node-sparse BNN model feasible in practice. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: 30 pages, ICML 2023 proceedings. arXiv admin note: substantial text overlap with arXiv:2206.00853

arXiv:2304.08673 [pdf, other]

Semi-supervised Learning of Pushforwards For Domain Translation & Adaptation

Authors: Nishant Panda, Natalie Klein, Dominic Yang, Patrick Gasda, Diane Oyen

Abstract: Given two probability densities on related data spaces, we seek a map pushing one density to the other while satisfying application-dependent constraints. For maps to have utility in a broad application space (including domain translation, domain adaptation, and generative modeling), the map must be available to apply on out-of-sample data points and should correspond to a probabilistic model over… ▽ More Given two probability densities on related data spaces, we seek a map pushing one density to the other while satisfying application-dependent constraints. For maps to have utility in a broad application space (including domain translation, domain adaptation, and generative modeling), the map must be available to apply on out-of-sample data points and should correspond to a probabilistic model over the two spaces. Unfortunately, existing approaches, which are primarily based on optimal transport, do not address these needs. In this paper, we introduce a novel pushforward map learning algorithm that utilizes normalizing flows to parameterize the map. We first re-formulate the classical optimal transport problem to be map-focused and propose a learning algorithm to select from all possible maps under the constraint that the map minimizes a probability distance and application-specific regularizers; thus, our method can be seen as solving a modified optimal transport problem. Once the map is learned, it can be used to map samples from a source domain to a target domain. In addition, because the map is parameterized as a composition of normalizing flows, it models the empirical distributions over the two data spaces and allows both sampling and likelihood evaluation for both data sets. We compare our method (parOT) to related optimal transport approaches in the context of domain adaptation and domain translation on benchmark data sets. Finally, to illustrate the impact of our work on applied problems, we apply parOT to a real scientific application: spectral calibration for high-dimensional measurements from two vastly different environments △ Less

Submitted 17 April, 2023; originally announced April 2023.

Comments: 19 pages, 7 figures

arXiv:2212.10872 [pdf, ps, other]

Is it easier to count communities than find them?

Authors: Cynthia Rush, Fiona Skerman, Alexander S. Wein, Dana Yang

Abstract: Random graph models with community structure have been studied extensively in the literature. For both the problems of detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of commu… ▽ More Random graph models with community structure have been studied extensively in the literature. For both the problems of detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of communities) even in situations where actually finding those communities is believed to be computationally hard? We show the answer is no. In particular, we consider certain hypothesis testing problems between models with different community structures, and we show (in the low-degree polynomial framework) that testing between two options is as hard as finding the communities. In addition, our methods give the first computational lower bounds for testing between two different `planted' distributions, whereas previous results have considered testing between a planted distribution and an i.i.d. `null' distribution. △ Less

Submitted 21 December, 2022; originally announced December 2022.

Comments: Accepted to Innovations in Theoretical Computer Science (ITCS) 2023

MSC Class: 05C80; 62F03; 68Q25 ACM Class: F.2; G.2

arXiv:2211.12151 [pdf, other]

Reinforcement Causal Structure Learning on Order Graph

Authors: Dezhi Yang, Guoxian Yu, Jun Wang, Zhengtian Wu, Maozu Guo

Abstract: Learning directed acyclic graph (DAG) that describes the causality of observed data is a very challenging but important task. Due to the limited quantity and quality of observed data, and non-identifiability of causal graph, it is almost impossible to infer a single precise DAG. Some methods approximate the posterior distribution of DAGs to explore the DAG space via Markov chain Monte Carlo (MCMC)… ▽ More Learning directed acyclic graph (DAG) that describes the causality of observed data is a very challenging but important task. Due to the limited quantity and quality of observed data, and non-identifiability of causal graph, it is almost impossible to infer a single precise DAG. Some methods approximate the posterior distribution of DAGs to explore the DAG space via Markov chain Monte Carlo (MCMC), but the DAG space is over the nature of super-exponential growth, accurately characterizing the whole distribution over DAGs is very intractable. In this paper, we propose {Reinforcement Causal Structure Learning on Order Graph} (RCL-OG) that uses order graph instead of MCMC to model different DAG topological orderings and to reduce the problem size. RCL-OG first defines reinforcement learning with a new reward mechanism to approximate the posterior distribution of orderings in an efficacy way, and uses deep Q-learning to update and transfer rewards between nodes. Next, it obtains the probability transition model of nodes on order graph, and computes the posterior probability of different orderings. In this way, we can sample on this model to obtain the ordering with high probability. Experiments on synthetic and benchmark datasets show that RCL-OG provides accurate posterior probability approximation and achieves better results than competitive causal discovery algorithms. △ Less

Submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence(AAAI2023)

arXiv:2210.05673 [pdf]

Performance Deterioration of Deep Learning Models after Clinical Deployment: A Case Study with Auto-segmentation for Definitive Prostate Cancer Radiotherapy

Authors: Biling Wang, Michael Dohopolski, Ti Bai, Junjie Wu, Raquibul Hannan, Neil Desai, Aurelie Garant, Daniel Yang, Dan Nguyen, Mu-Han Lin, Robert Timmerman, Xinlei Wang, Steve Jiang

Abstract: We evaluated the temporal performance of a deep learning (DL) based artificial intelligence (AI) model for auto segmentation in prostate radiotherapy, seeking to correlate its efficacy with changes in clinical landscapes. Our study involved 1328 prostate cancer patients who underwent definitive radiotherapy from January 2006 to August 2022 at the University of Texas Southwestern Medical Center. We… ▽ More We evaluated the temporal performance of a deep learning (DL) based artificial intelligence (AI) model for auto segmentation in prostate radiotherapy, seeking to correlate its efficacy with changes in clinical landscapes. Our study involved 1328 prostate cancer patients who underwent definitive radiotherapy from January 2006 to August 2022 at the University of Texas Southwestern Medical Center. We trained a UNet based segmentation model on data from 2006 to 2011 and tested it on data from 2012 to 2022 to simulate real world clinical deployment. We measured the model performance using the Dice similarity coefficient (DSC), visualized the trends in contour quality using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test to analyze the differences in DSC distributions across distinct periods, and multiple linear regression to investigate the impact of various clinical factors. The model exhibited peak performance in the initial phase (from 2012 to 2014) for segmenting the prostate, rectum, and bladder. However, we observed a notable decline in performance for the prostate and rectum after 2015, while bladder contour quality remained stable. Key factors that impacted the prostate contour quality included physician contouring styles, the use of various hydrogel spacer, CT scan slice thickness, MRI-guided contouring, and using intravenous (IV) contrast. Rectum contour quality was influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The bladder contour quality was primarily affected by using IV contrast. This study highlights the challenges in maintaining AI model performance consistency in a dynamic clinical setting. It underscores the need for continuous monitoring and updating of AI models to ensure their ongoing effectiveness and relevance in patient care. △ Less

Submitted 16 November, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.03353 [pdf, other]

Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples

Authors: Dongyoon Yang, Insung Kong, Yongdai Kim

Abstract: Adversarial training, which is to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data to deceive a given deep neural network. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms. A novel feature of the… ▽ More Adversarial training, which is to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data to deceive a given deep neural network. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms. A novel feature of the proposed algorithm is to apply more regularization to data vulnerable to adversarial attacks than other existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as an algorithm of minimizing the regularized empirical risk motivated from a newly derived upper bound of the robust risk. Numerical experiments illustrate that our proposed algorithm improves the generalization (accuracy on examples) and robustness (accuracy on adversarial attacks) simultaneously to achieve the state-of-the-art performance. △ Less

Submitted 31 May, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: Accepted in ICML 2023

arXiv:2206.00853

Masked Bayesian Neural Networks : Computation and Optimality

Authors: Insung Kong, Dongyoon Yang, Jong** Lee, Ilsang Ohn, Yongdai Kim

Abstract: As data size and computing power increase, the architectures of deep neural networks (DNNs) have been getting more complex and huge, and thus there is a growing need to simplify such complex and huge DNNs. In this paper, we propose a novel sparse Bayesian neural network (BNN) which searches a good DNN with an appropriate complexity. We employ the masking variables at each node which can turn off s… ▽ More As data size and computing power increase, the architectures of deep neural networks (DNNs) have been getting more complex and huge, and thus there is a growing need to simplify such complex and huge DNNs. In this paper, we propose a novel sparse Bayesian neural network (BNN) which searches a good DNN with an appropriate complexity. We employ the masking variables at each node which can turn off some nodes according to the posterior distribution to yield a nodewise sparse DNN. We devise a prior distribution such that the posterior distribution has theoretical optimalities (i.e. minimax optimality and adaptiveness), and develop an efficient MCMC algorithm. By analyzing several benchmark datasets, we illustrate that the proposed BNN performs well compared to other existing methods in the sense that it discovers well condensed DNN architectures with similar prediction accuracy and uncertainty quantification compared to large DNNs. △ Less

Submitted 23 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: I will change to another file

arXiv:2203.14959 [pdf, other]

doi 10.1016/j.renene.2022.04.065

Benchmarks for Solar Radiation Time Series Forecasting

Authors: Cyril Voyant, Gilles Notton, Jean-Laurent Duchaud, Luis Antonio García Gutiérrez, Jamie M. Bright, Dazhi Yang

Abstract: With an ever-increasing share of intermittent renewable energy in the world's energy mix,there is an increasing need for advanced solar power forecasting models to optimize the operation and control of solar power plants. In order to justify the need for more elaborate forecast modeling, one must compare the performance of advanced models with naive reference methods. On this point, a rigorous for… ▽ More With an ever-increasing share of intermittent renewable energy in the world's energy mix,there is an increasing need for advanced solar power forecasting models to optimize the operation and control of solar power plants. In order to justify the need for more elaborate forecast modeling, one must compare the performance of advanced models with naive reference methods. On this point, a rigorous formalism using statistical tools, variational calculation and quantification of noise in the measurement is studied and five naive reference forecasting methods are considered, among which there is a newly proposed approach called ARTU (a particular autoregressive model of order two). These methods do not require any training phase nor demand any (or almost no) historical data. Additionally, motivated by the well-known benefits of ensemble forecasting, a combination of these models is considered, and then validated using data from multiple sites with diverse climatological characteristics, based on various error metrics, among which some are rarely used in the field of solar energy. The most appropriate benchmarking method depends on the salient features of the variable being forecast (e.g., seasonality, cyclicity, or conditional heteoroscedasity) as well as the forecast horizon. Hence, to ensure a fair benchmarking, forecasters should endeavor to discover the most appropriate naive reference method for their setup by testing all available options. Among the methods proposed in this paper, the combination and ARTU statistically offer the best results for the proposed study conditions. △ Less

Submitted 18 March, 2022; originally announced March 2022.

Comments: 32 pages, 9 Tables and 4 Figures

arXiv:2111.12921 [pdf, other]

Network regression and supervised centrality estimation

Authors: Junhui Cai, Dan Yang, Wu Zhu, Haipeng Shen, Linda Zhao

Abstract: The centrality in a network is a popular metric for agents' network positions and is often used in regression models to model the network effect on an outcome variable of interest. In empirical studies, researchers often adopt a two-stage procedure to first estimate the centrality and then infer the network effect using the estimated centrality. Despite its prevalent adoption, this two-stage proce… ▽ More The centrality in a network is a popular metric for agents' network positions and is often used in regression models to model the network effect on an outcome variable of interest. In empirical studies, researchers often adopt a two-stage procedure to first estimate the centrality and then infer the network effect using the estimated centrality. Despite its prevalent adoption, this two-stage procedure lacks theoretical backing and can fail in both estimation and inference. We, therefore, propose a unified framework, under which we prove the shortcomings of the two-stage in centrality estimation and the undesirable consequences in the regression. We then propose a novel supervised network centrality estimation (SuperCENT) methodology that simultaneously yields superior estimations of the centrality and the network effect and provides valid and narrower confidence intervals than those from the two-stage. We showcase the superiority of SuperCENT in predicting the currency risk premium based on the global trade network. △ Less

Submitted 25 November, 2021; originally announced November 2021.

arXiv:2110.15517 [pdf, other]

CP Factor Model for Dynamic Tensors

Authors: Yuefeng Han, Dan Yang, Cun-Hui Zhang, Rong Chen

Abstract: Observations in various applications are frequently represented as a time series of multidimensional arrays, called tensor time series, preserving the inherent multidimensional structure. In this paper, we present a factor model approach, in a form similar to tensor CP decomposition, to the analysis of high-dimensional dynamic tensor time series. As the loading vectors are uniquely defined but not… ▽ More Observations in various applications are frequently represented as a time series of multidimensional arrays, called tensor time series, preserving the inherent multidimensional structure. In this paper, we present a factor model approach, in a form similar to tensor CP decomposition, to the analysis of high-dimensional dynamic tensor time series. As the loading vectors are uniquely defined but not necessarily orthogonal, it is significantly different from the existing tensor factor models based on Tucker-type tensor decomposition. The model structure allows for a set of uncorrelated one-dimensional latent dynamic factor processes, making it much more convenient to study the underlying dynamics of the time series. A new high order projection estimator is proposed for such a factor model, utilizing the special structure and the idea of the higher order orthogonal iteration procedures commonly used in Tucker-type tensor factor model and general tensor CP decomposition procedures. Theoretical investigation provides statistical error bounds for the proposed methods, which shows the significant advantage of utilizing the special model structure. Simulation study is conducted to further demonstrate the finite sample properties of the estimators. Real data application is used to illustrate the model and its interpretations. △ Less

Submitted 18 April, 2024; v1 submitted 28 October, 2021; originally announced October 2021.

arXiv:2103.09383 [pdf, ps, other]

The planted matching problem: Sharp threshold and infinite-order phase transition

Authors: Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang

Abstract: We study the problem of reconstructing a perfect matching $M^*$ hidden in a randomly weighted $n\times n$ bipartite graph. The edge set includes every node pair in $M^*$ and each of the $n(n-1)$ node pairs not in $M^*$ independently with probability $d/n$. The weight of each edge $e$ is independently drawn from the distribution $\mathcal{P}$ if $e \in M^*$ and from $\mathcal{Q}$ if $e \notin M^*$.… ▽ More We study the problem of reconstructing a perfect matching $M^*$ hidden in a randomly weighted $n\times n$ bipartite graph. The edge set includes every node pair in $M^*$ and each of the $n(n-1)$ node pairs not in $M^*$ independently with probability $d/n$. The weight of each edge $e$ is independently drawn from the distribution $\mathcal{P}$ if $e \in M^*$ and from $\mathcal{Q}$ if $e \notin M^*$. We show that if $\sqrt{d} B(\mathcal{P},\mathcal{Q}) \le 1$, where $B(\mathcal{P},\mathcal{Q})$ stands for the Bhattacharyya coefficient, the reconstruction error (average fraction of misclassified edges) of the maximum likelihood estimator of $M^*$ converges to $0$ as $n\to \infty$. Conversely, if $\sqrt{d} B(\mathcal{P},\mathcal{Q}) \ge 1+ε$ for an arbitrarily small constant $ε>0$, the reconstruction error for any estimator is shown to be bounded away from $0$ under both the sparse and dense model, resolving the conjecture in [Moharrami et al. 2019, Semerjian et al. 2020]. Furthermore, in the special case of complete exponentially weighted graph with $d=n$, $\mathcal{P}=\exp(λ)$, and $\mathcal{Q}=\exp(1/n)$, for which the sharp threshold simplifies to $λ=4$, we prove that when $λ\le 4-ε$, the optimal reconstruction error is $\exp\left( - Θ(1/\sqrtε) \right)$, confirming the conjectured infinite-order phase transition in [Semerjian et al. 2020]. △ Less

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2102.11976 [pdf, other]

Learner-Private Convex Optimization

Authors: Jiaming Xu, Kuang Xu, Dana Yang

Abstract: Convex optimization with feedback is a framework where a learner relies on iterative queries and feedback to arrive at the minimizer of a convex function. It has gained considerable popularity thanks to its scalability in large-scale optimization and machine learning. The repeated interactions, however, expose the learner to privacy risks from eavesdrop** adversaries that observe the submitted q… ▽ More Convex optimization with feedback is a framework where a learner relies on iterative queries and feedback to arrive at the minimizer of a convex function. It has gained considerable popularity thanks to its scalability in large-scale optimization and machine learning. The repeated interactions, however, expose the learner to privacy risks from eavesdrop** adversaries that observe the submitted queries. In this paper, we study how to optimally obfuscate the learner's queries in convex optimization with first-order feedback, so that their learned optimal value is provably difficult to estimate for an eavesdrop** adversary. We consider two formulations of learner privacy: a Bayesian formulation in which the convex function is drawn randomly, and a minimax formulation in which the function is fixed and the adversary's probability of error is measured with respect to a minimax criterion. Suppose that the learner wishes to ensure the adversary cannot estimate accurately with probability greater than $1/L$ for some $L>0$. Our main results show that the query complexity overhead is additive in $L$ in the minimax formulation, but multiplicative in $L$ in the Bayesian formulation. Compared to existing learner-private sequential learning models with binary feedback, our results apply to the significantly richer family of general convex functions with full-gradient feedback. Our proofs learn on tools from the theory of Dirichlet processes, as well as a novel strategy designed for measuring information leakage under a full-gradient oracle. △ Less

Submitted 23 October, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

arXiv:2007.13533 [pdf]

Learning Common Harmonic Waves on Stiefel Manifold -- A New Mathematical Approach for Brain Network Analyses

Authors: Jiazhou Chen, Guoqiang Han, Hongmin Cai, Defu Yang, Paul J. Laurienti, Martin Styner, Guorong Wu, Alzheimer's Disease Neuroimaging Initiative ADNI

Abstract: Converging evidence shows that disease-relevant brain alterations do not appear in random brain locations, instead, its spatial pattern follows large scale brain networks. In this context, a powerful network analysis approach with a mathematical foundation is indispensable to understand the mechanism of neuropathological events spreading throughout the brain. Indeed, the topology of each brain net… ▽ More Converging evidence shows that disease-relevant brain alterations do not appear in random brain locations, instead, its spatial pattern follows large scale brain networks. In this context, a powerful network analysis approach with a mathematical foundation is indispensable to understand the mechanism of neuropathological events spreading throughout the brain. Indeed, the topology of each brain network is governed by its native harmonic waves, which are a set of orthogonal bases derived from the Eigen-system of the underlying Laplacian matrix. To that end, we propose a novel connectome harmonic analysis framework to provide enhanced mathematical insights by detecting frequency-based alterations relevant to brain disorders. The backbone of our framework is a novel manifold algebra appropriate for inference across harmonic waves that overcomes the limitations of using classic Euclidean operations on irregular data structures. The individual harmonic difference is measured by a set of common harmonic waves learned from a population of individual Eigen systems, where each native Eigen-system is regarded as a sample drawn from the Stiefel manifold. Specifically, a manifold optimization scheme is tailored to find the common harmonic waves which reside at the center of Stiefel manifold. To that end, the common harmonic waves constitute the new neuro-biological bases to understand disease progression. Each harmonic wave exhibits a unique propagation pattern of neuro-pathological burdens spreading across brain networks. The statistical power of our novel connectome harmonic analysis approach is evaluated by identifying frequency-based alterations relevant to Alzheimer's disease, where our learning-based manifold approach discovers more significant and reproducible network dysfunction patterns compared to Euclidian methods. △ Less

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2007.11164 [pdf, other]

Time-aware Graph Embedding: A temporal smoothness and task-oriented approach

Authors: Yonghui Xu, Shengjie Sun, Yuan Miao, Dong Yang, Xiaonan Meng, Yi Hu, Ke Wang, Hengjie Song, Chuanyan Miao

Abstract: Knowledge graph embedding, which aims to learn the low-dimensional representations of entities and relationships, has attracted considerable research efforts recently. However, most knowledge graph embedding methods focus on the structural relationships in fixed triples while ignoring the temporal information. Currently, existing time-aware graph embedding methods only focus on the factual plausib… ▽ More Knowledge graph embedding, which aims to learn the low-dimensional representations of entities and relationships, has attracted considerable research efforts recently. However, most knowledge graph embedding methods focus on the structural relationships in fixed triples while ignoring the temporal information. Currently, existing time-aware graph embedding methods only focus on the factual plausibility, while ignoring the temporal smoothness which models the interactions between a fact and its contexts, and thus can capture fine-granularity temporal relationships. This leads to the limited performance of embedding related applications. To solve this problem, this paper presents a Robustly Time-aware Graph Embedding (RTGE) method by incorporating temporal smoothness. Two major innovations of our paper are presented here. At first, RTGE integrates a measure of temporal smoothness in the learning process of the time-aware graph embedding. Via the proposed additional smoothing factor, RTGE can preserve both structural information and evolutionary patterns of a given graph. Secondly, RTGE provides a general task-oriented negative sampling strategy associated with temporally-aware information, which further improves the adaptive ability of the proposed algorithm and plays an essential role in obtaining superior performance in various tasks. Extensive experiments conducted on multiple benchmark tasks show that RTGE can increase performance in entity/relationship/temporal sco** prediction tasks. △ Less

Submitted 21 July, 2020; originally announced July 2020.

arXiv:2007.07084 [pdf, other]

MRIF: Multi-resolution Interest Fusion for Recommendation

Authors: Shihao Li, Dekun Yang, Bufeng Zhang

Abstract: The main task of personalized recommendation is capturing users' interests based on their historical behaviors. Most of recent advances in recommender systems mainly focus on modeling users' preferences accurately using deep learning based approaches. There are two important properties of users' interests, one is that users' interests are dynamic and evolve over time, the other is that users' inte… ▽ More The main task of personalized recommendation is capturing users' interests based on their historical behaviors. Most of recent advances in recommender systems mainly focus on modeling users' preferences accurately using deep learning based approaches. There are two important properties of users' interests, one is that users' interests are dynamic and evolve over time, the other is that users' interests have different resolutions, or temporal-ranges to be precise, such as long-term and short-term preferences. Existing approaches either use Recurrent Neural Networks (RNNs) to address the drifts in users' interests without considering different temporal-ranges, or design two different networks to model long-term and short-term preferences separately. This paper presents a multi-resolution interest fusion model (MRIF) that takes both properties of users' interests into consideration. The proposed model is capable to capture the dynamic changes in users' interests at different temporal-ranges, and provides an effective way to combine a group of multi-resolution user interests to make predictions. Experiments show that our method outperforms state-of-the-art recommendation methods consistently. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: 4 pages

arXiv:2006.16501 [pdf, other]

Testing and Support Recovery of Correlation Structures for Matrix-Valued Observations with an Application to Stock Market Data

Authors: Xin Chen, Dan Yang, Yan Xu, Yin Xia, Dong Wang, Haipeng Shen

Abstract: Estimation of the covariance matrix of asset returns is crucial to portfolio construction. As suggested by economic theories, the correlation structure among assets differs between emerging markets and developed countries. It is therefore imperative to make rigorous statistical inference on correlation matrix equality between the two groups of countries. However, if the traditional vector-valued a… ▽ More Estimation of the covariance matrix of asset returns is crucial to portfolio construction. As suggested by economic theories, the correlation structure among assets differs between emerging markets and developed countries. It is therefore imperative to make rigorous statistical inference on correlation matrix equality between the two groups of countries. However, if the traditional vector-valued approach is undertaken, such inference is either infeasible due to limited number of countries comparing to the relatively abundant assets, or invalid due to the violations of temporal independence assumption. This highlights the necessity of treating the observations as matrix-valued rather than vector-valued. With matrix-valued observations, our problem of interest can be formulated as statistical inference on covariance structures under sub-Gaussian distributions, i.e., testing non-correlation and correlation equality, as well as the corresponding support estimations. We develop procedures that are asymptotically optimal under some regularity conditions. Simulation results demonstrate the computational and statistical advantages of our procedures over certain existing state-of-the-art methods for both normal and non-normal distributions. Application of our procedures to stock market data reveals interesting patterns and validates several economic propositions via rigorous statistical testing. △ Less

Submitted 27 September, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

arXiv:2006.10932 [pdf, other]

Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty

Authors: Junyang Jiang, Deqing Yang, Yanghua Xiao, Chenlu Shen

Abstract: Most of existing embedding based recommendation models use embeddings (vectors) corresponding to a single fixed point in low-dimensional space, to represent users and items. Such embeddings fail to precisely represent the users/items with uncertainty often observed in recommender systems. Addressing this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, whi… ▽ More Most of existing embedding based recommendation models use embeddings (vectors) corresponding to a single fixed point in low-dimensional space, to represent users and items. Such embeddings fail to precisely represent the users/items with uncertainty often observed in recommender systems. Addressing this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, which are proven adaptive to uncertain preferences exhibited by some users, resulting in better user representations and recommendation performance. Furthermore, our framework adopts Monte-Carlo sampling and convolutional neural networks to compute the correlation between the objective user and the candidate item, based on which precise recommendations are achieved. Our extensive experiments on two benchmark datasets not only justify that our proposed Gaussian embeddings capture the uncertainty of users very well, but also demonstrate its superior performance over the state-of-the-art recommendation models. △ Less

Submitted 18 June, 2020; originally announced June 2020.

Journal ref: IJCAI 2019

arXiv:2006.02611 [pdf, other]

Tensor Factor Model Estimation by Iterative Projection

Authors: Yuefeng Han, Rong Chen, Dan Yang, Cun-Hui Zhang

Abstract: Tensor time series, which is a time series consisting of tensorial observations, has become ubiquitous. It typically exhibits high dimensionality. One approach for dimension reduction is to use a factor model structure, in a form similar to Tucker tensor decomposition, except that the time dimension is treated as a dynamic process with a time dependent structure. In this paper we introduce two app… ▽ More Tensor time series, which is a time series consisting of tensorial observations, has become ubiquitous. It typically exhibits high dimensionality. One approach for dimension reduction is to use a factor model structure, in a form similar to Tucker tensor decomposition, except that the time dimension is treated as a dynamic process with a time dependent structure. In this paper we introduce two approaches to estimate such a tensor factor model by using iterative orthogonal projections of the original tensor time series. These approaches extend the existing estimation procedures and improve the estimation accuracy and convergence rate significantly as proven in our theoretical investigation. Our algorithms are similar to the higher order orthogonal projection method for tensor decomposition, but with significant differences due to the need to unfold tensors in the iterations and the use of autocorrelation. Consequently, our analysis is significantly different from the existing ones. Computational and statistical lower bounds are derived to prove the optimality of the sample size requirement and convergence rate for the proposed methods. Simulation study is conducted to further illustrate the statistical properties of these estimators. △ Less

Submitted 5 May, 2022; v1 submitted 3 June, 2020; originally announced June 2020.

MSC Class: Primary 62H25; 62H12; secondary 62R07

arXiv:2005.04586 [pdf, other]

Ensemble Wrapper Subsampling for Deep Modulation Classification

Authors: Sharan Ramjee, Shengtai Ju, Diyu Yang, Xiaoyu Liu, Aly El Gamal, Yonina C. Eldar

Abstract: Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms that rely on the output samples. We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems. Unlike traditional approaches that rely on pre-designed strateg… ▽ More Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms that rely on the output samples. We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems. Unlike traditional approaches that rely on pre-designed strategies that are solely based on expert knowledge, the proposed data-driven subsampling strategy employs deep neural network architectures to simulate the effect of removing candidate combinations of samples from each training input vector, in a manner inspired by how wrapper feature selection models work. The subsampled data is then processed by another deep learning classifier that recognizes each of the considered 10 modulation types. We show that the proposed subsampling strategy not only introduces drastic reduction in the classifier training time, but can also improve the classification accuracy to higher levels than those reached before for the considered dataset. An important feature herein is exploiting the transferability property of deep neural networks to avoid retraining the wrapper models and obtain superior performance through an ensemble of wrappers over that possible through solely relying on any of them. △ Less

Submitted 10 May, 2020; originally announced May 2020.

Comments: 22 pages, 13 figures, 2 tables

arXiv:2003.09638 [pdf, other]

An Uncoupled Training Architecture for Large Graph Learning

Authors: Dalong Yang, Chuan Chen, Youhao Zheng, Zibin Zheng, Shih-wei Liao

Abstract: Graph Convolutional Network (GCN) has been widely used in graph learning tasks. However, GCN-based models (GCNs) is an inherently coupled training framework repetitively conducting the complex neighboring aggregation, which leads to the limitation of flexibility in processing large-scale graph. With the depth of layers increases, the computational and memory cost of GCNs grow explosively due to th… ▽ More Graph Convolutional Network (GCN) has been widely used in graph learning tasks. However, GCN-based models (GCNs) is an inherently coupled training framework repetitively conducting the complex neighboring aggregation, which leads to the limitation of flexibility in processing large-scale graph. With the depth of layers increases, the computational and memory cost of GCNs grow explosively due to the recursive neighborhood expansion. To tackle these issues, we present Node2Grids, a flexible uncoupled training framework that leverages the independent mapped data for obtaining the embedding. Instead of directly processing the coupled nodes as GCNs, Node2Grids supports a more efficacious method in practice, map** the coupled graph data into the independent grid-like data which can be fed into the efficient Convolutional Neural Network (CNN). This simple but valid strategy significantly saves memory and computational resource while achieving comparable results with the leading GCN-based models. Specifically, by ranking each node's influence through degree, Node2Grids selects the most influential first-order as well as second-order neighbors with central node fusion information to construct the grid-like data. For further improving the efficiency of downstream tasks, a simple CNN-based neural network is employed to capture the significant information from the mapped grid-like data. Moreover, the grid-level attention mechanism is implemented, which enables implicitly specifying the different weights for neighboring nodes with different influences. In addition to the typical transductive and inductive learning tasks, we also verify our framework on million-scale graphs to demonstrate the superiority of the proposed Node2Grids model against the state-of-the-art GCN-based approaches. △ Less

Submitted 21 July, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

arXiv:2003.09615 [pdf, other]

DP-Net: Dynamic Programming Guided Deep Neural Network Compression

Authors: Dingcheng Yang, Wenjian Yu, Ao Zhou, Haoyuan Mu, Gary Yao, Xiaoyi Wang

Abstract: In this work, we propose an effective scheme (called DP-Net) for compressing the deep neural networks (DNNs). It includes a novel dynamic programming (DP) based algorithm to obtain the optimal solution of weight quantization and an optimization process to train a clustering-friendly DNN. Experiments showed that the DP-Net allows larger compression than the state-of-the-art counterparts while prese… ▽ More In this work, we propose an effective scheme (called DP-Net) for compressing the deep neural networks (DNNs). It includes a novel dynamic programming (DP) based algorithm to obtain the optimal solution of weight quantization and an optimization process to train a clustering-friendly DNN. Experiments showed that the DP-Net allows larger compression than the state-of-the-art counterparts while preserving accuracy. The largest 77X compression ratio on Wide ResNet is achieved by combining DP-Net with other compression techniques. Furthermore, the DP-Net is extended for compressing a robust DNN model with negligible accuracy loss. At last, a custom accelerator is designed on FPGA to speed up the inference computation with DP-Net. △ Less

Submitted 21 March, 2020; originally announced March 2020.

Comments: 7pages, 4 figures

arXiv:1912.08808 [pdf, other]

Bridging the Gap between Community and Node Representations: Graph Embedding via Community Detection

Authors: Artem Lutov, Dingqi Yang, Philippe Cudré-Mauroux

Abstract: Graph embedding has become a key component of many data mining and analysis systems. Current graph embedding approaches either sample a large number of node pairs from a graph to learn node embeddings via stochastic optimization or factorize a high-order proximity/adjacency matrix of the graph via computationally expensive matrix factorization techniques. These approaches typically require signifi… ▽ More Graph embedding has become a key component of many data mining and analysis systems. Current graph embedding approaches either sample a large number of node pairs from a graph to learn node embeddings via stochastic optimization or factorize a high-order proximity/adjacency matrix of the graph via computationally expensive matrix factorization techniques. These approaches typically require significant resources for the learning process and rely on multiple parameters, which limits their applicability in practice. Moreover, most of the existing graph embedding techniques operate effectively in one specific metric space only (e.g., the one produced with cosine similarity), do not preserve higher-order structural features of the input graph and cannot automatically determine a meaningful number of embedding dimensions. Typically, the produced embeddings are not easily interpretable, which complicates further analyses and limits their applicability. To address these issues, we propose DAOR, a highly efficient and parameter-free graph embedding technique producing metric space-robust, compact and interpretable embeddings without any manual tuning. Compared to a dozen state-of-the-art graph embedding algorithms, DAOR yields competitive results on both node classification (which benefits form high-order proximity) and link prediction (which relies on low-order proximity mostly). Unlike existing techniques, however, DAOR does not require any parameter tuning and improves the embeddings generation speed by several orders of magnitude. Our approach has hence the ambition to greatly simplify and speed up data analysis tasks involving graph representation learning. △ Less

Submitted 17 December, 2019; originally announced December 2019.

Comments: IEEE BigData'19, Special Session on Information Granulation in Data Science and Scalable Computing

MSC Class: 05C60 (Primary); 14E25; 30L05; 54C25; 57N35; 91C20 (Secondary); 05C85 (Secondary); 62G35 (Secondary); 91D30 (Secondary); 68T30 (Secondary) ACM Class: I.2.6; E.1; F.2.2; H.3.4

arXiv:1911.08004 [pdf, other]

Consistent recovery threshold of hidden nearest neighbor graphs

Authors: Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang

Abstract: Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distrib… ▽ More Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $ \liminf \frac{2α_n}{\log n}>1$; (2) almost exact recovery for $ 1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf \frac{kD(P_n||Q_n)}{\log n}>1$, where $α_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n||Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges. △ Less

Submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.02140 [pdf, other]

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Authors: Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tieyan Liu

Abstract: Distributional Reinforcement Learning (RL) differs from traditional RL in that, rather than the expectation of total returns, it estimates distributions and has achieved state-of-the-art performance on Atari Games. The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution. Existing di… ▽ More Distributional Reinforcement Learning (RL) differs from traditional RL in that, rather than the expectation of total returns, it estimates distributions and has achieved state-of-the-art performance on Atari Games. The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution. Existing distributional RL algorithms parameterize either the probability side or the return value side of the distribution function, leaving the other side uniformly fixed as in C51, QR-DQN or randomly sampled as in IQN. In this paper, we propose fully parameterized quantile function that parameterizes both the quantile fraction axis (i.e., the x-axis) and the value axis (i.e., y-axis) for distributional RL. Our algorithm contains a fraction proposal network that generates a discrete set of quantile fractions and a quantile value network that gives corresponding quantile values. The two networks are jointly trained to find the best approximation of the true distribution. Experiments on 55 Atari Games show that our algorithm significantly outperforms existing distributional RL algorithms and creates a new record for the Atari Learning Environment for non-distributed agents. △ Less

Submitted 2 August, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

Comments: NeurIPS 2019. Code at https://github.com/microsoft/FQF

arXiv:1909.11773 [pdf, other]

Rapid mixing of a Markov chain for an exponentially weighted aggregation estimator

Authors: David Pollard, Dana Yang

Abstract: The Metropolis-Hastings method is often used to construct a Markov chain with a given $π$ as its stationary distribution. The method works even if $π$ is known only up to an intractable constant of proportionality. Polynomial time convergence results for such chains (rapid mixing) are hard to obtain for high dimensional probability models where the size of the state space potentially grows exponen… ▽ More The Metropolis-Hastings method is often used to construct a Markov chain with a given $π$ as its stationary distribution. The method works even if $π$ is known only up to an intractable constant of proportionality. Polynomial time convergence results for such chains (rapid mixing) are hard to obtain for high dimensional probability models where the size of the state space potentially grows exponentially with the model dimension. In a Bayesian context, Yang, Wainwright, and Jordan (2016) (=YWJ) used the path method to prove rapid mixing for high dimensional linear models. This paper proposes a modification of the YWJ approach that simplifies the theoretical argument and improves the rate of convergence. The new approach is illustrated by an application to an exponentially weighted aggregation estimator. △ Less

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.09836 [pdf, other]

Optimal query complexity for private sequential learning against eavesdrop**

Authors: Jiaming Xu, Kuang Xu, Dana Yang

Abstract: We study the query complexity of a learner-private sequential learning problem, motivated by the privacy and security concerns due to eavesdrop** that arise in practical applications such as pricing and Federated Learning. A learner tries to estimate an unknown scalar value, by sequentially querying an external database and receiving binary responses; meanwhile, a third-party adversary observes… ▽ More We study the query complexity of a learner-private sequential learning problem, motivated by the privacy and security concerns due to eavesdrop** that arise in practical applications such as pricing and Federated Learning. A learner tries to estimate an unknown scalar value, by sequentially querying an external database and receiving binary responses; meanwhile, a third-party adversary observes the learner's queries but not the responses. The learner's goal is to design a querying strategy with the minimum number of queries (optimal query complexity) so that she can accurately estimate the true value, while the eavesdrop** adversary even with the complete knowledge of her querying strategy cannot. We develop new querying strategies and analytical techniques and use them to prove tight upper and lower bounds on the optimal query complexity. The bounds almost match across the entire parameter range, substantially improving upon existing results. We thus obtain a complete picture of the optimal query complexity as a function of the estimation accuracy and the desired levels of privacy. We also extend the results to sequential learning models in higher dimensions, and where the binary responses are noisy. Our analysis leverages a crucial insight into the nature of private learning problem, which suggests that the query trajectory of an optimal learner can be divided into distinct phases that focus on pure learning versus learning and obfuscation, respectively. △ Less

Submitted 16 August, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

arXiv:1907.08646 [pdf, other]

Fair quantile regression

Authors: Dana Yang, John Lafferty, David Pollard

Abstract: Quantile regression is a tool for learning conditional distributions. In this paper we study quantile regression in the setting where a protected attribute is unavailable when fitting the model. This can lead to "unfair'' quantile estimators for which the effective quantiles are very different for the subpopulations defined by the protected attribute. We propose a procedure for adjusting the estim… ▽ More Quantile regression is a tool for learning conditional distributions. In this paper we study quantile regression in the setting where a protected attribute is unavailable when fitting the model. This can lead to "unfair'' quantile estimators for which the effective quantiles are very different for the subpopulations defined by the protected attribute. We propose a procedure for adjusting the estimator on a heldout sample where the protected attribute is available. The main result of the paper is an empirical process analysis showing that the adjustment leads to a fair estimator for which the target quantiles are brought into balance, in a statistical sense that we call $\sqrt{n}$-fairness. We illustrate the ideas and adjustment procedure on a dataset of 200,000 live births, where the objective is to characterize the dependence of the birth weights of the babies on demographic attributes of the birth mother; the protected attribute is the mother's race. △ Less

Submitted 19 July, 2019; originally announced July 2019.

arXiv:1907.05418 [pdf, other]

Adversarial Objects Against LiDAR-Based Autonomous Driving Systems

Authors: Yulong Cao, Chaowei Xiao, Dawei Yang, **g Fang, Ruigang Yang, Mingyan Liu, Bo Li

Abstract: Deep neural networks (DNNs) are found to be vulnerable against adversarial examples, which are carefully crafted inputs with a small magnitude of perturbation aiming to induce arbitrarily incorrect predictions. Recent studies show that adversarial examples can pose a threat to real-world security-critical applications: a "physical adversarial Stop Sign" can be synthesized such that the autonomous… ▽ More Deep neural networks (DNNs) are found to be vulnerable against adversarial examples, which are carefully crafted inputs with a small magnitude of perturbation aiming to induce arbitrarily incorrect predictions. Recent studies show that adversarial examples can pose a threat to real-world security-critical applications: a "physical adversarial Stop Sign" can be synthesized such that the autonomous driving cars will misrecognize it as others (e.g., a speed limit sign). However, these image-space adversarial examples cannot easily alter 3D scans of widely equipped LiDAR or radar on autonomous vehicles. In this paper, we reveal the potential vulnerabilities of LiDAR-based autonomous driving detection systems, by proposing an optimization based approach LiDAR-Adv to generate adversarial objects that can evade the LiDAR-based detection system under various conditions. We first show the vulnerabilities using a blackbox evolution-based algorithm, and then explore how much a strong adversary can do, using our gradient-based approach LiDAR-Adv. We test the generated adversarial objects on the Baidu Apollo autonomous driving platform and show that such physical systems are indeed vulnerable to the proposed attacks. We also 3D-print our adversarial objects and perform physical experiments to illustrate that such vulnerability exists in the real world. Please find more visualizations and results on the anonymous website: https://sites.google.com/view/lidar-adv. △ Less

Submitted 11 July, 2019; originally announced July 2019.

arXiv:1907.00267 [pdf, other]

Learning to Generate Synthetic 3D Training Data through Hybrid Gradient

Authors: Dawei Yang, Jia Deng

Abstract: Synthetic images rendered by graphics engines are a promising source for training deep networks. However, it is challenging to ensure that they can help train a network to perform well on real images, because a graphics-based generation pipeline requires numerous design decisions such as the selection of 3D shapes and the placement of the camera. In this work, we propose a new method that optimize… ▽ More Synthetic images rendered by graphics engines are a promising source for training deep networks. However, it is challenging to ensure that they can help train a network to perform well on real images, because a graphics-based generation pipeline requires numerous design decisions such as the selection of 3D shapes and the placement of the camera. In this work, we propose a new method that optimizes the generation of 3D training data based on what we call "hybrid gradient". We parametrize the design decisions as a real vector, and combine the approximate gradient and the analytical gradient to obtain the hybrid gradient of the network performance with respect to this vector. We evaluate our approach on the task of estimating surface normal, depth or intrinsic decomposition from a single image. Experiments on standard benchmarks show that our approach can outperform the prior state of the art on optimizing the generation of 3D training data, particularly in terms of computational efficiency. △ Less

Submitted 25 April, 2020; v1 submitted 29 June, 2019; originally announced July 2019.

Comments: Accepted to CVPR 2020

arXiv:1906.08189 [pdf, other]

Reward Prediction Error as an Exploration Objective in Deep RL

Authors: Riley Simmons-Edler, Ben Eisner, Daniel Yang, Anthony Bisulco, Eric Mitchell, Sebastian Seung, Daniel Lee

Abstract: A major challenge in reinforcement learning is exploration, when local dithering methods such as epsilon-greedy sampling are insufficient to solve a given task. Many recent methods have proposed to intrinsically motivate an agent to seek novel states, driving the agent to discover improved reward. However, while state-novelty exploration methods are suitable for tasks where novel observations corr… ▽ More A major challenge in reinforcement learning is exploration, when local dithering methods such as epsilon-greedy sampling are insufficient to solve a given task. Many recent methods have proposed to intrinsically motivate an agent to seek novel states, driving the agent to discover improved reward. However, while state-novelty exploration methods are suitable for tasks where novel observations correlate well with improved reward, they may not explore more efficiently than epsilon-greedy approaches in environments where the two are not well-correlated. In this paper, we distinguish between exploration tasks in which seeking novel states aids in finding new reward, and those where it does not, such as goal-conditioned tasks and esca** local reward maxima. We propose a new exploration objective, maximizing the reward prediction error (RPE) of a value function trained to predict extrinsic reward. We then propose a deep reinforcement learning method, QXplore, which exploits the temporal difference error of a Q-function to solve hard exploration tasks in high-dimensional MDPs. We demonstrate the exploration behavior of QXplore on several OpenAI Gym MuJoCo tasks and Atari games and observe that QXplore is comparable to or better than a baseline state-novelty method in all cases, outperforming the baseline on tasks where state novelty is not well-correlated with improved reward. △ Less

Submitted 13 January, 2021; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: Published at IJCAI 2020, camera-ready version

arXiv:1905.12517 [pdf, ps, other]

The cost-free nature of optimally tuning Tikhonov regularizers and other ordered smoothers

Authors: Pierre C Bellec, Dana Yang

Abstract: We consider the problem of selecting the best estimator among a family of Tikhonov regularized estimators, or, alternatively, to select a linear combination of these regularizers that is as good as the best regularizer in the family. Our theory reveals that if the Tikhonov regularizers share the same penalty matrix with different tuning parameters, a convex procedure based on $Q$-aggregation achie… ▽ More We consider the problem of selecting the best estimator among a family of Tikhonov regularized estimators, or, alternatively, to select a linear combination of these regularizers that is as good as the best regularizer in the family. Our theory reveals that if the Tikhonov regularizers share the same penalty matrix with different tuning parameters, a convex procedure based on $Q$-aggregation achieves the mean square error of the best estimator, up to a small error term no larger than $Cσ^2$, where $σ^2$ is the noise level and $C>0$ is an absolute constant. Remarkably, the error term does not depend on the penalty matrix or the number of estimators as long as they share the same penalty matrix, i.e., it applies to any grid of tuning parameters, no matter how large the cardinality of the grid is. This reveals the surprising "cost-free" nature of optimally tuning Tikhonov regularizers, in striking contrast with the existing literature on aggregation of estimators where one typically has to pay a cost of $σ^2\log(M)$ where $M$ is the number of estimators in the family. The result holds, more generally, for any family of ordered linear smoothers. This encompasses Ridge regression as well as Principal Component Regression. The result is extended to the problem of tuning Tikhonov regularizers with different penalty matrices. △ Less

Submitted 29 May, 2019; originally announced May 2019.

arXiv:1905.07530 [pdf, other]

Factor Models for High-Dimensional Tensor Time Series

Authors: Rong Chen, Dan Yang, Cun-hui Zhang

Abstract: Large tensor (multi-dimensional array) data are now routinely collected in a wide range of applications, due to modern data collection capabilities. Often such observations are taken over time, forming tensor time series. In this paper we present a factor model approach for analyzing high-dimensional dynamic tensor time series and multi-category dynamic transport networks. Two estimation procedure… ▽ More Large tensor (multi-dimensional array) data are now routinely collected in a wide range of applications, due to modern data collection capabilities. Often such observations are taken over time, forming tensor time series. In this paper we present a factor model approach for analyzing high-dimensional dynamic tensor time series and multi-category dynamic transport networks. Two estimation procedures along with their theoretical properties and simulation results are presented. Two applications are used to illustrate the model and its interpretations. △ Less

Submitted 18 May, 2020; v1 submitted 17 May, 2019; originally announced May 2019.

arXiv:1902.03515 [pdf, other]

Multi-Domain Translation by Learning Uncoupled Autoencoders

Authors: Karren D. Yang, Caroline Uhler

Abstract: Multi-domain translation seeks to learn a probabilistic coupling between marginal distributions that reflects the correspondence between different domains. We assume that data from different domains are generated from a shared latent representation based on a structural equation model. Under this assumption, we show that the problem of computing a probabilistic coupling between marginals is equiva… ▽ More Multi-domain translation seeks to learn a probabilistic coupling between marginal distributions that reflects the correspondence between different domains. We assume that data from different domains are generated from a shared latent representation based on a structural equation model. Under this assumption, we show that the problem of computing a probabilistic coupling between marginals is equivalent to learning multiple uncoupled autoencoders that embed to a given shared latent distribution. In addition, we propose a new framework and algorithm for multi-domain translation based on learning the shared latent distribution and training autoencoders under distributional constraints. A key practical advantage of our framework is that new autoencoders (i.e., new domains) can be added sequentially to the model without retraining on the other domains, which we demonstrate experimentally on image as well as genomics datasets. △ Less

Submitted 9 February, 2019; originally announced February 2019.

MSC Class: 68T01

arXiv:1901.09024 [pdf, other]

Diversity-Sensitive Conditional Generative Adversarial Networks

Authors: Dingdong Yang, Seunghoon Hong, Yunseok Jang, Tianchen Zhao, Honglak Lee

Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To… ▽ More We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To address such issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task. △ Less

Submitted 25 January, 2019; originally announced January 2019.

Comments: Accepted as a conference paper at ICLR 2019

arXiv:1901.05850 [pdf, other]

Fast Deep Learning for Automatic Modulation Classification

Authors: Sharan Ramjee, Shengtai Ju, Diyu Yang, Xiaoyu Liu, Aly El Gamal, Yonina C. Eldar

Abstract: In this work, we investigate the feasibility and effectiveness of employing deep learning algorithms for automatic recognition of the modulation type of received wireless communication signals from subsampled data. Recent work considered a GNU radio-based data set that mimics the imperfections in a real wireless channel and uses 10 different modulation types. A Convolutional Neural Network (CNN) a… ▽ More In this work, we investigate the feasibility and effectiveness of employing deep learning algorithms for automatic recognition of the modulation type of received wireless communication signals from subsampled data. Recent work considered a GNU radio-based data set that mimics the imperfections in a real wireless channel and uses 10 different modulation types. A Convolutional Neural Network (CNN) architecture was then developed and shown to achieve performance that exceeds that of expert-based approaches. Here, we continue this line of work and investigate deep neural network architectures that deliver high classification accuracy. We identify three architectures - namely, a Convolutional Long Short-term Deep Neural Network (CLDNN), a Long Short-Term Memory neural network (LSTM), and a deep Residual Network (ResNet) - that lead to typical classification accuracy values around 90% at high SNR. We then study algorithms to reduce the training time by minimizing the size of the training data set, while incurring a minimal loss in classification accuracy. To this end, we demonstrate the performance of Principal Component Analysis in significantly reducing the training time, while maintaining good performance at low SNR. We also investigate subsampling techniques that further reduce the training time, and pave the way for online classification at high SNR. Finally, we identify representative SNR values for training each of the candidate architectures, and consequently, realize drastic reductions of the training time, with negligible loss in classification accuracy. △ Less

Submitted 15 January, 2019; originally announced January 2019.

Comments: 29 pages, 30 figures, submitted to Journal on Selected Areas in Communications - Special Issue on Machine Learning in Wireless Communications

arXiv:1901.04295 [pdf, other]

On-Demand Video Dispatch Networks: A Scalable End-to-End Learning Approach

Authors: Damao Yang, Sihan Peng, He Huang, Hongliang Xue

Abstract: We design a dispatch system to improve the peak service quality of video on demand (VOD). Our system predicts the hot videos during the peak hours of the next day based on the historical requests, and dispatches to the content delivery networks (CDNs) at the previous off-peak time. In order to scale to billions of videos, we build the system with two neural networks, one for video clustering and t… ▽ More We design a dispatch system to improve the peak service quality of video on demand (VOD). Our system predicts the hot videos during the peak hours of the next day based on the historical requests, and dispatches to the content delivery networks (CDNs) at the previous off-peak time. In order to scale to billions of videos, we build the system with two neural networks, one for video clustering and the other for dispatch policy develo**. The clustering network employs autoencoder layers and reduces the video number to a fixed value. The policy network employs fully connected layers and ranks the clustered videos with dispatch probabilities. The two networks are coupled with weight-sharing temporal layers, which analyze the video request sequences with convolutional and recurrent modules. Therefore, the clustering and dispatch tasks are trained in an end-to-end mechanism. The real-world results show that our approach achieves an average prediction accuracy of 17%, compared with 3% from the present baseline method, for the same amount of dispatches. △ Less

Submitted 25 December, 2018; originally announced January 2019.

Comments: 12 pages, 11 figures

arXiv:1812.08916 [pdf, other]

Autoregressive Models for Matrix-Valued Time Series

Authors: Rong Chen, Han Xiao, Dan Yang

Abstract: In finance, economics and many other fields, observations in a matrix form are often generated over time. For example, a set of key economic indicators are regularly reported in different countries every quarter. The observations at each quarter neatly form a matrix and are observed over many consecutive quarters. Dynamic transport networks with observations generated on the edges can be formed as… ▽ More In finance, economics and many other fields, observations in a matrix form are often generated over time. For example, a set of key economic indicators are regularly reported in different countries every quarter. The observations at each quarter neatly form a matrix and are observed over many consecutive quarters. Dynamic transport networks with observations generated on the edges can be formed as a matrix observed over time. Although it is natural to turn the matrix observations into a long vector, and then use the standard vector time series models for analysis, it is often the case that the columns and rows of the matrix represent different types of structures that are closely interplayed. In this paper we follow the autoregressive structure for modeling time series and propose a novel matrix autoregressive model in a bilinear form that maintains and utilizes the matrix structure to achieve a greater dimensional reduction, as well as more interpretable results. Probabilistic properties of the models are investigated. Estimation procedures with their theoretical properties are presented and demonstrated with simulated and real examples. △ Less

Submitted 24 July, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

MSC Class: 62M10; 62H99

arXiv:1810.11447 [pdf, other]

Scalable Unbalanced Optimal Transport using Generative Adversarial Networks

Authors: Karren D. Yang, Caroline Uhler

Abstract: Generative adversarial networks (GANs) are an expressive class of neural generative models with tremendous success in modeling high-dimensional continuous measures. In this paper, we present a scalable method for unbalanced optimal transport (OT) based on the generative-adversarial framework. We formulate unbalanced OT as a problem of simultaneously learning a transport map and a scaling factor th… ▽ More Generative adversarial networks (GANs) are an expressive class of neural generative models with tremendous success in modeling high-dimensional continuous measures. In this paper, we present a scalable method for unbalanced optimal transport (OT) based on the generative-adversarial framework. We formulate unbalanced OT as a problem of simultaneously learning a transport map and a scaling factor that push a source measure to a target measure in a cost-optimal manner. In addition, we propose an algorithm for solving this problem based on stochastic alternating gradient updates, similar in practice to GANs. We also provide theoretical justification for this formulation, showing that it is closely related to an existing static formulation by Liero et al. (2018), and perform numerical experiments demonstrating how this methodology can be applied to population modeling. △ Less

Submitted 3 August, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

MSC Class: 68T99

arXiv:1810.05731 [pdf, other]

Image Super-Resolution Using VDSR-ResNeXt and SRCGAN

Authors: Saifuddin Hitawala, Yao Li, Xian Wang, Dongyang Yang

Abstract: Over the past decade, many Super Resolution techniques have been developed using deep learning. Among those, generative adversarial networks (GAN) and very deep convolutional networks (VDSR) have shown promising results in terms of HR image quality and computational speed. In this paper, we propose two approaches based on these two algorithms: VDSR-ResNeXt, which is a deep multi-branch convolution… ▽ More Over the past decade, many Super Resolution techniques have been developed using deep learning. Among those, generative adversarial networks (GAN) and very deep convolutional networks (VDSR) have shown promising results in terms of HR image quality and computational speed. In this paper, we propose two approaches based on these two algorithms: VDSR-ResNeXt, which is a deep multi-branch convolutional network inspired by VDSR and ResNeXt; and SRCGAN, which is a conditional GAN that explicitly passes class labels as input to the GAN. The two methods were implemented on common SR benchmark datasets for both quantitative and qualitative assessment. △ Less

Submitted 10 October, 2018; originally announced October 2018.

arXiv:1810.05206 [pdf, other]

MeshAdv: Adversarial Meshes for Visual Recognition

Authors: Chaowei Xiao, Dawei Yang, Bo Li, Jia Deng, Mingyan Liu

Abstract: Highly expressive models such as deep neural networks (DNNs) have been widely applied to various applications. However, recent studies show that DNNs are vulnerable to adversarial examples, which are carefully crafted inputs aiming to mislead the predictions. Currently, the majority of these studies have focused on perturbation added to image pixels, while such manipulation is not physically reali… ▽ More Highly expressive models such as deep neural networks (DNNs) have been widely applied to various applications. However, recent studies show that DNNs are vulnerable to adversarial examples, which are carefully crafted inputs aiming to mislead the predictions. Currently, the majority of these studies have focused on perturbation added to image pixels, while such manipulation is not physically realistic. Some works have tried to overcome this limitation by attaching printable 2D patches or painting patterns onto surfaces, but can be potentially defended because 3D shape features are intact. In this paper, we propose meshAdv to generate "adversarial 3D meshes" from objects that have rich shape features but minimal textural variation. To manipulate the shape or texture of the objects, we make use of a differentiable renderer to compute accurate shading on the shape and propagate the gradient. Extensive experiments show that the generated 3D meshes are effective in attacking both classifiers and object detectors. We evaluate the attack under different viewpoints. In addition, we design a pipeline to perform black-box attack on a photorealistic renderer with unknown rendering parameters. △ Less

Submitted 29 June, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: Published in IEEE CVPR2019

arXiv:1807.01635 [pdf, other]

Randomization Inference for Peer Effects

Authors: Xinran Li, Peng Ding, Qian Lin, Dawei Yang, Jun S. Liu

Abstract: Many previous causal inference studies require no interference, that is, the potential outcomes of a unit do not depend on the treatments of other units. However, this no-interference assumption becomes unreasonable when a unit interacts with other units in the same group or cluster. In a motivating application, a university in China admits students through two channels: the college entrance exam… ▽ More Many previous causal inference studies require no interference, that is, the potential outcomes of a unit do not depend on the treatments of other units. However, this no-interference assumption becomes unreasonable when a unit interacts with other units in the same group or cluster. In a motivating application, a university in China admits students through two channels: the college entrance exam (also known as Gaokao) and recommendation (often based on Olympiads in various subjects). The university randomly assigns students to dorms, each of which hosts four students. Students within the same dorm live together and have extensive interactions. Therefore, it is likely that peer effects exist and the no-interference assumption does not hold. It is important to understand peer effects, because they give useful guidance for future roommate assignment to improve the performance of students. We define peer effects using potential outcomes. We then propose a randomization-based inference framework to study peer effects with arbitrary numbers of peers and peer types. Our inferential procedure does not assume any parametric model on the outcome distribution. Our analysis gives useful practical guidance for policy makers of the university in China. △ Less

Submitted 20 December, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

Showing 1–50 of 59 results for author: Yang, D