
Showing 1–50 of 59 results for author: Yu, F

Searching in archive stat.
  1. arXiv:2406.17968 [pdf, other]

    cs.IR cs.AI cs.LG stat.ML

    Efficient Document Ranking with Learnable Late Interactions

    Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2404.10004 [pdf]

    cs.LG physics.soc-ph stat.AP

    A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

    Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

    Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 20 pages, 9 figures

  3. arXiv:2403.18658 [pdf, ps, other]

    math.ST stat.ML

    Theoretical Guarantees for the Subspace-Constrained Tyler's Estimator

    Authors: Gilad Lerman, Feng Yu, Teng Zhang

    Abstract: This work analyzes the subspace-constrained Tyler's estimator (STE) designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. It assumes a weak inlier-outlier model and allows the fraction of inliers to be smaller than a fraction that leads to computational hardness of the robust subspace recovery problem. It shows that in this setting, if the…

    Submitted 12 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  4. arXiv:2403.16260 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

    Authors: Chenhui Xu, Fuxun Yu, Zirui Xu, Nathan Inkawhich, Xiang Chen

    Abstract: Recent research underscores the pivotal role of the Out-of-Distribution (OOD) feature representation field scale in determining the efficacy of models in OOD detection. Consequently, the adoption of model ensembles has emerged as a prominent strategy to augment this feature representation field, capitalizing on anticipated model diversity. However, our introduction of novel qualitative and quant…

    Submitted 24 March, 2024; originally announced March 2024.

  5. arXiv:2402.11427 [pdf, other]

    cs.LG cs.AI stat.ML

    OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations

    Authors: Yao Shu, Jiongfeng Fang, Ying Tiffany He, Fei Richard Yu

    Abstract: First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately paralle…

    Submitted 17 February, 2024; originally announced February 2024.

  6. Robust Sufficient Dimension Reduction via $α$-Distance Covariance

    Authors: Hsin-Hsiung Huang, Feng Yu, Teng Zhang

    Abstract: We introduce a novel sufficient dimension-reduction (SDR) method which is robust against outliers using $α$-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is effectively estimated and model-free advantage without estimating link function based on the projection on the Stiefel manifold. We establish the convergence prop…

    Submitted 4 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  7. arXiv:2401.02544 [pdf, other]

    cs.LG stat.CO

    Hyperparameter Estimation for Sparse Bayesian Learning Models

    Authors: Feng Yu, Lixin Shen, Guohui Song

    Abstract: Sparse Bayesian Learning (SBL) models are extensively used in signal processing and machine learning for promoting sparsity through hierarchical priors. The hyperparameters in SBL models are crucial for the model's performance, but they are often difficult to estimate due to the non-convexity and the high-dimensionality of the associated objective function. This paper presents a comprehensive fram…

    Submitted 4 January, 2024; originally announced January 2024.

    MSC Class: 62F15; 65K10; 65F22

  8. arXiv:2307.12862 [pdf, other]

    cs.SI cs.LG stat.CO stat.ML

    Stochastic Step-wise Feature Selection for Exponential Random Graph Models (ERGMs)

    Authors: Helal El-Zaatari, Fei Yu, Michael R Kosorok

    Abstract: Statistical analysis of social networks provides valuable insights into complex network interactions across various scientific disciplines. However, accurate modeling of networks remains challenging due to the heavy computational burden and the need to account for observed network dependencies. Exponential Random Graph Models (ERGMs) have emerged as a promising technique used in social network mod…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 23 pages, 6 tables and 18 figures

  9. arXiv:2305.16590 [pdf, other]

    cs.SI cs.CC cs.MA math.PR stat.AP

    Seeding with Differentially Private Network Information

    Authors: M. Amin Rahimian, Fang-Yi Yu, Carlos Hurtado

    Abstract: When designing interventions in public health, development, and education, decision makers rely on social network data to target a small number of people, capitalizing on peer effects and social contagion to bring about the most welfare benefits to the population. Developing new methods that are privacy-preserving for network data collection and targeted interventions is critical for designing sus…

    Submitted 1 February, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Preliminary version in AAMAS 2023: https://dl.acm.org/doi/10.5555/3545946.3599081 -- Code and data: https://github.com/aminrahimian/dp-inf-max

    MSC Class: 91D30; 05C80

  10. arXiv:2212.00936 [pdf, other]

    cs.CR stat.AP

    Integer Subspace Differential Privacy

    Authors: Prathamesh Dharangutte, Jie Gao, Ruobin Gong, Fang-Yi Yu

    Abstract: We propose new differential privacy solutions for when external \emph{invariants} and \emph{integer} constraints are simultaneously enforced on the data product. These requirements arise in real world applications of private data curation, including the public release of the 2020 U.S. Decennial Census. They pose a great challenge to the production of provably private data products with adequate st…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023

  11. arXiv:2210.06313 [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

    Authors: Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

    Abstract: This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse. By activation map we refer to the intermediate output of the multi-layer perceptrons (MLPs) after a ReLU activation function, and by sparse we mean that on average very few entries (e.g., 3.0% for T5-Base and 6.3% for ViT-B16) are nonzero for each input to MLP…

    Submitted 9 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: A short version was presented at ICLR 2023. Previous title: Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

  12. arXiv:2205.07106 [pdf, ps, other]

    stat.ML cs.LG

    Robust Regularized Low-Rank Matrix Models for Regression and Classification

    Authors: Hsin-Hsiung Huang, Feng Yu, Xing Fan, Teng Zhang

    Abstract: While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regulariz…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 26 pages, 7 figures

    MSC Class: 62J12

  13. arXiv:2109.02371 [pdf, other]

    stat.ME math.NA stat.CO

    Unbiased Estimation of the Hessian for Partially Observed Diffusions

    Authors: Neil K. Chada, Ajay Jasra, Fangyuan Yu

    Abstract: In this article we consider the development of unbiased estimators of the Hessian, of the log-likelihood function with respect to parameters, for partially observed diffusion processes. These processes arise in numerous applications, where such diffusions require derivative information, either through the Jacobian or Hessian matrix. As time-discretizations of diffusions induce a bias, we provide a…

    Submitted 6 September, 2021; originally announced September 2021.

  14. arXiv:2108.11527 [pdf, other]

    cs.CR stat.AP

    Subspace Differential Privacy

    Authors: Jie Gao, Ruobin Gong, Fang-Yi Yu

    Abstract: Many data applications have certain invariant constraints due to practical needs. Data curators who employ differential privacy need to respect such constraints on the sanitized data product as a primary utility requirement. Invariants challenge the formulation, implementation, and interpretation of privacy guarantees. We propose subspace differential privacy, to honestly characterize the depend…

    Submitted 29 April, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: 25 pages, 3 figures; Published in AAAI'22

  15. arXiv:2107.01913 [pdf, other]

    stat.CO math.NA

    Randomized multilevel Monte Carlo for embarrassingly parallel inference

    Authors: Ajay Jasra, Kody J. H. Law, Alexander Tarakanov, Fangyuan Yu

    Abstract: This position paper summarizes a recently developed research program focused on inference in the context of data centric science and engineering applications, and forecasts its trajectory forward over the next decade. Often one endeavours in this context to learn complex systems in order to make more informed predictions and high stakes decisions under uncertainty. Some key challenges which must b…

    Submitted 3 December, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

  16. arXiv:2105.05736 [pdf, other]

    cs.LG stat.ML

    Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

    Abstract: Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off pe…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: To appear in ICML 2021

  17. arXiv:2102.10226 [pdf, ps, other]

    stat.ML cs.LG math.ST

    ALMA: Alternating Minimization Algorithm for Clustering Mixture Multilayer Network

    Authors: Xing Fan, Marianna Pensky, Feng Yu, Teng Zhang

    Abstract: The paper considers a Mixture Multilayer Stochastic Block Model (MMLSBM), where layers can be partitioned into groups of similar networks, and networks in each group are equipped with a distinct Stochastic Block Model. The goal is to partition the multilayer network into clusters of similar layers, and to identify communities in those layers. Jing et al. (2020) introduced the MMLSBM and developed…

    Submitted 12 October, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

  18. arXiv:2011.10741 [pdf, other]

    cs.LG math.OC stat.ML

    A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

    Authors: Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu

    Abstract: Second-order optimization methods have the ability to accelerate convergence by modifying the gradient through the curvature matrix. There have been many attempts to use second-order optimization methods for training deep neural networks. Inspired by diagonal approximations and factored approximations such as Kronecker-Factored Approximate Curvature (KFAC), we propose a new approximation to the Fi…

    Submitted 21 November, 2020; originally announced November 2020.

  19. arXiv:2011.04342 [pdf, other]

    math.NA stat.CO

    Multilevel Ensemble Kalman-Bucy Filters

    Authors: Neil K. Chada, Ajay Jasra, Fangyuan Yu

    Abstract: In this article we consider the linear filtering problem in continuous-time. We develop and apply multilevel Monte Carlo (MLMC) strategies for ensemble Kalman-Bucy filters (EnKBFs). These filters can be viewed as approximations of conditional McKean-Vlasov-type diffusion processes. They are also interpreted as the continuous-time analogue of the \textit{ensemble Kalman filter}, which has proven to…

    Submitted 5 April, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

  20. arXiv:2009.06182 [pdf, other]

    stat.ML cs.LG

    Density Estimation via Bayesian Inference Engines

    Authors: M. P. Wand, J. C. F. Yu

    Abstract: We explain how effective automatic probability density function estimates can be constructed using contemporary Bayesian inference engines such as those based on no-U-turn sampling and expectation propagation. Extensive simulation studies demonstrate that the proposed density estimates have excellent comparative performance and scale well to very large sample sizes due to a binning strategy. Moreo…

    Submitted 26 September, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

  21. arXiv:2009.01674 [pdf, other]

    cs.LG cs.SI stat.ML

    CAGNN: Cluster-Aware Graph Neural Networks for Unsupervised Graph Representation Learning

    Authors: Yanqiao Zhu, Yichen Xu, Feng Yu, Shu Wu, Liang Wang

    Abstract: Unsupervised graph representation learning aims to learn low-dimensional node embeddings without supervision while preserving graph topological structures and node attributive features. Previous graph neural networks (GNN) require a large number of labeled nodes, which may not be accessible in real-world graph data. In this paper, we present a novel cluster-aware graph neural network (CAGNN) model…

    Submitted 3 September, 2020; originally announced September 2020.

    Comments: 21 pages, in submission to ACM TIST

  22. arXiv:2008.06767 [pdf, other]

    cs.LG stat.ML

    Heterogeneous Federated Learning

    Authors: Fuxun Yu, Weishan Zhang, Zhuwei Qin, Zirui Xu, Di Wang, Chenchen Liu, Zhi Tian, Xiang Chen

    Abstract: Federated learning learns from scattered data by fusing collaborative models from local nodes. However, due to chaotic information distribution, the model fusion may suffer from structural misalignment with regard to unmatched parameters. In this work, we propose a novel federated learning framework to resolve this issue by establishing a firm structure-information alignment across collaborative m…

    Submitted 19 March, 2022; v1 submitted 15 August, 2020; originally announced August 2020.

    Comments: Full version [Fed2: Feature-Aligned Federated Learning] accepted in KDD'2021

  23. arXiv:2007.13660 [pdf, other]

    cs.LG cs.CR cs.DS cs.IT stat.ML

    Learning discrete distributions: user vs item-level privacy

    Authors: Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

    Abstract: Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy beco…

    Submitted 11 January, 2021; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020, 38 pages

  24. arXiv:2007.12865 [pdf, other]

    cs.LG cs.IR stat.ML

    Self-supervised Learning for Large-scale Item Recommendations

    Authors: Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger

    Abstract: Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the…

    Submitted 24 February, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

  25. arXiv:2006.04131 [pdf, other]

    cs.LG stat.ML

    Deep Graph Contrastive Representation Learning

    Authors: Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Graph representation learning nowadays becomes fundamental in analyzing graph-structured data. Inspired by recent success of contrastive methods, in this paper, we propose a novel framework for unsupervised graph representation learning by leveraging a contrastive objective at the node level. Specifically, we generate two graph views by corruption and learn node representations by maximizing the a…

    Submitted 13 July, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: Work in progress; updated experiments

  26. arXiv:2004.10915 [pdf, other]

    cs.LG stat.ML

    Doubly-stochastic mining for heterogeneous retrieval

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example.…

    Submitted 22 April, 2020; originally announced April 2020.

  27. arXiv:2004.10856 [pdf, other]

    cs.DC cs.LG stat.ML

    TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism

    Authors: Zhenkun Cai, Kaihao Ma, Xiao Yan, Yidi Wu, Yuzhen Huang, James Cheng, Teng Su, Fan Yu

    Abstract: A good parallelization strategy can significantly improve the efficiency or reduce the cost for the distributed training of deep neural networks (DNNs). Recently, several methods have been proposed to find efficient parallelization strategies but they all optimize a single objective (e.g., execution time, memory consumption) and produce only one strategy. We propose FT, an efficient algorithm that…

    Submitted 15 April, 2020; originally announced April 2020.

  28. arXiv:2004.10342 [pdf, ps, other]

    cs.LG stat.ML

    Federated Learning with Only Positive Labels

    Authors: Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class. As a result, during each federated learning round, the users need to locally update the classifier without having access to the features and the model parameters for the negative classes. Thus, naively employing conventional decentra…

    Submitted 21 April, 2020; originally announced April 2020.

  29. arXiv:2003.07577 [pdf, other]

    cs.LG stat.ML

    Efficient Bitwidth Search for Practical Mixed Precision Neural Network

    Authors: Yuhang Li, Wei Wang, Haoli Bai, Ruihao Gong, Xin Dong, Fengwei Yu

    Abstract: Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to improve the overall performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for weights and activations of each layer efficiently. Mean…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: 21 pages, 7 figures

  30. arXiv:2002.03932 [pdf, other]

    cs.LG cs.CL cs.IR stat.ML

    Pre-training Tasks for Embedding-based Large-scale Retrieval

    Authors: Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

    Abstract: We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus. This problem is often solved in two steps. The retrieval phase first reduces the solution space, returning a subset of candidate documents. The scoring phase then re-ranks the documents. Criticall…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: Accepted by ICLR 2020

  31. arXiv:2002.03747 [pdf, other]

    math.NA stat.CO

    Unbiased Filtering of a Class of Partially Observed Diffusions

    Authors: Ajay Jasra, Kody Law, Fangyuan Yu

    Abstract: In this article we consider a Monte Carlo-based method to filter partially observed diffusions observed at regular and discrete times. Given access only to Euler discretizations of the diffusion process, we present a new procedure which can return online estimates of the filtering distribution with no discretization bias and finite variance. Our approach is based upon a novel double application of…

    Submitted 11 February, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  32. arXiv:1912.04977 [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

  33. arXiv:1911.05466 [pdf, other]

    cs.SI cs.LG stat.ML

    Attentive Geo-Social Group Recommendation

    Authors: Fei Yu, Feiyi Fan, Shouxu Jiang, Kaiping Zheng

    Abstract: Social activities play an important role in people's daily life since they interact. For recommendations based on social activities, it is vital to have not only the activity information but also individuals' social relations. Thanks to the geo-social networks and widespread use of location-aware mobile devices, massive geo-social data is now readily available for exploitation by the recommendatio…

    Submitted 14 November, 2019; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: 12 pages, 7 figures

    MSC Class: 68U35

    ACM Class: H.3.3

  34. arXiv:1911.00251 [pdf, ps, other]

    cs.LG stat.ML

    Robust Federated Learning with Noisy Communication

    Authors: Fan Ang, Li Chen, Nan Zhao, Yunfei Chen, Weidong Wang, F. Richard Yu

    Abstract: Federated learning is a communication-efficient training process that alternates between local training at the edge devices and averaging the updated local model at the central server. Nevertheless, it is impractical to achieve a perfect acquisition of the local models in wireless communication due to noise, which also brings serious effects on federated learning. To tackle this challenge, we prop…

    Submitted 1 November, 2019; originally announced November 2019.

  35. arXiv:1908.07643 [pdf, other]

    cs.LG cs.CR stat.ML

    AdaCliP: Adaptive Clipping for Private SGD

    Authors: Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Privacy preserving machine learning algorithms are crucial for learning models over user data to protect sensitive information. Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed. At each step, these algorithms modify the gradients and add noise proportional to the sensitivity of the modified gradients. Und…

    Submitted 23 October, 2019; v1 submitted 20 August, 2019; originally announced August 2019.

  36. arXiv:1908.02370 [pdf, other]

    math.OC stat.CO

    An Algorithm for Graph-Fused Lasso Based on Graph Decomposition

    Authors: Feng Yu, Yi Yang, Teng Zhang

    Abstract: This work proposes a new algorithm for solving the graph-fused lasso (GFL), a method for parameter estimation that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. The proposed method applies the alternating direction method of multipliers (ADMM) algorithm and is based on the decomposition of the objective function into two components. W…

    Submitted 6 August, 2019; originally announced August 2019.

  37. arXiv:1907.10747 [pdf, other]

    cs.LG stat.ML

    Sampled Softmax with Random Fourier Features

    Authors: Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

    Abstract: The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased esti…

    Submitted 31 December, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

    Comments: In NeurIPS 2019

  38. arXiv:1906.08856 [pdf, ps, other]

    cs.NE cs.LG stat.ML

    Learning Longer-term Dependencies via Grouped Distributor Unit

    Authors: Wei Luo, Feng Yu

    Abstract: Learning long-term dependencies still remains difficult for recurrent neural networks (RNNs) despite their success in sequence modeling recently. In this paper, we propose a novel gated RNN structure, which contains only one gate. Hidden states in the proposed grouped distributor unit (GDU) are partitioned into groups. For each group, the proportion of memory to be overwritten in each state transi…

    Submitted 28 April, 2019; originally announced June 2019.

  39. arXiv:1905.08790 [pdf, other]

    cs.CR cs.CV cs.LG stat.ML

    DoPa: A Comprehensive CNN Detection Methodology against Physical Adversarial Attacks

    Authors: Zirui Xu, Fuxun Yu, Xiang Chen

    Abstract: Recently, Convolutional Neural Networks (CNNs) demonstrate a considerable vulnerability to adversarial attacks, which can be easily misled by adversarial perturbations. With more aggressive methods proposed, adversarial attacks can be also applied to the physical world, causing practical issues to various CNN powered applications. To secure CNNs, adversarial attack detection is considered as the m…

    Submitted 28 August, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: 5 pages, 3 figures

  40. arXiv:1905.04270 [pdf, other]

    cs.LG cs.CV stat.ML

    Interpreting and Evaluating Neural Network Robustness

    Authors: Fuxun Yu, Zhuwei Qin, Chenchen Liu, Liang Zhao, Yanzhi Wang, Xiang Chen

    Abstract: Recently, adversarial deception becomes one of the most considerable threats to deep neural networks. However, compared to extensive research in new designs of various adversarial attacks and defenses, the neural networks' intrinsic robustness property is still lack of thorough investigation. This work aims to qualitatively interpret the adversarial attack and defense mechanism through loss visual…

    Submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted in IJCAI'19

  41. arXiv:1810.07378 [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Progressive Weight Pruning of Deep Neural Networks using ADMM

    Authors: Shaokai Ye, Tianyun Zhang, Kaiqi Zhang, Jiayu Li, Kaidi Xu, Yunfei Yang, Fuxun Yu, Jian Tang, Makan Fardad, Sijia Liu, Xiang Chen, Xue Lin, Yanzhi Wang

    Abstract: Deep neural networks (DNNs) although achieving human-level performance in many domains, have very large model size that hinders their broader applications on edge computing devices. Extensive research work have been conducted on DNN model compression or pruning. However, most of the previous work took heuristic approaches. This work proposes a progressive weight pruning approach based on ADMM (Alt…

    Submitted 4 November, 2018; v1 submitted 16 October, 2018; originally announced October 2018.

  42. arXiv:1810.07322 [pdf, other]

    cs.LG cs.CV stat.ML

    Functionality-Oriented Convolutional Filter Pruning

    Authors: Zhuwei Qin, Fuxun Yu, Chenchen Liu, Xiang Chen

    Abstract: The sophisticated structure of Convolutional Neural Network (CNN) allows for outstanding performance, but at the cost of intensive computation. As significant redundancies inevitably present in such a structure, many works have been proposed to prune the convolutional filters for computation cost reduction. Although extremely effective, most works are based only on quantitative characteristics of…

    Submitted 11 September, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

  43. arXiv:1810.07076 [pdf, ps, other]

    cs.LG stat.ML

    Stochastic Negative Mining for Learning with Large Output Spaces

    Authors: Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

    Abstract: We consider the problem of retrieving the most relevant labels for a given input when the size of the output space is very large. Retrieval methods are modeled as set-valued classifiers which output a small set of classes for each input, and a mistake is made if the label is not in the output set. Despite its practical importance, a statistically principled, yet practical solution to this problem…

    Submitted 16 October, 2018; originally announced October 2018.

  44. arXiv:1810.00144  [pdf, other]

    cs.LG cs.AI stat.ML

    Interpreting Adversarial Robustness: A View from Decision Surface in Input Space

    Authors: Fuxun Yu, Chenchen Liu, Yanzhi Wang, Liang Zhao, Xiang Chen

    Abstract: One popular hypothesis of neural network generalization is that flat local minima of the loss surface in parameter space lead to good generalization. However, we demonstrate that the loss surface in parameter space has no obvious relationship with generalization, especially under adversarial settings. By visualizing decision surfaces in both parameter space and input space, we instead show that…

    Submitted 12 October, 2018; v1 submitted 29 September, 2018; originally announced October 2018.

    Comments: 15 pages, submitted to ICLR 2019

  45. arXiv:1809.04157  [pdf, other]

    cs.LG cs.CV stat.ML

    Heated-Up Softmax Embedding

    Authors: Xu Zhang, Felix Xinnan Yu, Svebor Karaman, Wei Zhang, Shih-Fu Chang

    Abstract: Metric learning aims at learning a distance that is consistent with the semantic meaning of the samples. The problem is generally solved by learning an embedding for each sample such that the embeddings of samples in the same category are compact while the embeddings of samples from different categories are spread out in the feature space. We study the features extracted from the second-to-last layer…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: 11 pages, 4 figures

  46. arXiv:1809.01697  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS eess.SP stat.ML

    HASP: A High-Performance Adaptive Mobile Security Enhancement Against Malicious Speech Recognition

    Authors: Zirui Xu, Fuxun Yu, Chenchen Liu, Xiang Chen

    Abstract: Machine-learning-based Automatic Speech Recognition (ASR) has spread widely to smartphones, home devices, and public facilities. Convenient as this technology is, it also raises a considerable security issue: users' speech content might be exposed to malicious ASR monitoring, causing severe privacy leakage. In this work, we propose HASP, a high-performance security…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: 8 pages, 10 figures

  47. arXiv:1806.10175  [pdf, other]

    stat.ML cs.IT cs.LG

    Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

    Authors: Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

    Abstract: Linear encoding of sparse vectors is widely popular but is commonly data-independent, missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data while still performing well with the widely used $\ell_1$ decoder. The convex $\ell_1$ decoder prevents gradient propagation as needed in standard grad…

    Submitted 2 July, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

    Comments: 17 pages, 7 tables, 8 figures, published in ICML 2019; part of this work was done while Shanshan was an intern at Google Research, New York

  48. arXiv:1805.10559  [pdf, other]

    stat.ML cs.CR cs.LG

    cpSGD: Communication-efficient and differentially-private distributed SGD

    Authors: Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan McMahan

    Abstract: Distributed stochastic gradient descent is an important subroutine in distributed learning. A setting of particular interest is when the clients are mobile devices, where two important concerns are communication efficiency and the privacy of the clients. Several recent works have focused on reducing the communication cost or introducing privacy guarantees, but none of the proposed communication ef…

    Submitted 26 May, 2018; originally announced May 2018.

  49. arXiv:1805.09370  [pdf, other]

    cs.LG stat.ML

    Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

    Authors: Fuxun Yu, Zirui Xu, Yanzhi Wang, Chenchen Liu, Xiang Chen

    Abstract: In recent years, neural networks have demonstrated outstanding effectiveness in a large number of applications. However, recent works have shown that neural networks are susceptible to adversarial examples, indicating possible flaws intrinsic to the network structures. To address this problem and improve the robustness of neural networks, we investigate the fundamental mechanisms behind adversarial…

    Submitted 6 June, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: 9 pages, 3 figures

  50. arXiv:1805.08423  [pdf, other]

    stat.ME

    Fast and Accurate Binary Response Mixed Model Analysis via Expectation Propagation

    Authors: P. Hall, I. M. Johnstone, J. T. Ormerod, M. P. Wand, J. C. F. Yu

    Abstract: Expectation propagation is a general prescription for approximating integrals in statistical inference problems. Its literature is mainly concerned with Bayesian inference scenarios. However, expectation propagation can also be used to approximate integrals arising in frequentist statistical inference. We focus on likelihood-based inference for binary response mixed models and show that fast an…

    Submitted 22 May, 2018; originally announced May 2018.

    Comments: 35 pages, 5 figures