Deep Learning is Provably Robust to Symmetric Label Noise

Authors: Carey E. Priebe, Ningyuan Huang, Soledad Villar, Cong Mu, Li Chen

Abstract: Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mitigation? We provide an affirmative answer for the case of symmetric label noise: We find that certain DNNs, including under-parameterized and over-parameterized models, can tolerate massive symmetric label noise up to the information-theoretic threshold. By appealing to classical statistical theory and universal consistency of DNNs, we prove that for multiclass classification, $L_1$-consistent DNN classifiers trained under symmetric label noise can achieve Bayes optimality asymptotically if the label noise probability is less than $\frac{K-1}{K}$, where $K \ge 2$ is the number of classes. Our results show that for symmetric label noise, no mitigation is necessary for $L_1$-consistent estimators. We conjecture that for general label noise, mitigation strategies that make use of the noisy data will outperform those that ignore the noisy data.

Submitted 26 October, 2022; originally announced October 2022.
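
The $\frac{K-1}{K}$ threshold in the abstract above can be checked numerically at the population level. The sketch below is an illustration only, not the paper's proof: it assumes the usual symmetric noise model (a label is kept with probability `1 - eps` and otherwise replaced by one of the other `K - 1` classes uniformly at random), and the posterior values and function names are hypothetical.

```python
def noisy_posterior(clean, eps):
    """Class posterior after symmetric label noise with flip probability eps.

    Assumed noise model: a label is kept with probability 1 - eps and
    otherwise replaced by one of the other K - 1 classes uniformly at random.
    """
    k = len(clean)
    return [(1 - eps) * p + eps * (1 - p) / (k - 1) for p in clean]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


clean = [0.5, 0.3, 0.2]                    # hypothetical clean posterior, K = 3
threshold = (len(clean) - 1) / len(clean)  # (K - 1) / K = 2/3

below = noisy_posterior(clean, 0.6)  # eps below the threshold
above = noisy_posterior(clean, 0.7)  # eps above the threshold

print(argmax(clean), argmax(below), argmax(above))
```

Under this model the noisy posterior is an affine function of the clean one with slope $1 - \frac{K}{K-1}\,\text{eps}$, which is positive exactly when eps is below $\frac{K-1}{K}$. In that case the most probable class is unchanged; above the threshold the ordering reverses.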

arXiv:2208.13921 [pdf, other]

Dynamic Network Sampling for Community Detection

Authors: Cong Mu, Youngser Park, Carey E. Priebe

Abstract: We propose a dynamic network sampling scheme to optimize block recovery for the stochastic blockmodel (SBM) in the case where it is prohibitively expensive to observe the entire graph. Theoretically, we provide justification of our proposed Chernoff-optimal dynamic sampling scheme via the Chernoff information. Practically, we evaluate the performance, in terms of block recovery, of our method on several real datasets from different domains. Both theoretical and practical results suggest that our method can identify the vertices that have the most impact on block structure, so that one need only check whether there are edges between them, saving significant resources while still recovering the block structure.

Submitted 16 December, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: 18 pages, 8 figures

arXiv:2007.02156 [pdf, other]

On spectral algorithms for community detection in stochastic blockmodel graphs with vertex covariates

Authors: Cong Mu, Angelo Mele, Lingxin Hao, Joshua Cape, Avanti Athreya, Carey E. Priebe

Abstract: In network inference applications, it is often desirable to detect community structure, namely to cluster vertices into groups, or blocks, according to some measure of similarity. Beyond mere adjacency matrices, many real networks also involve vertex covariates that carry key information about underlying block structure in graphs. To assess the effects of such covariates on block recovery, we present a comparative analysis of two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. The first algorithm uses only the adjacency matrix, and directly estimates the block assignments. The second algorithm incorporates both the adjacency matrix and the vertex covariates into the estimation of block assignments, and moreover quantifies the explicit impact of the vertex covariates on the resulting estimate of the block assignments. We employ Chernoff information to analytically compare the algorithms' performance and derive the information-theoretic Chernoff ratio for certain models of interest. Analytic results and simulations suggest that the second algorithm is often preferred: we can often better estimate the induced block assignments by first estimating the effect of vertex covariates. In addition, real data examples also indicate that the second algorithm has the advantages of revealing underlying block structure and taking observed vertex heterogeneity into account in real applications. Our findings emphasize the importance of distinguishing between observed and unobserved factors that can affect block structure in graphs.

Submitted 3 August, 2021; v1 submitted 4 July, 2020; originally announced July 2020.

Comments: 17 pages, 7 figures

arXiv:1906.10095 [pdf, ps, other]

An Empirical Comparison of FAISS and FENSHSES for Nearest Neighbor Search in Hamming Space

Authors: Cun Mu, Binwei Yang, Zheng Yan

Abstract: In this paper, we compare the performance of FAISS and FENSHSES on nearest neighbor search in Hamming space, a fundamental task with ubiquitous applications in today's eCommerce. Comprehensive evaluations are made in terms of indexing speed, search latency and RAM consumption. This comparison is conducted towards a better understanding of the trade-offs between nearest neighbor search systems implemented in main memory and those implemented in secondary memory, which is largely unaddressed in the literature.

Submitted 28 July, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: SIGIR eCom'19

arXiv:1902.08498 [pdf, other]

Fast and Exact Nearest Neighbor Search in Hamming Space on Full-Text Search Engines

Authors: Cun Mu, Jun Zhao, Guang Yang, Binwei Yang, Zheng Yan

Abstract: A growing interest has been witnessed recently from both academia and industry in building nearest neighbor search (NNS) solutions on top of full-text search engines. Compared with other NNS systems, such solutions are capable of effectively reducing main memory consumption, coherently supporting multi-model search and being immediately ready for production deployment. In this paper, we continue the journey to explore specifically how to empower full-text search engines with fast and exact NNS in Hamming space (i.e., the set of binary codes). By revisiting three techniques (bit operation, sub-code filtering and data preprocessing with permutation) in the information retrieval literature, we develop a novel engineering solution for full-text search engines to efficiently accomplish this special but important NNS task. In the experiment, we show that our proposed approach enables full-text search engines to achieve significant speed-ups over their state-of-the-art term match approach for NNS within binary codes.

Submitted 28 July, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: A shorter version of the paper is accepted by SISAP 2019
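
The "bit operation" technique named in the abstract above rests on a standard identity: the Hamming distance between two binary codes is the number of set bits in their XOR. The sketch below shows only that building block with a brute-force linear scan. It is not the paper's engineering solution, does not touch a full-text search engine, and omits sub-code filtering and permutation preprocessing. The codes and function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    # XOR leaves a 1 exactly where the two codes disagree; count those bits.
    return bin(a ^ b).count("1")


def exact_nearest(query, codes):
    """Brute-force exact nearest neighbor in Hamming space (linear scan)."""
    return min(codes, key=lambda c: hamming_distance(query, c))


codes = [0b10110010, 0b10111111, 0b00001111]  # hypothetical 8-bit codes
query = 0b10110011

nearest = exact_nearest(query, codes)
print(format(nearest, "08b"), hamming_distance(query, nearest))
```

A linear scan is exact but costs one XOR and popcount per stored code; filtering techniques such as those the abstract lists aim to avoid scanning every code.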

arXiv:1809.10210 [pdf, ps, other]

A Machine Learning Approach to Shipping Box Design

Authors: Guang Yang, Cun Mu

Abstract: Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customers' online orders is an indispensable and integral part of today's eCommerce business, as it will not only help maintain a profitable business but also create great experiences for customers. However, it is an extremely challenging operations task to strategically select the best combination of tens of box sizes from thousands of feasible ones to be responsible for hundreds of thousands of orders placed daily on millions of inventory products. In this paper, we present a machine learning approach to tackle the task by formulating the box design problem prescriptively as a generalized version of the weighted $k$-medoids clustering problem, where the parameters are estimated through a variety of descriptive analytics. We test this machine learning approach on fulfillment data collected from Walmart U.S. eCommerce, and our approach is shown to be capable of improving the box utilization rate by more than $10\%$.

Submitted 25 March, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: Accepted by 2019 Intelligent Systems Conference (A shorter version of the paper is presented at the 13th INFORMS Workshop on Data Mining and Decision Analytics)

arXiv:1804.00306 [pdf, other]

Revisiting Skip-Gram Negative Sampling Model with Rectification

Authors: Cun Mu, Guang Yang, Zheng Yan

Abstract: We revisit skip-gram negative sampling (SGNS), one of the most popular neural-network based approaches to learning distributed word representations. We first point out the ambiguity issue undermining the SGNS model, in the sense that the word vectors can be entirely distorted without changing the objective value. To resolve the issue, we investigate the intrinsic structures in the solution that a good word embedding model should deliver. Motivated by this, we rectify the SGNS model with quadratic regularization, and show that this simple modification suffices to structure the solution in the desired manner. A theoretical justification is presented, which provides novel insights into quadratic regularization. Preliminary experiments are also conducted on Google's analytical reasoning task to support the modified SGNS model.

Submitted 14 January, 2019; v1 submitted 1 April, 2018; originally announced April 2018.

Comments: Accepted for publication in the proceedings of 2019 Computing Conference

arXiv:1403.7588 [pdf, other]

doi 10.1137/15M101628X

Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods

Authors: Cun Mu, Yuqian Zhang, John Wright, Donald Goldfarb

Abstract: Recovering matrices from compressive and grossly corrupted observations is a fundamental problem in robust statistics, with rich applications in computer vision and machine learning. In theory, under certain conditions, this problem can be solved in polynomial time via a natural convex relaxation, known as Compressive Principal Component Pursuit (CPCP). However, all existing provable algorithms for CPCP suffer from superlinear per-iteration cost, which severely limits their applicability to large scale problems. In this paper, we propose provable, scalable and efficient methods to solve CPCP with (essentially) linear per-iteration cost. Our method combines classical ideas from Frank-Wolfe and proximal methods. In each iteration, we mainly exploit Frank-Wolfe to update the low-rank component with rank-one SVD and exploit the proximal step for the sparse term. Convergence results and implementation details are also discussed. We demonstrate the scalability of the proposed approach with promising numerical experiments on visual data.

Submitted 29 May, 2017; v1 submitted 29 March, 2014; originally announced March 2014.

Journal ref: SIAM Journal on Scientific Computing, 2016, Vol. 38, No. 5 : pp. A3291-A3317

arXiv:1309.5489 [pdf, other]

Computational Aspects of Optional Pólya Tree

Authors: Hui Jiang, John C. Mu, Kun Yang, Chao Du, Luo Lu, Wing Hung Wong

Abstract: Optional Pólya Tree (OPT) is a flexible non-parametric Bayesian model for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present a time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named Limited-Lookahead Optional Pólya Tree (LL-OPT), aims to greatly accelerate the computation for OPT inference. The second improvement modifies the output of OPT or LL-OPT and produces a continuous piecewise linear density estimate. We demonstrate the performance of these two improvements using simulations.

Submitted 21 September, 2013; originally announced September 2013.

arXiv:1307.5870 [pdf, other]

Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery

Authors: Cun Mu, Bo Huang, John Wright, Donald Goldfarb

Abstract: Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a $K$-way tensor of length $n$ and Tucker rank $r$ from Gaussian measurements requires $\Omega(r n^{K-1})$ observations. In contrast, a certain (intractable) nonconvex formulation needs only $O(r^K + nrK)$ observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with $O(r^{\lfloor K/2 \rfloor}n^{\lceil K/2 \rceil})$ observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bound for the sum-of-nuclear-norms model follows from a new result on recovering signals with multiple sparse structures (e.g. sparse, low rank), which perhaps surprisingly demonstrates the significant suboptimality of the commonly used recovery approach via minimizing the sum of individual sparsity inducing norms (e.g. $l_1$, nuclear norm). Our new formulation for low-rank tensor recovery, however, opens the possibility of reducing the sample complexity by exploiting several structures jointly.

Submitted 15 August, 2013; v1 submitted 22 July, 2013; originally announced July 2013.

Comments: Slight modifications are made in this second version (mainly, Lemma 5)

Showing 1–10 of 10 results for author: Mu, C