Showing 1–10 of 10 results for author: Mu, C

Searching in archive stat.
  1. arXiv:2210.15083  [pdf, other]

    stat.ML cs.LG

    Deep Learning is Provably Robust to Symmetric Label Noise

    Authors: Carey E. Priebe, Ningyuan Huang, Soledad Villar, Cong Mu, Li Chen

    Abstract: Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mi…

    Submitted 26 October, 2022; originally announced October 2022.

  2. arXiv:2208.13921  [pdf, other]

    cs.SI math.ST stat.CO stat.ML

    Dynamic Network Sampling for Community Detection

    Authors: Cong Mu, Youngser Park, Carey E. Priebe

    Abstract: We propose a dynamic network sampling scheme to optimize block recovery for stochastic blockmodel (SBM) in the case where it is prohibitively expensive to observe the entire graph. Theoretically, we provide justification of our proposed Chernoff-optimal dynamic sampling scheme via the Chernoff information. Practically, we evaluate the performance, in terms of block recovery, of our method on sever…

    Submitted 16 December, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 18 pages, 8 figures

  3. arXiv:2007.02156  [pdf, other]

    cs.SI cs.LG math.ST stat.CO stat.ML

    On spectral algorithms for community detection in stochastic blockmodel graphs with vertex covariates

    Authors: Cong Mu, Angelo Mele, Lingxin Hao, Joshua Cape, Avanti Athreya, Carey E. Priebe

    Abstract: In network inference applications, it is often desirable to detect community structure, namely to cluster vertices into groups, or blocks, according to some measure of similarity. Beyond mere adjacency matrices, many real networks also involve vertex covariates that carry key information about underlying block structure in graphs. To assess the effects of such covariates on block recovery, we pres…

    Submitted 3 August, 2021; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: 17 pages, 7 figures

  4. arXiv:1906.10095  [pdf, ps, other]

    cs.LG cs.CV cs.IR stat.ML

    An Empirical Comparison of FAISS and FENSHSES for Nearest Neighbor Search in Hamming Space

    Authors: Cun Mu, Binwei Yang, Zheng Yan

    Abstract: In this paper, we compare the performances of FAISS and FENSHSES on nearest neighbor search in Hamming space--a fundamental task with ubiquitous applications in nowadays eCommerce. Comprehensive evaluations are made in terms of indexing speed, search latency and RAM consumption. This comparison is conducted towards a better understanding on trade-offs between nearest neighbor search systems implem…

    Submitted 28 July, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: SIGIR eCom'19

  5. arXiv:1902.08498  [pdf, other]

    cs.IR cs.LG stat.ML

    Fast and Exact Nearest Neighbor Search in Hamming Space on Full-Text Search Engines

    Authors: Cun Mu, Jun Zhao, Guang Yang, Binwei Yang, Zheng Yan

    Abstract: A growing interest has been witnessed recently from both academia and industry in building nearest neighbor search (NNS) solutions on top of full-text search engines. Compared with other NNS systems, such solutions are capable of effectively reducing main memory consumption, coherently supporting multi-model search and being immediately ready for production deployment. In this paper, we continue t…

    Submitted 28 July, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

    Comments: A shorter version of the paper is accepted by SISAP 2019

  6. arXiv:1809.10210  [pdf, ps, other]

    stat.ML cs.LG math.OC

    A Machine Learning Approach to Shipping Box Design

    Authors: Guang Yang, Cun Mu

    Abstract: Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customer's online orders is an indispensable and integral part of nowadays eCommerce business, as it will not only help maintain a profitable business but also create great experiences for customers. However, it is an extremely challenging operations task to strategically select the best combination of tens…

    Submitted 25 March, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

    Comments: Accepted by 2019 Intelligent Systems Conference (A shorter version of the paper is presented at the 13th INFORMS Workshop on Data Mining and Decision Analytics)

  7. arXiv:1804.00306  [pdf, other]

    cs.CL cs.LG stat.ML

    Revisiting Skip-Gram Negative Sampling Model with Rectification

    Authors: Cun Mu, Guang Yang, Zheng Yan

    Abstract: We revisit skip-gram negative sampling (SGNS), one of the most popular neural-network based approaches to learning distributed word representation. We first point out the ambiguity issue undermining the SGNS model, in the sense that the word vectors can be entirely distorted without changing the objective value. To resolve the issue, we investigate the intrinsic structures in solution that a good…

    Submitted 14 January, 2019; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: Accepted for publication in the proceedings of 2019 Computing Conference

  8. arXiv:1403.7588  [pdf, other]

    math.OC cs.CV math.NA stat.ML

    Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods

    Authors: Cun Mu, Yuqian Zhang, John Wright, Donald Goldfarb

    Abstract: Recovering matrices from compressive and grossly corrupted observations is a fundamental problem in robust statistics, with rich applications in computer vision and machine learning. In theory, under certain conditions, this problem can be solved in polynomial time via a natural convex relaxation, known as Compressive Principal Component Pursuit (CPCP). However, all existing provable algorithms fo…

    Submitted 29 May, 2017; v1 submitted 29 March, 2014; originally announced March 2014.

    Journal ref: SIAM Journal on Scientific Computing, 2016, Vol. 38, No. 5 : pp. A3291-A3317

  9. arXiv:1309.5489  [pdf, other]

    stat.CO

    Computational Aspects of Optional Pólya Tree

    Authors: Hui Jiang, John C. Mu, Kun Yang, Chao Du, Luo Lu, Wing Hung Wong

    Abstract: Optional Pólya Tree (OPT) is a flexible non-parametric Bayesian model for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named Limited-Lookahead Optional Pólya Tree (LL-OPT), aims at greatly accelerate the computation for OP…

    Submitted 21 September, 2013; originally announced September 2013.

  10. arXiv:1307.5870  [pdf, other]

    stat.ML cs.LG

    Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery

    Authors: Cun Mu, Bo Huang, John Wright, Donald Goldfarb

    Abstract: Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a $K$-way tensor of length $n$ and Tucker rank $r$ from Gaussian measureme…

    Submitted 15 August, 2013; v1 submitted 22 July, 2013; originally announced July 2013.

    Comments: Slight modifications are made in this second version (mainly, Lemma 5)