Showing 1–23 of 23 results for author: Chi, E H

Searching in archive stat.
  1. arXiv:2202.00834  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Nonlinear Initialization Methods for Low-Rank Neural Networks

    Authors: Kiran Vodrahalli, Rakesh Shivanna, Maheswaran Sathiamoorthy, Sagar Jain, Ed H. Chi

    Abstract: We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. The most successful prior existing approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initializat…

    Submitted 19 May, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 32 pages, 4 figures, in submission. fixed some errors in previous versions and re-structured/re-focused the paper

  2. arXiv:2106.03760  [pdf, other]

    cs.LG math.OC stat.ML

    DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

    Authors: Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi

    Abstract: The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness ca…

    Submitted 31 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Appeared in NeurIPS 2021

  3. arXiv:2105.09985  [pdf, other]

    cs.LG stat.ML

    Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective

    Authors: Flavien Prost, Pranjal Awasthi, Nick Blumm, Aditee Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel

    Abstract: In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a s…

    Submitted 20 May, 2021; originally announced May 2021.

  4. arXiv:2008.13535  [pdf, other]

    cs.IR cs.LG stat.ML

    DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems

    Authors: Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi

    Abstract: Learning effective feature crosses is the key behind building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples…

    Submitted 20 October, 2020; v1 submitted 19 August, 2020; originally announced August 2020.

    Journal ref: In Proceedings of the Web Conference 2021 (WWW '21)

  5. arXiv:2008.07032  [pdf, other]

    cs.LG stat.ML

    Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

    Authors: Zhe Chen, Yuyan Wang, Dong Lin, Derek Zhiyuan Cheng, Lichan Hong, Ed H. Chi, Claire Cui

    Abstract: Despite deep neural network (DNN)'s impressive prediction performance in various domains, it is well known now that a set of DNN models trained with the same model specification and the same data can produce very different prediction results. Ensemble method is one state-of-the-art benchmark for prediction uncertainty estimation. However, ensembles are expensive to train and serve for web-scale tr…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: 9 pages

  6. arXiv:2008.05808  [pdf, other]

    cs.LG stat.ML

    Small Towers Make Big Differences

    Authors: Yuyan Wang, Zhe Zhao, Bo Dai, Christopher Fifty, Dong Lin, Lichan Hong, Ed H. Chi

    Abstract: Multi-task learning aims at solving multiple machine learning tasks at the same time. A good solution to a multi-task learning problem should be generalizable in addition to being Pareto optimal. In this paper, we provide some insights on understanding the trade-off between Pareto efficiency and generalization as a result of parameterization in multi-task deep learning models. As a multi-objective…

    Submitted 13 August, 2020; originally announced August 2020.

  7. arXiv:2008.02930  [pdf, other]

    cs.LG cs.IR stat.ML

    Zero-Shot Heterogeneous Transfer Learning from Recommender Systems to Cold-Start Search Retrieval

    Authors: Tao Wu, Ellie Ka-In Chio, Heng-Tze Cheng, Yu Du, Steffen Rendle, Dima Kuzmin, Ritesh Agarwal, Li Zhang, John Anderson, Sarvjeet Singh, Tushar Chandra, Ed H. Chi, Wen Li, Ankit Kumar, Xiang Ma, Alex Soares, Nitin Jindal, Pei Cao

    Abstract: Many recent advances in neural information retrieval models, which predict top-K items given a query, learn directly from a large training set of (query, item) pairs. However, they are often insufficient when there are many previously unseen (query, item) combinations, often referred to as the cold start problem. Furthermore, the search system can be biased towards items that are frequently shown…

    Submitted 18 August, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted at CIKM 2020

  8. arXiv:2007.12865  [pdf, other]

    cs.LG cs.IR stat.ML

    Self-supervised Learning for Large-scale Item Recommendations

    Authors: Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger

    Abstract: Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the…

    Submitted 24 February, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

  9. arXiv:2006.16375  [pdf, other]

    cs.LG stat.ML

    Improving Calibration through the Relationship with Adversarial Robustness

    Authors: Yao Qin, Xuezhi Wang, Alex Beutel, Ed H. Chi

    Abstract: Neural networks lack adversarial robustness, i.e., they are vulnerable to adversarial examples that through small perturbations to inputs cause incorrect predictions. Further, trust is undermined when models give miscalibrated predictions, i.e., the predicted probability is not a good indicator of how much we should trust our model. In this paper, we study the connection between adversarial robust…

    Submitted 14 December, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Published at NeurIPS-2021

  10. arXiv:2006.13114  [pdf, other]

    cs.LG stat.ML

    Fairness without Demographics through Adversarially Reweighted Learning

    Authors: Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, Ed H. Chi

    Abstract: Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fai…

    Submitted 3 November, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: To appear at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  11. arXiv:2006.05067  [pdf, other]

    cs.LG stat.ML

    Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model

    Authors: Jiaqi Ma, Xinyang Yi, Weijing Tang, Zhe Zhao, Lichan Hong, Ed H. Chi, Qiaozhu Mei

    Abstract: We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown. Given $N$ items with $M$ partitions, calculating the likelihood of data with partitioned preference under the PL model has a time complexity of $O(N+S!)$,…

    Submitted 25 February, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  12. arXiv:2003.07336  [pdf, ps, other]

    cs.LG cs.PF stat.ML

    Developing a Recommendation Benchmark for MLPerf Training and Inference

    Authors: Carole-Jean Wu, Robin Burke, Ed H. Chi, Joseph Konstan, Julian McAuley, Yves Raimond, Hao Zhang

    Abstract: Deep learning-based recommendation models are used pervasively and broadly, for example, to recommend movies, products, or other information most relevant to users, in order to enhance the user experience. Among various application domains which have received significant industry and academia research attention, such as image classification, object detection, language and speech translation, the p…

    Submitted 14 April, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

  13. arXiv:2002.03532  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding and Improving Knowledge Distillation

    Authors: Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

    Abstract: Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from student's compactness, without sacrifi…

    Submitted 28 February, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

  14. arXiv:1911.01916  [pdf, other]

    cs.LG stat.ML

    Practical Compositional Fairness: Understanding Fairness in Multi-Component Recommender Systems

    Authors: Xuezhi Wang, Nithum Thain, Anu Sinha, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel

    Abstract: How can we build recommender systems to take into account fairness? Real-world recommender systems are often composed of multiple models, built by multiple teams. However, most research on fairness focuses on improving fairness in a single model. Further, recent research on classification fairness has shown that combining multiple "fair" classifiers can still result in an "unfair" classification s…

    Submitted 25 January, 2021; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: WSDM 2021

  15. arXiv:1910.11779  [pdf, other]

    cs.LG stat.ML

    Toward a better trade-off between performance and fairness with kernel-based distribution matching

    Authors: Flavien Prost, Hai Qian, Qiuwen Chen, Ed H. Chi, Jilin Chen, Alex Beutel

    Abstract: As recent literature has demonstrated how classifiers often carry unintended biases toward some subgroups, deploying machine learned models to users demands careful consideration of the social consequences. How should we address this problem in a real-world system? How should we balance core performance and fairness metrics? In this paper, we introduce a MinDiff framework for regularizing classifi…

    Submitted 25 October, 2019; originally announced October 2019.

  16. arXiv:1906.09688  [pdf, other]

    cs.LG stat.ML

    Transfer of Machine Learning Fairness across Domains

    Authors: Candice Schumann, Xuezhi Wang, Alex Beutel, Jilin Chen, Hai Qian, Ed H. Chi

    Abstract: If our models are used in new or unexpected cases, do we know if they will make fair predictions? Previously, researchers developed ways to debias a model for a single problem domain. However, this is often not how models are trained and used in practice. For example, labels and demographics (sensitive attributes) are often hard to observe, resulting in auxiliary or synthetic data to be used for t…

    Submitted 14 November, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

  17. arXiv:1905.09414  [pdf, other]

    cs.LG stat.ML

    Quantifying Long Range Dependence in Language and User Behavior to improve RNNs

    Authors: Francois Belletti, Minmin Chen, Ed H. Chi

    Abstract: Characterizing temporal dependence patterns is a critical step in understanding the statistical properties of sequential data. Long Range Dependence (LRD) --- referring to long-range correlations decaying as a power law rather than exponentially w.r.t. distance --- demands a different set of tools for modeling the underlying dynamics of the sequential data. While it has been widely conjectured tha…

    Submitted 22 May, 2019; originally announced May 2019.

  18. arXiv:1903.00780  [pdf, other]

    cs.CY cs.AI cs.IR cs.LG stat.ML

    Fairness in Recommendation Ranking through Pairwise Comparisons

    Authors: Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, Cristos Goodrow

    Abstract: Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommend…

    Submitted 2 March, 2019; originally announced March 2019.

  19. arXiv:1902.09689  [pdf, other]

    stat.ML cs.LG

    AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks

    Authors: Bo Chang, Minmin Chen, Eldad Haber, Ed H. Chi

    Abstract: Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical f…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: Published as a conference paper at ICLR 2019

  20. arXiv:1902.08588  [pdf, other]

    cs.LG cs.IR stat.ML

    Towards Neural Mixture Recommender for Long Range Dependent User Sequences

    Authors: Jiaxi Tang, Francois Belletti, Sagar Jain, Minmin Chen, Alex Beutel, Can Xu, Ed H. Chi

    Abstract: Understanding temporal dynamics has proved to be highly valuable for accurate recommendation. Sequential recommenders have been successful in modeling the dynamics of users and items over time. However, while different model architectures excel at capturing various temporal ranges or dynamics, distinct application contexts require adapting to diverse behaviors. In this paper we examine how to buil…

    Submitted 22 February, 2019; originally announced February 2019.

    Comments: Accepted at WWW 2019

  21. arXiv:1901.08987  [pdf, other]

    cs.LG stat.ML

    Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

    Authors: Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington

    Abstract: Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and…

    Submitted 23 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

  22. arXiv:1901.04562  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Putting Fairness Principles into Practice: Challenges, Metrics, and Improvements

    Authors: Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Allison Woodruff, Christine Luu, Pierre Kreitmann, Jonathan Bischof, Ed H. Chi

    Abstract: As more researchers have become aware of and passionate about algorithmic fairness, there has been an explosion in papers laying out new metrics, suggesting algorithms to address issues, and calling attention to issues in existing applications of machine learning. This research has greatly expanded our understanding of the concerns and challenges in deploying machine learning, but there has been m…

    Submitted 14 January, 2019; originally announced January 2019.

  23. arXiv:1809.10610  [pdf, other]

    cs.LG stat.ML

    Counterfactual Fairness in Text Classification through Robustness

    Authors: Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, Alex Beutel

    Abstract: In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute referenced in the example were different? Toxicity classifiers demonstrate a counterfactual fairness issue by predicting that "Some people are gay" is toxic while "Some people are straight" is nontoxic. We offer a metric, counterfactual token f…

    Submitted 13 February, 2019; v1 submitted 27 September, 2018; originally announced September 2018.