
Showing 1–29 of 29 results for author: Keerthi, S

  1. arXiv:2403.00803  [pdf, other]

    cs.IR cs.AI cs.LG

    LiMAML: Personalization of Deep Recommender Models via Meta Learning

    Authors: Ruofan Wang, Prakruthi Prabhakar, Gaurav Srivastava, Tianqi Wang, Zeinab S. Jalali, Varun Bharill, Yunbo Ouyang, Aastha Nigam, Divya Venugopalan, Aman Gupta, Fedor Borisyuk, Sathiya Keerthi, Ajith Muralidharan

    Abstract: In the realm of recommender systems, the ubiquitous adoption of deep neural networks has emerged as a dominant paradigm for modeling diverse business objectives. As user bases continue to expand, the necessity of personalization and frequent model updates have assumed paramount significance to ensure the delivery of relevant and refreshed experiences to a diverse array of members. In this work, we…

    Submitted 23 February, 2024; originally announced March 2024.

  2. arXiv:2402.06859  [pdf, other]

    cs.LG cs.AI cs.IR

    LiRank: Industrial Large Scale Ranking Models at LinkedIn

    Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, et al. (9 additional authors not shown)

    Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including…

    Submitted 9 February, 2024; originally announced February 2024.

    ACM Class: H.3.3

  3. arXiv:2401.12332  [pdf, other]

    cs.LG math.OC

    A Precise Characterization of SGD Stability Using Loss Surface Geometry

    Authors: Gregory Dexter, Borja Ocejo, Sathiya Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna

    Abstract: Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding. Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates. Several studies have investigated the linear stability property of SGD in the vicinity of a statio…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: To appear at ICLR 2024

  4. arXiv:2311.01278  [pdf, other]

    gr-qc astro-ph.HE hep-ph nucl-th

    Rotating Bose-Einstein Condensate Stars at finite temperature

    Authors: P. S. Aswathi, P. S. Keerthi, O. P. Jyothilakshmi, Lakshmi J. Naik, V. Sreekanth

    Abstract: We study the effect of temperature on the global properties of static and slowly rotating self-gravitating Bose-Einstein condensate (BEC) stars within general relativity. We employ a recently developed temperature dependent BEC equation of state (EoS) to describe the stellar matter by assuming that the condensate can be described by a non-relativistic EoS. Stellar profiles are obtained using gener…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 12 pages, 10 figures; accepted for publication in Phys. Rev. D

    Journal ref: Phys. Rev. D 108, 123001 (2023)

  5. arXiv:2309.01885  [pdf, other]

    stat.ML cs.CL cs.LG

    QuantEase: Optimization-based Quantization for Language Models

    Authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

    Abstract: With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is f…

    Submitted 1 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  6. arXiv:2302.09693  [pdf, other]

    stat.ML cs.LG

    mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

    Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

    Abstract: Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as…

    Submitted 30 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343

  7. arXiv:2212.04343  [pdf, other]

    cs.LG math.OC

    Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

    Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, David Durfee, Ayan Acharya, Sathiya Keerthi, Rahul Mazumder

    Abstract: Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abiliti…

    Submitted 6 December, 2022; originally announced December 2022.

  8. arXiv:2108.05839  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    Logit Attenuating Weight Normalization

    Authors: Aman Gupta, Rohan Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Sathiya Keerthi

    Abstract: Over-parameterized deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems. Without appropriately tuned $\ell_2$ regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large, causing training loss to become too small and the network to lose its adaptivity (ability to move…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 23 pages

  9. arXiv:2103.05277  [pdf, ps, other]

    cs.AI cs.LG stat.ML

    Efficient Vertex-Oriented Polytopic Projection for Web-scale Applications

    Authors: Rohan Ramanath, S. Sathiya Keerthi, Yao Pan, Konstantin Salomatin, Kinjal Basu

    Abstract: We consider applications involving a large set of instances of projecting points to polytopes. We develop an intuition guided by theoretical and empirical analysis to show that when these instances follow certain structures, a large majority of the projections lie on vertices of the polytopes. To do these projections efficiently we derive a vertex-oriented incremental algorithm to project a point…

    Submitted 6 January, 2022; v1 submitted 9 March, 2021; originally announced March 2021.

    ACM Class: G.1.6; I.2.11

  10. arXiv:2003.11774  [pdf, other]

    cs.CV cs.LG eess.IV

    Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space

    Authors: Khoa D. Doan, Saurav Manchanda, Fengjiao Wang, Sathiya Keerthi, Avradeep Bhowmik, Chandan K. Reddy

    Abstract: For a given image generation problem, the intrinsic image manifold is often low dimensional. We use the intuition that it is much better to train the GAN generator by minimizing the distributional distance between real and generated images in a small dimensional feature space representing such a manifold than on the original pixel-space. We use the feature space of the GAN discriminator for such a…

    Submitted 30 March, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

  11. arXiv:2003.01296  [pdf, other]

    cs.LG stat.ML

    Regression via Implicit Models and Optimal Transport Cost Minimization

    Authors: Saurav Manchanda, Khoa Doan, Pranjul Yadav, S. Sathiya Keerthi

    Abstract: This paper addresses the classic problem of regression, which involves the inductive learning of a map, $y=f(x,z)$, $z$ denoting noise, $f:\mathbb{R}^n\times \mathbb{R}^k \rightarrow \mathbb{R}^m$. Recently, Conditional GAN (CGAN) has been applied for regression and has shown to be advantageous over the other standard approaches like Gaussian Process Regression, given its ability to implicitly mod…

    Submitted 2 March, 2020; originally announced March 2020.

  12. arXiv:2002.07971  [pdf, other]

    cs.LG stat.ML

    Gradient Boosting Neural Networks: GrowNet

    Authors: Sarkhan Badirli, Xuanqing Liu, Zhengming Xing, Avradeep Bhowmik, Khoa Doan, Sathiya S. Keerthi

    Abstract: A novel gradient boosting framework is proposed where shallow neural networks are employed as ``weak learners''. General loss functions are considered under this unified framework with specific examples presented for classification, regression, and learning to rank. A fully corrective step is incorporated to remedy the pitfall of greedy function approximation of classic gradient boosting decision…

    Submitted 14 June, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: Supplementary material starts after references

  13. arXiv:2002.02879  [pdf, other]

    cs.LG cs.IR cs.SI stat.ML

    Targeted display advertising: the case of preferential attachment

    Authors: Saurav Manchanda, Pranjul Yadav, Khoa Doan, S. Sathiya Keerthi

    Abstract: An average adult is exposed to hundreds of digital advertisements daily (https://www.mediadynamicsinc.com/uploads/files/PR092214-Note-only-150-Ads-2mk.pdf), making the digital advertisement industry a classic example of a big-data-driven platform. As such, the ad-tech industry relies on historical engagement logs (clicks or purchases) to identify potentially interested users for the advertisement…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: IEEE BigData 2019 paper

  14. arXiv:1905.12868  [pdf, other]

    cs.LG stat.ML

    Benchmarking Regression Methods: A comparison with CGAN

    Authors: Karan Aggarwal, Matthieu Kirchmeyer, Pranjul Yadav, S. Sathiya Keerthi, Patrick Gallinari

    Abstract: In recent years, impressive progress has been made in the design of implicit probabilistic models via Generative Adversarial Networks (GAN) and its extension, the Conditional GAN (CGAN). Excellent solutions have been demonstrated mostly in image processing applications which involve large, continuous output spaces. There is almost no application of these powerful tools to problems having small dim…

    Submitted 4 February, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

  15. arXiv:1905.06425  [pdf, other]

    cs.DB

    An Empirical Analysis of Deep Learning for Cardinality Estimation

    Authors: Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi

    Abstract: We implement and evaluate deep learning for cardinality estimation by studying the accuracy, space and time trade-offs across several architectures. We find that simple deep learning models can learn cardinality estimations across a variety of datasets (reducing the error by 72% - 98% on average compared to PostgreSQL). In addition, we empirically evaluate the impact of injecting cardinality estim…

    Submitted 11 September, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

  16. arXiv:1803.08604  [pdf, other]

    cs.DB cs.AI cs.LG

    Learning State Representations for Query Optimization with Deep Reinforcement Learning

    Authors: Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi

    Abstract: Deep reinforcement learning is quickly changing the field of artificial intelligence. These models are able to capture a high level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains. In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in th…

    Submitted 22 March, 2018; originally announced March 2018.

  17. arXiv:1802.00130  [pdf, other]

    stat.ML cs.LG math.OC

    Distributed Newton Methods for Deep Neural Networks

    Authors: Chien-Chih Wang, Kent Loong Tan, Chun-Ting Chen, Yu-Hsiang Lin, S. Sathiya Keerthi, Dhruv Mahajan, S. Sundararajan, Chih-Jen Lin

    Abstract: Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this pa…

    Submitted 31 January, 2018; originally announced February 2018.

    Comments: Supplementary materials and experimental code are available at https://www.csie.ntu.edu.tw/~cjlin/papers/dnn

  18. arXiv:1711.05482  [pdf, ps, other]

    cs.LG stat.ML

    Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles

    Authors: Dhruv Mahajan, Vivek Gupta, S Sathiya Keerthi, Sellamanickam Sundararajan, Shravan Narayanamurthy, Rahul Kidambi

    Abstract: For many applications, an ensemble of base classifiers is an effective solution. The tuning of its parameters (number of classes, amount of data on which each classifier is to be trained on, etc.) requires G, the generalization error of a given ensemble. The efficient estimation of G is the focus of this paper. The key idea is to approximate the variance of the class scores/probabilities of the bas…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: 12 Pages, 4 Figures, Under Review in SDM 2018

  19. arXiv:1704.06731  [pdf, ps, other]

    cs.LG

    Batch-Expansion Training: An Efficient Optimization Framework

    Authors: Michał Dereziński, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer

    Abstract: We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset. As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource efficient in a distributed setting, and when disk-access is constrained. Moreover, BET can be easily paired with most batch optimizers, does not requir…

    Submitted 23 February, 2018; v1 submitted 21 April, 2017; originally announced April 2017.

  20. arXiv:1511.02024  [pdf, other]

    cs.LG cs.CL

    Towards a Better Understanding of Predict and Count Models

    Authors: S. Sathiya Keerthi, Tobias Schnabel, Rajiv Khanna

    Abstract: In a recent paper, Levy and Goldberg pointed out an interesting connection between prediction-based word embedding models and count models based on pointwise mutual information. Under certain conditions, they showed that both models end up optimizing equivalent objective functions. This paper explores this connection in more detail and lays out the factors leading to differences between these mode…

    Submitted 6 November, 2015; originally announced November 2015.

    Comments: 17 pages

  21. arXiv:1405.4544  [pdf, ps, other]

    cs.LG

    A distributed block coordinate descent method for training $l_1$ regularized linear classifiers

    Authors: Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan

    Abstract: Distributed training of $l_1$ regularized classifiers has received great attention recently. Most existing methods approach this problem by taking steps obtained from approximating the objective by a quadratic approximation that is decoupled at the individual variable level. These methods are designed for multicore and MPI platforms where communication costs are low. They are inefficient on system…

    Submitted 16 March, 2015; v1 submitted 18 May, 2014; originally announced May 2014.

  22. arXiv:1405.4543  [pdf, other]

    cs.LG

    A Distributed Algorithm for Training Nonlinear Kernel Machines

    Authors: Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan

    Abstract: This paper concerns the distributed training of nonlinear kernel machines on Map-Reduce. We show that a re-formulation of Nyström approximation based solution which is solved using gradient based techniques is well suited for this, especially when it is necessary to work with a large number of basis points. The main advantages of this approach are: avoidance of computing the pseudo-inverse of the…

    Submitted 18 May, 2014; originally announced May 2014.

  23. arXiv:1311.2378  [pdf, ps, other]

    cs.LG

    An Empirical Evaluation of Sequence-Tagging Trainers

    Authors: P. Balamurugan, Shirish Shevade, S. Sundararajan, S. S Keerthi

    Abstract: The task of assigning label sequences to a set of observed sequences is common in computational linguistics. Several models for sequence labeling have been proposed over the last few years. Here, we focus on discriminative models for sequence labeling. Many batch and online (updating model parameters after visiting each example) learning algorithms have been proposed in the literature. On large da…

    Submitted 11 November, 2013; originally announced November 2013.

    Comments: 18 pages, 5 figures ams.org

  24. arXiv:1311.2276  [pdf, ps, other]

    cs.LG

    A Quantitative Evaluation Framework for Missing Value Imputation Algorithms

    Authors: Vinod Nair, Rahul Kidambi, Sundararajan Sellamanickam, S. Sathiya Keerthi, Johannes Gehrke, Vijay Narayanan

    Abstract: We consider the problem of quantitatively evaluating missing value imputation algorithms. Given a dataset with missing values and a choice of several imputation algorithms to fill them in, there is currently no principled way to rank the algorithms using a quantitative metric. We develop a framework based on treating imputation evaluation as a problem of comparing two distributions and show how it…

    Submitted 10 November, 2013; originally announced November 2013.

    Comments: 9 pages

  25. arXiv:1311.2137  [pdf, ps, other]

    cs.LG

    A Structured Prediction Approach for Missing Value Imputation

    Authors: Rahul Kidambi, Vinod Nair, Sundararajan Sellamanickam, S. Sathiya Keerthi

    Abstract: Missing value imputation is an important practical problem. There is a large body of work on it, but there does not exist any work that formulates the problem in a structured output setting. Also, most applications have constraints on the imputed data, for example on the distribution associated with each variable. None of the existing imputation methods use these constraints. In this paper we prop…

    Submitted 9 November, 2013; originally announced November 2013.

    Comments: 9 Pages

  26. arXiv:1311.0636  [pdf, ps, other]

    cs.LG cs.DC

    A Parallel SGD method with Strong Convergence

    Authors: Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

    Abstract: This paper proposes a novel parallel stochastic gradient descent (SGD) method that is obtained by applying parallel sets of SGD iterations (each set operating on one node using the data residing in it) for finding the direction in each iteration of a batch descent method. The method has strong convergence properties. Experiments on datasets with high dimensional feature spaces show the value of th…

    Submitted 4 November, 2013; originally announced November 2013.

  27. arXiv:1310.8418  [pdf, ps, other]

    cs.LG

    An efficient distributed learning algorithm based on effective local functional approximations

    Authors: Dhruv Mahajan, Nikunj Agrawal, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

    Abstract: Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are substantial and algorithms need to be designed suitably considering those costs. In this paper we give a novel approach to the distributed training of linear class…

    Submitted 16 March, 2015; v1 submitted 31 October, 2013; originally announced October 2013.

  28. Mean Field Methods for a Special Class of Belief Networks

    Authors: C. Bhattacharyya, S. S. Keerthi

    Abstract: The chief aim of this paper is to propose mean-field approximations for a broad class of Belief networks, of which sigmoid and noisy-or networks can be seen as special cases. The approximations are based on a powerful mean-field theory suggested by Plefka. We show that Saul, Jaakkola and Jordan's approach is the first order approximation in Plefka's approach, via a variational…

    Submitted 1 June, 2011; originally announced June 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 15, pages 91-114, 2001

  29. arXiv:0706.1318  [pdf, ps, other]

    cs.DM cs.DS

    Constructing a maximum utility slate of on-line advertisements

    Authors: S. Sathiya Keerthi, John A. Tomlin

    Abstract: We present an algorithm for constructing an optimal slate of sponsored search advertisements which respects the ordering that is the outcome of a generalized second price auction, but which must also accommodate complicating factors such as overall budget constraints. The algorithm is easily fast enough to use on the fly for typical problem sizes, or as a subroutine in an overall optimization.

    Submitted 9 June, 2007; originally announced June 2007.

    Report number: YR-2007-001