Showing 1–7 of 7 results for author: Sadhanala, V

  1. arXiv:2301.12005  [pdf, other]

    cs.LG

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Authors: Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar

    Abstract: Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages…

    Submitted 3 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  2. arXiv:2209.09175  [pdf, other]

    math.ST

    Exponential Family Trend Filtering on Lattices

    Authors: Veeranjaneyulu Sadhanala, Robert Bassett, James Sharpnack, Daniel J. McDonald

    Abstract: Trend filtering is a modern approach to nonparametric regression that is more adaptive to local smoothness than splines or similar basis procedures. Existing analyses of trend filtering focus on estimating a function corrupted by homoskedastic Gaussian noise, but our work extends this technique to general exponential family distributions. This extension is motivated by the need to study massive, g…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 53 pages; 6 figures; 3 tables

  3. arXiv:2112.14758  [pdf, other]

    stat.ML cs.LG math.ST

    Multivariate Trend Filtering for Lattice Data

    Authors: Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Addison J. Hu, Ryan J. Tibshirani

    Abstract: We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) di…

    Submitted 5 April, 2024; v1 submitted 29 December, 2021; originally announced December 2021.

  4. arXiv:1903.10083  [pdf, other]

    stat.ML cs.LG

    A Higher-Order Kolmogorov-Smirnov Test

    Authors: Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Aaditya Ramdas, Ryan J. Tibshirani

    Abstract: We present an extension of the Kolmogorov-Smirnov (KS) two-sample test, which can be more sensitive to differences in the tails. Our test statistic is an integral probability metric (IPM) defined over a higher-order total variation ball, recovering the original KS test as its simplest case. We give an exact representer result for our IPM, which generalizes the fact that the original KS test statis…

    Submitted 24 March, 2019; originally announced March 2019.

    Comments: 18 pages, AISTATS 2019

  5. arXiv:1702.05037  [pdf, other]

    stat.ML

    Additive Models with Trend Filtering

    Authors: Veeranjaneyulu Sadhanala, Ryan J. Tibshirani

    Abstract: We study additive models built with trend filtering, i.e., additive models whose components are each regularized by the (discrete) total variation of their $k$th (discrete) derivative, for a chosen integer $k \geq 0$. This results in $k$th degree piecewise polynomial components, (e.g., $k=0$ gives piecewise constant components, $k=1$ gives piecewise linear, $k=2$ gives piecewise quadratic, etc.).…

    Submitted 21 November, 2018; v1 submitted 16 February, 2017; originally announced February 2017.

    Comments: 63 pages

  6. arXiv:1605.08400  [pdf, other]

    math.ST stat.ML

    Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers

    Authors: Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Ryan Tibshirani

    Abstract: We consider the problem of estimating a function defined over $n$ locations on a $d$-dimensional grid (having all side lengths equal to $n^{1/d}$). When the function is constrained to have discrete total variation bounded by $C_n$, we derive the minimax optimal (squared) $\ell_2$ estimation error rate, parametrized by $n$ and $C_n$. Total variation denoising, also known as the fused lasso, is seen…

    Submitted 26 May, 2016; originally announced May 2016.

  7. arXiv:1409.6086  [pdf, other]

    stat.ML math.OC

    Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

    Authors: Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing

    Abstract: We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. Whenever possible, we perform computations asynchronously, which helps attain speedups on multicore machines as well as in distributed environments. Moreover, instead of worst-case bounded delays, our methods only depend (mildly) on \emp…

    Submitted 12 February, 2016; v1 submitted 22 September, 2014; originally announced September 2014.