Showing 1–28 of 28 results for author: Veit, A

  1. arXiv:2406.17968

    cs.IR cs.AI cs.LG stat.ML

    Efficient Document Ranking with Learnable Late Interactions

    Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2401.09603

    cs.CV

    Rethinking FID: Towards a Better Evaluation Metric for Image Generation

    Authors: Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar

    Abstract: As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Inception-v3 features of real images, and those of images generated by the algorithm. We highlight important drawbacks of FID: Inception's poor representation of the…

    Submitted 25 January, 2024; v1 submitted 30 November, 2023; originally announced January 2024.

    Comments: Code is available at: https://github.com/google-research/google-research/tree/master/cmmd

  3. arXiv:2308.10997

    cs.CV cs.AI cs.LG

    MarkovGen: Structured Prediction for Efficient Text-to-Image Generation

    Authors: Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam, Andreas Veit, Ayan Chakrabarti, Sanjiv Kumar

    Abstract: Modern text-to-image generation models produce high-quality images that are both photorealistic and faithful to the text prompts. However, this quality comes at significant computational cost: nearly all of these models are iterative and require running sampling multiple times with large models. This iterative process is needed to ensure that different regions of the image are not only aligned wit…

    Submitted 15 December, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

  4. arXiv:2211.05110

    cs.CL cs.AI cs.LG

    Large Language Models with Controllable Working Memory

    Authors: Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. While many downstream applications provide the model with an informational context to aid its performa…

    Submitted 9 November, 2022; originally announced November 2022.

  5. arXiv:2210.16413

    cs.LG

    When does mixup promote local linearity in learned representations?

    Authors: Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

    Abstract: Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-supervised learning techniques such as mixmatch (Berthelot et al., 2019) and interpolation consistent training (ICT) (Verma et al., 2019). In this pape…

    Submitted 28 October, 2022; originally announced October 2022.

    Journal ref: NeurIPS 2022 (First Workshop on Interpolation and Beyond)

  6. arXiv:2208.06825

    cs.LG

    Teacher Guided Training: An Efficient Framework for Knowledge Transfer

    Authors: Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

    Abstract: The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a…

    Submitted 14 August, 2022; originally announced August 2022.

  7. arXiv:2110.06821

    cs.LG cs.CL cs.CV

    Leveraging redundancy in attention with Reuse Transformers

    Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

    Abstract: Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision. However, a typical Transformer model computes such pairwise attention scores repeatedly for the same sequence, in multiple heads in multiple layers. We systematically analyze the empirical similari…

    Submitted 13 October, 2021; originally announced October 2021.

  8. arXiv:2106.08823

    cs.LG

    Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

    Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

    Abstract: State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed using this dot product mechanism on a typical distribution of inputs, and study the principal components of their variation. Through eigen analysis of full atten…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 14 pages

  9. arXiv:2103.14586

    cs.CV cs.AI cs.LG

    Understanding Robustness of Transformers for Image Classification

    Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit

    Abstract: Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image classification. However, details of the Transformer architecture -- such as the use of non-overlapping patches -- lead one to wonder whether these networks are as robus…

    Submitted 8 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: Accepted for publication at ICCV 2021. Rewrote Section 5 and made other minor changes throughout

  10. arXiv:2102.03349

    cs.LG

    On the Reproducibility of Neural Network Predictions

    Authors: Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar

    Abstract: Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness can cause churn -- for the same input, disagreements between predictions of the two models independently trained by the same algorithm, con…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: 19 pages, 7 figures

  11. arXiv:2011.08824

    cs.LG

    Improving Calibration in Deep Metric Learning With Cross-Example Softmax

    Authors: Andreas Veit, Kimberly Wilber

    Abstract: Modern image retrieval systems increasingly rely on the use of deep neural networks to learn embedding spaces in which distance encodes the relevance between a given query and image. In this setting, existing approaches tend to emphasize one of two properties. Triplet-based methods capture top-$k$ relevancy, where all top-$k$ scoring documents are assumed to be relevant to a given query. Pairwise c…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: 9 pages

  12. arXiv:2010.12230

    cs.LG cs.CV math.OC

    Coping with Label Shift via Distributionally Robust Optimisation

    Authors: Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

    Abstract: The label shift problem refers to the supervised learning setting where the train and test label distributions do not match. Existing work addressing label shift usually assumes access to an unlabelled test sample. This sample may be used to estimate the test label distribution, and to then train a suitably re-weighted classifier. While approaches using this idea have proven effective, thei…

    Submitted 17 August, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

  13. arXiv:2007.07314

    cs.LG stat.ML

    Long-tail learning via logit adjustment

    Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

    Abstract: Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these chall…

    Submitted 9 July, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper in ICLR 2021

  14. arXiv:2004.10915

    cs.LG stat.ML

    Doubly-stochastic mining for heterogeneous retrieval

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example.…

    Submitted 22 April, 2020; originally announced April 2020.

  15. arXiv:1912.03194

    math.OC cs.LG

    Why are Adaptive Methods Good for Attention Models?

    Authors: Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

    Abstract: While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models. The settings under which SGD performs poorly in comparison to adaptive methods are not well understood yet. In this paper, we provide empirical and theoretical evidence that a h…

    Submitted 23 October, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

  16. arXiv:1807.00911

    cs.CV cs.AI cs.LG

    Semantic Segmentation with Scarce Data

    Authors: Isay Katsman, Rohun Tripathi, Andreas Veit, Serge Belongie

    Abstract: Semantic segmentation is a challenging vision problem that usually necessitates the collection of large amounts of finely annotated data, which is often quite expensive to obtain. Coarsely annotated data provides an interesting alternative as it is usually substantially more cheap. In this work, we present a method to leverage coarsely annotated data along with fine supervision to produce better s…

    Submitted 1 August, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: ICML 2018 Workshop, camera-ready version

  17. arXiv:1807.00459

    cs.CR cs.LG

    How To Backdoor Federated Learning

    Authors: Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov

    Abstract: Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint…

    Submitted 6 August, 2019; v1 submitted 2 July, 2018; originally announced July 2018.

  18. arXiv:1806.06422

    cs.CV cs.LG

    Learning to Evaluate Image Captioning

    Authors: Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie

    Abstract: Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified. For example, the newly proposed SPICE correlates…

    Submitted 17 June, 2018; originally announced June 2018.

    Comments: CVPR 2018

  19. arXiv:1711.11503

    cs.CV cs.LG

    Convolutional Networks with Adaptive Inference Graphs

    Authors: Andreas Veit, Serge Belongie

    Abstract: Do convolutional networks really need a fixed feed-forward structure? What if, after identifying the high-level concept of an image, a network could move directly to a layer that can distinguish fine-grained differences? Currently, a network would first need to execute sometimes hundreds of intermediate layers that specialize in unrelated aspects. Ideally, the more a network already knows about an…

    Submitted 8 May, 2020; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: IJCV 2019

  20. arXiv:1711.09825

    cs.CV cs.IR cs.LG

    Separating Self-Expression and Visual Content in Hashtag Supervision

    Authors: Andreas Veit, Maximilian Nickel, Serge Belongie, Laurens van der Maaten

    Abstract: The variety, abundance, and structured nature of hashtags make them an interesting data source for training vision models. For instance, hashtags have the potential to significantly reduce the problem of manual supervision and annotation when learning vision models for a large number of concepts. However, a key challenge when learning from hashtags is that they are inherently subjective because th…

    Submitted 27 November, 2017; originally announced November 2017.

  21. arXiv:1705.10694

    cs.LG cs.AI cs.CV cs.NE

    Deep Learning is Robust to Massive Label Noise

    Authors: David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit

    Abstract: Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to larger but noisy datasets that are more easily obtained. In this paper, we show that deep neural networks are capable of generalizing from training data for wh…

    Submitted 26 February, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

  22. arXiv:1701.01619

    cs.CV

    Learning From Noisy Large-Scale Datasets With Minimal Supervision

    Authors: Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie

    Abstract: We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune with the clean dataset. We show this approach does not fully leverage the infor…

    Submitted 9 April, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

    Comments: CVPR 2017

  23. arXiv:1605.06431

    cs.CV cs.AI cs.LG cs.NE

    Residual Networks Behave Like Ensembles of Relatively Shallow Networks

    Authors: Andreas Veit, Michael Wilber, Serge Belongie

    Abstract: In this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through re…

    Submitted 26 October, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

    Comments: NIPS 2016

  24. arXiv:1603.07810

    cs.CV cs.AI cs.LG

    Conditional Similarity Networks

    Authors: Andreas Veit, Serge Belongie, Theofanis Karaletsos

    Abstract: What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space, in which their distance preserve the relative dissimilarity. However, when learning such similarity embeddings the simplifying assumption is commonly made that images are only compared to one unique measure of similarity. A main reason for this is that contradicting notions o…

    Submitted 10 April, 2017; v1 submitted 24 March, 2016; originally announced March 2016.

    Comments: CVPR 2017

  25. arXiv:1601.07140

    cs.CV

    COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

    Authors: Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, Serge Belongie

    Abstract: This paper describes the COCO-Text dataset. In recent years large-scale datasets like SUN and Imagenet drove the advancement of scene understanding and object recognition. The goal of COCO-Text is to advance state-of-the-art in text detection and recognition in natural images. The dataset is based on the MS COCO dataset, which contains images of complex everyday scenes. The images were not collect…

    Submitted 19 June, 2016; v1 submitted 26 January, 2016; originally announced January 2016.

  26. arXiv:1509.07543

    cs.HC cs.CV

    On Optimizing Human-Machine Task Assignments

    Authors: Andreas Veit, Michael Wilber, Rajan Vaish, Serge Belongie, James Davis, Vishal Anand, Anshu Aviral, Prithvijit Chakrabarty, Yash Chandak, Sidharth Chaturvedi, Chinmaya Devaraj, Ankit Dhall, Utkarsh Dwivedi, Sanket Gupte, Sharath N. Sridhar, Karthik Paga, Anuj Pahuja, Aditya Raisinghani, Ayush Sharma, Shweta Sharma, Darpana Sinha, Nisarg Thakkar, K. Bala Vignesh, Utkarsh Verma, Kanniganti Abhishek, et al. (26 additional authors not shown)

    Abstract: When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers. However, if researchers wish to integrate the crowd with "off-the-shelf" machine classifiers, this deep integration is not always possible. This work explores two strategies to increase accuracy and decrease…

    Submitted 24 September, 2015; originally announced September 2015.

    Comments: HCOMP 2015 Work in Progress

  27. arXiv:1509.07473

    cs.CV

    Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

    Authors: Andreas Veit, Balazs Kovacs, Sean Bell, Julian McAuley, Kavita Bala, Serge Belongie

    Abstract: With the rapid proliferation of smart mobile devices, users now take millions of photos every day. These include large numbers of clothing and accessory images. We would like to answer questions like `What outfit goes well with this pair of shoes?' To answer these types of questions, one has to go beyond learning visual similarity and learn a visual notion of compatibility across categories. In th…

    Submitted 24 September, 2015; originally announced September 2015.

    Comments: ICCV 2015

  28. arXiv:1404.0200

    cs.LG stat.AP

    Household Electricity Demand Forecasting -- Benchmarking State-of-the-Art Methods

    Authors: Andreas Veit, Christoph Goebel, Rohit Tidke, Christoph Doblander, Hans-Arno Jacobsen

    Abstract: The increasing use of renewable energy sources with variable output, such as solar photovoltaic and wind power generation, calls for Smart Grids that effectively manage flexible loads and energy storage. The ability to forecast consumption at different locations in distribution systems will be a key capability of Smart Grids. The goal of this paper is to benchmark state-of-the-art methods for fore…

    Submitted 1 April, 2014; originally announced April 2014.

    Comments: Technical Report

    ACM Class: I.2.6