Showing 1–50 of 78 results for author: Globerson, A

Searching in archive cs.
  1. arXiv:2406.14528  [pdf, other]

    cs.LG cs.AI

    DeciMamba: Exploring the Length Extrapolation Potential of Mamba

    Authors: Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes

    Abstract: Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be re…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Link To Official Implementation: https://github.com/assafbk/DeciMamba

  2. arXiv:2406.12775  [pdf, other]

    cs.CL

    Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries

    Authors: Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson

    Abstract: Large language models (LLMs) can solve complex multi-step problems, but little is known about how these computations are implemented internally. Motivated by this, we study how LLMs answer multi-hop queries such as "The spouse of the performer of Imagine is". These queries require two information extraction steps: a latent one for resolving the first hop ("the performer of Imagine") into the bridg…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.04291  [pdf, other]

    cs.LG stat.ML

    Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

    Authors: Adam Fisch, Joshua Maynez, R. Alex Hofer, Bhuwan Dhingra, Amir Globerson, William W. Cohen

    Abstract: Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2406.03618  [pdf, other]

    cs.CL

    TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

    Authors: Avi Caciularu, Alon Jacovi, Eyal Ben-David, Sasha Goldshtein, Tal Schuster, Jonathan Herzig, Gal Elidan, Amir Globerson

    Abstract: Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Website (https://tact-benchmark.github.io), Huggingface (https://huggingface.co/datasets/google/TACT)

  5. arXiv:2406.01317  [pdf, other]

    cs.LG cs.AI

    The Intelligible and Effective Graph Neural Additive Networks

    Authors: Maya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach

    Abstract: Graph Neural Networks (GNNs) have emerged as the predominant approach for learning over graph-structured data. However, most GNNs operate as black-box models and require post-hoc explanations, which may not suffice in high-stakes scenarios where transparency is crucial. In this paper, we present a GNN that is interpretable by design. Our model, Graph Neural Additive Network (GNAN), is a novel exte…

    Submitted 28 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2405.06034  [pdf, other]

    cs.LG

    Bayesian Prediction-Powered Inference

    Authors: R. Alex Hofer, Joshua Maynez, Bhuwan Dhingra, Adam Fisch, Amir Globerson, William W. Cohen

    Abstract: Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. Specifically, PPI methods provide tighter confidence intervals by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate, but potentially biased, automatic system. We propose a framework for PPI based on Bayesian inference that…

    Submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2404.09991  [pdf, other]

    cs.RO cs.CV

    EgoPet: Egomotion and Interaction Data from an Animal's Perspective

    Authors: Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

    Abstract: Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction.…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: https://www.amirbar.net/egopet

  8. arXiv:2404.05729  [pdf, other]

    cs.CV

    Finding Visual Task Vectors

    Authors: Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar

    Abstract: Visual Prompting is a technique for teaching models to perform a visual task via in-context examples, without any additional training. In this work, we analyze the activations of MAE-VQGAN, a recent Visual Prompting model, and find task vectors, activations that encode task-specific information. Equipped with this insight, we demonstrate that it is possible to identify the task vectors and use the…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: https://github.com/alhojel/visual_task_vectors

  9. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  10. arXiv:2402.07875  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

    Authors: Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen

    Abstract: In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (re…

    Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  11. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  12. arXiv:2310.15916  [pdf, other]

    cs.CL

    In-Context Learning Creates Task Vectors

    Authors: Roee Hendel, Mor Geva, Amir Globerson

    Abstract: In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at Findings of EMNLP 2023

  13. arXiv:2309.04332  [pdf, other]

    cs.LG cs.AI

    Graph Neural Networks Use Graphs When They Shouldn't

    Authors: Maya Bechler-Speicher, Ido Amos, Ran Gilad-Bachrach, Amir Globerson

    Abstract: Predictions over graphs play a crucial role in various domains, including social networks and medicine. Graph Neural Networks (GNNs) have emerged as the dominant approach for learning on graph data. Although a graph-structure is provided as input to the GNN, in some cases the best solution can be obtained by ignoring it. While GNNs have the ability to ignore the graph-structure in such cases, it…

    Submitted 25 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  14. arXiv:2308.00566  [pdf, other]

    cs.CV cs.AI cs.LG

    Stochastic positional embeddings improve masked image modeling

    Authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

    Abstract: Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determi…

    Submitted 27 February, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: Code and models available in https://github.com/amirbar/StoP

  15. arXiv:2307.12976  [pdf, other]

    cs.CL

    Evaluating the Ripple Effects of Knowledge Editing in Language Models

    Authors: Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva

    Abstract: Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been success…

    Submitted 20 December, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2024. Author's final version

  16. arXiv:2307.03319  [pdf, other]

    cs.CL

    Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment

    Authors: Roni Rabin, Alexandre Djerbetian, Roee Engelberg, Lidan Hackmon, Gal Elidan, Reut Tsarfaty, Amir Globerson

    Abstract: Human communication often involves information gaps between the interlocutors. For example, in an educational dialogue, a student often provides an answer that is incomplete, and there is a gap between this answer and the perfect one expected by the teacher. Successful dialogue then hinges on the teacher asking about this gap in an effective manner, thus creating a rich and interactive educational…

    Submitted 6 July, 2023; originally announced July 2023.

  17. arXiv:2305.13281  [pdf, other]

    cs.CL

    LM vs LM: Detecting Factual Errors via Cross Examination

    Authors: Roi Cohen, May Hamri, Mor Geva, Amir Globerson

    Abstract: A prominent weakness of modern language models (LMs) is their tendency to generate factually incorrect text, which hinders their usability. A natural question is whether such factual errors can be detected automatically. Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. Our key idea is that an incorrect claim is li…

    Submitted 22 May, 2023; originally announced May 2023.

  18. arXiv:2305.06343  [pdf, other]

    cs.CV

    Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs

    Authors: Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

    Abstract: Vision and language models (VLMs) have demonstrated remarkable zero-shot (ZS) performance in a variety of tasks. However, recent works have shown that even the best VLMs struggle to capture aspects of compositional scene understanding, such as object attributes, relations, and action states. In contrast, obtaining structured annotations, such as scene graphs (SGs), that could improve these models…

    Submitted 24 October, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  19. arXiv:2304.14767  [pdf, other]

    cs.CL

    Dissecting Recall of Factual Associations in Auto-Regressive Language Models

    Authors: Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson

    Abstract: Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work looked into where factual associations are stored, only little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about…

    Submitted 13 October, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: Accepted at EMNLP 2023

  20. arXiv:2301.12810  [pdf, other]

    cs.CL cs.AI

    Crawling the Internal Knowledge-Base of Language Models

    Authors: Roi Cohen, Mor Geva, Jonathan Berant, Amir Globerson

    Abstract: Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation.…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: To be published in EACL 2023 (Findings)

  21. arXiv:2212.10380  [pdf, other]

    cs.CL cs.IR

    What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary

    Authors: Ori Ram, Liat Bezalel, Adi Zicher, Yonatan Belinkov, Jonathan Berant, Amir Globerson

    Abstract: Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that t…

    Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  22. arXiv:2212.04821  [pdf, other]

    cs.CV

    PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data

    Authors: Roei Herzig, Ofir Abramovich, Elad Ben-Avraham, Assaf Arbelle, Leonid Karlinsky, Ariel Shamir, Trevor Darrell, Amir Globerson

    Abstract: Action recognition models have achieved impressive results by incorporating scene-level annotations, such as objects, their relations, 3D structure, and more. However, obtaining annotations of scene structure for videos requires a significant amount of effort to gather and annotate, making these methods expensive to train. In contrast, synthetic datasets generated by graphics engines provide power…

    Submitted 5 December, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: WACV 2024

  23. arXiv:2211.00575  [pdf, other]

    cs.CV cs.AI cs.LG

    Text-Only Training for Image Captioning using Noise-Injected CLIP

    Authors: David Nukrai, Ron Mokady, Amir Globerson

    Abstract: We consider the task of image-captioning using only the CLIP model and additional text data at training time, and no additional captioned images. Our approach relies on the fact that CLIP is trained to make visual and textual embeddings similar. Therefore, we only need to learn how to translate CLIP textual embeddings back into text, and we can learn how to do this by learning a decoder for the fr…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: Will be presented at EMNLP 2022. GitHub: https://github.com/DavidHuji/CapDec

    Journal ref: EMNLP 2022

  24. arXiv:2210.14064  [pdf, other]

    cs.LG

    Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets

    Authors: Edo Cohen-Karlik, Itamar Menuhin-Gruman, Raja Giryes, Nadav Cohen, Amir Globerson

    Abstract: Overparameterization in deep learning typically refers to settings where a trained neural network (NN) has representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a model may exhibit many solutions that generalize…

    Submitted 23 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023, 9 pages, 2 figures plus supplementary

  25. arXiv:2209.00647  [pdf, other]

    cs.CV

    Visual Prompting via Image Inpainting

    Authors: Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

    Abstract: How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Project page: https://yossigandelsman.github.io/visual_prompt

  26. arXiv:2207.02760  [pdf, other]

    cs.LG cs.AI

    TREE-G: Decision Trees Contesting Graph Neural Networks

    Authors: Maya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach

    Abstract: When dealing with tabular data, models based on decision trees are a popular choice due to their high accuracy on these data types, their ease of application, and explainability properties. However, when it comes to graph-structured data, it is not clear how to apply them effectively, in a way that incorporates the topological information with the tabular data available on the vertices of the grap…

    Submitted 25 February, 2024; v1 submitted 6 July, 2022; originally announced July 2022.

  27. arXiv:2206.07689  [pdf, other]

    cs.CV

    Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

    Authors: Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

    Abstract: This technical report describes the SViT approach for the Ego4D Point of No Return (PNR) Temporal Localization Challenge. We propose a learning framework StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images only available during training can improve a video model. SViT relies on two key insights. First, as both images and videos contain structur…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: Ego4D CVPR22 Object State Localization challenge. arXiv admin note: substantial text overlap with arXiv:2206.06346

  28. arXiv:2206.06346  [pdf]

    cs.CV

    Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

    Authors: Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

    Abstract: Recent action recognition models have achieved impressive results by integrating objects, their locations and interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. At the same time, if a small set of annotated images is available, either within or outside the domain of interest, how cou…

    Submitted 29 November, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Tech report

  29. arXiv:2204.04670  [pdf, other]

    cs.LG

    Active Learning with Label Comparisons

    Authors: Gal Yona, Shay Moran, Gal Elidan, Amir Globerson

    Abstract: Supervised learning typically relies on manual annotation of the true labels. When there are many potential classes, searching for the best one can be prohibitive for a human annotator. On the other hand, comparing two candidate labels is often much easier. We focus on this type of pairwise supervision and ask how it can be used effectively in learning, and in particular in active learning. We obt…

    Submitted 14 August, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Appeared in the conference on Uncertainty in AI (UAI), 2022

  30. arXiv:2202.04302  [pdf, other]

    cs.LG

    On the Implicit Bias of Gradient Descent for Temporal Extrapolation

    Authors: Edo Cohen-Karlik, Avichai Ben David, Nadav Cohen, Amir Globerson

    Abstract: When using recurrent neural networks (RNNs) it is common practice to apply trained models to sequences longer than those seen in training. This "extrapolating" usage deviates from the traditional statistical learning setup where guarantees are provided under the assumption that train and test distributions are identical. Here we set out to understand when RNNs can extrapolate, focusing on a simple…

    Submitted 24 March, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 8 pages, 5 figures (plus appendix), AISTATS2022

  31. arXiv:2112.07708  [pdf, other]

    cs.CL cs.IR

    Learning to Retrieve Passages without Supervision

    Authors: Ori Ram, Gal Shachaf, Omer Levy, Jonathan Berant, Amir Globerson

    Abstract: Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. In this work we ask whether this dependence on labeled data can be reduced via unsupervised pretraining that is geared towards ODQA. We show this is in fact possible, via a novel pretraining scheme designed for retrieval. Our "recurri…

    Submitted 17 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  32. arXiv:2110.13452  [pdf, other]

    cs.LG stat.ML

    On the Optimization Landscape of Maximum Mean Discrepancy

    Authors: Itai Alon, Amir Globerson, Ami Wiesel

    Abstract: Generative models have been successfully used for generating realistic signals. Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex…

    Submitted 3 May, 2024; v1 submitted 26 October, 2021; originally announced October 2021.

  33. arXiv:2110.06915  [pdf, other]

    cs.CV

    Object-Region Video Transformers

    Authors: Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

    Abstract: Recently, video transformers have shown great success in video understanding, exceeding CNN performance; yet existing video transformer models do not explicitly model objects, although objects can be essential for recognizing actions. In this work, we present Object-Region Video Transformers (ORViT), an \emph{object-centric} approach that extends video transformer layers with a block that directly…

    Submitted 9 June, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: CVPR 2022

  34. arXiv:2107.01641  [pdf, other]

    cs.LG

    A Theoretical Analysis of Fine-tuning with Linear Teachers

    Authors: Gal Shachaf, Alon Brutzkus, Amir Globerson

    Abstract: Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it is lacking strong theoretical understanding. We analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the…

    Submitted 7 November, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

  35. arXiv:2106.04550  [pdf, other]

    cs.CV

    DETReg: Unsupervised Pretraining with Region Priors for Object Detection

    Authors: Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

    Abstract: Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture. Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. During pretraining, DETReg predicts object localiza…

    Submitted 19 July, 2023; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Project page: https://www.amirbar.net/detreg/

  36. arXiv:2104.13369  [pdf, other]

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Explaining in Style: Training a GAN to explain a classifier in StyleSpace

    Authors: Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri

    Abstract: Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is t…

    Submitted 1 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021. Project page: https://explaining-in-style.github.io/, Code: https://github.com/google/explaining-in-style

  37. arXiv:2103.05327  [pdf, other]

    cs.CL cs.LG

    BERTese: Learning to Speak to BERT

    Authors: Adi Haviv, Jonathan Berant, Amir Globerson

    Abstract: Large pre-trained language models have been shown to encode large amounts of world and commonsense knowledge in their parameters, leading to substantial interest in methods for extracting that knowledge. In past work, knowledge was extracted by taking manually-authored queries and gathering paraphrases for them using a separate pipeline. In this work, we propose a method for automatically rewritin…

    Submitted 11 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: Accepted to EACL 2021

  38. arXiv:2102.09769  [pdf, other]

    cs.LG

    On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

    Authors: Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

    Abstract: Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas small initialization leads to so called "rich regimes". However, the initialization structure is richer than the overall scale alone and involve…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 33 pages, 2 figures

    MSC Class: 68T07 (Primary) ACM Class: I.2.6; G.1.6

  39. arXiv:2101.02533  [pdf, other]

    cs.LG

    Towards Understanding Learning in Neural Networks with Linear Teachers

    Authors: Roei Sarussi, Alon Brutzkus, Amir Globerson

    Abstract: Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be appr…

    Submitted 28 July, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

  40. arXiv:2101.00438  [pdf, other]

    cs.CL

    Few-Shot Question Answering by Pretraining Span Selection

    Authors: Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy

    Abstract: In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between current pretraining objectives and question an…

    Submitted 2 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to ACL 2021

  41. arXiv:2010.13055  [pdf, other]

    cs.LG stat.ML

    Regularizing Towards Permutation Invariance in Recurrent Models

    Authors: Edo Cohen-Karlik, Avichai Ben David, Amir Globerson

    Abstract: In many machine learning problems the output should not depend on the order of the input. Such "permutation invariant" functions have been studied extensively recently. Here we argue that temporal architectures such as RNNs are highly relevant for such problems, despite the inherent dependence of RNNs on order. We show that RNNs can be regularized towards permutation invariance, and that this can…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: 9 pages, 5 figures, NeurIPS 2020

  42. arXiv:2010.05077  [pdf, other]

    cs.LG math.OC stat.ML

    Maximin Optimization for Binary Regression

    Authors: Nisan Chiprut, Amir Globerson, Ami Wiesel

    Abstract: We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of the gradient ascent-descent method. Such maximin techniques are still poorly understood even in the concave-convex case. The non-convex binary constraints may l…

    Submitted 27 November, 2020; v1 submitted 10 October, 2020; originally announced October 2020.

  43. arXiv:2009.14558  [pdf, other]

    cs.CV

    Learning Object Detection from Captions via Textual Scene Attributes

    Authors: Achiya Jerbi, Roei Herzig, Jonathan Berant, Gal Chechik, Amir Globerson

    Abstract: Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect, as annotators need to label objects and their bounding boxes. Thus, it is a significant challenge to use cheaper forms of supervision effectively. Recent work has begun to explore image captions as a source for weak supervision, but to date, in the context of object detectio…

    Submitted 30 September, 2020; originally announced September 2020.

  44. arXiv:2008.04612  [pdf, other]

    cs.DC cs.LG

    Holdout SGD: Byzantine Tolerant Federated Learning

    Authors: Shahar Azulay, Lior Raz, Amir Globerson, Tomer Koren, Yehuda Afek

    Abstract: This work presents a new distributed Byzantine tolerant federated learning algorithm, HoldOut SGD, for Stochastic Gradient Descent (SGD) optimization. HoldOut SGD uses the well-known machine learning technique of holdout estimation, in a distributed fashion, in order to select parameter updates that are likely to lead to models with low loss values. This makes it more effective at discarding Byzan…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: 12 pages, 2 figures

  45. arXiv:2006.15327  [pdf, other]

    cs.CV cs.LG

    Compositional Video Synthesis with Action Graphs

    Authors: Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

    Abstract: Videos of actions are complex signals containing rich compositional structure in space and time. Current video generation methods lack the ability to condition the generation on multiple coordinated and potentially simultaneous timed actions. To address this challenge, we propose to represent the actions in a graph structure called Action Graph and present the new "Action Graph To Video" synthes…

    Submitted 10 June, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

    Comments: ICML 2021 Camera Ready

  46. arXiv:2003.10469  [pdf, other]

    cs.CV

    Learning Object Permanence from Video

    Authors: Aviv Shamsian, Ofri Kleinfeld, Amir Globerson, Gal Chechik

    Abstract: Object Permanence allows people to reason about the location of non-visible objects, by understanding that they continue to exist even when not perceived directly. Object Permanence is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each other. Intensive studies in developmental psychology suggest that object permanence is a challe…

    Submitted 16 July, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

    Comments: 16th European Conference on Computer Vision (ECCV 2020)

  47. arXiv:2002.09781  [pdf, other]

    cs.LG stat.ML

    An Optimization and Generalization Analysis for Max-Pooling Networks

    Authors: Alon Brutzkus, Amir Globerson

    Abstract: Max-Pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what is…

    Submitted 4 March, 2021; v1 submitted 22 February, 2020; originally announced February 2020.

  48. arXiv:1912.07414  [pdf, other]

    cs.CV

    Learning Canonical Representations for Scene Graph to Image Generation

    Authors: Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

    Abstract: Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods i…

    Submitted 24 August, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

    Comments: ECCV 2020

  49. arXiv:1909.13375  [pdf, other]

    cs.CL

    A Simple and Effective Model for Answering Multi-span Questions

    Authors: Elad Segal, Avia Efrat, Mor Shoham, Amir Globerson, Jonathan Berant

    Abstract: Models for reading comprehension (RC) commonly restrict their output space to the set of all single contiguous spans from the input, in order to alleviate the learning problem and avoid the need for a model that generates text explicitly. However, forcing an answer to be a single span can be restrictive, and some recent datasets also include multi-span questions, i.e., questions whose answer is a…

    Submitted 5 October, 2020; v1 submitted 29 September, 2019; originally announced September 2019.

    Comments: EMNLP 2020

  50. arXiv:1902.10200  [pdf, other]

    cs.CV

    Differentiable Scene Graphs

    Authors: Moshiko Raboh, Roei Herzig, Gal Chechik, Jonathan Berant, Amir Globerson

    Abstract: Reasoning about complex visual scenes involves perception of entities and their relations. Scene graphs provide a natural representation for reasoning tasks, by assigning labels to both entities (nodes) and relations (edges). Unfortunately, reasoning systems based on SGs are typically trained in a two-step procedure: First, training a model to predict SGs from images; Then, a separate model is cre…

    Submitted 14 March, 2020; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: Winter Conference on Applications of Computer Vision (WACV), 2020