Showing 1–8 of 8 results for author: Szafraniec, M

Searching in archive cs.
  1. arXiv:2406.09294

    cs.LG cs.CV

    You Don't Need Data-Augmentation in Self-Supervised Learning

    Authors: Théo Moutakanni, Maxime Oquab, Marc Szafraniec, Maria Vakalopoulou, Piotr Bojanowski

    Abstract: Self-Supervised learning (SSL) with Joint-Embedding Architectures (JEA) has led to outstanding performances. All instantiations of this paradigm were trained using strong and well-established hand-crafted data augmentations, leading to the general belief that they are required for the proper training and performance of such models. On the other hand, generative reconstruction-based models such as…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2405.15613

    cs.LG cs.AI cs.CV

    Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

    Authors: Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

    Abstract: Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the datas…

    Submitted 28 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2403.11675

    cs.CV

    Better (pseudo-)labels for semi-supervised instance segmentation

    Authors: François Porcher, Camille Couprie, Marc Szafraniec, Jakob Verbeek

    Abstract: Despite the availability of large datasets for tasks like image classification and image-text alignment, labeled data for more complex recognition tasks, such as detection and segmentation, is less abundant. In particular, for instance segmentation annotations are time-consuming to produce, and the distribution of instances is often highly skewed across classes. While semi-supervised teacher-stude…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Appeared at the Practical ML for Low Resource Settings workshop at ICLR 2024

  4. arXiv:2304.07193

    cs.CV

    DINOv2: Learning Robust Visual Features without Supervision

    Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick Labatut, Armand Joulin, et al. (1 additional author not shown)

    Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr…

    Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  5. arXiv:2207.03578

    cs.PL cs.CL cs.LG

    Code Translation with Compiler Representations

    Authors: Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, Gabriel Synnaeve

    Abstract: In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnatural-looking code. Applying neural machine translation (NMT) approaches to code has successfully broadened the set of programs on which one can get a natural-looki…

    Submitted 24 April, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 9 pages

  6. arXiv:2102.07492

    cs.CL

    DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

    Authors: Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample

    Abstract: Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-trainin…

    Submitted 27 October, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

  7. arXiv:2011.12438

    cs.CV

    Continuous Surface Embeddings

    Authors: Natalia Neverova, David Novotny, Vasil Khalidov, Marc Szafraniec, Patrick Labatut, Andrea Vedaldi

    Abstract: In this work, we focus on the task of learning and representing dense correspondences in deformable object categories. While this problem has been considered before, solutions so far have been rather ad-hoc for specific object types (i.e., humans), often with significant manual work involved. However, scaling the geometry understanding to all objects in nature requires more automated approaches th…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: NeurIPS, 2020

  8. arXiv:1708.04120

    cs.IR cs.CL

    Putting Self-Supervised Token Embedding on the Tables

    Authors: Marc Szafraniec, Gautier Marti, Philippe Donnat

    Abstract: Information distribution by electronic messages is a privileged means of transmission for many businesses and individuals, often under the form of plain-text tables. As their number grows, it becomes necessary to use an algorithm to extract text and numbers instead of a human. Usual methods are focused on regular expressions or on a strict structure in the data, but are not efficient when we have…

    Submitted 25 October, 2017; v1 submitted 28 July, 2017; originally announced August 2017.