Showing 1–22 of 22 results for author: Jenni, S

  1. arXiv:2404.14715

    cs.CV cs.CL

    FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

    Authors: Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

    Abstract: Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoning for VLMs, current models often struggle to effectively and precisely capture the compositional information on both the image and text sides. To add…

    Submitted 22 April, 2024; originally announced April 2024.

  2. arXiv:2404.03913

    cs.CV cs.AI cs.LG

    Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

    Authors: Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

    Abstract: While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with t…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  3. arXiv:2312.13008

    cs.CV cs.AI cs.LG

    No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

    Authors: Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

    Abstract: Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image domain (e.g., contrastive learning) that do not explicitly promote the learning of temporal features. We identify two factors that limit existing temporal self-su…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 (Main Technical Track)

  4. arXiv:2309.14400

    cs.CR cs.LG eess.IV

    DECORAIT -- DECentralized Opt-in/out Registry for AI Training

    Authors: Kar Balan, Alex Black, Simon Jenni, Andrew Gilbert, Andy Parsons, John Collomosse

    Abstract: We present DECORAIT; a decentralized registry through which content creators may assert their right to opt in or out of AI training as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources. Model and content creators who may wish to share their work openly without sanctionin…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Proc. of the 20th ACM SIGGRAPH European Conference on Visual Media Production

  5. arXiv:2306.10169

    cs.CV cs.CL cs.LG

    Meta-Personalizing Vision-Language Models to Find Named Instances in Video

    Authors: Chun-Hsiao Yeh, Bryan Russell, Josef Sivic, Fabian Caba Heilbron, Simon Jenni

    Abstract: Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications. While these models allow category-level queries, they currently struggle with personalized searches for moments in a video where a specific object instance such as "My dog Biscuit" appears. We present the following three contributions to address this problem. First, we describe a metho…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023. Project webpage: https://danielchyeh.github.io/metaper/

  6. arXiv:2304.04639

    cs.CV cs.AI

    EKILA: Synthetic Media Provenance and Attribution for Generative Art

    Authors: Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, Andrew Gilbert, John Collomosse

    Abstract: We present EKILA; a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI). EKILA proposes a robust visual attribution technique and combines this with an emerging content provenance standard (C2PA) to address the problem of synthetic image provenance -- determining the generative model and training data responsible for an…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Proc. CVPR Workshop on Media Forensics 2023

  7. arXiv:2303.13193

    cs.CV

    VADER: Video Alignment Differencing and Retrieval

    Authors: Alexander Black, Simon Jenni, Tu Bui, Md. Mehrab Tanjim, Stefano Petrangeli, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse

    Abstract: We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual descriptor and scalable search over adaptively chunked video content. A transformer-based alignment module then refines the temporal localization of th…

    Submitted 25 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  8. arXiv:2302.07702

    cs.CV

    Audio-Visual Contrastive Learning with Temporal Self-Supervision

    Authors: Simon Jenni, Alexander Black, John Collomosse

    Abstract: We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also contain sound and temporal scene dynamics. To leverage the temporal and aural dimension inherent to videos, our method extends temporal self-supervision to the a…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: AAAI-23

  9. arXiv:2211.17042

    cs.CV

    Spatio-Temporal Crop Aggregation for Video Representation Learning

    Authors: Sepehr Sameni, Simon Jenni, Paolo Favaro

    Abstract: We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time. Our model builds long-range video features by learning from sets of video clip-level features extracted with a pre-trained backbone. To train the model, we propose a self-supervised objective consisting of masked clip feature predic…

    Submitted 13 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  10. arXiv:2206.14245

    cs.CV

    SImProv: Scalable Image Provenance Framework for Robust Content Attribution

    Authors: Alexander Black, Tu Bui, Simon Jenni, Zhifei Zhang, Viswanathan Swaminathan, John Collomosse

    Abstract: We present SImProv - a scalable image provenance framework to match a query image back to a trusted database of originals and identify possible manipulations on the query. SImProv consists of three stages: a scalable search stage for retrieving top-k most similar images; a re-ranking and near-duplicated detection stage for identifying the original among the candidates; and finally a manipulation d…

    Submitted 8 May, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Under consideration at Computer Vision and Image Understanding

  11. arXiv:2205.05609

    cs.CV

    Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

    Authors: Simon Jenni, Markus Woodson, Fabian Caba Heilbron

    Abstract: We propose a method for generating a temporally remapped video that matches the desired target duration while maximally preserving natural video dynamics. Our approach trains a neural network through self-supervision to recognize and accurately localize temporally varying changes in the video playback speed. To re-time videos, we 1. use the model to infer the slowness of individual video frames, a…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted at the AI for Content Creation (AICC) workshop at CVPR 2022

  12. arXiv:2204.04788

    cs.CV cs.AI cs.LG

    Representation Learning by Detecting Incorrect Location Embeddings

    Authors: Sepehr Sameni, Simon Jenni, Paolo Favaro

    Abstract: In this paper, we introduce a novel self-supervised learning (SSL) loss for image representation learning. There is a growing belief that generalization in deep neural networks is linked to their ability to discriminate object shapes. Since object shape is related to the location of its parts, we propose to detect those that have been artificially misplaced. We represent object parts with image to…

    Submitted 13 March, 2023; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: accepted at AAAI2023, https://github.com/Separius/DILEMMA

  13. arXiv:2112.07599

    cs.CV cs.AI cs.GR

    Learning to Deblur and Rotate Motion-Blurred Faces

    Authors: Givi Meishvili, Attila Szabó, Simon Jenni, Paolo Favaro

    Abstract: We propose a solution to the novel task of rendering sharp videos from new viewpoints from a single motion-blurred image of a face. Our method handles the complexity of face blur by implicitly learning the geometry and motion of faces through the joint training on three large datasets: FFHQ and 300VW, which are publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we built. The fi…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: British Machine Vision Conference 2021

  14. arXiv:2112.03624

    cs.CV

    Time-Equivariant Contrastive Video Representation Learning

    Authors: Simon Jenni, Hailin Jin

    Abstract: We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos. Existing approaches ignore the specifics of input distortions, e.g., by learning invariance to temporal transformations. Instead, we argue that video representation should preserve video dynamics and reflect temporal manipulations of the input. Therefore, we exploit novel constraints t…

    Submitted 7 December, 2021; originally announced December 2021.

    Comments: ICCV 2021 (oral)

  15. VPN: Video Provenance Network for Robust Content Attribution

    Authors: Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John Collomosse

    Abstract: We present VPN - a content attribution method for recovering provenance information from videos shared online. Platforms, and users, often transform video into different quality, codecs, sizes, shapes, etc. or slightly edit its content such as adding text or emoji, as they are redistributed online. We learn a robust search embedding for matching such video, invariant to these transformations, usin…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: CVMP2021 camera-ready version

  16. arXiv:2010.06218

    cs.CV

    Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation

    Authors: Simon Jenni, Paolo Favaro

    Abstract: Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such networks towards support…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: ACCV 2020 (oral)

  17. arXiv:2007.10730

    cs.CV

    Video Representation Learning by Recognizing Temporal Transformations

    Authors: Simon Jenni, Givi Meishvili, Paolo Favaro

    Abstract: We introduce a novel self-supervised learning approach to learn representations of videos that are responsive to changes in the motion dynamics. Our representations can be learned from data without human annotation and provide a substantial boost to the training of neural networks on small labeled data sets for tasks such as action recognition, which require to accurately distinguish the motion of…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  18. arXiv:2004.02331

    cs.CV

    Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

    Authors: Simon Jenni, Hailin Jin, Paolo Favaro

    Abstract: We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image. We argue that the generalization capability of learned features depends on what image neighborhood size is sufficient to discriminate different image transformations: The larger the required neighborhood size and the more global the image statistics that the feat…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 (oral)

  19. arXiv:1909.12780

    cs.CV eess.AS eess.IV

    Learning to Have an Ear for Face Super-Resolution

    Authors: Givi Meishvili, Simon Jenni, Paolo Favaro

    Abstract: We propose a novel method to use both audio and a low-resolution image to perform extreme face super-resolution (a 16x increase of the input size). When the resolution of the input image is very low (e.g., 8x8 pixels), the loss of information is so dire that important details of the original identity have been lost and audio can aid the recovery of a plausible high-resolution image. In fact, audio…

    Submitted 2 April, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  20. arXiv:1906.04612

    cs.CV cs.LG

    On Stabilizing Generative Adversarial Training with Noise

    Authors: Simon Jenni, Paolo Favaro

    Abstract: We present a novel method and analysis to train generative adversarial networks (GAN) in a stable manner. As shown in recent analysis, training is often undermined by the probability distribution of the data being zero on neighborhoods of the data space. We notice that the distributions of real and generated data should match even when they undergo the same filtering. Therefore, to address the lim…

    Submitted 17 September, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

    Comments: CVPR 2019

  21. arXiv:1809.01465

    cs.CV cs.LG stat.ML

    Deep Bilevel Learning

    Authors: Simon Jenni, Paolo Favaro

    Abstract: We present a novel regularization approach to train neural networks that enjoys better generalization and test error than standard stochastic gradient descent. Our approach is based on the principles of cross-validation, where a validation set is used to limit the model overfitting. We formulate such principles as a bilevel optimization problem. This formulation allows us to define the optimizatio…

    Submitted 5 September, 2018; originally announced September 2018.

    Comments: ECCV 2018

  22. arXiv:1806.05024

    cs.CV

    Self-Supervised Feature Learning by Learning to Spot Artifacts

    Authors: Simon Jenni, Paolo Favaro

    Abstract: We introduce a novel self-supervised learning method based on adversarial training. Our objective is to train a discriminator network to distinguish real images from images with synthetic artifacts, and then to extract features from its intermediate layers that can be transferred to other data domains and tasks. To generate images with artifacts, we pre-train a high-capacity autoencoder and then w…

    Submitted 13 June, 2018; originally announced June 2018.

    Comments: CVPR 2018 (spotlight)