Showing 1–32 of 32 results for author: Medioni, G

  1. arXiv:2403.07854  [pdf, other]

    cs.CV cs.LG

    Distilling the Knowledge in Data Pruning

    Authors: Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni

    Abstract: With the increasing size of datasets used for training neural networks, data pruning becomes an attractive field of research. However, most current data pruning algorithms are limited in their ability to preserve accuracy compared to models trained on the full data, especially in high pruning regimes. In this paper we explore the application of data pruning while incorporating knowledge distillati…

    Submitted 12 March, 2024; originally announced March 2024.

  2. arXiv:2310.19024  [pdf, other]

    cs.CV

    FPGAN-Control: A Controllable Fingerprint Generator for Training with Synthetic Data

    Authors: Alon Shoshan, Nadav Bhonker, Emanuel Ben Baruch, Ori Nizan, Igor Kviatkovsky, Joshua Engelsma, Manoj Aggarwal, Gerard Medioni

    Abstract: Training fingerprint recognition models using synthetic data has recently gained increased attention in the biometric community as it alleviates the dependency on sensitive personal data. Existing approaches for fingerprint generation are limited in their ability to generate diverse impressions of the same finger, a key property for providing effective data for training recognition models. To addr…

    Submitted 29 October, 2023; originally announced October 2023.

  3. arXiv:2304.07389  [pdf, other]

    cs.CV cs.AI cs.LG

    Shape of You: Precise 3D shape estimations for diverse body types

    Authors: Rohan Sarkar, Achal Dave, Gerard Medioni, Benjamin Biggs

    Abstract: This paper presents Shape of You (SoY), an approach to improve the accuracy of 3D body shape estimation for vision-based clothing recommendation systems. While existing methods have successfully estimated 3D poses, there remains a lack of work in precise shape estimation, particularly for diverse human bodies. To address this gap, we propose two loss functions that can be readily integrated into p…

    Submitted 14 April, 2023; originally announced April 2023.

  4. arXiv:2304.03879  [pdf, other]

    cs.IR cs.LG

    GPT4Rec: A Generative Framework for Personalized Recommendation and User Interests Interpretation

    Authors: Jinming Li, Wentao Zhang, Tian Wang, Guanglei Xiong, Alan Lu, Gerard Medioni

    Abstract: Recent advancements in Natural Language Processing (NLP) have led to the development of NLP-based recommender systems that have shown superior performance. However, current models commonly treat items as mere IDs and adopt discriminative modeling, resulting in limitations of (1) fully leveraging the content information of items and the language modeling capabilities of NLP models; (2) interpreting…

    Submitted 7 April, 2023; originally announced April 2023.

  5. arXiv:2303.17531  [pdf, other]

    cs.CV

    Asymmetric Image Retrieval with Cross Model Compatible Ensembles

    Authors: Ori Linial, Alon Shoshan, Nadav Bhonker, Elad Hirsch, Lior Zamir, Igor Kviatkovsky, Gerard Medioni

    Abstract: The asymmetrical retrieval setting is a well-suited solution for resource-constrained applications such as face recognition and image retrieval. In this setting, a large model is used for indexing the gallery while a lightweight model is used for querying. The key principle in such systems is ensuring that both models share the same embedding space. Most methods in this domain are based on knowled…

    Submitted 29 October, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

  6. arXiv:2210.13994  [pdf, other]

    cs.CV

    Minutiae-Guided Fingerprint Embeddings via Vision Transformers

    Authors: Steven A. Grosz, Joshua J. Engelsma, Rajeev Ranjan, Naveen Ramakrishnan, Manoj Aggarwal, Gerard G. Medioni, Anil K. Jain

    Abstract: Minutiae matching has long dominated the field of fingerprint recognition. However, deep networks can be used to extract fixed-length embeddings from fingerprints. To date, the few studies that have explored the use of CNN architectures to extract such embeddings have shown extreme promise. Inspired by these early works, we propose the first use of a Vision Transformer (ViT) to learn a discriminat…

    Submitted 25 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

  7. arXiv:2207.12033  [pdf, other]

    cs.IR

    Contrastive Learning for Interactive Recommendation in Fashion

    Authors: Karin Sevegnani, Arjun Seshadri, Tian Wang, Anurag Beniwal, Julian McAuley, Alan Lu, Gerard Medioni

    Abstract: Recommender systems and search are both indispensable in facilitating personalization and ease of browsing in online fashion platforms. However, the two tools often operate independently, failing to combine the strengths of recommender systems to accurately capture user tastes with search systems' ability to process user queries. We propose a novel remedy to this problem by automatically recommend…

    Submitted 25 July, 2022; originally announced July 2022.

  8. arXiv:2204.10869  [pdf, other]

    cs.CV

    Identity Preserving Loss for Learned Image Compression

    Authors: Jiuhong Xiao, Lavisha Aggarwal, Prithviraj Banerjee, Manoj Aggarwal, Gerard Medioni

    Abstract: Deep learning model inference on embedded devices is challenging due to the limited availability of computation resources. A popular alternative is to perform model inference on the cloud, which requires transmitting images from the embedded device to the cloud. Image compression techniques are commonly employed in such cloud-based architectures to reduce transmission latency over low bandwidth ne…

    Submitted 26 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted by CVPR 2022 Workshop on New Trends in Image Restoration and Enhancement and Challenges

  9. arXiv:2204.04812  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    OutfitTransformer: Learning Outfit Representations for Fashion Recommendation

    Authors: Rohan Sarkar, Navaneeth Bodla, Mariya I. Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, Gerard Medioni

    Abstract: Learning an effective outfit-level representation is critical for predicting the compatibility of items in an outfit, and retrieving complementary items for a partial outfit. We present a framework, OutfitTransformer, that uses the proposed task-specific tokens and leverages the self-attention mechanism to learn effective outfit-level representations encoding the compatibility relationships betwee…

    Submitted 15 April, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

  10. arXiv:2203.01853  [pdf, other]

    cs.CV

    Efficient Video Instance Segmentation via Tracklet Query and Proposal

    Authors: Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gerard Medioni

    Abstract: Video Instance Segmentation (VIS) aims to simultaneously classify, segment, and track multiple object instances in videos. Recent clip-level VIS takes a short video clip as input each time, showing stronger performance than frame-level VIS (tracking-by-segmentation), as more temporal context from multiple frames is utilized. Yet, most clip-level methods are neither end-to-end learnable nor real-tim…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  11. arXiv:2105.00717  [pdf, other]

    cs.CV

    Synthetic Data for Model Selection

    Authors: Alon Shoshan, Nadav Bhonker, Igor Kviatkovsky, Matan Fintz, Gerard Medioni

    Abstract: Recent breakthroughs in synthetic data generation approaches have made it possible to produce highly photorealistic images which are hardly distinguishable from real ones. Furthermore, synthetic generation pipelines have the potential to generate an unlimited number of images. The combination of high photorealism and scale turns synthetic data into a promising candidate for improving various machine lea…

    Submitted 5 July, 2023; v1 submitted 3 May, 2021; originally announced May 2021.

  12. arXiv:2103.02221  [pdf, other]

    cs.CV cs.LG

    Energy-Based Learning for Scene Graph Generation

    Authors: Mohammed Suhail, Abhay Mittal, Behjat Siddiquie, Chris Broaddus, Jayan Eledath, Gerard Medioni, Leonid Sigal

    Abstract: Traditional scene graph generation methods are trained using cross-entropy losses that treat objects and relationships as independent entities. Such a formulation, however, ignores the structure in the output space, in an inherently structured prediction problem. In this work, we introduce a novel energy-based learning framework for generating scene graphs. The proposed formulation allows for effi…

    Submitted 3 March, 2021; originally announced March 2021.

  13. arXiv:2101.02477  [pdf, other]

    cs.CV

    GAN-Control: Explicitly Controllable GANs

    Authors: Alon Shoshan, Nadav Bhonker, Igor Kviatkovsky, Gerard Medioni

    Abstract: We present a framework for training GANs with explicit control over generated images. We are able to control the generated image by setting exact attributes such as age, pose, expression, etc. Most approaches for editing GAN-generated images achieve partial control by leveraging the latent space disentanglement properties, obtained implicitly after standard GAN training. Such methods are able to…

    Submitted 3 October, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

  14. arXiv:2006.02110  [pdf, other]

    cs.CV

    From Real to Synthetic and Back: Synthesizing Training Data for Multi-Person Scene Understanding

    Authors: Igor Kviatkovsky, Nadav Bhonker, Gerard Medioni

    Abstract: We present a method for synthesizing natural-looking images of multiple people interacting in a specific scenario. These images benefit from the advantages of synthetic data: being fully controllable and fully annotated with any type of standard or custom-defined ground truth. To reduce the synthetic-to-real domain gap, we introduce a pipeline consisting of the following steps: 1) we render scen…

    Submitted 3 June, 2020; originally announced June 2020.

  15. arXiv:2005.10481  [pdf, other]

    cs.CV cs.LG

    AOWS: Adaptive and optimal network width search with latency constraints

    Authors: Maxim Berman, Leonid Pishchulin, Ning Xu, Matthew B. Blaschko, Gerard Medioni

    Abstract: Neural architecture search (NAS) approaches aim at automatically finding novel CNN architectures that fit computational constraints while maintaining a good performance on the target platform. We introduce a novel efficient one-shot NAS approach to optimally search for channel numbers, given latency constraints on a specific hardware. We first show that we can use a black-box approach to estimate…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: Accepted to CVPR 2020 (oral)

  16. arXiv:2002.07362  [pdf, other]

    cs.CV

    MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

    Authors: Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni

    Abstract: Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, w…

    Submitted 10 October, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

    Comments: Accepted in ICCV 2021 MTL Workshop

  17. arXiv:1802.00542  [pdf, other]

    cs.CV

    ExpNet: Landmark-Free, Deep, 3D Facial Expressions

    Authors: Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni

    Abstract: We describe a deep learning based method for estimating 3D facial expression coefficients. Unlike previous work, our process does not rely on facial landmark detection methods as a proxy step. Recent methods have shown that a CNN can be trained to regress accurate and discriminative 3D morphable model (3DMM) representations, directly from image intensities. By foregoing facial landmark detection,…

    Submitted 1 February, 2018; originally announced February 2018.

    Comments: Accepted to the IEEE International Conference on Automatic Face and Gesture Recognition, 2018

  18. arXiv:1712.05083  [pdf, other]

    cs.CV

    Extreme 3D Face Reconstruction: Seeing Through Occlusions

    Authors: Anh Tuan Tran, Tal Hassner, Iacopo Masi, Eran Paz, Yuval Nirkin, Gerard Medioni

    Abstract: Existing single view, 3D face reconstruction methods can produce beautifully detailed 3D results, but typically only for near frontal, unobstructed viewpoints. We describe a system designed to provide detailed 3D reconstructions of faces viewed under extreme conditions, out of plane rotations, and occlusions. Motivated by the concept of bump mapping, we propose a layered approach which decouples e…

    Submitted 29 March, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Comments: Accepted to CVPR'18. Previously titled: "Extreme 3D Face Reconstruction: Looking Past Occlusions"

  19. arXiv:1708.07517  [pdf, other]

    cs.CV

    FacePoseNet: Making a Case for Landmark-Free Face Alignment

    Authors: Fengju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni

    Abstract: We show how a simple convolutional neural network (CNN) can be trained to accurately and robustly regress 6 degrees of freedom (6DoF) 3D head pose, directly from image intensities. We further explain how this FacePoseNet (FPN) can be used to align faces in 2D and 3D as an alternative to explicit facial landmark detection for these tasks. We claim that in many cases the standard means of measuring…

    Submitted 31 August, 2017; v1 submitted 24 August, 2017; originally announced August 2017.

  20. arXiv:1704.06729  [pdf, other]

    cs.CV

    On Face Segmentation, Face Swapping, and Face Perception

    Authors: Yuval Nirkin, Iacopo Masi, Anh Tuan Tran, Tal Hassner, Gerard Medioni

    Abstract: We show that even when face images are unconstrained and arbitrarily paired, face swapping between them is actually quite simple. To this end, we make the following contributions. (a) Instead of tailoring systems for face segmentation, as others previously proposed, we show that a standard fully convolutional network (FCN) can achieve remarkably fast and accurate segmentations, provided that it is…

    Submitted 21 April, 2017; originally announced April 2017.

  21. arXiv:1703.10714  [pdf, other]

    cs.CV

    Deep 3D Face Identification

    Authors: Donghyun Kim, Matthias Hernandez, Jongmoo Choi, Gerard Medioni

    Abstract: We propose a novel 3D face recognition algorithm using a deep convolutional neural network (DCNN) and a 3D augmentation technique. The performance of 2D face recognition algorithms has significantly increased by leveraging the representational power of deep neural networks and the use of large-scale labeled training data. As opposed to 2D face recognition, training discriminative deep features for…

    Submitted 30 March, 2017; originally announced March 2017.

    Comments: 9 pages, 5 figures, 2 tables

  22. arXiv:1612.04904  [pdf, other]

    cs.CV

    Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network

    Authors: Anh Tuan Tran, Tal Hassner, Iacopo Masi, Gerard Medioni

    Abstract: The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition and always under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable and change for different phot…

    Submitted 14 December, 2016; originally announced December 2016.

  23. arXiv:1611.09510  [pdf, other]

    cs.LG stat.ML

    Graph-Based Manifold Frequency Analysis for Denoising

    Authors: Shay Deutsch, Antonio Ortega, Gerard Medioni

    Abstract: We propose a new framework for manifold denoising based on processing in the graph Fourier frequency domain, derived from the spectral decomposition of the discrete graph Laplacian. Our approach uses the Spectral Graph Wavelet transform in order to perform non-iterative denoising directly in the graph frequency domain, an approach inspired by conventional wavelet-based signal denoising methods.…

    Submitted 29 November, 2016; originally announced November 2016.

  24. arXiv:1607.01450  [pdf, other]

    cs.CV

    Pooling Faces: Template based Face Recognition with Pooled Face Images

    Authors: Tal Hassner, Iacopo Masi, Jungyeon Kim, Jongmoo Choi, Shai Harel, Prem Natarajan, Gerard Medioni

    Abstract: We propose a novel approach to template based face recognition. Our dual goal is to both increase recognition accuracy and reduce the computational and storage costs of template matching. To do this, we leverage an approach which was proven effective in many other domains, but, to our knowledge, never fully explored for face images: average pooling of face photos. We show how (and why!) the spa…

    Submitted 5 July, 2016; originally announced July 2016.

    Comments: Appeared in the IEEE Computer Society Workshop on Biometrics, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June, 2016

  25. arXiv:1604.02801  [pdf, other]

    cs.CV

    Capturing Dynamic Textured Surfaces of Moving Targets

    Authors: Ruizhe Wang, Lingyu Wei, Etienne Vouga, Qixing Huang, Duygu Ceylan, Gerard Medioni, Hao Li

    Abstract: We present an end-to-end system for reconstructing complete watertight and textured models of moving subjects such as clothed humans and animals, using only three or four handheld sensors. The heart of our framework is a new pairwise registration algorithm that minimizes, using a particle swarm strategy, an alignment error metric based on mutual visibility and occlusion. We show that this algorith…

    Submitted 11 April, 2016; originally announced April 2016.

    Comments: 22 pages, 12 figures

  26. arXiv:1603.08592  [pdf, other]

    cs.CV

    Exploring Local Context for Multi-target Tracking in Wide Area Aerial Surveillance

    Authors: Bor-Jeng Chen, Gerard Medioni

    Abstract: Tracking many vehicles in wide coverage aerial imagery is crucial for understanding events in a large field of view. Most approaches aim to associate detections from frame differencing into tracks. However, slow or stopped vehicles result in long-term missing detections and further cause tracking discontinuities. Relying merely on appearance cues to recover missing detections is difficult as targe…

    Submitted 28 March, 2016; originally announced March 2016.

  27. arXiv:1603.07388  [pdf, other]

    cs.CV

    Face Recognition Using Deep Multi-Pose Representations

    Authors: Wael AbdAlmageed, Yue Wu, Stephen Rawls, Shai Harel, Tal Hassner, Iacopo Masi, Jongmoo Choi, Jatuporn Toy Leksut, Jungyeon Kim, Prem Natarajan, Ram Nevatia, Gerard Medioni

    Abstract: We introduce our method and system for face recognition using multiple pose-aware deep learning models. In our representation, a face image is processed by several pose-specific deep convolutional neural network (CNN) models to generate multiple pose-specific features. 3D rendering is used to generate multiple face poses from the input image. Sensitivity of the recognition system to pose variation…

    Submitted 23 March, 2016; originally announced March 2016.

    Comments: WACV 2016

  28. arXiv:1603.07057  [pdf, other]

    cs.CV

    Do We Really Need to Collect Millions of Faces for Effective Face Recognition?

    Authors: Iacopo Masi, Anh Tuan Tran, Jatuporn Toy Leksut, Tal Hassner, Gerard Medioni

    Abstract: Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes -- huge numbers of face images downloaded and labeled for identity -- it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training data sizes for face recognitio…

    Submitted 10 April, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

  29. arXiv:1601.04888  [pdf, other]

    cs.CV

    A Closed-Form Solution to Tensor Voting: Theory and Applications

    Authors: Tai-Pang Wu, Sai-Kit Yeung, Jiaya Jia, Chi-Keung Tang, Gerard Medioni

    Abstract: We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimensions, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation. Using CFTV, we prove the convergence of tensor voting on a Markov random field (MRF), thus termed as MRFTV…

    Submitted 19 January, 2016; v1 submitted 19 January, 2016; originally announced January 2016.

    Comments: Addendum appended to the TPAMI paper

    Journal ref: TPAMI 34(8): 1482-1495 (2012)

  30. arXiv:1511.04031  [pdf, other]

    cs.CV

    Facial Landmark Detection with Tweaked Convolutional Neural Networks

    Authors: Yue Wu, Tal Hassner, KangGeon Kim, Gerard Medioni, Prem Natarajan

    Abstract: We present a novel convolutional neural network (CNN) design for facial landmark coordinate regression. We examine the intermediate features of a standard CNN trained for landmark detection and show that features extracted from later, more specialized layers capture rough landmark locations. This provides a natural means of applying differential treatment midway through the network, tweaking proce…

    Submitted 21 March, 2016; v1 submitted 12 November, 2015; originally announced November 2015.

    Comments: First two authors had joint first authorship / equal contribution

  31. arXiv:1506.08485  [pdf, other]

    cs.CV

    The Multi-Strand Graph for a PTZ Tracker

    Authors: Shachaf Melman, Yael Moses, Gérard Medioni, Yinghao Cai

    Abstract: High-resolution images can be used to resolve matching ambiguities between trajectory fragments (tracklets), which is one of the main challenges in multiple target tracking. A PTZ camera, which can pan, tilt and zoom, is a powerful and efficient tool that offers both close-up views and wide area coverage on demand. The wide-area view makes it possible to track many targets while the close-up view…

    Submitted 28 June, 2015; originally announced June 2015.

    Comments: 9 pages, 7 figures, AVSS2015

  32. arXiv:1206.4624  [pdf]

    cs.LG stat.ML

    Robust Multiple Manifolds Structure Learning

    Authors: Dian Gong, Xuemei Zhao, Gerard Medioni

    Abstract: We present a robust multiple manifolds structure learning (RMMSL) scheme to robustly estimate data structures under the multiple low intrinsic dimensional manifolds assumption. In the local learning stage, RMMSL efficiently estimates local tangent space by weighted low-rank matrix factorization. In the global learning stage, we propose a robust manifold clustering method based on local structure l…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012