Search | arXiv e-print repository

V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data

Authors: Rotem Shalev-Arkushin, Aharon Azulay, Tavi Halperin, Eitan Richardson, Amit H. Bermano, Ohad Fried

Abstract: Diffusion-based generative models have recently shown remarkable image and video editing capabilities. However, local video editing, particularly removal of small attributes like glasses, remains a challenge. Existing methods either alter the videos excessively, generate unrealistic artifacts, or fail to perform the requested edit consistently throughout the video. In this work, we focus on consistent and identity-preserving removal of glasses in videos, using it as a case study for consistent local attribute removal in videos. Due to the lack of paired data, we adopt a weakly supervised approach and generate synthetic imperfect data, using an adjusted pretrained diffusion model. We show that despite data imperfection, by learning from our generated data and leveraging the prior of pretrained diffusion models, our model is able to perform the desired edit consistently while preserving the original video content. Furthermore, we exemplify the generalization ability of our method to other local video editing tasks by applying it successfully to facial sticker-removal. Our approach demonstrates significant improvement over existing methods, showcasing the potential of leveraging synthetic data and strong video priors for local video editing tasks.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2312.04145 [pdf, other]

Diffusing Colors: Image Colorization with Text Guided Diffusion

Authors: Nir Zabari, Aharon Azulay, Alexey Gorkor, Tavi Halperin, Ohad Fried

Abstract: The colorization of grayscale images is a complex and subjective task with significant challenges. Despite recent progress in employing large-scale datasets with deep neural networks, difficulties with controllability and visual quality persist. To tackle these issues, we present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts. This integration not only produces colorization outputs that are semantically appropriate but also greatly improves the level of control users have over the colorization process. Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence. We leverage a pretrained generative Diffusion Model, and show that we can finetune it for the colorization task without losing its generative power or attention to text prompts. Moreover, we present a novel CLIP-based ranking model that evaluates color vividness, enabling automatic selection of the most suitable level of vividness based on the specific scene semantics. Our approach holds potential particularly for color enhancement and historical image colorization.

Submitted 7 December, 2023; originally announced December 2023.

Comments: SIGGRAPH Asia 2023

arXiv:2309.03318 [pdf, other]

Fitness Approximation through Machine Learning

Authors: Itai Tzruia, Tomer Halperin, Moshe Sipper, Achiya Elyasaf

Abstract: We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, through dynamic adaptation to the evolutionary state. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update a fitness-approximation ML model throughout an evolutionary run. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical to or slightly lower than those of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Although we focus on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly -- our approach is generic and can be easily applied to many different domains.

Submitted 21 May, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: 11 pages, 5 tables, 2 figures. Submitted to IEEE Transactions on Evolutionary Computation
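The core loop described in the fitness-approximation abstract (keep a dataset of individuals with their actual fitness, refit a surrogate model, and switch between actual and approximate fitness) can be illustrated with a toy GA. This is a minimal sketch under stated assumptions, not the authors' implementation: `true_fitness` is a hypothetical OneMax stand-in for an expensive evaluation, the surrogate is a simple k-nearest-neighbour regressor (`knn_fitness`, hypothetical), and switching uses a fixed schedule (`exact_every`, hypothetical) rather than any of the switching policies compared in the paper.

```python
import random

def true_fitness(genome):
    # Hypothetical stand-in for an expensive evaluation: count of 1-bits (OneMax).
    return sum(genome)

def knn_fitness(dataset, genome, k=3):
    # Surrogate model (assumed): mean actual fitness of the k nearest stored
    # individuals under Hamming distance.
    def dist(item):
        return sum(a != b for a, b in zip(item[0], genome))
    nearest = sorted(dataset, key=dist)[:k]
    return sum(fit for _, fit in nearest) / len(nearest)

def tournament(rng, pop, scores):
    # Binary tournament selection on whichever scores (actual or approximate) are in use.
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if scores[a] >= scores[b] else pop[b]

def evolve(n_bits=20, pop_size=30, generations=40, exact_every=5, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    dataset = []      # (genome, actual fitness) pairs sampled so far
    exact_calls = 0   # number of expensive evaluations performed
    for gen in range(generations):
        if gen % exact_every == 0:
            # Actual fitness: costly, but it also grows the surrogate's dataset.
            scores = [true_fitness(g) for g in pop]
            exact_calls += pop_size
            dataset.extend(zip(pop, scores))
        else:
            # Approximate fitness from the surrogate: cheap.
            scores = [knn_fitness(dataset, g) for g in pop]
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(rng, pop, scores), tournament(rng, pop, scores)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with rate 1 / n_bits.
            children.append([1 - bit if rng.random() < 1 / n_bits else bit for bit in child])
        pop = children
    # Report the best individual of the final population using actual fitness.
    return max(true_fitness(g) for g in pop), exact_calls
```

With the defaults, only 8 of the 40 generations (240 of 1,200 evaluations) use the expensive function; raising `exact_every` trades more speed for a less accurate search, mirroring the runtime-versus-fitness trade-off the abstract reports.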

arXiv:2306.04971 [pdf, other]

A Melting Pot of Evolution and Learning

Authors: Moshe Sipper, Achiya Elyasaf, Tomer Halperin, Zvika Haramaty, Raz Lapid, Eyal Segal, Itai Tzruia, Snir Vitrack Tamam

Abstract: We survey eight recent works by our group, involving the successful blending of evolutionary algorithms with machine learning and deep learning: 1. Binary and Multinomial Classification through Evolutionary Symbolic Regression, 2. Classy Ensemble: A Novel Ensemble Algorithm for Classification, 3. EC-KitY: Evolutionary Computation Tool Kit in Python, 4. Evolution of Activation Functions for Deep Learning-Based Image Classification, 5. Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution, 6. An Evolutionary, Gradient-Free, Query-Efficient, Black-Box Algorithm for Generating Adversarial Instances in Deep Networks, 7. Foiling Explanations in Deep Neural Networks, 8. Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors.

Submitted 8 June, 2023; originally announced June 2023.

Comments: To Appear in Proceedings of Genetic Programming Theory & Practice XX, 2023

arXiv:2207.10367 [pdf, ps, other]

doi 10.1016/j.softx.2023.101381

EC-KitY: Evolutionary Computation Tool Kit in Python with Seamless Machine Learning Integration

Authors: Moshe Sipper, Tomer Halperin, Itai Tzruia, Achiya Elyasaf

Abstract: EC-KitY is a comprehensive Python library for doing evolutionary computation (EC), licensed under the BSD 3-Clause License, and compatible with scikit-learn. Designed with modern software engineering and machine learning integration in mind, EC-KitY can support all popular EC paradigms, including genetic algorithms, genetic programming, coevolution, evolutionary multi-objective optimization, and more. This paper provides an overview of the package, including the ease of setting up an EC experiment, the architecture, the main features, and a comparison with other libraries.

Submitted 19 April, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 6 pages, 1 figure, 1 table. Published in Elsevier SoftwareX

Journal ref: SoftwareX, 22, 2023, 101381, ISSN 2352-7110

arXiv:2110.08893 [pdf, other]

Temporally stable video segmentation without video annotations

Authors: Aharon Azulay, Tavi Halperin, Orestis Vantzos, Nadav Borenstein, Ofir Bibi

Abstract: Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmented videos appear more stable in practice, we verify that the consistency measure is well correlated with human judgement via a user study. Training a new multi-input multi-output decoder using this measure as a loss, together with a technique for refining current image segmentation datasets and a temporal weighted-guided filter, we observe stability improvements in the generated segmented videos with minimal loss of accuracy.

Submitted 17 March, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3449-3458. 2022
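A common way to build an optical flow-based consistency measure is to warp the previous frame's mask into the current frame and compare it with the current prediction. The abstract does not give the exact measure, so the sketch below assumes that form: `flow[y][x]` is a hypothetical integer backward flow `(dy, dx)` pointing from each pixel of frame t to its source in frame t-1, and agreement is scored with intersection-over-union.

```python
def warp_mask(prev_mask, flow):
    # Backward warping: pixel (y, x) in frame t takes the label found at
    # (y + dy, x + dx) in frame t-1; out-of-bounds sources become background.
    h, w = len(prev_mask), len(prev_mask[0])
    warped = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                warped[y][x] = prev_mask[sy][sx]
    return warped

def iou(a, b):
    # Intersection-over-union of two binary (0/1) masks; empty vs empty counts as 1.
    inter = sum(p and q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p or q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return 1.0 if union == 0 else inter / union

def temporal_consistency(masks, flows):
    # flows[i] maps frame i+1 back to frame i; average IoU over consecutive pairs.
    scores = [iou(warp_mask(masks[i], flows[i]), masks[i + 1]) for i in range(len(flows))]
    return sum(scores) / len(scores)
```

For a one-row mask `[1, 1, 0, 0]` that moves one pixel to the right, the correct flow `(0, -1)` gives a score of 1.0, while assuming zero motion gives 1/3; a flickering segmentation would score low even with accurate flow.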

arXiv:2105.09374 [pdf]

doi 10.1145/3450626.3459935

Endless Loops: Detecting and Animating Periodic Patterns in Still Images

Authors: Tavi Halperin, Hanit Hakim, Orestis Vantzos, Gershon Hochman, Netai Benaim, Lior Sassy, Michael Kupchik, Ofir Bibi, Ohad Fried

Abstract: We present an algorithm for producing a seamless animated loop from a single image. The algorithm detects periodic structures, such as the windows of a building or the steps of a staircase, and generates a non-trivial displacement vector field that maps each segment of the structure onto a neighboring segment along a user- or auto-selected main direction of motion. This displacement field is used, together with suitable temporal and spatial smoothing, to warp the image and produce the frames of a continuous animation loop. Our cinemagraphs are created in under a second on a mobile device. Over 140,000 users downloaded our app and exported over 350,000 cinemagraphs. Moreover, we conducted two user studies that show that users prefer our method for creating surreal and structured cinemagraphs compared to more manual approaches and compared to previous methods.

Submitted 19 May, 2021; originally announced May 2021.

Comments: SIGGRAPH 2021. Project page: https://pub.res.lightricks.com/endless-loops/ . Video: https://youtu.be/8ZYUvxWuD2Y

Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 142. Publication date: August 2021
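The looping idea rests on a simple property: displacing a periodic structure by exactly one period maps every segment onto its neighbour, so a displacement that grows from 0 to one period ends on a frame identical to the first. The 1-D sketch below illustrates only that property. It assumes autocorrelation as the period detector and circular integer shifts as the warp; neither is claimed to be the paper's method, and the smoothing the abstract mentions is omitted.

```python
def detect_period(signal, min_lag=2):
    # Assumed detector: the lag with the highest mean-removed circular
    # autocorrelation (ties go to the smallest lag).
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]

    def score(lag):
        return sum(centered[i] * centered[(i + lag) % n] for i in range(n))

    return max(range(min_lag, n // 2 + 1), key=score)

def loop_frames(signal, n_frames):
    # Frame t shifts the signal by round(t * period / n_frames) samples, so
    # frame n_frames (a shift of one full period) matches frame 0.
    period = detect_period(signal)
    n = len(signal)
    frames = []
    for t in range(n_frames + 1):
        shift = round(t * period / n_frames)
        frames.append([signal[(i - shift) % n] for i in range(n)])
    return period, frames
```

The returned list includes the closing frame only to make the seamless wrap-around visible; a player would drop it and cycle through the first `n_frames` frames.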

arXiv:1903.02582 [pdf, other]

Clear Skies Ahead: Towards Real-Time Automatic Sky Replacement in Video

Authors: Tavi Halperin, Harel Cain, Ofir Bibi, Michael Werman

Abstract: Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies, a poorly exposed sky, or simply suffer from an uninteresting or plain-looking sky. Professionals may edit these videos using advanced and time-consuming tools unavailable to most users, to replace the sky with a more expressive or imaginative sky. In this work, we propose an algorithm for automatic replacement of the sky region in a video with a different sky, providing nonprofessional users with a simple yet efficient tool to seamlessly replace the sky. The method is fast, achieving close to real-time performance on mobile devices, and the user's involvement can remain as limited as simply selecting the replacement sky.

Submitted 6 March, 2019; originally announced March 2019.

Comments: Eurographics 2019. Supplementary video: https://youtu.be/1uZ46YzX-pI
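At its final step, replacing the sky amounts to blending the new sky into each frame through a soft sky mask. The per-pixel compositing below is a minimal sketch that assumes such a mask `alpha` (1 for sky, 0 elsewhere, fractional at boundaries) is already available; how the paper estimates the mask and any colour harmonization it applies are not shown.

```python
def composite(frame, new_sky, alpha):
    # out = alpha * sky + (1 - alpha) * frame, per pixel and per RGB channel.
    # frame, new_sky: rows of (r, g, b) tuples; alpha: rows of floats in [0, 1].
    out = []
    for row_f, row_s, row_a in zip(frame, new_sky, alpha):
        out_row = []
        for pf, ps, a in zip(row_f, row_s, row_a):
            out_row.append(tuple(round(a * s + (1 - a) * f) for f, s in zip(pf, ps)))
        out.append(out_row)
    return out
```

Keeping `alpha` soft rather than binary is what avoids hard seams around trees or rooftops; in video, the same blend is applied frame by frame with a temporally stable mask.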

arXiv:1811.12739 [pdf, other]

Neural separation of observed and unobserved distributions

Authors: Tavi Halperin, Ariel Ephrat, Yedid Hoshen

Abstract: Separating mixed distributions is a long-standing challenge for machine learning and signal processing. Most current methods either rely on making strong assumptions on the source distributions or rely on having training samples of each source in the mixture. In this work, we introduce a new method, Neural Egg Separation, to tackle the scenario of extracting a signal from an unobserved distribution additively mixed with a signal from an observed distribution. Our method iteratively learns to separate the known distribution from progressively finer estimates of the unknown distribution. In some settings, Neural Egg Separation is initialization-sensitive; we therefore introduce Latent Mixture Masking, which ensures a good initialization. Extensive experiments on audio and image separation tasks show that our method outperforms current methods that use the same level of supervision, and often achieves similar performance to full supervision.

Submitted 16 May, 2019; v1 submitted 30 November, 2018; originally announced November 2018.

Comments: ICML'19

arXiv:1808.06250 [pdf, other]

Dynamic Temporal Alignment of Speech to Lips

Authors: Tavi Halperin, Ariel Ephrat, Shmuel Peleg

Abstract: Many speech segments in movies are re-recorded in a studio during postproduction, to compensate for poor sound quality as recorded on location. Manual alignment of the newly recorded speech with the original lip movements is a tedious task. We present an audio-to-video alignment method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements. This alignment is based on deep audio-visual features, mapping the lip video and the speech signal to a shared representation. Using this shared representation, we compute the lip-sync error between every short speech period and every video frame, followed by the determination of the optimal corresponding frame for each short sound period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, and qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear, and where a constant shift of the sound cannot give a perfect alignment. In these cases, state-of-the-art methods will fail.

Submitted 19 August, 2018; originally announced August 2018.
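The alignment step (a lip-sync error for every pair of speech period and video frame, then an optimal frame for each speech period over the whole clip) has the shape of a dynamic-programming alignment. The sketch below assumes a classic dynamic time warping (DTW) formulation with a monotonic path over a given error matrix `cost[i][j]`; in the paper those errors would come from the learned audio-visual features, which are not reproduced here, and the function name `align` is hypothetical.

```python
def align(cost):
    # cost[i][j]: lip-sync error between speech period i and video frame j.
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]  # accumulated cost of the best path to (i, j)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                acc[i][j] = cost[0][0]
                continue
            acc[i][j] = cost[i][j] + min(
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,  # next period, next frame
                acc[i - 1][j] if i > 0 else inf,                # next period, same frame
                acc[i][j - 1] if j > 0 else inf,                # same period, next frame
            )
    # Backtrack the cheapest monotonic path from (n-1, m-1) to (0, 0).
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    # One frame per speech period: the cheapest frame it touches on the path.
    match = {}
    for pi, pj in path:
        if pi not in match or cost[pi][pj] < cost[pi][match[pi]]:
            match[pi] = pj
    return [match[k] for k in range(n)]
```

Because the path is monotonic but not a constant offset, it can stretch some speech periods and compress others, which is exactly the case the abstract says a constant shift of the sound cannot handle.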

arXiv:1708.06767 [pdf, other]

Seeing Through Noise: Visually Driven Speaker Separation and Enhancement

Authors: Aviv Gabbay, Ariel Ephrat, Tavi Halperin, Shmuel Peleg

Abstract: Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. First, face motions captured in the video are used to estimate the speaker's voice, by passing the silent video frames through a video-to-speech neural network-based model. Then the speech predictions are applied as a filter on the noisy input audio. This approach avoids using mixtures of sounds in the learning process, as the number of such possible mixtures is huge, and would inevitably bias the trained model. We evaluate our method on two audio-visual datasets, GRID and TCD-TIMIT, and show that our method attains significant SDR and PESQ improvements over the raw video-to-speech predictions, and a well-known audio-only method.

Submitted 9 February, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

Comments: Supplementary video: https://www.youtube.com/watch?v=qmsyj7vAzoI
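One simple way to apply a visually predicted speech signal "as a filter" on noisy audio is a time-frequency mask built from the predicted and noisy magnitude spectrograms. The sketch below assumes a ratio mask clipped to [0, 1] and works on plain lists of magnitudes (rows are time frames, columns are frequency bins); STFT computation and phase handling are left out, and the paper's exact filter may differ.

```python
def ratio_mask(predicted, noisy, eps=1e-8):
    # Per time-frequency bin: the share of the noisy magnitude that the
    # visually predicted speech accounts for, clipped to [0, 1].
    return [[min(p / (y + eps), 1.0) for p, y in zip(rp, ry)]
            for rp, ry in zip(predicted, noisy)]

def enhance(predicted, noisy):
    # Apply the mask to the noisy magnitudes; when resynthesizing, the noisy
    # phase would typically be reused for the inverse STFT (not shown).
    mask = ratio_mask(predicted, noisy)
    return [[m * y for m, y in zip(rm, ry)] for rm, ry in zip(mask, noisy)]
```

Masking the real noisy recording, rather than outputting the predicted spectrogram directly, keeps the speaker's actual voice where the prediction agrees with it and suppresses bins the prediction does not explain.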

arXiv:1708.01204 [pdf, other]

Improved Speech Reconstruction from Silent Video

Authors: Ariel Ephrat, Tavi Halperin, Shmuel Peleg

Abstract: Speechreading is the task of inferring phonetic information from visually observed articulatory facial movements, and is a notoriously difficult task for humans to perform. In this paper we present an end-to-end model based on a convolutional neural network (CNN) for generating an intelligible and natural-sounding acoustic speech signal from silent video frames of a speaking person. We train our model on speakers from the GRID and TCD-TIMIT datasets, and evaluate the quality and intelligibility of reconstructed speech using common objective measurements. We show that speech predictions from the proposed model attain scores which indicate significantly improved quality over existing models. In addition, we show promising results towards reconstructing speech from an unconstrained dictionary.

Submitted 29 August, 2017; v1 submitted 1 August, 2017; originally announced August 2017.

Comments: Accepted to ICCV 2017 Workshop on Computer Vision for Audio-Visual Media. Supplementary video: https://www.youtube.com/watch?v=Xjbn7h7tpg0. arXiv admin note: text overlap with arXiv:1701.00495

arXiv:1703.09725 [pdf, other]

An Epipolar Line from a Single Pixel

Authors: Tavi Halperin, Michael Werman

Abstract: Computing the epipolar geometry from feature points between cameras with very different viewpoints is often error-prone, as an object's appearance can vary greatly between images. For such cases, it has been shown that using motion extracted from video can achieve much better results than using a static image. This paper extends these earlier works based on the scene dynamics. In this paper, we propose a new method to compute the epipolar geometry from a video stream, by exploiting the following observation: For a pixel p in Image A, all pixels corresponding to p in Image B are on the same epipolar line. Equivalently, the image of the line going through camera A's center and p is an epipolar line in B. Therefore, when cameras A and B are synchronized, the momentary images of two objects projecting to the same pixel, p, in camera A at times t1 and t2 lie on an epipolar line in camera B. Based on this observation, we achieve fast and precise computation of epipolar lines. Calibrating cameras based on our method of finding epipolar lines is much faster and more robust than previous methods.

Submitted 15 December, 2018; v1 submitted 28 March, 2017; originally announced March 2017.

Comments: WACV 2018
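The observation in the epipolar-line abstract leads to a line-fitting step: each time a moving object covers pixel p in camera A, the synchronized position of that object in camera B is recorded, and all such positions should lie on one epipolar line. The sketch below assumes those camera-B positions are already collected (detection and tracking are not shown) and fits the line by total least squares, i.e. along the principal direction of the points; `fit_line` and `residual` are hypothetical names.

```python
import math

def fit_line(points):
    # Total least squares: the line passes through the centroid and follows
    # the principal axis of the 2x2 scatter matrix.
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of largest spread
    # Unit normal (a, b) and offset c, so the line is a*x + b*y + c = 0.
    a, b = -math.sin(theta), math.cos(theta)
    c = -(a * cx + b * cy)
    return a, b, c

def residual(line, point):
    # Perpendicular distance from a point to the line (the normal is unit length).
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c)
```

Total least squares is used here instead of fitting y = mx + k because epipolar lines can be vertical or nearly so, where an ordinary regression slope breaks down.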

arXiv:1604.07741 [pdf, other]

doi 10.1109/TCSVT.2017.2651051

EgoSampling: Wide View Hyperlapse from Egocentric Videos

Authors: Tavi Halperin, Yair Poleg, Chetan Arora, Shmuel Peleg

Abstract: The possibility of sharing one's point of view makes use of wearable cameras compelling. These videos are often long, boring and coupled with extreme shake, as the camera is worn on a moving person. Fast forwarding (i.e. frame sampling) is a natural choice for quick video browsing. However, this accentuates the shake caused by natural head motion in an egocentric video, making the fast forwarded video useless. We propose EgoSampling, an adaptive frame sampling that gives stable, fast forwarded, hyperlapse videos. Adaptive frame sampling is formulated as an energy minimization problem, whose optimal solution can be found in polynomial time. We further turn the camera shake from a drawback into a feature, enabling the increase in field-of-view of the output video. This is obtained when each output frame is mosaiced from several input frames. The proposed technique also enables the generation of a single hyperlapse video from multiple egocentric videos, allowing even faster video consumption.

Submitted 12 January, 2017; v1 submitted 26 April, 2016; originally announced April 2016.

Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
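Adaptive frame sampling posed as energy minimization with a polynomial-time optimum fits a shortest-path formulation: frames are nodes, a forward edge i -> j (with j - i bounded by a maximum skip) carries a cost, and the cheapest path from the first to the last frame is the output sequence. The sketch below assumes a hypothetical edge cost made of a shake term `shake(i, j)` plus a quadratic penalty for deviating from the target `speedup`; the paper's actual cost terms are not reproduced here.

```python
def ego_sample(n_frames, shake, speedup, max_skip, speed_weight=1.0):
    # Frames are nodes; an edge i -> j (0 < j - i <= max_skip) costs
    # shake(i, j) plus a penalty for deviating from the target speed-up.
    # Edges only point forward, so the graph is a DAG and one dynamic-
    # programming pass finds the minimum-energy path in O(n_frames * max_skip).
    inf = float("inf")
    best = [inf] * n_frames  # best[j]: lowest energy of a path 0 -> j
    prev = [-1] * n_frames   # predecessor of j on that path
    best[0] = 0.0
    for j in range(1, n_frames):
        for i in range(max(0, j - max_skip), j):
            cost = best[i] + shake(i, j) + speed_weight * (j - i - speedup) ** 2
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back from the last frame to recover the selected frames.
    path, k = [], n_frames - 1
    while k != -1:
        path.append(k)
        k = prev[k]
    return path[::-1]
```

With no shake, the result is uniform sampling (`[0, 3, 6, 9, 12]` for 13 frames at a 3x speed-up); making frame 6 very shaky causes the path to step around it, which is the adaptive behaviour that naive fast forwarding lacks.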

arXiv:1604.04848 [pdf, other]

Epipolar Geometry Based On Line Similarity

Authors: Gil Ben-Artzi, Tavi Halperin, Michael Werman, Shmuel Peleg

Abstract: It is known that epipolar geometry can be computed from three epipolar line correspondences, but this computation is rarely used in practice since there are no simple methods to find corresponding lines. Instead, methods for finding corresponding points are widely used. This paper proposes a similarity measure between lines that indicates whether two lines are corresponding epipolar lines and enables finding epipolar line correspondences as needed for the computation of epipolar geometry. A similarity measure between two lines, suitable for video sequences of a dynamic scene, has been previously described. This paper suggests a stereo matching similarity measure suitable for images. It is based on the quality of stereo matching between the two lines, as corresponding epipolar lines yield a good stereo correspondence. Instead of an exhaustive search over all possible pairs of lines, the search space is substantially reduced when two corresponding point pairs are given. We validate the proposed method using real-world images and compare it to state-of-the-art methods. We found this method to be more accurate by a factor of five compared to the standard method using seven corresponding points and comparable to the 8-point algorithm.

Submitted 7 January, 2017; v1 submitted 17 April, 2016; originally announced April 2016.

Comments: ICPR 2016, Cancun, Dec 2016

Journal ref: ICPR'16, Cancun, Dec. 2016, pp. 1865-1870

arXiv:1412.3596 [pdf, other]

doi 10.1109/CVPR.2015.7299109

EgoSampling: Fast-Forward and Stereo for Egocentric Videos

Authors: Yair Poleg, Tavi Halperin, Chetan Arora, Shmuel Peleg

Abstract: While egocentric cameras like GoPro are gaining popularity, the videos they capture are long, boring, and difficult to watch from start to end. Fast forwarding (i.e. frame sampling) is a natural choice for faster video browsing. However, this accentuates the shake caused by natural head motion, making the fast forwarded video useless. We propose EgoSampling, an adaptive frame sampling that gives more stable fast forwarded videos. Adaptive frame sampling is formulated as energy minimization, whose optimal solution can be found in polynomial time. In addition, egocentric video taken while walking suffers from the left-right movement of the head as the body weight shifts from one leg to another. We turn this drawback into a feature: Stereo video can be created by sampling the frames from the leftmost and rightmost head positions of each step, forming approximate stereo pairs.

Submitted 27 April, 2015; v1 submitted 11 December, 2014; originally announced December 2014.

Comments: in IEEE CVPR 2015, Boston, MA, June 2015

Journal ref: CVPR'15, Boston, June 2015
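The stereo idea relies on the periodic left-right sway of the head while walking: frames taken at the leftmost and rightmost head positions of each step form approximate stereo pairs. The sketch below assumes a per-frame horizontal head position `xs` is already available (for example, estimated from camera motion; hypothetical here) and pairs each local extremum with the next extremum of the opposite kind, without reusing frames.

```python
def stereo_pairs(xs):
    # xs[t]: hypothetical horizontal head/camera position at frame t.
    # Local minima are leftmost positions, local maxima are rightmost positions.
    extrema = []
    for t in range(1, len(xs) - 1):
        if xs[t] < xs[t - 1] and xs[t] <= xs[t + 1]:
            extrema.append(("left", t))
        elif xs[t] > xs[t - 1] and xs[t] >= xs[t + 1]:
            extrema.append(("right", t))
    # Pair consecutive opposite extrema into (left_frame, right_frame) tuples,
    # consuming both so each frame is used in at most one pair.
    pairs, k = [], 0
    while k + 1 < len(extrema):
        (side_a, ta), (side_b, tb) = extrema[k], extrema[k + 1]
        if side_a != side_b:
            pairs.append((ta, tb) if side_a == "left" else (tb, ta))
            k += 2
        else:
            k += 1
    return pairs
```

For a swaying trajectory `[0, -1, 0, 1, 0, -1, 0, 1, 0]`, the sketch returns `[(1, 3), (5, 7)]`: one left/right frame pair per step.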

Showing 1–16 of 16 results for author: Halperin, T