Search | arXiv e-print repository

arXiv:2003.01764 [pdf, other]

Blind Image Restoration without Prior Knowledge

Authors: Noam Elron, Shahar S. Yuval, Dmitry Rudoy, Noam Levy

Abstract: Many image restoration techniques are highly dependent on the degradation used during training, and their performance declines significantly when applied to slightly different input. Blind and universal techniques attempt to mitigate this by producing a trained model that can adapt to varying conditions. However, blind techniques to date require prior knowledge of the degradation process, and assu… ▽ More Many image restoration techniques are highly dependent on the degradation used during training, and their performance declines significantly when applied to slightly different input. Blind and universal techniques attempt to mitigate this by producing a trained model that can adapt to varying conditions. However, blind techniques to date require prior knowledge of the degradation process, and assumptions regarding its parameter-space. In this paper we present the Self-Normalization Side-Chain (SCNC), a novel approach to blind universal restoration in which no prior knowledge of the degradation is needed. This module can be added to any existing CNN topology, and is trained along with the rest of the network in an end-to-end manner. The imaging parameters relevant to the task, as well as their dynamics, are deduced from the variety in the training data. We apply our solution to several image restoration tasks, and demonstrate that the SNSC encodes the degradation-parameters, improving restoration performance. △ Less

Submitted 8 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: Submitted to ECCV2020

arXiv:1603.03669 [pdf, ps, other]

Learning Gaze Transitions from Depth to Improve Video Saliency Estimation

Authors: G. Leifman, D. Rudoy, T. Swedish, E. Bayro-Corrochano, R. Raskar

Abstract: In this paper we introduce a novel Depth-Aware Video Saliency approach to predict human focus of attention when viewing RGBD videos on regular 2D screens. We train a generative convolutional neural network which predicts a saliency map for a frame, given the fixation map of the previous frame. Saliency estimation in this scenario is highly important since in the near future 3D video content will b… ▽ More In this paper we introduce a novel Depth-Aware Video Saliency approach to predict human focus of attention when viewing RGBD videos on regular 2D screens. We train a generative convolutional neural network which predicts a saliency map for a frame, given the fixation map of the previous frame. Saliency estimation in this scenario is highly important since in the near future 3D video content will be easily acquired and yet hard to display. This can be explained, on the one hand, by the dramatic improvement of 3D-capable acquisition equipment. On the other hand, despite the considerable progress in 3D display technologies, most of the 3D displays are still expensive and require wearing special glasses. To evaluate the performance of our approach, we present a new comprehensive database of eye-fixation ground-truth for RGBD videos. Our experiments indicate that integrating depth into video saliency calculation is beneficial. We demonstrate that our approach outperforms state-of-the-art methods for video saliency, achieving 15% relative improvement. △ Less

Submitted 11 March, 2016; originally announced March 2016.

arXiv:1204.3367 [pdf, other]

Crowdsourcing Gaze Data Collection

Authors: Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor

Abstract: Knowing where people look is a useful tool in many various image and video applications. However, traditional gaze tracking hardware is expensive and requires local study participants, so acquiring gaze location data from a large number of participants is very problematic. In this work we propose a crowdsourced method for acquisition of gaze direction data from a virtually unlimited number of part… ▽ More Knowing where people look is a useful tool in many various image and video applications. However, traditional gaze tracking hardware is expensive and requires local study participants, so acquiring gaze location data from a large number of participants is very problematic. In this work we propose a crowdsourced method for acquisition of gaze direction data from a virtually unlimited number of participants, using a robust self-reporting mechanism (see Figure 1). Our system collects temporally sparse but spatially dense points-of-attention in any visual information. We apply our approach to an existing video data set and demonstrate that we obtain results similar to traditional gaze tracking. We also explore the parameter ranges of our method, and collect gaze tracking data for a large set of YouTube videos. △ Less

Submitted 16 April, 2012; originally announced April 2012.

Comments: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991)

Report number: CollectiveIntelligence/2012/106

arXiv:1107.0076 [pdf, other]

doi 10.1121/1.4739462

KARMA: Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking

Authors: Daryush D. Mehta, Daniel Rudoy, Patrick J. Wolfe

Abstract: Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a… ▽ More Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a statistically principled state-space framework. Extended Kalman (K) algorithms take advantage of a linearized map** to infer formant and antiformant parameters from frame-based estimates of autoregressive moving average (ARMA) cepstral coefficients. Error analysis of KARMA, WaveSurfer, and Praat is accomplished in the all-pole case using a manually marked formant database and synthesized speech waveforms. KARMA formant tracks exhibit lower overall root-mean-square error relative to the two benchmark algorithms, with third formant tracking more challenging. Antiformant tracking performance of KARMA is illustrated using synthesized and spoken nasal phonemes. The simultaneous tracking of uncertainty levels enables practitioners to recognize time-varying confidence in parameters of interest and adjust algorithmic settings accordingly. △ Less

Submitted 30 June, 2011; originally announced July 2011.

Comments: 13 pages, 7 figures; submitted for publication

Journal ref: Journal of the Acoustical Society of America, vol. 132, pp. 1732-1746, 2012

arXiv:0906.5202 [pdf, other]

doi 10.1109/TSP.2010.2041604

Superposition frames for adaptive time-frequency analysis and fast reconstruction

Authors: Daniel Rudoy, Prabahan Basu, Patrick J. Wolfe

Abstract: In this article we introduce a broad family of adaptive, linear time-frequency representations termed superposition frames, and show that they admit desirable fast overlap-add reconstruction properties akin to standard short-time Fourier techniques. This approach stands in contrast to many adaptive time-frequency representations in the extant literature, which, while more flexible than standard… ▽ More In this article we introduce a broad family of adaptive, linear time-frequency representations termed superposition frames, and show that they admit desirable fast overlap-add reconstruction properties akin to standard short-time Fourier techniques. This approach stands in contrast to many adaptive time-frequency representations in the extant literature, which, while more flexible than standard fixed-resolution approaches, typically fail to provide efficient reconstruction and often lack the regular structure necessary for precise frame-theoretic analysis. Our main technical contributions come through the development of properties which ensure that this construction provides for a numerically stable, invertible signal representation. Our primary algorithmic contributions come via the introduction and discussion of specific signal adaptation criteria in deterministic and stochastic settings, based respectively on time-frequency concentration and nonstationarity detection. We conclude with a short speech enhancement example that serves to highlight potential applications of our approach. △ Less

Submitted 3 November, 2009; v1 submitted 29 June, 2009; originally announced June 2009.

Comments: 16 pages, 6 figures; revised version

Journal ref: IEEE Transactions on Signal Processing, vol. 58, pp. 2581-2596, 2010

Showing 1–5 of 5 results for author: Rudoy, D