Search | arXiv e-print repository

arXiv:2403.02043

Iterative Occlusion-Aware Light Field Depth Estimation using 4D Geometrical Cues

Authors: Rui Lourenço, Lucas Thomaz, Eduardo A. B. Silva, Sergio M. M. Faria

Abstract: Light field cameras and multi-camera arrays have emerged as promising solutions for accurately estimating depth by passively capturing light information. This is possible because the 3D information of a scene is embedded in the 4D light field geometry. Commonly, depth estimation methods extract this information relying on gradient information, heuristic-based optimisation models, or learning-based approaches. This paper focuses mainly on explicitly understanding and exploiting 4D geometrical cues for light field depth estimation. Thus, a novel method is proposed, based on a non-learning-based optimisation approach for depth estimation that explicitly considers surface normal accuracy and occlusion regions by utilising a fully explainable 4D geometric model of the light field. The 4D model performs depth/disparity estimation by determining the orientations and analysing the intersections of key 2D planes in 4D space, which are the images of 3D-space points in the 4D light field. Experimental results show that the proposed method outperforms both learning-based and non-learning-based state-of-the-art methods in terms of surface normal angle accuracy, achieving a Median Angle Error on planar surfaces, on average, 26.3% lower than the state-of-the-art, and still being competitive with state-of-the-art methods in terms of Mean Squared Error $\times$ 100 and Badpix 0.07.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2309.12428

Synthetic Image Detection: Highlights from the IEEE Video and Image Processing Cup 2022 Student Competition

Authors: Davide Cozzolino, Koki Nagano, Lucas Thomaz, Angshul Majumdar, Luisa Verdoliva

Abstract: The Video and Image Processing (VIP) Cup is a student competition that takes place each year at the IEEE International Conference on Image Processing. The 2022 IEEE VIP Cup asked undergraduate students to develop a system capable of distinguishing pristine images from generated ones. The interest in this topic stems from the incredible advances in the AI-based generation of visual data, with tools that allow the synthesis of highly realistic images and videos. While this opens up a large number of new opportunities, it also undermines the trustworthiness of media content and fosters the spread of disinformation on the internet. Recently there was strong concern about the generation of extremely realistic images by means of editing software that includes the recent technology on diffusion models. In this context, there is a need to develop robust and automatic tools for synthetic image detection.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2104.06252

doi 10.1109/TIP.2022.3146009

Lossless Coding of Light Fields based on 4D Minimum Rate Predictors

Authors: João M. Santos, Lucas A. Thomaz, Pedro A. A. Assunção, Luís A. da Silva Cruz, Luís Távora, Sérgio M. M. Faria

Abstract: Common representations of light fields use four-dimensional data structures, where a given pixel is closely related not only to its spatial neighbours within the same view, but also to its angular neighbours, co-located in adjacent views. Such structure presents increased redundancy between pixels, when compared with regular single-view images. Then, these redundancies are exploited to obtain compressed representations of the light field, using prediction algorithms specifically tailored to estimate pixel values based on both spatial and angular references. This paper proposes new encoding schemes which take advantage of the four-dimensional light field data structures to improve the coding performance of Minimum Rate Predictors. The proposed methods expand previous research on lossless coding beyond the current state-of-the-art. The experimental results, obtained using both traditional datasets and others more challenging, show bit-rate savings no smaller than 10%, when compared with existing methods for lossless light field compression.

Submitted 18 November, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: 16 pages, 13 figures, Submitted to IEEE Transactions on Image Processing, Funded through PhD grant SFRH/BD/114894/2016, project PlenoIsla POCI-01-0145-FEDER-028325 and by FCT/MCTES through national funds and when applicable co-funded by EU funds under the project UIDB/EEA/50008/2020

arXiv:1907.11200

TuneNet: One-Shot Residual Tuning for System Identification and Sim-to-Real Robot Task Transfer

Authors: Adam Allevato, Elaine Schaertl Short, Mitch Pryor, Andrea L. Thomaz

Abstract: As researchers teach robots to perform more and more complex tasks, the need for realistic simulation environments is growing. Existing techniques for closing the reality gap by approximating real-world physics often require extensive real world data and/or thousands of simulation samples. This paper presents TuneNet, a new machine learning-based method to directly tune the parameters of one model to match another using an *iterative residual tuning* technique. TuneNet estimates the parameter difference between two models using a single observation from the target and minimal simulation, allowing rapid, accurate and sample-efficient parameter estimation. The system can be trained via supervised learning over an auto-generated simulated dataset. We show that TuneNet can perform system identification, even when the true parameter values lie well outside the distribution seen during training, and demonstrate that simulators tuned with TuneNet outperform existing techniques for predicting rigid body motion. Finally, we show that our method can estimate real-world parameter values, allowing a robot to perform sim-to-real task transfer on a dynamic manipulation task unseen during training. Code and videos are available online at http://bit.ly/2lf1bAw.

Submitted 13 February, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

Comments: Published at CoRL 2019

arXiv:1810.01036

Towards Online Learning from Corrective Demonstrations

Authors: Reymundo A. Gutierrez, Elaine Schaertl Short, Scott Niekum, Andrea L. Thomaz

Abstract: Robots operating in real-world human environments will likely encounter task execution failures. To address this, we would like to allow co-present humans to refine the robot's task model as errors are encountered. Existing approaches to task model modification require reasoning over the entire dataset and model, limiting the rate of corrective updates. We introduce the State-Indexed Task Updates (SITU) algorithm to efficiently incorporate corrective demonstrations into an existing task model by iteratively making local updates that only require reasoning over a small subset of the model. In future work, we will evaluate this approach with a user study.

Submitted 1 October, 2018; originally announced October 2018.

Comments: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)

Report number: AI-HRI/2018/12

Showing 1–5 of 5 results for author: Thomaz, L