Search | arXiv e-print repository

fNeRF: High Quality Radiance Fields from Practical Cameras

Authors: Yi Hua, Christoph Lassner, Carsten Stoll, Iain Matthews

Abstract: In recent years, the development of Neural Radiance Fields has enabled a previously unseen level of photo-realistic 3D reconstruction of scenes and objects from multi-view camera data. However, previous methods use an oversimplified pinhole camera model resulting in defocus blur being `baked' into the reconstructed radiance field. We propose a modification to the ray casting that leverages the opt… ▽ More In recent years, the development of Neural Radiance Fields has enabled a previously unseen level of photo-realistic 3D reconstruction of scenes and objects from multi-view camera data. However, previous methods use an oversimplified pinhole camera model resulting in defocus blur being `baked' into the reconstructed radiance field. We propose a modification to the ray casting that leverages the optics of lenses to enhance scene reconstruction in the presence of defocus blur. This allows us to improve the quality of radiance field reconstructions from the measurements of a practical camera with finite aperture. We show that the proposed model matches the defocus blur behavior of practical cameras more closely than pinhole models and other approximations of defocus blur models, particularly in the presence of partial occlusions. This allows us to achieve sharper reconstructions, improving the PSNR on validation of all-in-focus images, on both synthetic and real datasets, by up to 3 dB. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2405.08042 [pdf, other]

LLAniMAtion: LLAMA Driven Gesture Animation

Authors: Jonathan Windle, Iain Matthews, Sarah Taylor

Abstract: Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism, and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content that is encoded in the audio signal.… ▽ More Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism, and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content that is encoded in the audio signal. In this paper we instead experiment with using LLM features for gesture generation that are extracted from text using LLAMA2. We compare against audio features, and explore combining the two modalities in both objective tests and a user study. Surprisingly, our results show that LLAMA2 features on their own perform significantly better than audio features and that including both modalities yields no significant difference to using LLAMA2 features in isolation. We demonstrate that the LLAMA2 based model can generate both beat and semantic gestures without any audio input, suggesting LLMs can provide rich encodings that are well suited for gesture generation. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2403.11634 [pdf, other]

doi 10.1109/ICCVW60793.2023.00453

Personalized 3D Human Pose and Shape Refinement

Authors: Tom Wehrbein, Bodo Rosenhahn, Iain Matthews, Carsten Stoll

Abstract: Recently, regression-based methods have dominated the field of 3D human pose and shape estimation. Despite their promising results, a common issue is the misalignment between predictions and image observations, often caused by minor joint rotation errors that accumulate along the kinematic chain. To address this issue, we propose to construct dense correspondences between initial human model estim… ▽ More Recently, regression-based methods have dominated the field of 3D human pose and shape estimation. Despite their promising results, a common issue is the misalignment between predictions and image observations, often caused by minor joint rotation errors that accumulate along the kinematic chain. To address this issue, we propose to construct dense correspondences between initial human model estimates and the corresponding images that can be used to refine the initial predictions. To this end, we utilize renderings of the 3D models to predict per-pixel 2D displacements between the synthetic renderings and the RGB images. This allows us to effectively integrate and exploit appearance information of the persons. Our per-pixel displacements can be efficiently transformed to per-visible-vertex displacements and then used for 3D model refinement by minimizing a reprojection loss. To demonstrate the effectiveness of our approach, we refine the initial 3D human mesh predictions of multiple models using different refinement procedures on 3DPW and RICH. We show that our approach not only consistently leads to better image-model alignment, but also to improved 3D accuracy. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted to 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Journal ref: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

arXiv:2103.10212 [pdf, other]

Stochastic Simulation Techniques for Inference and Sensitivity Analysis of Bayesian Attack Graphs

Authors: Isaac Matthews, Sadegh Soudjani, Aad van Moorsel

Abstract: A vulnerability scan combined with information about a computer network can be used to create an attack graph, a model of how the elements of a network could be used in an attack to reach specific states or goals in the network. These graphs can be understood probabilistically by turning them into Bayesian attack graphs, making it possible to quantitatively analyse the security of large networks.… ▽ More A vulnerability scan combined with information about a computer network can be used to create an attack graph, a model of how the elements of a network could be used in an attack to reach specific states or goals in the network. These graphs can be understood probabilistically by turning them into Bayesian attack graphs, making it possible to quantitatively analyse the security of large networks. In the event of an attack, probabilities on the graph change depending on the evidence discovered (e.g., by an intrusion detection system or knowledge of a host's activity). Since such scenarios are difficult to solve through direct computation, we discuss and compare three stochastic simulation techniques for updating the probabilities dynamically based on the evidence and compare their speed and accuracy. From our experiments we conclude that likelihood weighting is most efficient for most uses. We also consider sensitivity analysis of BAGs, to identify the most critical nodes for protection of the network and solve the uncertainty problem in the assignment of priors to nodes. Since sensitivity analysis can easily become computationally expensive, we present and demonstrate an efficient sensitivity analysis approach that exploits a quantitative relation with stochastic inference. △ Less

Submitted 18 March, 2021; originally announced March 2021.

arXiv:2005.06350 [pdf, other]

doi 10.1109/TrustCom50675.2020.00030

Cyclic Bayesian Attack Graphs: A Systematic Computational Approach

Authors: Isaac Matthews, John Mace, Sadegh Soudjani, Aad van Moorsel

Abstract: Attack graphs are commonly used to analyse the security of medium-sized to large networks. Based on a scan of the network and likelihood information of vulnerabilities, attack graphs can be transformed into Bayesian Attack Graphs (BAGs). These BAGs are used to evaluate how security controls affect a network and how changes in topology affect security. A challenge with these automatically generated… ▽ More Attack graphs are commonly used to analyse the security of medium-sized to large networks. Based on a scan of the network and likelihood information of vulnerabilities, attack graphs can be transformed into Bayesian Attack Graphs (BAGs). These BAGs are used to evaluate how security controls affect a network and how changes in topology affect security. A challenge with these automatically generated BAGs is that cycles arise naturally, which make it impossible to use Bayesian network theory to calculate state probabilities. In this paper we provide a systematic approach to analyse and perform computations over cyclic Bayesian attack graphs. %thus providing a generic approach to handle cycles as well as unifying the theory of Bayesian attack graphs. Our approach first formally introduces two commonly used versions of Bayesian attack graphs and compares their expressiveness. We then present an interpretation of Bayesian attack graphs based on combinational logic circuits, which facilitates an intuitively attractive systematic treatment of cycles. We prove properties of the associated logic circuit and present an algorithm that computes state probabilities without altering the attack graphs (e.g., remove an arc to remove a cycle). Moreover, our algorithm deals seamlessly with all cycles without the need to identify their types. A set of experiments using synthetically created networks demonstrates the scalability of the algorithm on computer networks with hundreds of machines, each with multiple vulnerabilities. △ Less

Submitted 13 May, 2020; originally announced May 2020.

arXiv:1910.00879 [pdf, other]

The Neural Moving Average Model for Scalable Variational Inference of State Space Models

Authors: Tom Ryder, Dennis Prangle, Andrew Golightly, Isaac Matthews

Abstract: Variational inference has had great success in scaling approximate Bayesian inference to big data by exploiting mini-batch training. To date, however, this strategy has been most applicable to models of independent data. We propose an extension to state space models of time series data based on a novel generative model for latent temporal states: the neural moving average model. This permits a sub… ▽ More Variational inference has had great success in scaling approximate Bayesian inference to big data by exploiting mini-batch training. To date, however, this strategy has been most applicable to models of independent data. We propose an extension to state space models of time series data based on a novel generative model for latent temporal states: the neural moving average model. This permits a subsequence to be sampled without drawing from the entire distribution, enabling training iterations to use mini-batches of the time series at low computational cost. We illustrate our method on autoregressive, Lotka-Volterra, FitzHugh-Nagumo and stochastic volatility models, achieving accurate parameter estimation in a short time. △ Less

Submitted 18 May, 2021; v1 submitted 2 October, 2019; originally announced October 2019.

arXiv:1906.03355 [pdf, other]

Learning Physics-guided Face Relighting under Directional Light

Authors: Thomas Nestmeyer, Jean-François Lalonde, Iain Matthews, Andreas M. Lehrmann

Abstract: Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the i… ▽ More Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the input image into intrinsic components according to a diffuse physics-based image formation model. We enable non-diffuse effects including cast shadows and specular highlights by predicting a residual correction to the diffuse render. To train and evaluate our model, we collected a portrait database of 21 subjects with various expressions and poses. Each sample is captured in a controlled light stage setup with 32 individual light sources. Our method creates precise and believable relighting results and generalizes to complex illumination conditions and challenging poses, including when the subject is not looking straight at the camera. △ Less

Submitted 19 April, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

Comments: CVPR 2020 (Oral)

arXiv:1704.07809 [pdf, other]

Hand Keypoint Detection in Single Images using Multiview Bootstrap**

Authors: Tomas Simon, Hanbyul Joo, Iain Matthews, Yaser Sheikh

Abstract: We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrap**: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outlie… ▽ More We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrap**: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result analytically relating the minimum number of views to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in realtime on RGB images and has accuracy comparable to methods that use depth sensors. The single view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions. △ Less

Submitted 25 April, 2017; originally announced April 2017.

Comments: CVPR 2017

arXiv:1612.03153 [pdf, other]

Panoptic Studio: A Massively Multiview System for Social Interaction Capture

Authors: Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh

Abstract: We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime… ▽ More We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime the nature of interactions. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the integration of perceptual analyses over a large variety of view points. We present a modularized system designed around this principle, consisting of integrated structural, hardware, and software innovations. The system takes, as input, 480 synchronized video streams of multiple people engaged in social activities, and produces, as output, the labeled time-varying 3D structure of anatomical landmarks on individuals in the space. Our algorithm is designed to fuse the "weak" perceptual processes in the large number of views by progressively generating skeletal proposals from low-level appearance cues, and a framework for temporal refinement is also presented by associating body parts to reconstructed dense 3D trajectory stream. Our system and method are the first in reconstructing full body motion of more than five people engaged in social interactions without using markers. We also empirically demonstrate the impact of the number of views in achieving this goal. △ Less

Submitted 9 December, 2016; originally announced December 2016.

Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

Showing 1–9 of 9 results for author: Matthews, I