-
Extending Network Calculus To Deal With Partially Negative And Decreasing Service Curves
Authors:
Anja Hamscher,
Vlad-Cristian Constantin,
Jens B. Schmitt
Abstract:
Network Calculus (NC) is a versatile analytical methodology to efficiently compute performance bounds in networked systems. The arrival and service curve abstractions allow modeling diverse and heterogeneous distributed systems. The operations to compute residual service curves and to concatenate sequences of systems enable an efficient and accurate calculation of per-flow timing guarantees. Yet, in some scenarios involving multiple concurrent flows at a system, the central notion of so-called min-plus service curves is too weak to compute a meaningful residual service curve. In these cases, one usually resorts to so-called strict service curves that enable the computation of per-flow bounds. However, strict service curves are restrictive: (1) there are service elements for which only min-plus service curves can be provided but not strict ones and (2) strict service curves generally have no concatenation property, i.e., a sequence of two strict systems does not yield a strict service curve. In this report, we extend NC to deal with systems only offering aggregate min-plus service curves to multiple flows. The key to this extension is the exploitation of minimal arrival curves, i.e., lower bounds on the arrival process. Technically speaking, we provide basic performance bounds (backlog and delay) for the case of negative service curves. We also discuss their accuracy and show them to be tight. To illustrate their usefulness, we also present patterns of application of these new results for: (1) heterogeneous systems involving computation and communication resources and (2) finite buffers that are shared between multiple flows.
Submitted 26 March, 2024;
originally announced March 2024.
-
Improving Performance Bounds for Weighted Round-Robin Schedulers under Constrained Cross-Traffic
Authors:
Vlad-Cristian Constantin,
Paul Nikolaus,
Jens Schmitt
Abstract:
Weighted round robin (WRR) is an effective, yet particularly easy-to-implement packet scheduler. A slight modification in the implementation of WRR, interleaved weighted round robin, has been proposed as an enhancement of the initial version and has been recently investigated. Network calculus is a versatile framework to model and analyze such network schedulers. By means of this, one can derive theoretical upper bounds on network performance metrics, such as delay or backlog. In our previous work, we derive performance bounds by showing that both round-robin variants belong to a class called bandwidth-sharing policy; however, the proofs are incomplete and thus, we cannot conclude that the round-robin schedulers are bandwidth-sharing policies (under variable packet sizes). To that end, in the subsequent erratum, we introduce so-called resource-segregating policies and show the round-robin schedulers to be members of this class.
We first present our original work, as published in [CNS22-1], and then the erratum correcting the previously mentioned shortcoming. In our erratum, we provide slightly worse delay bounds compared to [CNS22-1]; yet, across all our experiments, they significantly outperform the state of the art.
Submitted 12 December, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Adversarial Parametric Pose Prior
Authors:
Andrey Davydov,
Anastasia Remizova,
Victor Constantin,
Sina Honari,
Mathieu Salzmann,
Pascal Fua
Abstract:
The Skinned Multi-Person Linear (SMPL) model can represent a human body by mapping pose and shape parameters to body meshes. This has been shown to facilitate inferring 3D human pose and shape from images via different learning models. However, not all pose and shape parameter values yield physically-plausible or even realistic body meshes. In other words, SMPL is under-constrained and may thus lead to invalid results when used to reconstruct humans from images, either by directly optimizing its parameters, or by learning a mapping from the image to these parameters.
In this paper, we therefore learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training. We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images. We find that the prior based on a spherical distribution yields the best results. Furthermore, in all these tasks, it outperforms the state-of-the-art VAE-based approach to constraining the SMPL parameters.
Submitted 8 December, 2021;
originally announced December 2021.
-
Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation
Authors:
Sina Honari,
Victor Constantin,
Helge Rhodin,
Mathieu Salzmann,
Pascal Fua
Abstract:
In this paper we propose an unsupervised feature extraction method to capture temporal information on monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying contrastive loss only to the time-variant features and encouraging a gradual transition on them between nearby and distant frames, while also reconstructing the input, extracts rich temporal features that are well-suited for human pose estimation. Our approach reduces error by about 50% compared to the standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve the 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
Submitted 25 November, 2022; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Self-supervised Segmentation via Background Inpainting
Authors:
Isinsu Katircioglu,
Helge Rhodin,
Victor Constantin,
Jörg Spörri,
Mathieu Salzmann,
Pascal Fua
Abstract:
While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this when annotating data is prohibitively expensive, we introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. At the heart of our approach lies the observation that object segmentation and background reconstruction are linked tasks, and that, for structured scenes, background regions can be re-synthesized from their surroundings, whereas regions depicting the moving object cannot. We encode this intuition into a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of the proposals, we develop a Monte Carlo-based training strategy that allows the algorithm to explore the large space of object proposals. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
Submitted 11 November, 2020;
originally announced November 2020.
-
GarNet++: Improving Fast and Accurate Static3D Cloth Draping by Curvature Loss
Authors:
Erhan Gundogdu,
Victor Constantin,
Shaifali Parashar,
Amrollah Seifoddini,
Minh Dang,
Mathieu Salzmann,
Pascal Fua
Abstract:
In this paper, we tackle the problem of static 3D cloth draping on virtual human bodies. We introduce a two-stream deep network model that produces a visually plausible draping of a template cloth on virtual 3D bodies by extracting features from both the body and garment shapes. Our network learns to mimic a Physics-Based Simulation (PBS) method while requiring two orders of magnitude less computation time. To train the network, we introduce loss terms inspired by PBS to produce plausible results and make the model collision-aware. To increase the details of the draped garment, we introduce two loss functions that penalize the difference between the curvature of the predicted cloth and PBS. Particularly, we study the impact of mean curvature normal and a novel detail-preserving loss both qualitatively and quantitatively. Our new curvature loss computes the local covariance matrices of the 3D points, and compares the Rayleigh quotients of the prediction and PBS. This leads to more details while performing favorably or comparably against the loss that considers mean curvature normal vectors in the 3D triangulated meshes. We validate our framework on four garment types for various body shapes and poses. Finally, we achieve superior performance against a recently proposed data-driven method.
Submitted 20 July, 2020;
originally announced July 2020.
-
Self-supervised Training of Proposal-based Segmentation via Background Prediction
Authors:
Isinsu Katircioglu,
Helge Rhodin,
Victor Constantin,
Jörg Spörri,
Mathieu Salzmann,
Pascal Fua
Abstract:
While supervised object detection methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this in scenarios where annotating data is prohibitively expensive, we introduce a self-supervised approach to object detection and segmentation, able to work with monocular images captured with a moving camera. At the heart of our approach lies the observation that segmentation and background reconstruction are linked tasks, and the idea that, because we observe a structured scene, background regions can be re-synthesized from their surroundings, whereas regions depicting the object cannot. We therefore encode this intuition as a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of object proposals, we develop a Monte Carlo-based training strategy that allows us to explore the large space of object proposals. Our experiments demonstrate that our approach yields accurate detections and segmentations in images that visually depart from those of standard benchmarks, outperforming existing self-supervised methods and approaching weakly supervised ones that exploit large annotated datasets.
Submitted 18 July, 2019;
originally announced July 2019.
-
Neural Scene Decomposition for Multi-Person Motion Capture
Authors:
Helge Rhodin,
Victor Constantin,
Isinsu Katircioglu,
Mathieu Salzmann,
Pascal Fua
Abstract:
Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are only of limited use.
In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding-boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision coming from multiview data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data.
Submitted 13 March, 2019;
originally announced March 2019.
-
GarNet: A Two-Stream Network for Fast and Accurate 3D Cloth Draping
Authors:
Erhan Gundogdu,
Victor Constantin,
Amrollah Seifoddini,
Minh Dang,
Mathieu Salzmann,
Pascal Fua
Abstract:
While Physics-Based Simulation (PBS) can accurately drape a 3D garment on a 3D body, it remains too costly for real-time applications, such as virtual try-on. By contrast, inference in a deep network, requiring a single forward pass, is much faster. Taking advantage of this, we propose a novel architecture to fit a 3D garment template to a 3D body. Specifically, we build upon the recent progress in 3D point cloud processing with deep networks to extract garment features at varying levels of detail, including point-wise, patch-wise and global features. We fuse these features with those extracted in parallel from the 3D body, so as to model the cloth-body interactions. The resulting two-stream architecture, which we call GarNet, is trained using a loss function inspired by physics-based modeling, and delivers visually plausible garment shapes whose 3D points are, on average, less than 1 cm away from those of a PBS method, while running 100 times faster. Moreover, the proposed method can model various garment types with different cutting patterns when parameters of those patterns are given as input to the network.
Submitted 21 August, 2019; v1 submitted 27 November, 2018;
originally announced November 2018.
-
Learning Monocular 3D Human Pose Estimation from Multi-view Images
Authors:
Helge Rhodin,
Jörg Spörri,
Isinsu Katircioglu,
Victor Constantin,
Frédéric Meyer,
Erich Müller,
Mathieu Salzmann,
Pascal Fua
Abstract:
Accurate 3D human pose estimation from single images is possible with sophisticated deep-net architectures that have been trained on very large datasets. However, this still leaves open the problem of capturing motions for which no such database exists. Manual annotation is tedious, slow, and error-prone. In this paper, we propose to replace most of the annotations by the use of multiple views, at training time only. Specifically, we train the system to predict the same pose in all views. Such a consistency constraint is necessary but not sufficient to predict accurate poses. We therefore complement it with a supervised loss aiming to predict the correct pose in a small set of labeled images, and with a regularization term that penalizes drift from initial predictions. Furthermore, we propose a method to estimate camera pose jointly with human pose, which lets us utilize multi-view footage where calibration is difficult, e.g., for pan-tilt or moving handheld cameras. We demonstrate the effectiveness of our approach on established benchmarks, as well as on a new Ski dataset with rotating cameras and expert ski motion, for which annotations are truly hard to obtain.
Submitted 24 March, 2018; v1 submitted 13 March, 2018;
originally announced March 2018.