Recognition of Unseen Bird Species by Learning from Field Guides

Authors: Andrés C. Rodríguez, Stefano D'Aronco, Rodrigo Caye Daudt, Jan D. Wegner, Konrad Schindler

Abstract: We exploit field guides to learn bird species recognition, in particular zero-shot recognition of unseen species. Illustrations contained in field guides deliberately focus on discriminative properties of each species, and can serve as side information to transfer knowledge from seen to unseen bird species. We study two approaches: (1) a contrastive encoding of illustrations, which can be fed into standard zero-shot learning schemes; and (2) a novel method that leverages the fact that illustrations are also images and as such structurally more similar to photographs than other kinds of side information. Our results show that illustrations from field guides, which are readily available for a wide range of species, are indeed a competitive source of side information for zero-shot learning. On a subset of the iNaturalist2021 dataset with 749 seen and 739 unseen species, we obtain a classification accuracy of unseen bird species of $12\%$ @top-1 and $38\%$ @top-10, which shows the potential of field guides for challenging real-world scenarios with many species. Our code is available at https://github.com/ac-rodriguez/zsl_billow

Submitted 2 November, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Accepted to WACV2024
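
The top-1 and top-10 figures quoted in the abstract above are top-k accuracies. As a minimal sketch of how such a metric is commonly computed (the function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from the authors' code):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for three samples over three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With many candidate species, top-k for larger k is the more forgiving measure, since the correct species only has to appear somewhere in the shortlist.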

arXiv:2206.00050

FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation

Authors: Mehmet Ozgur Turkoglu, Alexander Becker, Hüseyin Anil Gündüz, Mina Rezaei, Bernd Bischl, Rodrigo Caye Daudt, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler

Abstract: The ability to estimate epistemic uncertainty is often crucial when deploying machine learning in the real world, but modern methods often produce overconfident, uncalibrated uncertainty predictions. A common approach to quantify epistemic uncertainty, usable across a wide class of prediction models, is to train a model ensemble. In a naive implementation, the ensemble approach has high computational cost and high memory demand. This challenges in particular modern deep learning, where even a single deep network is already demanding in terms of compute and memory, and has given rise to a number of attempts to emulate the model ensemble without actually instantiating separate ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation (FiLM). That technique was originally developed for multi-task learning, with the aim of decoupling different tasks. We show that the idea can be extended to uncertainty quantification: by modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity, and consequently well-calibrated estimates of epistemic uncertainty, with low computational overhead in comparison. Empirically, FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks (sometimes even beating it), at a fraction of the memory cost.

Submitted 19 December, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

Comments: accepted at NeurIPS 2022
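
To make the idea in the abstract above concrete, the sketch below shares one set of weights across all ensemble members and gives each member only its own feature-wise scale (`gamma`) and shift (`beta`). The tiny network, the helper names, and all numeric values are hypothetical; this is a toy illustration under those assumptions, not the authors' implementation.

```python
import math


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Shared (hypothetical) weights: 2 inputs -> 2 hidden features -> 2 classes.
W_HIDDEN = [[1.0, -0.5], [0.3, 0.8]]
W_OUT = [[1.0, -1.0], [-1.0, 1.0]]

# One (gamma, beta) pair per implicit ensemble member; values are made up.
MEMBERS = [
    ([1.0, 1.0], [0.0, 0.0]),
    ([1.5, 0.7], [0.1, -0.2]),
    ([0.6, 1.3], [-0.1, 0.3]),
]


def predict(x, gamma, beta):
    hidden = film(matvec(W_HIDDEN, x), gamma, beta)
    hidden = [max(0.0, h) for h in hidden]  # ReLU
    return softmax(matvec(W_OUT, hidden))


def ensemble_predict(x):
    """Return the averaged class probabilities and the per-member predictions."""
    probs = [predict(x, g, b) for g, b in MEMBERS]
    n = len(probs)
    mean = [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]
    return mean, probs


mean_probs, member_probs = ensemble_predict([0.5, 1.0])
```

Only the `gamma` and `beta` vectors differ between members, which is why the memory cost stays far below that of storing several full networks; disagreement among `member_probs` can then serve as an uncertainty signal.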

arXiv:2203.14297

Learning Graph Regularisation for Guided Super-Resolution

Authors: Riccardo de Lutio, Alexander Becker, Stefano D'Aronco, Stefania Russo, Jan D. Wegner, Konrad Schindler

Abstract: We introduce a novel formulation for guided super-resolution. Its core is a differentiable optimisation layer that operates on a learned affinity graph. The learned graph potentials make it possible to leverage rich contextual information from the guide image, while the explicit graph optimisation within the architecture guarantees rigorous fidelity of the high-resolution target to the low-resolution source. With the decision to employ the source as a constraint rather than only as an input to the prediction, our method differs from state-of-the-art deep architectures for guided super-resolution, which produce targets that, when downsampled, will only approximately reproduce the source. This is not only theoretically appealing, but also produces crisper, more natural-looking images. A key property of our method is that, although the graph connectivity is restricted to the pixel lattice, the associated edge potentials are learned with a deep feature extractor and can encode rich context information over large receptive fields. By taking advantage of the sparse graph connectivity, it becomes possible to propagate gradients through the optimisation layer and learn the edge potentials from data. We extensively evaluate our method on several datasets, and consistently outperform recent baselines in terms of quantitative reconstruction errors, while also delivering visually sharper outputs. Moreover, we demonstrate that our method generalises particularly well to new datasets not seen during training.

Submitted 27 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2202.05270

A Deep Learning Approach for Digital Color Reconstruction of Lenticular Films

Authors: Stefano D'Aronco, Giorgio Trumpy, David Pfluger, Jan Dirk Wegner

Abstract: We propose the first accurate digitization and color reconstruction process for historical lenticular film that is robust to artifacts. Lenticular films emerged in the 1920s and were one of the first technologies that made it possible to capture full color information in motion. The technology leverages an RGB filter and cylindrical lenticules embossed on the film surface to encode the color in the horizontal spatial dimension of the image. To project the pictures, the encoding process was reversed using an appropriate analog device. In this work, we introduce an automated, fully digital pipeline to process the scan of lenticular films and colorize the image. Our method merges deep learning with a model-based approach in order to maximize the performance while making sure that the reconstructed colored images truthfully match the encoded color information. Our model employs different strategies to achieve an effective color reconstruction, in particular (i) we use data augmentation to create a robust lenticule segmentation network, (ii) we fit the lenticules raster prediction to obtain a precise vectorial lenticule localization, and (iii) we train a colorization network that predicts interpolation coefficients in order to obtain a truthful colorization. We validate the proposed method on a lenticular film dataset and compare it to other approaches. Since no colored ground truth is available as reference, we conduct a user study to validate our method in a subjective manner. The results of the study show that the proposed method is largely preferred with respect to other existing and baseline methods.

Submitted 4 April, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2106.03774

Digital Taxonomist: Identifying Plant Species in Community Scientists' Photographs

Authors: Riccardo de Lutio, Yihang She, Stefano D'Aronco, Stefania Russo, Philipp Brun, Jan D. Wegner, Konrad Schindler

Abstract: Automatic identification of plant specimens from amateur photographs could improve species range maps, thus supporting ecosystems research as well as conservation efforts. However, classifying plant specimens based on image data alone is challenging: some species exhibit large variations in visual appearance, while at the same time different species are often visually similar; additionally, species observations follow a highly imbalanced, long-tailed distribution due to differences in abundance as well as observer biases. On the other hand, most species observations are accompanied by side information about the spatial, temporal and ecological context. Moreover, biological species are not an unordered list of classes but embedded in a hierarchical taxonomic structure. We propose a multimodal deep learning model that takes into account these additional cues in a unified framework. Our Digital Taxonomist is able to identify plant species in photographs better than a classifier trained on the image content alone; the gain is more than 6 percentage points in accuracy.

Submitted 5 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted for publication in the ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:2105.11207

doi 10.1016/j.rse.2021.112479

Mapping oil palm density at country scale: An active learning approach

Authors: Andrés C. Rodríguez, Stefano D'Aronco, Konrad Schindler, Jan D. Wegner

Abstract: Accurate mapping of oil palm is important for understanding its past and future impact on the environment. We propose to map and count oil palms by estimating tree densities per pixel for large-scale analysis. This allows for fine-grained analysis, for example regarding different planting patterns. To that end, we propose a new, active deep learning method to estimate oil palm density at large scale from Sentinel-2 satellite images, and apply it to generate complete maps for Malaysia and Indonesia. What makes the regression of oil palm density challenging is the need for representative reference data that covers all relevant geographical conditions across a large territory. Specifically for density estimation, generating reference data involves counting individual trees. To keep the associated labelling effort low we propose an active learning (AL) approach that automatically chooses the most relevant samples to be labelled. Our method relies on estimates of the epistemic model uncertainty and of the diversity among samples, making it possible to retrieve an entire batch of relevant samples in a single iteration. Moreover, our algorithm has linear computational complexity and is easily parallelisable to cover large areas. We use our method to compute the first oil palm density map with $10\,$m Ground Sampling Distance (GSD), for all of Indonesia and Malaysia and for two different years, 2017 and 2019. The maps have a mean absolute error of $\pm$7.3 trees/ha, estimated from an independent validation set. We also analyse density variations between different states within a country and compare them to official estimates. According to our estimates there are, in total, $>1.2$ billion oil palms in Indonesia covering $>15$ million ha, and $>0.5$ billion oil palms in Malaysia covering $>6$ million ha.

Submitted 24 May, 2021; originally announced May 2021.

Journal ref: Remote Sensing of Environment Volume 261, August 2021, 112479

arXiv:2103.02766

PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds

Authors: Yujia Liu, Stefano D'Aronco, Konrad Schindler, Jan Dirk Wegner

Abstract: We introduce PC2WF, the first end-to-end trainable deep network architecture to convert a 3D point cloud into a wireframe model. The network takes as input an unordered set of 3D points sampled from the surface of some object, and outputs a wireframe of that object, i.e., a sparse set of corner points linked by line segments. Recovering the wireframe is a challenging task, where the numbers of both vertices and edges are different for every instance, and a-priori unknown. Our architecture gradually builds up the model: It starts by encoding the points into feature vectors. Based on those features, it identifies a pool of candidate vertices, then prunes those candidates to a final set of corner vertices and refines their locations. Next, the corners are linked with an exhaustive set of candidate edges, which is again pruned to obtain the final wireframe. All steps are trainable, and errors can be backpropagated through the entire sequence. We validate the proposed model on a publicly available synthetic dataset, for which the ground truth wireframes are accessible, as well as on a new real-world dataset. Our model produces wireframe abstractions of good quality and outperforms several baselines.

Submitted 3 March, 2021; originally announced March 2021.

arXiv:2102.08820

Crop mapping from image time series: deep learning with multi-scale label hierarchies

Authors: Mehmet Ozgur Turkoglu, Stefano D'Aronco, Gregor Perich, Frank Liebisch, Constantin Streit, Konrad Schindler, Jan Dirk Wegner

Abstract: The aim of this paper is to map agricultural crops by classifying satellite image time series. Domain experts in agriculture work with crop type labels that are organised in a hierarchical tree structure, where coarse classes (like orchards) are subdivided into finer ones (like apples, pears, vines, etc.). We develop a crop classification method that exploits this expert knowledge and significantly improves the mapping of rare crop types. The three-level label hierarchy is encoded in a convolutional, recurrent neural network (convRNN), such that for each pixel the model predicts three labels at different levels of granularity. This end-to-end trainable, hierarchical network architecture allows the model to learn joint feature representations of rare classes (e.g., apples, pears) at a coarser level (e.g., orchard), thereby boosting classification performance at the fine-grained level. Additionally, labelling at different granularity also makes it possible to adjust the output according to the classification scores, as coarser labels with high confidence are sometimes more useful for agricultural practice than fine-grained but very uncertain labels. We validate the proposed method on a new, large dataset that we make public. ZueriCrop covers an area of 50 km x 48 km in the Swiss cantons of Zurich and Thurgau with a total of 116,000 individual fields spanning 48 crop classes, and 28,000 (multi-temporal) image patches from Sentinel-2. We compare our proposed hierarchical convRNN model with several baselines, including methods designed for imbalanced class distributions. The hierarchical approach outperforms them by at least 9.9 percentage points in F1-score.

Submitted 16 August, 2021; v1 submitted 17 February, 2021; originally announced February 2021.
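
The abstract above notes that predicting labels at several granularities lets one adjust the output to the classification scores. One simple way to do that, sketched here as an assumption rather than as the paper's actual rule, is to report the finest level whose top score clears a confidence threshold. The function `pick_label`, the threshold, the middle-level class names, and the scores are all hypothetical.

```python
def pick_label(level_scores, threshold=0.7):
    """Return (level, label) for the finest level whose top score reaches the threshold.

    level_scores is ordered coarse -> fine; each entry maps label -> score.
    Falls back to the coarsest level if no level is confident enough.
    """
    best = None
    for level, scores in enumerate(level_scores):
        label = max(scores, key=scores.get)
        if scores[label] >= threshold:
            best = (level, label)  # later (finer) levels overwrite earlier ones
    if best is None:
        coarse = level_scores[0]
        best = (0, max(coarse, key=coarse.get))
    return best


# Made-up three-level scores for one pixel.
pixel = [
    {"field crops": 0.2, "orchards": 0.8},
    {"pome fruit": 0.75, "stone fruit": 0.25},
    {"apples": 0.4, "pears": 0.35, "cherries": 0.25},
]
decision = pick_label(pixel)
```

Here the fine-grained level is too uncertain to separate apples from pears, so the sketch reports the confident middle-level label instead.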

arXiv:2012.02542

Crop Classification under Varying Cloud Cover with Neural Ordinary Differential Equations

Authors: Nando Metzger, Mehmet Ozgur Turkoglu, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler

Abstract: Optical satellite sensors cannot see the Earth's surface through clouds. Despite the periodic revisit cycle, image sequences acquired by Earth observation satellites are therefore irregularly sampled in time. State-of-the-art methods for crop classification (and other time series analysis tasks) rely on techniques that implicitly assume regular temporal spacing between observations, such as recurrent neural networks (RNNs). We propose to use neural ordinary differential equations (NODEs) in combination with RNNs to classify crop types in irregularly spaced image sequences. The resulting ODE-RNN models consist of two steps: an update step, where a recurrent unit assimilates new input data into the model's hidden state; and a prediction step, in which the NODE propagates the hidden state until the next observation arrives. The prediction step is based on a continuous representation of the latent dynamics, which has several advantages. At the conceptual level, it is a more natural way to describe the mechanisms that govern the phenological cycle. From a practical point of view, it makes it possible to sample the system state at arbitrary points in time, such that one can integrate observations whenever they are available, and extrapolate beyond the last observation. Our experiments show that ODE-RNN indeed improves classification accuracy over common baselines such as LSTM, GRU, and temporal convolution. The gains are most prominent in the challenging scenario where only a few observations are available (i.e., frequent cloud cover). Moreover, we show that the ability to extrapolate translates to better classification performance early in the season, which is important for forecasting.

Submitted 16 August, 2021; v1 submitted 4 December, 2020; originally announced December 2020.
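
The two-step structure described in the abstract above (update at each observation, continuous propagation in between) can be sketched with a scalar hidden state. The dynamics function, the recurrent weights, the Euler integrator, and the observation values below are all stand-ins chosen for illustration; a real ODE-RNN would learn the dynamics with a neural network and typically use a proper ODE solver.

```python
import math


def dynamics(h):
    """Hypothetical latent dynamics dh/dt; a trained network in a real model."""
    return -0.5 * h


def propagate(h, t_start, t_end, step=0.01):
    """Prediction step: integrate the hidden state from t_start to t_end (Euler)."""
    t = t_start
    while t < t_end - 1e-12:
        dt = min(step, t_end - t)
        h += dt * dynamics(h)
        t += dt
    return h


def update(h, x, w_h=0.8, w_x=1.2):
    """Update step: a minimal recurrent unit that assimilates observation x."""
    return math.tanh(w_h * h + w_x * x)


def ode_rnn(observations, t_query):
    """observations: list of (time, value) pairs sorted by time, irregularly spaced."""
    h, t_prev = 0.0, observations[0][0]
    for t, x in observations:
        h = propagate(h, t_prev, t)  # evolve the state across the gap
        h = update(h, x)             # then assimilate the new observation
        t_prev = t
    # Extrapolate beyond the last observation to the query time.
    return propagate(h, t_prev, t_query)


# Made-up observations; cloud gaps lead to uneven spacing in time.
obs = [(0.0, 0.3), (0.4, 0.5), (2.1, 0.9)]
h_final = ode_rnn(obs, t_query=3.0)
```

Because the gap length enters through `propagate`, the same code handles short and long intervals between cloud-free acquisitions without resampling the series onto a regular grid.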

arXiv:2008.11201

Deep Active Learning in Remote Sensing for data efficient Change Detection

Authors: Vít Růžička, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler

Abstract: We investigate active learning in the context of deep neural network models for change detection and map updating. Active learning is a natural choice for a number of remote sensing tasks, including the detection of local surface changes: changes are on the one hand rare and on the other hand their appearance is varied and diffuse, making it hard to collect a representative training set in advance. In the active learning setting, one starts from a minimal set of training examples and progressively chooses informative samples that are annotated by a user and added to the training set. Hence, a core component of an active learning system is a mechanism to estimate model uncertainty, which is then used to pick uncertain, informative samples. We study different mechanisms to capture and quantify this uncertainty when working with deep networks, based on the variance or entropy across explicit or implicit model ensembles. We show that active learning successfully finds highly informative samples and automatically balances the training distribution, and reaches the same performance as a model supervised with a large, pre-annotated training set, with $\approx$99% fewer annotated samples.

Submitted 25 August, 2020; originally announced August 2020.

Comments: 10 pages, 5 figures, ECML/PKDD Workshop on Machine Learning for Earth Observation, 2020
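
As an illustration of the entropy-based selection mentioned in the abstract above, the sketch below scores each unlabelled sample by the entropy of its ensemble-averaged class distribution and picks the highest-scoring ones for annotation. The function names, the pool, and the probabilities are hypothetical; the paper studies several uncertainty measures, and this shows only one plausible variant.

```python
import math


def predictive_entropy(member_probs):
    """Entropy of the ensemble-averaged class distribution for one sample."""
    n = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def select_for_labelling(pool, k):
    """Return indices of the k pool samples with the highest predictive entropy."""
    scores = [predictive_entropy(sample) for sample in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Made-up (no change, change) probabilities from three ensemble members per sample.
pool = [
    [[0.99, 0.01], [0.98, 0.02], [0.99, 0.01]],  # confident: no change
    [[0.60, 0.40], [0.30, 0.70], [0.55, 0.45]],  # members disagree
    [[0.05, 0.95], [0.10, 0.90], [0.08, 0.92]],  # confident: change
]
chosen = select_for_labelling(pool, k=1)
```

The sample on which the members disagree is selected first, which matches the intuition that annotation effort should go where the model is least certain.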

arXiv:2007.06749

Water level prediction from social media images with a multi-task ranking approach

Authors: P. Chaudhary, S. D'Aronco, J. P. Leitao, K. Schindler, J. D. Wegner

Abstract: Floods are among the most frequent and catastrophic natural disasters and affect millions of people worldwide. It is important to create accurate flood maps to plan (offline) and conduct (real-time) flood mitigation and flood rescue operations. Arguably, images collected from social media can provide useful information for that task, which would otherwise be unavailable. We introduce a computer vision system that estimates water depth from social media images taken during flooding events, in order to build flood maps in (near) real-time. We propose a multi-task (deep) learning approach, where a model is trained using both a regression and a pairwise ranking loss. Our approach is motivated by the observation that a main bottleneck for image-based flood level estimation is training data: it is difficult and requires a lot of effort to annotate uncontrolled images with the correct water depth. We demonstrate how to efficiently learn a predictor from a small set of annotated water levels and a larger set of weaker annotations that only indicate in which of two images the water level is higher, and are much easier to obtain. Moreover, we provide a new dataset, named DeepFlood, with 8145 annotated ground-level images, and show that the proposed multi-task approach can predict the water level from a single, crowd-sourced image with ~11 cm root mean square error.

Submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted in ISPRS Journal 2020
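
The abstract above combines a regression loss on images with known water depth and a pairwise ranking loss on image pairs for which only the ordering is known. The sketch below shows one common way to write such an objective; the squared-error and logistic forms, the `weight` parameter, and the numbers are assumptions made for illustration, since the abstract does not specify the exact losses.

```python
import math


def regression_loss(pred, target):
    """Squared error between predicted and annotated water depth."""
    return (pred - target) ** 2


def ranking_loss(pred_higher, pred_lower):
    """Logistic pairwise loss: small when the image labelled 'higher' scores higher."""
    return math.log(1.0 + math.exp(-(pred_higher - pred_lower)))


def multitask_loss(depth_pairs, ranked_pairs, weight=1.0):
    """Mean regression loss plus weighted mean ranking loss.

    depth_pairs:  (prediction, annotated depth) for strongly labelled images.
    ranked_pairs: (prediction for higher-water image, prediction for lower-water image).
    """
    reg = sum(regression_loss(p, t) for p, t in depth_pairs) / len(depth_pairs)
    rank = sum(ranking_loss(h, l) for h, l in ranked_pairs) / len(ranked_pairs)
    return reg + weight * rank


# Made-up predictions (in metres) for two depth-annotated images and two ranked pairs.
loss = multitask_loss(
    depth_pairs=[(0.45, 0.50), (1.10, 1.00)],
    ranked_pairs=[(0.9, 0.3), (0.2, 0.6)],  # the second pair is ordered wrongly
)
```

The ranking term needs only "which image has more water" annotations, so it can draw on the larger, cheaper set of weak labels while the regression term anchors the absolute scale.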

arXiv:2003.10151

GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end

Authors: Ahmed Samy Nassar, Stefano D'Aronco, Sébastien Lefèvre, Jan D. Wegner

Abstract: In this paper we propose an end-to-end learnable approach that detects static urban objects from multiple views, re-identifies instances, and finally assigns a geographic position per object. Our method relies on a Graph Neural Network (GNN) to detect all objects and output their geographic positions given images and approximate camera poses as input. Our GNN simultaneously models relative pose and image evidence, and is further able to deal with an arbitrary number of input views. Our method is robust to occlusion, similar appearance of neighboring objects, and severe changes in viewpoint by jointly reasoning about visual image appearance and relative pose. Experimental evaluation on two challenging, large-scale datasets and comparison with state-of-the-art methods show significant and systematic improvements both in accuracy and efficiency, with 2-6% gain in detection and re-ID average precision as well as 8x reduction of training time.

Submitted 24 March, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

arXiv:2003.09168

Fine-grained Species Recognition with Privileged Pooling: Better Sample Efficiency Through Supervised Attention

Authors: Andres C. Rodriguez, Stefano D'Aronco, Konrad Schindler, Jan Dirk Wegner

Abstract: We propose a scheme for supervised image classification that uses privileged information, in the form of keypoint annotations for the training data, to learn strong models from small and/or biased training sets. Our main motivation is the recognition of animal species for ecological applications such as biodiversity modelling, which is challenging because of long-tailed species distributions due to rare species, and strong dataset biases such as repetitive scene background in camera traps. To counteract these challenges, we propose a visual attention mechanism that is supervised via keypoint annotations that highlight important object parts. This privileged information, implemented as a novel privileged pooling operation, is only required during training and helps the model to focus on regions that are discriminative. In experiments with three different animal species datasets, we show that deep networks with privileged pooling can use small training sets more efficiently and generalize better.

Submitted 4 August, 2023; v1 submitted 20 March, 2020; originally announced March 2020.

Comments: Updated version with iNaturalist2018 dataset. privileged pooling, supervised attention, training set bias, fine-grained species recognition, camera trap images

arXiv:1911.11033

Gating Revisited: Deep Multi-layer RNNs That Can Be Trained

Authors: Mehmet Ozgur Turkoglu, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler

Abstract: We propose a new STAckable Recurrent cell (STAR) for recurrent neural networks (RNNs), which has fewer parameters than widely used LSTM and GRU while being more robust against vanishing or exploding gradients. Stacking recurrent units into deep architectures suffers from two major limitations: (i) many recurrent cells (e.g., LSTMs) are costly in terms of parameters and computation resources; and (ii) deep RNNs are prone to vanishing or exploding gradients during training. We investigate the training of multi-layer RNNs and examine the magnitude of the gradients as they propagate through the network in the "vertical" direction. We show that, depending on the structure of the basic recurrent unit, the gradients are systematically attenuated or amplified. Based on our analysis we design a new type of gated cell that better preserves gradient magnitude. We validate our design on a large number of sequence modelling tasks and demonstrate that the proposed STAR cell makes it possible to build and train deeper recurrent architectures, ultimately leading to improved performance while being computationally more efficient.

Submitted 6 March, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: To appear in TPAMI (accepted March 2021)

arXiv:1904.01501

Guided Super-Resolution as Pixel-to-Pixel Transformation

Authors: Riccardo de Lutio, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler

Abstract: Guided super-resolution is a unifying framework for several computer vision tasks where the inputs are a low-resolution source image of some target quantity (e.g., perspective depth acquired with a time-of-flight camera) and a high-resolution guide image from a different domain (e.g., a grey-scale image from a conventional camera); and the target output is a high-resolution version of the source (in our example, a high-res depth map). The standard way of looking at this problem is to formulate it as a super-resolution task, i.e., the source image is upsampled to the target resolution, while transferring the missing high-frequency details from the guide. Here, we propose to turn that interpretation on its head and instead see it as a pixel-to-pixel mapping of the guide image to the domain of the source image. The pixel-wise mapping is parametrised as a multi-layer perceptron, whose weights are learned by minimising the discrepancies between the source image and the downsampled target image. Importantly, our formulation makes it possible to regularise only the mapping function, while avoiding regularisation of the outputs; thus producing crisp, natural-looking images. The proposed method is unsupervised, using only the specific source and guide images to fit the mapping. We evaluate our method on two different tasks, super-resolution of depth maps and of tree height maps. In both cases, we clearly outperform recent baselines in quantitative comparisons, while delivering visually much sharper outputs.

Submitted 15 August, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: Extended version, ICCV 2019
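
The training signal described in the abstract above is the discrepancy between the low-resolution source and the downsampled prediction. The sketch below computes that quantity for a toy case. It replaces the multi-layer perceptron with a per-pixel linear map (`a * g + b`) and assumes average pooling as the downsampling operator; both choices, along with the function names and numbers, are simplifications for illustration only.

```python
def downsample(image, factor):
    """Average-pool a 2D list of numbers by an integer factor."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def map_guide(guide, a, b):
    """Per-pixel mapping from guide to target; a linear stand-in for the MLP."""
    return [[a * g + b for g in row] for row in guide]


def source_discrepancy(guide, source, a, b, factor):
    """Mean squared difference between the source and the downsampled target."""
    low = downsample(map_guide(guide, a, b), factor)
    diffs = [(l - s) ** 2
             for low_row, src_row in zip(low, source)
             for l, s in zip(low_row, src_row)]
    return sum(diffs) / len(diffs)


# Made-up 2x2 guide and 1x1 source (upsampling factor 2).
guide = [[1.0, 3.0], [5.0, 7.0]]
source = [[9.0]]
good_fit = source_discrepancy(guide, source, a=2.0, b=1.0, factor=2)
poor_fit = source_discrepancy(guide, source, a=1.0, b=0.0, factor=2)
```

Minimising this discrepancy over the mapping parameters fits the map to a single source and guide pair, which is why no external training data is needed in this formulation.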

arXiv:1711.07530

doi 10.1109/TNSE.2018.2832247

Online Resource Inference in Network Utility Maximization Problems

Authors: Stefano D'Aronco, Pascal Frossard

Abstract: The amount of transmitted data in computer networks is expected to grow considerably in the future, putting more and more pressure on the network infrastructures. In order to guarantee a good service, it then becomes fundamental to use the network resources efficiently. Network Utility Maximization (NUM) provides a framework to optimize the rate allocation when network resources are limited. Unfortunately, in the scenario where the amount of available resources is not known a priori, classical NUM solving methods do not offer a viable solution. To overcome this limitation we design an overlay rate allocation scheme that attempts to infer the actual amount of available network resources while coordinating the users rate allocation. Due to the general and complex model assumed for the congestion measurements, a passive learning of the available resources would not lead to satisfying performance. The coordination scheme must then perform active learning in order to speed up the resources estimation and quickly increase the system performance. By adopting an optimal learning formulation we are able to balance the tradeoff between an accurate estimation, and an effective resources exploitation in order to maximize the long term quality of the service delivered to the users.

Submitted 10 May, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

arXiv:1701.01392 [pdf, other]

Price-based Controller for Quality-Fair HTTP Adaptive Streaming (Extended Version)

Authors: Stefano D'Aronco, Laura Toni, Pascal Frossard

Abstract: HTTP adaptive streaming (HAS) has become the universal technology for video streaming over the Internet. Many HAS system designs aim at sharing the network bandwidth in a rate-fair manner. However, rate fairness is in general not equivalent to quality fairness as different video sequences might have different characteristics and resource requirements. In this work, we focus on this limitation and propose a novel controller for HAS clients that is able to reach quality fairness while preserving the main characteristics of HAS systems and with a limited support from the network devices. In particular, we adopt a price-based mechanism in order to build a controller that maximizes the aggregate video quality for a set of HAS clients that share a common bottleneck. When network resources are scarce, the clients with simple video sequences reduce the requested bitrate in favor of users that subscribe to more complex video sequences, leading to a more efficient network usage. The proposed controller has been implemented in a network simulator, and the simulation results demonstrate its ability to share the available bandwidth among the HAS users in a quality-fair manner.

Submitted 5 January, 2017; originally announced January 2017.

arXiv:1506.02799 [pdf, other]

doi 10.1109/TNET.2016.2587579

Improved Utility-based Congestion Control for Delay-Constrained Communication

Authors: Stefano D'Aronco, Laura Toni, Sergio Mena, Xiaoqing Zhu, Pascal Frossard

Abstract: Due to the presence of buffers in the inner network nodes, each congestion event leads to buffer queueing and thus to an increasing end-to-end delay. In the case of delay sensitive applications, a large delay might not be acceptable and a solution to properly manage congestion events while maintaining a low end-to-end delay is required. Delay-based congestion algorithms are a viable solution as they target to limit the experienced end-to-end delay. Unfortunately, they do not perform well when sharing the bandwidth with congestion control algorithms not regulated by delay constraints (e.g., loss-based algorithms). Our target is to fill this gap, proposing a novel congestion control algorithm for delay-constrained communication over best effort packet switched networks. The proposed algorithm is able to maintain a bounded queueing delay when competing with other delay-based flows, and avoid starvation when competing with loss-based flows. We adopt the well-known price-based distributed mechanism as congestion control, but: 1) we introduce a novel non-linear mapping between the experienced delay and the price function and 2) we combine both delay and loss information into a single price term based on packet interarrival measurements. We then provide a stability analysis for our novel algorithm and we show its performance in the simulation results carried out in the NS3 framework. Simulation results demonstrate that the proposed algorithm is able to: achieve good intra-protocol fairness properties, control efficiently the end-to-end delay, and finally, protect the flow from starvation when other flows cause the queuing delay to grow excessively.

Submitted 20 January, 2017; v1 submitted 9 June, 2015; originally announced June 2015.
