Search | arXiv e-print repository

SyntStereo2Real: Edge-Aware GAN for Remote Sensing Image-to-Image Translation while Maintaining Stereo Constraint

Authors: Vasudha Venkatesan, Daniel Panangian, Mario Fuentes Reyes, Ksenia Bittner

Abstract: In the field of remote sensing, the scarcity of stereo-matched data, and particularly the lack of accurate ground truth, often hinders the training of deep neural networks. Using synthetically generated images as an alternative alleviates this problem but suffers from poor domain generalization. Unifying the capabilities of image-to-image translation and stereo matching presents an effective solution to the issue of domain generalization. Current methods combine two networks, an unpaired image-to-image translation network and a stereo-matching network, and optimize them jointly. We propose an edge-aware GAN-based network that effectively tackles both tasks simultaneously. We obtain edge maps of the input images with the Sobel operator and use them as an additional input to the encoder in the generator to enforce geometric consistency during translation. We additionally include a warping loss calculated from the translated images to maintain stereo consistency. We demonstrate that our model produces results that are qualitatively and quantitatively superior to those of existing models, and that its applicability extends to diverse domains, including autonomous driving.
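The abstract does not give the exact edge-map or loss formulation. As a rough sketch, assuming the edge map is the Sobel gradient magnitude stacked as an extra input channel, and assuming the warping loss is a mean L1 photometric error between the translated left view and the translated right view warped with integer disparities (rectified pair, left pixel x matching right pixel x - d), the two ingredients could look like this. All function names are hypothetical and plain lists stand in for image tensors.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a 2D grayscale image; border pixels are replicated."""
    h, w = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so the 3x3 window never leaves the image.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = sum(SOBEL_X[i][j] * px(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * px(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


def with_edge_channel(img):
    """Per-pixel [intensity, edge] pairs: the edge map becomes an extra encoder input."""
    edges = sobel_magnitude(img)
    return [[[img[r][c], edges[r][c]] for c in range(len(img[0]))]
            for r in range(len(img))]


def warping_loss(left, right, disparity):
    """Mean L1 error between the left image and the right image warped into the
    left view with a per-pixel horizontal disparity (rounded to integers)."""
    total, count = 0.0, 0
    for r, row in enumerate(left):
        for c, value in enumerate(row):
            src = c - int(round(disparity[r][c]))  # matching column in the right image
            if 0 <= src < len(right[r]):  # skip pixels that warp outside the view
                total += abs(value - right[r][src])
                count += 1
    return total / count if count else 0.0
```

A trainable version would use differentiable (sub-pixel, bilinear) warping on tensors; integer rounding here only keeps the sketch dependency-free.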

Submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted to IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) EarthVision

arXiv:2404.03930 [pdf, other]

Real-GDSR: Real-World Guided DSM Super-Resolution via Edge-Enhancing Residual Network

Authors: Daniel Panangian, Ksenia Bittner

Abstract: A low-resolution digital surface model (DSM) has distinctive attributes caused by noise, sensor limitations and data acquisition conditions, which cannot be replicated with simple interpolation methods such as bicubic. As a result, super-resolution models trained on synthetic data do not perform effectively on real data. Training a model on pairs of real low- and high-resolution DSMs is also challenging because of the lack of information. On the other hand, other imaging modalities of the same scene can be used to enrich the information needed for large-scale super-resolution. In this work, we introduce a novel methodology, named REAL-GDSR, that addresses the intricacies of real-world DSM super-resolution by breaking this ill-posed problem down into two steps. The first step uses a residual local refinement network. This approach departs from conventional methods, which are trained to directly predict height values instead of the differences (residuals) and use large receptive fields in their networks. The second step introduces a diffusion-based technique that enhances the results on a global scale, with a primary focus on smoothing and edge preservation. Our experiments underscore the effectiveness of the proposed method. We conduct a comprehensive evaluation, comparing it to recent state-of-the-art techniques in real-world DSM super-resolution (SR). Our approach consistently outperforms these existing methods in both qualitative and quantitative assessments.
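The distinction between predicting heights directly and predicting residuals can be illustrated with a small sketch. It assumes, hypothetically, that the residual is defined relative to a nearest-neighbour upsampling of the low-resolution DSM (the abstract does not state which base interpolation is used), and it omits the network itself and the diffusion-based second step.

```python
def upsample_nearest(dsm, factor):
    """Nearest-neighbour upsampling of a 2D height grid by an integer factor."""
    return [[value for value in row for _ in range(factor)]
            for row in dsm for _ in range(factor)]


def residual_target(lr_dsm, hr_dsm, factor):
    """Training target for a residual network: HR heights minus the upsampled LR heights."""
    base = upsample_nearest(lr_dsm, factor)
    return [[h - b for h, b in zip(hr_row, base_row)]
            for hr_row, base_row in zip(hr_dsm, base)]


def apply_residual(lr_dsm, residual, factor):
    """Refined DSM = upsampled LR DSM + (predicted) residual."""
    base = upsample_nearest(lr_dsm, factor)
    return [[b + r for b, r in zip(base_row, res_row)]
            for base_row, res_row in zip(base, residual)]


lr = [[10, 20]]
hr = [[11, 9, 22, 20], [10, 10, 21, 19]]
res = residual_target(lr, hr, 2)
print(res)
print(apply_residual(lr, res, 2) == hr)
```

The residuals stay small and centred around zero even when absolute heights vary widely, which is the usual motivation for this kind of formulation.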

Submitted 5 April, 2024; originally announced April 2024.

Comments: Accepted for publication in the ISPRS Annals of Photogrammetry, Remote Sensing, and Spatial Information Sciences

arXiv:2401.03329 [pdf, other]

doi 10.1007/978-3-030-90525-5_38

Designing a Socially Assistive Robot to Support Older Adults with Low Vision

Authors: Emily Zhou, Zhonghao Shi, Xiaoyang Qiao, Maja J Matarić, Ava K Bittner

Abstract: Socially assistive robots (SARs) have shown great promise in supplementing and augmenting interventions to support the physical and mental well-being of older adults. However, past work has not yet explored the potential of applying SARs to lower the barriers to long-term low vision rehabilitation (LVR) interventions for older adults. In this work, we present a user-informed design process to validate the motivation and identify major design principles for developing SARs for long-term LVR. To evaluate the user-perceived usefulness and acceptance of SARs in this novel domain, we performed a two-phase study through user surveys. First, a group (n=38) of older adults with low vision (LV) completed a mailed-in survey. Next, a new group (n=13) of older adults with LV saw an in-clinic SAR demo and then completed the survey. The study participants reported that SARs would be useful, trustworthy, easy to use, and enjoyable while providing socio-emotional support to augment LVR interventions. The in-clinic demo group reported significantly more positive opinions of the SAR's capabilities than the baseline survey group, which used mailed-in forms without the SAR demo.

Submitted 6 January, 2024; originally announced January 2024.

Comments: Published in Social Robotics: 13th International Conference, ICSR 2021. Springer International Publishing

arXiv:2104.06485 [pdf, other]

Machine-learned 3D Building Vectorization from Satellite Imagery

Authors: Yi Wang, Stefano Zorzi, Ksenia Bittner

Abstract: We propose a machine-learning-based approach for automatic 3D building reconstruction and vectorization. Taking a single-channel photogrammetric digital surface model (DSM) and a panchromatic (PAN) image as input, we first filter out non-building objects and refine the building shapes of the input DSM with a conditional generative adversarial network (cGAN). The refined DSM and the input PAN image are then passed through a semantic segmentation network to detect the edges and corners of building roofs. Next, a set of vectorization algorithms is proposed to build roof polygons. Finally, the height information from the refined DSM is added to the polygons to obtain a fully vectorized level of detail 2 (LoD-2) building model. We verify the effectiveness of our method on large-scale satellite images, where we obtain state-of-the-art performance.
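The final step, adding DSM heights to 2D roof polygons, can be sketched as follows. This is a simplified illustration under stated assumptions: polygon vertices are hypothetical integer (row, col) pixel positions, each vertex takes the refined DSM height at that pixel, and walls are extruded down to a single assumed ground elevation. The actual pipeline may derive roof heights differently.

```python
def lift_polygon(polygon, dsm):
    """Attach the DSM height to each (row, col) vertex, returning (x, y, z) points
    with x = column and y = row."""
    return [(c, r, dsm[r][c]) for r, c in polygon]


def wall_faces(roof, ground_z):
    """One vertical quadrilateral per roof edge, from the ground up to the roof vertices."""
    faces = []
    n = len(roof)
    for i in range(n):
        (x1, y1, z1), (x2, y2, z2) = roof[i], roof[(i + 1) % n]
        faces.append([(x1, y1, ground_z), (x2, y2, ground_z), (x2, y2, z2), (x1, y1, z1)])
    return faces


dsm = [[3, 3, 3],
       [0, 5, 0],
       [7, 7, 7]]
roof = lift_polygon([(0, 0), (0, 2), (2, 2), (2, 0)], dsm)
print(roof)
print(len(wall_faces(roof, ground_z=0)))
```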

Submitted 13 April, 2021; originally announced April 2021.

Comments: Accepted to CVPR workshop (EarthVision 2021)

arXiv:2007.12587 [pdf, other]

Machine-learned Regularization and Polygonization of Building Segmentation Masks

Authors: Stefano Zorzi, Ksenia Bittner, Friedrich Fraundorfer

Abstract: We propose a machine-learning-based approach for automatic regularization and polygonization of building segmentation masks. Taking an image as input, we first predict building segmentation maps with a generic fully convolutional network (FCN). A generative adversarial network (GAN) is then used to regularize the building boundaries and make them more realistic, i.e., give them more rectilinear outlines that form right angles where required. This is achieved through the interplay between the discriminator, which estimates the probability that an input image is real, and the generator, which learns from the discriminator's response to create more realistic images. Finally, we train a backbone convolutional neural network (CNN), adapted to predict sparse outputs corresponding to building corners, from the regularized building segmentation results. Experiments on three building segmentation datasets demonstrate that the proposed method is capable not only of obtaining accurate results but also of producing visually pleasing building outlines parameterized as polygons.
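The discriminator/generator interplay described above is usually written as the standard GAN minimax objective. Here it is adapted to the regularization setting under assumed notation (not taken from the paper): $m$ is an FCN segmentation mask fed to the generator $G$, $y$ is a reference (ideal) building mask, and $D$ returns the probability that its input is real. The paper's full loss may contain additional terms.

```latex
\min_{G}\,\max_{D}\;
  \mathbb{E}_{y}\big[\log D(y)\big]
  + \mathbb{E}_{m}\big[\log\big(1 - D(G(m))\big)\big]
```

$D$ is trained to push $D(y)$ toward 1 and $D(G(m))$ toward 0, while $G$ is trained to make its regularized masks indistinguishable from real ones.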

Submitted 17 December, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

arXiv:2007.12470 [pdf, other]

Map-Repair: Deep Cadastre Maps Alignment and Temporal Inconsistencies Fix in Satellite Images

Authors: Stefano Zorzi, Ksenia Bittner, Friedrich Fraundorfer

Abstract: In fast-developing countries, it is hard to track the construction of new buildings or the destruction of old structures and, as a result, to keep cadastre maps up to date. Moreover, due to the complexity of urban regions or the inconsistency of the data used for cadastre map extraction, errors in the form of misalignment are a common problem. In this work, we propose an end-to-end deep learning approach that resolves inconsistencies between the input intensity image and the available building footprints by correcting label noise and, at the same time, misalignments if needed. The obtained results demonstrate the robustness of the proposed method even to severely misaligned examples, which makes it potentially suitable for real applications such as OpenStreetMap correction.
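To make "misalignment" concrete, the sketch below estimates a global integer translation between a footprint mask and a reference building mask by exhaustive search over IoU. This is a simple classical baseline, not the paper's learned end-to-end method, which also corrects label noise and is not restricted to one global shift. The reference mask and all function names are hypothetical.

```python
def shift(mask, dy, dx):
    """Translate a binary mask by (dy, dx) pixels, filling uncovered pixels with 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r - dy, c - dx
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = mask[sr][sc]
    return out


def iou(a, b):
    """Intersection over union of two equally sized binary masks."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    union = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x or y)
    return inter / union if union else 0.0


def best_offset(footprints, reference, max_shift):
    """Try every integer translation within +/- max_shift and keep the best IoU."""
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = iou(shift(footprints, dy, dx), reference)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score
```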

Submitted 24 July, 2020; originally announced July 2020.

arXiv:2007.00253 [pdf, other]

Private Speech Classification with Secure Multiparty Computation

Authors: Kyle Bittner, Martine De Cock, Rafael Dowsley

Abstract: Deep learning in audio signal processing, such as the classification of human voice signals, is a rich application area of machine learning. Legitimate use cases include voice authentication, gunfire detection, and emotion recognition. While automated human speech classification has clear advantages, application developers can gain knowledge beyond the professed scope from unprotected audio signal processing. In this paper, we propose the first provably secure privacy-preserving solution for deep-learning-based audio classification. Our approach, which is based on Secure Multiparty Computation, allows a speech signal of one party (Alice) to be classified with a deep neural network of another party (Bob) without Bob ever seeing Alice's speech signal in unencrypted form. As threat models, we consider both passive security, i.e., semi-honest parties who follow the instructions of the cryptographic protocols, and active security, i.e., malicious parties who deviate from the protocols. We evaluate the efficiency-security-accuracy trade-off of the proposed solution in a use case for privacy-preserving emotion detection from speech with a convolutional neural network. In the semi-honest case, we can classify a speech signal in under 0.3 sec; in the malicious case, it takes about 1.6 sec. In both cases, there is no leakage of information, and we achieve the same classification accuracies as when the computations are done on unencrypted data.
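The abstract does not detail the protocols, but additive secret sharing over a ring such as $\mathbb{Z}_{2^{32}}$ is a common building block of Secure Multiparty Computation. The sketch below (assumed modulus, hypothetical helper names) shows how a value is split so that no single share reveals it, and why additions and multiplications by public constants can be done locally on shares. Multiplying two secret values, as needed between Alice's input and Bob's secret weights, requires an interactive sub-protocol that is not shown.

```python
import secrets

MODULUS = 2 ** 32  # assumed ring size; real values would be fixed-point encoded


def share(value, n_parties=2):
    """Split value into n additive shares; any n-1 shares are uniformly random."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Only the sum of all shares (mod MODULUS) reveals the secret."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own shares locally; no communication needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def scale_shared(a_shares, public_constant):
    """Multiplication by a public constant is also local."""
    return [(a * public_constant) % MODULUS for a in a_shares]


print(reconstruct(add_shared(share(5), share(7))))
```

Negative numbers wrap around the ring, so `share(-1)` reconstructs to `MODULUS - 1`; fixed-point encodings interpret the upper half of the ring as negative values.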

Submitted 28 January, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

arXiv:2004.02493 [pdf, other]

A Generalized Multi-Task Learning Approach to Stereo DSM Filtering in Urban Areas

Authors: Lukas Liebel, Ksenia Bittner, Marco Körner

Abstract: City models and height maps of urban areas serve as a valuable data source for numerous applications, such as disaster management or city planning. While this information is not globally available, it can be substituted by digital surface models (DSMs), automatically produced from inexpensive satellite imagery. However, stereo DSMs often suffer from noise and blur. Furthermore, they are heavily distorted by vegetation, which is of lesser relevance for most applications. Such basic models can be filtered by convolutional neural networks (CNNs), trained on labels derived from digital elevation models (DEMs) and 3D city models, in order to obtain a refined DSM. We propose a modular multi-task learning concept that consolidates existing approaches into a generalized framework. Our encoder-decoder models with shared encoders and multiple task-specific decoders leverage roof type classification as a secondary task and multiple objectives, including a conditional adversarial term. The contributing single-objective losses are automatically weighted in the final multi-task loss function based on learned uncertainty estimates. We evaluate the performance of specific instances of this family of network architectures. Our method consistently outperforms the state of the art on common data, both quantitatively and qualitatively, and generalizes well to a new dataset of an independent study area.
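One widely used way to weight losses by learned uncertainty is the homoscedastic-uncertainty formulation of Kendall, Gal and Cipolla (2018), sketched below with a learnable log-variance $s_i = \log \sigma_i^2$ per task. Whether the paper uses exactly this variant is an assumption; some variants use different constant factors for classification losses.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Total loss = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2)
    treated as a trainable parameter for each task."""
    if len(task_losses) != len(log_variances):
        raise ValueError("need one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        # exp(-s) = 1 / sigma^2 down-weights noisy tasks; 0.5 * s = log(sigma)
        # penalizes inflating sigma to ignore a task entirely.
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


print(uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0]))  # equal weights: 0.5 * (2 + 4)
```

For a fixed task loss $L_i$, setting the derivative with respect to $s_i$ to zero gives $s_i = \log L_i$, so tasks with larger losses end up with smaller effective weights.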

Submitted 7 April, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: This paper was accepted for publication in the ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:1904.09935 [pdf, other]

Late or Earlier Information Fusion from Depth and Spectral Data? Large-Scale Digital Surface Model Refinement by Hybrid-cGAN

Authors: Ksenia Bittner, Marco Körner, Peter Reinartz

Abstract: We present the workflow of a DSM refinement methodology using a Hybrid-cGAN, where the generative part consists of two encoders and a common decoder that blends the spectral and height information within one network. The inputs to the Hybrid-cGAN are single-channel photogrammetric DSMs with continuous values and single-channel panchromatic (PAN) satellite images with half-meter resolution. Experimental results demonstrate that the earlier fusion of information from data with different physical meanings helps to propagate fine details and complete inaccurate or missing 3D information about building forms. Moreover, it improves the building boundaries, making them more rectilinear.
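The difference between earlier (feature-level) and late (output-level) fusion can be shown with a toy sketch in which fixed linear maps stand in for the encoders, decoder and branch networks. These are hypothetical placeholders, not the Hybrid-cGAN layers. In early fusion a common decoder sees both feature sets at once and can mix them; in late fusion each branch produces a full output and the results are only combined at the end (here by averaging, one simple choice).

```python
def linear(weights, vector):
    """Matrix-vector product standing in for a network stage."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def early_fusion(dsm_in, pan_in, enc_dsm, enc_pan, decoder):
    """Two encoders; their features are concatenated and fed to one common decoder."""
    fused = linear(enc_dsm, dsm_in) + linear(enc_pan, pan_in)  # list concatenation
    return linear(decoder, fused)


def late_fusion(dsm_in, pan_in, net_dsm, net_pan):
    """Two complete branches whose outputs are merged only at the last stage."""
    a = linear(net_dsm, dsm_in)
    b = linear(net_pan, pan_in)
    return [(x + y) / 2 for x, y in zip(a, b)]


identity = [[1, 0], [0, 1]]
decoder = [[1, 0, 1, 0], [0, 1, 0, 1]]
print(early_fusion([1, 2], [3, 4], identity, identity, decoder))
print(late_fusion([1, 2], [3, 4], identity, identity))
```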

Submitted 22 April, 2019; originally announced April 2019.

Comments: 8 pages, This work was accepted to be presented at the IEEE/ISPRS Workshop on Large Scale Computer Vision for Remote Sensing Imagery (EarthVision) to be held at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019

arXiv:1903.03519 [pdf, other]

DSM Building Shape Refinement from Combined Remote Sensing Images based on Wnet-cGANs

Authors: Ksenia Bittner, Marco Körner, Peter Reinartz

Abstract: We describe the workflow of a digital surface model (DSM) refinement algorithm using a hybrid conditional generative adversarial network (cGAN), where the generative part consists of two parallel networks merged at the last stage, forming a WNet architecture. The inputs to the so-called WNet-cGAN are stereo DSMs and panchromatic (PAN) satellite images with half-meter resolution. Fusing these helps to propagate fine-detailed information from the spectral image and to complete the 3D knowledge about building shapes that is missing from the stereo DSM. In addition, it refines the building outlines and edges, making them more rectangular and sharper.

Submitted 8 March, 2019; originally announced March 2019.

Showing 1–10 of 10 results for author: Bittner, K