
Adapters Strike Back

Authors: Jan-Martin O. Steitz, Stefan Roth

Abstract: Adapters provide an efficient and lightweight mechanism for adapting trained transformer models to a variety of different tasks. However, they have often been found to be outperformed by other adaptation mechanisms, including low-rank adaptation. In this paper, we provide an in-depth study of adapters, their internal structure, as well as various implementation choices. We uncover pitfalls for using adapters and suggest a concrete, improved adapter architecture, called Adapter+, that not only outperforms previous adapter implementations but surpasses a number of other, more complex adaptation mechanisms in several challenging settings. Despite this, our suggested adapter is highly robust and, unlike previous work, requires little to no manual intervention when addressing a novel scenario. Adapter+ reaches state-of-the-art average accuracy on the VTAB benchmark, even without a per-task hyperparameter optimization.

Submitted 10 June, 2024; originally announced June 2024.

Comments: To appear at CVPR 2024. Code: https://github.com/visinf/adapter_plus

arXiv:2406.04026 [pdf]

Quantification of Collateral Supply with Local-AIF Dynamic Susceptibility Contrast MRI Predicts Infarct Growth

Authors: Mira M. Liu, Niloufar Saadat, Steven P. Roth, Marek A. Niekrasz, Mihai Giurcanu, Timothy J. Carroll, Gregory A. Christoforidis

Abstract: In ischemic stroke, leptomeningeal collaterals can provide compensatory blood flow to tissue at risk despite an occlusion, and impact treatment response and infarct growth. The purpose of this work is to test the hypothesis that local perfusion with an appropriate Local Arterial Input Function (AIF) is needed to quantify the degree of collateral blood supply in tissue distal to an occlusion. Seven experiments were conducted in a pre-clinical middle cerebral artery occlusion model. Magnetic resonance dynamic susceptibility contrast (DSC) was imaged and post-processed as cerebral blood flow maps with both a traditionally chosen single arterial input function (AIF) applied globally to the whole brain (i.e. "Global-AIF") and a novel automatic delay and dispersion corrected AIF (i.e. "Local AIF") that is sensitive to retrograde flow. Pial collateral recruitment was assessed from x-ray angiograms and infarct growth via serially acquired diffusion weighted MRI scans both blinded to DSC. The degree of collateralization at x-ray correlated strongly with quantitative perfusion determined using the Local AIF in the ischemic penumbra (R2=0.81) compared to a traditionally chosen Global-AIF (R2=0.05). Quantitative perfusion calculated using a Local-AIF was negatively correlated (less infarct progression as local perfusion increased) with infarct growth (R2 = 0.79) compared to Global-AIF (R2=0.02). Local DSC perfusion with a Local-AIF is more accurate for assessing tissue status and degree of leptomeningeal collateralization than traditionally chosen AIFs. These findings support use of a Local-AIF in determining quantitative tissue perfusion with collateral supply in occlusive disease.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 13 pages, 5 figures

arXiv:2405.20469 [pdf, other]

Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images

Authors: Krishnakant Singh, Thanush Navaratnam, Jannik Holmer, Simone Schaub-Meyer, Stefan Roth

Abstract: A long-standing challenge in developing machine learning approaches has been the lack of high-quality labeled data. Recently, models trained with purely synthetic data, here termed synthetic clones, generated using large-scale pre-trained diffusion models have shown promising results in overcoming this annotation bottleneck. As these synthetic clone models progress, they are likely to be deployed in challenging real-world settings, yet their suitability remains understudied. Our work addresses this gap by providing the first benchmark for three classes of synthetic clone models, namely supervised, self-supervised, and multi-modal ones, across a range of robustness measures. We show that existing synthetic self-supervised and multi-modal clones are comparable to or outperform state-of-the-art real-image baselines for a range of robustness metrics - shape bias, background bias, calibration, etc. However, we also find that synthetic clones are much more susceptible to adversarial and real-world noise than models trained with real data. To address this, we find that combining both real and synthetic data further increases the robustness, and that the choice of prompt used for generating synthetic images plays an important part in the robustness of synthetic clones.

Submitted 30 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024 Workshop: SyntaGen-Harnessing Generative Models for Synthetic Visual Datasets. Project page at https://synbenchmark.github.io/SynCloneBenchmark Comments: Fix typo in Fig. 1

arXiv:2405.12488 [pdf, other]

First joint oscillation analysis of Super-Kamiokande atmospheric and T2K accelerator neutrino data

Authors: Super-Kamiokande, T2K collaborations: S. Abe, K. Abe, N. Akhlaq, R. Akutsu, H. Alarakia-Charles, A. Ali, Y. I. Alj Hakim, S. Alonso Monsalve, S. Amanai, C. Andreopoulos, L. H. V. Anthony, M. Antonova, S. Aoki, K. A. Apte, T. Arai, T. Arihara, S. Arimoto, Y. Asada, R. Asaka, Y. Ashida, E. T. Atkin, N. Babu, et al. (524 additional authors not shown)

Abstract: The Super-Kamiokande and T2K collaborations present a joint measurement of neutrino oscillation parameters from their atmospheric and beam neutrino data. It uses a common interaction model for events overlapping in neutrino energy and correlated detector systematic uncertainties between the two datasets, which are found to be compatible. Using 3244.4 days of atmospheric data and a beam exposure of $19.7(16.3) \times 10^{20}$ protons on target in (anti)neutrino mode, the analysis finds a 1.9$σ$ exclusion of CP-conservation (defined as $J_{CP}=0$) and a preference for the normal mass ordering.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 10 pages, 3 figures

arXiv:2404.16818 [pdf, other]

Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals

Authors: Oliver Hahn, Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

Abstract: Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global categories within an image corpus without any form of annotation. Building upon recent advances in self-supervised representation learning, we focus on how to leverage these large pre-trained models for the downstream task of unsupervised segmentation. We present PriMaPs - Principal Mask Proposals - decomposing images into semantically meaningful masks based on their feature representation. This allows us to realize unsupervised semantic segmentation by fitting class prototypes to PriMaPs with a stochastic expectation-maximization algorithm, PriMaPs-EM. Despite its conceptual simplicity, PriMaPs-EM leads to competitive results across various pre-trained backbone models, including DINO and DINOv2, and across datasets, such as Cityscapes, COCO-Stuff, and Potsdam-3. Importantly, PriMaPs-EM is able to boost results when applied orthogonally to current state-of-the-art unsupervised semantic segmentation pipelines.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Code: https://github.com/visinf/primaps

arXiv:2404.12330 [pdf, other]

A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Authors: Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

Abstract: Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

arXiv:2402.13773 [pdf, other]

Spatial-Domain Wireless Jamming with Reconfigurable Intelligent Surfaces

Authors: Philipp Mackensen, Paul Staat, Stefan Roth, Aydin Sezgin, Christof Paar, Veelasha Moonsamy

Abstract: Today, we rely heavily on the constant availability of wireless communication systems. As a result, wireless jamming continues to prevail as an imminent threat: Attackers can create deliberate radio interference to overshadow desired signals, leading to denial of service. Although the broadcast nature of radio signal propagation makes such an attack possible in the first place, it likewise poses a challenge for the attacker, preventing precise targeting of single devices. In particular, the jamming signal will likely not only reach the victim receiver but also other neighboring devices. In this work, we introduce spatial control of wireless jamming signals, granting a new degree of freedom to leverage for jamming attacks. Our novel strategy employs an environment-adaptive reconfigurable intelligent surface (RIS), exploiting multipath signal propagation to spatially focus jamming signals on particular victim devices. We investigate this effect through extensive experimentation and show that our approach can disable the wireless communication of a victim device while leaving neighbouring devices unaffected. In particular, we demonstrate complete denial-of-service of a Wi-Fi device while a second device located at a distance as close as 5 mm remains unaffected, sustaining wireless communication at a data rate of 60 Mbit/s. We also show that the attacker can change the attack target on-the-fly, dynamically selecting the device to be jammed.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2401.15154 [pdf, other]

Enhancing the Secrecy Rate with Direction-range Focusing with FDA and RIS

Authors: Chu Li, Stefan Roth, Aydin Sezgin

Abstract: One of the great potentials to improve the confidentiality in mmWave/THz at the physical layer of technical communication, measured by the secrecy rate, lies in the use of reconfigurable intelligent surfaces (RISs). However, an important open problem arises when the eavesdropper is aligned with the legitimate user or in proximity to the RIS or legitimate user. The limitation comes, on one hand, from the high directional gain caused by the dominant line-of-sight (LOS) path in high-frequency transmission, and, on the other hand, from the high energy leakage in the proximity of the RIS and the legitimate user. To address these issues, we employ the concept of frequency diverse arrays (FDA) at the base station (BS) associated with random inverted transmit beamforming and reflective element subset selection (RIBES). More specifically, we consider a passive eavesdropper with unknown location, and design the transmit beamforming and RIS configuration based on the channel information of the legitimate user only. In this context, the secrecy rate with the proposed transmission technique is evaluated in the case of deterministic eavesdropper channel, demonstrating that we can ensure a secure transmission regarding both direction and range. Furthermore, assuming no prior information about the eavesdropper, we describe the wiretap region and derive the worst-case secrecy rate in closed form. The latter is further optimized by determining the optimal subset sizes of the transmit antennas and reflective elements. Simulations verify the correctness of the closed-form expressions and demonstrate that we can effectively improve the secrecy rate, especially when the eavesdropper is close to the RIS or the legitimate user.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2312.14791 [pdf, other]

EMF-Constrained Artificial Noise for Secrecy Rates with Stochastic Eavesdropper Channels

Authors: Stefan Roth, Aydin Sezgin

Abstract: An information-theoretic confidential communication is achievable if the eavesdropper has a degraded channel compared to the legitimate receiver. In wireless channels, beamforming and artificial noise can enable such confidentiality. However, only distribution knowledge of the eavesdropper channels can be assumed. Moreover, the transmission of artificial noise can lead to an increased electromagnetic field (EMF) exposure, which depends on the considered location and can thus also be seen as a random variable. Hence, we optimize the $\varepsilon$-outage secrecy rate under a $δ$-outage exposure constraint in a setup, where the base station (BS) is communicating to a user equipment (UE), while a single-antenna eavesdropper with Rayleigh distributed channels is present. Therefore, we calculate the secrecy outage probability (SOP) in closed-form. Based on this, we convexify the optimization problem and optimize the $\varepsilon$-outage secrecy rate iteratively. Numerical results show that for a moderate exposure constraint, artificial noise from the BS has a relatively large impact due to beamforming, while for a strict exposure constraint artificial noise from the UE is more important.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2311.08602 [pdf, other]

doi 10.3390/aerospace10110960

Data downloaded via parachute from a NASA super-pressure balloon

Authors: Ellen L. Sirks, Richard Massey, Ajay S. Gill, Jason Anderson, Steven J. Benton, Anthony M. Brown, Paul Clark, Joshua English, Spencer W. Everett, Aurelien A. Fraisse, Hugo Franco, John W. Hartley, David Harvey, Bradley Holder, Andrew Hunter, Eric M. Huff, Andrew Hynous, Mathilde Jauzac, William C. Jones, Nikky Joyce, Duncan Kennedy, David Lagattuta, Jason S. -Y. Leung, Lun Li, Stephen Lishman, et al. (18 additional authors not shown)

Abstract: In April to May 2023, the superBIT telescope was lifted to the Earth's stratosphere by a helium-filled super-pressure balloon, to acquire astronomical imaging from above (99.5% of) the Earth's atmosphere. It was launched from New Zealand then, for 40 days, circumnavigated the globe five times at a latitude 40 to 50 degrees South. Attached to the telescope were four 'DRS' (Data Recovery System) capsules containing 5 TB solid state data storage, plus a GNSS receiver, Iridium transmitter, and parachute. Data from the telescope were copied to these, and two were dropped over Argentina. They drifted 61 km horizontally while they descended 32 km, but we predicted their descent vectors within 2.4 km: in this location, the discrepancy appears irreducible below 2 km because of high speed, gusty winds and local topography. The capsules then reported their own locations to within a few metres. We recovered the capsules and successfully retrieved all of superBIT's data - despite the telescope itself being later destroyed on landing.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 12 pages

Journal ref: Aerospace 2023, 10, 960

arXiv:2311.05427 [pdf, other]

Residual Entropy as a Diagnostic and Stopping Metric for CLEAN

Authors: D. C. Homan, J. S. Roth, A. B. Pushkarev

Abstract: We propose the use of entropy, measured from the spatial and flux distribution of pixels in the residual image, as a potential diagnostic and stopping metric for the CLEAN algorithm. Despite its broad success as the standard deconvolution approach in radio interferometry, finding the optimum stopping point for the iterative CLEAN algorithm is still a challenge. We show that the entropy of the residual image, measured during the final stages of CLEAN, can be computed without prior knowledge of the source structure or expected noise levels, and that finding the point of maximum entropy as a measure of randomness in the residual image serves as a robust stopping criterion. We also find that, when compared to the expected thermal noise in the image, the maximum entropy of the residuals is a useful diagnostic that can reveal the presence of data editing, calibration, or deconvolution issues that may limit the fidelity of the final CLEAN map.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 10 pages, 6 figures, Accepted for Publication in AJ

arXiv:2310.07706 [pdf, other]

Pixel State Value Network for Combined Prediction and Planning in Interactive Environments

Authors: Sascha Rosbach, Stefan M. Leupold, Simon Großjohann, Stefan Roth

Abstract: Automated vehicles operating in urban environments have to reliably interact with other traffic participants. Planning algorithms often utilize separate prediction modules forecasting probabilistic, multi-modal, and interactive behaviors of objects. Designing prediction and planning as two separate modules introduces significant challenges, particularly due to the interdependence of these modules. This work proposes a deep learning methodology to combine prediction and planning. A conditional GAN with the U-Net architecture is trained to predict two high-resolution image sequences. The sequences represent explicit motion predictions, mainly used to train context understanding, and pixel state values suitable for planning encoding kinematic reachability, object dynamics, safety, and driving comfort. The model can be trained offline on target images rendered by a sampling-based model-predictive planner, leveraging real-world driving data. Our results demonstrate intuitive behavior in complex situations, such as lane changes amidst conflicting objectives.

Submitted 11 October, 2023; originally announced October 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2308.16606 [pdf, ps, other]

doi 10.1103/PhysRevD.108.092009

Measurements of the $ν_μ$ and $\barν_μ$-induced Coherent Charged Pion Production Cross Sections on $^{12}C$ by the T2K experiment

Authors: K. Abe, N. Akhlaq, R. Akutsu, A. Ali, S. Alonso Monsalve, C. Alt, C. Andreopoulos, M. Antonova, S. Aoki, T. Arihara, Y. Asada, Y. Ashida, E. T. Atkin, M. Barbi, G. J. Barker, G. Barr, D. Barrow, M. Batkiewicz-Kwasniak, V. Berardi, L. Berns, S. Bhadra, A. Blanchet, A. Blondel, S. Bolognesi, T. Bonus, et al. (359 additional authors not shown)

Abstract: We report an updated measurement of the $ν_μ$-induced, and the first measurement of the $\barν_μ$-induced coherent charged pion production cross section on $^{12}C$ nuclei in the T2K experiment. This is measured in a restricted region of the final-state phase space for which $p_{μ,π} > 0.2$ GeV, $\cos(θ_μ) > 0.8$ and $\cos(θ_π) > 0.6$, and at a mean (anti)neutrino energy of 0.85 GeV using the T2K near detector. The measured $ν_μ$ CC coherent pion production flux-averaged cross section on $^{12}C$ is $(2.98 \pm 0.37 (stat.) \pm 0.31 (syst.) \substack{ +0.49 \\ -0.00 } \mathrm{ (Q^2\,model)}) \times 10^{-40}~\mathrm{cm}^{2}$. The new measurement of the $\barν_μ$-induced cross section on $^{12}{C}$ is $(3.05 \pm 0.71 (stat.) \pm 0.39 (syst.) \substack{ +0.74 \\ -0.00 } \mathrm{(Q^2\,model)}) \times 10^{-40}~\mathrm{cm}^{2}$. The results are compatible with both the NEUT 5.4.0 Berger-Sehgal (2009) and GENIE 2.8.0 Rein-Sehgal (2007) model predictions.

Submitted 14 October, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Journal ref: Phys.Rev.D 108 (2023) 9, 092009

arXiv:2308.09472 [pdf, other]

Vision Relation Transformer for Unbiased Scene Graph Generation

Authors: Gopika Sudhakaran, Devendra Singh Dhami, Kristian Kersting, Stefan Roth

Abstract: Recent years have seen a growing interest in Scene Graph Generation (SGG), a comprehensive visual scene understanding task that aims to predict entity relationships using a relation encoder-decoder pipeline stacked on top of an object encoder-decoder backbone. Unfortunately, current SGG methods suffer from an information loss regarding the entities' local-level cues during the relation encoding process. To mitigate this, we introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder. We further observe that many existing SGG methods claim to be unbiased, but are still biased towards either head or tail classes. To overcome this bias, we introduce a Mutually Exclusive ExperT (MEET) learning strategy that captures important relation features without bias towards head or tail classes. Experimental results on the VG and GQA datasets demonstrate that VETO + MEET boosts the predictive performance by up to 47 percentage over the state of the art while being 10 times smaller.

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted for publication in ICCV 2023

arXiv:2308.06248 [pdf, other]

FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods

Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Abstract: The field of explainable artificial intelligence (XAI) aims to uncover the inner workings of complex deep neural models. While being crucial for safety-critical domains, XAI inherently lacks ground-truth explanations, making its automatic evaluation an unsolved problem. We address this challenge by proposing a novel synthetic vision dataset, named FunnyBirds, and accompanying automatic evaluation protocols. Our dataset allows performing semantically meaningful image interventions, e.g., removing individual object parts, which has three important implications. First, it enables analyzing explanations on a part level, which is closer to human comprehension than existing methods that evaluate on a pixel level. Second, by comparing the model output for inputs with removed parts, we can estimate ground-truth part importances that should be reflected in the explanations. Third, by mapping individual explanations into a common space of part importances, we can analyze a variety of different explanation types in a single common framework. Using our tools, we report results for 24 different combinations of neural models and XAI methods, demonstrating the strengths and weaknesses of the assessed methods in a fully automatic and systematic manner.

Submitted 11 August, 2023; originally announced August 2023.

Comments: Accepted at ICCV 2023. Code: https://github.com/visinf/funnybirds

arXiv:2305.09916 [pdf, other]

Updated T2K measurements of muon neutrino and antineutrino disappearance using 3.6 $\times$ 10$^{21}$ protons on target

Authors: K. Abe, N. Akhlaq, R. Akutsu, H. Alarakia-Charles, A. Ali, Y. I. Alj Hakim, S. Alonso Monsalve, C. Alt, C. Andreopoulos, M. Antonova, S. Aoki, T. Arihara, Y. Asada, Y. Ashida, E. T. Atkin, M. Barbi, G. J. Barker, G. Barr, D. Barrow, M. Batkiewicz-Kwasniak, F. Bench, V. Berardi, L. Berns, S. Bhadra, A. Blanchet, et al. (385 additional authors not shown)

Abstract: Muon neutrino and antineutrino disappearance probabilities are identical in the standard three-flavor neutrino oscillation framework, but CPT violation and non-standard interactions can violate this symmetry. In this work we report the measurements of $\sin^{2} θ_{23}$ and $Δm_{32}^2$ independently for neutrinos and antineutrinos. The aforementioned symmetry violation would manifest as an inconsistency in the neutrino and antineutrino oscillation parameters. The analysis discussed here uses a total of 1.97$\times$10$^{21}$ and 1.63$\times$10$^{21}$ protons on target taken with a neutrino and antineutrino beam respectively, and benefits from improved flux and cross-section models, new near detector samples and more than double the data reducing the overall uncertainty of the result. No significant deviation is observed, consistent with the standard neutrino oscillation picture.

Submitted 16 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.09504 [pdf, other]

Content-Adaptive Downsampling in Convolutional Neural Networks

Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Abstract: Many convolutional neural networks (CNNs) rely on progressive downsampling of their feature maps to increase the network's receptive field and decrease computational cost. However, this comes at the price of losing granularity in the feature maps, limiting the ability to correctly understand images or recover fine detail in dense prediction tasks. To address this, common practice is to replace the last few downsampling operations in a CNN with dilated convolutions, allowing to retain the feature map resolution without reducing the receptive field, albeit increasing the computational cost. This allows to trade off predictive performance against cost, depending on the output feature resolution. By either regularly downsampling or not downsampling the entire feature map, existing work implicitly treats all regions of the input image and subsequent feature maps as equally important, which generally does not hold. We propose an adaptive downsampling scheme that generalizes the above idea by allowing to process informative regions at a higher resolution than less informative ones. In a variety of experiments, we demonstrate the versatility of our adaptive downsampling strategy and empirically show that it improves the cost-accuracy trade-off of various established CNNs.

Submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted at CVPR 2023 Workshop on Efficient Deep Learning for Computer Vision (ECV). Code: https://github.com/visinf/cad

arXiv:2304.01888 [pdf]

Quantitative perfusion and water transport time model from multi b-value diffusion magnetic resonance imaging validated against neutron capture microspheres

Authors: M. Liu, N. Saadat, Y. Jeong, S. Roth, M. Niekrasz, M. Giurcanu, T. Carroll, G. Christoforidis

Abstract: Intravoxel Incoherent Motion (IVIM) is a non-contrast magnetic resonance imaging diffusion-based scan that uses a multitude of b-values to measure various speeds of molecular perfusion and diffusion, sidestepping inaccuracy of arterial input functions or bolus kinetics in quantitative imaging. We test a new method of IVIM quantification and compare our values to reference standard neutron capture microspheres across normocapnia, CO2 induced hypercapnia, and middle cerebral artery occlusion in a controlled animal model. Perfusion quantification in ml/100g/min compared to microsphere perfusion uses the 3D gaussian probability distribution and defined water transport time as when 50% of the molecules remain in the tissue of interest. Perfusion, water transport time, and infarct volume was compared to reference standards. Simulations were studied to suppress non-specific cerebrospinal fluid (CSF). Linear regression analysis of quantitative perfusion returned correlation (slope = .55, intercept = 52.5, $R^2$= .64). Linear regression for water transport time asymmetry in infarcted tissue was excellent (slope = .59, intercept = .3, $R^2$ = .93). Strong linear agreement also was found for infarct volume (slope = 1.01, $R^2$= .79). Simulation of CSF suppression via inversion recovery returned blood signal reduced by 82% from combined T1 and T2 effects. Intra-physiologic state comparison of perfusion shows potential partial volume effects which require further study especially in disease states.
The accuracy and sensitivity of IVIM provides evidence that observed signal changes reflect cytotoxic edema and tissue perfusion. Partial volume contamination of CSF may be better removed during post-processing rather than with inversion recovery to avoid artificial loss of blood signal.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 7 pages, 1 table, 6 figures, 3 pages appendix

arXiv:2303.14228 [pdf, other]

doi 10.1103/PhysRevD.108.112009

First measurement of muon neutrino charged-current interactions on hydrocarbon without pions in the final state using multiple detectors with correlated energy spectra at T2K

Authors: K. Abe, N. Akhlaq, R. Akutsu, H. Alarakia-Charles, A. Ali, Y. I. Alj Hakim, S. Alonso Monsalve, C. Alt, C. Andreopoulos, M. Antonova, S. Aoki, T. Arihara, Y. Asada, Y. Ashida, E. T. Atkin, M. Barbi, G. J. Barker, G. Barr, D. Barrow, M. Batkiewicz-Kwasniak, F. Bench, V. Berardi, L. Berns, S. Bhadra, A. Blanchet , et al. (380 additional authors not shown)

Abstract: This paper reports the first measurement of muon neutrino charged-current interactions without pions in the final state using multiple detectors with correlated energy spectra at T2K. The data was collected on hydrocarbon targets using the off-axis T2K near detector (ND280) and the on-axis T2K near detector (INGRID) with neutrino energy spectra peaked at 0.6 GeV and 1.1 GeV respectively. The correlated neutrino flux presents an opportunity to reduce the impact of the flux uncertainty and to study the energy dependence of neutrino interactions. The extracted double-differential cross sections are compared to several Monte Carlo neutrino-nucleus interaction event generators showing the agreement between both detectors individually and with the correlated result.

Submitted 18 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Updated discussion in Sec. V-A; Updated author list

arXiv:2303.04481 [pdf, other]

Characterization of Charge Spreading and Gain of Encapsulated Resistive Micromegas Detectors for the Upgrade of the T2K Near Detector Time Projection Chambers

Authors: D. Attie, O. Ballester, M. Batkiewicz-Kwasniak, P. Billoir, A. Blondel, S. Bolognesi, R. Boullon, D. Calvet, M. P. Casado, M. G. Catanesi, M. Cicerchia, G. Cogo, P. Colas, G. Collazuol, D. D Ago, C. Dalmazzone, T. Daret, A. Delbart, A. De Lorenzis, R. de Oliveira, S. Dolan, K. Dygnarowicz, J. Dumarchez, S. Emery-Schrenk, A. Ershova , et al. (70 additional authors not shown)

Abstract: An upgrade of the near detector of the T2K long baseline neutrino oscillation experiment is currently being conducted. This upgrade will include two new Time Projection Chambers, each equipped with 16 charge readout resistive Micromegas modules. A procedure to validate the performance of the detectors at different stages of production has been developed and implemented to ensure a proper and reliable operation of the detectors once installed. A dedicated X-ray test bench is used to characterize the detectors by scanning each pad individually and to precisely measure the uniformity of the gain and the deposited energy resolution over the pad plane. An energy resolution of about 10% is obtained. A detailed physical model has been developed to describe the charge dispersion phenomena in the resistive Micromegas anode. The detailed physical description includes initial ionization, electron drift, diffusion effects and the readout electronics effects. The model provides an excellent characterization of the charge spreading of the experimental measurements and allowed the simultaneous extraction of gain and RC information of the modules.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.03222 [pdf, other]

doi 10.1140/epjc/s10052-023-11819-x

Measurements of neutrino oscillation parameters from the T2K experiment using $3.6\times10^{21}$ protons on target

Authors: The T2K Collaboration, K. Abe, N. Akhlaq, R. Akutsu, A. Ali, S. Alonso Monsalve, C. Alt, C. Andreopoulos, M. Antonova, S. Aoki, T. Arihara, Y. Asada, Y. Ashida, E. T. Atkin, M. Barbi, G. J. Barker, G. Barr, D. Barrow, M. Batkiewicz-Kwasniak, F. Bench, V. Berardi, L. Berns, S. Bhadra, A. Blanchet, A. Blondel , et al. (376 additional authors not shown)

Abstract: The T2K experiment presents new measurements of neutrino oscillation parameters using $19.7(16.3)\times10^{20}$ protons on target (POT) in (anti-)neutrino mode at the far detector (FD). Compared to the previous analysis, an additional $4.7\times10^{20}$ POT neutrino data was collected at the FD. Significant improvements were made to the analysis methodology, with the near-detector analysis introducing new selections and using more than double the data. Additionally, this is the first T2K oscillation analysis to use NA61/SHINE data on a replica of the T2K target to tune the neutrino flux model, and the neutrino interaction model was improved to include new nuclear effects and calculations. Frequentist and Bayesian analyses are presented, including results on $\sin^2θ_{13}$ and the impact of priors on the $δ_\mathrm{CP}$ measurement. Both analyses prefer the normal mass ordering and upper octant of $\sin^2θ_{23}$ with a nearly maximally CP-violating phase. Assuming the normal ordering and using the constraint on $\sin^2θ_{13}$ from reactors, $\sin^2θ_{23}=0.561^{+0.021}_{-0.032}$ using Feldman--Cousins corrected intervals, and $Δm^2_{32}=2.494_{-0.058}^{+0.041}\times10^{-3}~\mathrm{eV^2}$ using constant $Δχ^{2}$ intervals. The CP-violating phase is constrained to $δ_\mathrm{CP}=-1.97_{-0.70}^{+0.97}$ using Feldman--Cousins corrected intervals, and $δ_\mathrm{CP}=0,π$ is excluded at more than 90% confidence level.
A Jarlskog invariant of zero is excluded at more than $2σ$ credible level using a flat prior in $δ_\mathrm{CP}$, and just below $2σ$ using a flat prior in $\sinδ_\mathrm{CP}$. When the external constraint on $\sin^2θ_{13}$ is removed, $\sin^2θ_{13}=28.0^{+2.8}_{-6.5}\times10^{-3}$, in agreement with measurements from reactor experiments. These results are consistent with previous T2K analyses.

Submitted 10 September, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Journal ref: Eur. Phys. J. C 83, 782 (2023)

arXiv:2302.08739 [pdf]

Significantly increased magnetic anisotropy in Co nano-columnar multilayer structure via a unique sequential oblique-normal deposition approach

Authors: Arun Singh Dev, Sharanjeet Singh, Anup Kumar Bera, Pooja Gupta, Velaga Srihari, Pallavi Pandit, Matthias Schwartzkopf, Stephan V. Roth, Dileep Kumar

Abstract: Oblique/normal sequential deposition technique is used to create Co based unique multilayer structure [Co-oblique(4.4nm)/Co-normal (4.2 nm)]x10, where each Co-oblique layer is deposited at an oblique angle of 75deg, to induce large in-plane uniaxial magnetic anisotropy (UMA). Compared to the previous ripple, stress and oblique angle deposition (OAD) related studies on Cobalt in literature, one-order higher UMA with the easy axis of magnetization along the projection of the tilted nano-columns in the multilayer plane is observed. The multilayer retains magnetic anisotropy even after annealing at 450C. The in-plane UMA in this multilayer is found to be the combination of shape, and magneto-crystalline anisotropy (MCA) confirmed by the temperature-dependent grazing incidence small angle X-ray scattering (GISAXS), in situ reflection high energy electron diffraction (RHEED) and grazing incidence X-ray diffraction (GIXRD) measurements. The crystalline texturing of hcp Co in the multilayer minimizes spin-orbit coupling energy along the column direction, which couples with the shape anisotropy energies and results in preferential orientation of the easy magnetic axis along the projection of the columns in the multilayer plane. Reduction in UMA after annealing is attributed to diffusion/merging of columns and annihilating crystallographic texturing. The obtained one-order high UMA demonstrates the potential application of the unique structure engineering technique, which may have far-reaching advantages in magnetic thin films/multilayers and spintronic devices.

Submitted 17 February, 2023; originally announced February 2023.

Comments: 22 pages, 10 figures

arXiv:2302.03582 [pdf]

Direct Linearly-Polarised Electroluminescence from Perovskite Nanoplatelet Superlattices

Authors: Junzhi Ye, Aobo Ren, Linjie Dai, Tomi Baikie, Renjun Guo, Debapriya Pal, Sebastian Gorgon, Julian E. Heger, Junyang Huang, Yuqi Sun, Rakesh Arul, Gianluca Grimaldi, Kaiwen Zhang, Javad Shamsi, Yi-Teng Huang, Hao Wang, Jiang Wu, A. Femius Koenderink, Laura Torrente Murciano, Matthias Schwartzkopf, Stephen V. Roth, Peter Muller-Buschbaum, Jeremy J. Baumberg, Samuel D. Stranks, Neil C. Greenham , et al. (4 additional authors not shown)

Abstract: Polarised light is critical for a wide range of applications, but is usually generated by filtering unpolarised light, which leads to significant energy losses and requires additional optics. Herein, the direct emission of linearly-polarised light is achieved from light-emitting diodes (LEDs) made of CsPbI3 perovskite nanoplatelet superlattices. Through use of solvents with different vapour pressures, the self-assembly of perovskite nanoplatelets is achieved to enable fine control over the orientation (either face-up or edge-up) and therefore the transition dipole moment. As a result of the highly-uniform alignment of the nanoplatelets, as well as their strong quantum and dielectric confinement, large exciton fine-structure splitting is achieved at the film level, leading to pure-red LEDs exhibiting a high degree of linear polarisation of 74.4% without any photonic structures. This work unveils the possibilities of perovskite nanoplatelets as a highly promising source of linearly-polarised electroluminescence, opening up the development of next-generation 3D displays and optical communications from this highly versatile, solution-processable system.

Submitted 8 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 26 pages, 5 figures

arXiv:2302.01998 [pdf, ps, other]

Integrated Communication and Control Systems: A Data Significance Perspective

Authors: Stefan Roth, Yasemin Karacora, Christina Chaccour, Aydin Sezgin, Walid Saad

Abstract: The interconnected smart devices and industrial internet of things devices require low-latency communication to fulfill control objectives despite limited resources. In essence, such devices have a time-critical nature but also require a highly accurate data input based on its significance. In this paper, we investigate various coordinated and distributed semantic scheduling schemes with a data significance perspective. In particular, novel algorithms are proposed to analyze the benefit of such schemes for the significance in terms of estimation accuracy. Then, we derive the bounds of the achievable estimation accuracy. Our numerical results showcase the superiority of semantic scheduling policies that adopt an integrated control and communication strategy. In essence, such policies can reduce the weighted sum of mean squared errors compared to traditional policies.

Submitted 3 February, 2023; originally announced February 2023.

arXiv:2302.00283 [pdf]

doi 10.1016/j.jmmm.2022.169663

Evolution of interface magnetism in Fe/Alq3 bilayer

Authors: Avinash Ganesh Khanderao, Sonia Kaushik, Arun Singh Dev, V. R. Reddy, Ilya Sergueev, Hans-Christian Wille, Pallavi Pandit, Stephan V Roth, Dileep Kumar

Abstract: Interface magnetism and topological structure of Fe on organic semiconductor film (Alq3) have been studied and compared with Fe film deposited directly on Si (100) substrate. To get information on the diffused Fe layer at the Fe/Alq3 interface, grazing incident nuclear resonance scattering (GINRS) measurements are made depth selective by introducing a 95% enriched thin 57Fe layer at the Interface and producing x-ray standing wave within the layered structure. Compared with Fe growth on Si substrate, where film exhibits a hyperfine field value of 32 T (Bulk Fe), a thick Fe- Alq3 interface has been found with reduced electron density and hyperfine fields providing evidence of deep penetration of Fe atoms into Alq3 film. Due to the soft nature of Alq3, Fe moments relax in the film plane. At the same time, Fe on Si has a resultant ~43 deg out-of-plane orientation of Fe moments at the Interface due to the stressed and rough Fe layer near Si. The evolution of magnetism at the Fe-Alq3 Interface is monitored using in-situ magneto-optical Kerr effect (MOKE) during the growth of Fe on the Alq3 surface and small-angle x-ray scattering (SAXS) measurements. It is found that the Fe atom tries to organize into clusters to minimize their surface/interface energy. The origin of the 2.4 nm thick magnetic dead layer at the Interface is attributed to the small Fe clusters of paramagnetic or superparamagnetic nature.
The present work provides an understanding of interfacial magnetism at metal-organic interfaces and the topological study using the GI-NRS technique, which is made depth selective to probe magnetism of the diffused ferromagnetic layer, which is otherwise difficult for lab-based techniques.

Submitted 1 February, 2023; originally announced February 2023.

Journal ref: Journal of Magnetism and Magnetic Materials, 560 (2022) 169663

arXiv:2212.06541 [pdf, other]

doi 10.1016/j.nima.2023.168248

Analysis of test beam data taken with a prototype of TPC with resistive Micromegas for the T2K Near Detector upgrade

Authors: D. Attié, O. Ballester, M. Batkiewicz-Kwasniak, P. Billoir, A. Blanchet, A. Blondel, S. Bolognesi, R. Boullon, D. Calvet, M. P. Casado, M. G. Catanesi, M. Cicerchia, G. Cogo, P. Colas, G. Collazuol, C. Dalmazzone, T. Daret, A. Delbart, A. De Lorenzis, S. Dolan, K. Dygnarowicz, J. Dumarchez, S. Emery-Schrenk, A. Ershova, G. Eurin , et al. (59 additional authors not shown)

Abstract: In this paper we describe the performance of a prototype of the High Angle Time Projection Chambers (HA-TPCs) that are being produced for the Near Detector (ND280) upgrade of the T2K experiment. The two HA-TPCs of ND280 will be instrumented with eight Encapsulated Resistive Anode Micromegas (ERAM) on each endplate, thus constituting in total 32 ERAMs. This innovative technique allows the detection of the charge emitted by ionization electrons over several pads, improving the determination of the track position. The TPC prototype has been equipped with the first ERAM module produced for T2K and with the HA-TPC readout electronics chain and it has been exposed to the DESY Test Beam in order to measure spatial and dE/dx resolution. In this paper we characterize the performances of the ERAM and, for the first time, we compare them with a newly developed simulation of the detector response. Spatial resolution better than 800 ${μ\rm m}$ and dE/dx resolution better than 10% are observed for all the incident angles and for all the drift distances of interest. All the main features of the data are correctly reproduced by the simulation and these performances fully fulfill the requirements for the HA-TPCs of T2K.

Submitted 16 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2211.14005 [pdf, other]

Efficient Feature Extraction for High-resolution Video Frame Interpolation

Authors: Moritz Nottebaum, Stefan Roth, Simone Schaub-Meyer

Abstract: Most deep learning methods for video frame interpolation consist of three main components: feature extraction, motion estimation, and image synthesis. Existing approaches are mainly distinguishable in terms of how these modules are designed. However, when interpolating high-resolution images, e.g. at 4K, the design choices for achieving high accuracy within reasonable memory requirements are limited. The feature extraction layers help to compress the input and extract relevant information for the latter stages, such as motion estimation. However, these layers are often costly in parameters, computation time, and memory. We show how ideas from dimensionality reduction combined with a lightweight optimization can be used to compress the input representation while keeping the extracted information suitable for frame interpolation. Further, we require neither a pretrained flow network nor a synthesis network, additionally reducing the number of trainable parameters and required memory. When evaluating on three 4K benchmarks, we achieve state-of-the-art image quality among the methods without pretrained flow while having the lowest network complexity and memory requirements overall.

Submitted 25 November, 2022; originally announced November 2022.

Comments: Accepted to BMVC 2022. Code: https://github.com/visinf/fldr-vfi

arXiv:2211.12209 [pdf, other]

$S^2$-Flow: Joint Semantic and Style Editing of Facial Images

Authors: Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Abstract: The high-quality images yielded by generative adversarial networks (GANs) have motivated investigations into their application for image editing. However, GANs are often limited in the control they provide for performing specific edits. One of the principal challenges is the entangled latent space of GANs, which is not directly suitable for performing independent and detailed edits. Recent editing methods allow for either controlled style edits or controlled semantic edits. In addition, methods that use semantic masks to edit images have difficulty preserving the identity and are unable to perform controlled style edits. We propose a method to disentangle a GAN's latent space into semantic and style spaces, enabling controlled semantic and style edits for face images independently within the same framework. To achieve this, we design an encoder-decoder based network architecture ($S^2$-Flow), which incorporates two proposed inductive biases. We show the suitability of $S^2$-Flow quantitatively and qualitatively by performing various semantic and style edits.

Submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted to BMVC 2022

arXiv:2211.05797 [pdf, other]

doi 10.1109/ICC45041.2023.10279561

Optimizing the Age of Information in Mixed-Critical Wireless Communication Networks

Authors: Robert-Jeron Reifert, Stefan Roth, Aydin Sezgin

Abstract: Beyond fifth generation wireless communication networks (B5G) are applied in many use-cases, such as industrial control systems, smart public transport, and power grids. Those applications require innovative techniques for timely transmission and increased wireless network capacities. Hence, this paper proposes optimizing the data freshness measured by the age of information (AoI) in dense internet of things (IoT) sensor-actuator networks. Given different priorities of data-streams, i.e., different sensitivities to outdated information, mixed-criticality is introduced by analyzing different functions of the age, i.e., we consider linear and exponential aging functions. An intricate non-convex optimization problem managing the physical transmission time and packet outage probability is derived. Such problem is tackled using stochastic reformulations, successive convex approximations, and fractional programming, resulting in an efficient iterative algorithm for AoI optimization. Simulation results validate the proposed scheme's performance in terms of AoI, mixed-criticality, and scalability. The proposed non-orthogonal transmission is shown to outperform an orthogonal access scheme in various deployment cases. Results emphasize the potential gains for dense B5G empowered IoT networks in minimizing the AoI.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 6 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Journal ref: ICC 2023 - IEEE International Conference on Communications
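
The abstract above distinguishes linear and exponential aging functions for streams of different criticality. A minimal sketch of how such penalties could be evaluated on an age trace follows. It assumes a simple slotted-time model, and every name and parameter in it (`age_trace`, `linear_penalty`, `exponential_penalty`, `weight`, `rate`) is hypothetical; it illustrates the idea of age-dependent penalties and is not the optimization algorithm proposed in the paper.

```python
import math

def age_trace(num_slots, delivery_slots):
    """Return the age of information in each slot.

    Assumed model: the age starts at 1, grows by one per slot, and
    resets to 1 in the slot following a successful delivery.
    """
    deliveries = set(delivery_slots)
    age, trace = 1, []
    for t in range(num_slots):
        trace.append(age)
        age = 1 if t in deliveries else age + 1
    return trace

def linear_penalty(age, weight=1.0):
    """Penalty that grows proportionally with age (less critical stream)."""
    return weight * age

def exponential_penalty(age, rate=0.5):
    """Penalty that grows exponentially with age (more critical stream)."""
    return math.exp(rate * age) - 1.0

def mean_penalty(trace, penalty):
    """Average penalty over the whole trace."""
    return sum(penalty(a) for a in trace) / len(trace)

# One delivery in slot 2 of a six-slot horizon.
trace = age_trace(6, delivery_slots=[2])
linear_cost = mean_penalty(trace, linear_penalty)
exponential_cost = mean_penalty(trace, exponential_penalty)
```

Because the exponential penalty is convex in the age, it punishes occasional long gaps between updates more heavily than the linear one, which is one way to encode a higher sensitivity to outdated information.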

arXiv:2208.05991 [pdf, ps, other]

Approximation-based Threshold Optimization from Single Antenna to Massive SIMO Authentication

Authors: Stefan Roth, Aydin Sezgin, Roman Bessel, H. Vincent Poor

Abstract: In a wireless sensor network, data from various sensors are gathered to estimate the system-state of the process system. However, adversaries aim at distorting the system-state estimate, for which they may infiltrate sensors or position additional devices in the environment. To authenticate the received process values, the integrity of the measurements from different sensors can be evaluated jointly with the temporal integrity of channel measurements from each sensor. For this purpose, we design a security protocol, in which Kalman filters are used to predict the system-state and the channel-state values, and the received data are authenticated by a hypothesis test. We theoretically analyze the adversarial success probability and the reliability rate obtained in the hypothesis test in two ways, based on a chi-square approximation and on a Gaussian approximation. The two approximations are exact for small and large data vectors, respectively. The Gaussian approximation is suitable for analyzing massive single-input multiple-output (SIMO) setups. To obtain additional insights, the approximation is further adapted for the case of channel hardening, which occurs in massive SIMO fading channels. As adversaries always look for the weakest point of a system, a time-constant security level is required. To provide such a service, the approximations are used to propose time-varying threshold values for the hypothesis test, which approximately attain a constant security level.
Numerical results show that a constant security level can only be achieved by a time-varying threshold choice, while a constant threshold value leads to a time-varying security level.

Submitted 11 August, 2022; originally announced August 2022.

arXiv:2208.05788 [pdf, other]

Semantic Self-adaptation: Enhancing Generalization with a Single Sample

Authors: Sherwin Bahmani, Oliver Hahn, Eduard Zamfir, Nikita Araslanov, Daniel Cremers, Stefan Roth

Abstract: The lack of out-of-domain generalization is a critical weakness of deep networks for semantic segmentation. Previous studies relied on the assumption of a static model, i.e., once the training process is complete, model parameters remain fixed at test time. In this work, we challenge this premise with a self-adaptive approach for semantic segmentation that adjusts the inference process to each input sample. Self-adaptation operates on two levels. First, it fine-tunes the parameters of convolutional layers to the input image using consistency regularization. Second, in Batch Normalization layers, self-adaptation interpolates between the training and the reference distribution derived from a single test sample. Despite both techniques being well known in the literature, their combination sets new state-of-the-art accuracy on synthetic-to-real generalization benchmarks. Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time for improving deep network generalization to out-of-domain data. Our code and pre-trained models are available at https://github.com/visinf/self-adaptive.

Submitted 13 December, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: Published in TMLR (July 2023) | OpenReview: https://openreview.net/forum?id=ILNqQhGbLx | Code: https://github.com/visinf/self-adaptive | Video: https://youtu.be/s4DG65ic0EA
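
The abstract above describes interpolating, inside Batch Normalization layers, between the training statistics and statistics derived from a single test sample. A minimal sketch of one way to do this follows. It assumes a plain linear blend of per-channel means and variances controlled by a hypothetical weight `alpha`; the function names are illustrative and are not taken from the authors' repository, which may differ in its details.

```python
def interpolate_bn_stats(train_mean, train_var, test_mean, test_var, alpha):
    """Blend per-channel training statistics with single-sample statistics.

    alpha = 0 keeps the training statistics unchanged; alpha = 1 uses only
    the statistics of the test sample (assumed linear interpolation).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mean = [(1 - alpha) * m + alpha * t for m, t in zip(train_mean, test_mean)]
    var = [(1 - alpha) * v + alpha * t for v, t in zip(train_var, test_var)]
    return mean, var

def normalize(features, mean, var, eps=1e-5):
    """Normalize one value per channel with the given statistics."""
    return [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(features, mean, var)]

# Two channels: running statistics from training vs. one test image.
mean, var = interpolate_bn_stats(
    train_mean=[0.0, 2.0], train_var=[1.0, 1.0],
    test_mean=[1.0, 4.0], test_var=[3.0, 1.0],
    alpha=0.5,
)
normalized = normalize([0.5, 3.0], mean, var)
```

In this sketch, smaller values of `alpha` stay closer to the source-domain statistics, while larger values rely more on the single test sample.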

arXiv:2207.12982 [pdf, other]

Scintillator ageing of the T2K near detectors from 2010 to 2021

Authors: The T2K Collaboration, K. Abe, N. Akhlaq, R. Akutsu, A. Ali, C. Alt, C. Andreopoulos, M. Antonova, S. Aoki, T. Arihara, Y. Asada, Y. Ashida, E. T. Atkin, S. Ban, M. Barbi, G. J. Barker, G. Barr, D. Barrow, M. Batkiewicz-Kwasniak, F. Bench, V. Berardi, L. Berns, S. Bhadra, A. Blanchet, A. Blondel , et al. (333 additional authors not shown)

Abstract: The T2K experiment widely uses plastic scintillator as a target for neutrino interactions and an active medium for the measurement of charged particles produced in neutrino interactions at its near detector complex. Over 10 years of operation the measured light yield recorded by the scintillator based subsystems has been observed to degrade by 0.9--2.2\% per year. Extrapolation of the degradation rate through to 2040 indicates the recorded light yield should remain above the lower threshold used by the current reconstruction algorithms for all subsystems. This will allow the near detectors to continue contributing to important physics measurements during the T2K-II and Hyper-Kamiokande eras. Additionally, work to disentangle the degradation of the plastic scintillator and wavelength shifting fibres shows that the reduction in light yield can be attributed to the ageing of the plastic scintillator.

Submitted 26 July, 2022; originally announced July 2022.

Comments: 29 pages, 18 figures. Prepared for submission to JINST

arXiv:2205.12160 [pdf, other]

doi 10.1088/1748-0221/17/11/P11027

Double-hit separation and dE/dx resolution of a time projection chamber with GEM readout

Authors: Yumi Aoki, David Attié, Ties Behnke, Alain Bellerive, Oleg Bezshyyko, Deb Bhattacharya Sankar, Purba Bhattacharya, Sudeb Bhattacharya, Yue Chang, Paul Colas, Gilles De Lentdecker, Klaus Dehmelt, Klaus Desch, Ralf Diener, Madhu Dixit, Ulrich Einhaus, Oleksiy Fedorchuk, Ivor Fleck, Keisuke Fujii, Takahiro Fusayasu, Serguei Ganjour, Philippe Gros, Peter Hayman, Katsumasa Ikematsu, Leif Jönsson, et al. (46 additional authors not shown)

Abstract: A time projection chamber (TPC) with micropattern gaseous detector (MPGD) readout is investigated as the main tracking device of the International Large Detector (ILD) concept at the planned International Linear Collider (ILC). A prototype TPC equipped with a triple gas electron multiplier (GEM) readout has been built and operated in an electron test beam. The TPC was placed in a 1 T solenoidal field at the DESY II Test Beam Facility, which provides an electron beam up to 6 GeV/c. The performance of the readout modules, in particular the spatial point resolution, is determined and compared to earlier tests. New studies are presented with first results on the separation of close-by tracks and the capability of the system to measure the specific energy loss dE/dx. This is complemented by a simulation study on the optimization of the readout granularity to improve particle identification by dE/dx.

Submitted 25 November, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 29 pages, 30 figures, 6 tables. This is the Accepted Manuscript version of an article accepted for publication in Journal of Instrumentation. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at https://doi.org/10.1088/1748-0221/17/11/P11027

Report number: PUBDB-2022-02594

Journal ref: Journal of Instrumentation, Volume 17, Number 11, P11027, November 2022

arXiv:2205.01813 [pdf, other]

Diverse Image Captioning with Grounded Style

Authors: Franz Klein, Shweta Mahajan, Stefan Roth

Abstract: Stylized image captioning as presented in prior work aims to generate captions that reflect characteristics beyond a factual description of the scene composition, such as sentiments. Such prior work relies on given sentiment identifiers, which are used to express a certain global style in the caption, e.g. positive or negative, however without taking into account the stylistic content of the visual scene. To address this shortcoming, we first analyze the limitations of current stylized captioning datasets and propose COCO attribute-based augmentations to obtain varied stylized captions from COCO annotations. Furthermore, we encode the stylized information in the latent space of a Variational Autoencoder; specifically, we leverage extracted image attributes to explicitly structure its sequential latent space according to different localized style characteristics. Our experiments on the Senticap and COCO datasets show the ability of our approach to generate accurate captions with diversity in styles that are grounded in the image.

Submitted 3 May, 2022; originally announced May 2022.

Comments: In the 43rd DAGM German Conference on Pattern Recognition (GCPR) 2021

Journal ref: In Proceedings of the German Conference on Pattern Recognition (GCPR), Ed. by C. Bauckhage, J. Gall, and A. G. Schwing, Vol. 13024, Lecture Notes in Computer Science, Springer, 2021, pp. 421-436

arXiv:2204.11878 [pdf, other]

doi 10.1109/TVT.2023.3296977

Comeback Kid: Resilience for Mixed-Critical Wireless Network Resource Management

Authors: Robert-Jeron Reifert, Stefan Roth, Alaa Alameer Ahmad, Aydin Sezgin

Abstract: The future sixth generation (6G) of communication systems is envisioned to provide numerous applications in safety-critical contexts, e.g., driverless traffic, modular industry, and smart cities, which require outstanding performance, high reliability and fault tolerance, as well as autonomy. Ensuring criticality awareness for diverse functional safety applications and providing fault tolerance in an autonomous manner are essential for future 6G systems. Therefore, this paper proposes jointly employing the concepts of resilience and mixed criticality. In this work, we conduct physical layer resource management in cloud-based networks under the rate-splitting paradigm, which is a promising factor towards achieving high resilience. We recapitulate the concepts individually, outline a joint metric to measure the criticality-aware resilience, and verify its merits in a case study. We, thereby, formulate a non-convex optimization problem, derive an efficient iterative algorithm, propose four resilience mechanisms differing in quality and time of adaption, and conduct extensive numerical simulations. Towards this end, we propose a highly autonomous rate-splitting-enabled physical layer resource management algorithm for future 6G networks respecting mixed-critical quality of service (QoS) levels and providing high levels of resilience. Results emphasize the considerable improvements of incorporating a mixed criticality-aware resilience strategy under channel outages and strict QoS demands. The rate-splitting paradigm is particularly shown to overcome state-of-the-art interference management techniques, and the resilience and throughput adaption over consecutive outage events reveals the proposed schemes' contribution towards enabling future 6G networks.

Submitted 11 June, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: 16 pages, 13 figures. Submitted to IEEE for possible publication

Journal ref: IEEE Transactions on Vehicular Technology, 2023

arXiv:2203.01090 [pdf, other]

doi 10.1016/j.nima.2023.168426

Liquid-organic time projection chamber for detecting low energy antineutrinos

Authors: Thomas Radermacher, Johannes Bosse, Sarah Friedrich, Malte Göttsche, Stefan Roth, Georg Schwefer

Abstract: The MeV region of antineutrino energy is of special interest for physics research and for monitoring nuclear nonproliferation. Whereas liquid scintillation detectors are typically used to detect the Inverse Beta Decay (IBD), it has recently been proposed to detect it with a liquid-organic Time Projection Chamber, which could allow a full reconstruction of the particle tracks of the IBD final state. We present the first comprehensive simulation-based study of the expected signatures. Their unequivocal signature could enable a background-minimized detection of electron antineutrinos using information on energy, location and direction of all final state particles. We show that the positron track reflects the antineutrino's vertex. It can also be used to determine the initial neutrino energy. In addition, we investigate the possibility to reconstruct the antineutrino direction on an event-by-event basis by the energy deposition of the neutron-induced proton recoils. Our simulations indicate that this could be a promising approach which should be further studied through experiments with a detector prototype.

Submitted 26 May, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: Revision for submission to Nuclear Instruments and Methods in Physics Research Section A (NIM A)

arXiv:2203.00840 [pdf, other]

Flood hazard model calibration using multiresolution model output

Authors: Samantha Roth, Ben Seiyon Lee, Sanjib Sharma, Iman Hosseini-Shakib, Klaus Keller, Murali Haran

Abstract: Riverine floods pose a considerable risk to many communities. Improving flood hazard projections has the potential to inform the design and implementation of flood risk management strategies. Current flood hazard projections are uncertain, especially due to uncertain model parameters. Calibration methods use observations to quantify model parameter uncertainty. With limited computational resources, researchers typically calibrate models using either relatively few expensive model runs at high spatial resolutions or many cheaper runs at lower spatial resolutions. This leads to an open question: Is it possible to effectively combine information from the high and low resolution model runs? We propose a Bayesian emulation-calibration approach that assimilates model outputs and observations at multiple resolutions. As a case study for a riverine community in Pennsylvania, we demonstrate our approach using the LISFLOOD-FP flood hazard model. The multiresolution approach results in improved parameter inference over the single resolution approach in multiple scenarios. Results vary based on the parameter values and the number of available model runs. Our method is general and can be used to calibrate other high dimensional computer models to improve projections.

Submitted 1 August, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.07951 [pdf, other]

doi 10.1109/ICCWorkshops53468.2022.9813735

Energy Efficiency in Rate-Splitting Multiple Access with Mixed Criticality

Authors: Robert-Jeron Reifert, Stefan Roth, Alaa Alameer Ahmad, Aydin Sezgin

Abstract: Future sixth generation (6G) wireless communication networks face the need to similarly meet unprecedented quality of service (QoS) demands while also providing a larger energy efficiency (EE) to minimize their carbon footprint. Moreover, due to the diverseness of network participants, mixed criticality QoS levels are assigned to the users of such networks. In this work, with a focus on a cloud-radio access network (C-RAN), the fulfillment of desired QoS and minimized transmit power use is optimized jointly within a rate-splitting paradigm. Thereby, the optimization problem is non-convex. Hence, a low-complexity algorithm is proposed based on fractional programming. Numerical results validate that there is a trade-off between the QoS fulfillment and power minimization. Moreover, the energy efficiency of the proposed rate-splitting algorithm is larger than in comparative schemes, especially with mixed criticality.

Submitted 16 February, 2022; originally announced February 2022.

Comments: 7 pages, 6 figures, 1 table. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Journal ref: 2022 IEEE International Conference on Communications Workshops (ICC Workshops)

arXiv:2201.01882 [pdf]

Trust-based Symbolic Motion Planning for Multi-robot Bounding Overwatch

Authors: Huanfei Zheng, Jonathon M. Smereka, Dariusz Mikulski, Stephanie Roth, Yue Wang

Abstract: Multi-robot bounding overwatch requires timely coordination of robot team members. Symbolic motion planning (SMP) can provide provably correct solutions for robot motion planning with high-level temporal logic task requirements. This paper aims to develop a framework for safe and reliable SMP of multi-robot systems (MRS) to satisfy complex bounding overwatch tasks constrained by temporal logics. A decentralized SMP framework is first presented, which guarantees both correctness and parallel execution of the complex bounding overwatch tasks by the MRS. A computational trust model is then constructed by referring to the traversability and line of sight of robots in the terrain. The trust model predicts the trustworthiness of each robot team's potential behavior in executing a task plan. The most trustworthy task and motion plan is explored with a Dijkstra searching strategy to guarantee the reliability of MRS bounding overwatch. A robot simulation is implemented in ROS Gazebo to demonstrate the effectiveness of the proposed framework.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2112.01967 [pdf, other]

IRShield: A Countermeasure Against Adversarial Physical-Layer Wireless Sensing

Authors: Paul Staat, Simon Mulzer, Stefan Roth, Veelasha Moonsamy, Markus Heinrichs, Rainer Kronberger, Aydin Sezgin, Christof Paar

Abstract: Wireless radio channels are known to contain information about the surrounding propagation environment, which can be extracted using established wireless sensing methods. Thus, today's ubiquitous wireless devices are attractive targets for passive eavesdroppers to launch reconnaissance attacks. In particular, by overhearing standard communication signals, eavesdroppers obtain estimations of wireless channels which can give away sensitive information about indoor environments. For instance, by applying simple statistical methods, adversaries can infer human motion from wireless channel observations, allowing them to remotely monitor the premises of victims. In this work, building on the advent of intelligent reflecting surfaces (IRSs), we propose IRShield as a novel countermeasure against adversarial wireless sensing. IRShield is designed as a plug-and-play privacy-preserving extension to existing wireless networks. At the core of IRShield, we design an IRS configuration algorithm to obfuscate wireless channels. We validate the effectiveness with extensive experimental evaluations. In a state-of-the-art human motion detection attack using off-the-shelf Wi-Fi devices, IRShield lowered detection rates to 5% or less.

Submitted 7 April, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

arXiv:2111.07668 [pdf, other]

Fast Axiomatic Attribution for Neural Networks

Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Abstract: Mitigating the dependence on spurious correlations present in the training dataset is a quickly emerging and important topic of deep learning. Recent approaches include priors on the feature attribution of a deep neural network (DNN) into the training process to reduce the dependence on unwanted features. However, until now one needed to trade off high-quality attributions, satisfying desirable axioms, against the time required to compute them. This in turn either led to long training times or ineffective attribution priors. In this work, we break this trade-off by considering a special class of efficiently axiomatically attributable DNNs for which an axiomatic feature attribution can be computed with only a single forward/backward pass. We formally prove that nonnegatively homogeneous DNNs, here termed $\mathcal{X}$-DNNs, are efficiently axiomatically attributable and show that they can be effortlessly constructed from a wide range of regular DNNs by simply removing the bias term of each layer. Various experiments demonstrate the advantages of $\mathcal{X}$-DNNs, beating state-of-the-art generic attribution methods on regular DNNs for training with attribution priors.

Submitted 15 November, 2021; originally announced November 2021.

Comments: To appear at NeurIPS*2021. Project page and code: https://visinf.github.io/fast-axiomatic-attribution

arXiv:2111.06265 [pdf, other]

Dense Unsupervised Learning for Video Segmentation

Authors: Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

Abstract: We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime. We rely on uniform grid sampling to extract a set of anchors and train our model to disambiguate between them on both inter- and intra-video levels. However, a naive scheme to train such a model results in a degenerate solution. We propose to prevent this with a simple regularisation scheme, accommodating the equivariance property of the segmentation task to similarity transformations. Our training objective admits efficient implementation and exhibits fast training convergence. On established VOS benchmarks, our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.

Submitted 11 November, 2021; originally announced November 2021.

Comments: To appear at NeurIPS*2021. Code: https://github.com/visinf/dense-ulearn-vos

arXiv:2111.02069 [pdf, other]

doi 10.1016/j.topol.2022.108035

Spaces where all closed sets are $α$-limit sets

Authors: Jana Hantáková, Samuel Roth, Ľubomír Snoha

Abstract: Metrizable spaces are studied in which every closed set is an $α$-limit set for some continuous map and some point. It is shown that this property is enjoyed by every space containing sufficiently many arcs (formalized in the notion of a space with enough arcs), though such a space need not be arcwise connected. Further it is shown that this property is not preserved by topological sums, products and continuous images and quotients. However, positive results do hold for metrizable spaces obtained by those constructions from spaces with enough arcs.

Submitted 11 February, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: Author Accepted Manuscript. Publication date: 1 April, 2022. Funded from the European Union's Horizon 2020 programme under the Marie Sklodowska-Curie Actions, project acronym "LISEDIDYS", grant agreement No. 883748

MSC Class: Primary: 37B02; Secondary: 37B45; 37B05

Journal ref: Topology and its Applications 310 (2022), Art. No. 108035

arXiv:2110.08787 [pdf, other]

PixelPyramids: Exact Inference Models from Lossless Image Pyramids

Authors: Shweta Mahajan, Stefan Roth

Abstract: Autoregressive models are a class of exact inference approaches with highly flexible functional forms, yielding state-of-the-art density estimates for natural images. Yet, the sequential ordering on the dimensions makes these models computationally expensive and limits their applicability to low-resolution imagery. In this work, we propose PixelPyramids, a block-autoregressive approach employing a lossless pyramid decomposition with scale-specific representations to encode the joint distribution of image pixels. Crucially, it affords a sparser dependency structure compared to fully autoregressive approaches. Our PixelPyramids yield state-of-the-art results for density estimation on various image datasets, especially for high-resolution data. For CelebA-HQ 1024 x 1024, we observe that the density estimates (in terms of bits/dim) are improved to ~44% of the baseline despite sampling speeds superior even to easily parallelizable flow-based models.

Submitted 17 October, 2021; originally announced October 2021.

Comments: To appear at ICCV 2021

arXiv:2109.06082 [pdf, other]

xGQA: Cross-Lingual Visual Question Answering

Authors: Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, Iryna Gurevych

Abstract: Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling efforts. In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task. We extend the established English GQA dataset to 7 typologically diverse languages, enabling us to detect and explore crucial challenges in cross-lingual visual question answering. We further propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual, and -- vice versa -- multilingual models to become multimodal. Our proposed methods outperform current state-of-the-art multilingual multimodal models (e.g., M3P) in zero-shot cross-lingual settings, but the accuracy remains low across the board; a performance drop of around 38 accuracy points in target languages showcases the difficulty of zero-shot cross-lingual transfer for this task. Our results suggest that simple cross-lingual transfer of multimodal models yields latent multilingual multimodal misalignment, calling for more sophisticated methods for vision and multilingual language modeling.

Submitted 17 March, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Findings of ACL 2022

arXiv:2109.04422 [pdf, other]

TxT: Crossmodal End-to-End Learning with Transformers

Authors: Jan-Martin O. Steitz, Jonas Pfeiffer, Iryna Gurevych, Stefan Roth

Abstract: Reasoning over multiple modalities, e.g. in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, today's multimodal pipelines by and large leverage pre-extracted, fixed features from object detectors, typically Faster R-CNN, as representations of the visual world. The obvious downside is that the visual representation is not specifically tuned to the multimodal task at hand. At the same time, while transformer-based object detectors have gained popularity, they have not been employed in today's multimodal pipelines. We address both shortcomings with TxT, a transformer-based crossmodal pipeline that enables fine-tuning both language and visual components on the downstream task in a fully end-to-end manner. We overcome existing limitations of transformer-based detectors for multimodal reasoning regarding the integration of global context and their scalability. Our transformer-based multimodal model achieves considerable gains from end-to-end learning for multimodal question answering.

Submitted 9 September, 2021; originally announced September 2021.

Comments: To appear at the 43rd DAGM German Conference on Pattern Recognition (GCPR) 2021

arXiv:2107.13695 [pdf, other]

Rigidity and Flexibility of Polynomial Entropy

Authors: Samuel Roth, Zuzana Roth, Ľubomír Snoha

Abstract: We introduce the notion of a one-way horseshoe and show that the polynomial entropy of an interval map is given by one-way horseshoes of iterates of the map, obtaining in such a way an analogue of Misiurewicz's theorem on topological entropy and standard 'two-way' horseshoes. Moreover, if the map is of Sharkovskii type 1 then its polynomial entropy can also be computed by what we call chains of essential intervals. As a consequence we get a rigidity result that if the polynomial entropy of an interval map is finite, then it is an integer. We also describe the possible values of polynomial entropy of maps of all Sharkovskii types. As an application we compute the polynomial entropy of all maps in the logistic family. On the other hand, we show that in the class of all continua the polynomial entropy of continuous maps is very flexible. For every value $α \in [0,\infty]$ there is a homeomorphism on a continuum with polynomial entropy $α$. We discuss also possible values of the polynomial entropy of continuous maps on dendrites.

Submitted 28 July, 2021; originally announced July 2021.

Comments: 33 pages

MSC Class: Primary 37B40; 37E05; Secondary 37B45; 37E99

arXiv:2106.12634 [pdf, other]

doi 10.1016/j.nima.2021.166109

Characterization of resistive Micromegas detectors for the upgrade of the T2K Near Detector Time Projection Chambers

Authors: D. Attié, M. Batkiewicz-Kwasniak, P. Billoir, A. Blanchet, A. Blondel, S. Bolognesi, D. Calvet, M. G. Catanesi, M. Cicerchia, G. Cogo, P. Colas, G. Collazuol, A. Delbart, J. Dumarchez, S. Emery-Schrenk, M. Feltre, C. Giganti, F. Gramegna, M. Grassi, M. Guigue, P. Hamacher-Baumann, S. Hassani, F. Iacob, C. Jesús-Valls, R. Kurjata, et al. (36 additional authors not shown)

Abstract: The second phase of the T2K experiment is expected to start data taking in autumn 2022. An upgrade of the Near Detector (ND280) is under development and includes the construction of two new Time Projection Chambers called High-Angle TPC (HA-TPC). The two endplates of these TPCs will be paved with eight Micromegas type charge readout modules. The Micromegas detector charge amplification structure uses a resistive anode to spread the charges over several pads to improve the space point resolution. This innovative technique is combined with the bulk-Micromegas technology to compose the "Encapsulated Resistive Anode Micromegas" detector. A prototype has been designed, built and exposed to an electron beam at the DESY II test beam facility. The data have been used to characterize the charge spreading and to produce an RC map. Spatial resolution better than 600 µm and energy resolution better than 9% are obtained for all incident angles. These performances fulfil the requirements for the upgrade of the ND280 TPC.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 32 pages, 22 figures

arXiv:2105.02216 [pdf, other]

Self-Supervised Multi-Frame Monocular Scene Flow

Authors: Junhwa Hur, Stefan Roth

Abstract: Estimating 3D scene flow from a sequence of monocular images has been gaining increased attention due to the simple, economical capture setup. Owing to the severe ill-posedness of the problem, the accuracy of current methods has been limited, especially that of efficient, real-time approaches. In this paper, we introduce a multi-frame monocular scene flow network based on self-supervised learning, improving the accuracy over previous networks while retaining real-time efficiency. Based on an advanced two-frame baseline with a split-decoder design, we propose (i) a multi-frame model using a triple frame input and convolutional LSTM connections, (ii) an occlusion-aware census loss for better accuracy, and (iii) a gradient detaching strategy to improve training stability. On the KITTI dataset, we observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.

Submitted 5 May, 2021; originally announced May 2021.

Comments: To appear at CVPR 2021. Code available: https://github.com/visinf/multi-mono-sf

arXiv:2105.00097 [pdf, other]

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

Authors: Nikita Araslanov, Stefan Roth

Abstract: We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of computationally involved adversarial objectives, network ensembles and style transfer. Instead, we employ standard data augmentation techniques $-$ photometric noise, flipping and scaling $-$ and ensure consistency of the semantic predictions across these image transformations. We develop this principle in a lightweight self-supervised framework trained on co-evolving pseudo labels without the need for cumbersome extra training rounds. Simple in training from a practitioner's standpoint, our approach is remarkably effective. We achieve significant improvements of the state-of-the-art segmentation accuracy after adaptation, consistent both across different choices of the backbone architecture and adaptation scenarios.

Submitted 30 April, 2021; originally announced May 2021.

Comments: To appear at CVPR 2021. Code: https://github.com/visinf/da-sac

Showing 1–50 of 261 results for author: Roth, S