-
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Authors:
Markus Hiller,
Krista A. Ehinger,
Tom Drummond
Abstract:
We present a novel bi-directional Transformer architecture (BiXT) which scales linearly with input size in terms of computational cost and memory consumption, but does not suffer the drop in performance or limitation to only one input modality seen with other efficient Transformer-based approaches. BiXT is inspired by the Perceiver architectures but replaces iterative attention with an efficient bi-directional cross-attention module in which input tokens and latent variables attend to each other simultaneously, leveraging a naturally emerging attention-symmetry between the two. This approach unlocks a key bottleneck experienced by Perceiver-like architectures and enables the processing and interpretation of both semantics ('what') and location ('where') to develop alongside each other over multiple layers -- allowing its direct application to dense and instance-based tasks alike. By combining efficiency with the generality and performance of a full Transformer architecture, BiXT can process longer sequences like point clouds, text or images at higher feature resolutions and achieves competitive performance across a range of tasks like point cloud part segmentation, semantic image segmentation, image classification, hierarchical sequence modeling and document retrieval. Our experiments demonstrate that BiXT models outperform larger competitors by leveraging longer sequences more efficiently on vision tasks like classification and segmentation, and perform on par with full Transformer variants on sequence modeling and document retrieval -- but require $28\%$ fewer FLOPs and are up to $8.4\times$ faster.
Submitted 26 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
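The bi-directional cross-attention described in the abstract above can be pictured with a small NumPy sketch. It assumes one way the attention symmetry might be exploited: a single latent-token similarity matrix is computed once and normalised along each axis, so latents and tokens update each other in the same pass. The function name, the `params` dictionary and its keys are hypothetical, and multi-head attention, residual connections and normalisation layers are left out; this is an illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def bidirectional_cross_attention(latents, tokens, params):
    """Single-head sketch. latents: (M, d), tokens: (N, d).

    Returns one update per side, with the same shapes as the inputs.
    All projection matrices in `params` are hypothetical (d, d) arrays.
    """
    d = latents.shape[1]
    ref_lat = latents @ params["ref_lat"]  # (M, d) reference vectors of latents
    ref_tok = tokens @ params["ref_tok"]   # (N, d) reference vectors of tokens
    val_lat = latents @ params["val_lat"]  # (M, d) values offered by latents
    val_tok = tokens @ params["val_tok"]   # (N, d) values offered by tokens

    # One shared similarity matrix, computed once: (M, N).
    sim = ref_lat @ ref_tok.T / np.sqrt(d)

    # Latents attend to tokens: normalise each row over the N tokens.
    lat_update = softmax(sim, axis=1) @ val_tok    # (M, d)
    # Tokens attend to latents: normalise each column over the M latents.
    tok_update = softmax(sim, axis=0).T @ val_lat  # (N, d)
    return lat_update, tok_update


rng = np.random.default_rng(0)
M, N, d = 4, 32, 8  # few latents, many tokens
names = ("ref_lat", "ref_tok", "val_lat", "val_tok")
params = {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in names}
latents = rng.standard_normal((M, d))
tokens = rng.standard_normal((N, d))
lat_update, tok_update = bidirectional_cross_attention(latents, tokens, params)
```

In this sketch the similarity matrix has M × N entries, so for a fixed number of latents M the cost grows linearly with the number of input tokens N.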
-
Rethinking Generalization in Few-Shot Classification
Authors:
Markus Hiller,
Rongkai Ma,
Mehrtash Harandi,
Tom Drummond
Abstract:
Single image-level annotations only correctly describe an often small subset of an image's content, particularly when complex real-world scenes are depicted. While this might be acceptable in many classification scenarios, it poses a significant challenge for applications where the set of classes differs significantly between training and test time. In this paper, we take a closer look at the implications in the context of $\textit{few-shot learning}$. Splitting the input samples into patches and encoding these with the help of Vision Transformers allows us to establish semantic correspondences between local regions across images, independent of their respective class. The most informative patch embeddings for the task at hand are then determined as a function of the support set via online optimization at inference time, additionally providing visual interpretability of `$\textit{what matters most}$' in the image. We build on recent advances in unsupervised training of networks via masked image modelling to overcome the lack of fine-grained labels and learn the more general statistical structure of the data while avoiding negative image-level annotation influence, $\textit{aka}$ supervision collapse. Experimental results show the competitiveness of our approach, achieving new state-of-the-art results on four popular few-shot classification benchmarks for $5$-shot and $1$-shot scenarios.
Submitted 15 October, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
On Enforcing Better Conditioned Meta-Learning for Rapid Few-Shot Adaptation
Authors:
Markus Hiller,
Mehrtash Harandi,
Tom Drummond
Abstract:
Inspired by the concept of preconditioning, we propose a novel method to increase adaptation speed for gradient-based meta-learning methods without incurring extra parameters. We demonstrate that recasting the optimization problem to a non-linear least-squares formulation provides a principled way to actively enforce a $\textit{well-conditioned}$ parameter space for meta-learning models based on the concepts of the condition number and local curvature. Our comprehensive evaluations show that the proposed method significantly outperforms its unconstrained counterpart especially during initial adaptation steps, while achieving comparable or better overall results on several few-shot classification tasks -- creating the possibility of dynamically choosing the number of adaptation steps at inference time.
Submitted 15 October, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Assessing Group-level Gender Bias in Professional Evaluations: The Case of Medical Student End-of-Shift Feedback
Authors:
Emmy Liu,
Michael Henry Tessler,
Nicole Dubosh,
Katherine Mosher Hiller,
Roger Levy
Abstract:
Although approximately 50% of medical school graduates today are women, female physicians tend to be underrepresented in senior positions, make less money than their male counterparts and receive fewer promotions. There is a growing body of literature demonstrating gender bias in various forms of evaluation in medicine, but this work was mainly conducted by looking for specific words using fixed dictionaries such as LIWC and focused on recommendation letters. We use a dataset of written and quantitative assessments of medical student performance on individual shifts of work, collected across multiple institutions, to investigate the extent to which gender bias exists in a day-to-day context for medical students. We investigate differences in the narrative comments given to male and female students by both male and female faculty assessors, using a fine-tuned BERT model. This allows us to examine whether groups are written about in systematically different ways, without relying on hand-crafted wordlists or topic models. We compare these results to results from the traditional LIWC method and find that, although we find no evidence of group-level gender bias in this dataset, terms related to family and children are used more in feedback given to women.
Submitted 1 June, 2022;
originally announced June 2022.
-
Analysis of Communication Channels Related to Physical Unclonable Functions
Authors:
Georg Maringer,
Marvin Xhemrishi,
Sven Puchinger,
Kathrin Garb,
Hedongliang Liu,
Thomas Jerkovits,
Ludwig Kürzinger,
Matthias Hiller,
Antonia Wachter-Zeh
Abstract:
Cryptographic algorithms rely on the secrecy of their corresponding keys. On embedded systems with standard CMOS chips, where secure permanent memory such as flash is not available as a key storage, the secret key can be derived from Physical Unclonable Functions (PUFs) that make use of minuscule manufacturing variations of, for instance, SRAM cells. Since PUFs are affected by environmental changes, the reliable reproduction of the PUF key requires error correction. For silicon PUFs with binary output, errors occur in the form of bitflips within the PUF's response. Modelling the channel as a Binary Symmetric Channel (BSC) with fixed crossover probability $p$ is only a first-order approximation of the real behavior of the PUF response. We propose a more realistic channel model, referred to as the Varying Binary Symmetric Channel (VBSC), which takes into account that the reliability of different PUF response bits may not be equal. We investigate its channel capacity for various scenarios which differ in the channel state information (CSI) present at encoder and decoder. We compare the capacity results for the VBSC for the different CSI cases with reference to the distribution of the bitflip probability according to a work by Maes et al.
Submitted 3 December, 2021;
originally announced December 2021.
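For the fixed-crossover model mentioned in the abstract above, the capacity of a Binary Symmetric Channel is the standard $C = 1 - h(p)$, where $h$ is the binary entropy function. The sketch below evaluates it and, purely as an illustration of why unequal bit reliabilities matter, compares the average of per-bit BSC capacities with the capacity at the mean crossover probability. The example probabilities are made up, and the averaging is a simplification that does not reproduce the paper's VBSC capacity results for the different CSI cases.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, in bits per channel use."""
    return 1.0 - binary_entropy(p)


# Hypothetical per-bit crossover probabilities (illustrative values only):
# some response bits are very reliable, others are not.
bit_flip_probs = [0.01, 0.01, 0.05, 0.20, 0.30]

mean_p = sum(bit_flip_probs) / len(bit_flip_probs)
capacity_at_mean_p = bsc_capacity(mean_p)
mean_of_capacities = sum(bsc_capacity(p) for p in bit_flip_probs) / len(bit_flip_probs)

print(f"capacity at mean p:       {capacity_at_mean_p:.4f}")
print(f"mean of per-bit capacity: {mean_of_capacities:.4f}")
```

Because $1 - h(p)$ is convex in $p$, the mean of the per-bit capacities is never smaller than the capacity at the mean crossover probability, which is one way to see that a single fixed $p$ discards information about bit reliability.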
-
Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers
Authors:
Tianyu Zhu,
Markus Hiller,
Mahsa Ehsanpour,
Rongkai Ma,
Tom Drummond,
Ian Reid,
Hamid Rezatofighi
Abstract:
Tracking a time-varying indefinite number of objects in a video sequence over time remains a challenge despite recent advances in the field. Most existing approaches are not able to properly handle multi-object tracking challenges such as occlusion, in part because they ignore long-term temporal information. To address these shortcomings, we present MO3TR: a truly end-to-end Transformer-based online multi-object tracking (MOT) framework that learns to handle occlusions, track initiation and termination without the need for an explicit data association module or any heuristics. MO3TR encodes object interactions into long-term temporal embeddings using a combination of spatial and temporal Transformers, and recursively uses the information jointly with the input data to estimate the states of all tracked objects over time. The spatial attention mechanism enables our framework to learn implicit representations between all the objects and the objects to the measurements, while the temporal attention mechanism focuses on specific parts of past information, allowing our approach to resolve occlusions over multiple frames. Our experiments demonstrate the potential of this new approach, achieving results on par with or better than the current state-of-the-art on multiple MOT metrics for several popular multi-object tracking benchmarks.
Submitted 7 October, 2022; v1 submitted 27 March, 2021;
originally announced March 2021.
-
Learning Topometric Semantic Maps from Occupancy Grids
Authors:
Markus Hiller,
Chen Qiu,
Florian Particke,
Christian Hofmann,
Jörn Thielecke
Abstract:
Today's mobile robots are expected to operate in complex environments they share with humans. To allow intuitive human-robot collaboration, robots require a human-like understanding of their surroundings in terms of semantically classified instances. In this paper, we propose a new approach for deriving such instance-based semantic maps purely from occupancy grids. We employ a combination of deep learning techniques to detect, segment and extract door hypotheses from a random-sized map. The extraction is followed by a post-processing chain to further increase the accuracy of our approach, as well as place categorization for the three classes room, door and corridor. All detected and classified entities are described as instances specified in a common coordinate system, while a topological map is derived to capture their spatial links. To train our two neural networks used for detection and map segmentation, we contribute a simulator that automatically creates and annotates the required training data. We further provide insight into which features are learned to detect doorways, and how the simulated training data can be augmented to train networks for the direct application on real-world grid maps. We evaluate our approach on several publicly available real-world data sets. Even though the used networks are solely trained on simulated data, our approach demonstrates high robustness and effectiveness in various real-world indoor environments.
Submitted 10 January, 2020;
originally announced January 2020.
-
On the Burning Number of $p$-Caterpillars
Authors:
Michaela Hiller,
Eberhard Triesch,
Arie M. C. A. Koster
Abstract:
The burning number is a recently introduced graph parameter indicating the spreading speed of content in a graph through its edges. While the conjectured upper bound on the necessary number of time steps until all vertices are reached is proven for some specific graph classes, it remains open for trees in general. We present two different proofs for ordinary caterpillars and prove the conjecture for a generalised version of caterpillars and for trees with a sufficient number of leaves. Furthermore, determining the burning number for spider graphs, trees with maximum degree three and path-forests is known to be $\mathcal{NP}$-complete; however, we show that the complexity is already inherent in caterpillars with maximum degree three.
Submitted 23 December, 2019;
originally announced December 2019.
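To make the burning number from the abstract above concrete, here is a brute-force sketch for very small graphs. It relies on the usual covering view of burning: a graph can be burned in k steps if there are sources x_1, ..., x_k such that the balls of radius k - i around x_i together cover every vertex. The helper names and the adjacency-dictionary representation are assumptions for illustration; the exhaustive search is exponential and has nothing to do with the proof techniques of the paper.

```python
from collections import deque
from itertools import product


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def burning_number(adj):
    """Smallest k such that k sources, chosen one per step, burn every vertex.

    adj maps each vertex to a list of its neighbours (undirected graph).
    Exhaustive search over source sequences; only usable for tiny graphs.
    """
    vertices = list(adj)
    dist = {v: bfs_distances(adj, v) for v in vertices}
    for k in range(1, len(vertices) + 1):
        for sources in product(vertices, repeat=k):
            burned = set()
            for i, s in enumerate(sources):
                radius = k - 1 - i  # the i-th source (0-based) spreads k-1-i hops
                burned.update(v for v, d in dist[s].items() if d <= radius)
            if len(burned) == len(vertices):
                return k
    return 0  # only reached for the empty graph


def path_graph(n):
    """Path on vertices 0..n-1 as an adjacency dictionary."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


# A small caterpillar: spine 0-1-2 with one extra leaf on each spine vertex.
caterpillar = {
    0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
    3: [0], 4: [1], 5: [2],
}

print(burning_number(path_graph(9)))
print(burning_number(caterpillar))
```

For paths the result can be checked against the known value ⌈√n⌉, for example 3 for the path on 9 vertices.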
-
On Error Correction for Physical Unclonable Functions
Authors:
Sven Puchinger,
Sven Müelich,
Martin Bossert,
Matthias Hiller,
Georg Sigl
Abstract:
Physical Unclonable Functions evaluate manufacturing variations to generate secure cryptographic keys for embedded systems without secure key storage. It is explained how methods from coding theory are applied in order to ensure reliable key reproduction. We show how better results can be obtained using code classes and decoding principles not used for this scenario before. These methods are exemplified by specific code constructions which improve existing codes with respect to error probability, decoding complexity and codeword length.
Submitted 27 January, 2015;
originally announced January 2015.
-
Error Correction for Physical Unclonable Functions Using Generalized Concatenated Codes
Authors:
Sven Müelich,
Sven Puchinger,
Martin Bossert,
Matthias Hiller,
Georg Sigl
Abstract:
Physical Unclonable Functions can be used for secure key generation in cryptographic applications. It is explained how methods from coding theory must be applied in order to ensure reliable key regeneration. Based on previous work, we show how to obtain better results with respect to error probability and codeword length. Also, an example based on Generalized Concatenated codes is given, which improves upon coding schemes previously used for PUFs.
Submitted 30 July, 2014;
originally announced July 2014.