Search | arXiv e-print repository

arXiv:2406.19509 [pdf]

Semantic orchestration and exploitation of material data: A dataspace solution demonstrated on steel and cooper applications

Authors: Yoav Nahshon, Lukas Morand, Matthias Büschelberger, Dirk Helm, Kiran Kumaraswamy, Paul Zierep, Matthias Weber, Pablo de Andrés

Abstract: In the field of materials science and manufacturing, a vast amount of heterogeneous data exists, encompassing measurement and simulation data, machine data, publications, and more. This data serves as the bedrock of valuable knowledge that can be leveraged for various engineering applications. However, efficiently storing and handling such diverse data remain significantly challenging, often due t… ▽ More In the field of materials science and manufacturing, a vast amount of heterogeneous data exists, encompassing measurement and simulation data, machine data, publications, and more. This data serves as the bedrock of valuable knowledge that can be leveraged for various engineering applications. However, efficiently storing and handling such diverse data remain significantly challenging, often due to the lack of standardization and integration across different organizational units. Addressing these issues is crucial for fully utilizing the potential of data-driven approaches in these fields. In this paper, we present a novel technology stack named Dataspace Management System (DSMS) for powering dataspace solutions. The core of DSMS lies on its distinctive knowledge management approach tuned to meet the specific demands of the materials science and manufacturing domain, all while adhering to the FAIR principles. This includes data integration, linkage, exploration, visualization, processing, and enrichment, in order to support engineers in decision-making and in solving design and optimization problems. We provide an architectural overview and describe the core components of DSMS. Additionally, we demonstrate the applicability of DSMS to typical data processing tasks in materials science through use cases from two research projects, namely StahlDigital and KupferDigital, both part of the German MaterialDigital initiative. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.07550 [pdf, other]

An Image is Worth 32 Tokens for Reconstruction and Generation

Authors: Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

Abstract: Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically… ▽ More Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically utilize 2D latent grids with fixed downsampling factors. However, these 2D tokenizations face challenges in managing the inherent redundancies present in images, where adjacent regions frequently display similarities. To overcome this issue, we introduce Transformer-based 1-Dimensional Tokenizer (TiTok), an innovative approach that tokenizes images into 1D latent sequences. TiTok provides a more compact latent representation, yielding substantially more efficient and effective representations than conventional techniques. For example, a 256 x 256 x 3 image can be reduced to just 32 discrete tokens, a significant reduction from the 256 or 1024 tokens obtained by prior methods. Despite its compact nature, TiTok achieves competitive performance to state-of-the-art approaches. Specifically, using the same generator framework, TiTok attains 1.97 gFID, outperforming MaskGIT baseline significantly by 4.21 at ImageNet 256 x 256 benchmark. The advantages of TiTok become even more significant when it comes to higher resolution. At ImageNet 512 x 512 benchmark, TiTok not only outperforms state-of-the-art diffusion model DiT-XL/2 (gFID 2.74 vs. 3.04), but also reduces the image tokens by 64x, leading to 410x faster generation process. Our best-performing variant can significantly surpasses DiT-XL/2 (gFID 2.13 vs. 3.04) while still generating high-quality samples 74x faster. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: A compact 1D Image Tokenization method, leading to SOTA generation performance while being substantially faster. Project page at https://yucornetto.github.io/projects/titok.html

arXiv:2406.01461 [pdf, other]

Hardness of Learning Neural Networks under the Manifold Hypothesis

Authors: Bobak T. Kiani, Jason Wang, Melanie Weber

Abstract: The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under… ▽ More The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.14477 [pdf, other]

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Abstract: Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencode… ▽ More Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencoders (VAEs) with no sacrifice in output quality. We also investigate the training methodologies and the decoder architecture of LiteVAE and propose several enhancements that improve the training dynamics and reconstruction quality. Our base LiteVAE model matches the quality of the established VAEs in current LDMs with a six-fold reduction in encoder parameters, leading to faster training and lower GPU memory requirements, while our larger model outperforms VAEs of comparable complexity across all evaluated metrics (rFID, LPIPS, PSNR, and SSIM). △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.03314 [pdf, other]

Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery

Authors: Maximilian Weber, Daniel Wild, Jens Kleesiek, Jan Egger, Christina Gsaxner

Abstract: Point cloud registration aligns 3D point clouds using spatial transformations. It is an important task in computer vision, with applications in areas such as augmented reality (AR) and medical imaging. This work explores the intersection of two research trends: the integration of AR into image-guided surgery and the use of deep learning for point cloud registration. The main objective is to evalua… ▽ More Point cloud registration aligns 3D point clouds using spatial transformations. It is an important task in computer vision, with applications in areas such as augmented reality (AR) and medical imaging. This work explores the intersection of two research trends: the integration of AR into image-guided surgery and the use of deep learning for point cloud registration. The main objective is to evaluate the feasibility of applying deep learning-based point cloud registration methods for image-to-patient registration in augmented reality-guided surgery. We created a dataset of point clouds from medical imaging and corresponding point clouds captured with a popular AR device, the HoloLens 2. We evaluate three well-established deep learning models in registering these data pairs. While we find that some deep learning methods show promise, we show that a conventional registration pipeline still outperforms them on our challenging dataset. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures; accepted at IEEE ISBI 2024

arXiv:2404.19535 [pdf, other]

Ferroelectrically-enhanced Schottky barrier transistors for Logic-in-Memory applications

Authors: Daniele Nazzari, Lukas Wind, Masiar Sistani, Dominik Mayr, Kihye Kim, Walter M. Weber

Abstract: Artificial neural networks (ANNs) have had an enormous impact on a multitude of sectors, from research to industry, generating an unprecedented demand for tailor-suited hardware platforms. Their training and execution is highly memory-intensive, clearly evidencing the limitations affecting the currently available hardware based on the von Neumann architecture, which requires frequent data shuttlin… ▽ More Artificial neural networks (ANNs) have had an enormous impact on a multitude of sectors, from research to industry, generating an unprecedented demand for tailor-suited hardware platforms. Their training and execution is highly memory-intensive, clearly evidencing the limitations affecting the currently available hardware based on the von Neumann architecture, which requires frequent data shuttling due to the physical separation of logic and memory units. This does not only limit the achievable performances but also greatly increases the energy consumption, hindering the integration of ANNs into low-power platforms. New Logic in Memory (LiM) architectures, able to unify memory and logic functionalities into a single component, are highly promising for overcoming these limitations, by drastically reducing the need of data transfers. Recently, it has been shown that a very flexible platform for logic applications can be realized recurring to a multi-gated Schottky-Barrier Field Effect Transistor (SBFET). If equipped with memory capabilities, this architecture could represent an ideal building block for versatile LiM hardware. To reach this goal, here we investigate the integration of a ferroelectric Hf$_{0.5}$Zr$_{0.5}$O$_2$ (HZO) layer onto Dual Top Gated SBFETs. We demonstrate that HZO polarization charges can be successfully employed to tune the height of the two Schottky barriers, influencing the injection behavior, thus defining the transistor mode, switching it between n and p-type transport. The modulation strength is strongly dependent on the polarization pulse height, allowing for the selection of multiple current levels. All these achievable states can be well retained over time, thanks to the HZO stability. The presented result show how ferroelectric-enhanced SBFETs are promising for the realization of novel LiM hardware, enabling low-power circuits for ANNs execution. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19109 [pdf, other]

The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset

Authors: Claudio Bellei, Muhua Xu, Ross Phillips, Tom Robinson, Mark Weber, Tim Kaler, Charles E. Leiserson, Arvind, Jie Chen

Abstract: Subgraph representation learning is a technique for analyzing local structures (or shapes) within complex networks. Enabled by recent developments in scalable Graph Neural Networks (GNNs), this approach encodes relational information at a subgroup level (multiple connected nodes) rather than at a node level of abstraction. We posit that certain domain applications, such as anti-money laundering (A… ▽ More Subgraph representation learning is a technique for analyzing local structures (or shapes) within complex networks. Enabled by recent developments in scalable Graph Neural Networks (GNNs), this approach encodes relational information at a subgroup level (multiple connected nodes) rather than at a node level of abstraction. We posit that certain domain applications, such as anti-money laundering (AML), are inherently subgraph problems and mainstream graph techniques have been operating at a suboptimal level of abstraction. This is due in part to the scarcity of annotated datasets of real-world size and complexity, as well as the lack of software tools for managing subgraph GNN workflows at scale. To enable work in fundamental algorithms as well as domain applications in AML and beyond, we introduce Elliptic2, a large graph dataset containing 122K labeled subgraphs of Bitcoin clusters within a background graph consisting of 49M node clusters and 196M edge transactions. The dataset provides subgraphs known to be linked to illicit activity for learning the set of "shapes" that money laundering exhibits in cryptocurrency and accurately classifying new criminal activity. Along with the dataset we share our graph techniques, software tooling, promising early experimental results, and new domain insights already gleaned from this approach. Taken together, we find immediate practical value in this approach and the potential for a new standard in anti-money laundering and forensic analytics in cryptocurrencies and other financial networks. △ Less

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.07654 [pdf, ps, other]

rollama: An R package for using generative large language models through Ollama

Authors: Johannes B. Gruber, Maximilian Weber

Abstract: rollama is an R package that wraps the Ollama API, which allows you to run different Generative Large Language Models (GLLM) locally. The package and learning material focus on making it easy to use Ollama for annotating textual or imagine data with open-source models as well as use these models for document embedding. But users can use or extend rollama to do essentially anything else that is pos… ▽ More rollama is an R package that wraps the Ollama API, which allows you to run different Generative Large Language Models (GLLM) locally. The package and learning material focus on making it easy to use Ollama for annotating textual or imagine data with open-source models as well as use these models for document embedding. But users can use or extend rollama to do essentially anything else that is possible through OpenAI's API, yet more private, reproducible and for free. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.17778 [pdf, other]

Towards a FAIR Documentation of Workflows and Models in Applied Mathematics

Authors: Marco Reidelbach, Björn Schembera, Marcus Weber

Abstract: Modeling-Simulation-Optimization workflows play a fundamental role in applied mathematics. The Mathematical Research Data Initiative, MaRDI, responded to this by develo** a FAIR and machine-interpretable template for a comprehensive documentation of such workflows. MaRDMO, a Plugin for the Research Data Management Organiser, enables scientists from diverse fields to document and publish their wo… ▽ More Modeling-Simulation-Optimization workflows play a fundamental role in applied mathematics. The Mathematical Research Data Initiative, MaRDI, responded to this by develo** a FAIR and machine-interpretable template for a comprehensive documentation of such workflows. MaRDMO, a Plugin for the Research Data Management Organiser, enables scientists from diverse fields to document and publish their workflows on the MaRDI Portal seamlessly using the MaRDI template. Central to these workflows are mathematical models. MaRDI addresses them with the MathModDB ontology, offering a structured formal model description. Here, we showcase the interaction between MaRDMO and the MathModDB Knowledge Graph through an algebraic modeling workflow from the Digital Humanities. This demonstration underscores the versatility of both services beyond their original numerical domain. △ Less

Submitted 26 March, 2024; originally announced March 2024.

ACM Class: H.3.3; H.3.7; E.0

arXiv:2403.07621 [pdf, other]

Smartphone region-wise image indoor localization using deep learning for indoor tourist attraction

Authors: Gabriel Toshio Hirokawa Higa, Rodrigo Stuqui Monzani, Jorge Fernando da Silva Cecatto, Maria Fernanda Balestieri Mariano de Souza, Vanessa Aparecida de Moraes Weber, Hemerson Pistori, Edson Takashi Matsubara

Abstract: Smart indoor tourist attractions, such as smart museums and aquariums, usually require a significant investment in indoor localization devices. The smartphone Global Positional Systems use is unsuitable for scenarios where dense materials such as concrete and metal block weaken the GPS signals, which is the most common scenario in an indoor tourist attraction. Deep learning makes it possible to pe… ▽ More Smart indoor tourist attractions, such as smart museums and aquariums, usually require a significant investment in indoor localization devices. The smartphone Global Positional Systems use is unsuitable for scenarios where dense materials such as concrete and metal block weaken the GPS signals, which is the most common scenario in an indoor tourist attraction. Deep learning makes it possible to perform region-wise indoor localization using smartphone images. This approach does not require any investment in infrastructure, reducing the cost and time to turn museums and aquariums into smart museums or smart aquariums. This paper proposes using deep learning algorithms to classify locations using smartphone camera images for indoor tourism attractions. We evaluate our proposal in a real-world scenario in Brazil. We extensively collect images from ten different smartphones to classify biome-themed fish tanks inside the Pantanal Biopark, creating a new dataset of 3654 images. We tested seven state-of-the-art neural networks, three being transformer-based, achieving precision around 90% on average and recall and f-score around 89% on average. The results indicate good feasibility of the proposal in a most indoor tourist attractions. △ Less

Submitted 12 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07137 [pdf, other]

Exploring Cluster Analysis in Nelore Cattle Visual Score Attribution

Authors: Alexandre de Oliveira Bezerra, Rodrigo Goncalves Mateus, Vanessa Ap. de Moraes Weber, Fabricio de Lima Weber, Yasmin Alves de Arruda, Rodrigo da Costa Gomes, Gabriel Toshio Hirokawa Higa, Hemerson Pistori

Abstract: Assessing the biotype of cattle through human visual inspection is a very common and important practice in precision cattle breeding. This paper presents the results of a correlation analysis between scores produced by humans for Nelore cattle and a variety of measurements that can be derived from images or other instruments. It also presents a study using the k-means algorithm to generate new way… ▽ More Assessing the biotype of cattle through human visual inspection is a very common and important practice in precision cattle breeding. This paper presents the results of a correlation analysis between scores produced by humans for Nelore cattle and a variety of measurements that can be derived from images or other instruments. It also presents a study using the k-means algorithm to generate new ways of clustering a batch of cattle using the measurements that most correlate with the animal's body weight and visual scores. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.15452 [pdf, other]

doi 10.1145/3613904.3642837

I see an IC: A Mixed-Methods Approach to Study Human Problem-Solving Processes in Hardware Reverse Engineering

Authors: René Walendy, Markus Weber, **gjie Li, Steffen Becker, Carina Wiesen, Malte Elson, Younghyun Kim, Kassem Fawaz, Nikol Rummel, Christof Paar

Abstract: Trust in digital systems depends on secure hardware, often assured through Hardware Reverse Engineering (HRE). This work develops methods for investigating human problem-solving processes in HRE, an underexplored yet critical aspect. Since reverse engineers rely heavily on visual information, eye tracking holds promise for studying their cognitive processes. To gain further insights, we additional… ▽ More Trust in digital systems depends on secure hardware, often assured through Hardware Reverse Engineering (HRE). This work develops methods for investigating human problem-solving processes in HRE, an underexplored yet critical aspect. Since reverse engineers rely heavily on visual information, eye tracking holds promise for studying their cognitive processes. To gain further insights, we additionally employ verbal thought protocols during and immediately after HRE tasks: Concurrent and Retrospective Think Aloud. We evaluate the combination of eye tracking and Think Aloud with 41 participants in an HRE simulation. Eye tracking accurately identifies fixations on individual circuit elements and highlights critical components. Based on two use cases, we demonstrate that eye tracking and Think Aloud can complement each other to improve data quality. Our methodological insights can inform future studies in HRE, a specific setting of human-computer interaction, and in other problem-solving settings involving misleading or missing information. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2401.01869 [pdf, other]

On the hardness of learning under symmetries

Authors: Bobak T. Kiani, Thien Le, Hannah Lawrence, Stefanie Jegelka, Melanie Weber

Abstract: We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected… ▽ More We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard. △ Less

Submitted 3 January, 2024; originally announced January 2024.

Comments: 52 pages, 4 figures

arXiv:2401.00284 [pdf, other]

Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models

Authors: Maximilian Weber, Merle Reichardt

Abstract: This paper explores the use of open generative Large Language Models (LLMs) for annotation tasks in the social sciences. The study highlights the challenges associated with proprietary models, such as limited reproducibility and privacy concerns, and advocates for the adoption of open (source) models that can be operated on independent devices. Two examples of annotation tasks, sentiment analysis… ▽ More This paper explores the use of open generative Large Language Models (LLMs) for annotation tasks in the social sciences. The study highlights the challenges associated with proprietary models, such as limited reproducibility and privacy concerns, and advocates for the adoption of open (source) models that can be operated on independent devices. Two examples of annotation tasks, sentiment analysis in tweets and identification of leisure activities in childhood aspirational essays are provided. The study evaluates the performance of different prompting strategies and models (neural-chat-7b-v3-2, Starling-LM-7B-alpha, openchat_3.5, zephyr-7b-alpha and zephyr-7b-beta). The results indicate the need for careful validation and tailored prompt engineering. The study highlights the advantages of open models for data privacy and reproducibility. △ Less

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.10904 [pdf]

Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI)

Authors: Sabrina Toro, Anna V Anagnostopoulos, Sue Bello, Kai Blumberg, Rhiannon Cameron, Leigh Carmody, Alexander D Diehl, Damion Dooley, William Duncan, Petra Fey, Pascale Gaudet, Nomi L Harris, Marcin Joachimiak, Leila Kiani, Tiago Lubiana, Monica C Munoz-Torres, Shawn O'Neil, David Osumi-Sutherland, Aleix Puig, Justin P Reese, Leonore Reiser, Sofia Robb, Troy Ruem**, James Seager, Eric Sid , et al. (5 additional authors not shown)

Abstract: Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dyna… ▽ More Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources. Results: We assessed performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, but has slightly lower precision than from logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues. Conclusions: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process. △ Less

Submitted 12 June, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.10188 [pdf, other]

WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data

Authors: Maurice Weber, Carlo Siebenschuh, Rory Butler, Anton Alexandrov, Valdemar Thanner, Georgios Tsolakis, Haris Jabbar, Ian Foster, Bo Li, Rick Stevens, Ce Zhang

Abstract: We introduce WordScape, a novel pipeline for the creation of cross-disciplinary, multilingual corpora comprising millions of pages with annotations for document layout detection. Relating visual and textual items on document pages has gained further significance with the advent of multimodal models. Various approaches proved effective for visual question answering or layout segmentation. However,… ▽ More We introduce WordScape, a novel pipeline for the creation of cross-disciplinary, multilingual corpora comprising millions of pages with annotations for document layout detection. Relating visual and textual items on document pages has gained further significance with the advent of multimodal models. Various approaches proved effective for visual question answering or layout segmentation. However, the interplay of text, tables, and visuals remains challenging for a variety of document understanding tasks. In particular, many models fail to generalize well to diverse domains and new languages due to insufficient availability of training data. WordScape addresses these limitations. Our automatic annotation pipeline parses the Open XML structure of Word documents obtained from the web, jointly providing layout-annotated document images and their textual representations. In turn, WordScape offers unique properties as it (1) leverages the ubiquity of the Word file format on the internet, (2) is readily accessible through the Common Crawl web corpus, (3) is adaptive to domain-specific documents, and (4) offers culturally and linguistically diverse document pages with natural semantic structure and high-quality text. Together with the pipeline, we will additionally release 9.5M urls to word documents which can be processed using WordScape to create a dataset of over 40M pages. Finally, we investigate the quality of text and layout annotations extracted by WordScape, assess the impact on document understanding benchmarks, and demonstrate that manual labeling costs can be substantially reduced. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023 Datasets and Benchmarks

arXiv:2312.06552 [pdf, other]

Open Data-Driven Automation of Residential Distribution Grid Modeling with Minimal Data Requirements

Authors: Moritz Weber, Luc Janecke, Hüseyin K. Çakmak, Veit Hagenmeyer

Abstract: In the present paper, we introduce a new method for the automated generation of residential distribution grid models based on novel building load estimation methods and a two-stage optimization for the generation of the 20 kV and 400 V grid topologies. Using the introduced load estimation methods, various open or proprietary data sources can be utilized to estimate the load of residential building… ▽ More In the present paper, we introduce a new method for the automated generation of residential distribution grid models based on novel building load estimation methods and a two-stage optimization for the generation of the 20 kV and 400 V grid topologies. Using the introduced load estimation methods, various open or proprietary data sources can be utilized to estimate the load of residential buildings. These data sources include available building footprints from OpenStreetMap, 3D building data from OSM Buildings, and the number of electricity meters per address provided by the respective distribution system operator (DSO). For the evaluation of the introduced methods, we compare the resulting grid models by utilizing different available data sources for a specific suburban residential area and the real grid topology provided by the DSO. This evaluation yields two key findings: First, the automated 20 kV network generation methodology works well when compared to the real network. Second, the utilization of public 3D building data for load estimation significantly increases the resulting model accuracy compared to 2D data and enables results similar to models based on DSO-supplied meter data. This substantially reduces the dependence on such normally proprietary data. △ Less

Submitted 12 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 11 pages, 12 figures, submitted to IEEE Transactions on Smart Grid

arXiv:2311.14864 [pdf, other]

Effective Structural Encodings via Local Curvature Profiles

Authors: Lukas Fesser, Melanie Weber

Abstract: Structural and Positional Encodings can significantly improve the performance of Graph Neural Networks in downstream tasks. Recent literature has begun to systematically investigate differences in the structural properties that these approaches encode, as well as performance trade-offs between them. However, the question of which structural properties yield the most effective encoding remains open… ▽ More Structural and Positional Encodings can significantly improve the performance of Graph Neural Networks in downstream tasks. Recent literature has begun to systematically investigate differences in the structural properties that these approaches encode, as well as performance trade-offs between them. However, the question of which structural properties yield the most effective encoding remains open. In this paper, we investigate this question from a geometric perspective. We propose a novel structural encoding based on discrete Ricci curvature (Local Curvature Profiles, short LCP) and show that it significantly outperforms existing encoding approaches. We further show that combining local structural encodings, such as LCP, with global positional encodings improves downstream performance, suggesting that they capture complementary geometric information. Finally, we compare different encoding types with (curvature-based) rewiring techniques. Rewiring has recently received a surge of interest due to its ability to improve the performance of Graph Neural Networks by mitigating over-smoothing and over-squashing effects. Our results suggest that utilizing curvature information for structural encodings delivers significantly larger performance increases than rewiring. △ Less

Submitted 13 March, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.07422 [pdf, other]

Sidekick compilation with xDSL

Authors: Mathieu Fehr, Michel Weber, Christian Ulmann, Alexandre Lopoukhine, Martin Lücke, Théo Degioanni, Michel Steuwer, Tobias Grosser

Abstract: Traditionally, compiler researchers either conduct experiments within an existing production compiler or develop their own prototype compiler; both options come with trade-offs. On one hand, prototy** in a production compiler can be cumbersome, as they are often optimized for program compilation speed at the expense of software simplicity and development speed. On the other hand, the transition… ▽ More Traditionally, compiler researchers either conduct experiments within an existing production compiler or develop their own prototype compiler; both options come with trade-offs. On one hand, prototy** in a production compiler can be cumbersome, as they are often optimized for program compilation speed at the expense of software simplicity and development speed. On the other hand, the transition from a prototype compiler to production requires significant engineering work. To bridge this gap, we introduce the concept of sidekick compiler frameworks, an approach that uses multiple frameworks that interoperate with each other by leveraging textual interchange formats and declarative descriptions of abstractions. Each such compiler framework is specialized for specific use cases, such as performance or prototy**. Abstractions are by design shared across frameworks, simplifying the transition from prototy** to production. We demonstrate this idea with xDSL, a sidekick for MLIR focused on prototy** and teaching. xDSL interoperates with MLIR through a shared textual IR and the exchange of IRs through an IR Definition Language. The benefits of sidekick compiler frameworks are evaluated by showing on three use cases how xDSL impacts their development: teaching, DSL compilation, and rewrite system prototy**. We also investigate the trade-offs that xDSL offers, and demonstrate how we simplify the transition between frameworks using the IRDL dialect. With sidekick compilation, we envision a future in which engineers minimize the cost of development by choosing a framework built for their immediate needs, and later transitioning to production with minimal overhead. △ Less

Submitted 16 June, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 14 pages, 15 figures; updated twice to include acknowledgements

arXiv:2310.17347 [pdf, other]

CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling

Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Abstract: While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffus… ▽ More While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffusion models that can increase generation diversity, especially at high guidance scales, with minimal loss of sample quality. Our sampling strategy anneals the conditioning signal by adding scheduled, monotonically decreasing Gaussian noise to the conditioning vector during inference to balance diversity and condition alignment. Our Condition-Annealed Diffusion Sampler (CADS) can be used with any pretrained model and sampling algorithm, and we show that it boosts the diversity of diffusion models in various conditional generation tasks. Further, using an existing pretrained diffusion model, CADS achieves a new state-of-the-art FID of 1.70 and 2.31 for class-conditional ImageNet generation at 256$\times$256 and 512$\times$512 respectively. △ Less

Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Published as a conference paper at ICLR 2024

Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)

arXiv:2309.10048 [pdf]

What does ChatGPT know about natural science and engineering?

Authors: Lukas Schulze Balhorn, Jana M. Weber, Stefan Buijsman, Julian R. Hildebrandt, Martina Ziefle, Artur M. Schweidtmann

Abstract: ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to have a large impact on society, research, and education. An essential step to understand ChatGPT's expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its abilities to answer questions across the nat… ▽ More ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to have a large impact on society, research, and education. An essential step to understand ChatGPT's expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its abilities to answer questions across the natural science and engineering domains. We collected 594 questions from 198 faculty members across 5 faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are on average perceived as ``mostly correct''. Two major trends are that the rating of the ChatGPT answers significantly decreases (i) as the complexity level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.09384 [pdf, other]

Mitigating Over-Smoothing and Over-Squashing using Augmentations of Forman-Ricci Curvature

Authors: Lukas Fesser, Melanie Weber

Abstract: While Graph Neural Networks (GNNs) have been successfully leveraged for learning on graph-structured data across domains, several potential pitfalls have been described recently. Those include the inability to accurately leverage information encoded in long-range connections (over-squashing), as well as difficulties distinguishing the learned representations of nearby nodes with growing network de… ▽ More While Graph Neural Networks (GNNs) have been successfully leveraged for learning on graph-structured data across domains, several potential pitfalls have been described recently. Those include the inability to accurately leverage information encoded in long-range connections (over-squashing), as well as difficulties distinguishing the learned representations of nearby nodes with growing network depth (over-smoothing). An effective way to characterize both effects is discrete curvature: Long-range connections that underlie over-squashing effects have low curvature, whereas edges that contribute to over-smoothing have high curvature. This observation has given rise to rewiring techniques, which add or remove edges to mitigate over-smoothing and over-squashing. Several rewiring approaches utilizing graph characteristics, such as curvature or the spectrum of the graph Laplacian, have been proposed. However, existing methods, especially those based on curvature, often require expensive subroutines and careful hyperparameter tuning, which limits their applicability to large-scale graphs. Here we propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notation, which can be computed in linear time. We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs. We complement our theoretical results with experiments, which demonstrate that the proposed approach achieves state-of-the-art performance while significantly reducing the computational cost in comparison with other methods. Utilizing fundamental properties of discrete curvature, we propose effective heuristics for hyperparameters in curvature-based rewiring, which avoids expensive hyperparameter searches, further improving the scalability of the proposed approach. △ Less

Submitted 19 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.05740 [pdf, other]

REVERSIM: A Game-Based Environment to Study Human Aspects in Hardware Reverse Engineering

Authors: Steffen Becker, René Walendy, Markus Weber, Carina Wiesen, Nikol Rummel, Christof Paar

Abstract: Hardware Reverse Engineering (HRE) is a technique for analyzing Integrated Circuits (ICs). Experts employ HRE for security-critical tasks, such as detecting Trojans or intellectual property violations. They rely not only on their experience and customized tools but also on their cognitive abilities. Conducting controlled experiments to assess the cognitive processes involved in HRE can open new av… ▽ More Hardware Reverse Engineering (HRE) is a technique for analyzing Integrated Circuits (ICs). Experts employ HRE for security-critical tasks, such as detecting Trojans or intellectual property violations. They rely not only on their experience and customized tools but also on their cognitive abilities. Conducting controlled experiments to assess the cognitive processes involved in HRE can open new avenues for hardware protection. However, HRE experts are largely unavailable for empirical research in real-world settings. To address this challenge, we have developed REVERSIM, a game-based environment that mimics realistic HRE subprocesses and can integrate standardized cognitive tests. REVERSIM enables quantitative studies with easier-to-recruit non-experts to uncover cognitive factors relevant to HRE, which can subsequently be validated with small expert samples. To evaluate the design of REVERSIM, the minimum requirements for successful participation, and its measurement capabilities, we conducted two studies: First, we performed semi-structured interviews with 14 professionals and researchers from the HRE domain, who attested to the comparability of REVERSIM to real-world HRE problems. Second, we conducted an online user study with 109 participants, demonstrating that they could engage in REVERSIM with low domain-specific prior knowledge. We provide refined screening criteria, derive fine-grained performance metrics, and successfully perform a cognitive test for mental speed in REVERSIM, thus contributing an important piece of the puzzle for the development of innovative hardware protection mechanisms. △ Less

Submitted 24 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.06127 [pdf, other]

doi 10.1109/SSCI52147.2023.10371978

Learning Control Policies for Variable Objectives from Offline Data

Authors: Marc Weber, Phillip Swazinna, Daniel Hein, Steffen Udluft, Volkmar Sterzing

Abstract: Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available. In this paper, we introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP). With this approach, policies are trained to generalize efficiently over a… ▽ More Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available. In this paper, we introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP). With this approach, policies are trained to generalize efficiently over a variety of objectives, which parameterize the reward function. We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime, without need for collecting additional observation batches or re-training. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: 8 pages, 7 figures

Journal ref: 2023 IEEE Symposium Series on Computational Intelligence

arXiv:2307.10155 [pdf, other]

Curvature-based Clustering on Graphs

Authors: Yu Tian, Zachary Lubberts, Melanie Weber

Abstract: Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms, which exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its comm… ▽ More Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms, which exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph, i.e., the graph's dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis. △ Less

Submitted 19 July, 2023; originally announced July 2023.

Comments: 65 pages, 19 figures

MSC Class: 05C82; 05C10; 53C21; 68R10; 05C75

arXiv:2307.02378 [pdf, other]

Continuum Limits of Ollivier's Ricci Curvature on data clouds: pointwise consistency and global lower bounds

Authors: Nicolas Garcia Trillos, Melanie Weber

Abstract: Let $\mathcal{M} \subseteq \mathbb{R}^d$ denote a low-dimensional manifold and let $\mathcal{X}= \{ x_1, \dots, x_n \}$ be a collection of points uniformly sampled from $\mathcal{M}$. We study the relationship between the curvature of a random geometric graph built from $\mathcal{X}$ and the curvature of the manifold $\mathcal{M}$ via continuum limits of Ollivier's discrete Ricci curvature. We pro… ▽ More Let $\mathcal{M} \subseteq \mathbb{R}^d$ denote a low-dimensional manifold and let $\mathcal{X}= \{ x_1, \dots, x_n \}$ be a collection of points uniformly sampled from $\mathcal{M}$. We study the relationship between the curvature of a random geometric graph built from $\mathcal{X}$ and the curvature of the manifold $\mathcal{M}$ via continuum limits of Ollivier's discrete Ricci curvature. We prove pointwise, non-asymptotic consistency results and also show that if $\mathcal{M}$ has Ricci curvature bounded from below by a positive constant, then the random geometric graph will inherit this global structural property with high probability. We discuss applications of the global discrete curvature bounds to contraction properties of heat kernels on graphs, as well as implications for manifold learning from data clouds. In particular, we show that the consistency results allow for characterizing the intrinsic curvature of a manifold from extrinsic curvature. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.15938 [pdf, other]

Interpretable Anomaly Detection in Cellular Networks by Learning Concepts in Variational Autoencoders

Authors: Amandeep Singh, Michael Weber, Markus Lange-Hegermann

Abstract: This paper addresses the challenges of detecting anomalies in cellular networks in an interpretable way and proposes a new approach using variational autoencoders (VAEs) that learn interpretable representations of the latent space for each Key Performance Indicator (KPI) in the dataset. This enables the detection of anomalies based on reconstruction loss and Z-scores. We ensure the interpretabilit… ▽ More This paper addresses the challenges of detecting anomalies in cellular networks in an interpretable way and proposes a new approach using variational autoencoders (VAEs) that learn interpretable representations of the latent space for each Key Performance Indicator (KPI) in the dataset. This enables the detection of anomalies based on reconstruction loss and Z-scores. We ensure the interpretability of the anomalies via additional information centroids (c) using the K-means algorithm to enhance representation learning. We evaluate the performance of the model by analyzing patterns in the latent dimension for specific KPIs and thereby demonstrate the interpretability and anomalies. The proposed framework offers a faster and autonomous solution for detecting anomalies in cellular networks and showcases the potential of deep learning-based algorithms in handling big data. △ Less

Submitted 28 June, 2023; originally announced June 2023.

ACM Class: C.2.m; C.2.3; G.3; I.2.6; I.5.3

arXiv:2306.06474 [pdf, other]

Augmentations of Forman's Ricci Curvature and their Applications in Community Detection

Authors: Lukas Fesser, Sergio Serrano de Haro Iváñez, Karel Devriendt, Melanie Weber, Renaud Lambiotte

Abstract: The notion of curvature on graphs has recently gained traction in the networks community, with the Ollivier-Ricci curvature (ORC) in particular being used for several tasks in network analysis, such as community detection. In this work, we choose a different approach and study augmentations of the discretization of the Ricci curvature proposed by Forman (AFRC). We empirically and theoretically inv… ▽ More The notion of curvature on graphs has recently gained traction in the networks community, with the Ollivier-Ricci curvature (ORC) in particular being used for several tasks in network analysis, such as community detection. In this work, we choose a different approach and study augmentations of the discretization of the Ricci curvature proposed by Forman (AFRC). We empirically and theoretically investigate its relation to the ORC and the un-augmented Forman-Ricci curvature. In particular, we provide evidence that the AFRC frequently gives sufficient insight into the structure of a network to be used for community detection, and therefore provides a computationally cheaper alternative to previous ORC-based methods. Our novel AFRC-based community detection algorithm is competitive with an ORC-based approach. The codebase for fast and efficient computations of AFRC and the experiments in this article will be made publicly available upon publication. △ Less

Submitted 10 June, 2023; originally announced June 2023.

Comments: 22 pages, 11 figures

MSC Class: 05C82; 05C10 (Primary) 53C21; 68R10; 05C75 (Secondary)

arXiv:2305.09011 [pdf, other]

The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn)

Authors: Hongwei Bran Li, Gian Marco Conte, Syed Muhammad Anwar, Florian Kofler, Ivan Ezhov, Koen van Leemput, Marie Piraud, Maria Diaz, Byrone Cole, Evan Calabrese, Jeff Rudie, Felix Meissen, Maruf Adewole, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Ahmed W. Moawad, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman , et al. (43 additional authors not shown)

Abstract: Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time const… ▽ More Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time constraints or image artifacts, such as patient motion. Consequently, the ability to substitute missing modalities and gain segmentation performance is highly desirable and necessary for the broader adoption of these algorithms in the clinical routine. In this work, we present the establishment of the Brain MR Image Synthesis Benchmark (BraSyn) in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023. The primary objective of this challenge is to evaluate image synthesis methods that can realistically generate missing MRI modalities when multiple available images are provided. The ultimate aim is to facilitate automated brain tumor segmentation pipelines. The image dataset used in the benchmark is diverse and multi-modal, created through collaboration with various hospitals and research institutions. △ Less

Submitted 28 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: Technical report of BraSyn

arXiv:2305.08992 [pdf, other]

The Brain Tumor Segmentation (BraTS) Challenge 2023: Local Synthesis of Healthy Brain Tissue via Inpainting

Authors: Florian Kofler, Felix Meissen, Felix Steinbauer, Robert Graf, Eva Oswald, Ezequiel de da Rosa, Hongwei Bran Li, Ujjwal Baid, Florian Hoelzl, Oezguen Turgut, Izabela Horvath, Diana Waldmannstetter, Christina Bukas, Maruf Adewole, Syed Muhammad Anwar, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Ahmed W Moawad, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako , et al. (43 additional authors not shown)

Abstract: A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with a scan that is already pathological. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantees for images featuring lesions. Examples include… ▽ More A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with a scan that is already pathological. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantees for images featuring lesions. Examples include but are not limited to algorithms for brain anatomy parcellation, tissue segmentation, and brain extraction. To solve this dilemma, we introduce the BraTS 2023 inpainting challenge. Here, the participants' task is to explore inpainting techniques to synthesize healthy brain scans from lesioned ones. The following manuscript contains the task formulation, dataset, and submission procedure. Later it will be updated to summarize the findings of the challenge. The challenge is organized as part of the BraTS 2023 challenge hosted at the MICCAI 2023 conference in Vancouver, Canada. △ Less

Submitted 9 August, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 5 pages, 1 figure

arXiv:2304.12906 [pdf, other]

The Score-Difference Flow for Implicit Generative Modeling

Authors: Romann M. Weber

Abstract: Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we prese… ▽ More Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schroedinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach. △ Less

Submitted 18 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 25 pages, 5 figures, 4 tables. To appear in Transactions on Machine Learning Research (TMLR)

Journal ref: Transactions on Machine Learning Research (2023)

arXiv:2303.13006 [pdf, other]

Controllable Inversion of Black-Box Face Recognition Models via Diffusion

Authors: Manuel Kansy, Anton Raël, Graziana Mignone, Jacek Naruniec, Christopher Schroers, Markus Gross, Romann M. Weber

Abstract: Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. black-box setting). A variety of methods have been propose… ▽ More Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. black-box setting). A variety of methods have been proposed in literature for this task, but they have serious shortcomings such as a lack of realistic outputs and strong requirements for the data set and accessibility of the face recognition model. By analyzing the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity both qualitatively and quantitatively, and our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process. △ Less

Submitted 30 September, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: 8 pages main paper + 23 pages supplementary material. Moderate revisions from v1 (different template, added user study, wording). Presented at AMFG workshop at ICCV 2023. Project page: https://studios.disneyresearch.com/2023/10/02/controllable-inversion-of-black-box-face-recognition-models-via-diffusion/

ACM Class: I.2; I.3.3; I.4

arXiv:2303.02473 [pdf]

Disparity in the Evolving COVID-19 Collaboration Network

Authors: Huimin Xu, Redoan Rahman, Ajay Jaiswal, Julia Fensel, Abhinav Peri, Ka-mesh Peri, Griffin M Weber, Ying Ding

Abstract: The COVID 19 pandemic has paused many ongoing research projects and unified researchers' attention to focus on COVID 19 related issues. Our project traces 712294 scientists' publications related to COVID 19 for two years, from January 2020 to December 2021, to detect the dynamic evolution patterns of the COVID 19 collaboration network over time. By studying the collaboration network of COVID 19 sc… ▽ More The COVID 19 pandemic has paused many ongoing research projects and unified researchers' attention to focus on COVID 19 related issues. Our project traces 712294 scientists' publications related to COVID 19 for two years, from January 2020 to December 2021, to detect the dynamic evolution patterns of the COVID 19 collaboration network over time. By studying the collaboration network of COVID 19 scientists, we observe how a new scientific community has been built in preparation for a sudden shock. The number of newcomers grows incrementally, and the connectivity of the collaboration network shifts from loose to tight promptly. Even though every scientist has an equal opportunity to start a study, collaboration disparity still exists. Following the scale-free distribution, only a few top authors are highly connected with other authors. These top authors are more likely to attract newcomers and work with each other. As the collaboration network evolves, the increase rate in the probability of attracting newcomers for authors with higher degrees increases, whereas the increase rates in the likelihood of forming new links among authors with higher degrees decreases. This highlights the interesting trend that the COVID pandemic alters the research collaboration trends that star scientists are starting to collaborate more with newcomers but less with existing collaborators, which, in a certain way, reduces the collaboration disparity. △ Less

Submitted 4 March, 2023; originally announced March 2023.

arXiv:2212.08204 [pdf, other]

LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension

Authors: Wenyue Hua, Yuchen Zhang, Zhe Chen, Josie Li, Melanie Weber

Abstract: The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models, which have limitations on p… ▽ More The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models, which have limitations on processing specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model that is trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator. We show that this improves the model's performance on processing long passages and results in better long-range text comprehension. △ Less

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2211.16943 [pdf, other]

Predicting Properties of Quantum Systems with Conditional Generative Models

Authors: Haoxiang Wang, Maurice Weber, Josh Izaac, Cedric Yen-Yu Lin

Abstract: Machine learning has emerged recently as a powerful tool for predicting properties of quantum many-body systems. For many ground states of gapped Hamiltonians, generative models can learn from measurements of a single quantum state to reconstruct the state accurately enough to predict local observables. Alternatively, classification and regression models can predict local observables by learning f… ▽ More Machine learning has emerged recently as a powerful tool for predicting properties of quantum many-body systems. For many ground states of gapped Hamiltonians, generative models can learn from measurements of a single quantum state to reconstruct the state accurately enough to predict local observables. Alternatively, classification and regression models can predict local observables by learning from measurements on different but related states. In this work, we combine the benefits of both approaches and propose the use of conditional generative models to simultaneously represent a family of states, learning shared structures of different quantum states from measurements. The trained model enables us to predict arbitrary local properties of ground states, even for states not included in the training data, without necessitating further training for new observables. We first numerically validate our approach on 2D random Heisenberg models using simulations of up to 45 qubits. Furthermore, we conduct quantum simulations on a neutral-atom quantum computer and demonstrate that our method can accurately predict the quantum phases of square lattices of 13$\times$13 Rydberg atoms. △ Less

Submitted 3 March, 2024; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: 10 pages, 14 figures, 5 pages appendix. Open-source code is available at https://github.com/PennyLaneAI/generative-quantum-states

arXiv:2208.10773 [pdf, other]

Adversarial Vulnerability of Temporal Feature Networks for Object Detection

Authors: Svetlana Pavlitskaya, Nikolai Polley, Michael Weber, J. Marius Zöllner

Abstract: Taking into account information across the temporal domain helps to improve environment perception in autonomous driving. However, it has not been studied so far whether temporally fused neural networks are vulnerable to deliberately generated perturbations, i.e. adversarial attacks, or whether temporal history is an inherent defense against them. In this work, we study whether temporal feature ne… ▽ More Taking into account information across the temporal domain helps to improve environment perception in autonomous driving. However, it has not been studied so far whether temporally fused neural networks are vulnerable to deliberately generated perturbations, i.e. adversarial attacks, or whether temporal history is an inherent defense against them. In this work, we study whether temporal feature networks for object detection are vulnerable to universal adversarial attacks. We evaluate attacks of two types: imperceptible noise for the whole image and locally-bound adversarial patch. In both cases, perturbations are generated in a white-box manner using PGD. Our experiments confirm, that attacking even a portion of a temporal input suffices to fool the network. We visually assess generated perturbations to gain insights into the functioning of attacks. To enhance the robustness, we apply adversarial training using 5-PGD. Our experiments on KITTI and nuScenes datasets demonstrate, that a model robustified via K-PGD is able to withstand the studied attacks while kee** the mAP-based performance comparable to that of an unattacked model. △ Less

Submitted 23 August, 2022; originally announced August 2022.

Comments: Accepted for publication at ECCV 2022 SAIAD workshop

arXiv:2208.05013 [pdf, other]

Computing Brascamp-Lieb Constants through the lens of Thompson Geometry

Authors: Melanie Weber, Suvrit Sra

Abstract: This paper studies algorithms for efficiently computing Brascamp-Lieb constants, a task that has recently received much interest. In particular, we reduce the computation to a nonlinear matrix-valued iteration, whose convergence we analyze through the lens of fixed-point methods under the well-known Thompson metric. This approach permits us to obtain (weakly) polynomial time guarantees, and it off… ▽ More This paper studies algorithms for efficiently computing Brascamp-Lieb constants, a task that has recently received much interest. In particular, we reduce the computation to a nonlinear matrix-valued iteration, whose convergence we analyze through the lens of fixed-point methods under the well-known Thompson metric. This approach permits us to obtain (weakly) polynomial time guarantees, and it offers an efficient and transparent alternative to previous state-of-the-art approaches based on Riemannian optimization and geodesic convexity. △ Less

Submitted 14 April, 2024; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: Under Review

MSC Class: 46N10; 49Q99; 53Z50; 68W40

arXiv:2207.13779 [pdf, other]

doi 10.1016/j.compchemeng.2023.108202

Physical Pooling Functions in Graph Neural Networks for Molecular Property Prediction

Authors: Artur M. Schweidtmann, Jan G. Rittig, Jana M. Weber, Martin Grohe, Manuel Dahmen, Kai Leonhard, Alexander Mitsos

Abstract: Graph neural networks (GNNs) are emerging in chemical engineering for the end-to-end learning of physicochemical properties based on molecular graphs. A key element of GNNs is the pooling function which combines atom feature vectors into molecular fingerprints. Most previous works use a standard pooling function to predict a variety of properties. However, unsuitable pooling functions can lead to… ▽ More Graph neural networks (GNNs) are emerging in chemical engineering for the end-to-end learning of physicochemical properties based on molecular graphs. A key element of GNNs is the pooling function which combines atom feature vectors into molecular fingerprints. Most previous works use a standard pooling function to predict a variety of properties. However, unsuitable pooling functions can lead to unphysical GNNs that poorly generalize. We compare and select meaningful GNN pooling methods based on physical knowledge about the learned properties. The impact of physical pooling functions is demonstrated with molecular properties calculated from quantum mechanical computations. We also compare our results to the recent set2set pooling approach. We recommend using sum pooling for the prediction of properties that depend on molecular size and compare pooling functions for properties that are molecular size-independent. Overall, we show that the use of physical pooling functions significantly enhances generalization. △ Less

Submitted 27 July, 2022; originally announced July 2022.

Journal ref: Computers and Chemical Engineering Volume 172, April 2023, 108202

arXiv:2206.11426 [pdf, other]

On a class of geodesically convex optimization problems solved via Euclidean MM methods

Authors: Melanie Weber, Suvrit Sra

Abstract: We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions. This structure arises in several optimization problems in statistics and machine learning, e.g., for matrix scaling, M-estimators for covariances, and Brascamp-Lieb inequalities. Our work offers efficient algorithms that on the one hand exploit g-convexity to ensure global optimality… ▽ More We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions. This structure arises in several optimization problems in statistics and machine learning, e.g., for matrix scaling, M-estimators for covariances, and Brascamp-Lieb inequalities. Our work offers efficient algorithms that on the one hand exploit g-convexity to ensure global optimality along with guarantees on iteration complexity. On the other hand, the split structure permits us to develop Euclidean Majorization-Minorization algorithms that help us bypass the need to compute expensive Riemannian operations such as exponential maps and parallel transport. We illustrate our results by specializing them to a few concrete optimization problems that have been previously studied in the machine learning literature. Ultimately, we hope our work helps motivate the broader search for mixed Euclidean-Riemannian optimization algorithms △ Less

Submitted 20 October, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: Under Review

arXiv:2206.00619 [pdf, other]

doi 10.1002/aic.17971

Graph Machine Learning for Design of High-Octane Fuels

Authors: Jan G. Rittig, Martin Ritzert, Artur M. Schweidtmann, Stefanie Winkler, Jana M. Weber, Philipp Morsch, K. Alexander Heufer, Martin Grohe, Alexander Mitsos, Manuel Dahmen

Abstract: Fuels with high-knock resistance enable modern spark-ignition engines to achieve high efficiency and thus low CO2 emissions. Identification of molecules with desired autoignition properties indicated by a high research octane number and a high octane sensitivity is therefore of great practical relevance and can be supported by computer-aided molecular design (CAMD). Recent developments in the fiel… ▽ More Fuels with high-knock resistance enable modern spark-ignition engines to achieve high efficiency and thus low CO2 emissions. Identification of molecules with desired autoignition properties indicated by a high research octane number and a high octane sensitivity is therefore of great practical relevance and can be supported by computer-aided molecular design (CAMD). Recent developments in the field of graph machine learning (graph-ML) provide novel, promising tools for CAMD. We propose a modular graph-ML CAMD framework that integrates generative graph-ML models with graph neural networks and optimization, enabling the design of molecules with desired ignition properties in a continuous molecular space. In particular, we explore the potential of Bayesian optimization and genetic algorithms in combination with generative graph-ML models. The graph-ML CAMD framework successfully identifies well-established high-octane components. It also suggests new candidates, one of which we experimentally investigate and use to illustrate the need for further auto-ignition training data. △ Less

Submitted 14 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: manuscript (26 pages, 9 figures, 2 tables), supporting information (12 pages, 8 figures, 1 table)

Journal ref: AIChE Journal 69 (4), e17971, 2023

arXiv:2205.15494 [pdf, other]

Certifying Some Distributional Fairness with Subpopulation Decomposition

Authors: Mintong Kang, Linyi Li, Maurice Weber, Yang Liu, Ce Zhang, Bo Li

Abstract: Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model t… ▽ More Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework is flexible to integrate additional non-skewness constraints and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter. △ Less

Submitted 18 November, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022, 38 pages, 11 pages for the main text

arXiv:2203.12560 [pdf, other]

DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation

Authors: Aysim Toker, Lukas Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero, **gliang Hu, Ariadna Pregel Hoderlein, Çağlar Şenaras, Timothy Davis, Daniel Cremers, Giovanni Marchisio, Xiao Xiang Zhu, Laura Leal-Taixé

Abstract: Earth observation is a fundamental tool for monitoring the evolution of land use in specific areas of interest. Observing and precisely defining change, in this context, requires both time-series data and pixel-wise segmentations. To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the g… ▽ More Earth observation is a fundamental tool for monitoring the evolution of land use in specific areas of interest. Observing and precisely defining change, in this context, requires both time-series data and pixel-wise segmentations. To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. These observations are paired with pixel-wise monthly semantic segmentation labels of 7 land use and land cover (LULC) classes. DynamicEarthNet is the first dataset that provides this unique combination of daily measurements and high-quality labels. In our experiments, we compare several established baselines that either utilize the daily observations as additional training data (semi-supervised learning) or multiple observations at once (spatio-temporal learning) as a point of reference for future research. Finally, we propose a new evaluation metric SCS that addresses the specific challenges associated with time-series semantic change segmentation. The data is available at: https://mediatum.ub.tum.de/1650201. △ Less

Submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022, evaluation webpage: https://codalab.lisn.upsaclay.fr/competitions/2882

arXiv:2202.01679 [pdf, other]

Certifying Out-of-Domain Generalization for Blackbox Functions

Authors: Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, Ce Zhang

Abstract: Certifying the robustness of model performance under bounded data distribution drifts has recently attracted intensive interest under the umbrella of distributional robustness. However, existing techniques either make strong assumptions on the model class and loss functions that can be certified, such as smoothness expressed via Lipschitz continuity of gradients, or require to solve complex optimi… ▽ More Certifying the robustness of model performance under bounded data distribution drifts has recently attracted intensive interest under the umbrella of distributional robustness. However, existing techniques either make strong assumptions on the model class and loss functions that can be certified, such as smoothness expressed via Lipschitz continuity of gradients, or require to solve complex optimization problems. As a result, the wider application of these techniques is currently limited by its scalability and flexibility -- these techniques often do not scale to large-scale datasets with modern deep neural networks or cannot handle loss functions which may be non-smooth such as the 0-1 loss. In this paper, we focus on the problem of certifying distributional robustness for blackbox models and bounded loss functions, and propose a novel certification framework based on the Hellinger distance. Our certification technique scales to ImageNet-scale datasets, complex models, and a diverse set of loss functions. We then focus on one specific application enabled by such scalability and flexibility, i.e., certifying out-of-domain generalization for large neural networks and loss functions such as accuracy and AUC. We experimentally validate our certification method on a number of datasets, ranging from ImageNet, where we provide the first non-vacuous certified out-of-domain generalization, to smaller classification tasks where we are able to compare with the state-of-the-art and show that our method performs considerably better. △ Less

Submitted 30 July, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: 39th International Conference on Machine Learning (ICML) 2022

arXiv:2201.07032 [pdf, other]

The Mathematics of Comparing Objects

Authors: Marcus Weber, Konstantin Fackeldey

Abstract: "After reading two different crime stories, an artificial intelligence concludes that in both stories the police has found the murderer just by random." -- To what extend and under which assumptions this is a description of a realistic scenario? "After reading two different crime stories, an artificial intelligence concludes that in both stories the police has found the murderer just by random." -- To what extend and under which assumptions this is a description of a realistic scenario? △ Less

Submitted 30 March, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

MSC Class: 06Exx ACM Class: J.5

arXiv:2112.10074 [pdf, other]

doi 10.59275/j.melba.2022-354b

QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini , et al. (67 additional authors not shown)

Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying… ▽ More Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Develo** scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS. △ Less

Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2109.09946 [pdf, other]

Identifying biases in legal data: An algorithmic fairness perspective

Authors: Jackson Sargent, Melanie Weber

Abstract: The need to address representation biases and sentencing disparities in legal case data has long been recognized. Here, we study the problem of identifying and measuring biases in large-scale legal case data from an algorithmic fairness perspective. Our approach utilizes two regression models: A baseline that represents the decisions of a "typical" judge as given by the data and a "fair" judge tha… ▽ More The need to address representation biases and sentencing disparities in legal case data has long been recognized. Here, we study the problem of identifying and measuring biases in large-scale legal case data from an algorithmic fairness perspective. Our approach utilizes two regression models: A baseline that represents the decisions of a "typical" judge as given by the data and a "fair" judge that applies one of three fairness concepts. Comparing the decisions of the "typical" judge and the "fair" judge allows for quantifying biases across demographic groups, as we demonstrate in four case studies on criminal data from Cook County (Illinois). △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: EEAMO 2021

ACM Class: K.4; K.5

arXiv:2107.02314 [pdf, other]

The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

Authors: Ujjwal Baid, Satyam Ghodasara, Suyash Mohan, Michel Bilello, Evan Calabrese, Errol Colak, Keyvan Farahani, Jayashree Kalpathy-Cramer, Felipe C. Kitamura, Sarthak Pati, Luciano M. Prevedello, Jeffrey D. Rudie, Chiharu Sako, Russell T. Shinohara, Timothy Bergquist, Rong Chai, James Eddy, Julia Elliott, Walter Reade, Thomas Schaffter, Thomas Yu, Jiaxin Zheng, Ahmed W. Moawad, Luiz Otavio Coelho, Olivia McDonnell , et al. (78 additional authors not shown)

Abstract: The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with wel… ▽ More The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with well-curated multi-institutional multi-parametric magnetic resonance imaging (mpMRI) data. Gliomas are the most common primary malignancies of the central nervous system, with varying degrees of aggressiveness and prognosis. The RSNA-ASNR-MICCAI BraTS 2021 challenge targets the evaluation of computational algorithms assessing the same tumor compartmentalization, as well as the underlying tumor's molecular characterization, in pre-operative baseline mpMRI data from 2,040 patients. Specifically, the two tasks that BraTS 2021 focuses on are: a) the segmentation of the histologically distinct brain tumor sub-regions, and b) the classification of the tumor's O[6]-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. The performance evaluation of all participating algorithms in BraTS 2021 will be conducted through the Sage Bionetworks Synapse platform (Task 1) and Kaggle (Task 2), concluding in distributing to the top ranked participants monetary awards of $60,000 collectively. △ Less

Submitted 12 September, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 19 pages, 2 figures, 1 table

arXiv:2106.09748 [pdf, other]

DeepLab2: A TensorFlow Library for Deep Labeling

Authors: Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

Abstract: DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the sta… ▽ More DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the state-of-art systems. To showcase the effectiveness of DeepLab2, our Panoptic-DeepLab employing Axial-SWideRNet as network backbone achieves 68.0% PQ or 83.5% mIoU on Cityscaspes validation set, with only single-scale inference and ImageNet-1K pretrained checkpoints. We hope that publicly sharing our library could facilitate future research on dense pixel labeling tasks and envision new applications of this technology. Code is made publicly available at \url{https://github.com/google-research/deeplab2}. △ Less

Submitted 17 June, 2021; originally announced June 2021.

Comments: 4-page technical report. The first three authors contributed equally to this work

arXiv:2106.02045 [pdf, other]

Least-squares fitting of Gaussian spots on graphics processing units

Authors: Marcel Leutenegger, Michael Weber

Abstract: The investigation of samples with a spatial resolution in the nanometer range relies on the precise and stable positioning of the sample. Due to inherent mechanical instabilities of typical sample stages in optical microscopes, it is usually required to control and/or monitor the sample position during the acquisition. The tracking of sparsely distributed fiducial markers at high speed allows stab… ▽ More The investigation of samples with a spatial resolution in the nanometer range relies on the precise and stable positioning of the sample. Due to inherent mechanical instabilities of typical sample stages in optical microscopes, it is usually required to control and/or monitor the sample position during the acquisition. The tracking of sparsely distributed fiducial markers at high speed allows stabilizing the sample position at millisecond time scales. For this purpose, we present a scalable fitting algorithm with significantly improved performance for two-dimensional Gaussian fits as compared to Gpufit. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: 16 pages

arXiv:2105.05874 [pdf, other]

The Federated Tumor Segmentation (FeTS) Challenge

Authors: Sarthak Pati, Ujjwal Baid, Maximilian Zenk, Brandon Edwards, Micah Sheller, G. Anthony Reina, Patrick Foley, Alexey Gruzdev, Jason Martin, Shadi Albarqouni, Yong Chen, Russell Taki Shinohara, Annika Reinke, David Zimmerer, John B. Freymann, Justin S. Kirby, Christos Davatzikos, Rivka R. Colen, Aikaterini Kotrotsou, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Hassan Fathallah-Shaykh, Roland Wiest, Andras Jakab , et al. (7 additional authors not shown)

Abstract: This manuscript describes the first challenge on Federated Learning, namely the Federated Tumor Segmentation (FeTS) challenge 2021. International challenges have become the standard for validation of biomedical image analysis methods. However, the actual performance of participating (even the winning) algorithms on "real-world" clinical data often remains unclear, as the data included in challenge… ▽ More This manuscript describes the first challenge on Federated Learning, namely the Federated Tumor Segmentation (FeTS) challenge 2021. International challenges have become the standard for validation of biomedical image analysis methods. However, the actual performance of participating (even the winning) algorithms on "real-world" clinical data often remains unclear, as the data included in challenges are usually acquired in very controlled settings at few institutions. The seemingly obvious solution of just collecting increasingly more data from more institutions in such challenges does not scale well due to privacy and ownership hurdles. Towards alleviating these concerns, we are proposing the FeTS challenge 2021 to cater towards both the development and the evaluation of models for the segmentation of intrinsically heterogeneous (in appearance, shape, and histology) brain tumors, namely gliomas. Specifically, the FeTS 2021 challenge uses clinically acquired, multi-institutional magnetic resonance imaging (MRI) scans from the BraTS 2020 challenge, as well as from various remote independent institutions included in the collaborative network of a real-world federation (https://www.fets.ai/). The goals of the FeTS challenge are directly represented by the two included tasks: 1) the identification of the optimal weight aggregation approach towards the training of a consensus model that has gained knowledge via federated learning from multiple geographically distinct institutions, while their data are always retained within each institution, and 2) the federated evaluation of the generalizability of brain tumor segmentation models "in the wild", i.e. on data from institutional distributions that were not part of the training datasets. △ Less

Submitted 13 May, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

Showing 1–50 of 110 results for author: Weber, M