
Showing 1–50 of 206 results for author: Schmidt, M

Searching in archive cs.
  1. arXiv:2406.17954  [pdf, other]

    cs.LG math.OC

    Why Line Search when you can Plane Search? SO-Friendly Neural Networks allow Per-Iteration Optimization of Learning and Momentum Rates for Every Layer

    Authors: Betty Shea, Mark Schmidt

    Abstract: We introduce the class of SO-friendly neural networks, which include several models used in practice including networks with 2 layers of hidden weights where the number of inputs is larger than the number of outputs. SO-friendly networks have the property that performing a precise line search to set the step size on each iteration has the same asymptotic cost during full-batch training as using a…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.17296  [pdf, other]

    cs.LG

    BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

    Authors: Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt

    Abstract: Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA)…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures

  3. arXiv:2406.13584  [pdf, other]

    cs.LG

    Explaining time series models using frequency masking

    Authors: Thea Brüsch, Kristoffer K. Wickstrøm, Mikkel N. Schmidt, Tommy S. Alstrøm, Robert Jenssen

    Abstract: Time series data is fundamentally important for describing many critical domains such as healthcare, finance, and climate, where explainable models are necessary for safe automated decision-making. To develop eXplainable AI (XAI) in these domains therefore implies explaining salient information in the time series. Current methods for obtaining saliency maps assume localized information in the raw…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Submitted to the Next Generation of AI Safety workshop at ICML 2024

  4. arXiv:2406.02739  [pdf, other]

    cs.DS

    Local Search k-means++ with Foresight

    Authors: Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, Melanie Schmidt

    Abstract: Since its introduction in 1957, Lloyd's algorithm for $k$-means clustering has been extensively studied and has undergone several improvements. While in its original form it does not guarantee any approximation factor at all, Arthur and Vassilvitskii (SODA 2007) proposed $k$-means++ which enhances Lloyd's algorithm by a seeding method which guarantees a $\mathcal{O}(\log k)$-approximation in expec…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.12963  [pdf]

    eess.IV cs.CV cs.LG

    Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma

    Authors: Ahmed Gomaa, Yixing Huang, Amr Hagag, Charlotte Schmitter, Daniel Höfler, Thomas Weissmann, Katharina Breininger, Manuel Schmidt, Jenny Stritzelberger, Daniel Delev, Roland Coras, Arnd Dörfler, Oliver Schnell, Benjamin Frey, Udo S. Gaipl, Sabine Semrau, Christoph Bert, Rainer Fietkau, Florian Putz

    Abstract: Background: This research aims to improve glioblastoma survival prediction by integrating MR images, clinical and molecular-pathologic data in a transformer-based deep learning model, addressing data heterogeneity and performance generalizability. Method: We propose and evaluate a transformer-based non-linear and non-proportional survival prediction model. The model employs self-supervised learnin…

    Submitted 21 May, 2024; originally announced May 2024.

  6. arXiv:2405.10870  [pdf, other]

    eess.IV cs.CV

    Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation

    Authors: Yixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian Putz

    Abstract: Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data. Materials and methods: A total of six BM datasets from University Hospita…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Submission to the Green Journal (Major Revision)

  7. arXiv:2405.09335  [pdf, other]

    cs.CL

    Prompting-based Synthetic Data Generation for Few-Shot Question Answering

    Authors: Maximilian Schmidt, Andrea Bartezzaghi, Ngoc Thang Vu

    Abstract: Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: LREC-COLING 2024

  8. arXiv:2404.07525  [pdf, other]

    cs.LG

    Enhancing Policy Gradient with the Polyak Step-Size Adaption

    Authors: Yunxiang Li, Rui Yuan, Chen Fan, Mark Schmidt, Samuel Horváth, Robert M. Gower, Martin Takáč

    Abstract: Policy gradient is a widely utilized and foundational algorithm in the field of reinforcement learning (RL). Renowned for its convergence guarantees and stability compared to other RL algorithms, its practical application is often hindered by sensitivity to hyper-parameters, particularly the step-size. In this paper, we introduce the integration of the Polyak step-size in RL, which automatically a…

    Submitted 11 April, 2024; originally announced April 2024.

  9. arXiv:2404.02378  [pdf, ps, other]

    math.OC cs.LG

    Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

    Authors: Aaron Mishkin, Mert Pilanci, Mark Schmidt

    Abstract: We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialize…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Results extend work from Aaron Mishkin's master's thesis

  10. arXiv:2404.02282  [pdf, other]

    cs.CV

    Smooth Deep Saliency

    Authors: Rudolf Herdt, Maximilian Schmidt, Daniel Otero Baguer, Peter Maaß

    Abstract: In this work, we investigate methods to reduce the noise in deep saliency maps coming from convolutional downsampling, with the purpose of explaining how a deep learning model detects tumors in scanned histological tissue samples. Those methods make the investigated models more interpretable for gradient-based saliency maps, computed in hidden layers. We test our approach on different models train…

    Submitted 4 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  11. arXiv:2403.10424  [pdf, other]

    cs.LG stat.ML

    Structured Evaluation of Synthetic Tabular Data

    Authors: Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

    Abstract: Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of the many metrics. To address this issue, we propose an evaluation framework with a single, mathematica…

    Submitted 29 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  12. arXiv:2403.03344  [pdf, other]

    cs.SE cs.AI

    Learn to Code Sustainably: An Empirical Study on LLM-based Green Code Generation

    Authors: Tina Vartziotis, Ippolyti Dellatolas, George Dasoulas, Maximilian Schmidt, Florian Schneider, Tim Hoffmann, Sotirios Kotsopoulos, Michael Keckeisen

    Abstract: The increasing use of information technology has led to a significant share of energy consumption and carbon emissions from data centers. These contributions are expected to rise with the growing demand for big data analytics, increasing digitization, and the development of large artificial intelligence (AI) models. The need to address the environmental impact of software development has led to in…

    Submitted 5 March, 2024; originally announced March 2024.

  13. arXiv:2403.03028  [pdf, other]

    cs.AI cs.CL

    Word Importance Explains How Prompts Affect Language Model Outputs

    Authors: Stefan Hackmann, Haniyeh Mahmoudian, Mark Steadman, Michael Schmidt

    Abstract: The emergence of large language models (LLMs) has revolutionized numerous applications across industries. However, their "black box" nature often hinders the understanding of how they make specific decisions, raising concerns about their transparency, reliability, and ethical use. This study presents a method to improve the explainability of LLMs by varying individual words in prompts to uncover t…

    Submitted 5 March, 2024; originally announced March 2024.

    ACM Class: I.2.7; I.5.2

  14. arXiv:2402.19449  [pdf, other]

    cs.LG cs.CL math.OC stat.ML

    Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

    Authors: Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti

    Abstract: Adam has been shown to outperform gradient descent in optimizing large language transformers empirically, and by a larger margin than on other tasks, but it is unclear why this happens. We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics. When training with gradient descent, the loss associated with infrequent words decr…

    Submitted 29 February, 2024; originally announced February 2024.

  15. arXiv:2402.05681  [pdf, other]

    cs.DM math.CO

    Toward Grünbaum's Conjecture

    Authors: Christian Ortlieb, Jens M. Schmidt

    Abstract: Given a spanning tree $T$ of a planar graph $G$, the co-tree of $T$ is the spanning tree of the dual graph $G^*$ with edge set $(E(G)-E(T))^*$. Grünbaum conjectured in 1970 that every planar 3-connected graph $G$ contains a spanning tree $T$ such that both $T$ and its co-tree have maximum degree at most 3. While Grünbaum's conjecture remains open, Biedl proved that there is a spanning tree $T$ s…

    Submitted 8 February, 2024; originally announced February 2024.

  16. arXiv:2402.01230  [pdf, other]

    cs.DM math.CO

    Trees and co-trees in planar 3-connected graphs An easier proof via Schnyder woods

    Authors: Christian Ortlieb, Jens M. Schmidt

    Abstract: Let $G$ be a 3-connected planar graph. Define the co-tree of a spanning tree $T$ of $G$ as the graph induced by the dual edges of $E(G)-E(T)$. The well-known cut-cycle duality implies that the co-tree is itself a tree. Let a $k$-tree be a spanning tree with maximum degree $k$. In 1970, Grünbaum conjectured that every 3-connected planar graph contains a 3-tree whose co-tree is also a 3-tree. In 201…

    Submitted 4 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  17. arXiv:2401.16359  [pdf, other]

    cs.DL

    Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus

    Authors: Jack Culbert, Anne Hobert, Najko Jahn, Nick Haupka, Marion Schmidt, Paul Donner, Philipp Mayr

    Abstract: OpenAlex is a promising open source of scholarly metadata, and competitor to the established proprietary sources, the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is a rapidly evolving source and the data contained within is…

    Submitted 26 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 16 pages, 2 figures

  18. arXiv:2312.14996  [pdf]

    cs.LG cs.AI eess.SP

    Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review

    Authors: Michal Bechny, Giuliana Monachino, Luigi Fiorillo, Julia van der Meer, Markus H. Schmidt, Claudio L. A. Bassetti, Athina Tzovara, Francesca D. Faraci

    Abstract: Purpose: This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating an uncertainty estimation approach to efficiently assist clinicians in the manual review of predicted hypnograms, a necessity due to the notable inter-scorer variability inherent in polysomnography (PSG) databases. Our efforts target the extent of review required to achieve predefined agreem…

    Submitted 22 December, 2023; originally announced December 2023.

  19. arXiv:2312.05176  [pdf, other]

    eess.IV cs.CV cs.LG

    MRI Scan Synthesis Methods based on Clustering and Pix2Pix

    Authors: Giulia Baldini, Melanie Schmidt, Charlotte Zäske, Liliana L. Caldeira

    Abstract: We consider a missing data problem in the context of automatic segmentation methods for Magnetic Resonance Imaging (MRI) brain scans. Usually, automated MRI scan segmentation is based on multiple scans (e.g., T1-weighted, T2-weighted, T1CE, FLAIR). However, quite often a scan is blurry, missing or otherwise unusable. We investigate the question whether a missing scan can be synthesized. We exempli…

    Submitted 3 May, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted at AIME 2024

  20. arXiv:2312.04174  [pdf, other]

    stat.ML cs.LG physics.comp-ph

    Coherent energy and force uncertainty in deep learning force fields

    Authors: Peter Bjørn Jørgensen, Jonas Busk, Ole Winther, Mikkel N. Schmidt

    Abstract: In machine learning energy potentials for atomic systems, forces are commonly obtained as the negative derivative of the energy function with respect to atomic positions. To quantify aleatoric uncertainty in the predicted energies, a widely used modeling approach involves predicting both a mean and variance for each energy value. However, this model is not differentiable under the usual white nois…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Presented at Advancing Molecular Machine Learning - Overcoming Limitations [ML4Molecules], ELLIS workshop, VIRTUAL, December 8, 2023, unofficial NeurIPS 2023 side-event

  21. arXiv:2312.01850  [pdf, other]

    cs.CV cs.LG

    Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation

    Authors: Joshua Niemeijer, Manuel Schwonberg, Jan-Aike Termöhlen, Nico M. Schmidt, Tim Fingscheidt

    Abstract: When models, e.g., for semantic segmentation, are applied to images that are vastly different from training data, the performance will drop significantly. Domain adaptation methods try to overcome this issue, but need samples from the target domain. However, this might not always be feasible for various reasons and therefore domain generalization methods are useful as they do not require any targe…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted to WACV 2024

  22. arXiv:2311.01253  [pdf, other]

    cs.RO

    A Concept for User-Centered Delegation of Abstract High-Level Tasks to Cobots for Flexible Lot Sizes

    Authors: Moritz Schmidt, Claudia Meitinger

    Abstract: Technical advances in collaborative robots (cobots) are making them increasingly attractive to companies. However, many human operators are not trained to program complex machines. Instead, humans are used to communicating with each other on a task-based level rather than through specific instructions, as is common with machines. The gap between low-level instruction-based and high-level task-base…

    Submitted 7 June, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  23. arXiv:2309.00834  [pdf, other]

    cs.DS cs.LG

    Approximating Fair $k$-Min-Sum-Radii in $\mathbb{R}^d$

    Authors: Lukas Drexler, Annika Hennes, Abhiruk Lahiri, Melanie Schmidt, Julian Wargalla

    Abstract: The $k$-center problem is a classical clustering problem in which one is asked to find a partitioning of a point set $P$ into $k$ clusters such that the maximum radius of any cluster is minimized. It is well-studied. But what if we add up the radii of the clusters instead of only considering the cluster with maximum radius? This natural variant is called the $k$-min-sum-radii problem. It has becom…

    Submitted 2 September, 2023; originally announced September 2023.

  24. 3D Adversarial Augmentations for Robust Out-of-Domain Predictions

    Authors: Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Nassir Navab, Benjamin Busam, Federico Tombari

    Abstract: Since real-world training datasets cannot properly sample the long tail of the underlying data distribution, corner cases and rare out-of-domain samples can severely hinder the performance of state-of-the-art models. This problem becomes even more severe for dense tasks, such as 3D semantic segmentation, where points of non-standard objects can be confidently associated to the wrong class. In this…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 37 pages, 12 figures

  25. arXiv:2308.10711  [pdf, other]

    cs.LG

    Relax and penalize: a new bilevel approach to mixed-binary hyperparameter optimization

    Authors: Marianna de Santis, Jordan Frecon, Francesco Rinaldi, Saverio Salzo, Martin Schmidt

    Abstract: In recent years, bilevel approaches have become very popular to efficiently estimate high-dimensional hyperparameters of machine learning models. However, to date, binary parameters are handled by continuous relaxation and rounding strategies, which could lead to inconsistent solutions. In this context, we tackle the challenging optimization of mixed-binary hyperparameters by resorting to an equiv…

    Submitted 21 August, 2023; originally announced August 2023.

  26. arXiv:2307.09614  [pdf, other]

    stat.ML cs.LG eess.SP

    Multi-view self-supervised learning for multivariate variable-channel time series

    Authors: Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm

    Abstract: Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with differe…

    Submitted 20 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: To appear in proceedings of 2023 IEEE International workshop on Machine Learning for Signal Processing

  27. arXiv:2307.08415  [pdf, other]

    cs.CV

    Monocular 3D Object Detection with LiDAR Guided Semi Supervised Active Learning

    Authors: Aral Hekimoglu, Michael Schmidt, Alvaro Marcos-Ramiro

    Abstract: We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG), which leverages all modalities of collected data during model development. We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase. During training, we leverage the LiDAR teacher, mon…

    Submitted 17 July, 2023; originally announced July 2023.

  28. arXiv:2307.08414  [pdf, other]

    cs.CV

    Active Learning for Object Detection with Non-Redundant Informative Sampling

    Authors: Aral Hekimoglu, Adrian Brucker, Alper Kagan Kayali, Michael Schmidt, Alvaro Marcos-Ramiro

    Abstract: Curating an informative and representative dataset is essential for enhancing the performance of 2D object detectors. We present a novel active learning sampling strategy that addresses both the informativeness and diversity of the selections. Our strategy integrates uncertainty and diversity-based selection principles into a joint selection objective by measuring the collective information score…

    Submitted 17 July, 2023; originally announced July 2023.

  29. arXiv:2307.01169  [pdf, other]

    math.OC cs.LG stat.ML

    Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

    Authors: Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She

    Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem di…

    Submitted 3 July, 2023; originally announced July 2023.

  30. arXiv:2306.13263  [pdf, other]

    cs.LG cs.CV cs.DC

    Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

    Authors: Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

    Abstract: In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable corresp…

    Submitted 8 April, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: Accepted at TMLR

  31. arXiv:2306.12747  [pdf, other]

    math.OC cs.LG

    Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

    Authors: Leonardo Galli, Holger Rauhut, Mark Schmidt

    Abstract: Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step…

    Submitted 25 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  32. arXiv:2306.12398  [pdf, other]

    cs.CV

    Multi-Task Consistency for Active Learning

    Authors: Aral Hekimoglu, Philipp Friedrich, Walter Zimmer, Michael Schmidt, Alvaro Marcos-Ramiro, Alois C. Knoll

    Abstract: Learning-based solutions for vision tasks require a large amount of labeled training data to ensure their performance and reliability. In single-task vision-based settings, inconsistency-based active learning has proven to be effective in selecting informative samples for annotation. However, there is a lack of research exploiting the inconsistency between multiple tasks in multi-task networks. To…

    Submitted 21 June, 2023; originally announced June 2023.

  33. arXiv:2306.02527  [pdf, other]

    math.OC cs.LG

    Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking

    Authors: Frederik Kunstner, Victor S. Portella, Mark Schmidt, Nick Harvey

    Abstract: The backtracking line-search is an effective technique to automatically tune the step-size in smooth optimization. It guarantees similar performance to using the theoretically optimal step-size. Many approaches have been developed to instead tune per-coordinate step-sizes, also known as diagonal preconditioners, but none of the existing methods are provably competitive with the optimal per-coordin…

    Submitted 4 June, 2023; originally announced June 2023.

  34. arXiv:2305.18666  [pdf, other]

    cs.LG math.OC

    BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization

    Authors: Chen Fan, Gaspard Choné-Ducasse, Mark Schmidt, Christos Thrampoulidis

    Abstract: The popularity of bi-level optimization (BO) in deep learning has spurred a growing interest in studying gradient-based BO algorithms. However, existing algorithms involve two coupled learning rates that can be affected by approximation errors when computing hypergradients, making careful fine-tuning necessary to ensure fast convergence. To alleviate this issue, we investigate the use of recently…

    Submitted 2 November, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  35. arXiv:2305.16325  [pdf, other]

    physics.chem-ph cs.LG

    Graph Neural Network Interatomic Potential Ensembles with Calibrated Aleatoric and Epistemic Uncertainty on Energy and Forces

    Authors: Jonas Busk, Mikkel N. Schmidt, Ole Winther, Tejs Vegge, Peter Bjørn Jørgensen

    Abstract: Inexpensive machine learning potentials are increasingly being used to speed up structural optimization and molecular dynamics simulations of materials by iteratively predicting and applying interatomic forces. In these settings, it is crucial to detect when predictions are unreliable to avoid wrong or misleading results. Here, we present a complete framework for training and recalibrating graph n…

    Submitted 11 September, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

  36. arXiv:2305.02665  [pdf, other]

    cs.CL cs.AI

    Learning Language-Specific Layers for Multilingual Machine Translation

    Authors: Telmo Pessoa Pires, Robin M. Schmidt, Yi-Hsiu Liao, Stephan Peitz

    Abstract: Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice), and reduced error cascades (e.g., avoiding losing gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

  37. arXiv:2304.13960  [pdf, other]

    cs.LG math.OC

    Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be

    Authors: Frederik Kunstner, Jacques Chen, Jonathan Wilder Lavington, Mark Schmidt

    Abstract: The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding of this discrepancy is lagging, preventing the development of significant improvements on either algorithm. Recent work advances the hypothesis that Adam and other heuristics like gradient clipping out…

    Submitted 27 April, 2023; originally announced April 2023.

  38. arXiv:2304.13097  [pdf, other]

    cs.DB

    Bridging graph data models: RDF, RDF-star, and property graphs as directed acyclic graphs

    Authors: Ewout Gelling, George Fletcher, Michael Schmidt

    Abstract: Graph database users today face a choice between two technology stacks: the Resource Description Framework (RDF), on one side, is a data model with built-in semantics that was originally developed by the W3C to exchange interconnected data on the Web; on the other side, Labeled Property Graphs (LPGs) are geared towards efficient graph processing and have strong roots in developer and engineering c…

    Submitted 25 April, 2023; originally announced April 2023.

  39. arXiv:2304.12776  [pdf, other]

    cs.CL cs.AI

    State Spaces Aren't Enough: Machine Translation Needs Attention

    Authors: Ali Vardasbi, Telmo Pessoa Pires, Robin M. Schmidt, Stephan Peitz

    Abstract: Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Tra…

    Submitted 25 April, 2023; originally announced April 2023.

  40. arXiv:2304.12122  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.RO

    Augmentation-based Domain Generalization for Semantic Segmentation

    Authors: Manuel Schwonberg, Fadoua El Bouazati, Nico M. Schmidt, Hanno Gottschalk

    Abstract: Unsupervised Domain Adaptation (UDA) and domain generalization (DG) are two research areas that aim to tackle the lack of generalization of Deep Neural Networks (DNNs) towards unseen domains. While UDA methods have access to unlabeled target images, domain generalization does not involve any target data and only learns generalized features from a source domain. Image-style randomization or augment…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted at Intelligent Vehicles Symposium 2023 (IV 2023) Autonomy@Scale Workshop

  41. arXiv:2304.11960  [pdf, other]

    cs.CR cs.CL cs.LG

    ThreatCrawl: A BERT-based Focused Crawler for the Cybersecurity Domain

    Authors: Philipp Kuehn, Mike Schmidt, Markus Bayer, Christian Reuter

    Abstract: Publicly available information contains valuable information for Cyber Threat Intelligence (CTI). This can be used to prevent attacks that have already taken place on other systems. Ideally, only the initial attack succeeds and all subsequent ones are detected and stopped. But while there are different standards to exchange this information, a lot of it is shared in articles or blog posts in non-s…

    Submitted 26 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 11 pages, 9 figures, 5 tables

  42. arXiv:2304.11928  [pdf, other]

    cs.CV cs.AI

    Survey on Unsupervised Domain Adaptation for Semantic Segmentation for Visual Perception in Automated Driving

    Authors: Manuel Schwonberg, Joshua Niemeijer, Jan-Aike Termöhlen, Jörg P. Schäfer, Nico M. Schmidt, Hanno Gottschalk, Tim Fingscheidt

    Abstract: Deep neural networks (DNNs) have proven their capabilities in many areas in the past years, such as robotics, or automated driving, enabling technological breakthroughs. DNNs play a significant role in environment perception for the challenging application of automated driving and are employed for tasks such as detection, semantic segmentation, and sensor fusion. Despite this progress and tremendo…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: submitted to IEEE Access; Project Website: https://uda-survey.github.io/survey/

  43. arXiv:2304.00459  [pdf, other]

    cs.LG math.OC

    Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

    Authors: Chen Fan, Christos Thrampoulidis, Mark Schmidt

    Abstract: Modern machine learning models are often over-parameterized and as a result they can interpolate the training data. Under such a scenario, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient descent (SGD) known as random reshuffling (RR). Unlike SGD that samples data with replacement at every iteration, RR chooses a random permutation of data at the…

    Submitted 2 April, 2023; originally announced April 2023.

  44. arXiv:2302.09738  [pdf, other]

    stat.ML cs.LG

    Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning

    Authors: Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations. Here, we simplify such difficulties for a class of sparse or structured symmetric positive-definite matrices with the affine-invariant metric. We do so by proposing a generalized version of the Riem…

    Submitted 16 March, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: A long version of the ICML 2023 paper. Updated the main text to emphasize challenges of using existing Riemannian methods to estimate sparse and structured SPD matrices

  45. arXiv:2302.08802  [pdf, other]

    cs.CV

    Risk Classification of Brain Metastases via Radiomics, Delta-Radiomics and Machine Learning

    Authors: Philipp Sommer, Yixing Huang, Christoph Bert, Andreas Maier, Manuel Schmidt, Arnd Dörfler, Rainer Fietkau, Florian Putz

    Abstract: Stereotactic radiotherapy (SRT) is one of the most important treatments for patients with brain metastases (BM). Conventionally, following SRT patients are monitored by serial imaging and receive salvage treatments in case of significant tumor growth. We hypothesized that using radiomics and machine learning (ML), metastases at high risk for subsequent progression could be identified during follow-…

    Submitted 17 February, 2023; originally announced February 2023.

  46. arXiv:2302.02607  [pdf, other]

    cs.LG math.OC

    Target-based Surrogates for Stochastic Optimization

    Authors: Jonathan Wilder Lavington, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Nicolas Le Roux

    Abstract: We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a target space (e.g. the logits output by a linear model for classificatio…

    Submitted 8 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  47. arXiv:2302.02181  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Model Stitching and Visualization How GAN Generators can Invert Networks in Real-Time

    Authors: Rudolf Herdt, Maximilian Schmidt, Daniel Otero Baguer, Jean Le'Clerc Arrastia, Peter Maass

    Abstract: In this work, we propose a fast and accurate method to reconstruct activations of classification and semantic segmentation networks by stitching them with a GAN generator utilizing a 1x1 convolution. We test our approach on images of animals from the AFHQ wild dataset, ImageNet1K, and real-world digital pathology scans of stained tissue samples. Our results show comparable performance to establish…

    Submitted 20 February, 2024; v1 submitted 4 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of Machine Learning Research 222 (2023) 422-437

  48. arXiv:2212.02191  [pdf, other]

    cs.LG cs.DC

    On the effectiveness of partial variance reduction in federated learning with heterogeneous data

    Authors: Bo Li, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

    Abstract: Data heterogeneity across clients is a key challenge in federated learning. Prior works address this by either aligning client and server models or using control variates to correct client model drift. Although these methods achieve fast convergence in convex or simple non-convex problems, the performance in over-parameterized models such as deep neural networks is lacking. In this paper, we first…

    Submitted 9 June, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023

  49. arXiv:2212.00266  [pdf, other]

    cs.CV

    Multi-view Tracking, Re-ID, and Social Network Analysis of a Flock of Visually Similar Birds in an Outdoor Aviary

    Authors: Shiting Xiao, Yufu Wang, Ammon Perkes, Bernd Pfrommer, Marc Schmidt, Kostas Daniilidis, Marc Badger

    Abstract: The ability to capture detailed interactions among individuals in a social group is foundational to our study of animal behavior and neuroscience. Recent advances in deep learning and computer vision are driving rapid progress in methods that can record the actions and interactions of multiple individuals simultaneously. Many social species, such as birds, however, live deeply embedded in a three-…

    Submitted 30 November, 2022; originally announced December 2022.

  50. arXiv:2211.14880  [pdf, other]

    cs.CL cs.AI

    Improving Low-Resource Question Answering using Active Learning in Multiple Stages

    Authors: Maximilian Schmidt, Andrea Bartezzaghi, Jasmina Bogojeska, A. Cristiano I. Malossi, Thang Vu

    Abstract: Neural approaches have become very popular in the domain of Question Answering, however they require a large amount of annotated data. Furthermore, they often yield very good performance but only in the domain they were trained on. In this work we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low resource sett…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: 16 pages, 8 figures

    ACM Class: I.2.7