Search | arXiv e-print repository

arXiv:2404.12549 [pdf, other]

"If the Machine Is As Good As Me, Then What Use Am I?" -- How the Use of ChatGPT Changes Young Professionals' Perception of Productivity and Accomplishment

Authors: Charlotte Kobiella, Yarhy Said Flores López, Fiona Draxler, Albrecht Schmidt

Abstract: Large language models (LLMs) like ChatGPT have been widely adopted in work contexts. We explore the impact of ChatGPT on young professionals' perception of productivity and sense of accomplishment. We collected LLMs' main use cases in knowledge work through a preliminary study, which served as the basis for a two-week diary study with 21 young professionals reflecting on their ChatGPT use. Findings indicate that ChatGPT enhanced some participants' perceptions of productivity and accomplishment by enabling greater creative output and satisfaction from efficient tool utilization. Others experienced decreased perceived productivity and accomplishment, driven by a diminished sense of ownership, perceived lack of challenge, and mediocre results. We found that the suitability of task delegation to ChatGPT varies strongly depending on the task nature. It's especially suitable for comprehending broad subject domains, generating creative solutions, and uncovering new information. It's less suitable for research tasks due to hallucinations, which necessitate extensive validation.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2402.06578 [pdf, other]

On the Universality of Coupling-based Normalizing Flows

Authors: Felix Draxler, Stefan Wahl, Christoph Schnörr, Ullrich Köthe

Abstract: We present a novel theoretical framework for understanding the expressive power of normalizing flows. Despite their prevalence in scientific applications, a comprehensive understanding of flows remains elusive due to their restricted architectures. Existing theorems fall short as they require the use of arbitrarily ill-conditioned neural networks, limiting practical applicability. We propose a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP. In addition, we show that volume-preserving normalizing flows are not universal, what distribution they learn instead, and how to fix their expressivity. Our results support the general wisdom that affine and related couplings are expressive and in general outperform volume-preserving flows, bridging a gap between empirical results and theoretical understanding.

Submitted 5 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

arXiv:2312.09852 [pdf, other]

Learning Distributions on Manifolds with Free-form Flows

Authors: Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Ullrich Köthe

Abstract: Many real world data, particularly in the natural sciences and computer vision, lie on known Riemannian manifolds such as spheres, tori or the group of rotation matrices. The predominant approaches to learning a distribution on such a manifold require solving a differential equation in order to sample from the model and evaluate densities. The resulting sampling times are slowed down by a high number of function evaluations. In this work, we propose an alternative approach which only requires a single function evaluation followed by a projection to the manifold. Training is achieved by an adaptation of the recently proposed free-form flow framework to Riemannian manifolds. The central idea is to estimate the gradient of the negative log-likelihood via a trace evaluated in the tangent space. We evaluate our method on various manifolds, and find significantly faster inference at competitive performance compared to previous work. We make our code public at https://github.com/vislearn/FFF.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Preprint, under review

arXiv:2310.16624 [pdf, other]

Free-form Flows: Make Any Architecture a Normalizing Flow

Authors: Felix Draxler, Peter Sorrenson, Lea Zimmermann, Armand Rousselot, Ullrich Köthe

Abstract: Normalizing Flows are generative models that directly maximize the likelihood. Previously, the design of normalizing flows was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows placing the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.

Submitted 24 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Camera-ready version: accepted at AISTATS 2024

arXiv:2310.06556 [pdf, other]

Gender, Age, and Technology Education Influence the Adoption and Appropriation of LLMs

Authors: Fiona Draxler, Daniel Buschek, Mikke Tavast, Perttu Hämäläinen, Albrecht Schmidt, Juhi Kulshrestha, Robin Welsch

Abstract: Large Language Models (LLMs) such as ChatGPT have become increasingly integrated into critical activities of daily life, raising concerns about equitable access and utilization across diverse demographics. This study investigates the usage of LLMs among 1,500 representative US citizens. Remarkably, 42% of participants reported utilizing an LLM. Our findings reveal a gender gap in LLM technology adoption (more male users than female users) with complex interaction patterns regarding age. Technology-related education eliminates the gender gap in our sample. Moreover, expert users are more likely than novices to list professional tasks as typical application scenarios, suggesting discrepancies in effective usage at the workplace. These results underscore the importance of providing education in artificial intelligence in our technology-driven society to promote equitable access to and benefits from LLMs. We urge for both international replication beyond the US and longitudinal observation of adoption.

Submitted 10 October, 2023; originally announced October 2023.

ACM Class: H.1.2; I.2.7

arXiv:2307.05870 [pdf, other]

Useful but Distracting: Keyword Highlights and Time-Synchronization in Captions for Language Learning

Authors: Fiona Draxler, Henrike Weingärtner, Maximiliane Windl, Albrecht Schmidt, Lewis L. Chuang

Abstract: Captions provide language learners with a scaffold for comprehension and vocabulary acquisition. Past work has proposed several enhancements such as keyword highlights for increased learning gains. However, little is known about learners' experience with enhanced captions, although this is critical for adoption in everyday life. We conducted a survey and focus group to elicit learner preferences and requirements and implemented a processing pipeline for enhanced captions with keyword highlights, time-synchronized keyword highlights, and keyword captions. A subsequent online study (n = 49) showed that time-synchronized keyword highlights were the preferred design for learning but were perceived as too distracting to replace standard captions in everyday viewing scenarios. We conclude that keyword highlights and time-synchronization are suitable for integrating learning into an entertaining everyday-life activity, but the design should be optimized to provide a more seamless experience.

Submitted 11 July, 2023; originally announced July 2023.

ACM Class: H.5.2; K.3.1

arXiv:2306.13520 [pdf, other]

On the Convergence Rate of Gaussianization with Random Rotations

Authors: Felix Draxler, Lars Kühmichel, Armand Rousselot, Jens Müller, Christoph Schnörr, Ullrich Köthe

Abstract: Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.01843 [pdf, other]

Lifting Architectural Constraints of Injective Flows

Authors: Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann, Ullrich Köthe

Abstract: Normalizing Flows explicitly maximize a full-dimensional likelihood on the training data. However, real data is typically only supported on a lower-dimensional manifold leading the model to expend significant compute on modeling noise. Injective Flows fix this by jointly learning a manifold and the distribution on it. So far, they have been limited by restrictive architectures and/or high computational cost. We lift both constraints by a new efficient estimator for the maximum likelihood loss, compatible with free-form bottleneck architectures. We further show that naively learning both the data manifold and the distribution on it can lead to divergent solutions, and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model.

Submitted 27 June, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Camera-ready version: accepted to ICLR 2024

arXiv:2304.14905 [pdf, other]

Bose Einstein condensate as nonlinear block of a Machine Learning pipeline

Authors: Maurus Hans, Elinor Kath, Marius Sparn, Nikolas Liebster, Felix Draxler, Christoph Schnörr, Helmut Strobel, Markus K. Oberthaler

Abstract: Physical systems can be used as an information processing substrate and with that extend traditional computing architectures. For such an application the experimental platform must guarantee pristine control of the initial state, the temporal evolution and readout. All these ingredients are provided by modern experimental realizations of atomic Bose Einstein condensates. By embedding the nonlinear evolution of a quantum gas in a Machine Learning pipeline, one can represent nonlinear functions while only linear operations on classical computing of the pipeline are necessary. We demonstrate successful regression and interpolation of a nonlinear function using a quasi one-dimensional cloud of potassium atoms and characterize the performance of our system.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2303.09989 [pdf, other]

Finding Competence Regions in Domain Generalization

Authors: Jens Müller, Stefan T. Radev, Robert Schmier, Felix Draxler, Carsten Rother, Ullrich Köthe

Abstract: We investigate a "learning to reject" framework to address the problem of silent failures in Domain Generalization (DG), where the test distribution differs from the training distribution. Assuming a mild distribution shift, we wish to accept out-of-distribution (OOD) data from a new domain whenever a model's estimated competence foresees trustworthy responses, instead of rejecting OOD data outright. Trustworthiness is then predicted via a proxy incompetence score that is tightly linked to the performance of a classifier. We present a comprehensive experimental evaluation of existing proxy scores as incompetence scores for classification and highlight the resulting trade-offs between rejection rate and accuracy gain. For comparability with prior work, we focus on standard DG benchmarks and consider the effect of measuring incompetence via different learned representations in a closed versus an open world setting. Our results suggest that increasing incompetence scores are indeed predictive of reduced accuracy, leading to significant improvements of the average accuracy below a suitable incompetence threshold. However, the scores are not yet good enough to allow for a favorable accuracy/rejection trade-off in all tested domains. Surprisingly, our results also indicate that classifiers optimized for DG robustness do not outperform a naive Empirical Risk Minimization (ERM) baseline in the competence region, that is, where test samples elicit low incompetence scores.

Submitted 21 June, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: The paper has been published at TMLR (see https://openreview.net/forum?id=TSy0vuwQFN)

Journal ref: Transactions on Machine Learning Research (06/2023)

arXiv:2303.03283 [pdf, other]

doi 10.1145/3637875

The AI Ghostwriter Effect: When Users Do Not Perceive Ownership of AI-Generated Text But Self-Declare as Authors

Authors: Fiona Draxler, Anna Werner, Florian Lehmann, Matthias Hoppe, Albrecht Schmidt, Daniel Buschek, Robin Welsch

Abstract: Human-AI interaction in text production increases complexity in authorship. In two empirical studies (n1 = 30 & n2 = 96), we investigate authorship and ownership in human-AI collaboration for personalized language generation. We show an AI Ghostwriter Effect: Users do not consider themselves the owners and authors of AI-generated text but refrain from publicly declaring AI authorship. Personalization of AI-generated texts did not impact the AI Ghostwriter Effect, and higher levels of participants' influence on texts increased their sense of ownership. Participants were more likely to attribute ownership to supposedly human ghostwriters than AI ghostwriters, resulting in a higher ownership-authorship discrepancy for human ghostwriters. Rationalizations for authorship in AI ghostwriters and human ghostwriters were similar. We discuss how our findings relate to psychological ownership and human-AI interaction to lay the foundations for adapting authorship frameworks and user interfaces in AI in text-generation tasks.

Submitted 7 November, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Pre-print; currently under review

arXiv:2210.14032 [pdf, other]

Whitening Convergence Rate of Coupling-based Normalizing Flows

Authors: Felix Draxler, Christoph Schnörr, Ullrich Köthe

Abstract: Coupling-based normalizing flows (e.g. RealNVP) are a popular family of normalizing flow architectures that work surprisingly well in practice. This calls for theoretical understanding. Existing work shows that such flows weakly converge to arbitrary data distributions. However, they make no statement about the stricter convergence criterion used in practice, the maximum likelihood loss. For the first time, we make a quantitative statement about this kind of convergence: We prove that all coupling-based normalizing flows perform whitening of the data distribution (i.e. diagonalize the covariance matrix) and derive corresponding convergence bounds that show a linear convergence rate in the depth of the flow. Numerical experiments demonstrate the implications of our theory and point at open questions.

Submitted 25 October, 2022; originally announced October 2022.

Comments: Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:1806.08734 [pdf, other]

On the Spectral Bias of Neural Networks

Authors: Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuations without affecting their global behavior. Intuitively, this property is in line with the observation that over-parameterized networks find simple patterns that generalize across data samples. We also investigate how the shape of the data manifold affects expressivity by showing evidence that learning high frequencies gets \emph{easier} with increasing manifold complexity, and present a theoretical understanding of this behavior. Finally, we study the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.

Submitted 31 May, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

Comments: 23 pages

Journal ref: ICML 2019

arXiv:1803.00885 [pdf, other]

Essentially No Barriers in Neural Network Energy Landscape

Authors: Felix Draxler, Kambis Veschgini, Manfred Salmhofer, Fred A. Hamprecht

Abstract: Training neural networks involves finding minima of a high-dimensional non-convex loss function. Knowledge of the structure of this energy landscape is sparse. Relaxing from linear interpolations, we construct continuous paths between minima of recent neural network architectures on CIFAR10 and CIFAR100. Surprisingly, the paths are essentially flat in both the training and test landscapes. This implies that neural networks have enough capacity for structural changes, or that these changes are small between minima. Also, each minimum has at least one vanishing Hessian eigenvalue in addition to those resulting from trivial invariance.

Submitted 22 February, 2019; v1 submitted 2 March, 2018; originally announced March 2018.

Comments: In Proceedings of 35th International Conference on Machine Learning (ICML 2018)

Journal ref: Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1308-1317, 2018

arXiv:1606.06620 [pdf, ps, other]

Equiangular Lines and Spherical Codes in Euclidean Space

Authors: Igor Balla, Felix Dräxler, Peter Keevash, Benny Sudakov

Abstract: A family of lines through the origin in Euclidean space is called equiangular if any pair of lines defines the same angle. The problem of estimating the maximum cardinality of such a family in $\mathbb{R}^n$ was extensively studied for the last 70 years. Motivated by a question of Lemmens and Seidel from 1973, in this paper we prove that for every fixed angle $θ$ and sufficiently large $n$ there are at most $2n-2$ lines in $\mathbb{R}^n$ with common angle $θ$. Moreover, this is achievable only for $θ= \arccos(1/3)$. We also show that for any set of $k$ fixed angles, one can find at most $O(n^k)$ lines in $\mathbb{R}^n$ having these angles. This bound, conjectured by Bukh, substantially improves the estimate of Delsarte, Goethals and Seidel from 1975. Various extensions of these results to the more general setting of spherical codes will be discussed as well.

Submitted 28 June, 2017; v1 submitted 21 June, 2016; originally announced June 2016.

Comments: 24 pages, 0 figures

Showing 1–15 of 15 results for author: Draxler, F