-
Evaluating Text-to-Image Synthesis: Survey and Taxonomy of Image Quality Metrics
Authors:
Sebastian Hartwig,
Dominik Engel,
Leon Sick,
Hannah Kniesel,
Tristan Payer,
Poonam Poonam,
Michael Glöckler,
Alex Bäuerle,
Timo Ropinski
Abstract:
Recent advances in text-to-image synthesis enabled through a combination of language and vision foundation models have led to a proliferation of the tools available and an increased attention to the field. When conducting text-to-image synthesis, a central goal is to ensure that the content between text and image is aligned. As such, there exist numerous evaluation metrics that aim to mimic human judgement. However, it is often unclear which metric to use for evaluating text-to-image synthesis systems as their evaluation is highly nuanced. In this work, we provide a comprehensive overview of existing text-to-image evaluation metrics. Based on our findings, we propose a new taxonomy for categorizing these metrics. Our taxonomy is grounded in the assumption that there are two main quality criteria, namely compositionality and generality, which ideally map to human preferences. Ultimately, we derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.
Submitted 15 April, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
On locally factorial Fano fourfolds of Picard number two
Authors:
Andreas Bäuerle,
Christian Mauz
Abstract:
We classify the locally factorial Fano fourfolds of Picard number two with a hypersurface Cox ring that admit an effective action of a three-dimensional torus.
Submitted 11 February, 2024;
originally announced February 2024.
-
An In-depth Look at Gemini's Language Abilities
Authors:
Syeda Nahida Akter,
Zichun Yu,
Aashiq Muhamed,
Tianyue Ou,
Alex Bäuerle,
Ángel Alexander Cabrera,
Krish Dholakia,
Chenyan Xiong,
Graham Neubig
Abstract:
The recently released Google Gemini class of models are the first to comprehensively report results that rival the OpenAI GPT series across a wide variety of tasks. In this paper, we do an in-depth exploration of Gemini's language abilities, making two contributions. First, we provide a third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models with reproducible code and fully transparent results. Second, we take a closer look at the results, identifying areas where one of the two model classes excels. We perform this analysis over 10 datasets testing a variety of language abilities, including reasoning, answering knowledge-based questions, solving math problems, translating between languages, generating code, and acting as instruction-following agents. From this analysis, we find that Gemini Pro achieves accuracy that is close but slightly inferior to the corresponding GPT 3.5 Turbo on all tasks that we benchmarked. We further provide explanations for some of this under-performance, including failures in mathematical reasoning with many digits, sensitivity to multiple-choice answer ordering, aggressive content filtering, and others. We also identify areas where Gemini demonstrates comparably high performance, including generation into non-English languages, and handling longer and more complex reasoning chains. Code and data for reproduction can be found at https://github.com/neulab/gemini-benchmark
Submitted 24 December, 2023; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Sharp volume and multiplicity bounds for Fano simplices
Authors:
Andreas Bäuerle
Abstract:
We present sharp upper bounds on the volume, Mahler volume and multiplicity for Fano simplices depending on the dimension and Gorenstein index. These bounds rely on the interplay between lattice simplices and unit fraction partitions. Moreover, we present an efficient procedure for explicitly classifying Fano simplices of any dimension and Gorenstein index and we carry out the classification up to dimension four for various Gorenstein indices.
Submitted 24 August, 2023;
originally announced August 2023.
-
VegaProf: Profiling Vega Visualizations
Authors:
Junran Yang,
Alex Bäuerle,
Dominik Moritz,
Çağatay Demiralp
Abstract:
Domain-specific languages (DSLs) for visualization aim to facilitate visualization creation by providing abstractions that offload implementation and execution details from users to the system layer. Therefore, DSLs often execute user-defined specifications by transforming them into intermediate representations (IRs) in successive lowering operations. However, DSL-specified visualizations can be difficult to profile and, hence, optimize due to the layered abstractions. To better understand visualization profiling workflows and challenges, we conduct formative interviews with visualization engineers who use Vega in production. Vega is a popular visualization DSL that transforms specifications into dataflow graphs, which are then executed to render visualization primitives. Our formative interviews reveal that current developer tools are ill-suited for visualization profiling since they are disconnected from the semantics of Vega's specification and its IRs at runtime. To address this gap, we introduce VegaProf, the first performance profiler for Vega visualizations. VegaProf instruments the Vega library by associating a declarative specification with its compilation and execution. Integrated into a Vega code playground, VegaProf coordinates visual performance inspection at three abstraction levels: function, dataflow graph, and visualization specification. We evaluate VegaProf through use cases and feedback from visualization engineers as well as original developers of the Vega library. Our results suggest that VegaProf makes visualization profiling more tractable and actionable by enabling users to interactively probe time performance across layered abstractions of Vega. Furthermore, we distill recommendations from our findings and advocate for co-designing visualization DSLs together with their introspection tools.
Submitted 18 September, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Sharp degree bounds for fake weighted projective spaces
Authors:
Andreas Bäuerle
Abstract:
We give sharp upper bounds on the anticanonical degree of fake weighted projective spaces, only depending on the dimension and the Gorenstein index.
Submitted 4 July, 2022;
originally announced July 2022.
-
Neural Activation Patterns (NAPs): Visual Explainability of Learned Concepts
Authors:
Alex Bäuerle,
Daniel Jönsson,
Timo Ropinski
Abstract:
A key to deciphering the inner workings of neural networks is understanding what a model has learned. Promising methods for discovering learned features are based on analyzing activation values, whereby current techniques focus on analyzing high activation values to reveal interesting features on a neuron level. However, analyzing high activation values limits layer-level concept discovery. We present a method that instead takes into account the entire activation distribution. By extracting similar activation profiles within the high-dimensional activation space of a neural network layer, we find groups of inputs that are treated similarly. These input groups represent neural activation patterns (NAPs) and can be used to visualize and interpret learned layer concepts. We release a framework with which NAPs can be extracted from pre-trained models and provide a visual introspection tool that can be used to analyze NAPs. We tested our method with a variety of networks and show how it complements existing methods for analyzing neural network activation values.
Submitted 20 June, 2022;
originally announced June 2022.
-
Symphony: Composing Interactive Interfaces for Machine Learning
Authors:
Alex Bäuerle,
Ángel Alexander Cabrera,
Fred Hohman,
Megan Maher,
David Koski,
Xavier Suau,
Titus Barik,
Dominik Moritz
Abstract:
Interfaces for machine learning (ML), information and visualizations about models or data, can help practitioners build robust and responsible ML systems. Despite their benefits, recent studies of ML teams and our interviews with practitioners (n=9) showed that ML interfaces have limited adoption in practice. While existing ML interfaces are effective for specific tasks, they are not designed to be reused, explored, and shared by multiple stakeholders in cross-functional teams. To enable analysis and communication between different ML practitioners, we designed and implemented Symphony, a framework for composing interactive ML interfaces with task-specific, data-driven components that can be used across platforms such as computational notebooks and web dashboards. We developed Symphony through participatory design sessions with 10 teams (n=31), and discuss our findings from deploying Symphony to 3 production ML projects at Apple. Symphony helped ML practitioners discover previously unknown issues like data duplicates and blind spots in models while enabling them to share insights with other stakeholders.
Submitted 17 February, 2022;
originally announced February 2022.
-
Visual Identification of Problematic Bias in Large Label Spaces
Authors:
Alex Bäuerle,
Aybuke Gul Turker,
Ken Burke,
Osman Aka,
Timo Ropinski,
Christina Greer,
Mani Varadarajan
Abstract:
While the need for well-trained, fair ML systems is increasing ever more, measuring fairness for modern models and datasets is becoming increasingly difficult as they grow at an unprecedented pace. One key challenge in scaling common fairness metrics to such models and datasets is the requirement of exhaustive ground truth labeling, which cannot always be done. Indeed, this often rules out the application of traditional analysis metrics and systems. At the same time, ML-fairness assessments cannot be made algorithmically, as fairness is a highly subjective matter. Thus, domain experts need to be able to extract and reason about bias throughout models and datasets to make informed decisions. While visual analysis tools are of great help when investigating potential bias in DL models, none of the existing approaches have been designed for the specific tasks and challenges that arise in large label spaces. Addressing the lack of visualization work in this area, we propose guidelines for designing visualizations for such large label spaces, considering both technical and ethical issues. Our proposed visualization approach can be integrated into classical model and data pipelines, and we provide an implementation of our techniques open-sourced as a TensorBoard plug-in. With our approach, different models and datasets for large label spaces can be systematically and visually analyzed and compared to make informed fairness assessments tackling problematic bias.
Submitted 17 January, 2022;
originally announced January 2022.
-
On Gorenstein Fano Threefolds with an Action of a Two-Dimensional Torus
Authors:
Andreas Bäuerle,
Jürgen Hausen
Abstract:
We classify the non-toric, $\mathbb Q$-factorial, Gorenstein, log terminal Fano threefolds of Picard number one that admit an effective action of a two-dimensional algebraic torus.
Submitted 16 November, 2022; v1 submitted 6 August, 2021;
originally announced August 2021.
-
Measuring Model Biases in the Absence of Ground Truth
Authors:
Osman Aka,
Ken Burke,
Alex Bäuerle,
Christina Greer,
Margaret Mitchell
Abstract:
The measurement of bias in machine learning often focuses on model performance across identity subgroups (such as man and woman) with respect to groundtruth labels. However, these methods do not directly measure the associations that a model may have learned, for example between labels and identity subgroups. Further, measuring a model's bias requires a fully annotated evaluation dataset which may not be easily available in practice. We present an elegant mathematical solution that tackles both issues simultaneously, using image classification as a working example. By treating a classification model's predictions for a given image as a set of labels analogous to a bag of words, we rank the biases that a model has learned with respect to different identity labels. We use (man, woman) as a concrete example of an identity label set (although this set need not be binary), and present rankings for the labels that are most biased towards one identity or the other. We demonstrate how the statistical properties of different association metrics can lead to different rankings of the most "gender biased" labels, and conclude that normalized pointwise mutual information (nPMI) is most useful in practice. Finally, we announce an open-sourced nPMI visualization tool using TensorBoard.
Submitted 6 June, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
exploRNN: Understanding Recurrent Neural Networks through Visual Exploration
Authors:
Alex Bäuerle,
Patrick Albus,
Raphael Störk,
Tina Seufert,
Timo Ropinski
Abstract:
Due to the success of deep learning (DL) and its growing job market, students and researchers from many areas are interested in learning about DL technologies. Visualization has proven to be of great help during this learning process. While most current educational visualizations are targeted towards one specific architecture or use case, recurrent neural networks (RNNs), which are capable of processing sequential data, are not covered yet. This is despite the fact that tasks on sequential data, such as text and function analysis, are at the forefront of DL research. Therefore, we propose exploRNN, the first interactively explorable educational visualization for RNNs. On the basis of making learning easier and more fun, we define educational objectives targeted towards understanding RNNs. We use these objectives to form guidelines for the visual design process. By means of exploRNN, which is accessible online, we provide an overview of the training process of RNNs at a coarse level, while also allowing a detailed inspection of the data flow within LSTM cells. In an empirical study, we assessed 37 subjects in a between-subjects design to investigate the learning outcomes and cognitive load of exploRNN compared to a classic text-based learning environment. While learners in the text group are ahead in superficial knowledge acquisition, exploRNN is particularly helpful for deeper understanding of the learning content. In addition, the complex content in exploRNN is perceived as significantly easier and causes less extraneous load than in the text group. The study shows that for difficult learning material such as recurrent networks, where deep understanding is important, interactive visualizations such as exploRNN can be helpful.
Submitted 22 June, 2022; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Net2Vis -- A Visual Grammar for Automatically Generating Publication-Tailored CNN Architecture Visualizations
Authors:
Alex Bäuerle,
Christian van Onzenoodt,
Timo Ropinski
Abstract:
To convey neural network architectures in publications, appropriate visualizations are of great importance. While most current deep learning papers contain such visualizations, these are usually handcrafted just before publication, which results in a lack of a common visual grammar, significant time investment, errors, and ambiguities. Current automatic network visualization tools focus on debugging the network itself and are not ideal for generating publication visualizations. Therefore, we present an approach to automate this process by translating network architectures specified in Keras into visualizations that can directly be embedded into any publication. To do so, we propose a visual grammar for convolutional neural networks (CNNs), which has been derived from an analysis of such figures extracted from all ICCV and CVPR papers published between 2013 and 2019. The proposed grammar incorporates visual encoding, network layout, layer aggregation, and legend generation. We have further realized our approach in an online system available to the community, which we have evaluated through expert feedback, and a quantitative study. It not only reduces the time needed to generate network visualizations for publications, but also enables a unified and unambiguous visualization design.
Submitted 10 February, 2021; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Classifier-Guided Visual Correction of Noisy Labels for Image Classification Tasks
Authors:
Alex Bäuerle,
Heiko Neumann,
Timo Ropinski
Abstract:
Training data plays an essential role in modern applications of machine learning. However, gathering labeled training data is time-consuming. Therefore, labeling is often outsourced to less experienced users, or completely automated. This can introduce errors, which compromise valuable training data, and lead to suboptimal training results. We thus propose a novel approach that uses the power of pretrained classifiers to visually guide users to noisy labels, and let them interactively check error candidates, to iteratively improve the training data set. To systematically investigate training data, we propose a categorization of labeling errors into three different types, based on an analysis of potential pitfalls in label acquisition processes. For each of these types, we present approaches to detect, reason about, and resolve error candidates, as we propose measures and visual guidance techniques to support machine learning users. Our approach has been used to spot errors in well-known machine learning benchmark data sets, and we tested its usability during a user evaluation. While initially developed for images, the techniques presented in this paper are independent of the classification algorithm, and can also be extended to many other types of training data.
Submitted 6 April, 2020; v1 submitted 9 August, 2018;
originally announced August 2018.