Search | arXiv e-print repository

SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -… ▽ More With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, and (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for the current LLMs, where the best LLM only achieves 59.4\% exam grade on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading. △ Less

Submitted 14 June, 2024; originally announced June 2024.

ACM Class: I.2.7

arXiv:2402.08273 [pdf, other]

Regional Adaptive Metropolis Light Transport

Authors: Hisanari Otsu, Killian Herveau, Johannes Hanika, Derek Nowrouzezahrai, Carsten Dachsbacher

Abstract: The design of the proposal distributions, and most notably the kernel parameters, are crucial for the performance of Markov chain Monte Carlo (MCMC) rendering. A poor selection of parameters can increase the correlation of the Markov chain and result in bad rendering performance. We approach this problem by a novel path perturbation strategy for online-learning of state-dependent kernel parameters… ▽ More The design of the proposal distributions, and most notably the kernel parameters, are crucial for the performance of Markov chain Monte Carlo (MCMC) rendering. A poor selection of parameters can increase the correlation of the Markov chain and result in bad rendering performance. We approach this problem by a novel path perturbation strategy for online-learning of state-dependent kernel parameters. We base our approach on the theoretical framework of regional adaptive MCMC which enables the adaptation of parameters depending on the region of the state space which contains the current sample, and on information collected from previous samples. For this, we define a partitioning of the path space on a low-dimensional canonical space to capture the characteristics of paths, with a focus on path segments closer to the sensor. Fast convergence is achieved by adaptive refinement of the partitions. Exemplarily, we present two novel regional adaptive path perturbation techniques akin to lens and multi-chain perturbations. Our approach can easily be used on top of existing path space MLT methods to improve rendering efficiency, while being agnostic to the initial choice of kernel parameters. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: 14 pages, 12 figures

arXiv:2308.16619 [pdf, other]

doi 10.1109/TVCG.2023.3326573

Fast Compressed Segmentation Volumes for Scientific Visualization

Authors: Max Piochowiak, Carsten Dachsbacher

Abstract: Voxel-based segmentation volumes often store a large number of labels and voxels, and the resulting amount of data can make storage, transfer, and interactive visualization difficult. We present a lossless compression technique which addresses these challenges. It processes individual small bricks of a segmentation volume and compactly encodes the labelled regions and their boundaries by an iterat… ▽ More Voxel-based segmentation volumes often store a large number of labels and voxels, and the resulting amount of data can make storage, transfer, and interactive visualization difficult. We present a lossless compression technique which addresses these challenges. It processes individual small bricks of a segmentation volume and compactly encodes the labelled regions and their boundaries by an iterative refinement scheme. The result for each brick is a list of labels, and a sequence of operations to reconstruct the brick which is further compressed using rANS-entropy coding. As the relative frequencies of operations are very similar across bricks, the entropy coding can use global frequency tables for an entire data set which enables efficient and effective parallel (de)compression. Our technique achieves high throughput (up to gigabytes per second both for compression and decompression) and strong compression ratios of about 1% to 3% of the original data set size while being applicable to GPU-based rendering. We evaluate our method for various data sets from different fields and demonstrate GPU-based volume visualization with on-the-fly decompression, level-of-detail rendering (with optional on-demand streaming of detail coefficients to the GPU), and a caching strategy for decompressed bricks for further performance improvement. △ Less

Submitted 16 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: IEEE VIS 2023 - The supplemental materials can be found at https://osf.io/8nbdk/ | ACM 2012 CCS - Human-centered computing, Visualization, Visualization application domains, Scientific visualization; Information systems, Data management systems, Data structures, Data layout, Data compression

ACM Class: E.4; I.3

arXiv:2009.04531 [pdf, other]

Uncertain Transport in Unsteady Flows

Authors: Tobias Rapp, Carsten Dachsbacher

Abstract: We study uncertainty in the dynamics of time-dependent flows by identifying barriers and enhancers to stochastic transport. This topological segmentation is closely related to the theory of Lagrangian coherent structures and is based on a recently introduced quantity, the diffusion barrier strength (DBS). The DBS is defined similar to the finite-time Lyapunov exponent (FTLE), but incorporates diff… ▽ More We study uncertainty in the dynamics of time-dependent flows by identifying barriers and enhancers to stochastic transport. This topological segmentation is closely related to the theory of Lagrangian coherent structures and is based on a recently introduced quantity, the diffusion barrier strength (DBS). The DBS is defined similar to the finite-time Lyapunov exponent (FTLE), but incorporates diffusion during flow integration. Height ridges of the DBS indicate stochastic transport barriers and enhancers, i.e. material surfaces that are minimally or maximally diffusive. To apply these concepts to real-world data, we represent uncertainty in a flow by a stochastic differential equation that consists of a deterministic and a stochastic component modeled by a Gaussian. With this formulation we identify barriers and enhancers to stochastic transport, without performing expensive Monte Carlo simulation and with a computational complexity comparable to FTLE. In addition, we propose a complementary visualization to convey the absolute scale of uncertainties in the Lagrangian frame of reference. This enables us to study uncertainty in real-world datasets, for example due to small deviations, data reduction, or estimated from multiple ensemble runs. △ Less

Submitted 30 August, 2020; originally announced September 2020.

Comments: Accepted to IEEE VIS 2020 short papers

arXiv:2008.09544 [pdf, other]

doi 10.1109/TVCG.2020.3030379

Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries

Authors: Tobias Rapp, Christoph Peters, Carsten Dachsbacher

Abstract: Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily struc… ▽ More Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. Our evaluation on several large, real-world datasets demonstrates the scaling of our approach. △ Less

Submitted 15 October, 2020; v1 submitted 21 August, 2020; originally announced August 2020.

Comments: Accepted to IEEE SciVis 2020

arXiv:1907.05073 [pdf, other]

doi 10.1109/TVCG.2019.2934335

Void-and-Cluster Sampling of Large Scattered Data and Trajectories

Authors: Tobias Rapp, Christoph Peters, Carsten Dachsbacher

Abstract: We propose a data reduction technique for scattered data based on statistical sampling. Our void-and-cluster sampling technique finds a representative subset that is optimally distributed in the spatial domain with respect to the blue noise property. In addition, it can adapt to a given density function, which we use to sample regions of high complexity in the multivariate value domain more densel… ▽ More We propose a data reduction technique for scattered data based on statistical sampling. Our void-and-cluster sampling technique finds a representative subset that is optimally distributed in the spatial domain with respect to the blue noise property. In addition, it can adapt to a given density function, which we use to sample regions of high complexity in the multivariate value domain more densely. Moreover, our sampling technique implicitly defines an ordering on the samples that enables progressive data loading and a continuous level-of-detail representation. We extend our technique to sample time-dependent trajectories, for example pathlines in a time interval, using an efficient and iterative approach. Furthermore, we introduce a local and continuous error measure to quantify how well a set of samples represents the original dataset. We apply this error measure during sampling to guide the number of samples that are taken. Finally, we use this error measure and other quantities to evaluate the quality, performance, and scalability of our algorithm. △ Less

Submitted 7 October, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

Comments: To appear in IEEE Transactions on Visualization and Computer Graphics as a special issue from the proceedings of VIS 2019

arXiv:1610.05141 [pdf, other]

doi 10.1145/3157734

Efficient Random Sampling -- Parallel, Vectorized, Cache-Efficient, and Online

Authors: Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher

Abstract: We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time $\mathcal{O}(n/p+\log p)$ on $p$ processors, i.e., scales to massively parallel machines even for moderate v… ▽ More We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time $\mathcal{O}(n/p+\log p)$ on $p$ processors, i.e., scales to massively parallel machines even for moderate values of $n$. The amount of communication between the processors is very small (at most $\mathcal{O}(\log p)$) and independent of the sample size. We also discuss modifications needed for load balancing, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs. △ Less

Submitted 15 November, 2019; v1 submitted 17 October, 2016; originally announced October 2016.

ACM Class: G.4; G.3; G.2

Journal ref: ACM Transactions on Mathematical Software (TOMS), Volume 44, Issue 3 (April 2018), pages 29:1-29:14

Showing 1–7 of 7 results for author: Dachsbacher, C