-
Privacy-Preserving Sharing of Data Analytics Runtime Metrics for Performance Modeling
Authors:
Jonathan Will,
Dominik Scheinert,
Jan Bode,
Cedric Kring,
Seraphin Zunzer,
Lauritz Thamsen
Abstract:
Performance modeling for large-scale data analytics workloads can improve the efficiency of cluster resource allocations and job scheduling. However, the performance of these workloads is influenced by numerous factors, such as job inputs and the assigned cluster resources. As a result, performance models require significant amounts of training data. This data can be obtained by exchanging runtime metrics between collaborating organizations. Yet, not all organizations may be inclined to publicly disclose such metadata.
We present a privacy-preserving approach for sharing runtime metrics based on differential privacy and data synthesis. Our evaluation on performance data from 736 Spark job executions indicates that fully anonymized training data largely maintains performance prediction accuracy, particularly when there is minimal original data available. With 30 or fewer available original data samples, the use of synthetic training data resulted only in a one percent reduction in performance model accuracy on average.
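The differential-privacy idea mentioned in the abstract above can be illustrated with the standard Laplace mechanism. This is only a minimal sketch, not the authors' method: the function names, the clipping bounds, and the assumption that the number of records is public are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean_runtime(runtimes, lower, upper, epsilon, rng):
    """Release an epsilon-differentially-private mean of job runtimes.

    Assumptions (illustrative only): runtimes are clipped to [lower, upper] and the
    number of records n is public, so replacing one record changes the mean by at
    most (upper - lower) / n.
    """
    if not runtimes:
        raise ValueError("need at least one runtime")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("epsilon must be positive and upper must exceed lower")
    clipped = [min(max(r, lower), upper) for r in runtimes]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    true_mean = sum(clipped) / n
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    runtimes_s = [312.0, 287.5, 451.2, 198.9, 640.3]  # made-up example values
    print(private_mean_runtime(runtimes_s, 0.0, 900.0, epsilon=1.0, rng=rng))
```

A smaller `epsilon` adds more noise and gives stronger privacy; the abstract's data-synthesis step is not covered by this sketch.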
Submitted 13 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications
Authors:
Dominik Scheinert,
Soeren Becker,
Jonathan Will,
Luis Englaender,
Lauritz Thamsen
Abstract:
Performance modeling can help to improve the resource efficiency of clusters and distributed dataflow applications, yet the available modeling data is often limited. Collaborative approaches to performance modeling, characterized by the sharing of performance data or models, have been shown to improve resource efficiency, but there has been little focus on actual data sharing strategies and implementation in production environments. This missing building block holds back the realization of proposed collaborative solutions.
In this paper, we envision, design, and evaluate a peer-to-peer performance data sharing approach for collaborative performance modeling of distributed dataflow applications. Our proposed data distribution layer enables access to performance data in a decentralized manner, thereby facilitating collaborative modeling approaches and allowing for improved prediction capabilities and hence increased resource efficiency. In our evaluation, we assess our approach with regard to deployment, data replication, and data validation, through experiments with a prototype implementation and simulation, demonstrating feasibility and allowing discussion of potential limitations and next steps.
Submitted 23 January, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Understanding and Visualizing Droplet Distributions in Simulations of Shallow Clouds
Authors:
Justus C. Will,
Andrea M. Jenney,
Kara D. Lamb,
Michael S. Pritchard,
Colleen Kaul,
Po-Lun Ma,
Kyle Pressel,
Jacob Shpund,
Marcus van Lier-Walqui,
Stephan Mandt
Abstract:
Thorough analysis of local droplet-level interactions is crucial to better understand the microphysical processes in clouds and their effect on the global climate. High-accuracy simulations of relevant droplet size distributions from Large Eddy Simulations (LES) of bin microphysics challenge current analysis techniques due to their high dimensionality involving three spatial dimensions, time, and a continuous range of droplet sizes. Utilizing the compact latent representations from Variational Autoencoders (VAEs), we produce novel and intuitive visualizations for the organization of droplet sizes and their evolution over time beyond what is possible with clustering techniques. This greatly improves interpretation and allows us to examine aerosol-cloud interactions by contrasting simulations with different aerosol concentrations. We find that the evolution of the droplet spectrum is similar across aerosol levels but occurs at different paces. This similarity suggests that precipitation initiation processes are alike despite variations in onset times.
Submitted 31 October, 2023;
originally announced October 2023.
-
Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics
Authors:
Dominik Scheinert,
Philipp Wiesner,
Thorsten Wittkopp,
Lauritz Thamsen,
Jonathan Will,
Odej Kao
Abstract:
Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due to the cold-start problem, this often leads to lengthy and costly profiling phases. However, big data analytics jobs across users can share many common properties: they often operate on similar infrastructure, using similar algorithms implemented in similar frameworks. The potential in sharing aggregated profiling runs to collaboratively address the cold start problem is largely unexplored.
We present Karasu, an approach to more efficient resource configuration profiling that promotes data sharing among users working with similar infrastructures, frameworks, algorithms, or datasets. Karasu trains lightweight performance models using aggregated runtime information of collaborators and combines them into an ensemble method to exploit inherent knowledge of the configuration search space. Moreover, Karasu allows the optimization of multiple objectives simultaneously. Our evaluation is based on performance data from diverse workload executions in a public cloud environment. We show that Karasu is able to significantly boost existing methods in terms of performance, search time, and cost, even when few comparable profiling runs are available that share only partial common characteristics with the target job.
Submitted 23 November, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Rising and settling 2D cylinders with centre-of-mass offset
Authors:
Martin P. A. Assen,
Jelle B. Will,
Chong Shen Ng,
Detlef Lohse,
Roberto Verzicco,
Dominik Krug
Abstract:
Rotational effects are commonly neglected when considering the dynamics of freely rising or settling isotropic particles. Here, we demonstrate that particle rotations play an important role for rising as well as for settling cylinders in situations when mass eccentricity, and thereby a new pendulum timescale, is introduced to the system. We employ two-dimensional simulations to study the motion of a single cylinder in a quiescent unbounded incompressible Newtonian fluid. This allows us to vary the Galileo number, density ratio, relative moment of inertia, and Centre-Of-Mass offset (COM) systematically and beyond what is feasible experimentally. For certain buoyant density ratios, the particle dynamics exhibit a resonance mode, during which the coupling via the Magnus lift force causes a positive feedback between translational and rotational motions. This mode results in vastly different trajectories with significantly larger rotational and translational amplitudes and an increase of the drag coefficient easily exceeding a factor two. We propose a simple model that captures how the occurrence of the COM offset induced resonance regime varies, depending on the other input parameters, specifically the density ratio, the Galileo number, and the relative moment of inertia. Remarkably, depending on the input parameters, resonance can be observed for centre-of-mass offsets as small as a few percent of the particle diameter, showing that the particle dynamics can be highly sensitive to this parameter.
Submitted 7 January, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Compressed Sensing of Field-resolved Molecular Fingerprints Beyond the Nyquist Frequency
Authors:
Kilian Scheffter,
Jonathan Will,
Claudius Riek,
Herve Jousselin,
Sebastien Coudreau,
Nicolas Forget,
Hanieh Fattahi
Abstract:
Ultrashort time-domain spectroscopy and field-resolved spectroscopy of molecular fingerprints are gold standards for detecting samples' constituents and internal dynamics. However, they are hindered by the Nyquist criterion, leading to prolonged data acquisition, processing times, and sizable data volumes. In this work, we present the first experimental demonstration of compressed sensing on field-resolved molecular fingerprinting by employing random scanning. Our measurements enable pinpointing the primary absorption peaks of atmospheric water vapor in response to terahertz light transients while sampling beyond the Nyquist limit. By drastically undersampling the electric field of the molecular response at a Nyquist frequency of 0.8 THz, we could successfully identify water absorption peaks up to 2.5 THz with a mean squared error of 12 * 10^-4. To our knowledge, this is the first experimental demonstration of time-domain compressed sensing, paving the path towards real-time field-resolved fingerprinting and acceleration of advanced spectroscopic techniques.
Submitted 4 April, 2024; v1 submitted 21 July, 2023;
originally announced July 2023.
-
ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation
Authors:
Sungduk Yu,
Walter Hannah,
Liran Peng,
Jerry Lin,
Mohamed Aziz Bhouri,
Ritwik Gupta,
Björn Lütjens,
Justus Christopher Will,
Gunnar Behrens,
Julius Busecke,
Nora Loose,
Charles I Stern,
Tom Beucler,
Bryce Harrop,
Benjamin R Hillman,
Andrea Jenney,
Savannah Ferretti,
Nana Liu,
Anima Anandkumar,
Noah D Brenowitz,
Veronika Eyring,
Nicholas Geneva,
Pierre Gentine,
Stephan Mandt,
Jaideep Pathak
, et al. (31 additional authors not shown)
Abstract:
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state.
The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
Submitted 6 February, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?
Authors:
Jonathan Will,
Lauritz Thamsen,
Dominik Scheinert,
Odej Kao
Abstract:
Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration.
In this paper, we analyze the challenge of efficient resource allocation for distributed data processing, focusing on memory. We emphasize that in-memory processing with in-memory data processing frameworks can undermine resource efficiency. Based on the findings of our trace data analysis, we compile requirements towards an automated solution for efficient cluster resource allocation.
Submitted 7 June, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics
Authors:
Dominik Scheinert,
Soeren Becker,
Jonathan Bader,
Lauritz Thamsen,
Jonathan Will,
Odej Kao
Abstract:
Choosing a good resource configuration for big data analytics applications can be challenging, especially in cloud environments. Automated approaches are desirable as poor decisions can reduce performance and raise costs. The majority of existing automated approaches either build performance models from previous workload executions or conduct iterative resource configuration profiling until a near-optimal solution has been found. In doing so, they only obtain an implicit understanding of the underlying infrastructure, which is difficult to transfer to alternative infrastructures and, thus, profiling and modeling insights are not sustained beyond very specific situations.
We present Perona, a novel approach to robust infrastructure fingerprinting for usage in the context of big data analytics. Perona employs common sets and configurations of benchmarking tools for target resources, so that resulting benchmark metrics are directly comparable and ranking is enabled. Insignificant benchmark metrics are discarded by learning a low-dimensional representation of the input metric vector, and previous benchmark executions are taken into consideration for context-awareness as well, allowing the detection of resource degradation. We evaluate our approach both on data gathered from our own experiments as well as within related works for resource configuration optimization, demonstrating that Perona captures the characteristics from benchmark runs in a compact manner and produces representations that can be used directly.
Submitted 30 January, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing
Authors:
Jonathan Will,
Lauritz Thamsen,
Jonathan Bader,
Dominik Scheinert,
Odej Kao
Abstract:
Selecting appropriate computational resources for data processing jobs on large clusters is difficult, even for expert users like data engineers. Inadequate choices can result in vastly increased costs, without significantly improving performance. One crucial aspect of selecting an efficient resource configuration is avoiding memory bottlenecks. By knowing the required memory of a job in advance, the search space for an optimal resource configuration can be greatly reduced.
Therefore, we present Ruya, a method for memory-aware optimization of data processing cluster configurations based on iteratively exploring a narrowed-down search space. First, we perform job profiling runs with small samples of the dataset on just a single machine to model the job's memory usage patterns. Second, we prioritize cluster configurations with a suitable amount of total memory and within this reduced search space, we iteratively search for the best cluster configuration with Bayesian optimization. This search process stops once it converges on a configuration that is believed to be optimal for the given job. In our evaluation on a dataset with 1031 Spark and Hadoop jobs, we see a reduction of search iterations to find an optimal configuration by around half, compared to the baseline.
Submitted 3 February, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Reshi: Recommending Resources for Scientific Workflow Tasks on Heterogeneous Infrastructures
Authors:
Jonathan Bader,
Fabian Lehmann,
Alexander Groth,
Lauritz Thamsen,
Dominik Scheinert,
Jonathan Will,
Ulf Leser,
Odej Kao
Abstract:
Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have highly heterogeneous resource requirements and (b) in many infrastructures, compute nodes offer highly heterogeneous resources. In consequence, predictions of the runtime of a given task on a given node, as required by many scheduling algorithms, are often rather imprecise, which can lead to sub-optimal scheduling decisions.
We propose Reshi, a method for recommending task-node assignments during workflow execution that can cope with heterogeneous tasks and heterogeneous nodes. Reshi approaches the problem as a regression task, where task-node pairs are modeled as feature vectors over the results of dedicated micro benchmarks and past task executions. Based on these features, Reshi trains a regression tree model to rank and recommend nodes for each ready-to-run task, which can be used as input to a scheduler. For our evaluation, we benchmarked 27 AWS machine types using three representative workflows. We compare Reshi's recommendations with three state-of-the-art schedulers. Our evaluation shows that Reshi outperforms HEFT by a mean makespan reduction of 7.18% and 18.01% assuming a mean task runtime prediction error of 15%.
Submitted 17 October, 2022; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Get Your Memory Right: The Crispy Resource Allocation Assistant for Large-Scale Data Processing
Authors:
Jonathan Will,
Lauritz Thamsen,
Jonathan Bader,
Dominik Scheinert,
Odej Kao
Abstract:
Distributed dataflow systems like Apache Spark and Apache Hadoop enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs -- that neither lead to bottlenecks nor to low resource utilization -- is often challenging, even for expert users such as data engineers. Further, existing automated approaches to resource selection rely on the assumption that a job is recurring to learn from previous runs or to warrant the cost of full test runs to learn from. However, this assumption often does not hold since many jobs are too unique.
Therefore, we present Crispy, a method for optimizing data processing cluster configurations based on job profiling runs with small samples of the dataset on just a single machine. Crispy attempts to extrapolate the memory usage for the full dataset to then choose a cluster configuration with enough total memory. In our evaluation on a dataset with 1031 Spark and Hadoop jobs, we see a reduction of job execution costs by 56% compared to the baseline, while on average spending less than ten minutes on profiling runs per job on a consumer-grade laptop.
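The extrapolation step described in the abstract above can be pictured roughly as follows. This is a hedged sketch only: the linear memory model, the "cheapest feasible configuration" rule, and all names and numbers (`fit_line`, `pick_cluster`, the configuration dictionaries) are assumptions for illustration and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sample sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_memory_gb(sample_sizes_gb, peak_memory_gb, full_size_gb):
    """Predict peak memory for the full dataset from small profiling runs.

    Assumes memory grows linearly with input size, which is an illustrative
    simplification.
    """
    slope, intercept = fit_line(sample_sizes_gb, peak_memory_gb)
    return slope * full_size_gb + intercept


def pick_cluster(configs, required_memory_gb):
    """Return the cheapest config whose total memory covers the requirement, or None."""
    feasible = [
        c for c in configs
        if c["nodes"] * c["memory_per_node_gb"] >= required_memory_gb
    ]
    return min(
        feasible,
        key=lambda c: c["nodes"] * c["price_per_node_hour"],
        default=None,
    )


if __name__ == "__main__":
    # Made-up profiling results from single-machine runs on small samples.
    needed = extrapolate_memory_gb([1.0, 2.0, 4.0], [1.5, 2.5, 4.5], full_size_gb=100.0)
    configs = [  # hypothetical cluster configurations
        {"name": "small", "nodes": 4, "memory_per_node_gb": 16, "price_per_node_hour": 0.2},
        {"name": "medium", "nodes": 8, "memory_per_node_gb": 16, "price_per_node_hour": 0.2},
        {"name": "large", "nodes": 4, "memory_per_node_gb": 64, "price_per_node_hour": 0.5},
    ]
    print(needed, pick_cluster(configs, needed))
```

Returning `None` when nothing fits keeps the "no feasible configuration" case explicit instead of silently choosing an undersized cluster.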
Submitted 10 January, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview
Authors:
Lauritz Thamsen,
Dominik Scheinert,
Jonathan Will,
Jonathan Bader,
Odej Kao
Abstract:
Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate performance accurately, users frequently overprovision resources for their jobs, leading to low resource utilization and high costs. In this paper, we present major building blocks towards a collaborative approach for optimization of data processing cluster configurations based on runtime data and performance models. We believe that runtime data can be shared and used for performance models across different execution contexts, significantly reducing the reliance on the recurrence of individual processing jobs or, else, dedicated job profiling. For this, we describe how the similarity of processing jobs and cluster infrastructures can be employed to combine suitable data points from local and global job executions into accurate performance models. Furthermore, we outline approaches to performance prediction via more context-aware and reusable models. Finally, we lay out how metrics from previous executions can be combined with runtime monitoring to effectively re-configure models and clusters dynamically.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters
Authors:
Jonathan Bader,
Fabian Lehmann,
Lauritz Thamsen,
Jonathan Will,
Ulf Leser,
Odej Kao
Abstract:
Many scientific workflow scheduling algorithms need to be informed about task runtimes a priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem becomes aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefinitely and both workloads and infrastructure change. In contrast, online methods, which predict task runtimes on specific nodes while the workflow is running, have to cope with the lack of example runs, especially during the start-up.
In this paper, we present Lotaru, a novel online method for locally estimating task runtimes in scientific workflows on heterogeneous clusters. Lotaru first profiles all nodes of a cluster with a set of short-running and uniform microbenchmarks. Next, it runs the workflow to be scheduled on the user's local machine with drastically reduced data to determine important task characteristics. Based on these measurements, Lotaru learns a Bayesian linear regression model to predict a task's runtime given the input size and finally adjusts the predicted runtime specifically for each task-node pair in the cluster based on the micro-benchmark results. Due to its Bayesian approach, Lotaru can also compute robust uncertainty estimates and provides them as an input for advanced scheduling methods.
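The two ingredients named in the abstract above, a Bayesian linear regression over input size and a per-node adjustment from microbenchmark results, might be sketched as below. This is not Lotaru's implementation: the Gaussian prior, the fixed noise precision `beta`, the "higher score means faster node" convention, and the simple ratio-based adjustment are all assumptions chosen to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass
class RuntimeModel:
    mean: tuple   # posterior mean (intercept, slope)
    cov: tuple    # posterior covariance entries (s00, s01, s11)
    beta: float   # assumed (known) noise precision


def fit_runtime_model(input_sizes, runtimes, alpha=1e-6, beta=1.0):
    """Conjugate Bayesian linear regression for runtime = w0 + w1 * input_size.

    Prior: w ~ N(0, I / alpha); likelihood noise precision beta is assumed known.
    """
    n = len(input_sizes)
    sx = sum(input_sizes)
    sxx = sum(x * x for x in input_sizes)
    sy = sum(runtimes)
    sxy = sum(x * y for x, y in zip(input_sizes, runtimes))
    # Posterior precision A = alpha * I + beta * X^T X, inverted by hand (2x2).
    a00, a01, a11 = alpha + beta * n, beta * sx, alpha + beta * sxx
    det = a00 * a11 - a01 * a01
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    # Posterior mean m = beta * S * X^T y.
    m0 = beta * (s00 * sy + s01 * sxy)
    m1 = beta * (s01 * sy + s11 * sxy)
    return RuntimeModel((m0, m1), (s00, s01, s11), beta)


def predict_runtime(model, input_size):
    """Return the predictive mean and standard deviation for one input size."""
    m0, m1 = model.mean
    s00, s01, s11 = model.cov
    mean = m0 + m1 * input_size
    var = 1.0 / model.beta + s00 + 2.0 * s01 * input_size + s11 * input_size ** 2
    return mean, math.sqrt(var)


def adjust_for_node(mean, std, local_score, node_score):
    """Scale a local prediction to a target node (hypothetical ratio rule).

    Scores are assumed to be throughput-like: a higher score means a faster node.
    """
    factor = local_score / node_score
    return mean * factor, std * factor


if __name__ == "__main__":
    # Made-up local profiling runs: input size in GB, runtime in seconds.
    model = fit_runtime_model([1.0, 2.0, 3.0, 4.0], [5.0, 8.0, 11.0, 14.0])
    mean, std = predict_runtime(model, 10.0)
    print(adjust_for_node(mean, std, local_score=100.0, node_score=200.0))
```

The standard deviation returned alongside the mean is what allows an uncertainty estimate to be handed to a scheduler, as the abstract suggests.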
Our evaluation with five real-world scientific workflows and different datasets shows that Lotaru significantly outperforms the baselines in terms of prediction errors for homogeneous and heterogeneous clusters.
Submitted 23 May, 2022;
originally announced May 2022.
-
On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds
Authors:
Dominik Scheinert,
Alireza Alamgiralem,
Jonathan Bader,
Jonathan Will,
Thorsten Wittkopp,
Lauritz Thamsen
Abstract:
With the growing amount of data, data processing workloads and the management of their resource usage become increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users progressively execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, performance data to train such methods is often lacking and costly to collect.
In this paper, we propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations. We evaluate our prototype implementation for mining workload execution graphs on a publicly available trace dataset and demonstrate the predictive value of workload clusters determined through traces only.
Submitted 16 January, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud
Authors:
Jonathan Will,
Onur Arslan,
Jonathan Bader,
Dominik Scheinert,
Lauritz Thamsen
Abstract:
Distributed dataflow systems like Apache Flink and Apache Spark simplify processing large amounts of data on clusters in a data-parallel manner. However, choosing suitable cluster resources for distributed dataflow jobs in both type and number is difficult, especially for users who do not have access to previous performance metrics. One approach to overcoming this issue is to have users share runtime metrics to train context-aware performance models that help find a suitable configuration for the job at hand. A problem when sharing runtime data instead of trained models or model parameters is that the data size can grow substantially over time.
This paper examines several clustering techniques to minimize training data size while keeping the associated performance models accurate. Our results indicate that efficiency gains in data transfer, storage, and model training can be achieved through training data reduction. In the evaluation of our solution on a dataset of runtime data from 930 unique distributed dataflow jobs, we observed that, on average, a 75% data reduction only increases prediction errors by one percentage point.
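One simple way to picture clustering-based training data reduction is to cluster the runtime samples and keep a single representative per cluster. The sketch below uses plain k-means with a deterministic initialization; the abstract does not say which clustering techniques the paper evaluates, so the algorithm, the function names, and the sample data here are illustrative assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at evenly spaced input points."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def reduce_training_data(points, k):
    """Keep, for each centroid, the original sample closest to it."""
    centroids = kmeans(points, k)
    kept = {min(points, key=lambda p: math.dist(p, c)) for c in centroids}
    return sorted(kept)


if __name__ == "__main__":
    # Made-up samples: (input size in GB, runtime in minutes). Real features
    # would normally be normalized before computing distances.
    samples = [
        (1.0, 10.0), (1.1, 10.2), (0.9, 9.8), (1.0, 10.1),
        (8.0, 80.0), (8.2, 81.0), (7.9, 79.5), (8.1, 80.5),
    ]
    print(reduce_training_data(samples, k=2))  # 8 samples reduced to 2
```

Keeping original samples rather than the centroids themselves means the reduced set still consists of real measurements.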
Submitted 11 March, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters
Authors:
Jonathan Bader,
Lauritz Thamsen,
Svetlana Kulagina,
Jonathan Will,
Henning Meyerhenke,
Odej Kao
Abstract:
Scientific workflow management systems like Nextflow support large-scale data analysis by abstracting away the details of scientific workflows. In these systems, workflows consist of several abstract tasks, of which instances are run in parallel and transform input partitions into output partitions. Resource managers like Kubernetes execute such workflow tasks on cluster infrastructures. However, these resource managers only consider the number of CPUs and the amount of available memory when assigning tasks to resources; they do not consider hardware differences beyond these numbers, while computational speed and memory access rates can differ significantly.
We propose Tarema, a system for allocating task instances to heterogeneous cluster resources during the execution of scalable scientific workflows. First, Tarema profiles the available infrastructure with a set of benchmark programs and groups cluster nodes with similar performance. Second, Tarema uses online monitoring data of tasks, assigning labels to tasks depending on their resource usage. Third, Tarema uses the node groups and task labels to dynamically assign task instances evenly to resources based on resource demand. Our evaluation of a prototype implementation for Kubernetes, using five real-world Nextflow workflows from the popular nf-core framework and two 15-node clusters consisting of different virtual machines, shows a mean reduction of isolated job runtimes by 19.8% compared to popular schedulers in widely-used resource managers and 4.54% compared to the heuristic SJFN, while providing a better cluster usage. Moreover, executing two long-running workflows in parallel and on restricted resources shows that Tarema is able to reduce the runtimes even more while providing a fair cluster usage.
Submitted 19 January, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation
Authors:
Dominik Scheinert,
Houkun Zhu,
Lauritz Thamsen,
Morgan K. Geldenhuys,
Jonathan Will,
Alexander Acker,
Odej Kao
Abstract:
Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performance of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance.
This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs and, thus, allows for deriving effective rescaling decisions. For this, Enel incorporates descriptive properties that capture the respective execution context, considers statistics from individual dataflow tasks, and propagates predictions through the job graph to eventually find an optimized new scale-out. Our evaluation of Enel with four iterative Spark jobs shows that our approach is able to identify effective rescaling actions, reacting for instance to node failures, and can be reused across different execution contexts.
Submitted 26 January, 2022; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Dependable IoT Data Stream Processing for Monitoring and Control of Urban Infrastructures
Authors:
Morgan K. Geldenhuys,
Jonathan Will,
Benjamin J. J. Pfister,
Martin Haug,
Alexander Scharmann,
Lauritz Thamsen
Abstract:
The Internet of Things describes a network of physical devices interacting and producing vast streams of sensor data. At present there are a number of general challenges which exist while developing solutions for use cases involving the monitoring and control of urban infrastructures. These include the need for a dependable method for extracting value from these high volume streams of time sensitive data which is adaptive to changing workloads. Low-latency access to the current state for live monitoring is a necessity as well as the ability to perform queries on historical data. At the same time, many design choices need to be made and the number of possible technology options available further adds to the complexity.
In this paper we present a dependable IoT data processing platform for the monitoring and control of urban infrastructures. We define requirements in terms of dependability and then select a number of mature open-source technologies to match these requirements. We examine the disparate parts necessary for delivering a holistic overall architecture and describe the dataflows between each of these components. We likewise present generalizable methods for the enrichment and analysis of sensor data applicable across various application areas. We demonstrate the usefulness of this approach by providing an exemplary prototype platform executing on top of Kubernetes and evaluate the effectiveness of jobs processing sensor data in this environment.
Submitted 24 August, 2021;
originally announced August 2021.
-
Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts
Authors:
Dominik Scheinert,
Lauritz Thamsen,
Houkun Zhu,
Jonathan Will,
Alexander Acker,
Thorsten Wittkopp,
Odej Kao
Abstract:
Distributed dataflow systems enable the use of clusters for scalable data analytics. However, selecting appropriate cluster resources for a processing job is often not straightforward. Performance models trained on historical executions of a concrete job are helpful in such situations, yet they are usually bound to a specific job execution context (e.g. node type, software versions, job parameters) due to the few considered input parameters. Even in case of slight context changes, such supportive models need to be retrained and cannot benefit from historical execution data from related contexts.
This paper presents Bellamy, a novel modeling approach that combines scale-outs, dataset sizes, and runtimes with additional descriptive properties of a dataflow job. It is thereby able to capture the context of a job execution. Moreover, Bellamy is realizing a two-step modeling approach. First, a general model is trained on all the available data for a specific scalable analytics algorithm, hereby incorporating data from different contexts. Subsequently, the general model is optimized for the specific situation at hand, based on the available data for the concrete context. We evaluate our approach on two publicly available datasets consisting of execution data from various dataflow jobs carried out in different environments, showing that Bellamy outperforms state-of-the-art methods.
Submitted 17 October, 2021; v1 submitted 29 July, 2021;
originally announced July 2021.
-
C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds
Authors:
Jonathan Will,
Lauritz Thamsen,
Dominik Scheinert,
Jonathan Bader,
Odej Kao
Abstract:
Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud resources for dataflow jobs - that neither lead to bottlenecks nor to low resource utilization - is often challenging, even for expert users such as data engineers.
We present C3O, a collaborative system for optimizing data processing cluster configurations in public clouds based on shared historical runtime data. The shared data is utilized for predicting the runtimes of data processing jobs on different possible cluster configurations, using specialized regression models. These models take the diverse execution contexts of different users into account and exhibit mean absolute errors below 3% in our experimental evaluation with 930 unique Spark jobs.
Submitted 1 December, 2021; v1 submitted 28 July, 2021;
originally announced July 2021.
-
Strong alignment of prolate ellipsoids in Taylor-Couette flow
Authors:
Martin P. A. Assen,
Chong Shen Ng,
Jelle B. Will,
Richard J. A. M. Stevens,
Detlef Lohse,
Roberto Verzicco
Abstract:
We report on the mobility and orientation of finite-size, neutrally buoyant prolate ellipsoids (of aspect ratio $Λ=4$) in Taylor-Couette flow, using interface resolved numerical simulations. The setup consists of a particle-laden flow in between a rotating inner and a stationary outer cylinder. We simulate two particle sizes $\ell/d=0.1$ and $\ell/d=0.2$, $\ell$ denoting the particle major axis and $d$ the gap-width between the cylinders. The volume fractions are $0.01\%$ and $0.07\%$, respectively. The particles, which are initially randomly positioned, ultimately display characteristic spatial distributions which can be categorised into four modes. Modes $(i)$ to $(iii)$ are observed in the Taylor vortex flow regime, while mode ($iv$) encompasses both the wavy vortex, and turbulent Taylor vortex flow regimes. Mode $(i)$ corresponds to stable orbits away from the vortex cores. Remarkably, in a narrow $\textit{Ta}$ range, particles get trapped in the Taylor vortex cores (mode ($ii$)). Mode $(iii)$ is the transition when both modes $(i)$ and $(ii)$ are observed. For mode $(iv)$, particles distribute throughout the domain due to flow instabilities. All four modes show characteristic orientational statistics. We find the particle clustering for mode ($ii$) to be size-dependent, with two main observations. Firstly, particle agglomeration at the core is much higher for $\ell/d=0.2$ compared to $\ell/d=0.1$. Secondly, the $\textit{Ta}$ range for which clustering is observed depends on the particle size. For this mode $(ii)$ we observe particles to align strongly with the local cylinder tangent. The most pronounced particle alignment is observed for $\ell/d=0.2$ around $\textit{Ta}=4.2\times10^5$. This observation is found to closely correspond to a minimum of axial vorticity at the Taylor vortex core ($\textit{Ta}=6\times10^5$) and we explain why.
Submitted 10 June, 2021;
originally announced June 2021.
-
Dynamics of freely rising spheres: the effect of moment of inertia
Authors:
Jelle B. Will,
Dominik Krug
Abstract:
The goal of this study is to elucidate the effect the particle moment of inertia (MOI) has on the dynamics of spherical particles rising in a quiescent and turbulent fluid. To this end, we performed experiments with varying density ratios $Γ$, the ratio of the particle density and fluid density, ranging from $0.37$ up to $0.97$. At each $Γ$ the MOI was varied by shifting mass between the shell and the center of the particle to vary $I^*$ (the particle MOI normalised by the MOI of particle with the same weight and a uniform mass distribution). Helical paths are observed for low, and `3D chaotic' trajectories at higher values of $Γ$. The present data suggests no influence of $I^*$ on the critical value for this transition $0.42<Γ_{\textrm{crit}}<0.52$. For the `3D chaotic' rise mode we identify trends of decreasing particle drag coefficient ($C_d$) and amplitude of oscillation with increasing $I^*$. Due to limited data it remains unclear if a similar dependence exists in the helical regime as well. Path oscillations remain finite for all cases studied and no `rectilinear' mode is encountered, which may be the consequence of allowing for a longer transient distance in the present compared to earlier work. Rotational dynamics did not vary significantly between quiescent and turbulent surroundings, indicating that these are predominantly wake driven.
Submitted 3 June, 2021; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Long-term LHC Discovery Reach for Compressed Higgsino-like Models using VBF Processes
Authors:
Cardona Natalia,
Flórez Andrés,
Gurrola Alfredo,
Johns Will,
Sheldon Paul,
Tao cheng
Abstract:
The identity of Dark Matter (DM) is one of the most active topics in particle physics today. Supersymmetry (SUSY) is an extension of the standard model (SM) that could describe the particle nature of DM in the form of the lightest neutralino in R-parity conserving models. We focus on SUSY models that solve the hierarchy problem with small fine tuning, and where the lightest SUSY particles ($\tildeχ_{1}^{0}$, $\tildeχ_{1}^{\pm}$, $\tildeχ_{2}^{0}$) are a triplet of higgsino-like states, such that the mass difference $Δm(\tildeχ^{0}_{2},\tildeχ^{0}_{1})$ is 2-50 GeV. We perform a feasibility study to assess the long-term discovery potential for these compressed SUSY models with higgsino-like states, using vector boson fusion (VBF) processes in the context of proton-proton collisions at $\sqrt{s} = 13$ TeV, at the CERN Large Hadron Collider. Assuming an integrated luminosity of 3000 fb$^{-1}$, we find that stringent VBF requirements, combined with large missing momentum and one or two low-$p_{T}$ leptons, is effective at reducing the major SM backgrounds, leading to a 5$σ$ (3$σ$) discovery reach for $m(\tildeχ^{0}_{2}) < 180$ $(260)$ GeV, and a projected 95\% confidence level exclusion region that covers $m(\tildeχ^{0}_{2})$ up to 385 GeV, parameter space that is currently unconstrained by other experiments.
Submitted 19 February, 2021;
originally announced February 2021.
-
Rising and Sinking in Resonance: Probing the critical role of rotational dynamics for buoyancy driven spheres
Authors:
Jelle Will,
Dominik Krug
Abstract:
We present experimental results for spherical particles rising and settling in a still fluid. Imposing a well-controlled center of mass offset enables us to vary the rotational dynamics selectively by introducing an intrinsic rotational timescale to the problem. Results are highly sensitive even to small degrees of offset, rendering this a practically relevant parameter by itself. We further find that for a certain ratio of the rotational to a vortex shedding timescale (capturing a Froude-type similarity) a resonance phenomenon sets in. Even though this is a rotational effect in origin, it also strongly affects translational oscillation frequency and amplitude, and most importantly the drag coefficient. This observation equally applies to both heavy and light spheres, albeit with slightly different characteristics for which we offer an explanation. Our findings highlight the need to consider rotational parameters when trying to understand and classify path properties of rising and settling spheres.
Submitted 7 December, 2020;
originally announced December 2020.
-
Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs
Authors:
Jonathan Will,
Jonathan Bader,
Lauritz Thamsen
Abstract:
Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources in both type and number can often be challenging, as the selected configuration needs to match a distributed dataflow job's resource demands and access patterns. A good cluster configuration avoids hardware bottlenecks and maximizes resource utilization, avoiding costly overprovisioning.
We propose a collaborative approach for finding optimal cluster configurations based on sharing and learning from historical runtime data of distributed dataflow jobs. Collaboratively shared data can be utilized to predict runtimes of future job executions through the use of specialized regression models. However, training prediction models on historical runtime data that were produced by different users and in diverse contexts requires the models to take these contexts into account.
Submitted 27 April, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Kinematics and dynamics of freely rising spheroids at high Reynolds numbers
Authors:
J. B. Will,
V. Mathai,
S. G. Huisman,
D. Lohse,
C. Sun,
D. Krug
Abstract:
We experimentally investigate the effect of geometrical anisotropy for buoyant ellipsoidal particles rising in a still fluid. All other parameters, such as the Galileo number $Ga \approx 6000$ and the particle density ratio $Γ\approx 0.53$ are kept constant. The geometrical aspect ratio, $χ$, of the particle is varied systematically from $χ$ = 0.2 (oblate) to 5 (prolate). Based on tracking all degrees of particle motion, we identify six regimes characterised by distinct rise dynamics. Firstly, for $0.83 \le χ\le 1.20$, increased rotational dynamics are observed and the particle flips over semi-regularly in a "tumbling"-like motion. Secondly, for oblate particles with $0.29 \le χ\le 0.75$, planar regular "zig-zag" motion is observed, where the drag coefficient is independent of $χ$. Thirdly, for the most extreme oblate geometries ($χ\le 0.25$) a "flutter"-like behaviour is found, characterised by precession of the oscillation plane and an increase in the drag coefficient. For prolate geometries, we observed two coexisting oscillation modes that contribute to complex trajectories: the first is related to oscillations of the pointing vector and the second corresponds to a motion perpendicular to the particle's symmetry axis. We identify a "longitudinal" regime ($1.33 \le χ\le 2.5$), where both modes are active and a different one, the "broadside"-regime ($3 \le χ\le 4$), where only the second mode is present. Remarkably, for the most prolate particles ($χ= 5$), we observe an entirely different "helical" rise with completely unique features.
Submitted 6 December, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Simulation of XXZ Spin Models using Sideband Transitions in Trapped Bosonic Gases
Authors:
Anjun Chu,
Johannes Will,
Jan Arlt,
Carsten Klempt,
Ana Maria Rey
Abstract:
We theoretically propose and experimentally demonstrate the use of motional sidebands in a trapped ensemble of $^{87}$Rb atoms to engineer tunable long-range XXZ spin models. We benchmark our simulator by probing a ferromagnetic to paramagnetic dynamical phase transition in the Lipkin-Meshkov-Glick (LMG) model, a collective XXZ model plus additional transverse and longitudinal fields, via Rabi spectroscopy. We experimentally reconstruct the boundary between the dynamical phases, which is in good agreement with mean-field theoretical predictions. Our work introduces new possibilities in quantum simulation of anisotropic spin-spin interactions and quantum metrology enhanced by many-body entanglement.
Submitted 28 October, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Searching for New Heavy Neutral Gauge Bosons using Vector Boson Fusion Processes at the LHC
Authors:
Florez Andres,
Gurrola Alfredo,
Johns Will,
Do oh Young,
Sheldon Paul,
Teague Dylan,
Weiler Thomas
Abstract:
New massive resonances are predicted in many extensions to the Standard Model (SM) of particle physics and constitute one of the most promising searches for new physics at the LHC. We present a feasibility study to search for new heavy neutral gauge bosons using vector boson fusion (VBF) processes, which become especially important as the LHC probes higher collision energies. In particular, we consider the possibility that the discovery of a $Z'$ boson may have eluded searches at the LHC. The coupling of the $Z'$ boson to the SM quarks can be small, and thus the $Z'$ would not be discoverable by the searches conducted thus far. In the context of a simplified phenomenological approach, we consider the $Z'\toττ$ and $Z'\toμμ$ decay modes to show that the requirement of a dilepton pair combined with two high $p_{T}$ forward jets with large separation in pseudorapidity and with large dijet mass is effective in reducing SM backgrounds. The expected exclusion bounds (at 95\% confidence level) are $m(Z') < 1.8$ TeV and $m(Z') < 2.5$ TeV in the $ττj_{f}j_{f}$ and $μμj_{f}j_{f}$ channels, respectively, assuming 1000 fb$^{-1}$ of 13 TeV data from the LHC. The use of the VBF topology to search for massive neutral gauge bosons provides a discovery reach with expected significances greater than 5$σ$ (3$σ$) for $Z'$ masses up to 1.4 (1.6) TeV and 2.0 (2.2) TeV in the $ττj_{f}j_{f}$ and $μμj_{f}j_{f}$ channels.
△ Less
Submitted 16 December, 2016; v1 submitted 30 September, 2016;
originally announced September 2016.
-
Extended coherence time on the clock transition of optically trapped Rubidium
Authors:
G. Kleine Büning,
J. Will,
W. Ertmer,
E. Rasel,
J. Arlt,
C. Klempt,
F. Ramirez-Martinez,
F. Piéchon,
P. Rosenbusch
Abstract:
Optically trapped ensembles are of crucial importance for frequency measurements and quantum memories, but generally suffer from strong dephasing due to inhomogeneous density and light shifts. We demonstrate a drastic increase of the coherence time to 21 s on the magnetic field insensitive clock transition of Rb-87 by applying the recently discovered spin self-rephasing. This result confirms the general nature of this new mechanism and thus shows its applicability in atom clocks and quantum memories. A systematic investigation of all relevant frequency shifts and noise contributions yields a stability of 2.4E-11 x tau^(-1/2), where tau is the integration time in seconds. Based on a set of technical improvements, the presented frequency standard is predicted to rival the stability of microwave fountain clocks in a potentially much more compact setup.
△ Less
Submitted 11 March, 2011;
originally announced March 2011.
-
A slow gravity compensated Atom Laser
Authors:
G. Kleine Büning,
J. Will,
W. Ertmer,
C. Klempt,
J. Arlt
Abstract:
We report on a slow guided atom laser beam outcoupled from a Bose-Einstein condensate of 87Rb atoms in a hybrid trap. The acceleration of the atom laser beam can be controlled by compensating the gravitational acceleration and we reach residual accelerations as low as 0.0027 g. The outcoupling mechanism allows for the production of a constant flux of 4.5x10^6 atoms per second and due to transverse guiding we obtain an upper limit for the mean beam width of 4.6 μm. The transverse velocity spread is only 0.2 mm/s and thus an upper limit for the beam quality parameter is M^2=2.5. We demonstrate the potential of the long interrogation times available with this atom laser beam by measuring the trap frequency in a single measurement. The small beam width together with the long evolution and interrogation time makes this atom laser beam a promising tool for continuous interferometric measurements.
△ Less
Submitted 21 May, 2010;
originally announced May 2010.
-
Damped Bloch Oscillations of Bose-Einstein Condensates in Disordered Potential Gradients
Authors:
S. Drenkelforth,
G. Kleine Büning,
J. Will,
T. Schulte,
N. Murray,
W. Ertmer,
L. Santos,
J. J. Arlt
Abstract:
We investigate both experimentally and theoretically disorder induced damping of Bloch oscillations of Bose-Einstein condensates in optical lattices. The spatially inhomogeneous force responsible for the damping is realised by a combination of a disordered optical and a magnetic gradient potential. We show that the inhomogeneity of this force results in a broadening of the quasimomentum spectrum, which in turn causes damping of the centre-of-mass oscillation. We quantitatively compare the obtained damping rates to the simulations using the Gross-Pitaevskii equation. Our results are relevant for high precision experiments on very small forces, which require the observation of a large number of oscillation cycles.
Submitted 7 April, 2008; v1 submitted 22 January, 2008;
originally announced January 2008.
-
Transport of a quantum degenerate heteronuclear Bose-Fermi mixture in a harmonic trap
Authors:
C. Klempt,
T. Henninger,
O. Topic,
J. Will,
St. Falke,
W. Ertmer,
J. Arlt
Abstract:
We report on the transport of mixed quantum degenerate gases of bosonic 87Rb and fermionic 40K in a harmonic potential provided by a modified QUIC trap. The samples are transported over a distance of 6 mm to the geometric center of the anti-Helmholtz coils of the QUIC trap. This transport mechanism was implemented by a small modification of the QUIC trap and is free of losses and heating. It allows all experiments using QUIC traps to use the highly homogeneous magnetic fields that can be created in the center of a QUIC trap and improves the optical access to the atoms, e.g., for experiments with optical lattices. This mechanism may be cascaded to cover even larger distances for applications with quantum degenerate samples.
Submitted 20 March, 2008; v1 submitted 21 August, 2007;
originally announced August 2007.
-
KRb Feshbach Resonances: Modeling the interatomic potential
Authors:
C. Klempt,
T. Henninger,
O. Topic,
J. Will,
W. Ertmer,
E. Tiemann,
J. Arlt
Abstract:
We have observed 28 heteronuclear Feshbach resonances in 10 spin combinations of the hyperfine ground states of a 40K 87Rb mixture. The measurements were performed by observing the loss rates from an atomic mixture at magnetic fields between 0 and 700 G. This data was used to significantly refine an interatomic potential derived from molecular spectroscopy, yielding a highly consistent model of the KRb interaction. Thus, the measured resonances can be assigned to the corresponding molecular states. In addition, this potential allows for an accurate calculation of the energy differences between highly excited levels and the rovibrational ground level. This information is of particular relevance for the formation of deeply bound heteronuclear molecules. Finally, the model is used to predict Feshbach resonances in mixtures of 87Rb combined with 39K or 41K.
Submitted 29 June, 2007; v1 submitted 1 June, 2007;
originally announced June 2007.
-
Evidence for short range orbital order in paramagnetic insulating (Al,V)_2O_3
Authors:
P. Pfalzer,
J. Will,
A. Nateprov,
M. Klemm,
V. Eyert,
S. Horn,
A. I. Frenkel,
S. Calvin,
M. L. denBoer
Abstract:
The local structure of (Al_0.06V_0.94)_2O_3 in the paramagnetic insulating (PI) and antiferromagnetically ordered insulating (AFI) phase has been investigated using hard and soft x-ray absorption techniques. It is shown that: 1) on a local scale, the symmetry of the vanadium sites in both the PI and the AFI phase is the same; and 2) the vanadium 3d - oxygen 2p hybridization, as gauged by the oxygen 1s absorption edge, is the same for both phases, but distinctly different from the paramagnetic metallic phase of pure V_2O_3. These findings can be understood in the context of a recently proposed model which relates the long range monoclinic distortion of the antiferromagnetically ordered state to orbital ordering, if orbital short range order in the PI phase is assumed. The measured anisotropy of the x-ray absorption spectra is discussed in relation to spin-polarized density functional calculations.
Submitted 2 December, 2001;
originally announced December 2001.
-
Atomic scale imaging and spectroscopy of the V_2O_3 (0001)-surface: bulk versus surface effects
Authors:
M. Preisinger,
J. Will,
M. Klemm,
S. Klimm,
S. Horn
Abstract:
We present atomic scale images of a V_2O_3 (0001)-surface, which show that the surface is susceptible to reconstruction by dimerization of vanadium ions. The atomic order of the surface depends sensitively on the surface preparation. Scanning tunneling spectroscopy proves a dimerized surface has a gap in the electronic density of states at the Fermi energy, while a surface prepared by sputtering and successive annealing shows no dimerization and no gap. Photoemission spectra depend sensitively on the surface structure and are consistent with scanning tunneling spectroscopy data. The measurements explain inconsistencies in photoemission experiments performed on such oxides in the past.
Submitted 7 November, 2001;
originally announced November 2001.
-
Photometric and kinematic studies of open star clusters. III. NGC 4103, NGC 5281, and NGC 4755
Authors:
Joerg Sanner,
Jens Brunzendorf,
Jean-Marie Will,
Michael Geffert
Abstract:
We present CCD photometry and proper motion studies of the three open star clusters NGC 4103, NGC 5281, and NGC 4755 (kappa Cru). By fitting isochrones to the colour magnitude diagrams, we found that all three objects are young open star clusters with ages of at most t=45 Myr. They are located at distances from approx. 1600 pc to 2200 pc, derived from distance moduli (m-M)_0 ranging from 11 mag to 12 mag. We combined membership determinations based on proper motions and statistical field star subtraction to derive the initial mass function (IMF) of the clusters. The shape of the IMFs could be represented by power laws with exponents of Gamma=-1.46 +/- 0.22 for NGC 4103, Gamma=-1.60 +/- 0.50 for NGC 5281, and Gamma=-1.68 +/- 0.14 for NGC 4755, when - as a reference - Salpeter's (1955) value would be Gamma=-1.35. These results agree well with other IMF studies of open star clusters.
Submitted 30 January, 2001;
originally announced January 2001.
-
No stellar age gradient inside supergiant shell LMC 4
Authors:
Jochen M. Braun,
Dominik J. Bomans,
Jean-Marie Will,
Klaas S. de Boer
Abstract:
The youngest stellar populations of a 'J'-shaped region (400 pc strip E-W across LH 77 and 850 pc S-N) inside the supergiant shell (SGS) LMC 4 (with a diameter of 1.4 kpc) have been analysed with CCD photometry in B,V passbands. Isochrone fitting to the colour-magnitude diagrams yields ages in the range from 9 Myr to 16 Myr without correlation with the distance to the LMC 4 centre. We construct the luminosity function and the mass function of five regions to ensure that projection effects don't mask the results. The slopes lie in the expected range (gamma in [0.22;0.41] and Gamma in [-1.3;-2.4] respectively, with the Salpeter value of Gamma = -1.35). According to our calculations, a total of 5-7 x 10^3 supernovae have dumped an energy of 10^54.5 erg into LMC 4 over the past 10 Myr, enough to tear the original star-forming cloud apart in the time span between 5 and 8 Myr after the star formation burst, initiated by a large-scale triggering event. We conclude that LMC 4 could have formed without a contribution from stochastic self-propagating star formation (SSPSF).
Submitted 8 August, 1997;
originally announced August 1997.