-
Detailed Report on the Measurement of the Positive Muon Anomalous Magnetic Moment to 0.20 ppm
Authors:
D. P. Aguillard,
T. Albahri,
D. Allspach,
A. Anisenkov,
K. Badgley,
S. Baeßler,
I. Bailey,
L. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
E. Barzi,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
S. Braun,
M. Bressler,
G. Cantatore,
R. M. Carey,
B. C. K. Casey
, et al. (168 additional authors not shown)
Abstract:
We present details on a new measurement of the muon magnetic anomaly, $a_μ= (g_μ-2)/2$. The result is based on positive muon data taken at Fermilab's Muon Campus during the 2019 and 2020 accelerator runs. The measurement uses $3.1$ GeV$/c$ polarized muons stored in a $7.1$-m-radius storage ring with a $1.45$ T uniform magnetic field. The value of $a_μ$ is determined from the measured difference between the muon spin precession frequency and its cyclotron frequency. This difference is normalized to the strength of the magnetic field, measured using Nuclear Magnetic Resonance (NMR). The ratio is then corrected for small contributions from beam motion, beam dispersion, and transient magnetic fields. We measure $a_μ= 116\,592\,057(25) \times 10^{-11}$ (0.21 ppm). This is the world's most precise measurement of this quantity and represents a factor of $2.2$ improvement over our previous result based on the 2018 dataset. In combination, the two datasets yield $a_μ(\text{FNAL}) = 116\,592\,055(24) \times 10^{-11}$ (0.20 ppm). Combining this with the measurements from Brookhaven National Laboratory for both positive and negative muons, the new world average is $a_μ(\text{exp}) = 116\,592\,059(22) \times 10^{-11}$ (0.19 ppm).
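The 2018 and 2019/2020 values quoted above can be roughly cross-checked with a textbook inverse-variance weighted mean. This is a minimal sketch that assumes the two results are uncorrelated. The published combination presumably accounts for systematic uncertainties shared between the datasets, which this sketch ignores. That difference is why the naive result comes out near 116 592 054(23) rather than the published 116 592 055(24). The helper name is illustrative.

```python
import math

def inverse_variance_mean(values, sigmas):
    """Weighted mean and its uncertainty, assuming independent measurements."""
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))

# a_mu values in units of 1e-11: 2019/2020 result and 2018 result from the abstracts
mean, sigma = inverse_variance_mean([116_592_057, 116_592_040], [25, 54])
relative_ppm = sigma / mean * 1e6  # relative precision in parts per million

print(f"a_mu ~ {mean:.0f}({sigma:.0f}) x 1e-11, {relative_ppm:.2f} ppm")
```

The same ratio gives the quoted 0.21 ppm for the single 2019/2020 result: 25 / 116 592 057 ≈ 2.1 × 10⁻⁷.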
Submitted 22 May, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
The Functional Gait Deviation Index
Authors:
Sajal Kaur Minhas,
Morgan Sangeux,
Julia Polak,
Michelle Carey
Abstract:
A typical gait analysis requires the examination of the motion of nine joint angles on the left-hand side and six joint angles on the right-hand side across multiple subjects. Due to the quantity and complexity of the data, it is useful to calculate the amount by which a subject's gait deviates from an average normal profile and to represent this deviation as a single number. Such a measure can quantify the overall severity of a condition affecting walking, monitor progress, or evaluate the outcome of an intervention prescribed to improve the gait pattern. The gait deviation index, gait profile score, and the overall abnormality measure are standard benchmarks for quantifying gait abnormality. However, these indices do not account for the intrinsic smoothness of the gait movement at each joint/plane and the potential co-variation between the joints/planes. Utilizing a multivariate functional principal components analysis, we propose the functional gait deviation index (FGDI). FGDI accounts for the intrinsic smoothness of the gait movement at each joint/plane and the potential co-variation between the joints/planes. We show that FGDI scales with overall gait function, provides a consistent measure of gait abnormality, and is easily implemented using an interactive web app.
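Collapsing gait deviation into a single number can be illustrated with the scaling used by the original gait deviation index (GDI). In that scheme, the log-distance of a subject's feature vector from the control mean is rescaled so that controls average 100 with a standard deviation of 10. This sketch assumes each subject has already been reduced to a vector of feature scores. For FGDI these would come from multivariate functional PCA, which is not implemented here. The function `gdi_style_scores` is hypothetical, and this is not the authors' FGDI code.

```python
import math
import statistics

def gdi_style_scores(controls, subjects):
    """GDI-style scores: 100 = typical, each 10 points below = 1 control SD of log-distance."""
    dim = len(controls[0])
    centre = [statistics.fmean(c[i] for c in controls) for i in range(dim)]

    def log_distance(v):
        # Assumes no vector coincides exactly with the control mean (log(0) is undefined).
        return math.log(math.dist(v, centre))

    ref = [log_distance(c) for c in controls]
    mu, sd = statistics.fmean(ref), statistics.stdev(ref)
    return [100 - 10 * (log_distance(s) - mu) / sd for s in subjects]

# Hypothetical 2-D feature scores (e.g. leading principal-component scores)
controls = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -2.0], [2.0, 1.0]]
print(gdi_style_scores(controls, [[0.5, 0.5], [10.0, 10.0]]))
```

A subject far from the control mean gets a score well below 100, which is the single-number summary the paragraph motivates.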
Submitted 26 October, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Optimizing beam-splitter pulses for atom interferometry: a geometric approach
Authors:
Nikolaos Dedes,
Jack Saywell,
Max Carey,
Ilya Kuprov,
Tim Freegarde
Abstract:
We present a methodology for the design of optimal Raman beam-splitter pulses suitable for cold atom inertial sensors. The methodology, based on time-dependent perturbation theory, links optimal control and the sensitivity function formalism in the Bloch sphere picture, thus providing a geometric interpretation of the optimization problem. Optimized pulse waveforms are found to be more resilient than conventional beam-splitter pulses and ensure a near-flat superposition phase for a range of detunings approaching the Rabi frequency. As a practical application, we have simulated the performance of an optimized Mach-Zehnder interferometer in terms of scale-factor error and bias induced by inter-pulse laser intensity variations. Our findings reveal enhancements compared to conventional interferometers operating with constant-power beam-splitter pulses.
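Conventional pulses are sensitive to detuning, and this can be seen by computing the state after a constant-power (square) pulse on a two-level atom in the Bloch-sphere picture. The sketch uses the standard rotation $U = \cos(\theta/2)\,I - i\sin(\theta/2)(n_x\sigma_x + n_z\sigma_z)$ with generalized Rabi frequency $\sqrt{\Omega^2 + \Delta^2}$. The sign conventions and function names are assumptions. It models only the conventional pulse, not the optimized waveforms of the paper.

```python
import cmath
import math

def square_pulse(rabi, detuning, duration):
    """Amplitudes (c_g, c_e) after a constant-power pulse applied to |g>.

    Rotation about n = (rabi, 0, detuning) / sqrt(rabi**2 + detuning**2)
    by the angle theta = sqrt(rabi**2 + detuning**2) * duration.
    """
    omega_eff = math.hypot(rabi, detuning)
    half = omega_eff * duration / 2
    n_x, n_z = rabi / omega_eff, detuning / omega_eff
    c_g = complex(math.cos(half), -n_z * math.sin(half))
    c_e = complex(0.0, -n_x * math.sin(half))
    return c_g, c_e

def superposition_phase(c_g, c_e):
    """Relative phase of the excited component with respect to the ground one."""
    return cmath.phase(c_e / c_g)

rabi = 1.0
t_half_pi = math.pi / (2 * rabi)  # nominal pi/2 (beam-splitter) duration
for detuning in (0.0, 0.25, 0.5, 1.0):
    c_g, c_e = square_pulse(rabi, detuning, t_half_pi)
    print(f"detuning={detuning:.2f}  P_e={abs(c_e)**2:.3f}  "
          f"phase={superposition_phase(c_g, c_e):+.3f} rad")
```

With zero detuning the superposition phase is exactly $-\pi/2$. As the detuning approaches the Rabi frequency, it drifts by tenths of a radian. This drift is the non-flat phase response that the optimized pulses are designed to suppress.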
Submitted 30 August, 2023;
originally announced August 2023.
-
Measurement of the Positive Muon Anomalous Magnetic Moment to 0.20 ppm
Authors:
D. P. Aguillard,
T. Albahri,
D. Allspach,
A. Anisenkov,
K. Badgley,
S. Baeßler,
I. Bailey,
L. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
E. Barzi,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
S. Braun,
M. Bressler,
G. Cantatore,
R. M. Carey,
B. C. K. Casey
, et al. (166 additional authors not shown)
Abstract:
We present a new measurement of the positive muon magnetic anomaly, $a_μ\equiv (g_μ- 2)/2$, from the Fermilab Muon $g\!-\!2$ Experiment using data collected in 2019 and 2020. We have analyzed more than four times as many positrons from muon decay as in our previous result from 2018 data. The systematic error is reduced by more than a factor of 2 due to better running conditions, a more stable beam, and improved knowledge of the magnetic field weighted by the muon distribution, $\tilde{ω}'_p$, and of the anomalous precession frequency corrected for beam dynamics effects, $ω_a$. From the ratio $ω_a / \tilde{ω}'_p$, together with precisely determined external parameters, we determine $a_μ= 116\,592\,057(25) \times 10^{-11}$ (0.21 ppm). Combining this result with our previous result from the 2018 data, we obtain $a_μ\text{(FNAL)} = 116\,592\,055(24) \times 10^{-11}$ (0.20 ppm). The new experimental world average is $a_μ(\text{Exp}) = 116\,592\,059(22)\times 10^{-11}$ (0.19 ppm), which represents a factor of 2 improvement in precision.
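The ratio $ω_a/\tilde{ω}'_p$ becomes $a_μ$ through the idealized relations $ω_a = a_μ eB/m_μ$ and $\tilde{ω}'_p = 2μ'_p B/\hbar$, which combine so that the field $B$ cancels. One common way to write the result is shown below. It assumes the field is calibrated with shielded-proton NMR ($μ'_p$) and that the electron quantities are taken from external measurements. The notation in the paper itself may differ.

$$
a_\mu = \frac{\omega_a}{\tilde{\omega}'_p}\,
\frac{\mu'_p}{\mu_e}\,
\frac{m_\mu}{m_e}\,
\frac{g_e}{2}
$$

Here $μ'_p/μ_e$, $m_μ/m_e$, and $g_e$ play the role of the "precisely determined external parameters". In practice $μ'_p/μ_e$ is itself assembled from measured bound-state ratios.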
Submitted 4 October, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Enhancing the sensitivity of atom-interferometric inertial sensors using robust control
Authors:
J. C. Saywell,
M. S. Carey,
P. S. Light,
S. S. Szigeti,
A. R. Milne,
K. S. Gill,
M. L. Goh,
V. S. Perunicic,
N. M. Wilson,
C. D. Macrae,
A. Rischka,
P. J. Everitt,
N. P. Robins,
R. P. Anderson,
M. R. Hush,
M. J. Biercuk
Abstract:
Atom-interferometric quantum sensors could revolutionize navigation, civil engineering, and Earth observation. However, operation in real-world environments is challenging due to external interference, platform noise, and constraints on size, weight, and power. Here we experimentally demonstrate that tailored light pulses designed using robust control techniques mitigate significant error sources in an atom-interferometric accelerometer. To mimic the effect of unpredictable lateral platform motion, we apply laser-intensity noise that varies by up to 20% from pulse to pulse. Our robust control solution maintains sensing performance, while the utility of conventional pulses collapses. By measuring local gravity, we show that our robust pulses preserve the interferometer scale factor and improve measurement precision by 10$\times$ in the presence of this noise. We further validate these enhancements by measuring applied accelerations over a 200 μg range up to 21$\times$ more precisely at the highest applied noise level. Our demonstration provides a pathway to improved atom-interferometric inertial sensing in real-world settings.
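Pulse-to-pulse intensity noise can be illustrated with a simple Bloch-sphere model in which every pulse area scales with laser intensity. That scaling is an assumption, appropriate when the effective Rabi frequency is proportional to intensity. The paper's robust-control pulses are not reproduced here. As a plainly swapped-in stand-in, the sketch compares a single square π (mirror) pulse with the classic 90x-180y-90x composite inversion pulse (Levitt), a well-known amplitude-robust sequence. Function and variable names are illustrative.

```python
import math

def rotate(v, axis, angle):
    """Right-handed rotation of Bloch vector v = (x, y, z) about the x or y axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return (x, c * y - s * z, s * y + c * z)
    if axis == "y":
        return (c * x + s * z, y, -s * x + c * z)
    raise ValueError(f"unsupported axis {axis!r}")

def inversion_probability(sequence, scale):
    """Excited-state probability after pulses whose areas are all multiplied by `scale`."""
    v = (0.0, 0.0, 1.0)  # ground state at the north pole of the Bloch sphere
    for axis, nominal_angle in sequence:
        v = rotate(v, axis, nominal_angle * scale)
    return (1 - v[2]) / 2

SQUARE_PI = [("x", math.pi)]
COMPOSITE_PI = [("x", math.pi / 2), ("y", math.pi), ("x", math.pi / 2)]  # Levitt 90x-180y-90x

for scale in (0.8, 0.9, 1.0, 1.1, 1.2):  # up to +/-20 % intensity error
    print(f"scale={scale:.1f}  square={inversion_probability(SQUARE_PI, scale):.4f}  "
          f"composite={inversion_probability(COMPOSITE_PI, scale):.4f}")
```

At a 20% area error, the square pulse transfers only about 90% of the population, while the composite sequence stays near 99%. That gap illustrates why pulse design, rather than hardware alone, can restore precision under intensity noise.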
Submitted 30 November, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
JEDI: These aren't the JSON documents you're looking for... (Extended Version*)
Authors:
Thomas Hütter,
Nikolaus Augsten,
Christoph M. Kirsch,
Michael J. Carey,
Chen Li
Abstract:
The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this paper, we address the problem of JSON similarity lookup queries: given a query document and a distance threshold $τ$, retrieve all JSON documents that are within distance $τ$ of the query document. Due to its recursive definition, JSON data are naturally represented as trees. Unlike other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document. This feature poses a new challenge to the tree model and distance computation. We propose JSON tree, a lossless tree representation of JSON documents, and define the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON documents. We develop an algorithm, called QuickJEDI, for computing JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. To boost the performance of JSON similarity queries, we introduce an index called JSIM and a highly effective upper bound based on tree sorting. Our algorithm for the upper bound runs in $O(n τ)$ time and $O(n + τ\log n)$ space, which substantially improves the previous best bound of $O(n^2)$ time and $O(n \log n)$ space (where $n$ is the tree size). Our experimental evaluation shows that our solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
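Similarity lookups of this kind are usually answered by filter-and-verify: a cheap bound discards documents that cannot be within $τ$, and only the survivors get the expensive distance computation. A classic, simple lower bound is the difference in tree sizes. This is not the paper's sorting-based upper bound. Assuming unit-cost node insertions, deletions, and renames, each insertion or deletion changes the node count by exactly one, so $|n_1 - n_2|$ never exceeds the edit distance. The tree model below (an object node, a key node per member, array nodes, and literal leaves) is a simplified assumption, not the paper's exact JSON tree. Verification with an actual JEDI computation is omitted.

```python
import json

def json_tree_size(value):
    """Number of nodes in a simplified JSON tree (the model is an assumption)."""
    if isinstance(value, dict):
        # object node + one key node per member, each key node parenting its value
        return 1 + sum(1 + json_tree_size(v) for v in value.values())
    if isinstance(value, list):
        return 1 + sum(json_tree_size(v) for v in value)  # array node + elements
    return 1  # string, number, true/false, or null literal

def size_filter(query, documents, tau):
    """Keep only documents whose size-difference lower bound does not exceed tau."""
    q = json_tree_size(query)
    return [d for d in documents if abs(json_tree_size(d) - q) <= tau]

docs = [json.loads(s) for s in ('{"a": 1}', '{"a": [1, 2, 3], "b": {"c": true}}')]
survivors = size_filter({"a": 2}, docs, tau=1)  # only these need an exact distance check
print(survivors)
```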
Submitted 21 January, 2022; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Parallel locomotor control strategies in mice and flies
Authors:
Ana I. Gonçalves,
Jacob A. Zavatone-Veth,
Megan R. Carey,
Damon A. Clark
Abstract:
Our understanding of the neural basis of locomotor behavior can be informed by careful quantification of animal movement. Classical descriptions of legged locomotion have defined discrete locomotor gaits, characterized by distinct patterns of limb movement. Recent technical advances have enabled increasingly detailed characterization of limb kinematics across many species, imposing tighter constraints on neural control. Here, we highlight striking similarities between coordination patterns observed in two genetic model organisms: the laboratory mouse and Drosophila. Both species exhibit continuously variable coordination patterns with similar low-dimensional structure, suggesting shared principles for limb coordination and descending neural control.
Submitted 22 December, 2021;
originally announced December 2021.
-
Design Trade-offs for a Robust Dynamic Hybrid Hash Join (Extended Version)
Authors:
Shiva Jahangiri,
Michael J. Carey,
Johann-Christoph Freytag
Abstract:
The Join operator, as one of the most expensive and commonly used operators in database systems, plays a substantial role in Database Management System (DBMS) performance. Among the many different Join algorithms studied over the last decades, Hybrid Hash Join (HHJ) has proven to be one of the most efficient and widely-used join algorithms. While the performance of HHJ depends largely on accurate statistics and information about the input relations, it may not always be practical or possible for a system to have such information available.
HHJ's performance depends on many design details. This paper is an experimental and analytical study of the trade-offs in designing a robust and dynamic HHJ operator. We revisit the design and optimization techniques suggested by previous studies through extensive experiments, comparing them with other algorithms designed by us or used in related studies.
We explore the impact of the number of partitions on the performance of HHJ and propose a lower bound and a default value for the number of partitions. We continue by designing and evaluating different partition insertion techniques to maximize memory utilization with the least CPU cost. In addition, we consider a comprehensive set of algorithms for dynamically selecting a partition to spill and compare the results against previously published studies. We then present two alternative growth policies for spilled partitions and study their effectiveness using experimental and model-based analyses.
These algorithms have been implemented in the context of Apache AsterixDB and evaluated under different scenarios such as variable record sizes, different distributions of join attributes, and different storage types, including HDD, SSD, and Amazon Elastic Block Store (Amazon EBS).
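The basic mechanics of a dynamic hybrid hash join fit in a few lines. Build-side records are hash-partitioned, and partitions stay in memory until a memory budget is exceeded. A victim partition is then spilled, and spilled build/probe partition pairs are joined in a second pass. This is a toy, in-memory model in which Python lists stand in for disk files. The victim policy (spill the largest resident partition) and the parameters `num_partitions` and `mem_budget` (counted in records) are assumptions, not the policies evaluated in the paper. The sketch also assumes each spilled partition fits in memory on the second pass, so there is no recursive partitioning.

```python
from collections import defaultdict

def dynamic_hybrid_hash_join(build, probe, key, num_partitions=4, mem_budget=8):
    """Equi-join two lists of dicts on `key`; returns (build_rec, probe_rec) pairs."""
    def part(rec):
        return hash(rec[key]) % num_partitions

    resident = {p: [] for p in range(num_partitions)}  # in-memory build partitions
    spilled = {}                                        # "on-disk" build partitions
    used = 0
    for rec in build:
        p = part(rec)
        if p in spilled:
            spilled[p].append(rec)  # partition already spilled: write straight to "disk"
            continue
        resident[p].append(rec)
        used += 1
        if used > mem_budget:
            # Victim policy (assumption): spill the largest resident partition.
            victim = max(resident, key=lambda q: len(resident[q]))
            spilled[victim] = resident.pop(victim)
            used -= len(spilled[victim])

    # Build hash tables for partitions that stayed in memory.
    tables = {}
    for p, recs in resident.items():
        table = defaultdict(list)
        for r in recs:
            table[r[key]].append(r)
        tables[p] = table

    out, spilled_probe = [], defaultdict(list)
    for rec in probe:
        p = part(rec)
        if p in tables:
            out.extend((b, rec) for b in tables[p].get(rec[key], []))
        else:
            spilled_probe[p].append(rec)  # probe record waits for its build partition

    # Second pass: join each spilled build/probe partition pair in memory.
    for p, recs in spilled.items():
        table = defaultdict(list)
        for r in recs:
            table[r[key]].append(r)
        for rec in spilled_probe.get(p, []):
            out.extend((b, rec) for b in table.get(rec[key], []))
    return out

customers = [{"cust": c, "name": f"c{c}"} for c in range(5)]
orders = [{"cust": i % 5, "order": i} for i in range(12)]
pairs = dynamic_hybrid_hash_join(customers, orders, "cust", num_partitions=4, mem_budget=2)
print(len(pairs))
```

The design questions the paper studies map onto the knobs visible here: how many partitions to create, how records are inserted into memory, which partition to pick as the spill victim, and how spilled partitions grow.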
Submitted 5 December, 2021;
originally announced December 2021.
-
Columnar Formats for Schemaless LSM-based Document Stores
Authors:
Wail Y. Alkowaileet,
Michael J. Carey
Abstract:
In the last decade, document store database systems have gained more traction for storing and querying large volumes of semi-structured data. However, the flexibility of the document stores' data models has limited their ability to store data in a column-major layout, making them less performant for analytical workloads than column-store relational databases. In this paper, we propose several techniques, based on piggybacking on Log-Structured Merge (LSM) tree events and tailored to document stores, to store document data in a columnar layout. We first extend the Dremel format, a popular on-disk columnar format for semi-structured data, to comply with document stores' flexible data model. We then introduce two columnar layouts for organizing and storing data in LSM-based storage. We also highlight the potential of using query compilation techniques for document stores, where values' types are known only at runtime. We have implemented and evaluated our techniques to measure their impact on storage, data ingestion, and query performance in Apache AsterixDB. Our experiments show significant performance gains, improving the query execution time by orders of magnitude while minimally impacting ingestion performance.
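The core difficulty is storing schemaless documents column by column when a field's type is only known at runtime. It can be illustrated with a much simpler shredding scheme than the extended Dremel format the paper uses. Each distinct (path, runtime type) pair becomes its own column, with `None` marking documents that lack the value. This sketch treats arrays as opaque values and does not distinguish JSON `null` from a missing field. Dremel-style repetition and definition levels exist precisely to handle those cases. The function name `shred` is hypothetical.

```python
def shred(documents):
    """Map each (path, runtime type) pair to a column with one slot per document."""
    columns = {}

    def walk(value, path, row):
        if isinstance(value, dict):
            for name, child in value.items():
                walk(child, f"{path}.{name}" if path else name, row)
            return
        # Leaf (lists included, as opaque values): the column is chosen at runtime by type.
        key = (path, type(value).__name__)
        column = columns.setdefault(key, [None] * len(documents))
        column[row] = value

    for row, doc in enumerate(documents):
        walk(doc, "", row)
    return columns

example = [{"id": 1, "name": "a"}, {"id": "x", "geo": {"lat": 1.5}}]
for (path, type_name), values in sorted(shred(example).items()):
    print(f"{path:8} {type_name:5} {values}")
```

The field `id` ends up in two columns, one for integers and one for strings. This is the kind of runtime type heterogeneity a schemaless columnar format has to absorb.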
Submitted 22 November, 2021;
originally announced November 2021.
-
The Straw Tracking Detector for the Fermilab Muon $g-2$ Experiment
Authors:
B. T. King,
T. Albahri,
S. Al-Kilani,
D. Allspach,
D. Beckner,
A. Behnke,
T. J. V. Bowcock,
D. Boyden,
R. M. Carey,
J. Carroll,
B. C. K. Casey,
S. Charity,
R. Chislett,
M. Eads,
A. Epps,
S. B. Foster,
D. Gastler,
S. Grant,
T. Halewood-Leagas,
K. Hardin,
E. Hazen,
G. Hesketh,
D. J. Hollywood,
T. Jones,
C. Kenziora
, et al. (32 additional authors not shown)
Abstract:
The Muon $g-2$ Experiment at Fermilab uses a gaseous straw tracking detector to make detailed measurements of the stored muon beam profile, which are essential for the experiment to achieve its uncertainty goals. Positrons from muon decays spiral inward and pass through the tracking detector before striking an electromagnetic calorimeter. The tracking detector is therefore located inside the vacuum chamber in a region where the magnetic field is large and non-uniform. As such, the tracking detector must have a low leak rate to maintain a high-quality vacuum, must be non-magnetic so as not to perturb the magnetic field, and, to minimize energy loss, must present only a small fraction of a radiation length of material. The performance of the tracking detector has met or surpassed the design requirements, with adequate electronic noise levels, an average straw hit resolution of $(110 \pm 20) \,μ$m, a detection efficiency of 97% or higher, and no performance degradation or signs of aging. The tracking detector's measurements result in an otherwise unachievable understanding of the muon's beam motion, particularly at early times in the experiment's measurement period when there are a significantly greater number of muons decaying. This is vital to the statistical power of the experiment, as well as facilitating the precise extraction of several systematic corrections and uncertainties. This paper describes the design, construction, testing, commissioning, and performance of the tracking detector.
Submitted 24 February, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
DynaHash: Efficient Data Rebalancing in Apache AsterixDB (Extended Version)
Authors:
Chen Luo,
Michael J. Carey
Abstract:
Parallel shared-nothing data management systems have been widely used to exploit a cluster of machines for efficient and scalable data processing. When a cluster needs to be dynamically scaled in or out, data must be efficiently rebalanced. Ideally, data rebalancing should have a low data movement cost, incur a small overhead on data ingestion and query processing, and be performed online without blocking reads or writes. However, existing parallel data management systems often exhibit certain limitations and drawbacks in terms of efficient data rebalancing.
In this paper, we introduce DynaHash, an efficient data rebalancing approach that combines dynamic bucketing with extendible hashing for shared-nothing OLAP-style parallel data management systems. DynaHash dynamically partitions the records into a number of buckets using extendible hashing to achieve a good load balance with small rebalancing costs. We further describe an end-to-end implementation of the proposed approach inside an open-source Big Data Management System (BDMS), Apache AsterixDB. Our implementation exploits the out-of-place update design of LSM-trees to efficiently rebalance data without blocking concurrent reads and writes. Finally, we have conducted performance experiments using the TPC-H benchmark and we present the results here.
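Extendible hashing keeps rebalancing cheap because, when a bucket overflows, only that bucket is split and only its records move, while the directory simply doubles when needed. The single-node sketch below shows the mechanism with in-memory buckets and counts moved records as a proxy for data movement. DynaHash applies the idea to buckets spread across cluster nodes with LSM-based storage, which this toy does not model. The class names, `capacity`, and `max_depth` are illustrative assumptions.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}

class ExtendibleHashTable:
    """Directory of 2**global_depth slots, indexed by low-order hash bits."""

    def __init__(self, capacity=4, max_depth=20):
        self.capacity = capacity
        self.max_depth = max_depth
        self.global_depth = 0
        self.directory = [Bucket(0)]
        self.moved = 0  # records relocated by splits (proxy for rebalancing cost)

    def _slot(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            if self.global_depth >= self.max_depth:
                raise RuntimeError("too many keys share the same hash bits")
            self.directory = self.directory * 2  # slot i and i + old_size share a bucket
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)  # newly distinguishing hash bit
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
        # Only the overflowing bucket's records are examined and possibly moved.
        for key in [k for k in bucket.items if hash(k) & bit]:
            sibling.items[key] = bucket.items.pop(key)
            self.moved += 1

table = ExtendibleHashTable(capacity=4)
for k in range(64):
    table.put(k, f"record-{k}")
print(table.global_depth, len(table.directory), table.moved)
```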
Submitted 23 May, 2021;
originally announced May 2021.
-
Measurement of the Positive Muon Anomalous Magnetic Moment to 0.46 ppm
Authors:
B. Abi,
T. Albahri,
S. Al-Kilani,
D. Allspach,
L. P. Alonzi,
A. Anastasi,
A. Anisenkov,
F. Azfar,
K. Badgley,
S. Baeßler,
I. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
E. Barzi,
A. Basti,
F. Bedeschi,
A. Behnke,
M. Berz,
M. Bhattacharya,
H. P. Binney,
R. Bjorkquist,
P. Bloom,
J. Bono,
E. Bottalico
, et al. (212 additional authors not shown)
Abstract:
We present the first results of the Fermilab Muon g-2 Experiment for the positive muon magnetic anomaly $a_μ\equiv (g_μ-2)/2$. The anomaly is determined from the precision measurements of two angular frequencies. Intensity variation of high-energy positrons from muon decays directly encodes the difference frequency $ω_a$ between the spin-precession and cyclotron frequencies for polarized muons in a magnetic storage ring. The storage ring magnetic field is measured using nuclear magnetic resonance probes calibrated in terms of the equivalent proton spin precession frequency $\tilde{ω}'_p$ in a spherical water sample at 34.7$^{\circ}$C. The ratio $ω_a / \tilde{ω}'_p$, together with known fundamental constants, determines $a_μ({\rm FNAL}) = 116\,592\,040(54)\times 10^{-11}$ (0.46 ppm). The result is 3.3 standard deviations greater than the standard model prediction and is in excellent agreement with the previous Brookhaven National Laboratory (BNL) E821 measurement. After combination with previous measurements of both $μ^+$ and $μ^-$, the new experimental average of $a_μ({\rm Exp}) = 116\,592\,061(41)\times 10^{-11}$ (0.35 ppm) increases the tension between experiment and theory to 4.2 standard deviations.
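The statement that the positron intensity "directly encodes" $ω_a$ is often illustrated with the idealized five-parameter form of the decay-positron time spectrum, sketched below. Here $N_0$ is a normalization, $γτ_μ$ the time-dilated muon lifetime, $A$ the decay asymmetry, and $φ$ a phase. This is a simplified textbook model; the full analysis fits considerably extended versions that add beam-dynamics and detector effects.

$$
N(t) = N_0\, e^{-t/(\gamma\tau_\mu)}\left[1 + A\cos(\omega_a t + \varphi)\right]
$$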
Submitted 7 April, 2021;
originally announced April 2021.
-
Measurement of the anomalous precession frequency of the muon in the Fermilab Muon g-2 experiment
Authors:
T. Albahri,
A. Anastasi,
A. Anisenkov,
K. Badgley,
S. Baeßler,
I. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
A. Basti,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
G. Cantatore,
R. M. Carey,
B. C. K. Casey,
D. Cauz,
R. Chakraborty,
S. P. Chang,
A. Chapelain
, et al. (153 additional authors not shown)
Abstract:
The Muon g-2 Experiment at Fermi National Accelerator Laboratory (FNAL) has measured the muon anomalous precession frequency $ω_a$ with a statistical uncertainty of 434 parts per billion (ppb) and a systematic uncertainty of 56 ppb, using data collected in four storage ring configurations during its first physics run in 2018. When combined with a precision measurement of the magnetic field of the experiment's muon storage ring, the precession frequency measurement determines a muon magnetic anomaly of $a_μ({\rm FNAL}) = 116\,592\,040(54) \times 10^{-11}$ (0.46 ppm). This article describes the multiple techniques employed in the reconstruction, analysis, and fitting of the data to measure the precession frequency. It also presents the averaging of the results from the eleven separate determinations of $ω_a$, and the systematic uncertainties on the result.
Submitted 7 April, 2021;
originally announced April 2021.
-
Beam dynamics corrections to the Run-1 measurement of the muon anomalous magnetic moment at Fermilab
Authors:
T. Albahri,
A. Anastasi,
K. Badgley,
S. Baeßler,
I. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
G. Cantatore,
R. M. Carey,
B. C. K. Casey,
D. Cauz,
R. Chakraborty,
S. P. Chang,
A. Chapelain,
S. Charity,
R. Chislett
, et al. (152 additional authors not shown)
Abstract:
This paper presents the beam dynamics systematic corrections and their uncertainties for the Run-1 data set of the Fermilab Muon g-2 Experiment. Two corrections to the measured muon precession frequency $ω_a^m$ are associated with well-known effects owing to the use of electrostatic quadrupole (ESQ) vertical focusing in the storage ring. An average vertically oriented motional magnetic field is felt by relativistic muons passing transversely through the radial electric field components created by the ESQ system. The correction depends on the stored momentum distribution and the tunes of the ring, which has relatively weak vertical focusing. Vertical betatron motions imply that the muons do not orbit the ring in a plane exactly orthogonal to the vertical magnetic field direction. A correction is necessary to account for an average pitch angle associated with their trajectories. A third small correction is necessary because muons that escape the ring during the storage time are slightly biased in initial spin phase compared to the parent distribution. Finally, because two high-voltage resistors in the ESQ network had longer than designed RC time constants, the vertical and horizontal centroids and envelopes of the stored muon beam drifted slightly, but coherently, during each storage ring fill. This led to the discovery of an important phase-acceptance relationship that requires a correction. The sum of the corrections to $ω_a^m$ is 0.50 $\pm$ 0.09 ppm; the uncertainty is small compared to the 0.43 ppm statistical precision of $ω_a^m$.
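The electric-field correction depends on the stored momentum distribution because of the "magic momentum". In the standard spin-precession equation, the electric-field term is proportional to $a_μ - 1/(γ^2 - 1)$ and vanishes when $γ = \sqrt{1 + 1/a_μ}$. The sketch below computes that momentum, the corresponding orbit radius in a 1.45 T field, and the idealized precession frequency $ω_a/2π = a_μ eB/(2π m_μ)$, using CODATA 2018 muon mass values. It is a back-of-envelope consistency check against the 3.1 GeV/$c$, 7.1 m, and 1.45 T figures quoted in the 2024 detailed report listed above, not part of the correction procedure.

```python
import math

A_MU = 116_592_057e-11      # measured anomaly (2019/2020 result)
M_MU_MEV = 105.6583755      # muon mass, MeV/c^2 (CODATA 2018)
M_MU_KG = 1.883531627e-28   # muon mass, kg (CODATA 2018)
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact)
B_TESLA = 1.45              # nominal storage-ring field

# Electric-field term vanishes when a_mu = 1 / (gamma^2 - 1).
gamma_magic = math.sqrt(1 + 1 / A_MU)
p_magic_gev = M_MU_MEV * math.sqrt(gamma_magic**2 - 1) / 1000  # p = m * gamma * beta * c

# Orbit radius from p [GeV/c] = 0.299792458 * B [T] * R [m].
radius_m = p_magic_gev / (0.299792458 * B_TESLA)

# Idealized anomalous precession frequency (no E-field or pitch corrections).
f_a_khz = A_MU * E_CHARGE * B_TESLA / M_MU_KG / (2 * math.pi) / 1e3

print(f"gamma={gamma_magic:.2f}  p={p_magic_gev:.3f} GeV/c  "
      f"R={radius_m:.2f} m  f_a={f_a_khz:.1f} kHz")
```

Muons slightly off the magic momentum see a residual electric-field contribution, which is why the correction is computed from the measured momentum distribution.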
Submitted 23 April, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Magnetic Field Measurement and Analysis for the Muon g-2 Experiment at Fermilab
Authors:
T. Albahri,
A. Anastasi,
K. Badgley,
S. Baeßler,
I. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
G. Cantatore,
R. M. Carey,
B. C. K. Casey,
D. Cauz,
R. Chakraborty,
S. P. Chang,
A. Chapelain,
S. Charity,
R. Chislett
, et al. (148 additional authors not shown)
Abstract:
The Fermi National Accelerator Laboratory has measured the muon magnetic anomaly $a_μ= (g_μ-2)/2$ to a combined precision of 0.46 parts per million with data collected during its first physics run in 2018. This paper documents the measurement of the magnetic field in the muon storage ring. The magnetic field is monitored by nuclear magnetic resonance systems and calibrated in terms of the equivalent proton spin precession frequency in a spherical water sample at 34.7$^\circ$C. The measured field is weighted by the muon distribution, resulting in $\tilde{ω}'_p$, the denominator in the ratio $ω_a/\tilde{ω}'_p$ that, together with known fundamental constants, yields $a_μ$. The reported uncertainty on $\tilde{ω}'_p$ for the Run-1 data set is 114 ppb, consisting of uncertainty contributions from frequency extraction, calibration, mapping, tracking, and averaging of 56 ppb, and contributions from fast transient fields of 99 ppb.
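The quoted 114 ppb total is consistent with adding the two groups of contributions in quadrature, which assumes they are uncorrelated. A minimal check, with an illustrative helper name:

```python
import math

def quadrature_sum(*components):
    """Combine independent uncertainties: square root of the sum of squares."""
    return math.sqrt(sum(c * c for c in components))

# Run-1 field uncertainty contributions from the abstract, in ppb
total_ppb = quadrature_sum(56, 99)
print(f"{total_ppb:.1f} ppb")
```

A linear sum (155 ppb) would overstate the total; quadrature is the appropriate rule only when the contributions are independent.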
Submitted 17 June, 2022; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Bridging BAD Islands: Declarative Data Sharing at Scale
Authors:
Xikui Wang,
Michael J. Carey,
Vassilis J. Tsotras
Abstract:
In many Big Data applications today, information needs to be actively shared between systems managed by different organizations. To enable sharing Big Data at scale, developers would have to create dedicated server programs and glue together multiple Big Data systems for scalability. Developing and managing such glued data sharing services requires a significant amount of work from developers. In our prior work, we developed a Big Active Data (BAD) system for enabling Big Data subscriptions and analytics with millions of subscribers. Based on that, we introduce a new mechanism for enabling the sharing of Big Data at scale declaratively so that developers can easily create and provide data sharing services using declarative statements and can benefit from an underlying scalable infrastructure. We show our implementation on top of the BAD system, explain the data sharing data flow among multiple systems, and present a prototype system with experimental results.
Submitted 5 January, 2021;
originally announced January 2021.
-
A control hardware based on a field programmable gate array for experiments in atomic physics
Authors:
A. Bertoldi,
C. -H. Feng,
H. Eneriz Imaz,
M. Carey,
D. S. Naik,
J. Junca,
X. Zou,
D. O. Sabulsky,
B. Canuel,
P. Bouyer,
M. Prevedelli
Abstract:
Experiments in Atomic, Molecular, and Optical (AMO) physics require precise and accurate control of digital, analog, and radio frequency (RF) signals. We present control hardware based on a field programmable gate array (FPGA) core that drives various modules via a simple interface bus. The system supports an operating frequency of 10 MHz and a memory depth of 8 M (2$^{23}$) instructions, both easily scalable. Successive experimental sequences can be stacked with no dead time and synchronized with external events at any instruction. Two or more units can be cascaded and synchronized to a common clock, a feature useful for operating large experimental setups in a modular way.
Submitted 18 November, 2020;
originally announced November 2020.
-
PolyFrame: A Retargetable Query-based Approach to Scaling DataFrames (Extended Version)
Authors:
Phanwadee Sinthong,
Michael J. Carey
Abstract:
In the last few years, the field of data science has been growing rapidly as various businesses have adopted statistical and machine learning techniques to empower their decision making and applications. Scaling data analysis, possibly including the application of custom machine learning models, to large volumes of data requires the utilization of distributed frameworks. This can lead to serious technical challenges for data analysts and reduce their productivity. AFrame, a Python data analytics library, is implemented as a layer on top of Apache AsterixDB, addressing these issues by incorporating the data scientists' development environment and transparently scaling out the evaluation of analytical operations through a Big Data management system. While AFrame is able to leverage data management facilities (e.g., indexes and query optimization) and allows users to interact with a very large volume of data, the initial version only generated SQL++ queries and only operated against Apache AsterixDB. In this work, we describe a new design that retargets AFrame's incremental query formation to other query-based database systems as well, making it more flexible for deployment against other data management systems with composable query languages.
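The idea of retargeting incremental query formation can be illustrated with a small sketch. Everything here is hypothetical: the `LazyFrame` class, the `DIALECTS` template table, and the two dialect names are illustrative stand-ins, not AFrame's or PolyFrame's actual API or rewrite rules. Each DataFrame-style call nests the query built so far inside a dialect-specific template, so supporting another backend only means supplying another template table.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical per-backend templates; {q} is the query built so far.
DIALECTS: Dict[str, Dict[str, str]] = {
    "sqlpp": {
        "scan": "SELECT VALUE t FROM {dataset} t",
        "filter": "SELECT VALUE t FROM ({q}) t WHERE {cond}",
        "project": "SELECT {cols} FROM ({q}) t",
        "limit": "SELECT VALUE t FROM ({q}) t LIMIT {n}",
    },
    "sql": {
        "scan": "SELECT * FROM {dataset}",
        "filter": "SELECT * FROM ({q}) t WHERE {cond}",
        "project": "SELECT {cols} FROM ({q}) t",
        "limit": "SELECT * FROM ({q}) t LIMIT {n}",
    },
}


@dataclass(frozen=True)
class LazyFrame:
    """Immutable frame that records a query string instead of executing anything."""
    dialect: str
    query: str

    @classmethod
    def scan(cls, dialect: str, dataset: str) -> "LazyFrame":
        return cls(dialect, DIALECTS[dialect]["scan"].format(dataset=dataset))

    def _wrap(self, op: str, **kw: str) -> "LazyFrame":
        # Each operation nests the previous query, so queries compose incrementally.
        template = DIALECTS[self.dialect][op]
        return LazyFrame(self.dialect, template.format(q=self.query, **kw))

    def filter(self, cond: str) -> "LazyFrame":
        return self._wrap("filter", cond=cond)

    def select(self, *cols: str) -> "LazyFrame":
        return self._wrap("project", cols=", ".join(f"t.{c}" for c in cols))

    def head(self, n: int) -> "LazyFrame":
        return self._wrap("limit", n=str(n))


# The same chain of calls, retargeted to two query languages.
q_pp = LazyFrame.scan("sqlpp", "Tweets").filter("t.lang = 'en'").select("id", "text").head(5).query
q_sql = LazyFrame.scan("sql", "Tweets").filter("t.lang = 'en'").select("id", "text").head(5).query
print(q_pp)
print(q_sql)
```

Nesting keeps every step valid on its own, which is what lets a user inspect or execute an intermediate frame; a real system would also rely on the backend optimizer to flatten the nested subqueries.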
Submitted 10 February, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems
Authors:
Christina Pavlopoulou,
Michael J. Carey,
Vassilis J. Tsotras
Abstract:
Query optimization remains an open problem for Big Data Management Systems. Traditional optimizers are cost-based and use statistical estimates of intermediate result cardinalities to assign costs and pick the best plan. However, such estimates tend to become less accurate because of filtering conditions arising from undetected correlations between multiple predicates local to a single dataset, from predicates with query parameters, or from predicates involving user-defined functions (UDFs). Consequently, traditional query optimizers tend to ignore or miscalculate these settings, leading to suboptimal execution plans. Given the volume of today's data, a suboptimal plan can quickly become very inefficient.
In this work, we revisit the old idea of runtime dynamic optimization and adapt it to a shared-nothing distributed database system, AsterixDB. The optimization runs in stages (re-optimization points), starting by first executing all predicates local to a single dataset. The intermediate result created from each stage is used to re-optimize the remaining query. This re-optimization approach avoids inaccurate intermediate result cardinality estimations, thus leading to much better execution plans. While it introduces the overhead of materializing these intermediate results, our experiments show that this overhead is relatively small and is an acceptable price to pay given the optimization benefits. In fact, our experimental evaluation shows that runtime dynamic optimization leads to much better execution plans as compared to the current default AsterixDB plans as well as to plans produced by static cost-based optimization (i.e., based on the initial dataset statistics) and other state-of-the-art approaches.
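The staged idea can be sketched in a few lines of single-process Python. This is an illustration under stated assumptions, not AsterixDB's implementation: datasets are in-memory lists of dicts with globally unique column names, every join is an equi-join executed as an in-memory hash join, and the next join is chosen greedily with a simple |L| × |R| cost computed from the actual sizes of the materialized inputs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Join = Tuple[str, str, str, str]  # (left dataset, left column, right dataset, right column)


def hash_join(left: List[Row], right: List[Row], lcol: str, rcol: str) -> List[Row]:
    """In-memory equi-join: build a hash table on `right`, probe it with `left`."""
    table: Dict[object, List[Row]] = {}
    for r in right:
        table.setdefault(r[rcol], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[lcol], [])]


def staged_execute(datasets: Dict[str, List[Row]],
                   local_preds: Dict[str, Callable[[Row], bool]],
                   joins: List[Join],
                   log: Optional[list] = None) -> List[Row]:
    # Stage 1: evaluate predicates local to each dataset and materialize the results.
    results = {n: [r for r in rows if local_preds.get(n, lambda _r: True)(r)]
               for n, rows in datasets.items()}
    covers = {n: {n} for n in results}  # base datasets contained in each intermediate
    pending = list(joins)

    def owner(ds: str) -> str:
        return next(k for k, members in covers.items() if ds in members)

    while pending:
        # Re-optimization point: cost the remaining joins with the ACTUAL sizes of
        # the materialized inputs (simple |L| * |R| cost model, assumed here).
        nxt = min(pending, key=lambda j: len(results[owner(j[0])]) * len(results[owner(j[2])]))
        pending.remove(nxt)
        a, acol, b, bcol = nxt
        oa, ob = owner(a), owner(b)
        if oa == ob:  # both sides already joined: the predicate becomes a filter
            results[oa] = [r for r in results[oa] if r[acol] == r[bcol]]
            continue
        if log is not None:
            log.append((a, b, len(results[oa]), len(results[ob])))
        results[oa] = hash_join(results[oa], results[ob], acol, bcol)  # materialize
        covers[oa] |= covers.pop(ob)
        del results[ob]
    (final,) = results.values()  # assumes a connected join graph
    return final


customers = [{"c_id": 1, "c_region": "EU"}, {"c_id": 2, "c_region": "US"}, {"c_id": 3, "c_region": "US"}]
orders = [{"o_id": i, "o_cust": i % 3 + 1} for i in range(6)]
items = [{"i_order": i % 6, "i_price": 10 * i} for i in range(12)]
log: list = []
final = staged_execute(
    {"customers": customers, "orders": orders, "items": items},
    {"customers": lambda r: r["c_region"] == "EU"},
    [("items", "i_order", "orders", "o_id"), ("orders", "o_cust", "customers", "c_id")],
    log,
)
print(log)
print(sorted(r["i_price"] for r in final))
```

Although the items-orders join is listed first, the selective filter on `customers` is observed after stage 1, so the sketch joins orders with the single surviving customer first and only then touches `items`.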
Submitted 5 October, 2020; v1 submitted 1 October, 2020;
originally announced October 2020.
-
Subscribing to Big Data at Scale
Authors:
Xikui Wang,
Michael J. Carey,
Vassilis J. Tsotras
Abstract:
Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, users need either to heavily customize an existing passive Big Data system or to glue multiple systems together. Either choice would require significant effort from users and incur additional overhead. In this paper, we present the BAD (Big Active Data) system, which is designed to preserve the merits of passive Big Data systems and introduce new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system's performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a "glued" system.
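The active half of such a system, where a new item is matched against stored subscriptions and the results are enriched with stored data before delivery, can be sketched as a toy in-memory channel. The class name `Channel`, the "area" parameter, and the shelter data are hypothetical and do not reflect the BAD system's actual API or DDL. The one design point the sketch illustrates is grouping subscribers by parameter value so that matching and enrichment are shared rather than repeated per user.

```python
from collections import defaultdict
from typing import Dict, List


class Channel:
    """Hypothetical parameterized channel shared by all of its subscribers."""

    def __init__(self, shelters_by_area: Dict[str, List[str]]):
        self.shelters_by_area = shelters_by_area  # stored (passive) data used for enrichment
        self.subscribers: Dict[str, List[str]] = defaultdict(list)  # parameter -> users

    def subscribe(self, user: str, area: str) -> None:
        self.subscribers[area].append(user)

    def on_new_batch(self, reports: List[Dict[str, str]]) -> Dict[str, List[dict]]:
        """Match newly arrived items against subscriptions and enrich the results."""
        out: Dict[str, List[dict]] = defaultdict(list)
        for report in reports:
            users = self.subscribers.get(report["area"], [])
            if not users:
                continue
            # Enrichment runs once per matching report, not once per subscriber;
            # the shared result is then fanned out to every subscriber.
            enriched = {**report, "shelters": self.shelters_by_area.get(report["area"], [])}
            for user in users:
                out[user].append(enriched)
        return dict(out)


channel = Channel({"Irvine": ["Shelter A"]})
channel.subscribe("alice", "Irvine")
channel.subscribe("bob", "Irvine")
channel.subscribe("carol", "Riverside")
notifications = channel.on_new_batch([{"area": "Irvine", "kind": "fire"},
                                      {"area": "Los Angeles", "kind": "flood"}])
print(notifications)
```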
Submitted 9 September, 2020;
originally announced September 2020.
-
Ultrathin perpendicular free layers for lowering the switching current in STT-MRAM
Authors:
Tiffany S. Santos,
Goran Mihajlovic,
Neil Smith,
J. -L. Li,
Matthew Carey,
Jordan A. Katine,
Bruce D. Terris
Abstract:
The critical current density $J_{c0}$ required for switching the magnetization of the free layer (FL) in a spin-transfer torque magnetic random access memory (STT-MRAM) cell is proportional to the product of the damping parameter, saturation magnetization and thickness of the free layer, $αM_S t_F$. Conventional FLs have the structure CoFeB/nonmagnetic spacer/CoFeB. By reducing the spacer thickness, W in our case, and also splitting the single W layer into two layers of sub-monolayer thickness, we have reduced $t_F$ while minimizing $α$ and maximizing $M_S$, ultimately leading to lower $J_{c0}$ while maintaining high thermal stability. Bottom-pinned MRAM cells with device diameter in the range of 55-130 nm were fabricated, and $J_{c0}$ is lowest for the thinnest (1.2 nm) FLs, down to 4 MA/cm$^2$ for 65 nm devices, $\sim$30% lower than for 1.7 nm FLs. The thermal stability factor $Δ_{\mathrm{dw}}$, as high as 150 for the smallest device size, was determined using a domain wall reversal model from field switching probability measurements. With high $Δ_{\mathrm{dw}}$ and lowest $J_{c0}$, the thinnest FLs have the highest spin-transfer torque efficiency.
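Because $J_{c0} \propto αM_S t_F$, the thickness reduction alone sets a baseline expectation. The arithmetic below assumes, purely for illustration, that $α$ and $M_S$ are unchanged between the 1.7 nm and 1.2 nm free layers (the abstract says they were instead optimized). Under that assumption the predicted reduction is about 29%, roughly in line with the ~30% reported.

```python
def jc0_ratio(t_new_nm: float, t_old_nm: float,
              alpha_ratio: float = 1.0, ms_ratio: float = 1.0) -> float:
    """J_c0(new) / J_c0(old) from the scaling J_c0 ∝ alpha * M_S * t_F."""
    return alpha_ratio * ms_ratio * (t_new_nm / t_old_nm)


r = jc0_ratio(1.2, 1.7)  # thinnest vs. 1.7 nm free layer, alpha and M_S held fixed
print(f"J_c0 ratio = {r:.3f}, i.e. {(1 - r) * 100:.0f}% lower")
```

Passing measured `alpha_ratio` and `ms_ratio` values would show how much of the improvement comes from the material parameters rather than from thickness.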
Submitted 4 August, 2020;
originally announced August 2020.
-
Bi-selective pulses for large-area atom interferometry
Authors:
Jack Saywell,
Max Carey,
Ilya Kuprov,
Tim Freegarde
Abstract:
We present designs for the augmentation 'mirror' pulses of large-momentum-transfer atom interferometers that maintain their fidelity as the wavepacket momentum difference is increased. These bi-selective pulses, tailored using optimal control methods to the evolving bi-modal momentum distribution, should allow greater interferometer areas and hence increased inertial measurement sensitivity, without requiring elevated Rabi frequencies or extended frequency chirps. Using an experimentally validated model, we have simulated the application of our pulse designs to large-momentum-transfer atom interferometry using stimulated Raman transitions in a laser-cooled atomic sample of $^{85}$Rb at 1 $μ$K. After the wavepackets have separated by 42 photon recoil momenta, our pulses maintain a fringe contrast of 90% whereas, for adiabatic rapid passage and conventional $π$ pulses, the contrast is less than 10%. Furthermore, we show how these pulses may be adapted to suppress the detrimental off-resonant excitation that limits other broadband pulse schemes.
Submitted 25 April, 2020;
originally announced April 2020.
-
Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems (Extended Version)
Authors:
Chen Luo,
Michael J. Carey
Abstract:
Log-Structured Merge-trees (LSM-trees) have been widely used in modern NoSQL systems. Due to their out-of-place update design, LSM-trees have introduced memory walls among the memory components of multiple LSM-trees and between the write memory and the buffer cache. Optimal memory allocation among these regions is non-trivial because it is highly workload-dependent. Existing LSM-tree implementations instead adopt static memory allocation schemes due to their simplicity and robustness, sacrificing performance. In this paper, we attempt to break down these memory walls in LSM-based storage systems. We first present a memory management architecture that enables adaptive memory management. We then present a partitioned memory component structure with new flush policies to better exploit the write memory to minimize the write cost. To break down the memory wall between the write memory and the buffer cache, we further introduce a memory tuner that tunes the memory allocation between these two regions. We have conducted extensive experiments in the context of Apache AsterixDB using the YCSB and TPC-C benchmarks and we present the results here.
Submitted 14 July, 2020; v1 submitted 21 April, 2020;
originally announced April 2020.
-
BAD to the Bone: Big Active Data at its Core
Authors:
Steven Jacobs,
Xikui Wang,
Michael J. Carey,
Vassilis J. Tsotras,
Md Yusuf Sarwar Uddin
Abstract:
Virtually all of today's Big Data systems are passive in nature, responding to queries posed by their users. Instead, we are working to shift Big Data platforms from passive to active. In our view, a Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information. While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data. To this end, we have created a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., Publish/Subscribe, Streaming Engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching the subscription results with other useful data. Our platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an active toolkit. The toolkit contains features to rapidly ingest semistructured data, share execution pipelines among users, manage scaled user data subscriptions, and actively monitor the state of the data to produce individualized information for each user. This paper describes the features and design of our current BAD data platform and demonstrates its ability to scale without sacrificing query capabilities or result individualization.
Submitted 23 May, 2020; v1 submitted 22 February, 2020;
originally announced February 2020.
-
Fast Stable Parameter Estimation for Linear Dynamical Systems
Authors:
Michelle Carey,
James O. Ramsay
Abstract:
Dynamical systems describe the changes in processes that arise naturally from their underlying physical principles, such as the laws of motion or the conservation of mass, energy or momentum. These models facilitate a causal explanation for the drivers and impediments of the processes. But do they describe the behaviour of the observed data? And how can we quantify the models' parameters that cannot be measured directly? This paper addresses these two questions by providing a methodology for estimating the solution and the parameters of linear dynamical systems from incomplete and noisy observations of the processes.
The proposed procedure builds on the parameter cascading approach, where a linear combination of basis functions approximates the implicitly defined solution of the dynamical system. The system's parameters are then estimated so that this approximating solution adheres to the data. By taking advantage of the linearity of the system, we have simplified the parameter cascading estimation procedure, and by developing a new iterative scheme, we achieve fast and stable computation.
We illustrate our approach by obtaining a linear differential equation that represents real data from biomechanics. Comparing our approach with popular methods for estimating the parameters of linear dynamical systems, namely, the non-linear least-squares approach, simulated annealing, parameter cascading and smooth functional tempering reveals a considerable reduction in computation and an improved bias and sampling variance.
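To make the two-level structure of parameter cascading concrete, here is a minimal pure-Python sketch for the scalar linear ODE $x' = -θx$. The monomial basis up to degree 5, the penalty weight `lam = 100`, the midpoint-rule quadrature, and the golden-section outer search are all simplifying assumptions. This is the generic parameter-cascading idea, not the authors' new iterative scheme. Linearity is what makes the inner step a single linear solve.

```python
import math

K = 6  # monomial basis 1, t, ..., t^5 on [0, 1] (a simplifying assumption)


def phi(t):
    return [t ** k for k in range(K)]


def dphi(t):
    return [k * t ** (k - 1) if k > 0 else 0.0 for k in range(K)]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def inner_fit(ts, ys, theta, lam, quad):
    """Inner level: coefficients c(theta) minimizing
    sum_i (y_i - x(t_i))^2 + lam * mean_q (x'(s_q) + theta * x(s_q))^2,
    a linear least-squares problem because the ODE is linear in x."""
    A = [[0.0] * K for _ in range(K)]
    rhs = [0.0] * K
    for t, y in zip(ts, ys):
        p = phi(t)
        for i in range(K):
            rhs[i] += p[i] * y
            for j in range(K):
                A[i][j] += p[i] * p[j]
    w = lam / len(quad)
    for s in quad:
        lrow = [d + theta * p for d, p in zip(dphi(s), phi(s))]  # ODE residual operator
        for i in range(K):
            for j in range(K):
                A[i][j] += w * lrow[i] * lrow[j]
    return solve(A, rhs)


def outer_loss(ts, ys, theta, lam, quad):
    """Outer level: data misfit of the smooth fit implied by theta."""
    c = inner_fit(ts, ys, theta, lam, quad)
    return sum((y - sum(ci * pi for ci, pi in zip(c, phi(t)))) ** 2 for t, y in zip(ts, ys))


def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal scalar function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


# Synthetic, noise-free data from x(t) = exp(-2 t), so the true theta is 2.
ts = [i / 20 for i in range(21)]
ys = [math.exp(-2 * t) for t in ts]
quad = [(q + 0.5) / 50 for q in range(50)]  # midpoint rule for the ODE penalty
theta_hat = golden_section(lambda th: outer_loss(ts, ys, th, 100.0, quad), 0.0, 5.0)
print(f"theta_hat = {theta_hat:.3f}")
```

The outer criterion measures only the data fit; the ODE enters through the inner penalty, which is what distinguishes parameter cascading from fitting a closed-form solution directly.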
Submitted 5 February, 2020;
originally announced February 2020.
-
Search for muon catalyzed $d^3He$ fusion
Authors:
V. D. Fotev,
V. A. Ganzha,
K. A. Ivshin,
P. V. Kravchenko,
P. A. Kravtsov,
E. M. Maev,
A. V. Nadtochy,
A. N. Solovev,
I. N. Solovyev,
A. A. Vasilyev,
A. A. Vorobyov,
N. I. Voropaev,
M. E. Vznuzdaev,
P. Kammel,
E. T. Muldoon,
R. A. Ryan,
D. J. Salvat,
D. Prindle,
M. Hildebrandt,
B. Lauss,
C. Petitjean,
T. Gorringe,
R. M. Carey,
F. E. Gray
Abstract:
This report presents the results of an experiment aimed at observing the muon catalyzed ${}^3\mathrm{He}\,d$ fusion reaction ${}^3\mathrm{He} + μd \to {}^3\mathrm{He}μd \to {}^4\mathrm{He}\,(3.66~\mathrm{MeV}) + p\,(14.64~\mathrm{MeV}) + μ$, which might occur after a negative muon stops in a $\mathrm{D}_2 + {}^3\mathrm{He}$ gas mixture. The basic element of the experimental setup is a Time Projection Chamber (TPC) that can detect both the incoming muons and the products of the fusion reaction. The TPC operated with a $\mathrm{D}_2 + {}^3\mathrm{He}$ (5%) gas mixture at a temperature of 31 K. About $10^8$ ${}^3\mathrm{He}μd$ molecules were produced, yielding only 2 registered candidates for muon catalyzed ${}^3\mathrm{He}\,d$ fusion against an expected background of $N_{bg}=2.2\pm 0.3$ events. This gives an upper limit on the probability of fusion decay of the ${}^3\mathrm{He}μd$ molecule of $P_{F}({}^3\mathrm{He}μd)\leq 1.1\times 10^{-7}$ at 90% C.L. Also presented are the measured formation rate of the ${}^3\mathrm{He}μd$ molecule, $λ_{d{}^3\mathrm{He}}=192(3)\times 10^6~\mathrm{s}^{-1}$, and the probability of fast muon transfer from the excited to the ground state of the $μd$ atom, $q_{1S}=0.80(3)$.
Submitted 17 June, 2021; v1 submitted 27 January, 2020;
originally announced January 2020.
-
Optimized Raman pulses for atom interferometry
Authors:
Jack Saywell,
Max Carey,
Mohammad Belal,
Ilya Kuprov,
Tim Freegarde
Abstract:
We present mirror and beamsplitter pulse designs that improve the fidelity of atom interferometry and increase its tolerance of systematic inhomogeneities. These designs are demonstrated experimentally with a cold thermal sample of $^{85}$Rb atoms. We first show a stimulated Raman inversion pulse design that achieves a ground hyperfine state transfer efficiency of 99.8(3)%, compared with a conventional $π$ pulse efficiency of 75(3)%. This inversion pulse is robust to variations in laser intensity and detuning, maintaining a transfer efficiency of 90% at detunings for which the $π$ pulse fidelity is below 20%, and is thus suitable for large momentum transfer interferometers using thermal atoms or operating in non-ideal environments. We then extend our optimization to all components of a Mach-Zehnder atom interferometer sequence and show that with a highly inhomogeneous atomic sample the fringe visibility is increased threefold over that using conventional $π$ and $π/2$ pulses.
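The detuning sensitivity of a conventional $π$ pulse, which the optimized designs aim to overcome, follows from the textbook Rabi formula for an ideal two-level system: $P = (Ω^2/Ω_{\mathrm{eff}}^2)\sin^2(Ω_{\mathrm{eff}}τ/2)$ with $Ω_{\mathrm{eff}} = \sqrt{Ω^2+Δ^2}$ and $τ = π/Ω$. The sketch below treats the Raman transition as an effective two-level system with no light shifts, spontaneous emission, or intensity inhomogeneity (all assumptions); it is not the authors' simulation model.

```python
import math


def pi_pulse_transfer(detuning_over_rabi: float) -> float:
    """Population transferred by a square pulse of duration pi/Omega in an ideal
    two-level system, as a function of Delta/Omega."""
    x2 = detuning_over_rabi ** 2
    omega_eff = math.sqrt(1.0 + x2)  # generalized Rabi frequency in units of Omega
    return math.sin(math.pi * omega_eff / 2) ** 2 / (1.0 + x2)


for x in (0.0, 0.5, 1.0, 2.0):
    print(f"Delta/Omega = {x:.1f}: transfer = {pi_pulse_transfer(x):.3f}")
```

Transfer is complete only on resonance and drops to roughly a third once the detuning equals the Rabi frequency, which is why averaging over a thermal velocity distribution degrades a simple $π$ pulse.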
Submitted 10 December, 2019; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Loading and Cooling in an Optical Trap via Hyperfine Dark States
Authors:
D. S. Naik,
H. Eneriz-Imaz,
M. Carey,
T. Freegarde,
F. Minardi,
B. Battelier,
P. Bouyer,
A. Bertoldi
Abstract:
We present a novel optical cooling scheme that relies on hyperfine dark states to enhance the loading and cooling of atoms inside deep optical dipole traps. We demonstrate a seven-fold increase in the number of atoms loaded in the conservative potential with strongly shifted excited states. In addition, we use the energy-selective dark state to cool the atoms trapped inside the conservative potential efficiently, rapidly, and without losses. Our findings open the door to optically assisted cooling of trapped atoms and molecules that lack the closed cycling transitions normally needed to achieve low temperatures and the high initial densities required for evaporative cooling.
Submitted 25 November, 2019; v1 submitted 28 October, 2019;
originally announced October 2019.
-
An LSM-based Tuple Compaction Framework for Apache AsterixDB (Extended Version)
Authors:
Wail Y. Alkowaileet,
Sattam Alsubaiee,
Michael J. Carey
Abstract:
Document database systems store self-describing semi-structured records, such as JSON, "as-is" without requiring the users to pre-define a schema. This provides users with the flexibility to change the structure of incoming records without worrying about taking the system offline or hindering the performance of currently running queries. However, the flexibility of such systems does not come for free. The large amount of redundancy in the records can introduce unnecessary storage overhead and impact query performance.
Our focus in this paper is to address the storage overhead issue by introducing a tuple compactor framework that infers and extracts the schema from self-describing semi-structured records during the data ingestion. As many prominent document stores, such as MongoDB and Couchbase, adopt Log Structured Merge (LSM) trees in their storage engines, our framework exploits LSM lifecycle events to piggyback the schema inference and extraction operations. We have implemented and empirically evaluated our approach to measure its impact on storage, data ingestion, and query performance in the context of Apache AsterixDB.
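The core idea of piggybacking schema inference on a flush can be sketched in a few lines. The function names `flush`, `infer_schema`, and `expand` are hypothetical, and the sketch handles flat records only; the actual framework handles nested and heterogeneous values and hooks into AsterixDB's LSM lifecycle. Field names are stored once per flushed component instead of once per record, and a sentinel distinguishes an absent field from an explicit JSON null.

```python
from typing import Any, Dict, List, Tuple

MISSING = object()  # distinguishes an absent field from an explicit JSON null (None)


def infer_schema(records: List[Dict[str, Any]]) -> List[str]:
    """Union of top-level field names in first-seen order (flat records only)."""
    fields: Dict[str, None] = {}
    for rec in records:
        for name in rec:
            fields.setdefault(name, None)
    return list(fields)


def flush(memtable: List[Dict[str, Any]]) -> Tuple[List[str], List[tuple]]:
    """Hypothetical flush hook: infer the schema once per flushed component and
    store each record as a tuple of values, without repeating field names."""
    schema = infer_schema(memtable)
    rows = [tuple(rec.get(f, MISSING) for f in schema) for rec in memtable]
    return schema, rows


def expand(row: tuple, schema: List[str]) -> Dict[str, Any]:
    """Rebuild the self-describing record at read time."""
    return {f: v for f, v in zip(schema, row) if v is not MISSING}


memtable = [{"id": 1, "name": "a"}, {"id": 2, "name": "b", "tags": ["x"]}]
schema, rows = flush(memtable)
print(schema)
print([expand(r, schema) for r in rows] == memtable)
```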
Submitted 11 May, 2020; v1 submitted 17 October, 2019;
originally announced October 2019.
-
AFrame: Extending DataFrames for Large-Scale Modern Data Analysis (Extended Version)
Authors:
Phanwadee Sinthong,
Michael J. Carey
Abstract:
Analyzing the increasingly large volumes of data that are available today, possibly including the application of custom machine learning models, requires the utilization of distributed frameworks. This can result in serious productivity issues for "normal" data scientists. This paper introduces AFrame, a new scalable data analysis package powered by a Big Data management system that extends the data scientists' familiar DataFrame operations to efficiently operate on managed data at scale. AFrame is implemented as a layer on top of Apache AsterixDB, transparently scaling out the execution of DataFrame operations and machine learning model invocation through a parallel, shared-nothing big data management system. AFrame incrementally constructs SQL++ queries and leverages AsterixDB's semistructured data management facilities, user-defined function support, and live data ingestion support. In order to evaluate the proposed approach, this paper also introduces an extensible micro-benchmark for use in evaluating DataFrame performance in both single-node and distributed settings via a collection of representative analytic operations. This paper presents the architecture of AFrame, describes the underlying capabilities of AsterixDB that efficiently support modern data analytic operations, and utilizes the proposed benchmark to evaluate and compare the performance and support for large-scale data analyses provided by alternative DataFrame libraries.
Submitted 19 August, 2019;
originally announced August 2019.
-
On Performance Stability in LSM-based Storage Systems (Extended Version)
Authors:
Chen Luo,
Michael J. Carey
Abstract:
The Log-Structured Merge-Tree (LSM-tree) has been widely adopted for use in modern NoSQL systems for its superior write performance. Despite the popularity of LSM-trees, they have been criticized for suffering from write stalls and large performance variances due to the inherent mismatch between their fast in-memory writes and slow background I/O operations. In this paper, we use a simple yet effective two-phase experimental approach to evaluate write stalls for various LSM-tree designs. We further explore the design choices of LSM merge schedulers to minimize write stalls given an I/O bandwidth budget. We have conducted extensive experiments in the context of the Apache AsterixDB system and we present the results here.
Submitted 11 April, 2020; v1 submitted 23 June, 2019;
originally announced June 2019.
-
Origin of the resistance-area product dependence of spin transfer torque switching in perpendicular magnetic random access memory cells
Authors:
Goran Mihajlovic,
Neil Smith,
Tiffany Santos,
Jui-Lung Li,
Michael Tran,
Matthew Carey,
Bruce D. Terris,
Jordan A. Katine
Abstract:
We report on an experimental study of current induced switching in perpendicular magnetic random access memory (MRAM) cells with variable resistance-area products (RAs). Our results show that in addition to spin transfer torque (STT), current induced self-heating and voltage controlled magnetic anisotropy also contribute to switching and can explain the RA dependencies of switching current density and STT efficiency. Our findings suggest that thermal optimization of perpendicular MRAM cells can result in significant reduction of switching currents.
Submitted 7 May, 2019;
originally announced May 2019.
-
An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB
Authors:
Xikui Wang,
Michael J. Carey
Abstract:
Big Data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other reference information to support complex analytical queries. Depending on the use case, the enrichment operations can be compiled code, declarative queries, or machine learning models with different complexities. For enrichments that will be frequently used in the future, it can be advantageous to push their computation into the ingestion pipeline so that they can be stored (and queried) together with the data. In some cases, the referenced information may change over time, so the ingestion pipeline should be able to adapt to such changes to guarantee the currency and/or correctness of the enrichment results.
In this paper, we present a new data ingestion framework that supports data ingestion at scale, enrichments requiring complex operations, and adaptiveness to reference data changes. We explain how this framework has been built on top of Apache AsterixDB and investigate its performance at scale under various workloads.
Submitted 15 August, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
LSM-based Storage Techniques: A Survey
Authors:
Chen Luo,
Michael J. Carey
Abstract:
Recently, the Log-Structured Merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this, there have been a large number of research efforts, from both the database community and the operating systems community, that try to improve various aspects of LSM-trees. In this paper, we provide a survey of recent research efforts on LSM-trees so that readers can learn the state-of-the-art in LSM-based storage techniques. We provide a general taxonomy to classify the literature of LSM-trees, survey the efforts in detail, and discuss their strengths and trade-offs. We further survey several representative LSM-based open-source NoSQL systems and discuss some potential future research directions resulting from the survey.
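For readers new to the structure, a textbook LSM-tree reduced to its essentials looks like the sketch below: an in-memory component (memtable), flushes to immutable sorted runs, newest-first lookups, tombstones for deletes, and a full merge. The class `TinyLSM` is hypothetical and omits everything real systems add (write-ahead logging, Bloom filters, leveling or tiering policies, disk I/O); it is not any surveyed system's implementation.

```python
import bisect
from typing import Dict, List, Optional, Tuple

_TOMBSTONE = object()  # marks a deleted key until a merge removes it


class TinyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable: Dict[str, object] = {}
        self.runs: List[List[Tuple[str, object]]] = []  # sorted, immutable; newest first
        self.limit = memtable_limit

    def put(self, key: str, value: object) -> None:
        self.memtable[key] = value  # out-of-place update: older versions stay in older runs
        if len(self.memtable) >= self.limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, _TOMBSTONE)

    def flush(self) -> None:
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[object]:
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is _TOMBSTONE else v
        for run in self.runs:  # the newest run containing the key wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                v = run[i][1]
                return None if v is _TOMBSTONE else v
        return None

    def compact(self) -> None:
        """Full merge of all runs into one, dropping superseded versions and tombstones."""
        merged: Dict[str, object] = {}
        for run in reversed(self.runs):  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted((k, v) for k, v in merged.items() if v is not _TOMBSTONE)]


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable full: flushed as the first run
db.put("a", 3)
db.delete("b")   # second flush; the tombstone shadows the old "b"
print(db.get("a"), db.get("b"))  # 3 None
db.compact()
print(db.runs)   # [[('a', 3)]]
```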
Submitted 19 July, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems
Authors:
Chen Luo,
Michael J. Carey
Abstract:
In recent years, the Log Structured Merge (LSM) tree has been widely adopted by NoSQL and NewSQL systems for its superior write performance. Despite its popularity, however, most existing work has focused on LSM-based key-value stores with only a primary LSM-tree index; auxiliary structures, which are critical for supporting ad-hoc queries, have received much less attention. In this paper, we focus on efficient data ingestion and query processing for general-purpose LSM-based storage systems. We first propose and evaluate a series of optimizations for efficient batched point lookups, significantly improving the range of applicability of LSM-based secondary indexes. We then present several new and efficient maintenance strategies for LSM-based storage systems. Finally, we have implemented and experimentally evaluated the proposed techniques in the context of the Apache AsterixDB system, and we present the results here.
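One common way to batch point lookups, sketched below under stated assumptions, is to sort the batch of keys (for example, primary keys returned by a secondary-index search) once and then sweep each LSM component in key order, so probes move forward rather than jumping randomly. This is a generic illustration, not necessarily the exact optimization evaluated in AsterixDB; components are assumed to be sorted in-memory arrays, and `batched_lookup` is a hypothetical name.

```python
import bisect

def batched_lookup(keys, runs):
    """Look up many keys against LSM components, given as a list of
    (sorted_keys, values) pairs ordered newest first.
    Keys are sorted once so each component is traversed in one forward pass."""
    pending = sorted(set(keys))
    found = {}
    for run_keys, run_values in runs:       # newest component first
        if not pending:
            break
        still_pending = []
        pos = 0
        for key in pending:
            # Keys are sorted, so the search position never moves backwards.
            pos = bisect.bisect_left(run_keys, key, pos)
            if pos < len(run_keys) and run_keys[pos] == key:
                found[key] = run_values[pos]   # newest version wins
            else:
                still_pending.append(key)
        pending = still_pending             # only unresolved keys reach older components
    return found
```

Keys resolved in a newer component are never probed in older ones, which mirrors how an LSM lookup stops at the first (newest) match.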
Submitted 7 January, 2019; v1 submitted 27 August, 2018;
originally announced August 2018.
-
Optimal control of mirror pulses for atom interferometry
Authors:
Jack Saywell,
Ilya Kuprov,
David Goodwin,
Max Carey,
Tim Freegarde
Abstract:
Atom matterwave interferometry requires mirror and beamsplitter pulses that are robust to inhomogeneities in field intensity, magnetic environment, atom velocity and Zeeman sub-state. Pulse shapes determined using quantum control methods offer significantly improved interferometer performance by allowing broader atom distributions, larger interferometer areas and higher contrast. We have applied gradient ascent pulse engineering (GRAPE) to optimise the design of phase-modulated mirror pulses for a Mach-Zehnder light-pulse atom interferometer, with the aim of increasing fringe contrast when averaged over atoms with an experimentally relevant range of velocities, beam intensities, and Zeeman states. Pulses were found to be highly robust to variations in detuning and coupling strength, and offer a clear improvement in robustness over the best established composite pulses. The peak mirror fidelity in a cloud of $\sim 80\ μ$K ${}^{85}$Rb atoms is predicted to be improved by a factor of 2 compared with standard rectangular $π$ pulses.
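As background for the comparison with standard rectangular π pulses, the textbook two-level Rabi formula shows why a fixed rectangular pulse loses transfer efficiency away from resonance, for example when atoms in a thermal cloud see different Doppler detunings k_eff·v. This minimal sketch assumes an ideal two-level system and computes population transfer only (a mirror pulse's phase error also matters); it does not implement GRAPE, and the function names are hypothetical.

```python
import math

def rabi_transfer(omega, delta, t):
    """Population transferred by a rectangular pulse (two-level Rabi formula).
    omega: Rabi frequency (rad/s, nonzero), delta: detuning (rad/s), t: duration (s)."""
    omega_eff = math.hypot(omega, delta)          # generalised Rabi frequency
    return (omega / omega_eff) ** 2 * math.sin(omega_eff * t / 2) ** 2

def mean_transfer(omega, t, detunings):
    """Average transfer over an ensemble of detunings (e.g. Doppler shifts k_eff * v)."""
    return sum(rabi_transfer(omega, d, t) for d in detunings) / len(detunings)

omega = 2 * math.pi * 50e3        # assumed 50 kHz Rabi frequency, for illustration
t_pi = math.pi / omega            # rectangular pi pulse: omega * t = pi
```

On resonance the π pulse transfers all of the population, but at a detuning of half the Rabi frequency the transfer drops to roughly 77%; averaging over a spread of detunings lowers the mean further, which is the loss that shaped pulses aim to recover.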
Submitted 19 April, 2018; v1 submitted 12 April, 2018;
originally announced April 2018.
-
Velocimetry of cold atoms by matterwave interferometry
Authors:
Max Carey,
Jack Saywell,
David Elcock,
Mohammad Belal,
Tim Freegarde
Abstract:
We present an elegant application of matterwave interferometry to the velocimetry of cold atoms whereby, in analogy to Fourier transform spectroscopy, the 1-D velocity distribution is manifest in the frequency domain of the interferometer output. By using stimulated Raman transitions between hyperfine ground states to perform a three-pulse interferometer sequence, we have measured the velocity distributions of clouds of freely-expanding $^{85}$Rb atoms with temperatures of 33 $μ$K and 17 $μ$K. Quadrature measurement of the interferometer output as a function of the temporal asymmetry yields velocity distributions with excellent fidelity. Our technique, which is particularly suited to ultracold samples, compares favourably with conventional Doppler and time-of-flight techniques, and reveals artefacts in standard Raman Doppler methods. The technique is related to, and provides a conceptual foundation of, interferometric matterwave accelerometry, gravimetry and rotation sensing.
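The Fourier-transform analogy can be sketched numerically. In an idealised model (an assumption here: perfect pulses, with a velocity-dependent phase k_eff·v·τ for temporal asymmetry τ), the complex quadrature output is the characteristic function of the 1-D velocity distribution, and an inverse Fourier sum over the measured asymmetries recovers the distribution, including its sign. Dimensionless units (k_eff = 1, σ_v = 1) and all function names are hypothetical choices for illustration; the residual Doppler sensitivity of real pulses is ignored.

```python
import cmath
import math

def interferometer_signal(velocities, weights, k_eff, delays):
    """Idealised complex (quadrature) output for each asymmetry tau:
    S(tau) = sum_v f(v) * exp(i * k_eff * v * tau), the characteristic
    function of the 1-D velocity distribution (weights are probabilities)."""
    return [sum(w * cmath.exp(1j * k_eff * v * tau) for v, w in zip(velocities, weights))
            for tau in delays]

def recover_distribution(signal, delays, k_eff, velocities):
    """Invert with a discrete Fourier sum over the sampled delays:
    f(v) ~ (k_eff / 2 pi) * sum_tau S(tau) * exp(-i * k_eff * v * tau) * dtau.
    Returns a probability density evaluated at each requested velocity."""
    dtau = delays[1] - delays[0]
    scale = k_eff * dtau / (2 * math.pi)
    return [scale * sum(s * cmath.exp(-1j * k_eff * v * tau)
                        for s, tau in zip(signal, delays)).real
            for v in velocities]

def gaussian_weights(velocities, mean, sigma):
    """Discretised, normalised Gaussian velocity distribution."""
    raw = [math.exp(-0.5 * ((v - mean) / sigma) ** 2) for v in velocities]
    total = sum(raw)
    return [r / total for r in raw]

# Dimensionless example: a cloud drifting at +0.5 sigma_v.
velocities = [(i - 50) * 0.1 for i in range(101)]   # -5 ... +5
delays = [(j - 50) * 0.2 for j in range(101)]       # -10 ... +10
weights = gaussian_weights(velocities, mean=0.5, sigma=1.0)
signal = interferometer_signal(velocities, weights, 1.0, delays)
density = recover_distribution(signal, delays, 1.0, velocities)
recovered_mean = sum(f * v * 0.1 for f, v in zip(density, velocities))
```

The asymmetry step must be fine enough that the velocity range does not alias (here 2π/(k_eff·Δτ) ≈ 31 exceeds the ±5 grid), and the largest asymmetry must be long enough for the signal to decay, which is why colder, slower clouds need longer asymmetries.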
Submitted 20 February, 2019; v1 submitted 6 February, 2018;
originally announced February 2018.
-
Matterwave interferometric velocimetry of cold Rb atoms
Authors:
Max Carey,
Mohammad Belal,
Matthew Himsworth,
James Bateman,
Tim Freegarde
Abstract:
We consider the matterwave interferometric measurement of atomic velocities, which forms a building block for all matterwave inertial measurements. A theoretical analysis, addressing both the laboratory and atomic frames and accounting for residual Doppler sensitivity in the beamsplitter and recombiner pulses, is followed by an experimental demonstration, with measurements of the velocity distribution within a 20 $μ$K cloud of rubidium atoms. Our experiments use Raman transitions between the long-lived ground hyperfine states, and allow quadrature measurements that yield the full complex interferometer signal and hence discriminate between positive and negative velocities. The technique is most suitable for measurement of colder samples.
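Why quadrature measurements discriminate positive from negative velocities can be seen in a simplified model (an assumption: ideal pulses and a velocity-dependent phase $k_{\mathrm{eff}} v \tau$ for temporal asymmetry $\tau$). Splitting the complex signal into its two components:

$$
S(\tau) = \int f(v)\,e^{i k_{\mathrm{eff}} v \tau}\,dv
= \underbrace{\int f(v)\cos(k_{\mathrm{eff}} v \tau)\,dv}_{\text{in-phase}}
\;+\; i\,\underbrace{\int f(v)\sin(k_{\mathrm{eff}} v \tau)\,dv}_{\text{quadrature}}
$$

The in-phase part is unchanged under $f(v) \to f(-v)$, so on its own it only determines the symmetrised distribution $[f(v)+f(-v)]/2$. The odd quadrature part supplies the antisymmetric remainder, and together they give

$$
f(v) = \frac{k_{\mathrm{eff}}}{2\pi}\int S(\tau)\,e^{-i k_{\mathrm{eff}} v \tau}\,d\tau .
$$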
Submitted 3 November, 2017; v1 submitted 11 September, 2017;
originally announced September 2017.
-
Speaker Recognition for Children's Speech
Authors:
Saeid Safavi,
Maryam Najafian,
Abualsoud Hanani,
Martin J Russell,
Peter Jancovic,
Michael J Carey
Abstract:
This paper presents results on Speaker Recognition (SR) for children's speech, using the OGI Kids corpus and GMM-UBM and GMM-SVM SR systems. Regions of the spectrum containing important speaker information for children are identified by conducting SR experiments over 21 frequency bands. As for adults, the spectrum can be split into four regions, with the first (containing primary vocal tract resonance information) and third (corresponding to high frequency speech sounds) being most useful for SR. However, the frequencies at which these regions occur are from 11% to 38% higher for children. It is also noted that subband SR rates are lower for younger children. Finally, results are presented for SR experiments that identify a child within a class (30 children of similar age) and within a school (288 children of varying ages). Class performance depends on age, with accuracy varying from 90% for young children to 99% for older children. The identification rate achieved for a child in a school is 81%.
Submitted 23 September, 2016;
originally announced September 2016.
-
Apache VXQuery: A Scalable XQuery Implementation
Authors:
E. Preston Carman Jr.,
Till Westmann,
Vinayak R. Borkar,
Michael J. Carey,
Vassilis J. Tsotras
Abstract:
The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data collections and take advantage of parallelism, we have implemented Apache VXQuery, an open-source scalable XQuery processor. The system builds upon two other open-source frameworks -- Hyracks, a parallel execution engine, and Algebricks, a language agnostic compiler toolbox. Apache VXQuery extends these two frameworks and provides an implementation of the XQuery specifics (data model, data-model dependent functions and optimizations, and a parser). We describe the architecture of Apache VXQuery, its integration with Hyracks and Algebricks, and the XQuery optimization rules applied to the query plan to improve path expression efficiency and to enable query parallelism. An experimental evaluation using a real 500GB dataset with various selection, aggregation and join XML queries shows that Apache VXQuery performs well both in terms of scale-up and speed-up. Our experiments show that it is about 3x faster than Saxon (an open-source and commercial XQuery processor) on a 4-core, single node implementation, and around 2.5x faster than Apache MRQL (a MapReduce-based parallel query processor) on an eight (4-core) node cluster.
Submitted 1 April, 2015;
originally announced April 2015.
-
Measurement of the Formation Rate of Muonic Hydrogen Molecules
Authors:
MuCap Collaboration,
V. A. Andreev,
T. I. Banks,
R. M. Carey,
T. A. Case,
S. M. Clayton,
K. M. Crowe,
J. Deutsch,
J. Egger,
S. J. Freedman,
V. A. Ganzha,
T. Gorringe,
F. E. Gray,
D. W. Hertzog,
M. Hildebrandt,
P. Kammel,
B. Kiburg,
S. Knaack,
P. A. Kravtsov,
A. G. Krivshich,
B. Lauss,
K. R. Lynch,
E. M. Maev,
O. E. Maev,
F. Mulhauser
, et al. (11 additional authors not shown)
Abstract:
Background: The rate λ_ppμ characterizes the formation of ppμ molecules in collisions of muonic pμ atoms with hydrogen. In measurements of the basic weak muon capture reaction on the proton to determine the pseudoscalar coupling g_P, capture occurs from both atomic and molecular states. Thus knowledge of λ_ppμ is required for a correct interpretation of these experiments.
Purpose: Recently the MuCap experiment measured the capture rate Λ_S from the singlet pμ atom, employing a low-density active target to suppress ppμ formation (PRL 110, 12504 (2013)). Nevertheless, given the unprecedented precision of this experiment, the existing experimental knowledge of λ_ppμ had to be improved.
Method: The MuCap experiment derived the weak capture rate from the muon disappearance rate in ultra-pure hydrogen. By doping the hydrogen with 20 ppm of argon, a competing process to ppμ formation was introduced, which allowed the extraction of λ_ppμ from the observed time distribution of decay electrons.
Results: The ppμ formation rate was measured as λ_ppμ = (2.01 ± 0.06(stat) ± 0.03(sys)) × 10^6 s^-1. This result updates the λ_ppμ value used in the above-mentioned MuCap publication.
Conclusions: The 2.5-fold higher precision compared to earlier experiments, together with the fact that the measurement was performed under nearly identical conditions to the main data taking, reduces the uncertainty induced by λ_ppμ to a minor contribution to the overall uncertainty of Λ_S and g_P, as determined in MuCap. Our final value for λ_ppμ shifts Λ_S and g_P by less than one tenth of their respective uncertainties compared to our results published earlier.
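For a sense of the total precision, and assuming the statistical and systematic errors quoted above are independent (an assumption; the abstract lists them separately), they combine in quadrature:

$$
\sigma_{\lambda_{pp\mu}} = \sqrt{0.06^2 + 0.03^2}\times 10^{6}\ \mathrm{s}^{-1} \approx 0.067\times 10^{6}\ \mathrm{s}^{-1},
\qquad
\frac{\sigma_{\lambda_{pp\mu}}}{\lambda_{pp\mu}} \approx \frac{0.067}{2.01} \approx 3.3\%.
$$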
Submitted 3 February, 2015;
originally announced February 2015.
-
Muon (g-2) Technical Design Report
Authors:
J. Grange,
V. Guarino,
P. Winter,
K. Wood,
H. Zhao,
R. M. Carey,
D. Gastler,
E. Hazen,
N. Kinnaird,
J. P. Miller,
J. Mott,
B. L. Roberts,
J. Benante,
J. Crnkovic,
W. M. Morse,
H. Sayed,
V. Tishchenko,
V. P. Druzhinin,
B. I. Khazin,
I. A. Koop,
I. Logashenko,
Y. M. Shatunov,
E. Solodov,
M. Korostelev,
D. Newton
, et al. (176 additional authors not shown)
Abstract:
The Muon (g-2) Experiment, E989 at Fermilab, will measure the muon anomalous magnetic moment a factor of four more precisely than was done in E821 at the Brookhaven National Laboratory AGS. The E821 result appears to be greater than the Standard-Model prediction by more than three standard deviations. When combined with expected improvement in the Standard-Model hadronic contributions, E989 should be able to determine definitively whether or not the E821 result is evidence for physics beyond the Standard Model. After a review of the physics motivation and the basic technique, which will use the muon storage ring built at BNL and now relocated to Fermilab, the design of the new experiment is presented. This document was created in partial fulfillment of the requirements necessary to obtain DOE CD-2/3 approval.
Submitted 11 May, 2018; v1 submitted 27 January, 2015;
originally announced January 2015.
-
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
Authors:
Yingyi Bu,
Vinayak Borkar,
Jianfeng Jia,
Michael J. Carey,
Tyson Condie
Abstract:
There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up to 35x speedup compared to distributed GraphLab), and makes more effective use of available machine resources to support Big(ger) Graph Analytics.
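One way to read "iterative dataflow design" is to treat vertices and messages as relations: each superstep joins incoming messages with vertex state on vertex id, runs the vertex program, and groups outgoing messages by destination (with a combiner) to form the next message relation. The in-memory sketch below applies that pattern to connected components by min-label propagation. It assumes an undirected adjacency dict that lists every vertex, is not Pregelix's API, and `superstep` / `connected_components` are hypothetical names.

```python
from collections import defaultdict

def superstep(vertices, edges, inbox):
    """One vertex-centric superstep expressed as dataflow steps.
    vertices: {vid: label}; edges: {vid: [neighbour vids]};
    inbox: {vid: combined (min) incoming label}."""
    new_vertices = dict(vertices)
    outbox = defaultdict(list)
    # "Join" the message relation with the vertex relation on vertex id;
    # vertices without messages stay inactive this round.
    for vid, msg in inbox.items():
        if msg < new_vertices[vid]:
            new_vertices[vid] = msg
            # The vertex program emits messages only when its state changes.
            for nbr in edges.get(vid, []):
                outbox[nbr].append(msg)
    # Group-by destination with a min combiner to build the next message relation.
    return new_vertices, {dst: min(msgs) for dst, msgs in outbox.items()}

def connected_components(edges):
    vertices = {v: v for v in edges}
    # Superstep 0: every vertex announces its own id to its neighbours.
    outbox = defaultdict(list)
    for vid, nbrs in edges.items():
        for nbr in nbrs:
            outbox[nbr].append(vid)
    inbox = {dst: min(msgs) for dst, msgs in outbox.items()}
    while inbox:                    # halt when no messages remain
        vertices, inbox = superstep(vertices, edges, inbox)
    return vertices
```

Because every step is a join or a group-by, a dataflow engine can choose out-of-core operator implementations when the vertex or message relations outgrow memory, rather than keeping all messages resident as a process-centric design would.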
Submitted 2 July, 2014;
originally announced July 2014.
-
AsterixDB: A Scalable, Open Source BDMS
Authors:
Sattam Alsubaiee,
Yasser Altowim,
Hotham Altwaijry,
Alexander Behm,
Vinayak Borkar,
Yingyi Bu,
Michael Carey,
Inci Cetindil,
Madhusudan Cheelangi,
Khurram Faraaz,
Eugenia Gabrielova,
Raman Grover,
Zachary Heilbron,
Young-Seok Kim,
Chen Li,
Guangqiang Li,
Ji Mahn Ok,
Nicola Onose,
Pouria Pirzadeh,
Vassilis Tsotras,
Rares Vernica,
Jian Wen,
Till Westmann
Abstract:
AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.
Submitted 2 July, 2014;
originally announced July 2014.
-
Scalable Fault-Tolerant Data Feeds in AsterixDB
Authors:
Raman Grover,
Michael J. Carey
Abstract:
In this paper we describe the support for data feed ingestion in AsterixDB, an open-source Big Data Management System (BDMS) that provides a platform for storage and analysis of large volumes of semi-structured data. Data feeds are a mechanism for having continuous data arrive into a BDMS from external sources and incrementally populate a persisted dataset and associated indexes. The need to persist and index "fast-flowing" high-velocity data (and support ad hoc analytical queries) is ubiquitous. However, the state of the art today involves 'gluing' together different systems. AsterixDB is different in being a unified system with "native support" for data feed ingestion.
We discuss the challenges and present the design and implementation of the concepts involved in modeling and managing data feeds in AsterixDB. AsterixDB allows the runtime behavior, allocation of resources and the offered degree of robustness to be customized to suit the high-level application(s) that wish to consume the ingested data. Initial experiments that evaluate the scalability and fault tolerance of the AsterixDB data feeds facility are reported.
Submitted 7 May, 2014;
originally announced May 2014.
-
Revisiting Aggregation for Data Intensive Applications: A Performance Study
Authors:
Jian Wen,
Vinayak R. Borkar,
Michael J. Carey,
Vassilis J. Tsotras
Abstract:
Aggregation has been an important operation since the early days of relational databases. Today's Big Data applications bring further challenges when processing aggregation queries, demanding adaptive aggregation algorithms that can process large volumes of data relative to a potentially limited memory budget (especially in multiuser settings). Despite its importance, the design and evaluation of aggregation algorithms has not received the same attention that other basic operators, such as joins, have received in the literature. As a result, when considering which aggregation algorithm(s) to implement in a new parallel Big Data processing platform (AsterixDB), we faced a lack of "off the shelf" answers that we could simply read about and then implement based on prior performance studies.
In this paper we revisit the engineering of efficient local aggregation algorithms for use in Big Data platforms. We discuss the salient implementation details of several candidate algorithms and present an in-depth experimental performance study to guide future Big Data engine developers. We show that the efficient implementation of the aggregation operator for a Big Data platform is non-trivial and that many factors, including memory usage, spilling strategy, and I/O and CPU cost, should be considered. Further, we introduce precise cost models that can help in choosing an appropriate algorithm based on input parameters including memory budget, grouping key cardinality, and data skew.
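To make "memory budget" and "spilling strategy" concrete, here is a generic sketch of hash-based aggregation with a bounded hash table: groups that fit are aggregated in memory, records for groups that do not fit are hash-partitioned into spill partitions, and each partition is aggregated recursively. It is not one of the paper's algorithms as implemented in AsterixDB; Python lists stand in for spill files, and `max_groups` / `num_partitions` are hypothetical parameters.

```python
def hash_aggregate(rows, max_groups, num_partitions=4, depth=0):
    """SUM-by-key over (key, value) rows with at most max_groups (>= 1)
    groups held in memory at once; overflow records are spilled by hash."""
    table = {}
    spills = [[] for _ in range(num_partitions)]
    for key, value in rows:
        if key in table:
            table[key] += value                 # group already resident: absorb
        elif len(table) < max_groups:
            table[key] = value                  # room left: start a new group
        else:
            # Table full: spill the record; the salt varies the hash per level.
            spills[hash((key, depth)) % num_partitions].append((key, value))
    result = dict(table)
    for part in spills:
        if part:
            # The table never evicts, so spilled keys never appear in it and
            # each recursive pass sees strictly fewer distinct keys.
            result.update(hash_aggregate(part, max_groups, num_partitions, depth + 1))
    return result
```

With a large budget this degenerates to a single in-memory pass; with a small budget or many distinct keys the cost is dominated by re-reading spilled records, which is the kind of trade-off that grouping key cardinality and skew shift.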
Submitted 31 October, 2013;
originally announced November 2013.
-
Exploiting Opportunistic Physical Design in Large-scale Data Analytics
Authors:
Jeff LeFevre,
Jagan Sankaranarayanan,
Hakan Hacigumus,
Junichi Tatemura,
Neoklis Polyzotis,
Michael J. Carey
Abstract:
Large-scale systems, such as MapReduce and Hadoop, perform aggressive materialization of intermediate job results in order to support fault tolerance. When jobs correspond to exploratory queries submitted by data analysts, these materializations yield a large set of materialized views that typically capture common computation among successive queries from the same analyst, or even across queries of different analysts who test similar hypotheses. We propose to treat these views as an opportunistic physical design and use them for the purpose of query optimization. We develop a novel query-rewrite algorithm that addresses the two main challenges in this context: how to search the large space of rewrites, and how to reason about views that contain UDFs (a common feature in large-scale data analytics). The algorithm, which provably finds the minimum-cost rewrite, is inspired by nearest-neighbor searches in non-metric spaces. We present an extensive experimental study on real-world datasets with a prototype data-analytics system based on Hive. The results demonstrate that our approach can result in dramatic performance improvements on complex data-analysis queries, reducing total execution time by an average of 61% and up to two orders of magnitude.
Submitted 10 December, 2013; v1 submitted 26 March, 2013;
originally announced March 2013.
-
Iterative MapReduce for Large Scale Machine Learning
Authors:
Joshua Rosen,
Neoklis Polyzotis,
Vinayak Borkar,
Yingyi Bu,
Michael J. Carey,
Markus Weimer,
Tyson Condie,
Raghu Ramakrishnan
Abstract:
Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one of the foundational disciplines for data analysis, summarization and inference - on Big Data has become routine at most organizations that operate large clouds, usually based on systems such as Hadoop that support the MapReduce programming paradigm. It is now widely recognized that while MapReduce is highly scalable, it suffers from a critical weakness for machine learning: it does not support iteration. Consequently, one has to program around this limitation, leading to fragile, inefficient code. Further, reliance on the programmer is inherently flawed in a multi-tenanted cloud environment, since the programmer does not have visibility into the state of the system when his or her program executes. Prior work has sought to address this problem by either developing specialized systems aimed at stylized applications, or by augmenting MapReduce with ad hoc support for saving state across iterations (driven by an external loop). In this paper, we advocate support for looping as a first-class construct, and propose an extension of the MapReduce programming paradigm called Iterative MapReduce. We then develop an optimizer for a class of Iterative MapReduce programs that cover most machine learning techniques, provide theoretical justifications for the key optimization steps, and empirically demonstrate that system-optimized programs for significant machine learning tasks are competitive with state-of-the-art specialized solutions.
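The difference between an external driver loop and looping as a first-class construct can be sketched as a single higher-order function that owns the map, reduce, model update and stopping test, so a system could in principle see and optimize the whole iteration. This is a minimal sketch, not the paper's programming model or optimizer; `iterate_mapreduce` and its parameters are hypothetical, and the usage fits y = w·x by batch gradient descent with an assumed step size.

```python
def iterate_mapreduce(data, model, map_fn, reduce_fn, update_fn, converged, max_iters=100):
    """Run map -> reduce -> update repeatedly, with the loop and its stopping
    test owned by this construct rather than by a hand-written external script."""
    for _ in range(max_iters):
        partials = [map_fn(record, model) for record in data]  # map phase (per record)
        aggregate = reduce_fn(partials)                         # reduce phase
        new_model = update_fn(model, aggregate)                 # fold result into the model
        if converged(model, new_model):
            return new_model
        model = new_model
    return model

# Usage: least-squares fit of y = w * x by batch gradient descent (hypothetical example).
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
w = iterate_mapreduce(
    data,
    model=0.0,
    map_fn=lambda rec, m: (m * rec[0] - rec[1]) * rec[0],   # per-record gradient term
    reduce_fn=sum,
    update_fn=lambda m, g: m - 0.02 * 2 * g / len(data),     # step size 0.02 (assumed)
    converged=lambda old, new: abs(new - old) < 1e-10,
)
```

Here each iteration shrinks the error by a factor of 0.56, so the loop stops after about 41 rounds with w close to 3.0; in a real cluster the static `data` would be the part a system could cache across iterations.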
Submitted 13 March, 2013;
originally announced March 2013.
-
Mu2e Conceptual Design Report
Authors:
The Mu2e Project Collaboration,
R. J. Abrams,
D. Alezander,
G. Ambrosio,
N. Andreev,
C. M. Ankenbrandt,
D. M. Asner,
D. Arnold,
A. Artikov,
E. Barnes,
L. Bartoszek,
R. H. Bernstein,
K. Biery,
V. Biliyar,
R. Bonicalzi,
R. Bossert,
M. Bowden,
J. Brandt,
D. N. Brown,
J. Budagov,
M. Buehler,
A. Burov,
R. Carcagno
, et al. (203 additional authors not shown)
Abstract:
Mu2e at Fermilab will search for charged lepton flavor violation via the coherent conversion process μ⁻N → e⁻N with a sensitivity approximately four orders of magnitude better than the current world's best limits for this process. The experiment's sensitivity offers discovery potential over a wide array of new physics models and probes mass scales well beyond the reach of the LHC. We describe herein the conceptual design of the proposed Mu2e experiment. This document was created in partial fulfillment of the requirements necessary to obtain DOE CD-1 approval, which was granted July 11, 2012.
Submitted 29 November, 2012;
originally announced November 2012.
-
Detailed Report of the MuLan Measurement of the Positive Muon Lifetime and Determination of the Fermi Constant
Authors:
V. Tishchenko,
S. Battu,
R. M. Carey,
D. B. Chitwood,
J. Crnkovic,
P. T. Debevec,
S. Dhamija,
W. Earle,
A. Gafarov,
K. Giovanetti,
T. P. Gorringe,
F. E. Gray,
Z. Hartwig,
D. W. Hertzog,
B. Johnson,
P. Kammel,
B. Kiburg,
S. Kizilgul,
J. Kunkle,
B. Lauss,
I. Logashenko,
K. R. Lynch,
R. McNabb,
J. P. Miller,
F. Mulhauser
, et al. (8 additional authors not shown)
Abstract:
We present a detailed report of the method, setup, analysis and results of a precision measurement of the positive muon lifetime. The experiment was conducted at the Paul Scherrer Institute using a time-structured, nearly 100%-polarized, surface muon beam and a segmented, fast-timing, plastic scintillator array. The measurement employed two target arrangements: a magnetized ferromagnetic target with a ~4 kG internal magnetic field and a crystal quartz target in a 130 G external magnetic field. Approximately 1.6 × 10^12 positrons were accumulated, and together the data yield a muon lifetime of τ_μ(MuLan) = 2196980.3(2.2) ps (1.0 ppm), thirty times more precise than previous generations of lifetime experiments. The lifetime measurement yields the most accurate value of the Fermi constant, G_F(MuLan) = 1.1663787(6) × 10^-5 GeV^-2 (0.5 ppm). It also enables new precision studies of weak interactions via lifetime measurements of muonic atoms.
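For context on why a 1.0 ppm lifetime gives a 0.5 ppm Fermi constant: the muon decay rate is related to $G_F$ by the standard expression below (natural units), where $\Delta q$ collects the phase-space and radiative corrections. Since $G_F \propto \tau_\mu^{-1/2}$, the fractional uncertainty is halved; this back-of-envelope propagation neglects the smaller contributions from $m_\mu$ and $\Delta q$.

$$
\frac{1}{\tau_\mu} = \frac{G_F^2\, m_\mu^5}{192\pi^3}\,(1+\Delta q)
\quad\Longrightarrow\quad
G_F = \sqrt{\frac{192\pi^3}{\tau_\mu\, m_\mu^5\,(1+\Delta q)}}
$$

$$
\frac{\delta G_F}{G_F} = \frac{1}{2}\,\frac{\delta\tau_\mu}{\tau_\mu}
= \frac{1}{2}\cdot\frac{2.2\ \text{ps}}{2196980.3\ \text{ps}}
\approx \frac{1}{2}\,(1.0\ \text{ppm}) = 0.5\ \text{ppm}
$$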
Submitted 5 November, 2012;
originally announced November 2012.