-
Croissant: A Metadata Format for ML-Ready Datasets
Authors:
Mubashara Akhtar,
Omar Benjelloun,
Costanza Conforti,
Pieter Gijsbers,
Joan Giner-Miguelez,
Nitisha Jain,
Michael Kuchnik,
Quentin Lhoest,
Pierre Marcenac,
Manil Maskey,
Peter Mattson,
Luis Oala,
Pierre Ruyssen,
Rajat Shinde,
Elena Simperl,
Goeffry Thomas,
Slava Tykhonov,
Joaquin Vanschoren,
Jos van der Velde,
Steffen Vogler,
Carole-Jean Wu
Abstract:
Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
Submitted 30 May, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Blockchain for Genomics: A Systematic Literature Review
Authors:
Mohammed Alghazwi,
Fatih Turkmen,
Joeri van der Velde,
Dimka Karastoyanova
Abstract:
Human genomic data carry unique information about an individual and offer unprecedented opportunities for healthcare. The clinical interpretations derived from large genomic datasets can greatly improve healthcare and pave the way for personalized medicine. Sharing genomic datasets, however, poses major challenges, as genomic data differ from traditional medical data, indirectly revealing information about descendants and relatives of the data owner and carrying valid information even after the owner passes away. Therefore, stringent data ownership and control measures are required when dealing with genomic data. In order to provide secure and accountable infrastructure, blockchain technologies offer a promising alternative to traditional distributed systems. Indeed, the research on blockchain-based infrastructures tailored to genomics is on the rise. However, there is a lack of a comprehensive literature review that summarizes the current state-of-the-art methods in the applications of blockchain in genomics. In this paper, we systematically look at the existing work, both commercial and academic, and discuss the major opportunities and challenges. Our study is driven by five research questions that we aim to answer in our review. We also present our projections of future research directions, which we hope researchers interested in the area can benefit from.
Submitted 16 September, 2022; v1 submitted 19 November, 2021;
originally announced November 2021.
-
Odd and even Kondo effects from emergent localisation in quantum point contacts
Authors:
M. J. Iqbal,
Roi Levy,
E. J. Koop,
J. B. Dekker,
J. P. de Jong,
J. H. M. van der Velde,
D. Reuter,
A. D. Wieck,
R. Aguado,
Yigal Meir,
C. H. van der Wal
Abstract:
A quantum point contact (QPC) is a very basic nano-electronic device: a short and narrow transport channel between two electron reservoirs. In clean channels, electron transport is ballistic and the conductance $G$ is then quantised as a function of channel width with plateaus at integer multiples of $2e^2/h$ ($e$ is the electron charge and $h$ Planck's constant). This can be understood in a picture where the electron states are propagating waves, without need to account for electron-electron interactions. Quantised conductance could thus be the signature of ultimate control over nanoscale electron transport. However, even studies with the cleanest QPCs generically show significant anomalies on the quantised conductance traces and there is consensus that these result from electron many-body effects. Despite extensive experimental and theoretical studies, understanding of these anomalies is an open problem. We report evidence that the many-body effects have their origin in one or more spontaneously localised states that emerge from Friedel oscillations in the QPC channel. Kondo physics will then also contribute to the formation of the many-body state with Kondo signatures that reflect the parity of the number of localised states. Evidence comes from experiments with length-tunable QPCs that show a periodic modulation of the many-body physics with Kondo signatures of alternating parity. Our results are of importance for assessing the role of QPCs in more complex hybrid devices and proposals for spintronic and quantum information applications. In addition, our results show that tunable QPCs offer a rich platform for investigating many-body effects in nanoscale systems, with the ability to probe such physics at the level of a single site.
Submitted 26 July, 2013;
originally announced July 2013.
-
Measurement of the Cosmic Ray Energy Spectrum and Composition from 10^{17} to 10^{18.3} eV Using a Hybrid Fluorescence Technique
Authors:
T. Abu-Zayyad,
K. Belov,
D. J. Bird,
J. Boyer,
Z. Cao,
M. Catanese,
G. F. Chen,
R. W. Clay,
C. E. Covault,
H. Y. Dai,
B. R. Dawson,
J. W. Elbert,
B. E. Fick,
L. F. Fortson,
J. W. Fowler,
K. G. Gibbs,
M. A. K. Glasmacher,
K. D. Green,
Y. Ho,
A. Huang,
C. C. Jui,
M. J. Kidd,
D. B. Kieda,
B. C. Knapp,
S. Ko
, et al. (22 additional authors not shown)
Abstract:
We study the spectrum and average mass composition of cosmic rays with primary energies between 10^{17} eV and 10^{18} eV using a hybrid detector consisting of the High Resolution Fly's Eye (HiRes) prototype and the MIA muon array. Measurements have been made of the change in the depth of shower maximum as a function of energy. A complete Monte Carlo simulation of the detector response and comparisons with shower simulations lead to the conclusion that the cosmic ray intensity is changing from a heavier to a lighter composition in this energy range. The spectrum is consistent with earlier Fly's Eye measurements and supports the previously found steepening near 4 \times 10^{17} eV.
Submitted 31 October, 2000;
originally announced October 2000.
-
A Multi-Component Measurement of the Cosmic Ray Composition Between 10^{17} eV and 10^{18} eV
Authors:
T. Abu-Zayyad,
K. Belov,
D. J. Bird,
J. Boyer,
Z. Cao,
M. Catanese,
G. F. Chen,
R. W. Clay,
C. E. Covault,
J. W. Cronin,
H. Y. Dai,
B. R. Dawson,
J. W. Elbert,
B. E. Fick,
L. F. Fortson,
J. W. Fowler,
K. G. Gibbs,
M. A. K. Glasmacher,
K. D. Green,
Y. Ho,
A. Huang,
C. C. Jui,
M. J. Kidd,
D. B. Kieda,
B. C. Knapp
, et al. (23 additional authors not shown)
Abstract:
The average mass composition of cosmic rays with primary energies between $10^{17}$eV and $10^{18}$eV has been studied using a hybrid detector consisting of the High Resolution Fly's Eye (HiRes) prototype and the MIA muon array. Measurements have been made of the change in the depth of shower maximum, $X_{max}$, and in the change in the muon density at a fixed core location, $ρ_μ(600m)$, as a function of energy. The composition has also been evaluated in terms of the combination of $X_{max}$ and $ρ_μ(600m)$. The results show that the composition is changing from a heavy to lighter mix as the energy increases.
Submitted 9 November, 1999;
originally announced November 1999.
-
Constraints on Gamma-ray Emission from the Galactic Plane at 300 TeV
Authors:
A. Borione,
M. A. Catanese,
M. C. Chantell,
C. E. Covault,
J. W. Cronin,
B. E. Fick,
L. F. Fortson,
J. Fowler,
M. A. K. Glasmacher,
K. D. Green,
D. B. Kieda,
J. Matthews,
B. J. Newport,
D. Nitz,
R. A. Ong,
S. Oser,
D. Sinclair,
J. C. van der Velde
Abstract:
We describe a new search for diffuse ultrahigh energy gamma-ray emission associated with molecular clouds in the galactic disk. The Chicago Air Shower Array (CASA), operating in coincidence with the Michigan muon array (MIA), has recorded over 2.2 x 10^{9} air showers from April 4, 1990 to October 7, 1995. We search for gamma rays based upon the muon content of air showers arriving from the direction of the galactic plane. We find no significant evidence for diffuse gamma-ray emission, and we set an upper limit on the ratio of gamma rays to normal hadronic cosmic rays at less than 2.4 x 10^{-5} at 310 TeV (90% confidence limit) from the galactic plane region (50 degrees < l < 200 degrees; -5 degrees < b < 5 degrees). This limit places a strong constraint on models for emission from molecular clouds in the galaxy. We rule out significant spectral hardening in the outer galaxy, and conclude that emission from the plane at these energies is likely to be dominated by the decay of neutral pions resulting from cosmic ray interactions with passive target gas molecules.
Submitted 10 March, 1997;
originally announced March 1997.