-
LLMClean: Context-Aware Tabular Data Cleaning via LLM-Generated OFDs
Authors:
Fabian Biester,
Mohamed Abdelaal,
Daniel Del Gaudio
Abstract:
Machine learning's influence is expanding rapidly, now integral to decision-making processes from corporate strategy to the advancements in Industry 4.0. The efficacy of Artificial Intelligence broadly hinges on the caliber of data used during its training phase; optimal performance is tied to exceptional data quality. Data cleaning tools, particularly those that exploit functional dependencies within ontological frameworks or context models, are instrumental in augmenting data quality. Nevertheless, crafting these context models is a demanding task, both in terms of resources and expertise, often necessitating specialized knowledge from domain experts.
In light of these challenges, this paper introduces an innovative approach, called LLMClean, for the automated generation of context models, utilizing Large Language Models to analyze and understand various datasets. LLMClean encompasses a sequence of actions, starting with categorizing the dataset, extracting or mapping relevant models, and ultimately synthesizing the context model. To demonstrate its potential, we have developed and tested a prototype that applies our approach to three distinct datasets from the Internet of Things, healthcare, and Industry 4.0 sectors. The results of our evaluation indicate that our automated approach can achieve data cleaning efficacy comparable with that of context models crafted by human experts.
Submitted 29 April, 2024;
originally announced April 2024.
-
RTClean: Context-aware Tabular Data Cleaning using Real-time OFDs
Authors:
Daniel Del Gaudio,
Tim Schubert,
Mohamed Abdelaal
Abstract:
Nowadays, machine learning plays a key role in developing plenty of applications, e.g., smart homes, smart medical assistance, and autonomous driving. A major challenge of these applications is preserving high quality of the training and the serving data. Nevertheless, existing data cleaning methods cannot exploit context information. Thus, they usually fail to track shifts in the data distributions or the associated error profiles. To overcome these limitations, we introduce, in this paper, a novel method for automated tabular data cleaning powered by dynamic functional dependency rules extracted from a live context model. As a proof of concept, we create a smart home use case to collect data while preserving the context information. Using two different data sets, our evaluations show that the proposed cleaning method outperforms a set of baseline methods in terms of the detection and repair accuracy.
Submitted 9 February, 2023;
originally announced February 2023.
-
Why do nanowires grow with their c-axis vertically-aligned in the absence of epitaxy?
Authors:
Almog R. Azulay,
Yury Turkulets,
Davide Del Gaudio,
Rachel S. Goldman,
Ilan Shalish
Abstract:
Images of uniform and upright nanowires are fascinating, but often, they are quite puzzling, when epitaxial templating from the substrate is clearly absent. Here, we reveal the physics underlying one such hidden growth guidance mechanism through a specific example - the case of ZnO nanowires grown on silicon oxide and glass. We show how electric fields exerted by the insulating substrate may be manipulated through the surface charge to define the orientation and polarity of the nanowires. Surface charge is ubiquitous on the surfaces of semiconductors and insulators, and as a result, substrate electric fields need always be considered. Our results suggest a new concept, according to which the growth of wurtzite semiconductors may often be described as a process of electric-charge-induced self assembly, wherein the internal built-in field in the polar material tends to align in parallel to an external field exerted by the substrate to minimize the interfacial energy of the system.
Submitted 4 November, 2019; v1 submitted 6 September, 2018;
originally announced September 2018.
-
Current-induced spin polarization in InGaAs and GaAs epilayers with varying doping densities
Authors:
M. Luengo-Kovac,
S. Huang,
D. Del Gaudio,
J. Occena,
R. S. Goldman,
R. Raimondi,
V. Sih
Abstract:
The current-induced spin polarization and momentum-dependent spin-orbit field were measured in In$_{x}$Ga$_{1-x}$As epilayers with varying indium concentrations and silicon doping densities. Samples with higher indium concentrations and carrier concentrations and lower mobilities were found to have larger electrical spin generation efficiencies. Furthermore, current-induced spin polarization was detected in GaAs epilayers despite the absence of measurable spin-orbit fields, indicating that the extrinsic contributions to the spin polarization mechanism must be considered. Theoretical calculations based on a model that includes extrinsic contributions to the spin dephasing and the spin Hall effect, in addition to the intrinsic Rashba and Dresselhaus spin-orbit coupling, are found to qualitatively agree with the experimental results.
Submitted 1 June, 2017;
originally announced June 2017.