Showing 1–19 of 19 results for author: Ludaescher, B

Searching in archive cs.
  1. arXiv:2403.08257 [pdf, other]

    cs.DB

    Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation

    Authors: Yilin Xia, Shawn Bowers, Lan Li, Bertram Ludäscher

    Abstract: We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to IDCC 2024. Source code is available at https://github.com/idaks/Games-and-Argumentation/tree/idcc

  2. arXiv:2310.05649 [pdf, other]

    cs.CE

    Context, Composition, Automation, and Communication -- The C2AC Roadmap for Modeling and Simulation

    Authors: Adelinde Uhrmacher, Peter Frazier, Reiner Hähnle, Franziska Klügl, Fabian Lorig, Bertram Ludäscher, Laura Nenzi, Cristina Ruiz-Martin, Bernhard Rumpe, Claudia Szabo, Gabriel A. Wainer, Pia Wilsdorf

    Abstract: Simulation has become, in many application areas, a sine-qua-non. Most recently, COVID-19 has underlined the importance of simulation studies and limitations in current practices and methods. We identify four goals of methodological work for addressing these limitations. The first is to provide better support for capturing, representing, and evaluating the context of simulation studies, including…

    Submitted 27 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    ACM Class: I.6

  3. arXiv:2309.06620 [pdf, other]

    cs.LO

    Games and Argumentation: Time for a Family Reunion!

    Authors: Bertram Ludäscher, Yilin Xia

    Abstract: The rule "defeated(X) $\leftarrow$ attacks(Y,X), $\neg$ defeated(Y)" states that an argument is defeated if it is attacked by an argument that is not defeated. The rule "win(X) $\leftarrow$ move(X,Y), $\neg$ win(Y)" states that in a game a position is won if there is a move to a position that is not won. Both logic rules can be seen as close relatives (even identical twins) and both rules have bee…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Fourth Workshop on Explainable Logic-Based Knowledge Representation (XLoKR), Sept 2, 2023. Rhodes, Greece

  4. arXiv:2303.12640 [pdf, other]

    hep-lat cs.CE cs.DB

    Provenance for Lattice QCD workflows

    Authors: Tanja Auge, Gunnar Bali, Meike Klettke, Bertram Ludäscher, Wolfgang Söldner, Simon Weishäupl, Tilo Wettig

    Abstract: We present a provenance model for the generic workflow of numerical Lattice Quantum Chromodynamics (QCD) calculations, which constitute an important component of particle physics research. These calculations are carried out on the largest supercomputers worldwide with data in the multi-PetaByte range being generated and analyzed. In the Lattice QCD community, a custom metadata standard (QCDml) tha…

    Submitted 22 March, 2023; originally announced March 2023.

  5. arXiv:2301.04770 [pdf, other]

    cs.CL cs.DB cs.LG

    KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution

    Authors: Liri Fang, Lan Li, Yiren Liu, Vetle I. Torvik, Bertram Ludäscher

    Abstract: Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has discussed the feasibility of utilizing pre-trained language models to perform entity resolution and achieved promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In thi…

    Submitted 11 January, 2023; originally announced January 2023.

  6. arXiv:2112.08259 [pdf, other]

    cs.DB

    or2yw: Modeling and Visualizing OpenRefine Histories as YesWorkflow Diagrams

    Authors: Nikolaus Nova Parulian, Lan Li, Bertram Ludaescher

    Abstract: OpenRefine is a popular open-source data cleaning tool. It allows users to export a previously executed data cleaning workflow in a JSON format for possible reuse on other datasets. We have developed or2yw, a novel tool that maps a JSON-formatted OpenRefine operation history to a YesWorkflow (YW) model, which then can be visualized and queried using the YW tool. The latter was originally developed…

    Submitted 15 December, 2021; originally announced December 2021.

  7. Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Tainã Coleman, Dan Laney, Dong Ahn, Shantenu Jha, Dorran Howell, Stian Soiland-Reys, Ilkay Altintas, Douglas Thain, Rosa Filgueira, Yadu Babuji, Rosa M. Badia, Bartosz Balis, Silvina Caino-Lores, Scott Callaghan, Frederik Coppens, Michael R. Crusoe, Kaushik De, Frank Di Natale, Tu M. A. Do, Bjoern Enders, Thomas Fahringer, Anne Fouilloux, et al. (33 additional authors not shown)

    Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role i…

    Submitted 9 June, 2021; originally announced June 2021.

  8. Workflows Community Summit: Bringing the Scientific Workflows Community Together

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Dan Laney, Dong Ahn, Shantenu Jha, Carole Goble, Lavanya Ramakrishnan, Luc Peterson, Bjoern Enders, Douglas Thain, Ilkay Altintas, Yadu Babuji, Rosa M. Badia, Vivien Bonazzi, Taina Coleman, Michael Crusoe, Ewa Deelman, Frank Di Natale, Paolo Di Tommaso, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Alex Ganose, Bjorn Gruning, et al. (20 additional authors not shown)

    Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) pla…

    Submitted 16 March, 2021; originally announced March 2021.

  9. Toward Enabling Reproducibility for Data-Intensive Research using the Whole Tale Platform

    Authors: Kyle Chard, Niall Gaffney, Mihael Hategan, Kacper Kowalik, Bertram Ludaescher, Timothy McPhillips, Jarek Nabrzyski, Victoria Stodden, Ian Taylor, Thomas Thelen, Matthew J. Turk, Craig Willis

    Abstract: Whole Tale http://wholetale.org is a web-based, open-source platform for reproducible research supporting the creation, sharing, execution, and verification of "Tales" for the scientific research community. Tales are executable research objects that capture the code, data, and environment along with narrative and workflow information needed to re-create computational results from scientific studie…

    Submitted 12 May, 2020; originally announced May 2020.

    Journal ref: Advances in Parallel Computing 2020

  10. arXiv:2002.00084 [pdf, other]

    cs.DB

    Approximate Summaries for Why and Why-not Provenance (Extended Version)

    Authors: Seokki Lee, Bertram Ludaescher, Boris Glavic

    Abstract: Why and why-not provenance have been studied extensively in recent years. However, why-not provenance, and to a lesser degree why provenance, can be very large resulting in severe scalability and usability challenges. In this paper, we introduce a novel approximate summarization technique for provenance which overcomes these challenges. Our approach uses patterns to encode (why-not) provenance con…

    Submitted 27 April, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

  11. arXiv:1808.05752 [pdf, other]

    cs.DB

    PUG: A Framework and Practical Implementation for Why & Why-Not Provenance (extended version)

    Authors: Seokki Lee, Bertram Ludaescher, Boris Glavic

    Abstract: Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first-order queries). Specifically, we introduce a graph-based provenance model that, while synta…

    Submitted 15 August, 2018; originally announced August 2018.

    Comments: Extended version of VLDB journal article of the same name. arXiv admin note: text overlap with arXiv:1701.05699

    Report number: IIT/CS-DB-2018-02

  12. arXiv:1807.09899 [pdf, other]

    cs.DB

    Validation and Inference of Schema-Level Workflow Data-Dependency Annotations

    Authors: Shawn Bowers, Timothy McPhillips, Bertram Ludäscher

    Abstract: An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage" relationships among data items, which can help answer prove…

    Submitted 25 July, 2018; originally announced July 2018.

    Comments: To appear in: Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, King's College London, UK, July 9-10, 2018, Proceedings

  13. arXiv:1805.00400 [pdf, other]

    cs.CY

    Computing Environments for Reproducibility: Capturing the "Whole Tale"

    Authors: Adam Brinckman, Kyle Chard, Niall Gaffney, Mihael Hategan, Matthew B. Jones, Kacper Kowalik, Sivakumar Kulasekaran, Bertram Ludäscher, Bryce D. Mecum, Jarek Nabrzyski, Victoria Stodden, Ian J. Taylor, Matthew J. Turk, Kandace Turner

    Abstract: The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale p…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: Future Generation Computer Systems, 2018

  14. arXiv:1701.05699 [pdf, other]

    cs.DB

    Efficiently Computing Provenance Graphs for Queries with Negation

    Authors: Seokki Lee, Sven Koehler, Bertram Ludaescher, Boris Glavic

    Abstract: Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we present the first practical approach for answering such questions for q…

    Submitted 20 January, 2017; originally announced January 2017.

    Comments: Illinois Institute of Technology, IIT/CS-DB-2016-03

  15. arXiv:1610.09958 [pdf]

    cs.DL

    Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments

    Authors: Bertram Ludaescher, Kyle Chard, Niall Gaffney, Matthew B. Jones, Jaroslaw Nabrzyski, Victoria Stodden, Matthew Turk

    Abstract: We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research public…

    Submitted 28 October, 2016; originally announced October 2016.

    Report number: Gateways2016 paper 30

  16. arXiv:1502.02403 [pdf, other]

    cs.SE

    YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

    Authors: Timothy McPhillips, Tianhong Song, Tyler Kolisnik, Steve Aulenbach, Khalid Belhajjame, Kyle Bocinsky, Yang Cao, Fernando Chirigati, Saumen Dey, Juliana Freire, Deborah Huntzinger, Christopher Jones, David Koop, Paolo Missier, Mark Schildhauer, Christopher Schwalm, Yaxing Wei, James Cheney, Mark Bieda, Bertram Ludaescher

    Abstract: Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow…

    Submitted 9 February, 2015; originally announced February 2015.

  17. Win-Move is Coordination-Free (Sometimes)

    Authors: Daniel Zinn, Todd J Green, Bertram Ludäscher

    Abstract: In a recent paper by Hellerstein [15], a tight relationship was conjectured between the number of strata of a Datalog${}^\neg$ program and the number of "coordination stages" required for its distributed computation. Indeed, Ameloot et al. [9] showed that a query can be computed by a coordination-free relational transducer network iff it is monotone, thus answering in the affirmative a variant of…

    Submitted 10 December, 2013; originally announced December 2013.

    Comments: Proceedings of the 15th International Conference on Database Theory. Pages 99-113. March 26-30, 2012, Berlin, Germany

    ACM Class: H.2.4

  18. Scientific Workflows and Provenance: Introduction and Research Opportunities

    Authors: Víctor Cuevas-Vicenttín, Saumen Dey, Sven Köhler, Sean Riddle, Bertram Ludäscher

    Abstract: Scientific workflows are becoming increasingly popular for compute-intensive and data-intensive scientific applications. The vision and promise of scientific workflows includes rapid, easy workflow design, reuse, scalable execution, and other advantages, e.g., to facilitate "reproducible science" through provenance (e.g., data lineage) support. However, as described in the paper, important researc…

    Submitted 23 November, 2013; v1 submitted 18 November, 2013; originally announced November 2013.

    Comments: 12 pages, 2 figures

    Journal ref: Datenbank-Spektrum, November 2012, Volume 12, Issue 3, pp 193-203

  19. First-Order Provenance Games

    Authors: Sven Köhler, Bertram Ludäscher, Daniel Zinn

    Abstract: We propose a new model of provenance, based on a game-theoretic approach to query evaluation. First, we study games G in their own right, and ask how to explain that a position x in G is won, lost, or drawn. The resulting notion of game provenance is closely related to winning strategies, and excludes from provenance all "bad moves", i.e., those which unnecessarily allow the opponent to improve th…

    Submitted 10 September, 2013; originally announced September 2013.

    Journal ref: Peter Buneman Festschrift, LNCS 8000, 2013