Showing 1–13 of 13 results for author: Costan, A

  1. Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers

    Authors: Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu

    Abstract: Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e., is not fully available from the beginning), incremental training suffers from catastrophic forgetting (i.e., new patterns are reinforced at the expense of previously acquired knowledge). Training from scratch each time new traini…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2024, Philadelphia (PA), United States

  2. KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

    Authors: Daniel Rosendo, Kate Keahey, Alexandru Costan, Matthieu Simonin, Patrick Valduriez, Gabriel Antoniu

    Abstract: Distributed infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such complex Edge-to-Cloud Continuu…

    Submitted 24 July, 2023; originally announced July 2023.

    Journal ref: ACM REP '23: ACM Conference on Reproducibility and Replicability, Jun 2023, Santa Cruz, California, United States. pp.62-73

  3. arXiv:2307.10658  [pdf, other]

    cs.DB cs.DC cs.PF

    ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

    Authors: Daniel Rosendo, Marta Mattoso, Alexandru Costan, Renan Souza, Débora Pina, Patrick Valduriez, Gabriel Antoniu

    Abstract: Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related dat…

    Submitted 20 July, 2023; originally announced July 2023.

    Journal ref: Cluster 2023 - IEEE International Conference on Cluster Computing, Oct 2023, Santa Fe, New Mexico, United States

  4. Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review

    Authors: Daniel Rosendo, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu

    Abstract: The explosion of data volumes generated by an increasing number of applications is strongly impacting the evolution of distributed digital infrastructures for data analytics and machine learning (ML). While data analytics used to be mainly performed on cloud infrastructures, the rapid development of IoT infrastructures and the requirements for low-latency, secure processing has motivated the devel…

    Submitted 29 April, 2022; originally announced May 2022.

    Journal ref: Journal of Parallel and Distributed Computing, Elsevier, 2022, 166, pp.71-94

  5. arXiv:2109.01379  [pdf, ps, other]

    cs.DC cs.NI cs.PF

    Enabling Reproducible Analysis of Complex Workflows on the Edge-to-Cloud Continuum

    Authors: Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Patrick Valduriez

    Abstract: Distributed digital infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex applications to be executed from IoT Edge devices to the HPC Cloud (aka the Computing Continuum, the Digital Continuum, or the Transcontinuum). Understanding end-to-end performance in such a complex continuum is challenging. This breaks down to reconciling many, t…

    Submitted 3 September, 2021; originally announced September 2021.

    Journal ref: Conférence sur la Gestion de Données – Principles, Technologies et Applications, Oct 2021, Paris, France

  6. arXiv:2108.04033  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum

    Authors: Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Matthieu Simonin, Jean-Christophe Lombardo, Alexis Joly, Patrick Valduriez

    Abstract: In more and more application areas, we are witnessing the emergence of complex workflows that combine computing, analytics and learning. They often require a hybrid execution infrastructure with IoT devices interconnected to cloud/HPC systems (aka Computing Continuum). Such workflows are subject to complex constraints and requirements in terms of performance, resource usage, energy consumption and…

    Submitted 4 August, 2021; originally announced August 2021.

    Journal ref: Cluster 2021 - IEEE International Conference on Cluster Computing, Sep 2021, Portland, OR, United States

  7. arXiv:1910.02004  [pdf, other]

    cs.OH

    A Survey of Benchmarks to Evaluate Data Analytics for Smart-* Applications

    Authors: Athanasios Kiatipis, Alvaro Brandon, Rizkallah Touma, Pierre Matri, Michal Zasadzinski, Linh Thuy Nhuyen, Adrien Lebre, Alexandru Costan

    Abstract: The growth of ubiquitous sensor networks at an accelerating pace cuts across many areas of modern day life. They enable measuring, inferring, understanding and acting upon a wide variety of indicators, in fields ranging from agriculture to healthcare or to complex urban environments. The applications devoted to this task are designated as Smart-* Applications. They hide a staggering complexity, re…

    Submitted 4 October, 2019; originally announced October 2019.

  8. arXiv:1106.5846  [pdf]

    cs.DC

    An Architectural Model for a Grid based Workflow Management Platform in Scientific Applications

    Authors: Alexandru Costan, Florin Pop, Corina Stratan, Ciprian Dobre, Catalin Leordeanu, Valentin Cristea

    Abstract: With recent increasing computational and data requirements of scientific applications, the use of large clustered systems as well as distributed resources is inevitable. Although executing large applications in these environments brings increased performance, the automation of the process becomes more and more challenging. While the use of complex workflow management systems has been a viable solu…

    Submitted 29 June, 2011; originally announced June 2011.

    Comments: 17th International Conference on Control Systems and Computer Science (CSCS 17), Bucharest, Romania, May 26-29, 2009. Vol. 1, pp. 407-414, ISSN: 2066-4451

  9. arXiv:1106.5576  [pdf]

    cs.DC

    Models and Techniques for Ensuring Reliability, Safety, Availability and Security of Large Scale Distributed Systems

    Authors: Valentin Cristea, Ciprian Dobre, Florin Pop, Corina Stratan, Alexandru Costan, Catalin Leordeanu

    Abstract: 17th International Conference on Control Systems and Computer Science (CSCS 17), Bucharest, Romania, May 26-29, 2009. Vol. 1, pp. 401-406, ISSN: 2066-4451.

    Submitted 28 June, 2011; originally announced June 2011.

  10. arXiv:0910.2942  [pdf]

    cs.DC cs.NI

    Critical Analysis of Middleware Architectures for Large Scale Distributed Systems

    Authors: Florin Pop, Ciprian Mihai Dobre, Alexandru Costan, Mugurel Ionut Andreica, Eliana-Dina Tirsa, Corina Stratan, Valentin Cristea

    Abstract: Distributed computing is increasingly being viewed as the next phase of Large Scale Distributed Systems (LSDSs). However, the vision of large scale resource sharing is not yet a reality in many areas - Grid computing is an evolving area of computing, where standards and technology are still being developed to enable this new paradigm. Hence, in this paper we analyze the current development of mi…

    Submitted 15 October, 2009; originally announced October 2009.

    ACM Class: C.2.4; D.4

    Journal ref: Proc. of the 17th Intl. Conf. on Control Systems and Computer Science (CSCS), vol. 1, pp. 29-36, Bucharest, Romania, 26-29 May, 2009. (ISSN: 2066-4451)

  11. arXiv:0910.0708  [pdf]

    cs.DC cs.NI

    Robust Failure Detection Architecture for Large Scale Distributed Systems

    Authors: Ciprian Mihai Dobre, Florin Pop, Alexandru Costan, Mugurel Ionut Andreica, Valentin Cristea

    Abstract: Failure detection is a fundamental building block for ensuring fault tolerance in large scale distributed systems. There are lots of approaches and implementations in failure detectors. Providing flexible failure detection in off-the-shelf distributed systems is difficult. In this paper we present an innovative solution to this problem. Our approach is based on adaptive, decentralized failure de…

    Submitted 5 October, 2009; originally announced October 2009.

    ACM Class: C.2.4; C.4; D.4.5

    Journal ref: Proc. of the 17th Intl. Conf. on Control Systems and Computer Science (CSCS), vol. 1, pp. 433-440, Bucharest, Romania, 26-29 May, 2009

  12. arXiv:0910.0626  [pdf]

    cs.DC cs.NI

    Towards a Grid Platform for Scientific Workflows Management

    Authors: Alexandru Costan, Corina Stratan, Eliana-Dina Tirsa, Mugurel Ionut Andreica, Valentin Cristea

    Abstract: Workflow management systems allow the users to develop complex applications at a higher level, by orchestrating functional components without handling the implementation details. Although a wide range of workflow engines are developed in enterprise environments, the open source engines available for scientific applications lack some functionalities or are too difficult to use for non-specialists…

    Submitted 4 October, 2009; originally announced October 2009.

    ACM Class: H.3.4; H.4.1

    Journal ref: Proc. of the 17th Intl. Conf. on Control Systems and Computer Science (CSCS), vol. 1, pp. 37-44, Bucharest, Romania, 26-29 May, 2009. (ISSN: 2066-4451)

  13. arXiv:0906.0376  [pdf]

    cs.DS cs.DC cs.NI

    Offline Algorithms for Several Network Design, Clustering and QoS Optimization Problems

    Authors: Mugurel Ionut Andreica, Eliana-Dina Tirsa, Alexandru Costan, Nicolae Tapus

    Abstract: In this paper we address several network design, clustering and Quality of Service (QoS) optimization problems and present novel, efficient, offline algorithms which compute optimal or near-optimal solutions. The QoS optimization problems consist of reliability improvement (by computing backup shortest paths) and network link upgrades (in order to reduce the latency on several paths). The networ…

    Submitted 1 June, 2009; originally announced June 2009.

    ACM Class: G.2.2; G.2.1; C.2.4

    Journal ref: Proceedings of the 17th International Conference on Control Systems and Computer Science (CSCS), vol. 1, pp. 273-280, Bucharest, Romania, 26-29 May, 2009. (ISSN: 2066-4451)