Showing 1–23 of 23 results for author: Díaz, E

Searching in archive cs.
  1. arXiv:2406.07378  [pdf, other]

    cs.AI cs.CL

    Large Language Models for Constrained-Based Causal Discovery

    Authors: Kai-Hendrik Cohrs, Gherardo Varando, Emiliano Diaz, Vasileios Sitokonstantinou, Gustau Camps-Valls

    Abstract: Causality is essential for understanding complex systems, such as the economy, the brain, and the climate. Constructing causal graphs often relies on either data-driven or expert-driven approaches, both fraught with challenges. The former methods, like the celebrated PC algorithm, face issues with data requirements and assumptions of causal sufficiency, while the latter demand substantial time and…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2403.14228  [pdf, other]

    stat.ML cs.LG

    Recovering Latent Confounders from High-dimensional Proxy Variables

    Authors: Nathan Mankovich, Homer Durand, Emiliano Diaz, Gherardo Varando, Gustau Camps-Valls

    Abstract: Detecting latent confounders from proxy variables is an essential problem in causal effect estimation. Previous approaches are limited to low-dimensional proxies, sorted proxies, and binary treatments. We remove these assumptions and present a novel Proxy Confounder Factorization (PCF) framework for continuous treatment effect estimation when latent confounders manifest through high-dimensional, m…

    Submitted 21 March, 2024; originally announced March 2024.

  3. What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study

    Authors: Nicolás E. Díaz Ferreyra, Mojtaba Shahin, Mansooreh Zahedi, Sodiq Quadri, Ricardo Scandariato

    Abstract: Self-Admitted Technical Debt (SATD) encompasses a wide array of sub-optimal design and implementation choices reported in software artefacts (e.g., code comments and commit messages) by developers themselves. Such reports have been central to the study of software maintenance and evolution over the last decades. However, they can also be deemed as dreadful sources of information on potentially exp…

    Submitted 2 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted at the 21st International Conference on Mining Software Repositories (MSR '24)

  4. arXiv:2305.13341  [pdf, other]

    physics.data-an cs.AI cs.LG stat.ME

    Discovering Causal Relations and Equations from Data

    Authors: Gustau Camps-Valls, Andreas Gerhardus, Urmi Ninad, Gherardo Varando, Georg Martius, Emili Balaguer-Ballester, Ricardo Vinuesa, Emiliano Diaz, Laure Zanna, Jakob Runge

    Abstract: Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing t…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: 137 pages

  5. arXiv:2305.09778  [pdf, other]

    cs.GR cs.CG

    Shortest Path to Boundary for Self-Intersecting Meshes

    Authors: He Chen, Elie Diaz, Cem Yuksel

    Abstract: We introduce a method for efficiently computing the exact shortest path to the boundary of a mesh from a given internal point in the presence of self-intersections. We provide a formal definition of shortest boundary paths for self-intersecting objects and present a robust algorithm for computing the actual shortest boundary path. The resulting method offers an effective solution for collision and…

    Submitted 16 May, 2023; originally announced May 2023.

    ACM Class: I.3.5

  6. arXiv:2303.14229  [pdf, ps, other]

    math.CO cs.DM

    Sharp threshold for embedding balanced spanning trees in random geometric graphs

    Authors: Alberto Espuny Díaz, Lyuben Lichev, Dieter Mitsche, Alexandra Wesolek

    Abstract: A rooted tree is balanced if the degree of a vertex depends only on its distance to the root. In this paper we determine the sharp threshold for the appearance of a large family of balanced spanning trees in the random geometric graph $\mathcal{G}(n,r,d)$. In particular, we find the sharp threshold for balanced binary trees. More generally, we show that all sequences of balanced trees with uniform…

    Submitted 24 March, 2023; originally announced March 2023.
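    The balance condition quoted in this abstract is easy to check mechanically. A minimal sketch, assuming the tree is given as an undirected edge list and that "degree" means the number of neighbours in the tree; the helper names `adjacency` and `is_balanced` are hypothetical and not taken from the paper, which concerns sharp thresholds in $\mathcal{G}(n,r,d)$ rather than this check:

```python
from collections import deque


def adjacency(edges):
    """Hypothetical helper: build an undirected adjacency list from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def is_balanced(adj, root):
    """True if every vertex's degree depends only on its distance to `root`."""
    dist = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search gives distances from the root in a tree
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    degree_at_distance = {}
    for v, d in dist.items():
        # the first vertex seen at distance d fixes the required degree at that distance
        if degree_at_distance.setdefault(d, len(adj[v])) != len(adj[v]):
            return False
    return True


binary = adjacency([(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)])
print(is_balanced(binary, 0))  # True: degrees 2, 3, 1 by distance
lopsided = adjacency([(0, 1), (1, 2), (0, 3)])
print(is_balanced(lopsided, 0))  # False: vertices 1 and 3 are both at distance 1
```

    The complete binary tree of depth 2 passes because the root has degree 2, internal vertices degree 3 and all leaves sit at the same distance with degree 1.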

  7. arXiv:2303.09384  [pdf, other]

    cs.SE cs.IR cs.LG

    LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

    Authors: Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

    Abstract: Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs prom…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted at MSR '23 Data and Tool Showcase Track

  8. Regret, Delete, (Do Not) Repeat: An Analysis of Self-Cleaning Practices on Twitter After the Outbreak of the COVID-19 Pandemic

    Authors: Nicolás E. Díaz Ferreyra, Gautam Kishore Shahi, Catherine Tony, Stefan Stieglitz, Riccardo Scandariato

    Abstract: During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, hoping for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifesta…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted at CHI '23 Late Breaking Work (LBW)

  9. arXiv:2303.01822  [pdf, other]

    cs.SE cs.HC cs.SI

    Developers Need Protection, Too: Perspectives and Research Challenges for Privacy in Social Coding Platforms

    Authors: Nicolás E. Díaz Ferreyra, Abdessamad Imine, Melina Vidoni, Riccardo Scandariato

    Abstract: Social Coding Platforms (SCPs) like GitHub have become central to modern software engineering thanks to their collaborative and version-control features. Like in mainstream Online Social Networks (OSNs) such as Facebook, users of SCPs are subjected to privacy attacks and threats given the high amounts of personal and project-related data available in their profiles and software repositories. Howev…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted at the 16th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE 2023)

  10. arXiv:2211.13498  [pdf, other]

    cs.CR cs.LG cs.SE

    GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences

    Authors: Catherine Tony, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

    Abstract: GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted at QRS 2022

  11. arXiv:2208.07462  [pdf, other]

    math.PR cs.DM math.CO

    Speeding up random walk mixing by starting from a uniform vertex

    Authors: Alberto Espuny Díaz, Patrick Morris, Guillem Perarnau, Oriol Serra

    Abstract: The theory of rapid mixing random walks plays a fundamental role in the study of modern randomised algorithms. Usually, the mixing time is measured with respect to the worst initial position. It is well known that the presence of bottlenecks in a graph hampers mixing and, in particular, starting inside a small bottleneck significantly slows down the diffusion of the walk in the first steps of the…

    Submitted 27 January, 2024; v1 submitted 15 August, 2022; originally announced August 2022.

    Comments: To appear in Electronic Journal of Probability
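    The abstract's point that mixing time is usually measured from the worst initial position can be made concrete on a small graph with a bottleneck. A minimal sketch, assuming a lazy random walk (stay put with probability 1/2) and total-variation distance to the degree-proportional stationary distribution; "uniform start" here means the initial vertex is drawn uniformly at random, which is one natural reading and not necessarily the paper's exact definition. The graph builder `clique_with_tail` and all other names are hypothetical:

```python
def clique_with_tail(k, tail):
    """Hypothetical test graph: a k-clique with a path of `tail` extra vertices hanging off vertex k-1."""
    adj = {v: [w for w in range(k) if w != v] for v in range(k)}
    prev = k - 1
    for v in range(k, k + tail):
        adj[prev].append(v)
        adj[v] = [prev]
        prev = v
    return adj


def stationary(adj):
    """Stationary distribution of the (lazy) simple random walk: pi(v) = deg(v) / 2|E|."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / total for v, nbrs in adj.items()}


def lazy_step(dist, adj):
    """One lazy step: keep half the mass, spread the other half evenly over neighbours."""
    new = {v: 0.0 for v in adj}
    for v, p in dist.items():
        new[v] += p / 2
        for w in adj[v]:
            new[w] += p / (2 * len(adj[v]))
    return new


def tv_distance(p, q):
    """Total-variation distance between two distributions on the same vertex set."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in p)


def distance_after(start, adj, t):
    """TV distance to stationarity after t lazy steps from the initial distribution `start`."""
    dist = dict(start)
    for _ in range(t):
        dist = lazy_step(dist, adj)
    return tv_distance(dist, stationary(adj))


def worst_case_distance(adj, t):
    """Maximum over starting vertices x of the TV distance after t steps (worst-case measure)."""
    return max(distance_after({v: float(v == x) for v in adj}, adj, t) for x in adj)


def uniform_start_distance(adj, t):
    """TV distance after t steps when the starting vertex is drawn uniformly at random."""
    n = len(adj)
    return distance_after({v: 1.0 / n for v in adj}, adj, t)


g = clique_with_tail(8, 3)
for t in (5, 20, 80):
    print(t, round(worst_case_distance(g, t), 4), round(uniform_start_distance(g, t), 4))
```

    Because total-variation distance is convex, the uniform-start distance can never exceed the worst-case one; the size of the gap indicates how much the worst-case measure is driven by a few badly placed starting vertices.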

  12. arXiv:2208.04649  [pdf, other]

    cs.HC cs.CY cs.SI

    ENAGRAM: An App to Evaluate Preventative Nudges for Instagram

    Authors: Nicolás E. Díaz Ferreyra, Sina Ostendorf, Esma Aïmeur, Maritta Heisel, Matthias Brand

    Abstract: Online self-disclosure is perhaps one of the last decade's most studied communication processes, thanks to the introduction of Online Social Networks (OSNs) like Facebook. Self-disclosure research has contributed significantly to the design of preventative nudges seeking to support and guide users when revealing private information in OSNs. Still, assessing the effectiveness of these solutions is…

    Submitted 18 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Accepted at the 2022 European Symposium on Usable Security (EuroUSEC 2022)

  13. arXiv:2207.01529  [pdf, other]

    cs.HC cs.CR cs.SE cs.SI

    Cybersecurity Discussions in Stack Overflow: A Developer-Centred Analysis of Engagement and Self-Disclosure Behaviour

    Authors: Nicolás E. Díaz Ferreyra, Melina Vidoni, Maritta Heisel, Riccardo Scandariato

    Abstract: Stack Overflow (SO) is a popular platform among developers seeking advice on various software-related topics, including privacy and security. As for many knowledge-sharing websites, the value of SO depends largely on users' engagement, namely their willingness to answer, comment or post technical questions. Still, many of these questions (including cybersecurity-related ones) remain unanswered, pu…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Submitted for publication

  14. arXiv:2205.06200  [pdf, other]

    cs.HC cs.CR cs.SE

    Conversational DevBots for Secure Programming: An Empirical Study on SKF Chatbot

    Authors: Catherine Tony, Mohana Balasubramanian, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

    Abstract: Conversational agents or chatbots are widely investigated and used across different fields including healthcare, education, and marketing. Still, the development of chatbots for assisting secure coding practices is in its infancy. In this paper, we present the results of an empirical study on SKF chatbot, a software-development bot (DevBot) designed to answer queries about software security. To th…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted paper at the 2022 International Conference on Evaluation and Assessment in Software Engineering (EASE)

  15. arXiv:2202.11969  [pdf, ps, other]

    cs.SE cs.IR

    Should I Get Involved? On the Privacy Perils of Mining Software Repositories for Research Participants

    Authors: Melina Vidoni, Nicolás E. Díaz Ferreyra

    Abstract: Mining Software Repositories (MSRs) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques as they allow researchers to unveil issues and flaws in software development so as to analyse the different factors contributing to them. Hence, counting on fine-grained informati…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: Accepted at ROPES'22: 1st International Workshop on Recruiting Participants for Empirical Software Engineering (co-located with ICSE 2022)

  16. SoK: Security of Microservice Applications: A Practitioners' Perspective on Challenges and Best Practices

    Authors: Priyanka Billawa, Anusha Bambhore Tukaram, Nicolás E. Díaz Ferreyra, Jan-Philipp Steghöfer, Riccardo Scandariato, Georg Simhandl

    Abstract: Cloud-based application deployment is becoming increasingly popular among businesses, thanks to the emergence of microservices. However, securing such architectures is a challenging task since traditional security concepts cannot be directly applied to microservice architectures due to their distributed nature. The situation is exacerbated by the scattered nature of guidelines and best practices a…

    Submitted 2 September, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Accepted at the 17th International Conference on Availability, Reliability and Security (ARES 2022)

    ACM Class: D.4.6

  17. Community Detection for Access-Control Decisions: Analysing the Role of Homophily and Information Diffusion in Online Social Networks

    Authors: Nicolas E. Diaz Ferreyra, Tobias Hecking, Esma Aïmeur, Maritta Heisel, H. Ulrich Hoppe

    Abstract: Access-Control Lists (ACLs) (a.k.a. friend lists) are one of the most important privacy features of Online Social Networks (OSNs) as they allow users to restrict the audience of their publications. Nevertheless, creating and maintaining custom ACLs can introduce a high cognitive burden on average OSNs users since it normally requires assessing the trustworthiness of a large number of contacts. In…

    Submitted 7 June, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  18. arXiv:2012.04922  [pdf, other]

    stat.ME cs.LG stat.ML

    Consistent regression of biophysical parameters with kernel methods

    Authors: Emiliano Díaz, Adrián Pérez-Suay, Valero Laparra, Gustau Camps-Valls

    Abstract: This paper introduces a novel statistical regression framework that allows the incorporation of consistency constraints. A linear and nonlinear (kernel-based) formulation are introduced, and both imply closed-form analytical solutions. The models exploit all the information from a set of drivers while being maximally independent of a set of auxiliary, protected variables. We successfully illustrat…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1710.05578

  19. arXiv:2009.12853  [pdf, other]

    cs.CY cs.HC cs.SI

    Persuasion Meets AI: Ethical Considerations for the Design of Social Engineering Countermeasures

    Authors: Nicolas E. Díaz Ferreyra, Esma Aïmeur, Hicham Hage, Maritta Heisel, Catherine García van Hoogstraten

    Abstract: Privacy in Social Network Sites (SNSs) like Facebook or Instagram is closely related to people's self-disclosure decisions and their ability to foresee the consequences of sharing personal information with large and diverse audiences. Nonetheless, online privacy decisions are often based on spurious risk judgements that make people liable to reveal sensitive data to untrusted recipients and become…

    Submitted 27 September, 2020; originally announced September 2020.

    Comments: Accepted for publication at IC3K 2020

  20. Learning from Online Regrets: From Deleted Posts to Risk Awareness in Social Network Sites

    Authors: Nicolas E. Diaz Ferreyra, Rene Meis, Maritta Heisel

    Abstract: Social Network Sites (SNSs) like Facebook or Instagram are spaces where people expose their lives to wide and diverse audiences. This practice can lead to unwanted incidents such as reputation damage, job loss or harassment when pieces of private information reach unintended recipients. As a consequence, users often regret to have posted private information in these platforms and proceed to delete…

    Submitted 21 August, 2020; originally announced August 2020.

  21. arXiv:1809.10421  [pdf, ps, other]

    math.CO cs.IT

    Entropy versions of additive inequalities

    Authors: Alberto Espuny Díaz, Oriol Serra

    Abstract: The connection between inequalities in additive combinatorics and analogous versions in terms of the entropy of random variables has been extensively explored over the past few years. This paper extends a device introduced by Ruzsa in his seminal work introducing this correspondence. This extension provides a toolbox for establishing the equivalence between sumset inequalities and their entropic v…

    Submitted 27 May, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: The former version had an error the authors could not fix. This version keeps only the parts that did not depend on the incorrect statement
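    The correspondence this abstract refers to pairs a finite set $A$ with a random variable $X$ uniform on $A$, so that $H(X) = \log|A|$, and pairs the sumset $A+B$ with $X+Y$ for independent $X$ and $Y$. A minimal sketch checking two elementary instances of this dictionary, $H(X+Y) \ge \max(H(X), H(Y))$ (the entropic analogue of $|A+B| \ge \max(|A|,|B|)$) and $H(X+Y) \le \log|A+B|$; these are standard facts rather than results of the paper or Ruzsa's device itself, and the sets `A` and `B` are arbitrary illustrative choices:

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in nats of a distribution given as {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def uniform(values):
    """Uniform distribution on a finite set of integers."""
    vals = set(values)
    return {v: 1 / len(vals) for v in vals}


def sum_distribution(px, py):
    """Law of X + Y for independent X ~ px and Y ~ py (a discrete convolution)."""
    out = defaultdict(float)
    for x, p in px.items():
        for y, q in py.items():
            out[x + y] += p * q
    return dict(out)


A = {0, 1, 3}
B = {0, 2, 5}
X, Y = uniform(A), uniform(B)
S = sum_distribution(X, Y)
sumset = {a + b for a in A for b in B}

print(math.isclose(entropy(X), math.log(len(A))))  # True: H(X) = log|A|
print(max(entropy(X), entropy(Y)) <= entropy(S) <= math.log(len(sumset)))  # True
```

    The upper bound holds because $X+Y$ is supported on $A+B$; it is strict here since the sums 3 and 5 each arise twice, so $X+Y$ is not uniform on $A+B$.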

  22. arXiv:1704.00829  [pdf, other]

    stat.AP cs.CV

    Online deforestation detection

    Authors: Emiliano Diaz

    Abstract: Deforestation detection using satellite images can make an important contribution to forest management. Current approaches can be broadly divided into those that compare two images taken at similar periods of the year and those that monitor changes by using multiple images taken during the growing season. The CMFDA algorithm described in Zhu et al. (2012) is an algorithm that builds on the latter…

    Submitted 3 April, 2017; originally announced April 2017.

  23. arXiv:1704.00575  [pdf, other]

    stat.AP cs.IT

    Sparse mean localization by information theory

    Authors: Emiliano Diaz

    Abstract: Sparse feature selection is necessary when we fit statistical models, we have access to a large group of features, don't know which are relevant, but assume that most are not. Alternatively, when the number of features is larger than the available data the model becomes over parametrized and the sparse feature selection task involves selecting the most informative variables for the model. When the…

    Submitted 3 April, 2017; originally announced April 2017.