Showing 1–50 of 81 results for author: Dey, T

Searching in archive cs.
  1. arXiv:2406.10560

    cs.CL

    Facts-and-Feelings: Capturing both Objectivity and Subjectivity in Table-to-Text Generation

    Authors: Tathagata Dey, Pushpak Bhattacharyya

    Abstract: Table-to-text generation, a long-standing challenge in natural language generation, has remained unexplored through the lens of subjectivity. Subjectivity here encompasses the comprehension of information derived from the table that cannot be described solely by objective data. Given the absence of pre-existing datasets, we introduce the Ta2TS dataset with 3849 data instances. We perform the task…

    Submitted 15 June, 2024; originally announced June 2024.

  2. arXiv:2406.07100

    cs.LG cs.AI math.AT

    D-GRIL: End-to-End Topological Learning with 2-parameter Persistence

    Authors: Soham Mukherjee, Shreyas N. Samaga, Cheng Xin, Steve Oudot, Tamal K. Dey

    Abstract: End-to-end topological learning using 1-parameter persistence is well-known. We show that the framework can be enhanced using 2-parameter persistence by adopting a recently introduced 2-parameter persistence based vectorization technique called GRIL. We establish a theoretical foundation of differentiating GRIL producing D-GRIL. We show that D-GRIL can be used to learn a bifiltration function on s…

    Submitted 27 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2406.02732

    cs.LG cs.DS

    GEFL: Extended Filtration Learning for Graph Classification

    Authors: Simon Zhang, Soham Mukherjee, Tamal K. Dey

    Abstract: Extended persistence is a technique from topological data analysis to obtain global multiscale topological information from a graph. This includes information about connected components and cycles that are captured by the so-called persistence barcodes. We introduce extended persistence into a supervised learning framework for graph classification. Global topological information, in the form of a…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures, Learning on Graphs Conference (LoG 2022)

  4. arXiv:2403.10958

    math.AT cs.CG math.AC

    Efficient Algorithms for Complexes of Persistence Modules with Applications

    Authors: Tamal K. Dey, Florian Russold, Shreyas N. Samaga

    Abstract: We extend the persistence algorithm, viewed as an algorithm computing the homology of a complex of free persistence or graded modules, to complexes of modules that are not free. We replace persistence modules by their presentations and develop an efficient algorithm to compute the homology of a complex of presentations. To deal with inputs that are not given in terms of presentations, we give an e…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: This is the full version of a paper accepted at the 40th International Symposium on Computational Geometry (SoCG 2024)

  5. arXiv:2403.08110

    math.AT cs.CG

    Computing Generalized Ranks of Persistence Modules via Unfolding to Zigzag Modules

    Authors: Tamal K. Dey, Cheng Xin

    Abstract: For a $P$-indexed persistence module ${\sf M}$, the (generalized) rank of ${\sf M}$ is defined as the rank of the limit-to-colimit map for the diagram of vector spaces of ${\sf M}$ over the poset $P$. For $2$-parameter persistence modules, recently a zigzag persistence based algorithm has been proposed that takes advantage of the fact that generalized rank for $2$-parameter modules is equal to the…

    Submitted 4 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  6. arXiv:2402.11339

    cs.LG stat.ML

    Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking

    Authors: Simon Zhang, Cheng Xin, Tamal K. Dey

    Abstract: A hypergraph consists of a set of nodes along with a collection of subsets of the nodes called hyperedges. Higher-order link prediction is the task of predicting the existence of a missing hyperedge in a hypergraph. A hyperedge representation learned for higher order link prediction is fully expressive when it does not lose distinguishing power up to an isomorphism. Many existing hypergraph repres…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 46 pages, 4 figures

  7. ICML 2023 Topological Deep Learning Challenge : Design and Results

    Authors: Mathilde Papillon, Mustafa Hajij, Helen Jenne, Johan Mathe, Audun Myers, Theodore Papamarkou, Tolga Birdal, Tamal Dey, Tim Doster, Tegan Emerson, Gurusankar Gopalakrishnan, Devendra Govil, Aldo Guzmán-Sáenz, Henry Kvinge, Neal Livesay, Soham Mukherjee, Shreyas N. Samaga, Karthikeyan Natesan Ramamurthy, Maneel Reddy Karri, Paul Rosen, Sophia Sanborn, Robin Walters, Jens Agerberg, Sadrodin Barikbin, Claudio Battiloro, et al. (31 additional authors not shown)

    Abstract: This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the python packages TopoNetX (data processing) and TopoModelX (deep learning). The chal…

    Submitted 18 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

  8. arXiv:2307.07462

    cs.CG math.AT

    Computing Zigzag Vineyard Efficiently Including Expansions and Contractions

    Authors: Tamal K. Dey, Tao Hou

    Abstract: Vines and vineyard connecting a stack of persistence diagrams have been introduced in the non-zigzag setting by Cohen-Steiner et al. We consider computing these vines over changing filtrations for zigzag persistence while incorporating two more operations: expansions and contractions in addition to the transpositions considered in the non-zigzag setting. Although expansions and contractions can be…

    Submitted 18 February, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Updated funding information for one co-author

  9. arXiv:2304.06048

    cs.LG cs.AI

    RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization

    Authors: Yuanhang Shao, Tonmoy Dey, Nikola Vuckovic, Luke Van Popering, Alan Kuhnle

    Abstract: Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating the local search…

    Submitted 11 April, 2023; originally announced April 2023.

  10. arXiv:2304.04970

    cs.LG cs.AI cs.CG math.AT

    GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning

    Authors: Cheng Xin, Soham Mukherjee, Shreyas N. Samaga, Tamal K. Dey

    Abstract: $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2…

    Submitted 30 June, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

  11. arXiv:2303.08270

    math.AT cs.CG

    Meta-Diagrams for 2-Parameter Persistence

    Authors: Nate Clause, Tamal K. Dey, Facundo Mémoli, Bei Wang

    Abstract: We first introduce the notion of meta-rank for a 2-parameter persistence module, an invariant that captures the information behind images of morphisms between 1D slices of the module. We then define the meta-diagram of a 2-parameter persistence module to be the Möbius inversion of the meta-rank, resulting in a function that takes values from signed 1-parameter persistence modules. We show that the…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: 22 pages, 8 figures. Full version of the paper that is to appear in the Proceedings of the 39th International Symposium on Computational Geometry (SoCG 2023)

  12. arXiv:2303.02549

    math.AT cs.CG math.DS

    Computing Connection Matrices via Persistence-like Reductions

    Authors: Tamal K. Dey, Michał Lipiński, Marian Mrozek, Ryan Slechta

    Abstract: Connection matrices are a generalization of Morse boundary operators from the classical Morse theory for gradient vector fields. Developing an efficient computational framework for connection matrices is particularly important in the context of a rapidly growing data science that requires new mathematical tools for discrete data. Toward this goal, the classical theory for connection matrices has b…

    Submitted 23 September, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

  13. arXiv:2302.12796

    cs.CG math.AT

    Revisiting Graph Persistence for Updates and Efficiency

    Authors: Tamal K. Dey, Tao Hou, Salman Parsa

    Abstract: It is well known that ordinary persistence on graphs can be computed more efficiently than the general persistence. Recently, it has been shown that zigzag persistence on graphs also exhibits similar behavior. Motivated by these results, we revisit graph persistence and propose efficient algorithms especially for local updates on filtrations, similar to what is done in ordinary persistence for com…

    Submitted 11 May, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  14. arXiv:2212.01633

    cs.CG math.AT

    Cup Product Persistence and Its Efficient Computation

    Authors: Tamal K. Dey, Abhishek Rathod

    Abstract: It is well known that the cohomology ring has a richer structure than homology groups. However, until recently, the use of cohomology in the persistence setting has been limited to speeding up barcode computations. Some of the recently introduced invariants, namely, persistent cup-length, persistent cup modules and persistent Steenrod modules, to some extent, fill this gap. When added to the standa…

    Submitted 17 March, 2024; v1 submitted 3 December, 2022; originally announced December 2022.

    Comments: To appear in Proceedings of 40th International Symposium on Computational Geometry

  15. arXiv:2207.14358

    cs.LG cs.HC cs.SI math.AT

    Topological structure of complex predictions

    Authors: Meng Liu, Tamal K. Dey, David F. Gleich

    Abstract: Complex prediction models such as deep learning are the output from fitting machine learning, neural networks, or AI models to a set of training data. These are now standard tools in science. A key challenge with the current generation of models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to tr…

    Submitted 19 October, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

  16. arXiv:2207.08475

    cs.SE cs.SI

    Knights and Gold Stars: A Tale of InnerSource Incentivization

    Authors: Tapajit Dey, Willem Jiang, Brian Fitzgerald

    Abstract: Given the success of the open source phenomenon, it is not surprising that many organizations are seeking to emulate this success by adopting open source practices internally in what is termed InnerSource. However, while open source development and InnerSource are similar in some aspects, they differ significantly on others, and thus need to be implemented and managed differently. To the best of o…

    Submitted 18 July, 2022; originally announced July 2022.

  17. arXiv:2207.01015

    cs.SE cs.CY

    One-off Events? An Empirical Study of Hackathon Code Creation and Reuse

    Authors: Ahmed Samir Imam Mahmoud, Tapajit Dey, Alexander Nolte, Audris Mockus, James D. Herbsleb

    Abstract: Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event with limited attention to the evolution of the hackathon code. Aim: We aim to understand the evolution of code used in and created during hackathon events, with a particular focus on the code blobs, specifically, how fr…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: Accepted in Empirical Software Engineering Journal. arXiv admin note: substantial text overlap with arXiv:2103.01145

  18. arXiv:2206.09563

    cs.DS cs.DC cs.LG

    Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models

    Authors: Tonmoy Dey, Yixin Chen, Alan Kuhnle

    Abstract: Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property - which had previously only been known to be satisfied by the standard greedy and continuous…

    Submitted 1 April, 2024; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: 35 pages, 5 figures

  19. arXiv:2206.00606

    cs.LG cs.CV cs.SI math.AT stat.ML

    Topological Deep Learning: Going Beyond Graph Data

    Authors: Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou, Nina Miolane, Aldo Guzmán-Sáenz, Karthikeyan Natesan Ramamurthy, Tolga Birdal, Tamal K. Dey, Soham Mukherjee, Shreyas N. Samaga, Neal Livesay, Robin Walters, Paul Rosen, Michael T. Schaub

    Abstract: Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widel…

    Submitted 19 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

  20. arXiv:2204.11080

    cs.CG math.AT

    Fast Computation of Zigzag Persistence

    Authors: Tamal K. Dey, Tao Hou

    Abstract: Zigzag persistence is a powerful extension of the standard persistence which allows deletions of simplices besides insertions. However, computing zigzag persistence usually takes considerably more time than the standard persistence. We propose an algorithm called FastZigzag which narrows this efficiency gap. Our main result is that an input simplex-wise zigzag filtration can be converted to a cell…

    Submitted 4 July, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.06315

  21. arXiv:2203.05727

    math.AT cs.CG math.DS

    Tracking Dynamical Features via Continuation and Persistence

    Authors: Tamal K. Dey, Michał Lipiński, Marian Mrozek, Ryan Slechta

    Abstract: Multivector fields and combinatorial dynamical systems have recently become a subject of interest due to their potential for use in computational methods. In this paper, we develop a method to track an isolated invariant set -- a salient feature of a combinatorial dynamical system -- across a sequence of multivector fields. This goal is attained by placing the classical notion of the "continuation…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: Full version of SoCG 2022 paper

  22. arXiv:2112.02352

    cs.CG math.AT

    Updating Barcodes and Representatives for Zigzag Persistence

    Authors: Tamal K. Dey, Tao Hou

    Abstract: Computing persistence over changing filtrations gives rise to a stack of 2D persistence diagrams where the birth-death points are connected by the so-called 'vines'. We consider computing these vines over changing filtrations for zigzag persistence. We observe that eight atomic operations are sufficient for changing one zigzag filtration to another and provide update algorithms for each of them. Si…

    Submitted 1 August, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

  23. arXiv:2111.15058

    math.AT cs.CG

    Computing Generalized Rank Invariant for 2-Parameter Persistence Modules via Zigzag Persistence and its Applications

    Authors: Tamal K. Dey, Woojin Kim, Facundo Mémoli

    Abstract: The notion of generalized rank invariant in the context of multiparameter persistence has become an important ingredient for defining interesting homological structures such as generalized persistence diagrams. Naturally, computing these rank invariants efficiently is a prelude to computing any of these derived structures efficiently. We show that the generalized rank over a finite interval $I$ of…

    Submitted 30 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Full version of the paper in the Proceedings of the 38th International Symposium on Computational Geometry (SoCG 2022). Shortened the proof of Theorem 3.12 and added new sections 4.4 and 4.5; 21 pages, 4 figures

  24. arXiv:2111.07917

    cs.DS cs.LG

    Best of Both Worlds: Practical and Theoretically Optimal Submodular Maximization in Parallel

    Authors: Yixin Chen, Tonmoy Dey, Alan Kuhnle

    Abstract: For the problem of maximizing a monotone, submodular function with respect to a cardinality constraint $k$ on a ground set of size $n$, we provide an algorithm that achieves the state-of-the-art in both its empirical performance and its theoretical properties, in terms of adaptive complexity, query complexity, and approximation ratio; that is, it obtains, with high probability, query complexity of…

    Submitted 8 February, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 32 pages, 8 figures, to be published in NeurIPS 2021

  25. arXiv:2110.14734

    cs.CG

    Approximating 1-Wasserstein Distance between Persistence Diagrams by Graph Sparsification

    Authors: Tamal K. Dey, Simon Zhang

    Abstract: Persistence diagrams (PDs) play a central role in topological data analysis. This analysis requires computing distances among such diagrams such as the 1-Wasserstein distance. Accurate computation of these PD distances for large data sets that render large diagrams may not scale appropriately with the existing methods. The main source of difficulty ensues from the size of the bipartite graph on wh…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: 31 pages, 12 figures; extended version of paper published in ALENEX 2022

  26. arXiv:2110.06315

    cs.CG math.AT

    On Association between Absolute and Relative Zigzag Persistence

    Authors: Tamal K. Dey, Tao Hou

    Abstract: Duality results connecting persistence modules for absolute and relative homology provide a fundamental understanding of persistence theory. In this paper, we study similar associations in the context of zigzag persistence. Our main finding is a weak duality for the so-called non-repetitive zigzag filtrations in which a simplex is never added again after being deleted. The technique used to pro…

    Submitted 12 October, 2021; originally announced October 2021.

  27. arXiv:2108.07429

    cs.CG math.AT

    Rectangular Approximation and Stability of $2$-parameter Persistence Modules

    Authors: Tamal K. Dey, Cheng Xin

    Abstract: One of the main reasons for topological persistence being useful in data analysis is that it is backed up by a stability (isometry) property: persistence diagrams of $1$-parameter persistence modules are stable in the sense that the bottleneck distance between two diagrams equals the interleaving distance between their generating modules. However, in the multi-parameter setting this property breaks do…

    Submitted 17 August, 2021; originally announced August 2021.

  28. arXiv:2107.02115

    math.DS cs.CG math.AT

    Persistence of Conley-Morse Graphs in Combinatorial Dynamical Systems

    Authors: Tamal K. Dey, Marian Mrozek, Ryan Slechta

    Abstract: Multivector fields provide an avenue for studying continuous dynamical systems in a combinatorial framework. There are currently two approaches in the literature which use persistent homology to capture changes in combinatorial dynamical systems. The first captures changes in the Conley index, while the second captures changes in the Morse decomposition. However, such approaches have limitations.…

    Submitted 5 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

  29. arXiv:2105.00518

    cs.CG math.AT

    Computing Optimal Persistent Cycles for Levelset Zigzag on Manifold-like Complexes

    Authors: Tamal K. Dey, Tao Hou

    Abstract: In standard persistent homology, a persistent cycle born and dying with a persistence interval (bar) associates the bar with a concrete topological representative, which provides means to effectively navigate back from the barcode to the topological space. Among the possibly many, optimal persistent cycles bring forth further information due to having guaranteed quality. However, topological featu…

    Submitted 2 May, 2021; originally announced May 2021.

  30. arXiv:2104.13430

    cond-mat.mtrl-sci cs.CG

    Topological Filtering for 3D Microstructure Segmentation

    Authors: Anand V. Patel, Tao Hou, Juan D. Beltran Rodriguez, Tamal K. Dey, Dunbar P. Birnie III

    Abstract: Tomography is a widely used tool for analyzing microstructures in three dimensions (3D). The analysis, however, faces difficulty because the constituent materials produce similar grey-scale values. Sometimes, this prompts the image segmentation process to assign a pixel/voxel to the wrong phase (active material or pore). Consequently, errors are introduced in the microstructure characteristics cal…

    Submitted 26 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

  31. arXiv:2103.10167

    cs.SE

    Tracking Hackathon Code Creation and Reuse

    Authors: Ahmed Imam, Tapajit Dey

    Abstract: Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event with limited attention to the evolution of the code brought to or created during a hackathon. Aim: We aim to understand the evolution of hackathon-related code, specifically, how much hackathon teams rely on pre-existin…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: An abridged version of arXiv:2103.01145. Required for publication pre-print distribution

  32. arXiv:2103.09583

    cs.GR

    2D Points Curve Reconstruction Survey and Benchmark

    Authors: Stefan Ohrhallinger, Jiju Peethambaran, Amal D. Parakkat, Tamal K. Dey, Ramanathan Muthuganapathy

    Abstract: Curve reconstruction from unstructured points in a plane is a fundamental problem with many applications that has generated research interest for decades. Involved aspects like handling open, sharp, multiple and non-manifold outlines, run-time and provability as well as potential extension to 3D for surface reconstruction have led to many different algorithms. We survey the literature on 2D curve…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: 24 pages, 22 figures, 5 tables

  33. arXiv:2103.07353

    cs.CG math.AT

    Computing Zigzag Persistence on Graphs in Near-Linear Time

    Authors: Tamal K. Dey, Tao Hou

    Abstract: Graphs model real-world circumstances in many applications where they may constantly change to capture the dynamic behavior of the phenomena. Topological persistence which provides a set of birth and death pairs for the topological features is one instrument for analyzing such changing graph data. However, standard persistent homology defined over a growing space cannot always capture such a dynam…

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: The full version of the paper

  34. arXiv:2103.01145

    cs.SE

    The Secret Life of Hackathon Code

    Authors: Ahmed Imam, Tapajit Dey, Alexander Nolte, Audris Mockus, James D. Herbsleb

    Abstract: Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event with limited attention to the evolution of the code brought to or created during a hackathon. Aim: We aim to understand the evolution of hackathon-related code, specifically, how much hackathon teams rely on pre-existin…

    Submitted 18 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: Accepted in Proceedings of the 18th International Conference on Mining Software Repositories, MSR '21

  35. arXiv:2010.16196

    cs.SE

    World of Code: Enabling a Research Workflow for Mining and Analyzing the Universe of Open Source VCS data

    Authors: Yuxing Ma, Tapajit Dey, Chris Bogart, Sadika Amreen, Marat Valiev, Adam Tutko, David Kennard, Russell Zaretzki, Audris Mockus

    Abstract: Open source software (OSS) is essential for modern society and, while substantial research has been done on individual (typically central) projects, only a limited understanding of the periphery of the entire OSS ecosystem exists. For example, how are the tens of millions of projects in the periphery interconnected through technical dependencies, code sharing, or knowledge flow? To answer such qu…

    Submitted 30 October, 2020; originally announced October 2020.

  36. Effect of Technical and Social Factors on Pull Request Quality for the NPM Ecosystem

    Authors: Tapajit Dey, Audris Mockus

    Abstract: Pull request (PR) based development, which is a norm for the social coding platforms, entails the challenge of evaluating the contributions of, often unfamiliar, developers from across the open source ecosystem and, conversely, submitting a contribution to a project with unfamiliar maintainers. Previous studies suggest that the decision of accepting or rejecting a PR may be influenced by a divergi…

    Submitted 20 July, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: text overlap with arXiv:2003.01153. Preprint of the paper accepted in ESEM, 2020 conference

    ACM Class: D.2.7

  37. arXiv:2005.10176

    cs.SE cs.LG

    Representation of Developer Expertise in Open Source Software

    Authors: Tapajit Dey, Andrey Karnauch, Audris Mockus

    Abstract: Background: Accurate representation of developer expertise has always been an important research problem. While a number of studies proposed novel methods of representing expertise within individual projects, these methods are difficult to apply at an ecosystem level. However, with the focus of software development shifting from monolithic to modular, a method of representing developers' expertise…

    Submitted 2 February, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Accepted in ICSE 2021 Main Technical Track

  38. Do Code Review Measures Explain the Incidence of Post-Release Defects?

    Authors: Andrey Krutauz, Tapajit Dey, Peter C. Rigby, Audris Mockus

    Abstract: Aim: In contrast to studies of defects found during code review, we aim to clarify whether code review measures can explain the prevalence of post-release defects. Method: We replicate a study by McIntosh et al. that uses additive regression to model the relationship between defects and code reviews. To increase external validity, we apply the same methodology on a new software project. We discuss…

    Submitted 19 May, 2020; originally announced May 2020.

  39. A Dataset and an Approach for Identity Resolution of 38 Million Author IDs extracted from 2B Git Commits

    Authors: Tanner Fry, Tapajit Dey, Andrey Karnauch, Audris Mockus

    Abstract: The data collected from open source projects provide means to model large software ecosystems, but often suffer from data quality issues, specifically, multiple author identification strings in code commits might actually be associated with one developer. While many methods have been proposed for addressing this problem, they are either heuristics requiring manual tweaking, or require too much cal…

    Submitted 27 March, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

  40. An Exploratory Study of Bot Commits

    Authors: Tapajit Dey, Bogdan Vasilescu, Audris Mockus

    Abstract: Background: Bots help automate many of the tasks performed by software developers and are widely used to commit code in various social coding platforms. At present, it is not clear what types of activities these bots perform and understanding it may help design better bots, and find application areas which might benefit from bot adoption. Aim: We aim to categorize the Bot Commits by the type of ch…

    Submitted 27 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  41. arXiv:2003.05579

    math.AT cs.CG math.DS

    Persistence of the Conley Index in Combinatorial Dynamical Systems

    Authors: Tamal K. Dey, Marian Mrozek, Ryan Slechta

    Abstract: A combinatorial framework for dynamical systems provides an avenue for connecting classical dynamics with data-oriented, algorithmic methods. Combinatorial vector fields introduced by Forman and their recent generalization to multivector fields have provided a starting point for building such a connection. In this work, we strengthen this relationship by placing the Conley index in the persistent…

    Submitted 11 March, 2020; originally announced March 2020.

  42. arXiv:2003.03172

    cs.SE cs.CR cs.LG cs.SI stat.ML

    Detecting and Characterizing Bots that Commit Code

    Authors: Tapajit Dey, Sara Mousavi, Eduardo Ponce, Tanner Fry, Bogdan Vasilescu, Anna Filippova, Audris Mockus

    Abstract: Background: Some developer activity traditionally performed manually, such as making code commits, opening, managing, or closing issues is increasingly subject to automation in many OSS projects. Specifically, such activity is often performed by tools that react to events or run at specific times. We refer to such automation tools as bots and, in many software mining scenarios related to developer…

    Submitted 27 March, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Preprint of the paper accepted in MSR, 2020 conference

  43. arXiv:2003.01153

    cs.SE

    Which Pull Requests Get Accepted and Why? A study of popular NPM Packages

    Authors: Tapajit Dey, Audris Mockus

    Abstract: Background: Pull Request (PR) Integrators often face challenges in terms of multiple concurrent PRs, so the ability to gauge which of the PRs will get accepted can help them balance their workload. PR creators would benefit from knowing if certain characteristics of their PRs may increase the chances of acceptance. Aim: We modeled the probability that a PR will be accepted within a month after cre…

    Submitted 2 March, 2020; originally announced March 2020.

  44. Deriving a Usage-Independent Software Quality Metric

    Authors: Tapajit Dey, Audris Mockus

    Abstract: Context: The extent of post-release use of software affects the number of faults, thus biasing quality metrics and adversely affecting associated decisions. The proprietary nature of usage data limited deeper exploration of this subject in the past. Objective: To determine how software faults and software use are related and how an accurate quality measure can be designed. Method: New users, usage…

    Submitted 23 February, 2020; originally announced February 2020.

  45. arXiv:2001.09549  [pdf, other]

    cs.CG math.CO

    An efficient algorithm for $1$-dimensional (persistent) path homology

    Authors: Tamal K. Dey, Tianqi Li, Yusu Wang

    Abstract: This paper focuses on developing an efficient algorithm for analyzing a directed network (graph) from a topological viewpoint. A prevalent technique for such topological analysis involves computation of homology groups and their persistence. These concepts are well suited for spaces that are not directed. As a result, one needs a concept of homology that accommodates orientations in input space. P…

    Submitted 26 January, 2020; originally announced January 2020.

  46. Road Network Reconstruction from Satellite Images with Machine Learning Supported by Topological Methods

    Authors: Tamal K. Dey, Jiayuan Wang, Yusu Wang

    Abstract: Automatic Extraction of road network from satellite images is a goal that can benefit and even enable new technologies. Methods that combine machine learning (ML) and computer vision have been proposed in recent years which make the task semi-automatic by requiring the user to provide curated training samples. The process can be fully automatized if training samples can be produced algorithmically…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: 26 pages, 13 figures, ACM SIGSPATIAL 2019

  47. arXiv:1907.06538  [pdf, other]

    cs.SE cs.CY

    Patterns of Effort Contribution and Demand and User Classification based on Participation Patterns in NPM Ecosystem

    Authors: Tapajit Dey, Yuxing Ma, Audris Mockus

    Abstract: Background: Open source requires participation of volunteer and commercial developers (users) in order to deliver functional high-quality components. Developers both contribute effort in the form of patches and demand effort from the component maintainers to resolve issues reported against it. Aim: Identify and characterize patterns of effort contribution and demand throughout the open source supp…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: 10 pages, 5 Tables, 2 Figures, Accepted in The 15th International Conference on Predictive Models and Data Analytics in Software Engineering 2019

  48. arXiv:1907.04889  [pdf, other]

    cs.CG math.AT

    Computing Minimal Persistent Cycles: Polynomial and Hard Cases

    Authors: Tamal K. Dey, Tao Hou, Sayan Mandal

    Abstract: Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in a purely topological persistence diagram (also termed as barcode). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard while the same for infinite intervals is polynomially tractable.…

    Submitted 14 February, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Content same as appeared in the proceeding of SODA20'

  49. arXiv:1904.03766  [pdf, other]

    math.AT cs.CG

    Generalized Persistence Algorithm for Decomposing Multi-parameter Persistence Modules

    Authors: Tamal K. Dey, Cheng Xin

    Abstract: The classical persistence algorithm computes the unique decomposition of a persistence module implicitly given by an input simplicial filtration. Based on matrix reduction, this algorithm is a cornerstone of the emergent area of topological data analysis. Its input is a simplicial filtration defined over the integers $\mathbb{Z}$ giving rise to a $1$-parameter persistence module. It has been recog…

    Submitted 6 December, 2021; v1 submitted 7 April, 2019; originally announced April 2019.

  50. arXiv:1810.04807  [pdf, other]

    cs.CG math.AT

    Persistent 1-Cycles: Definition, Computation, and Its Application

    Authors: Tamal K. Dey, Tao Hou, Sayan Mandal

    Abstract: Persistence diagrams, which summarize the birth and death of homological features extracted from data, are employed as stable signatures for applications in image analysis and other areas. Besides simply considering the multiset of intervals included in a persistence diagram, some applications need to find representative cycles for the intervals. In this paper, we address the problem of computing…

    Submitted 15 October, 2018; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: Corrected the algorithm numbering issue