Showing 1–22 of 22 results for author: Pucci, G

  1. arXiv:2311.09410  [pdf, other]

    cs.CL cs.AI

    When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour

    Authors: Leonardo Ranaldi, Giulia Pucci

    Abstract: Large Language Models have been demonstrating the ability to solve complex tasks by delivering answers that are positively evaluated by humans due in part to the intensive use of human feedback that refines responses. However, the suggestibility transmitted through human feedback increases the inclination to produce responses that correspond to the users' beliefs or misleading prompts as opposed t…

    Submitted 28 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  2. arXiv:2311.08097  [pdf, other]

    cs.CL cs.AI

    Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts

    Authors: Leonardo Ranaldi, Giulia Pucci, Federico Ranaldi, Elena Sofia Ruzzetti, Fabio Massimo Zanzotto

    Abstract: Reasoning methods, best exemplified by the well-known Chain-of-Thought (CoT), empower the reasoning abilities of Large Language Models (LLMs) by eliciting them to solve complex tasks in a step-by-step manner. Although they are achieving significant success, the ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data, whic…

    Submitted 21 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Findings of the Association for Computational Linguistics: NAACL 2024

    Report number: 2024.findings-naacl.78

    Journal ref: 2024.findings-naacl.78

  3. arXiv:2308.14186  [pdf, other]

    cs.CL cs.AI

    Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations

    Authors: Leonardo Ranaldi, Giulia Pucci, Andre Freitas

    Abstract: The language ability of Large Language Models (LLMs) is often unbalanced towards English because of the imbalance in the distribution of the pre-training data. This disparity is demanded in further fine-tuning and affecting the cross-lingual abilities of LLMs. In this paper, we propose to empower Instruction-tuned LLMs (It-LLMs) in languages other than English by building semantic alignment between…

    Submitted 27 August, 2023; originally announced August 2023.

  4. Fully dynamic clustering and diversity maximization in doubling metrics

    Authors: Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci

    Abstract: We present approximation algorithms for some variants of center-based clustering and related problems in the fully dynamic setting, where the pointset evolves through an arbitrary sequence of insertions and deletions. Specifically, we target the following problems: $k$-center (with and without outliers), matroid-center, and diversity maximization. All algorithms employ a coreset-based strategy and…

    Submitted 15 February, 2023; originally announced February 2023.

    Journal ref: WADS 2023. Lecture Notes in Computer Science, vol 14079. Springer, Cham

  5. arXiv:2202.08173  [pdf, other]

    cs.DC cs.DS cs.LG

    Distributed k-Means with Outliers in General Metrics

    Authors: Enrico Dandolo, Andrea Pietracaprina, Geppino Pucci

    Abstract: Center-based clustering is a pivotal primitive for unsupervised learning and data analysis. A popular variant is undoubtedly the k-means problem, which, given a set $P$ of points from a metric space and a parameter $k<|P|$, requires to determine a subset $S$ of $k$ centers minimizing the sum of all squared distances of points in $P$ from their closest center. A more general formulation, known as k…

    Submitted 18 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  6. k-Center Clustering with Outliers in Sliding Windows

    Authors: Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci

    Abstract: Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks for the best solution that disregards a given number $z$ of points of the dataset, called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding wind…

    Submitted 7 January, 2022; originally announced January 2022.

    Journal ref: Algorithms. 2022; 15(2):52

  7. arXiv:2003.01430  [pdf, other]

    cs.DS cs.LG

    Scalable Distributed Approximation of Internal Measures for Clustering Evaluation

    Authors: Federico Altieri, Andrea Pietracaprina, Geppino Pucci, Fabio Vandin

    Abstract: The most widely used internal measure for clustering evaluation is the silhouette coefficient, whose naive computation requires a quadratic number of distance calculations, which is clearly unfeasible for massive datasets. Surprisingly, there are no known general methods to efficiently approximate the silhouette coefficient of a clustering with rigorously provable high accuracy. In this paper, we…

    Submitted 20 January, 2021; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: 16 pages, 4 tables, 1 figure

    ACM Class: I.5.3; I.5.4; I.5.5

  8. arXiv:2002.07463  [pdf, ps, other]

    cs.DS cs.DC cs.LG

    Coreset-based Strategies for Robust Center-type Problems

    Authors: Andrea Pietracaprina, Geppino Pucci, Federico Soldà

    Abstract: Given a dataset $V$ of points from some metric space, the popular $k$-center problem requires to identify a subset of $k$ points (centers) in $V$ minimizing the maximum distance of any point of $V$ from its closest center. The \emph{robust} formulation of the problem features a further parameter $z$ and allows up to $z$ points of $V$ (outliers) to be disregarded when computing the maximum distance…

    Submitted 18 February, 2020; originally announced February 2020.

    Comments: 16 pages

  9. arXiv:2002.03175  [pdf, ps, other]

    cs.DC cs.DS

    A General Coreset-Based Approach to Diversity Maximization under Matroid Constraints

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci

    Abstract: Diversity maximization is a fundamental problem in web search and data mining. For a given dataset $S$ of $n$ elements, the problem requires to determine a subset of $S$ containing $k\ll n$ "representatives" which minimize some diversity function expressed in terms of pairwise distances, where distance models dissimilarity. An important variant of the problem prescribes that the solution satisfy a…

    Submitted 8 February, 2020; originally announced February 2020.

  10. arXiv:1904.12728  [pdf, ps, other]

    cs.DC cs.DS

    Accurate MapReduce Algorithms for $k$-median and $k$-means in General Metric Spaces

    Authors: Alessio Mazzetto, Andrea Pietracaprina, Geppino Pucci

    Abstract: Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-median and $k$-means variants which, given a set $P$ of points from a metric space and a parameter $k<|P|$, require to identify a set $S$ of $k$ centers minimizing, respectively, the sum of the distances and of the squared distances of all…

    Submitted 29 September, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

  11. arXiv:1802.09205  [pdf, other]

    cs.DC cs.DS

    Solving $k$-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci

    Abstract: Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-center variant which, given a set $S$ of points from some metric space and a parameter $k<|S|$, requires to identify a subset of $k$ centers in $S$ minimizing the maximum distance of any point of $S$ from its closest center. A more general…

    Submitted 1 June, 2021; v1 submitted 26 February, 2018; originally announced February 2018.

  12. arXiv:1612.06675  [pdf, other]

    cs.DS

    Clustering Uncertain Graphs

    Authors: Matteo Ceccarello, Carlo Fantozzi, Andrea Pietracaprina, Geppino Pucci, Fabio Vandin

    Abstract: An uncertain graph $\mathcal{G} = (V, E, p : E \rightarrow (0,1])$ can be viewed as a probability space whose outcomes (referred to as \emph{possible worlds}) are subgraphs of $\mathcal{G}$ where any edge $e\in E$ occurs with probability $p(e)$, independently of the other edges. These graphs naturally arise in many application domains where data management systems are required to cope with uncerta…

    Submitted 16 October, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

  13. arXiv:1605.05590  [pdf, other]

    cs.DC

    MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in…

    Submitted 23 January, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

    Comments: Extended version of http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5, January 2017

  14. arXiv:1506.03265  [pdf, other]

    cs.DC

    A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the algorithm is a weighted graph decomposition strategy generating disjoint clusters of bounded weighted radius. Theoretically, our algorithm uses linear space and yields a polylogarith…

    Submitted 9 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

  15. arXiv:1407.3144  [pdf, other]

    cs.DC cs.DS

    Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: We develop a novel parallel decomposition strategy for unweighted, undirected graphs, based on growing disjoint connected clusters from batches of centers progressively selected from yet uncovered nodes. With respect to similar previous decompositions, our strategy exercises a tighter control on both the number of clusters and their maximum radius. We present two important applications of our pa…

    Submitted 6 February, 2015; v1 submitted 11 July, 2014; originally announced July 2014.

    Comments: 14 pages

  16. arXiv:1404.3318  [pdf, other]

    cs.DS cs.DC

    Network-Oblivious Algorithms

    Authors: Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, Michele Scquizzato, Francesco Silvestri

    Abstract: A framework is proposed for the design and analysis of \emph{network-oblivious algorithms}, namely, algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the p…

    Submitted 12 April, 2014; originally announced April 2014.

    Comments: 34 pages

  17. Space-Efficient Parallel Algorithms for Combinatorial Search Problems

    Authors: Andrea Pietracaprina, Geppino Pucci, Francesco Silvestri, Fabio Vandin

    Abstract: We present space-efficient parallel strategies for two fundamental combinatorial search problems, namely, backtrack search and branch-and-bound, both involving the visit of an $n$-node tree of height $h$ under the assumption that a node can be accessed only through its father or its children. For both problems we propose efficient algorithms that run on a $p$-processor distributed-memory machine.…

    Submitted 26 March, 2014; v1 submitted 11 June, 2013; originally announced June 2013.

    Comments: Extended version of the paper in the Proc. of 38th International Symposium on Mathematical Foundations of Computer Science (MFCS)

    ACM Class: F.2.2

  18. Space-Round Tradeoffs for MapReduce Computations

    Authors: Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal

    Abstract: This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. Indeed, the model diverges from a traditional processor-centric view by featuring parameters which embody only global and…

    Submitted 9 November, 2011; originally announced November 2011.

    Journal ref: Final version in Proc. of the 26th ACM international conference on Supercomputing, pages 235-244, 2012

  19. arXiv:1101.4609  [pdf, ps, other]

    cs.DM cs.DS

    Tight Bounds on Information Dissemination in Sparse Mobile Networks

    Authors: Alberto Pettarin, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: Motivated by the growing interest in mobile systems, we study the dynamics of information dissemination between agents moving independently on a plane. Formally, we consider $k$ mobile agents performing independent random walks on an $n$-node grid. At time $0$, each agent is located at a random node of the grid and one agent has a rumor. The spread of the rumor is governed by a dynamic communicati…

    Submitted 1 February, 2011; v1 submitted 24 January, 2011; originally announced January 2011.

    Comments: 19 pages; we rewrote Lemma 4, fixing a claim which was not fully justified in the first version of the draft

  20. arXiv:1007.1604  [pdf, other]

    cs.DM cs.DS

    Infectious Random Walks

    Authors: Alberto Pettarin, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: We study the dynamics of information (or virus) dissemination by $m$ mobile agents performing independent random walks on an $n$-node grid. We formulate our results in terms of two scenarios: broadcasting and gossiping. In the broadcasting scenario, the mobile agents are initially placed uniformly at random among the grid nodes. At time 0, one agent is informed of a rumor and starts a random walk.…

    Submitted 25 January, 2011; v1 submitted 9 July, 2010; originally announced July 2010.

    Comments: 21 pages, 3 figures --- The results presented in this paper have been extended in: Pettarin et al., Tight Bounds on Information Dissemination in Sparse Mobile Networks, http://arxiv.longhoe.net/abs/1101.4609

  21. arXiv:1002.1104  [pdf, ps, other]

    cs.DB cs.DS

    An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

    Authors: Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin

    Abstract: As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support thres…

    Submitted 4 February, 2010; originally announced February 2010.

    Comments: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures

    ACM Class: H.2.8

  22. arXiv:1002.0874  [pdf, ps, other]

    cs.DS

    MADMX: A Novel Strategy for Maximal Dense Motif Extraction

    Authors: Roberto Grossi, Andrea Pietracaprina, Nadia Pisanti, Geppino Pucci, Eli Upfal, Fabio Vandin

    Abstract: We develop, analyze and experiment with a new tool, called MADMX, which extracts frequent motifs, possibly including don't care characters, from biological sequences. We introduce density, a simple and flexible measure for bounding the number of don't cares in a motif, defined as the ratio of solid (i.e., different from don't care) characters to the total length of the motif. By extracting only…

    Submitted 3 February, 2010; originally announced February 2010.

    Comments: A preliminary version of this work was presented in WABI 2009. 10 pages, 0 figures