-
Joint reconstruction-segmentation on graphs
Authors:
Jeremy Budd,
Yves van Gennip,
Jonas Latz,
Simone Parisotto,
Carola-Bibiane Schönlieb
Abstract:
Practical image segmentation tasks concern images which must be reconstructed from noisy, distorted, and/or incomplete observations. A recent approach for solving such tasks is to perform this reconstruction jointly with the segmentation, using each to guide the other. However, this work has so far employed relatively simple segmentation methods, such as the Chan-Vese algorithm. In this paper, we present a method for joint reconstruction-segmentation using graph-based segmentation methods, which have been seeing increasing recent interest. Complications arise due to the large size of the matrices involved, and we show how these complications can be managed. We then analyse the convergence properties of our scheme. Finally, we apply this scheme to distorted versions of "two cows" images familiar from previous graph-based segmentation literature, first to a highly noised version and second to a blurred version, achieving highly accurate segmentations in both cases. We compare these results to those obtained by sequential reconstruction-segmentation approaches, finding that our method competes with, or even outperforms, those approaches in terms of reconstruction and segmentation accuracy.
Submitted 20 January, 2023; v1 submitted 11 August, 2022;
originally announced August 2022.
-
An MBO scheme for clustering and semi-supervised clustering of signed networks
Authors:
Mihai Cucuringu,
Andrea Pizzoferrato,
Yves van Gennip
Abstract:
We introduce a principled method for the signed clustering problem, where the goal is to partition a graph whose edge weights take both positive and negative values, such that edges within the same cluster are mostly positive, while edges spanning across clusters are mostly negative. Our method relies on a graph-based diffuse interface model formulation utilizing the Ginzburg-Landau functional, based on an adaptation of the classic numerical Merriman-Bence-Osher (MBO) scheme for minimizing such graph-based functionals. The proposed objective function aims to minimize the total weight of inter-cluster positively-weighted edges, while maximizing the total weight of the inter-cluster negatively-weighted edges. Our method scales to large sparse networks, and can be easily adjusted to incorporate labelled data information, as is often the case in the context of semi-supervised learning. We tested our method on a number of both synthetic stochastic block models and real-world data sets (including financial correlation matrices), and obtained promising results that compare favourably against a number of state-of-the-art approaches from the recent literature.
Submitted 9 October, 2019; v1 submitted 10 January, 2019;
originally announced January 2019.
-
Unsupervised record matching with noisy and incomplete data
Authors:
Yves van Gennip,
Blake Hunter,
Anna Ma,
Daniel Moyer,
Ryan de Vera,
Andrea L. Bertozzi
Abstract:
We consider the problem of duplicate detection in noisy and incomplete data: given a large data set in which each record has multiple entries (attributes), detect which distinct records refer to the same real world entity. This task is complicated by noise (such as misspellings) and missing data, which can lead to records being different, despite referring to the same entity. Our method consists of three main steps: creating a similarity score between records, grouping records together into "unique entities", and refining the groups. We compare various methods for creating similarity scores between noisy records, considering different combinations of string matching, term frequency-inverse document frequency methods, and n-gram techniques. In particular, we introduce a vectorized soft term frequency-inverse document frequency method, with an optional refinement step. We also discuss two methods to deal with missing data in computing similarity scores.
We test our method on the Los Angeles Police Department Field Interview Card data set, the Cora Citation Matching data set, and two sets of restaurant review data. The results show that the methods that use words as the basic units are preferable to those that use 3-grams. Moreover, in some (but certainly not all) parameter ranges soft term frequency-inverse document frequency methods can outperform the standard term frequency-inverse document frequency method. The results also confirm that our method for automatically determining the number of groups works well in many cases and allows for accurate results in the absence of a priori knowledge of the number of unique entities in the data set.
Submitted 30 April, 2018; v1 submitted 10 April, 2017;
originally announced April 2017.
-
Graph clustering, variational image segmentation methods and Hough transform scale detection for object measurement in images
Authors:
Luca Calatroni,
Yves van Gennip,
Carola-Bibiane Schönlieb,
Hannah Rowland,
Arjuna Flenner
Abstract:
We consider the problem of scale detection in images where a region of interest is present together with a measurement tool (e.g. a ruler). For the segmentation part, we focus on the graph-based method by Flenner and Bertozzi, which reinterprets classical continuous Ginzburg-Landau minimisation models in a totally discrete framework. To overcome the numerical difficulties due to the large size of the images considered, we use matrix completion and splitting techniques. The scale on the measurement tool is detected via a Hough-transform-based algorithm. The method is then applied to some measurement tasks arising in real-world applications such as zoology, medicine and archaeology.
Submitted 23 September, 2016; v1 submitted 27 February, 2016;
originally announced February 2016.
-
A Regularization Approach to Blind Deblurring and Denoising of QR Barcodes
Authors:
Yves van Gennip,
Prashant Athavale,
Jérôme Gilles,
Rustum Choksi
Abstract:
QR bar codes are prototypical images for which part of the image is a priori known (required patterns). Open source bar code readers, such as ZBar, are readily available. We exploit both these facts to provide and assess purely regularization-based methods for blind deblurring of QR bar codes in the presence of noise.
Submitted 24 March, 2017; v1 submitted 23 October, 2014;
originally announced October 2014.
-
Multislice Modularity Optimization in Community Detection and Image Segmentation
Authors:
Huiyi Hu,
Yves van Gennip,
Blake Hunter,
Mason A. Porter,
Andrea L. Bertozzi
Abstract:
Because networks can be used to represent many complex systems, they have attracted considerable attention in physics, computer science, sociology, and many other disciplines. One of the most important areas of network science is the algorithmic detection of cohesive groups (i.e., "communities") of nodes. In this paper, we algorithmically detect communities in social networks and image data by optimizing multislice modularity. A key advantage of modularity optimization is that it does not require prior knowledge of the number or sizes of communities, and it is capable of finding network partitions that are composed of communities of different sizes. By optimizing multislice modularity and subsequently calculating diagnostics on the resulting network partitions, it is thereby possible to obtain information about network structure across multiple system scales. We illustrate this method on data from both social networks and images, and we find that optimization of multislice modularity performs well on these two tasks without the need for extensive problem-specific adaptation. However, improving the computational speed of this method remains a challenging open problem.
Submitted 30 November, 2012;
originally announced November 2012.
-
Geosocial Graph-Based Community Detection
Authors:
Yves van Gennip,
Huiyi Hu,
Blake Hunter,
Mason A. Porter
Abstract:
We apply spectral clustering and multislice modularity optimization to a Los Angeles Police Department field interview card data set. To detect communities (i.e., cohesive groups of vertices), we use both geographic and social information about stops involving street gang members in the LAPD district of Hollenbeck. We then compare the algorithmically detected communities with known gang identifications and argue that discrepancies are due to sparsity of social connections in the data as well as complex underlying sociological factors that blur distinctions between communities.
Submitted 25 November, 2012;
originally announced November 2012.
-
Community detection using spectral clustering on sparse geosocial data
Authors:
Yves van Gennip,
Blake Hunter,
Raymond Ahn,
Peter Elliott,
Kyle Luh,
Megan Halvorson,
Shannon Reid,
Matt Valasik,
James Wo,
George E. Tita,
Andrea L. Bertozzi,
P. Jeffrey Brantingham
Abstract:
In this article we identify social communities among gang members in the Hollenbeck policing district in Los Angeles, based on sparse observations of a combination of social interactions and geographic locations of the individuals. This information, coming from LAPD Field Interview cards, is used to construct a similarity graph for the individuals. We use spectral clustering to identify clusters in the graph, corresponding to communities in Hollenbeck, and compare these with the LAPD's knowledge of the individuals' gang membership. We discuss different ways of encoding the geosocial information using a graph structure and the influence on the resulting clusterings. Finally we analyze the robustness of this technique with respect to noisy and incomplete data, thereby providing suggestions about the relative importance of quantity versus quality of collected data.
Submitted 8 November, 2012; v1 submitted 21 June, 2012;
originally announced June 2012.