-
GATE: How to Keep Out Intrusive Neighbors
Authors:
Nimrah Mustafa,
Rebekka Burkholz
Abstract:
Graph Attention Networks (GATs) are designed to provide flexible neighborhood aggregation that assigns weights to neighbors according to their importance. In practice, however, GATs are often unable to switch off task-irrelevant neighborhood aggregation, as we show experimentally and analytically. To address this challenge, we propose GATE, a GAT extension that holds three major advantages: i) It alleviates over-smoothing by addressing its root cause of unnecessary neighborhood aggregation. ii) Similarly to perceptrons, it benefits from higher depth as it can still utilize additional layers for (non-)linear feature transformations in case of (nearly) switched-off neighborhood aggregation. iii) By down-weighting connections to unrelated neighbors, it often outperforms GATs on real-world heterophilic datasets. To further validate our claims, we construct a synthetic test bed to analyze a model's ability to utilize the appropriate amount of neighborhood aggregation, which could be of independent interest.
Submitted 1 June, 2024;
originally announced June 2024.
-
Are GATs Out of Balance?
Authors:
Nimrah Mustafa,
Aleksandar Bojchevski,
Rebekka Burkholz
Abstract:
While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study undertakes the Graph Attention Network (GAT), a popular GNN architecture in which a node's neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a high portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a considerable speedup in training and convergence time in comparison to the standard initialization. Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
Submitted 25 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Deep Maxout Network-based Feature Fusion and Political Tangent Search Optimizer enabled Transfer Learning for Thalassemia Detection
Authors:
Hemn Barzan Abdalla,
Awder Ahmed,
Guoquan Li,
Nasser Mustafa,
Abdur Rashid Sangi
Abstract:
Thalassemia is a heritable blood disorder which is the outcome of a genetic defect causing lack of production of hemoglobin polypeptide chains. However, there is less understanding of the precise frequency as well as sharing in these areas. Knowing about the frequency of thalassemia occurrence and dependable mutations is thus a significant step in preventing, controlling, and treatment planning. Here, Political Tangent Search Optimizer based Transfer Learning (PTSO_TL) is introduced for thalassemia detection. Initially, input data obtained from a particular dataset is normalized in the data normalization stage. Quantile normalization is utilized in the data normalization stage, and the data are then passed to the feature fusion phase, in which Weighted Euclidean Distance with Deep Maxout Network (DMN) is utilized. Thereafter, data augmentation is performed using the oversampling method to increase data dimensionality. Lastly, thalassemia detection is carried out by TL, wherein a convolutional neural network (CNN) is utilized with hyperparameters from a trained model such as Xception. TL is tuned by PTSO, and the training algorithm PTSO is presented by merging of Political Optimizer (PO) and Tangent Search Algorithm (TSA). Furthermore, PTSO_TL obtained maximal precision, recall, and f-measure values of about 94.3%, 96.1%, and 95.2%, respectively.
Submitted 28 June, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical
Authors:
Mónika Csikós,
Nabil H. Mustafa
Abstract:
We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $χ:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{χ(S)}|$, where $χ(S)=\sum\limits_{x \in S}χ(x)$.
We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $π^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size.
Our method uses primal-dual reweighing with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s.
The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.
Submitted 2 September, 2022;
originally announced September 2022.
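The discrepancy definition in the abstract above can be made concrete with a small brute-force sketch. It is only an illustration of the quantity $\max_{S \in \mathcal S}|χ(S)|$, not the paper's randomized algorithm; the function name `discrepancy` and the toy set system are assumptions chosen for this example.

```python
from typing import Dict, Hashable, Iterable, Set


def discrepancy(sets: Iterable[Set[Hashable]], coloring: Dict[Hashable, int]) -> int:
    """Return max over S of |chi(S)|, where chi(S) is the sum of chi(x) for x in S.

    `coloring` maps every element to -1 or +1 (hypothetical helper, for illustration).
    """
    return max((abs(sum(coloring[x] for x in s)) for s in sets), default=0)


# Toy set system on X = {1, 2, 3, 4}: three "intervals" plus the whole ground set.
X = [1, 2, 3, 4]
S = [{1, 2}, {2, 3}, {3, 4}, {1, 2, 3, 4}]

alternating = {1: 1, 2: -1, 3: 1, 4: -1}   # balanced on every set in S
constant = {x: 1 for x in X}               # every element colored +1

print(discrepancy(S, alternating))  # 0
print(discrepancy(S, constant))     # 4
```

The alternating coloring balances every set, while the constant coloring is as unbalanced as possible on the full ground set.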
-
Persistent Memory Objects: Fast and Easy Crash Consistency for Persistent Memory
Authors:
Derrick Greenspan,
Naveed Ul Mustafa,
Zoran Kolega,
Mark Heinrich,
Yan Solihin
Abstract:
DIMM-compatible persistent memory unites memory and storage. Prior works utilize persistent memory either by combining the filesystem with direct access on memory mapped files or by managing it as a collection of objects while abolishing the POSIX abstraction. In contrast, we propose retaining the POSIX abstraction and extending it to provide support for persistent memory, using Persistent Memory Objects (PMOs). In this work, we design and implement PMOs, a crash-consistent abstraction for managing persistent memory. We introduce psync, a single system call, that a programmer can use to specify crash consistency points in their code, without needing to orchestrate durability explicitly. When rendering data crash consistent, our design incurs an overhead of $\approx 25\%$ and $\approx 21\%$ for parallel workloads and FileBench, respectively, compared to a system without crash consistency. Compared to NOVA-Fortis, our design provides a speedup of $\approx 1.67\times$ and $\approx 3\times$ for the two sets of benchmarks, respectively.
Submitted 7 April, 2022;
originally announced April 2022.
-
Social Groups Based Content Caching in Wireless Networks
Authors:
Nimrah Mustafa,
Imdadullah Khan,
Muhammad Asad Khan,
Zartash Afzal Uzmi
Abstract:
The unprecedented growth of wireless mobile traffic, mainly due to multimedia traffic over online social platforms, has strained the resources in the mobile backhaul network. A promising approach to reduce the backhaul load is to proactively cache content at the network edge, taking into account the overlaid social network. Known caching schemes require complete knowledge of the social graph and mainly focus on one-to-one interactions, forgoing the prevalent mode of content sharing among circles of 'friends'. We propose Bingo, a proactive content caching scheme that leverages the presence of interest groups in online social networks. The mobile network operator (MNO) can choose to incrementally deploy Bingo at select network nodes (base stations, packet core, data center) based on user profiles and revenue numbers. We approximate the group memberships of users using the available user-content request logs without any prior knowledge of the overlaid social graph. Bingo can cater to the evolving nature of online social groups and file popularity distribution for making caching decisions. We use synthetically generated group structures and simulate user requests at the base station for empirical evaluation against traditional and recent caching schemes. Bingo achieves up to 30%-34% gain over the best baseline.
Submitted 7 October, 2021;
originally announced October 2021.
-
Optimal Approximations Made Easy
Authors:
Mónika Csikós,
Nabil H. Mustafa
Abstract:
The fundamental result of Li, Long, and Srinivasan on approximations of set systems has become a key tool across several communities such as learning theory, algorithms, computational geometry, combinatorics and data analysis.
The goal of this paper is to give a modular, self-contained, intuitive proof of this result for finite set systems. The only ingredient we assume is the standard Chernoff's concentration bound. This makes the proof accessible to a wider audience, readers not familiar with techniques from statistical learning theory, and makes it possible to be covered in a single self-contained lecture in a geometry, algorithms or combinatorics course.
Submitted 1 September, 2022; v1 submitted 20 August, 2020;
originally announced August 2020.
-
Optimal Bounds on the VC-dimension
Authors:
Monika Csikos,
Andrey Kupavskii,
Nabil H. Mustafa
Abstract:
The VC-dimension of a set system is a way to capture its complexity and has been a key parameter studied extensively in machine learning and geometry communities. In this paper, we resolve two longstanding open problems on bounding the VC-dimension of two fundamental set systems: $k$-fold unions/intersections of half-spaces, and the simplices set system. Among other implications, it settles an open question in machine learning that was first studied in the 1989 foundational paper of Blumer, Ehrenfeucht, Haussler and Warmuth as well as by Eisenstat and Angluin and Johnson.
Submitted 20 July, 2018;
originally announced July 2018.
-
Theorems of Carathéodory, Helly, and Tverberg without dimension
Authors:
Karim Adiprasito,
Imre Bárány,
Nabil H. Mustafa,
Tamás Terpai
Abstract:
We prove a no-dimensional version of Carathéodory's theorem: given an $n$-element set $P\subset \Re^d$, a point $a \in \conv P$, and an integer $r\le d$, $r \le n$, there is a subset $Q\subset P$ of $r$ elements such that the distance between $a$ and $\conv Q$ is less than $\diam P/\sqrt {2r}$. A general no-dimension Helly type result is also proved with colourful and fractional consequences. Similar versions of Tverberg's theorem and some of their extensions are also established.
Submitted 28 August, 2019; v1 submitted 22 June, 2018;
originally announced June 2018.
-
Tverberg theorems over discrete sets of points
Authors:
Jesús A. De Loera,
Thomas A. Hogan,
Frédéric Meunier,
Nabil Mustafa
Abstract:
This paper discusses Tverberg-type theorems with coordinate constraints (i.e., versions of these theorems where all points lie within a subset $S \subset \mathbb{R}^d$ and the intersection of convex hulls is required to have a non-empty intersection with $S$). We determine the $m$-Tverberg number, when $m \geq 3$, of any discrete subset $S$ of $\mathbb{R}^2$ (a generalization of an unpublished result of J.-P. Doignon). We also present improvements on the upper bounds for the Tverberg numbers of $\mathbb{Z}^3$ and $\mathbb{Z}^j \times \mathbb{R}^k$ and an integer version of the well-known positive-fraction selection lemma of J. Pach.
Submitted 29 January, 2019; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Search Based Code Generation for Machine Learning Programs
Authors:
Muhammad Zubair Malik,
Muhammad Nawaz,
Nimrah Mustafa,
Junaid Haroon Siddiqui
Abstract:
Machine Learning (ML) has revamped every domain of life as it provides powerful tools to build complex systems that learn and improve from experience and data. Our key insight is that to solve a machine learning problem, data scientists do not invent a new algorithm each time, but evaluate a range of existing models with different configurations and select the best one. This task is laborious, error-prone, and drains a large chunk of project budget and time. In this paper we present a novel framework inspired by programming by Sketching and Partial Evaluation to minimize human intervention in developing ML solutions. We templatize machine learning algorithms to expose configuration choices as holes to be searched. We share code and computation between different algorithms, and only partially evaluate configuration space of algorithms based on information gained from initial algorithm evaluations. We also employ hierarchical and heuristic based pruning to reduce the search space. Our initial findings indicate that our approach can generate highly accurate ML models. Interviews with data scientists show that they feel our framework can eliminate sources of common errors and significantly reduce development time.
Submitted 6 February, 2018; v1 submitted 29 January, 2018;
originally announced January 2018.
-
Bounding the size of an almost-equidistant set in Euclidean space
Authors:
Andrey Kupavskii,
Nabil H. Mustafa,
Konrad J. Swanepoel
Abstract:
A set of points in d-dimensional Euclidean space is almost equidistant if among any three points of the set, some two are at distance 1. We show that an almost-equidistant set in $\mathbb{R}^d$ has cardinality $O(d^{4/3})$.
Submitted 4 August, 2017;
originally announced August 2017.
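The definition in the abstract above (among any three points, some two are at distance 1) can be checked directly on small point sets. The sketch below is a brute-force test over all triples, written only to illustrate the definition; the function name `is_almost_equidistant` and the floating-point tolerance are assumptions, and nothing here reflects the proof technique of the paper.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def is_almost_equidistant(points: Sequence[Point], tol: float = 1e-9) -> bool:
    """Check that every triple of points contains a pair at distance 1 (up to tol)."""
    def at_unit_distance(p: Point, q: Point) -> bool:
        return math.isclose(math.dist(p, q), 1.0, abs_tol=tol)

    return all(
        any(at_unit_distance(p, q) for p, q in combinations(triple, 2))
        for triple in combinations(points, 3)
    )


h = math.sqrt(3) / 2
# Two unit equilateral triangles glued along an edge: the only non-unit
# distance is between (0, 0) and (1.5, h).
rhombus = [(0.0, 0.0), (1.0, 0.0), (0.5, h), (1.5, h)]
# Three collinear points at mutual distances 2, 2, and 4: no unit pair at all.
spread = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]

print(is_almost_equidistant(rhombus))  # True
print(is_almost_equidistant(spread))   # False
```

The check runs in cubic time in the number of points, which is fine for experimenting with small examples in the plane or in low dimensions.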
-
The discrete yet ubiquitous theorems of Carathéodory, Helly, Sperner, Tucker, and Tverberg
Authors:
Jesus A. De Loera,
Xavier Goaoc,
Frédéric Meunier,
Nabil Mustafa
Abstract:
We discuss five discrete results: the lemmas of Sperner and Tucker from combinatorial topology and the theorems of Carathéodory, Helly, and Tverberg from combinatorial geometry. We explore their connections and emphasize their broad impact in application areas such as game theory, graph theory, mathematical optimization, computational geometry, etc.
Submitted 8 October, 2018; v1 submitted 16 June, 2017;
originally announced June 2017.
-
Epsilon-approximations and epsilon-nets
Authors:
Nabil H. Mustafa,
Kasturi R. Varadarajan
Abstract:
The use of random samples to approximate properties of geometric configurations has been an influential idea for both combinatorial and algorithmic purposes. This chapter considers two related notions---$ε$-approximations and $ε$-nets---that capture the most important quantitative properties that one would expect from a random sample with respect to an underlying geometric configuration.
Submitted 8 August, 2017; v1 submitted 13 February, 2017;
originally announced February 2017.
-
A Note on the Size-Sensitive Packing Lemma
Authors:
Nabil H. Mustafa
Abstract:
We show that the size-sensitive packing lemma follows from a simple modification of the standard proof, due to Haussler and simplified by Chazelle, of the packing lemma.
Submitted 15 September, 2015; v1 submitted 14 September, 2015;
originally announced September 2015.
-
Tighter Estimates for epsilon-nets for Disks
Authors:
Norbert Bus,
Shashwat Garg,
Nabil H. Mustafa,
Saurabh Ray
Abstract:
The geometric hitting set problem is one of the basic geometric combinatorial optimization problems: given a set $P$ of points, and a set $\mathcal{D}$ of geometric objects in the plane, the goal is to compute a small-sized subset of $P$ that hits all objects in $\mathcal{D}$. In 1994, Brönnimann and Goodrich made an important connection of this problem to the size of fundamental combinatorial structures called $ε$-nets, showing that small-sized $ε$-nets imply approximation algorithms with correspondingly small approximation ratios. Very recently, Agarwal and Pan showed that their scheme can be implemented in near-linear time for disks in the plane. Altogether this gives $O(1)$-factor approximation algorithms in $\tilde{O}(n)$ time for hitting sets for disks in the plane.
This constant factor depends on the sizes of $ε$-nets for disks; unfortunately, the current state-of-the-art bounds are large -- at least $24/ε$ and most likely larger than $40/ε$. Thus the approximation factor of the Agarwal and Pan algorithm ends up being more than $40$. The best lower-bound is $2/ε$, which follows from the Pach-Woeginger construction for halfspaces in two dimensions. Thus there is a large gap between the best-known upper and lower bounds. Besides being of independent interest, finding precise bounds is important since this immediately implies an improved linear-time algorithm for the hitting-set problem.
The main goal of this paper is to improve the upper-bound to $13.4/ε$ for disks in the plane. The proof is constructive, giving a simple algorithm that uses only Delaunay triangulations. We have implemented the algorithm, which is available as a public open-source module. Experimental results show that the sizes of $ε$-nets for a variety of data-sets are lower, around $9/ε$.
Submitted 13 January, 2015;
originally announced January 2015.
-
QPTAS for Geometric Set-Cover Problems via Optimal Separators
Authors:
Nabil H. Mustafa,
Rajiv Raman,
Saurabh Ray
Abstract:
Weighted geometric set-cover problems arise naturally in several geometric and non-geometric settings (e.g. the breakthrough of Bansal-Pruhs (FOCS 2010) reduces a wide class of machine scheduling problems to weighted geometric set-cover). More than two decades of research has succeeded in settling the $(1+ε)$-approximability status for most geometric set-cover problems, except for four basic scenarios which are still lacking. One is that of weighted disks in the plane for which, after a series of papers, Varadarajan (STOC 2010) presented a clever \emph{quasi-sampling} technique, which together with improvements by Chan et al. (SODA 2012), yielded an $O(1)$-approximation algorithm. Even for the unweighted case, a PTAS for a fundamental class of objects called pseudodisks (which includes disks, unit-height rectangles, translates of convex sets etc.) is currently unknown. Another fundamental case is weighted halfspaces in $\Re^3$, for which a PTAS is currently lacking. In this paper, we present a QPTAS for all of these remaining problems. Our results are based on the separator framework of Adamaszek-Wiese (FOCS 2013, SODA 2014), who recently obtained a QPTAS for weighted independent set of polygonal regions. This rules out the possibility that these problems are APX-hard, assuming $\textbf{NP} \nsubseteq \textbf{DTIME}(2^{polylog(n)})$. Together with the recent work of Chan-Grant (CGTA 2014), this settles the APX-hardness status for all natural geometric set-cover problems.
Submitted 5 April, 2014; v1 submitted 4 March, 2014;
originally announced March 2014.