-
Empirical Challenge for NC Theory
Authors:
Ananth Hari,
Uzi Vishkin
Abstract:
Horn-satisfiability, or Horn-SAT, is the problem of deciding whether a satisfying assignment exists for a Horn formula, a conjunction of clauses each with at most one positive literal (also known as Horn clauses). It is a well-known P-complete problem, which implies that unless P = NC, it is hard to parallelize. In this paper, we empirically show that, under a known simple random model for generating the Horn formula, the fraction of hard-to-parallelize instances (closer to the worst-case behavior) is infinitesimally small. We show that the depth of a parallel algorithm for Horn-SAT is polylogarithmic on average, for almost all instances, while keeping the work linear. This challenges theoreticians and programmers to look beyond worst-case analysis and come up with practical algorithms coupled with corresponding performance guarantees.
Submitted 25 May, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
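For reference, the standard sequential way to decide Horn-SAT in linear time is forward chaining (unit propagation) from the facts. The sketch below implements that textbook method. It is not the authors' parallel algorithm, and the (body, head) clause representation is our own assumption. A variable can be derived only after all its premises, so long implication chains force many sequential rounds. That is one intuition for why worst-case instances resist parallelization.

```python
from collections import deque

def horn_sat(num_vars, clauses):
    """Decide satisfiability of a Horn formula by forward chaining.

    Each clause is a pair (body, head): `body` lists the variables that
    appear negated, `head` is the single positive variable or None.
    Returns the minimal model (set of true variables) or None if unsatisfiable.
    """
    # remaining[i]: body variables of clause i not yet known to be true
    remaining = [len(set(body)) for body, _ in clauses]
    # watchers[v]: indices of clauses whose body mentions v
    watchers = [[] for _ in range(num_vars)]
    for i, (body, _) in enumerate(clauses):
        for v in set(body):
            watchers[v].append(i)

    true_vars = set()
    queue = deque()

    def fire(i):
        """Body of clause i is satisfied: force its head, or report a conflict."""
        head = clauses[i][1]
        if head is None:
            return False  # purely negative clause with every premise true
        if head not in true_vars:
            true_vars.add(head)
            queue.append(head)
        return True

    # Clauses with an empty body are facts (or the empty clause).
    for i, count in enumerate(remaining):
        if count == 0 and not fire(i):
            return None
    # Each variable enters the queue at most once, so total work is linear.
    while queue:
        v = queue.popleft()
        for i in watchers[v]:
            remaining[i] -= 1
            if remaining[i] == 0 and not fire(i):
                return None
    return true_vars
```

For example, the clauses x0, x0 → x1 and x0 ∧ x1 → x2 are encoded as `[([], 0), ([0], 1), ([0, 1], 2)]` and yield the model `{0, 1, 2}`. Adding the clause ¬x2, encoded as `([2], None)`, makes the formula unsatisfiable.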
-
Subspace Graph Physics: Real-Time Rigid Body-Driven Granular Flow Simulation
Authors:
Amin Haeri,
Krzysztof Skonieczny
Abstract:
An important challenge in robotics is understanding the interactions between robots and deformable terrains that consist of granular material. Granular flows and their interactions with rigid bodies still pose several open questions. A promising direction for accurate, yet efficient, modeling is using continuum methods. Also, a new direction for real-time physics modeling is the use of deep learning. This research advances machine learning methods for modeling rigid body-driven granular flows, for application to terrestrial industrial machines as well as space robotics (where the effect of gravity is an important factor). In particular, this research considers the development of a subspace machine learning simulation approach. To generate training datasets, we utilize our high-fidelity continuum method, material point method (MPM). Principal component analysis (PCA) is used to reduce the dimensionality of data. We show that the first few principal components of our high-dimensional data keep almost the entire variance in data. A graph network simulator (GNS) is trained to learn the underlying subspace dynamics. The learned GNS is then able to predict particle positions and interaction forces with good accuracy. More importantly, PCA significantly enhances the time and memory efficiency of GNS in both training and rollout. This enables GNS to be trained using a single desktop GPU with moderate VRAM. This also makes the GNS real-time on large-scale 3D physics configurations (700x faster than our continuum method).
Submitted 18 November, 2021;
originally announced November 2021.
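The claim that a few principal components keep almost all of the variance is measured by the explained-variance ratio. The toy sketch below uses 2-D data, where the covariance eigenvalues have a closed form, to show how that ratio is computed. The function name is hypothetical. For high-dimensional simulation data such as the paper's, one would use a library SVD or eigen-solver instead.

```python
import math

def pca_2d(points):
    """Return (eigenvalues, explained_variance_ratios) for 2-D data, using the
    closed-form eigen-decomposition of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (divide by n - 1)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]], largest first
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    total = lam1 + lam2
    return (lam1, lam2), (lam1 / total, lam2 / total)
```

For points lying close to a line, such as `[(0, 0.1), (1, 1.9), (2, 4.2), (3, 5.8)]`, the first component explains more than 99% of the variance. Dropping the second component would therefore lose very little information.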
-
Legal perspective on possible fairness measures - A legal discussion using the example of hiring decisions (preprint)
Authors:
Marc P Hauer,
Johannes Kevekordes,
Maryam Amir Haeri
Abstract:
With the increasing use of AI in algorithmic decision making (e.g. based on neural networks), the question arises of how bias can be excluded or mitigated. There are some promising approaches, but many of them are based on a "fair" ground truth, while others are based on a subjective goal to be reached, which leads to the usual problem of how to define and compute "fairness". Because algorithmic decision making functions differently from human decision making, discrimination assessment shifts from a process-oriented to a result-oriented approach. We argue that with such a shift, society needs to determine which kind of fairness is the right one to choose for which scenario. To understand the implications of such a determination, we explain the different kinds of fairness concepts that might be applicable to the specific application of hiring decisions, analyze their pros and cons with regard to the respective fairness interpretation, and evaluate them from a legal perspective (based on EU law).
Submitted 16 August, 2021;
originally announced August 2021.
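This listing does not say which fairness concepts the paper analyzes. To make "result-oriented" assessment concrete, the sketch below computes two widely discussed group-fairness measures for hiring: demographic parity (equal hiring rates) and equal opportunity (equal hiring rates among qualified candidates). The function names and data are hypothetical. The second measure needs a ground-truth "qualified" label, which is exactly the "fair ground truth" dependency the abstract mentions.

```python
def selection_rate(decisions, groups, g):
    """Fraction of candidates in group g who received a positive decision."""
    picked = [d for d, grp in zip(decisions, groups) if grp == g]
    return sum(picked) / len(picked)

def demographic_parity_gap(decisions, groups, a, b):
    """Difference in hiring rates between groups a and b (0 means parity)."""
    return selection_rate(decisions, groups, a) - selection_rate(decisions, groups, b)

def equal_opportunity_gap(decisions, qualified, groups, a, b):
    """Difference in hiring rates among qualified candidates only (the
    true-positive-rate gap); relies on a ground-truth 'qualified' label."""
    def tpr(g):
        hits = [d for d, q, grp in zip(decisions, qualified, groups) if grp == g and q]
        return sum(hits) / len(hits)
    return tpr(a) - tpr(b)
```

In a toy case where each group has four candidates and the employer hires exactly the qualified ones (two in group A, one in group B), the equal-opportunity gap is 0. The demographic-parity gap, however, is 0.25. The two notions can conflict, so choosing between them is a normative decision.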
-
Adaptive Explicit Kernel Minkowski Weighted K-means
Authors:
Amir Aradnia,
Maryam Amir Haeri,
Mohammad Mehdi Ebadzadeh
Abstract:
The K-means algorithm is among the most commonly used data clustering methods. However, regular K-means can only be applied in the input space, and it is applicable only when clusters are linearly separable. Kernel K-means, which extends K-means into the kernel space, is able to capture nonlinear structures and identify arbitrarily shaped clusters. However, kernel methods often operate on the kernel matrix of the data, which scales poorly with the size of the matrix, or suffer from high clustering cost due to repeated calculation of kernel values. Another issue is that these algorithms access the data only through evaluations of $K(x_i, x_j)$, which limits many processes that could otherwise be applied to the data during clustering. This paper proposes a method that combines the advantages of the linear and nonlinear approaches by using approximate finite-dimensional feature maps derived from spectral analysis. Approximate finite-dimensional feature maps had previously been discussed only for Support Vector Machine (SVM) problems. We suggest using this method in kernel K-means, as it avoids storing a huge kernel matrix in memory, computes cluster centers more efficiently, and accesses the data explicitly in the feature space. These explicit feature maps also let us take advantage of K-means extensions in that space. We demonstrate that our Explicit Kernel Minkowski Weighted K-means (Explicit KMWK-mean) method is more adaptable and can find best-fitting values in the new space by applying an additional Minkowski exponent and feature-weight parameters. Moreover, it can reduce the impact of distance concentration on nearest-neighbour search by investigating norms other than the Euclidean norm, including Minkowski norms and fractional norms (an extension of the Minkowski norms with p<1).
Submitted 4 December, 2020;
originally announced December 2020.
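The abstract combines two ideas: an explicit, approximate feature map for a kernel, and a feature-weighted Minkowski distance used in that feature space. The sketch below shows one well-known construction for the first idea, random Fourier features for the Gaussian (RBF) kernel. It also shows one assumed form of the weighted Minkowski distance. The abstract does not say which spectral construction or exact weighting the paper uses, and all names here are hypothetical.

```python
import math
import random

def weighted_minkowski(x, y, weights, p):
    """Feature-weighted Minkowski distance (sum_v (w_v * |x_v - y_v|)^p)^(1/p).

    For 0 < p < 1 this is a "fractional norm" distance, which is no longer a
    true metric (the triangle inequality can fail).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    total = sum((w * abs(a - b)) ** p for a, b, w in zip(x, y, weights))
    return total ** (1.0 / p)

def random_fourier_features(dim, num_features, sigma, seed=0):
    """Explicit map z(.) with z(x).z(y) ~= exp(-||x - y||^2 / (2 sigma^2)).

    Frequencies are drawn from the Gaussian kernel's spectral density,
    N(0, sigma^-2 I), and phases uniformly from [0, 2*pi].
    """
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(omega, x)) + b)
                for omega, b in zip(omegas, phases)]

    return feature_map
```

Once every point is mapped to z(x), K-means or a weighted Minkowski variant can run directly on the explicit vectors. No n-by-n kernel matrix needs to be stored.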
-
Multi-Label Classification Using Link Prediction
Authors:
Seyed Amin Fadaee,
Maryam Amir Haeri
Abstract:
Solving classification with graph methods has gained huge popularity in recent years. This is because data can be intuitively modeled with graphs, which makes high-level features available to aid in solving the classification problem. CULP, short for Classification Using Link Prediction, is a graph-based classifier. It uses a graph representation of the data and transforms the problem into link prediction, where we try to find the link between an unlabeled node and its proper class node. CULP has proved to be a highly accurate classifier, and it can predict labels in near-constant time. A variant of the classification problem is multi-label classification, which addresses data where an instance can have multiple labels associated with it. In this work, we extend the CULP algorithm to address this problem. Our proposed extensions carry the strengths of CULP and its intuitive representation of the data into the multi-label domain and, in comparison with some cutting-edge multi-label classifiers, yield competitive results.
Submitted 10 November, 2020;
originally announced November 2020.
-
PettingZoo: Gym for Multi-Agent Reinforcement Learning
Authors:
J. K. Terry,
Benjamin Black,
Nathaniel Grammel,
Mario Jayakumar,
Ananth Hari,
Ryan Sullivan,
Luis Santos,
Rodrigo Perez,
Caroline Horsch,
Clemens Dieffendahl,
Niall L. Williams,
Yashas Lokesh,
Praveen Ravi
Abstract:
This paper introduces the PettingZoo library and the accompanying Agent Environment Cycle ("AEC") games model. PettingZoo is a library of diverse sets of multi-agent environments with a universal, elegant Python API. PettingZoo was developed with the goal of accelerating research in Multi-Agent Reinforcement Learning ("MARL") by making work more interchangeable, accessible and reproducible, akin to what OpenAI's Gym library did for single-agent reinforcement learning. PettingZoo's API, while inheriting many features of Gym, is unique amongst MARL APIs in that it is based around the novel AEC games model. We argue, in part through case studies on major problems in popular MARL environments, that the popular game models are poor conceptual models of the games commonly used in MARL and accordingly can promote confusing bugs that are hard to detect, and that the AEC games model addresses these problems.
Submitted 26 October, 2021; v1 submitted 30 September, 2020;
originally announced September 2020.
-
Agent Environment Cycle Games
Authors:
J K Terry,
Nathaniel Grammel,
Benjamin Black,
Ananth Hari,
Caroline Horsch,
Luis Santos
Abstract:
Partially Observable Stochastic Games (POSGs) are the most general and common model of games used in Multi-Agent Reinforcement Learning (MARL). We argue that the POSG model is conceptually ill-suited to software MARL environments, and offer case studies from the literature where this mismatch has led to severely unexpected behavior. In response, we introduce the Agent Environment Cycle Games (AEC Games) model, which is more representative of software implementation. We then prove that it is equivalent to the POSG model. The AEC games model is also uniquely useful in that it can elegantly represent all forms of MARL environments, whereas POSGs, for example, cannot elegantly represent strictly turn-based games like chess.
Submitted 1 May, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
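The core of the AEC model is that agents act one at a time. The environment updates after every single action, and any agent's reward can change as a result. The minimal sketch below illustrates this with a strictly turn-based game, the case the abstract says POSGs represent awkwardly. The class, its methods and the `play` loop are hypothetical plain Python. They deliberately do not reproduce PettingZoo's actual API.

```python
class ToyAECNim:
    """Hypothetical AEC-style environment: two agents take turns removing
    1 or 2 stones, and whoever takes the last stone wins."""

    def __init__(self, stones=5):
        self.agents = ["player_0", "player_1"]
        self.stones = stones
        self.turn = 0
        self.rewards = {a: 0 for a in self.agents}
        self.done = False

    @property
    def agent_selection(self):
        """The single agent allowed to act at this point in the cycle."""
        return self.agents[self.turn % 2]

    def observe(self, agent):
        return self.stones  # fully observable toy game

    def step(self, action):
        if self.done:
            raise RuntimeError("game is over")
        if action not in (1, 2) or action > self.stones:
            raise ValueError("illegal action")
        actor = self.agent_selection
        self.stones -= action
        if self.stones == 0:
            # One agent's step can change the rewards of every agent.
            other = self.agents[(self.turn + 1) % 2]
            self.rewards[actor], self.rewards[other] = 1, -1
            self.done = True
        self.turn += 1


def play(env, policy):
    """Run the cycle: select agent, observe, act, environment steps."""
    history = []
    while not env.done:
        agent = env.agent_selection
        action = policy(env.observe(agent))
        history.append((agent, action))
        env.step(action)
    return history, env.rewards
```

With five stones and the policy `lambda s: s % 3 or 1`, player_0 takes 2, player_1 takes 1, and player_0 takes the last 2 to win. No joint action for both players is ever needed.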
-
SuperSuit: Simple Microwrappers for Reinforcement Learning Environments
Authors:
J. K. Terry,
Benjamin Black,
Ananth Hari
Abstract:
In reinforcement learning, wrappers are universally used to transform the information that passes between a model and an environment. Despite their ubiquity, no library exists with reasonable implementations of all popular preprocessing methods. This leads to unnecessary bugs, code inefficiencies, and wasted developer time. Accordingly, we introduce SuperSuit, a Python library that includes all popular wrappers, as well as wrappers that can easily apply lambda functions to the observations, actions, and rewards. It is compatible with the standard Gym environment specification, as well as the PettingZoo specification for multi-agent environments. The library is available at https://github.com/PettingZoo-Team/SuperSuit and can be installed via pip.
Submitted 16 August, 2020;
originally announced August 2020.
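A "microwrapper" that applies user-supplied functions to observations, actions or rewards can be sketched in a few lines. The sketch below is a hypothetical illustration of the idea, not SuperSuit's actual API. It assumes an environment with `reset()` and the older four-value Gym `step()` convention (observation, reward, done, info). `CounterEnv` is a made-up stand-in environment.

```python
def _identity(x):
    return x


class LambdaWrapper:
    """Hypothetical microwrapper: transforms the data flowing between an
    agent and a Gym-style environment with plain functions."""

    def __init__(self, env, obs_fn=None, action_fn=None, reward_fn=None):
        self.env = env
        self.obs_fn = obs_fn or _identity
        self.action_fn = action_fn or _identity
        self.reward_fn = reward_fn or _identity

    def reset(self):
        return self.obs_fn(self.env.reset())

    def step(self, action):
        # Transform the action on the way in, observation/reward on the way out.
        obs, reward, done, info = self.env.step(self.action_fn(action))
        return self.obs_fn(obs), self.reward_fn(reward), done, info


class CounterEnv:
    """Tiny stand-in environment whose observation is a running total."""

    def reset(self):
        self.total = 0
        return self.total

    def step(self, action):
        self.total += action
        return self.total, float(action), self.total >= 10, {}
```

For example, `LambdaWrapper(CounterEnv(), obs_fn=lambda o: o / 10, action_fn=lambda a: a * 2, reward_fn=lambda r: max(-1.0, min(1.0, r)))` scales observations, doubles actions and clips rewards. Because each concern is a separate small function, the wrappers compose without the environment knowing about them.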
-
SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets
Authors:
Sayyed Ahmad Naghavi Nozad,
Maryam Amir Haeri,
Gianluigi Folino
Abstract:
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike well-known traditional algorithms, which assume that all the data is memory-resident, our proposed method is scalable and processes the input data chunk by chunk within the confines of a limited memory buffer. A temporary clustering model is built in the first phase; then it is gradually updated by analyzing consecutive memory loads of points. At the end of scalable clustering, the approximate structure of the original clusters is obtained. Finally, through another scan of the entire dataset and using a suitable criterion, each object is assigned an outlying score called SDCOR (Scalable Density-based Clustering Outlierness Ratio). Evaluations on real-life and synthetic datasets demonstrate that the proposed method has low, linear time complexity and is more effective and efficient than the best-known conventional density-based methods, which need to load all data into memory, as well as some fast distance-based methods, which can operate on disk-resident data.
Submitted 5 July, 2021; v1 submitted 13 June, 2020;
originally announced June 2020.
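The abstract describes a two-scan pattern: build a model from consecutive memory-sized chunks, then rescan the data to score each point. The drastically simplified sketch below shows only that memory pattern. It uses a single 1-D "cluster" whose count, mean and sum of squared deviations are merged chunk by chunk with the standard parallel-variance update, followed by a standardized-distance score on the second pass. SDCOR itself maintains multiple density-based clusters and uses its own outlierness ratio, and all names here are hypothetical.

```python
import math

def chunked(stream, size):
    """Yield lists of at most `size` items so only one chunk is in memory."""
    chunk = []
    for item in stream:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def fit_streaming(stream, chunk_size):
    """Pass 1: merge per-chunk (count, mean, M2) summaries into one model."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunked(stream, chunk_size):
        cn = len(chunk)
        cmean = sum(chunk) / cn
        cm2 = sum((x - cmean) ** 2 for x in chunk)
        delta = cmean - mean
        total = n + cn
        mean += delta * cn / total
        m2 += cm2 + delta ** 2 * n * cn / total
        n = total
    return n, mean, math.sqrt(m2 / (n - 1))  # sample standard deviation

def score_streaming(stream, mean, std):
    """Pass 2: rescan the data and yield an outlier score for each point."""
    for x in stream:
        yield abs(x - mean) / std
```

Only one chunk plus a constant-size summary is held in memory during the first scan, and the second scan is a pure stream. This mirrors the limited-buffer constraint the abstract emphasizes.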
-
Hybrid Forest: A Concept Drift Aware Data Stream Mining Algorithm
Authors:
Radin Hamidi Rad,
Maryam Amir Haeri
Abstract:
Nowadays, with a growing number of online control systems in organizations and a high demand for monitoring and statistics facilities that use data streams to log and control their subsystems, data stream mining is becoming more and more vital. Hoeffding Trees (also called Very Fast Decision Trees, or VFDT), as a Big Data approach to dealing with data streams for classification and regression problems, have shown good performance in handling these challenges and make any-time prediction possible. Although these methods outperform other methods, e.g. Artificial Neural Networks (ANN) and Support Vector Regression (SVR), they suffer from high latency in adapting to new concepts when the statistical distribution of incoming data changes. In this article, we introduce a new algorithm that can detect and properly handle the concept drift phenomenon. This algorithm also benefits from a fast startup ability, which helps systems predict faster than other algorithms at the beginning of data stream arrival. We also show that our approach outperforms other competing approaches for classification and regression tasks.
Submitted 10 February, 2019;
originally announced February 2019.
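Hoeffding trees decide when they have seen enough stream examples to split a leaf using the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)). The sketch below shows that standard VFDT split test, including the usual tie-breaking threshold. It illustrates the base learner the abstract builds on, not Hybrid Forest's drift detection. The function names and the default threshold value of 0.05 are assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon such that, with probability 1 - delta, the true mean of a
    variable with range R lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """VFDT-style test: split once the observed gain difference exceeds
    epsilon, or when epsilon is so small that the candidates are tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return best_gain - second_gain > eps or eps < tie_threshold
```

With R = 1, δ = 10⁻⁷ and n = 200 examples, ε ≈ 0.20. A gain gap of 0.3 triggers a split, but a gap of 0.1 does not until more examples arrive. This waiting is also why a learned split can be slow to react when the data distribution drifts.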
-
A Fuzzy Community-Based Recommender System Using PageRank
Authors:
Maliheh Goliforoushani,
Radin Hamidi Rad,
Maryam Amir Haeri
Abstract:
Recommendation systems are widely used by various user service providers, especially those that interact with large communities of users. This paper introduces a recommender system based on community detection. The recommendation is provided using the local and global similarities between users. The local information is obtained from communities, and the global information is based on the ratings. Here, a new fuzzy community detection method using the personalized PageRank metaphor is introduced. The fuzzy membership values of the users to the communities are utilized to define a similarity measure. The method is evaluated on two well-known datasets: MovieLens and FilmTrust. The results show that our method outperforms recent recommender systems.
Submitted 18 December, 2018;
originally announced December 2018.
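Personalized PageRank ranks nodes by how often a random surfer visits them when it keeps teleporting back to a chosen seed set. The graded scores it produces are one plausible basis for fuzzy community membership. The abstract does not specify how the paper turns PageRank into memberships, so the sketch below shows only the standard power iteration. The function name and defaults are assumptions.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=100, tol=1e-12):
    """Power iteration for personalized PageRank.

    graph: dict node -> list of out-neighbours (every node must be a key).
    seeds: set of nodes the random surfer teleports back to (uniformly).
    Dangling nodes also send their mass back to the seed set.
    """
    nodes = list(graph)
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {v: (1 - alpha) * teleport[v] for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = alpha * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                for s in seeds:
                    new[s] += alpha * rank[v] / len(seeds)
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank
```

On a graph made of two triangles joined by a single edge, seeding at a node in one triangle concentrates most of the score in that triangle. Nodes of the other triangle receive small but nonzero scores, which is the kind of soft membership a fuzzy community method can exploit.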
-
Classification Using Link Prediction
Authors:
Seyed Amin Fadaee,
Maryam Amir Haeri
Abstract:
Link prediction in a graph is the problem of detecting the missing links that would be formed in the near future. Using a graph representation of the data, we can convert the problem of classification into the problem of link prediction, which aims at finding the missing links between the unlabeled data (unlabeled nodes) and their classes. To our knowledge, although numerous algorithms use the graph representation of the data for classification, none uses link prediction as the heart of its classification procedure. In this work, we propose a novel algorithm called CULP (Classification Using Link Prediction), which uses a new structure, the Label Embedded Graph (LEG), and a link predictor to find the class of the unlabeled data. Different link predictors, along with Compatibility Score (a new link predictor we propose that is designed specifically for our setting), have been used and show promising results for classifying different datasets. This paper further improves CULP by designing an extension called CULM, which uses a majority vote (hence the M in the acronym) with weights proportional to the predictions' confidences to combine the predictive power of multiple link predictors, and which also exploits the low-level features of the data. Extensive experimental evaluations show that both CULP and CULM are highly accurate and competitive with cutting-edge graph classifiers and general classifiers.
Submitted 1 October, 2018;
originally announced October 2018.
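The reduction of classification to link prediction can be sketched as follows. Add one node per class, link each labeled instance to its class node, and link instances to similar instances. An unlabeled node is then assigned the class node it is most likely to link to. The sketch below uses the standard Adamic-Adar predictor as a stand-in for the paper's Compatibility Score, on a hypothetical graph that is much simpler than the Label Embedded Graph. All function names are our own.

```python
import math

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def adamic_adar(adj, u, v):
    """Adamic-Adar link score: common neighbours weighted by 1/log(degree).
    A common neighbour touches both u and v, so its degree is at least 2."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

def classify_by_link_prediction(adj, node, class_nodes, score=adamic_adar):
    """Predict the class whose class node is most likely to link to `node`."""
    return max(class_nodes, key=lambda c: score(adj, node, c))
```

For example, suppose labeled instances x1 and x2 hang off class node "C:a", x3 hangs off "C:b", and the unlabeled node u is similar to all three. Then u is assigned "C:a", because it shares two well-connected neighbours with that class node. Swapping in a different `score` function changes the link predictor without touching the rest of the pipeline.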
-
Continuous occurrence theory
Authors:
Abdorrahman Haeri
Abstract:
Gradual and continuous changes in entities usually lead to the occurrence of events. However, it is usually assumed that an event occurs at once. In this research, an integrated framework called continuous occurrence theory (COT) is presented to investigate the path leading to the occurrence of events in the real world. For this purpose, fundamental concepts are first defined. Afterwards, appropriate tools such as occurrence variable computations, the occurrence dependency function, and the occurrence model are introduced and explained in a systematic manner. COT makes it possible to: (a) monitor the occurrence of events over time; (b) study the background of events; (c) recognize the issues relevant to each event; and (d) understand how these issues affect the event under consideration. The developed framework (COT) provides the necessary context to accurately analyze continual changes in the issues and the relevant events in various branches of science and business. Finally, typical applications of COT and an applied modeling example are explained, and a mathematical programming example is modeled in the occurrence-based environment.
Submitted 11 November, 2016; v1 submitted 6 August, 2016;
originally announced September 2016.
-
On the Problem of Optimal Path Encoding for Software-Defined Networks
Authors:
Adiseshu Hari,
Urs Niesen,
Gordon Wilfong
Abstract:
Packet networks need to maintain state in the form of forwarding tables at each switch. The cost of this state increases as networks support ever more sophisticated per-flow routing, traffic engineering, and service chaining. Per-flow or per-path state at the switches can be eliminated by encoding each packet's desired path in its header. A key component of such a method is an efficient encoding of paths through the network. We introduce a mathematical formulation of this optimal path-encoding problem. We prove that the problem is APX-hard, by showing that approximating it to within a factor less than 8/7 is NP-hard. Thus, at best we can hope for a constant-factor approximation algorithm. We then present such an algorithm, approximating the optimal path-encoding problem to within a factor of 2. Finally, we provide empirical results illustrating the effectiveness of the proposed algorithm.
Submitted 18 May, 2016; v1 submitted 26 July, 2015;
originally announced July 2015.
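To make "encoding each packet's path in its header" concrete, the sketch below shows a naive baseline. The header concatenates, for each hop, the index of the output port to take, using just enough bits for that switch's port count. Each switch reads its own field and shifts it off. This is neither the paper's optimized encoding nor its 2-approximation algorithm, and the bit layout (first hop in the lowest bits) and function names are assumptions.

```python
import math

def bits_for(degree):
    """Bits needed to name one of `degree` output ports (at least 1)."""
    return max(1, math.ceil(math.log2(degree)))

def encode_path(ports, degrees):
    """Pack the output-port index used at each hop into one integer header.
    ports[i] is the port taken at hop i; degrees[i] is that switch's port count.
    Returns (header, total_bits)."""
    header, width = 0, 0
    for port, deg in zip(ports, degrees):
        if not 0 <= port < deg:
            raise ValueError("port out of range")
        header |= port << width  # first hop occupies the lowest bits
        width += bits_for(deg)
    return header, width

def next_hop(header, degree):
    """Switch-side decoding: read this hop's field, then shift it off."""
    b = bits_for(degree)
    return header & ((1 << b) - 1), header >> b
```

For a three-hop path through switches with 4, 2 and 8 ports using ports 2, 0 and 5, the header is 42 and fits in 6 bits. The switches recover 2, 0 and 5 in order without any per-flow forwarding state. The optimization problem in the paper asks how much smaller such headers can be made across all paths in the network.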