Search | arXiv e-print repository

arXiv:1906.08160 [pdf, other]

Who is in Your Top Three? Optimizing Learning in Elections with Many Candidates

Authors: Nikhil Garg, Lodewijk Gelauff, Sukolsak Sakshuwong, Ashish Goel

Abstract: Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters' preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election correctly recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guarantees which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning, for arbitrary voter noise models. Finally, we use data from 5 large participatory budgeting elections that we organized across several US cities, along with other ranking data, to demonstrate the utility of our methods. In particular, we find that historically such elections have set K too low and that picking the right mechanism can be the difference between identifying the ultimate winner with only an 80% probability or a 99.9% probability after 400 voters.

Submitted 14 August, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: To appear in HCOMP 2019

arXiv:1905.08169 [pdf, other]

ATAC: A Tool for Automating Timed Automata Construction

Authors: Beyazit Yalcinkaya, Ebru Aydin Gol

Abstract: In this paper, we focus on the design and verification of timed automata (TA). We introduce a new method for assisting construction and verification of TA models along with a tool implementing the proposed method, i.e., ATAC: Automated Timed Automata Construction. Our method provides two main functionalities, i.e., construction of TA models from descriptions and generation of temporal logic queries from specifications. Both description and specification sentences shall follow our well-defined structured natural language definition. TA models constructed from descriptions and temporal logic queries generated from specifications can be imported to UPPAAL, a verification tool for TA models. The goal is to accelerate the design phase for real-time systems by assisting the construction and verification of a formal model. We believe ATAC can be useful especially during the initial phases of the design process and help designers to avoid erroneous models.

Submitted 1 July, 2020; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: 10 pages, 1 figure, tool paper

arXiv:1904.07828 [pdf, other]

An Efficient Formula Synthesis Method with Past Signal Temporal Logic

Authors: Mert Ergurtuna, Ebru Aydin Gol

Abstract: In this work, we propose a novel method to find temporal properties that lead to the unexpected behaviors from a labeled dataset. We express these properties in past time Signal Temporal Logic (ptSTL). First, we present a novel approach for finding parameters of a template ptSTL formula, which extends the results on monotonicity based parameter synthesis. The proposed method optimizes a given monotone criterion while bounding an error. Then, we employ the parameter synthesis method in an iterative unguided formula synthesis framework. In particular, we combine optimized formulas iteratively to describe the causes of the labeled events while bounding the error. We illustrate the proposed framework on two examples.

Submitted 16 April, 2019; originally announced April 2019.

Comments: 8 pages, 5 figures, conference paper

arXiv:1904.07714 [pdf, other]

Low-Power Computer Vision: Status, Challenges, Opportunities

Authors: Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, et al. (19 additional authors not shown)

Abstract: Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles, also called drones, and mobile robots). These systems rely on batteries, and energy efficiency is critical. This article serves two main purposes: (1) Examine the state of the art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes the 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision.

Submitted 15 April, 2019; originally announced April 2019.

Comments: Preprint, Accepted by IEEE Journal on Emerging and Selected Topics in Circuits and Systems. arXiv admin note: substantial text overlap with arXiv:1810.01732

arXiv:1904.03649 [pdf, other]

Cause Mining and Controller Synthesis with STL

Authors: Irmak Saglam, Ebru Aydin Gol

Abstract: Formal control of cyber-physical systems allows for synthesis of control strategies from rich specifications such as temporal logics. However, the class of systems to which the formal approaches can be applied is limited due to the computational complexity. Furthermore, the synthesis problem becomes even harder when non-determinism or stochasticity is considered. In this work, we propose an alternative approach. First, we mark the unwanted events on the traces of the system and generate a controllable cause representing these events as a Signal Temporal Logic (STL) formula. Then, we synthesize a controller based on this formula to avoid its satisfaction. Our approach is applicable to any system with finitely many control choices. While we cannot guarantee correctness, i.e., that the unwanted events will never occur, we show on an example that the proposed approach reduces the number of unwanted events. In particular, we validate it for the congestion avoidance problem in a traffic network.

Submitted 3 September, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

arXiv:1903.09784 [pdf, other]

An End-to-End Network for Generating Social Relationship Graphs

Authors: Arushi Goel, Keng Teck Ma, Cheston Tan

Abstract: Socially-intelligent agents are of growing interest in artificial intelligence. To this end, we need systems that can understand social relationships in diverse social contexts. Inferring the social context in a given visual scene not only involves recognizing objects, but also demands a more in-depth understanding of the relationships and attributes of the people involved. To achieve this, one computational approach for representing human relationships and attributes is to use an explicit knowledge graph, which allows for high-level reasoning. We introduce a novel end-to-end-trainable neural network that is capable of generating a Social Relationship Graph - a structured, unified representation of social relationships and attributes - from a given input image. Our Social Relationship Graph Generation Network (SRG-GN) is the first to use memory cells like Gated Recurrent Units (GRUs) to iteratively update the social relationship states in a graph using scene and attribute context. The neural network exploits the recurrent connections among the GRUs to implement message passing between nodes and edges in the graph, and results in significant improvement over previous methods for social relationship recognition.

Submitted 23 March, 2019; originally announced March 2019.

Journal ref: CVPR 2019

arXiv:1812.04891 [pdf, other]

A Multimodal LSTM for Predicting Listener Empathic Responses Over Time

Authors: Zhi-Xuan Tan, Arushi Goel, Thanh-Son Nguyen, Desmond C. Ong

Abstract: People naturally understand the emotions of, and often also empathize with, those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-performing model, which used only the audio and text features, achieved a concordance correlation coefficient (CCC) of 0.29 and 0.32 on the Validation set for the Generalized and Personalized track respectively, and achieved a CCC of 0.14 and 0.14 on the held-out Test set. We discuss the difficulties faced and the lessons learnt tackling this challenge.

Submitted 28 January, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

arXiv:1811.08131 [pdf, ps, other]

doi 10.23919/FMCAD.2017.8102256

FAR-Cubicle - A new reachability algorithm for Cubicle

Authors: Sylvain Conchon, Amit Goel, Sava Krstic, Rupak Majumdar, Mattias Roux

Abstract: We present a fully automatic algorithm for verifying safety properties of parameterized software systems. This algorithm is based on both IC3 and Lazy Annotation. We implemented it in Cubicle, a model checker for verifying safety properties of array-based systems. Cache-coherence protocols and mutual exclusion algorithms are known examples of such systems. Our algorithm iteratively builds an abstract reachability graph refining the set of reachable states from counterexamples. Refining is made through counterexample approximation. We show the effectiveness and limitations of this algorithm and tradeoffs that result from it.

Submitted 20 November, 2018; originally announced November 2018.

Journal ref: 2017 Formal Methods in Computer-Aided Design (FMCAD), Oct 2017, Vienna, France. IEEE

arXiv:1811.04786 [pdf, other]

Random Dictators with a Random Referee: Constant Sample Complexity Mechanisms for Social Choice

Authors: Brandon Fain, Ashish Goel, Kamesh Munagala, Nina Prabhu

Abstract: We study social choice mechanisms in an implicit utilitarian framework with a metric constraint, where the goal is to minimize \textit{Distortion}, the worst case social cost of an ordinal mechanism relative to underlying cardinal utilities. We consider two additional desiderata: Constant sample complexity and Squared Distortion. Constant sample complexity means that the mechanism (potentially randomized) only uses a constant number of ordinal queries regardless of the number of voters and alternatives. Squared Distortion is a measure of variance of the Distortion of a randomized mechanism. Our primary contribution is the first social choice mechanism with constant sample complexity \textit{and} constant Squared Distortion (which also implies constant Distortion). We call the mechanism Random Referee, because it uses a random agent to compare two alternatives that are the favorites of two other random agents. We prove that the use of a comparison query is necessary: no mechanism that only elicits the top-k preferred alternatives of voters (for constant k) can have Squared Distortion that is sublinear in the number of alternatives. We also prove that unlike any top-k only mechanism, the Distortion of Random Referee meaningfully improves on benign metric spaces, using the Euclidean plane as a canonical example. Finally, among top-1 only mechanisms, we introduce Random Oligarchy. The mechanism asks just 3 queries and is essentially optimal among the class of such mechanisms with respect to Distortion.
In summary, we demonstrate the surprising power of constant sample complexity mechanisms generally, and just three random voters in particular, to provide some of the best known results in the implicit utilitarian framework.

Submitted 1 July, 2020; v1 submitted 12 November, 2018; originally announced November 2018.

Comments: Conference version Published in AAAI 2019 (https://aaai.org/Conferences/AAAI-19/)

arXiv:1810.01092 [pdf, ps, other]

Relating Metric Distortion and Fairness of Social Choice Rules

Authors: Ashish Goel, Reyna Hulett, Anilesh K. Krishnaswamy

Abstract: One way of evaluating social choice (voting) rules is through a utilitarian distortion framework. In this model, we assume that agents submit full rankings over the alternatives, and these rankings are generated from underlying, but unknown, quantitative costs. The \emph{distortion} of a social choice rule is then the ratio of the total social cost of the chosen alternative to the optimal social cost of any alternative; since the true costs are unknown, we consider the worst-case distortion over all possible underlying costs. Analogously, we can consider the worst-case \emph{fairness ratio} of a social choice rule by comparing a useful notion of fairness (based on approximate majorization) for the chosen alternative to that of the optimal alternative. With an additional metric assumption -- that the costs equal the agent-alternative distances in some metric space -- it is known that the Copeland rule achieves both a distortion and fairness ratio of at most 5. For other rules, only bounds on the distortion are known, e.g., the popular Single Transferable Vote (STV) rule has distortion $O(\log m)$, where $m$ is the number of alternatives. We prove that the distinct notions of distortion and fairness ratio are in fact closely linked -- within an additive factor of 2 for any voting rule -- and thus STV also achieves an $O(\log m)$ fairness ratio. We further extend the notions of distortion and fairness ratio to social choice rules choosing a \emph{set} of alternatives.
By relating the distortion of single-winner rules to multiple-winner rules, we establish that Recursive Copeland achieves a distortion of 5 and a fairness ratio of at most 7 for choosing a set of alternatives.

Submitted 2 October, 2018; originally announced October 2018.

arXiv:1810.01042 [pdf, other]

Implementing the Lexicographic Maxmin Bargaining Solution

Authors: Ashish Goel, Anilesh K. Krishnaswamy

Abstract: There has been much work on exhibiting mechanisms that implement various bargaining solutions, in particular, the Kalai-Smorodinsky solution \cite{moulin1984implementing} and the Nash Bargaining solution. Another well-known and axiomatically well-studied solution is the lexicographic maxmin solution. However, there is no mechanism known for its implementation. To fill this gap, we construct a mechanism that implements the lexicographic maxmin solution as the unique subgame perfect equilibrium outcome in the n-player setting. As is standard in the literature on implementation of bargaining solutions, we use the assumption that any player can grab the entire surplus. Our mechanism consists of a binary game tree, with each node corresponding to a subgame where the players are allowed to choose between two outcomes. We characterize novel combinatorial properties of the lexicographic maxmin solution which are crucial to the design of our mechanism.

Submitted 1 October, 2018; originally announced October 2018.

arXiv:1807.10836 [pdf, ps, other]

Markets for Public Decision-making

Authors: Nikhil Garg, Ashish Goel, Benjamin Plaut

Abstract: A public decision-making problem consists of a set of issues, each with multiple possible alternatives, and a set of competing agents, each with a preferred alternative for each issue. We study adaptations of market economies to this setting, focusing on binary issues. Issues have prices, and each agent is endowed with artificial currency that she can use to purchase probability for her preferred alternatives (we allow randomized outcomes). We first show that when each issue has a single price that is common to all agents, market equilibria can be arbitrarily bad. This negative result motivates a different approach. We present a novel technique called "pairwise issue expansion", which transforms any public decision-making instance into an equivalent Fisher market, the simplest type of private goods market. This is done by expanding each issue into many goods: one for each pair of agents who disagree on that issue. We show that the equilibrium prices in the constructed Fisher market yield a "pairwise pricing equilibrium" in the original public decision-making problem which maximizes Nash welfare. More broadly, pairwise issue expansion uncovers a powerful connection between the public decision-making and private goods settings; this immediately yields several interesting results about public decisions markets, and furthers the hope that we will be able to find a simple iterative voting protocol that leads to near-optimum decisions.

Submitted 19 July, 2019; v1 submitted 27 July, 2018; originally announced July 2018.

Comments: Appeared in WINE 2018

arXiv:1807.05293 [pdf, ps, other]

Markets Beyond Nash Welfare for Leontief Utilities

Authors: Ashish Goel, Reyna Hulett, Benjamin Plaut

Abstract: We study the allocation of divisible goods to competing agents via a market mechanism, focusing on agents with Leontief utilities. The majority of the economics and mechanism design literature has focused on \emph{linear} prices, meaning that the cost of a good is proportional to the quantity purchased. Equilibria for linear prices are known to be exactly the maximum Nash welfare allocations. \emph{Price curves} allow the cost of a good to be any (increasing) function of the quantity purchased. We show that price curve equilibria are not limited to maximum Nash welfare allocations with two main results. First, we show that an allocation can be supported by strictly increasing price curves if and only if it is \emph{group-domination-free}. A similar characterization holds for weakly increasing price curves. We use this to show that given any allocation, we can compute strictly (or weakly) increasing price curves that support it (or show that none exist) in polynomial time. These results involve a connection to the \emph{agent-order matrix} of an allocation, which may have other applications. Second, we use duality to show that in the bandwidth allocation setting, any allocation maximizing a CES welfare function can be supported by price curves.

Submitted 23 December, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

Comments: Appeared in WINE 2019

arXiv:1807.03916 [pdf, other]

doi 10.1007/JHEP02(2019)156

Expanding the Black Hole Interior: Partially Entangled Thermal States in SYK

Authors: Akash Goel, Ho Tat Lam, Gustavo J. Turiaci, Herman Verlinde

Abstract: We introduce a family of partially entangled thermal states in the SYK model that interpolates between the thermo-field double state and a pure (product) state. The states are prepared by a euclidean path integral describing the evolution over two euclidean time segments separated by a local scaling operator $\mathcal{O}$. We argue that the holographic dual of this class of states consists of two black holes with their interior regions connected via a domain wall, described by the worldline of a massive particle. We compute the size of the interior region and the entanglement entropy as a function of the scale dimension of $\mathcal{O}$ and the temperature of each black hole. We argue that the one-sided bulk reconstruction can access the interior region of the black hole.

Submitted 6 December, 2018; v1 submitted 10 July, 2018; originally announced July 2018.

Comments: 40 pages + appendices, 16 figures; v3. typos fixed

arXiv:1807.00132 [pdf, ps, other]

N strongly quasi invariant measure on double coset

Authors: Fatemeh Fahimian, Rajab Ali Kamyabi Gol, Fatemeh Esmaeelzadeh

Abstract: Let G be a locally compact group, H and K be two closed subgroups of G, and N be the normalizer group of K in G. In this paper, the existence and properties of a rho-function for the triple (K,G,H) and an N-strongly quasi-invariant measure of double coset space K\G/H are investigated. In particular, it is shown that any such measure arises from a rho-function. Furthermore, the conditions under which an N-strongly quasi-invariant measure arises from a rho-function are studied.

Submitted 30 June, 2018; originally announced July 2018.

arXiv:1805.08399 [pdf, other]

A fingerprint based crypto-biometric system for secure communication

Authors: Rudresh Dwivedi, Somnath Dey, Mukul Anand Sharma, Apurv Goel

Abstract: To ensure the secure transmission of data, cryptography is treated as the most effective solution. The cryptographic key is an important entity in this procedure. In general, a randomly generated cryptographic key (of 256 bits) is difficult to remember. However, such a key needs to be stored in a protected place or transported through a shared communication line which, in fact, poses another threat to security. As an alternative, researchers advocate the generation of a cryptographic key using the biometric traits of both sender and receiver during the sessions of communication, thus avoiding key storage without compromising the strength of security. Nevertheless, biometric-based cryptographic key generation raises a few concerns, such as privacy of biometrics, sharing of biometric data between both communicating users (i.e., sender and receiver), and generating a revocable key from an irrevocable biometric. This work addresses the above-mentioned concerns. In this work, a framework for secure communication between two users using a fingerprint-based crypto-biometric system is proposed. For this, the Diffie-Hellman (DH) algorithm is used to generate public keys from the private keys of both sender and receiver, which are shared and further used to produce a symmetric cryptographic key at both ends. In this approach, a revocable key for symmetric cryptography is generated from an irrevocable fingerprint.
The biometric data is neither stored nor shared, which ensures the security of biometric data, and perfect forward secrecy is achieved using session keys. This work also ensures the long-term security of messages communicated between two users. Based on the experimental evaluation over four datasets of FVC2002 and the NIST special database, the proposed framework is privacy-preserving and could be utilized in real access control systems.

Submitted 22 May, 2018; originally announced May 2018.

Comments: 29 single column pages, 8 figures

arXiv:1805.05032 [pdf, ps, other]

doi 10.1007/s10955-018-2201-z

Strong law of large numbers for Betti numbers in the thermodynamic regime

Authors: Akshay Goel, Khanh Duy Trinh, Kenkichi Tsunoda

Abstract: We establish the strong law of large numbers for Betti numbers of random Čech complexes built on $\mathbb R^N$-valued binomial point processes and related Poisson point processes in the thermodynamic regime. Here we consider both the case where the underlying distribution of the point processes is absolutely continuous with respect to the Lebesgue measure on $\mathbb R^N$ and the case where it is supported on a $C^1$ compact manifold of dimension strictly less than $N$. The strong law is proved under a very mild assumption which only requires that the common probability density function belongs to $L^p$ spaces, for all $1\leq p < \infty$.

Submitted 27 November, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

Comments: 30 pages, 2 figures; to appear in Journal of Statistical Physics

MSC Class: 60D05 (Primary) 60F15 (Secondary)

arXiv:1712.08709 [pdf, ps, other]

Pruning based Distance Sketches with Provable Guarantees on Random Graphs

Authors: Hongyang Zhang, Huacheng Yu, Ashish Goel

Abstract: Measuring the distances between vertices on graphs is one of the most fundamental components in network analysis. Since finding shortest paths requires traversing the graph, it is challenging to obtain distance information on large graphs very quickly. In this work, we present a preprocessing algorithm that is able to create landmark based distance sketches efficiently, with strong theoretical guarantees. When evaluated on a diverse set of social and information networks, our algorithm significantly improves over existing approaches by reducing the number of landmarks stored, preprocessing time, or stretch of the estimated distances. On Erdős-Rényi graphs and random power law graphs with degree distribution exponent $2 < β < 3$, our algorithm outputs an exact distance data structure with space between $Θ(n^{5/4})$ and $Θ(n^{3/2})$ depending on the value of $β$, where $n$ is the number of vertices. We complement the algorithm with tight lower bounds for Erdős-Rényi graphs and the case when $β$ is close to two.

Submitted 10 February, 2019; v1 submitted 22 December, 2017; originally announced December 2017.

Comments: Full version for the conference paper to appear in The Web Conference'19

arXiv:1711.03050 [pdf, other]

doi 10.1145/3158137

Correctness of Speculative Optimizations with Dynamic Deoptimization

Authors: Olivier Flückiger, Gabriel Scherer, Ming-Ho Yee, Aviral Goel, Amal Ahmed, Jan Vitek

Abstract: High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, dead code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition.

Submitted 15 November, 2017; v1 submitted 8 November, 2017; originally announced November 2017.

Journal ref: Proceedings of the ACM on Programming Languages (POPL 2018)

arXiv:1710.00771 [pdf, other]

Sequential Deliberation for Social Choice

Authors: Brandon Fain, Ashish Goel, Kamesh Munagala, Sukolsak Sakshuwong

Abstract: In large scale collective decision making, social choice is a normative study of how one ought to design a protocol for reaching consensus. However, in instances where the underlying decision space is too large or complex for ordinal voting, standard voting methods of social choice may be impractical. How then can we design a mechanism - preferably decentralized, simple, scalable, and not requiring any special knowledge of the decision space - to reach consensus? We propose sequential deliberation as a natural solution to this problem. In this iterative method, successive pairs of agents bargain over the decision space using the previous decision as a disagreement alternative. We describe the general method and analyze the quality of its outcome when the space of preferences defines a median graph. We show that sequential deliberation finds a 1.208-approximation to the optimal social cost on such graphs, coming very close to this value with only a small constant number of agents sampled from the population. We also show lower bounds on simpler classes of mechanisms to justify our design choices. We further show that sequential deliberation is ex-post Pareto efficient and has truthful reporting as an equilibrium of the induced extensive form game. We finally show that for general metric spaces, the second moment of the distribution of social cost of the outcomes produced by sequential deliberation is also bounded.

Submitted 2 October, 2017; originally announced October 2017.

arXiv:1709.08053 [pdf, other]

Finite Synchrosqueezing Transform Based On The STFT

Authors: Mozhgan Mohammadpour, Bastiaan Kleijn, Rajab Ali Kamyabi Gol

Abstract: The finite STFT Synchrosqueezing transform is a time-frequency analysis method that can decompose finite complex signals into time-varying oscillatory components. This representation is sparse and invertible, allowing recovery of the original signal. The STFT Synchrosqueezing transform on finite dimensional signals has the advantage of an efficient matrix representation. This article defines the finite STFT Synchrosqueezing transform and describes some properties of this transform. We compare the finite STFT and the finite STFT Synchrosqueezing transform by applying these transforms to a set of signals.

Submitted 23 September, 2017; originally announced September 2017.

Comments: 10 pages, 6 figures

arXiv:1709.00512 [pdf]

doi 10.1038/s41467-020-15618-w

Spectro-temporal encoded Multiphoton Microscopy

Authors: Sebastian Karpf, Carson Riche, Dino di Carlo, Anubhuti Goel, William A. Zeiger, Anand Suresh, Carlos Portera-Cailliau, Bahram Jalali

Abstract: Two-Photon Microscopy has become an invaluable tool for biological and medical research, providing high sensitivity, molecular specificity, inherent three-dimensional sub-cellular resolution and deep tissue penetration. In terms of imaging speeds, however, mechanical scanners still limit the acquisition rates to typically 10-100 frames per second. Here we present a high-speed non-linear microscope achieving kilohertz frame rates by employing pulse-modulated, rapidly wavelength-swept lasers and inertia-free beam steering through angular dispersion. In combination with a high bandwidth, single-photon sensitive detector, we achieve recording of fluorescent lifetimes at unprecedented speeds of 88 million pixels per second. We show diffraction-limited, multi-modal Two-Photon fluorescence and fluorescence lifetime (FLIM) microscopy and imaging flow cytometry with a digitally reconfigurable laser, imaging system and data acquisition system. These unprecedented speeds should enable high-speed and high-throughput image-assisted cell sorting.

Submitted 20 August, 2019; v1 submitted 1 September, 2017; originally announced September 2017.

Journal ref: Nature Communications 11, 2062 (2020)

arXiv:1708.01494 [pdf, other]

Hierarchical Metric Learning for Optical Remote Sensing Scene Categorization

Authors: Akashdeep Goel, Biplab Banerjee, Aleksandra Pizurica

Abstract: We address the problem of scene classification from optical remote sensing (RS) images based on the paradigm of hierarchical metric learning. Ideally, supervised metric learning strategies learn a projection from a set of training data points so as to minimize intra-class variance while maximizing inter-class separability to the class label space. However, standard metric learning techniques do not incorporate the class interaction information in learning the transformation matrix, which is often considered to be a bottleneck while dealing with fine-grained visual categories. As a remedy, we propose to organize the classes in a hierarchical fashion by exploring their visual similarities and subsequently learn separate distance metric transformations for the classes present at the non-leaf nodes of the tree. We employ an iterative max-margin clustering strategy to obtain the hierarchical organization of the classes. Experimental results obtained on the large-scale NWPU-RESISC45 and the popular UC-Merced datasets demonstrate the efficacy of the proposed hierarchical metric learning based RS scene recognition strategy in comparison to the standard approaches.

Submitted 1 August, 2018; v1 submitted 4 August, 2017; originally announced August 2017.

Comments: Undergoing revision in GRSL

arXiv:1705.02613 [pdf, other]

Incremental DFS algorithms: a theoretical and experimental study

Authors: Surender Baswana, Ayush Goel, Shahbaz Khan

Abstract: The Depth First Search (DFS) tree is a fundamental data structure for solving graph problems. The DFS tree of a graph $G$ with $n$ vertices and $m$ edges can be built in $O(m+n)$ time. To date, only a few algorithms have been designed for maintaining incremental DFS. For undirected graphs, the two algorithms, namely, ADFS1 and ADFS2 [ICALP14] achieve total $O(n^{3/2}\sqrt{m})$ and $O(n^2)$ time respectively. For DAGs, the only non-trivial algorithm, namely, FDFS [IPL97] requires total $O(mn)$ time. In this paper, we carry out extensive experimental and theoretical evaluation of existing incremental DFS algorithms in random and real graphs, and derive the following results. (1) For insertion of a uniformly random sequence of $\binom{n}{2}$ edges, ADFS1, ADFS2 and FDFS perform equally well and are found to take $Θ(n^2)$ time experimentally. This is quite surprising because the worst case bounds of ADFS1 and FDFS are greater than $Θ(n^2)$ by a factor of $\sqrt{m/n}$ and $m/n$ respectively. We complement this result by probabilistic analysis of these algorithms proving $\tilde{O}(n^2)$ bound on the update time. Here, we derive results about the structure of a DFS tree in random graphs, which are of independent interest. (2) These insights led us to design an extremely simple incremental DFS algorithm for both undirected and directed graphs. This algorithm theoretically matches and experimentally outperforms the state-of-the-art in dense random graphs. It can also be used as a single-pass semi-streaming algorithm for incremental DFS and strong connectivity in random graphs. (3) Even for real graphs, both ADFS1 and FDFS perform much better than their theoretical bounds. Here again, we present two simple algorithms for incremental DFS for directed and undirected real graphs. In fact, our algorithm for directed graphs almost always matches the performance of FDFS.

Submitted 7 May, 2017; originally announced May 2017.

Comments: 31 pages, 14 figures

arXiv:1703.01054 [pdf, other]

doi 10.1145/3038912.3052633

When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

Authors: Aneesh Sharma, C. Seshadhri, Ashish Goel

Abstract: Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold $τ$. In contrast to previous work where $τ$ is assumed to be quite close to 1, we focus on recommendation applications where $τ$ is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small $τ$. To the best of our knowledge, there is no practical solution for computing all user pairs with, say, $τ = 0.2$ on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm, WHIMP, that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.

Submitted 3 March, 2017; originally announced March 2017.
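
The abstract above names SimHash random projections as one of the two ingredients of WHIMP. The following is only a minimal sketch of the generic SimHash idea (sign bits of random Gaussian projections, with cosine similarity estimated from the Hamming distance between signatures); it is not the WHIMP algorithm and omits wedge sampling and MapReduce entirely. The function names `make_planes`, `simhash_signature`, and `estimate_cosine` are hypothetical, chosen for illustration.

```python
import math
import random


def make_planes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplanes in `dim` dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def simhash_signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [1 if sum(x * w for x, w in zip(vec, plane)) >= 0 else 0
            for plane in planes]


def estimate_cosine(sig_a, sig_b):
    """Two vectors at angle theta disagree on a bit with probability theta/pi,
    so the fraction of differing bits estimates theta/pi."""
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / len(sig_a))


planes = make_planes(dim=2, bits=2000, seed=1)
same = estimate_cosine(simhash_signature([1.0, 2.0], planes),
                       simhash_signature([2.0, 4.0], planes))
orthogonal = estimate_cosine(simhash_signature([1.0, 0.0], planes),
                             simhash_signature([0.0, 1.0], planes))
```

Parallel vectors always produce identical signatures, so `same` is exactly 1.0, while `orthogonal` is a noisy estimate of 0 whose accuracy improves with the number of bits.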

arXiv:1702.07984 [pdf, other]

Iterative Local Voting for Collective Decision-making in Continuous Spaces

Authors: Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, Kamesh Munagala

Abstract: Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose an algorithm called Iterative Local Voting for collective decision-making in this setting. In this algorithm, voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm, with the size of the ball shrinking at a specified rate. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to Pareto optimal solutions with desirable fairness properties in certain natural settings: when the voters' utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 2,000 workers, employing neighborhoods defined by $\mathcal{L}^1, \mathcal{L}^2$ and $\mathcal{L}^\infty$ balls. We make several observations that inform future implementations of such a procedure.

Submitted 27 October, 2018; v1 submitted 25 February, 2017; originally announced February 2017.

Comments: 39 pages, to appear in Journal of Artificial Intelligence Research
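
The procedure described in the abstract above (sample a voter, let them move the candidate solution within a shrinking ball) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: it assumes an $\mathcal{L}^2$ ball, voters who simply move as close as possible to a fixed ideal point, and a radius schedule of `r0 / t`. All of these, along with the names `local_vote` and `iterative_local_voting`, are assumptions made for the example.

```python
import math
import random


def local_vote(current, ideal, radius):
    """Move `current` toward `ideal`, but by at most `radius` in Euclidean norm."""
    delta = [i - c for c, i in zip(current, ideal)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= radius:
        return list(ideal)          # the ideal point is inside the ball
    scale = radius / dist
    return [c + scale * d for c, d in zip(current, delta)]


def iterative_local_voting(ideals, start, r0=0.5, rounds=200, seed=0):
    """Sequentially sample voters; round t uses an L2 ball of radius r0 / t (assumed schedule)."""
    rng = random.Random(seed)
    x = list(start)
    for t in range(1, rounds + 1):
        voter_ideal = rng.choice(ideals)
        x = local_vote(x, voter_ideal, r0 / t)
    return x


# Hypothetical electorate in which every voter shares the same ideal point.
unanimous = [[1.0, 1.0]] * 5
result = iterative_local_voting(unanimous, start=[0.0, 0.0])
```

With a unanimous electorate the candidate solution reaches the shared ideal point, because the radii `r0 / t` shrink but their sum diverges. With heterogeneous voters the limit depends on the norm and utility model, which is what the paper analyzes.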

arXiv:1701.00047 [pdf, ps, other]

Gabor Tight Fusion Frames: Construction and Applications in Signal Retrieval Modulo Phase

Authors: Mozhgan Mohammadpour, Brian Tuomanen, Rajab Ali Kamyabi Gol

Abstract: Hilbert space fusion frames are a natural extension of Hilbert space frames, extending the notion from a set of vectors in a Hilbert space to a set of subspaces of a Hilbert space with analogous notions of overcompleteness and boundedness. As tight frames are a very important topic within standard frame theory, tight fusion frames are similarly important; however, only trivial examples of tight fusion frames are hitherto known. In this paper, we apply ideas from Gabor analysis to demonstrate a non-trivial construction of tight fusion frames. We then use this construction to further show their applicability in some cases for the retrieval of signals modulo phase.

Submitted 21 June, 2017; v1 submitted 30 December, 2016; originally announced January 2017.

Comments: This work was supported by the National Science Foundation (NSF ATD 1321779).

arXiv:1612.04485 [pdf, other]

Re-incentivizing Discovery: Mechanisms for Partial-Progress Sharing in Research

Authors: Anilesh Kollagunta Krishnaswamy, Ashish Goel, Siddhartha Banerjee

Abstract: An essential primitive for an efficient research ecosystem is partial-progress sharing (PPS), whereby a researcher shares information immediately upon making a breakthrough. This helps prevent duplication of work; however, there is evidence that existing reward structures in research discourage partial-progress sharing. Ensuring PPS is especially important for new online collaborative-research platforms, which involve many researchers working on large, multi-stage problems. We study the problem of incentivizing information-sharing in research, under a stylized model: non-identical agents work independently on subtasks of a large project, with dependencies between subtasks captured via an acyclic subtask-network. Each subtask carries a reward, given to the first agent who publicly shares its solution. Agents can choose which subtasks to work on, and more importantly, when to reveal solutions to completed subtasks. Under this model, we uncover the strategic rationale behind certain anecdotal phenomena. Moreover, for any acyclic subtask-network, and under a general model of agent-subtask completion times, we give sufficient conditions that ensure PPS is incentive-compatible for all agents. One surprising finding is that rewards which are approximately proportional to perceived task-difficulties are sufficient to ensure PPS in all acyclic subtask-networks. The fact that there is no tension between local fairness and global information-sharing in multi-stage projects is encouraging, as it suggests practical mechanisms for real-world settings. Finally, we show that PPS is necessary, and in many cases, sufficient, to ensure a high rate of progress in research.

Submitted 13 December, 2016; originally announced December 2016.

arXiv:1612.02912 [pdf, other]

Metric Distortion of Social Choice Rules: Lower Bounds and Fairness Properties

Authors: Ashish Goel, Anilesh Kollagunta Krishnaswamy, Kamesh Munagala

Abstract: We study social choice rules under the utilitarian distortion framework, with an additional metric assumption on the agents' costs over the alternatives. In this approach, these costs are given by an underlying metric on the set of all agents plus alternatives. Social choice rules have access to only the ordinal preferences of agents but not the latent cardinal costs that induce them. Distortion is then defined as the ratio between the social cost (typically the sum of agent costs) of the alternative chosen by the mechanism at hand, and that of the optimal alternative chosen by an omniscient algorithm. The worst-case distortion of a social choice rule is, therefore, a measure of how close it always gets to the optimal alternative without any knowledge of the underlying costs. Under this model, it has been conjectured that Ranked Pairs, the well-known weighted-tournament rule, achieves a distortion of at most 3 [Anshelevich et al. 2015]. We disprove this conjecture by constructing a sequence of instances which shows that the worst-case distortion of Ranked Pairs is at least 5. Our lower bound on the worst-case distortion of Ranked Pairs matches a previously known upper bound for the Copeland rule, proving that in the worst case, the simpler Copeland rule is at least as good as Ranked Pairs. And as long as we are limited to (weighted or unweighted) tournament rules, we demonstrate that randomization cannot help achieve an expected worst-case distortion of less than 3. Using the concept of approximate majorization within the distortion framework, we prove that Copeland and Randomized Dictatorship achieve low constant factor fairness-ratios (5 and 3 respectively), which is a considerable generalization of similar results for the sum of costs and single largest cost objectives. In addition to all of the above, we outline several interesting directions for further research in this space.

Submitted 8 May, 2017; v1 submitted 8 December, 2016; originally announced December 2016.
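
To make the distortion ratio defined in the abstract above concrete, here is a small worked sketch for a single instance. It assumes agents and alternatives are points on the real line, with absolute difference as the metric; the positions, the labels "A" and "B", and the helper names `social_cost` and `distortion` are all hypothetical. The worst-case distortion studied in the paper is the supremum of this ratio over all metrics consistent with the observed ordinal preferences, which this sketch does not compute.

```python
def social_cost(alt_pos, agent_positions):
    """Sum of agent-to-alternative distances on the real line."""
    return sum(abs(alt_pos - a) for a in agent_positions)


def distortion(chosen, alternatives, agent_positions):
    """Social cost of the chosen alternative divided by the optimal social cost."""
    costs = {name: social_cost(pos, agent_positions)
             for name, pos in alternatives.items()}
    best = min(costs.values())
    if best == 0:
        # Avoid dividing by zero when some alternative has zero total cost.
        return 1.0 if costs[chosen] == 0 else float("inf")
    return costs[chosen] / best


# Hypothetical instance: two agents sit at 0, one agent sits at 1.
agents = [0.0, 0.0, 1.0]
alternatives = {"A": 0.0, "B": 1.0}

ratio_a = distortion("A", alternatives, agents)  # A has cost 1, the optimum
ratio_b = distortion("B", alternatives, agents)  # B has cost 2, twice the optimum
```

Here a rule that picks B on this instance incurs a ratio of 2, while picking A is optimal with ratio 1.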

arXiv:1610.03474 [pdf, other]

The Core of the Participatory Budgeting Problem

Authors: Brandon Fain, Ashish Goel, Kamesh Munagala

Abstract: In participatory budgeting, communities collectively decide on the allocation of public tax dollars for local public projects. In this work, we consider the question of fairly aggregating the preferences of community members to determine an allocation of funds to projects. This problem is different from standard fair resource allocation because of public goods: The allocated goods benefit all users simultaneously. Fairness is crucial in participatory decision making, since generating equitable outcomes is an important goal of these processes. We argue that the classic game theoretic notion of core captures fairness in the setting. To compute the core, we first develop a novel characterization of a public goods market equilibrium called the Lindahl equilibrium, which is always a core solution. We then provide the first (to our knowledge) polynomial time algorithm for computing such an equilibrium for a broad set of utility functions; our algorithm also generalizes (in a non-trivial way) the well-known concept of proportional fairness. We use our theoretical insights to perform experiments on real participatory budgeting voting data. We empirically show that the core can be efficiently computed for utility functions that naturally model our practical setting, and examine the relation of the core with the familiar welfare objective. Finally, we address concerns of incentives and mechanism design by developing a randomized approximately dominant-strategy truthful mechanism building on the exponential mechanism from differential privacy.

Submitted 14 October, 2016; v1 submitted 11 October, 2016; originally announced October 2016.

arXiv:1605.08143 [pdf, other]

Towards large-scale deliberative decision-making: small groups and the importance of triads

Authors: Ashish Goel, David T. Lee

Abstract: Though deliberation is a critical component of democratic decision-making, existing deliberative processes do not scale to large groups of people. Motivated by this, we propose a model in which large-scale decision-making takes place through a sequence of small group interactions. Our model considers a group of participants, each having an opinion which together form a graph. We show that for median graphs, a class of graphs including grids and trees, it is possible to use a small number of three-person interactions to tightly approximate the wisdom of the crowd, defined here to be the generalized median of participant opinions, even when agents are strategic. Interestingly, we also show that this sharply contrasts with small groups of size two, for which we prove an impossibility result. Specifically, we show that it is impossible to use sequences of two-person interactions satisfying natural axioms to find a tight approximation of the generalized median, even when agents are non-strategic. Our results demonstrate the potential of small group interactions for reaching global decision-making properties.

Submitted 4 June, 2016; v1 submitted 26 May, 2016; originally announced May 2016.

arXiv:1603.07796 [pdf, ps, other]

Approximate Personalized PageRank on Dynamic Graphs

Authors: Hongyang Zhang, Peter Lofgren, Ashish Goel

Abstract: We propose and analyze two algorithms for maintaining approximate Personalized PageRank (PPR) vectors on a dynamic graph, where edges are added or deleted. Our algorithms are natural dynamic versions of two known local variations of power iteration. One, Forward Push, propagates probability mass forwards along edges from a source node, while the other, Reverse Push, propagates local changes backwards along edges from a target. In both variations, we maintain an invariant between two vectors, and when an edge is updated, our algorithm first modifies the vectors to restore the invariant, then performs any needed local push operations to restore accuracy. For Reverse Push, we prove that for an arbitrary directed graph in a random edge model, or for an arbitrary undirected graph, given a uniformly random target node $t$, the cost to maintain a PPR vector to $t$ of additive error $\varepsilon$ as $k$ edges are updated is $O(k + \bar{d} / \varepsilon)$, where $\bar{d}$ is the average degree of the graph. This is $O(1)$ work per update, plus the cost of computing a reverse vector once on a static graph. For Forward Push, we show that on an arbitrary undirected graph, given a uniformly random start node $s$, the cost to maintain a PPR vector from $s$ of degree-normalized error $\varepsilon$ as $k$ edges are updated is $O(k + 1 / \varepsilon)$, which is again $O(1)$ per update plus the cost of computing a PPR vector once on a static graph.

Submitted 22 December, 2017; v1 submitted 24 March, 2016; originally announced March 2016.

Comments: KDD'16
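
As background for the entry above, the static Forward Push procedure that the paper makes dynamic can be sketched as follows. This is a minimal illustration of the standard static local-push scheme on an undirected graph, not the paper's dynamic update rule; the adjacency-dict representation, the teleport probability `alpha`, the threshold `eps`, and the function name `forward_push` are assumptions for the example. The pair of vectors kept here (estimates `p` and residuals `r`) plays the role of the two vectors linked by an invariant in the abstract.

```python
from collections import defaultdict


def forward_push(graph, source, alpha=0.15, eps=1e-4):
    """Approximate PPR from `source`; `graph` maps each node to its neighbor list.

    Pushes continue until every residual satisfies r[v] <= eps * deg(v),
    i.e. the degree-normalized residual is at most eps.
    """
    p = defaultdict(float)  # PPR estimates
    r = defaultdict(float)  # residual probability mass not yet pushed
    r[source] = 1.0
    work = [source]
    while work:
        u = work.pop()
        deg = len(graph[u])
        if deg == 0 or r[u] <= eps * deg:
            continue  # stale entry or nothing worth pushing
        mass = r[u]
        p[u] += alpha * mass            # keep an alpha fraction at u
        r[u] = 0.0
        share = (1.0 - alpha) * mass / deg
        for v in graph[u]:              # spread the rest evenly over neighbors
            r[v] += share
            if r[v] > eps * len(graph[v]):
                work.append(v)
    return dict(p), dict(r)


# Hypothetical small undirected graph (every node has at least one neighbor).
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
estimates, residuals = forward_push(g, source=0, alpha=0.2, eps=1e-6)
```

Each push conserves probability mass, so the estimates and residuals together always sum to 1 on a graph without isolated nodes. That conservation property is a convenient sanity check for any implementation.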

arXiv:1512.03275 [pdf, other]

doi 10.1088/1361-6382/ab6271

Quantum Cosmology in Four Dimensions

Authors: Teresa Bautista, André Benevides, Atish Dabholkar, Akash Goel

Abstract: We analyze the cosmological solutions to the recently proposed nonlocal quantum effective action for gravity with a cosmological term. We show that the vacuum energy decays with a slow-roll parameter proportional to the anomalous gravitational dressings.

Submitted 10 December, 2015; originally announced December 2015.

Comments: 19 pages

arXiv:1510.07795 [pdf]

Improvised Broadcast Algorithm for Wireless Networks

Authors: Ashima Goel, Debasis Das

Abstract: The broadcasting problem is an important issue in wireless networks, especially in dynamic wireless networks. In dynamic wireless networks the node density and mobility are high, due to which several problems arise during broadcasting. The two major problems are the Broadcast Storm Problem and the Disconnected Network Problem. In a highly dense network, if information is being flooded in a loop, it could lead to a broadcast storm. The broadcast storm may eventually crash the entire network and lead to loss of information. Mobility of the nodes may lead to the problem of a disconnected network. If the two nodes sending and receiving information are mobile with different speeds, it could lead to a disconnection between them as soon as the receiver moves out of the communication range. In this paper, we try to solve both problems using our proposed algorithms.

Submitted 27 October, 2015; originally announced October 2015.

Comments: 4 pages

Journal ref: International Conference on Electrical, Electronics, Signals, Communication and Optimization (EESCO) - 2015

arXiv:1507.08705 [pdf, other]

Bidirectional PageRank Estimation: From Average-Case to Worst-Case

Authors: Peter Lofgren, Siddhartha Banerjee, Ashish Goel

Abstract: We present a new algorithm for estimating the Personalized PageRank (PPR) between a source and target node on undirected graphs, with sublinear running-time guarantees over the worst-case choice of source and target nodes. Our work builds on a recent line of work on bidirectional estimators for PPR, which obtained sublinear running-time guarantees but in an average-case sense, for a uniformly random choice of target node. Crucially, we show how the reversibility of random walks on undirected networks can be exploited to convert average-case to worst-case guarantees. While past bidirectional methods combine forward random walks with reverse local pushes, our algorithm combines forward local pushes with reverse random walks. We also discuss how to modify our methods to estimate random-walk probabilities for any length distribution, thereby obtaining fast algorithms for estimating general graph diffusions, including the heat kernel, on undirected networks.

Submitted 14 December, 2015; v1 submitted 30 July, 2015; originally announced July 2015.

Comments: Workshop on Algorithms and Models for the Web-Graph (WAW) 2015

arXiv:1507.05999 [pdf, other]

doi 10.1145/2835776.2835823

Personalized PageRank Estimation and Search: A Bidirectional Approach

Authors: Peter Lofgren, Siddhartha Banerjee, Ashish Goel

Abstract: We present new algorithms for Personalized PageRank estimation and Personalized PageRank search. First, for the problem of estimating Personalized PageRank (PPR) from a source distribution to a target node, we present a new bidirectional estimator with simple yet strong guarantees on correctness and performance, and 3x to 8x speedup over existing estimators in experiments on a diverse set of networks. Moreover, it has a clean algebraic structure which enables it to be used as a primitive for the Personalized PageRank Search problem: Given a network like Facebook, a query like "people named John", and a searching user, return the top nodes in the network ranked by PPR from the perspective of the searching user. Previous solutions either score all nodes or score candidate nodes one at a time, which is prohibitively slow for large candidate sets. We develop a new algorithm based on our bidirectional PPR estimator which identifies the most relevant results by sampling candidates based on their PPR; this is the first solution to PPR search that can find the best results without iterating through the set of all candidate results. Finally, by combining PPR sampling with sequential PPR estimation and Monte Carlo, we develop practical algorithms for PPR search, and we show via experiments that our algorithms are efficient on networks with billions of edges.

Submitted 14 December, 2015; v1 submitted 21 July, 2015; originally announced July 2015.

Comments: WSDM 2016

ACM Class: H.3.3; G.2.2

arXiv:1504.01302 [pdf, ps, other]

doi 10.1103/PhysRevD.91.104029

Tidal Forces in Naked Singularity Backgrounds

Authors: Akash Goel, Reevu Maity, Pratim Roy, Tapobrata Sarkar

Abstract: The end stage of a gravitational collapse process can generically result in a black hole or a naked singularity. Here we undertake a comparative analysis of the nature of tidal forces in these backgrounds. The effect of such forces is generically exemplified by the Roche limit, which predicts the distance within which a celestial object disintegrates due to the tidal effects of a second more massive object. In this paper, using Fermi normal coordinates, we numerically compute the Roche limit for a class of non-rotating naked singularity backgrounds, and compare them with known results for Schwarzschild black holes. Our analysis indicates that there might be substantially large deviations in the magnitudes of tidal forces in naked singularity backgrounds, compared to the black hole cases. If observationally established, these can prove to be an effective indicator of the nature of the singularity at a galactic centre.

Submitted 6 April, 2015; originally announced April 2015.

Comments: 1 + 18 Pages, 9 figures

arXiv:1409.5671 [pdf, other]

A Formal Methods Approach to Pattern Synthesis in Reaction Diffusion Systems

Authors: Ebru Aydin Gol, Ezio Bartocci, Calin Belta

Abstract: We propose a technique to detect and generate patterns in a network of locally interacting dynamical systems. Central to our approach is a novel spatial superposition logic, whose semantics is defined over the quad-tree of a partitioned image. We show that formulas in this logic can be efficiently learned from positive and negative examples of several types of patterns. We also demonstrate that pattern detection, which is implemented as a model checking algorithm, performs very well for test data sets different from the learning sets. We define a quantitative semantics for the logic and integrate the model checking algorithm with particle swarm optimization in a computational framework for synthesis of parameters leading to desired patterns in reaction-diffusion systems.

Submitted 12 September, 2014; originally announced September 2014.

arXiv:1408.1437 [pdf, other]

doi 10.1109/TCNS.2015.2428471

Traffic Network Control from Temporal Logic Specifications

Authors: Samuel Coogan, Ebru Aydin Gol, Murat Arcak, Calin Belta

Abstract: We propose a framework for generating a signal control policy for a traffic network of signalized intersections to accomplish control objectives expressible using linear temporal logic. By applying techniques from model checking and formal methods, we obtain a correct-by-construction controller that is guaranteed to satisfy complex specifications. To apply these tools, we identify and exploit structural properties particular to traffic networks that allow for efficient computation of a finite state abstraction. In particular, traffic networks exhibit a componentwise monotonicity property which allows reach set computations that scale linearly with the dimension of the continuous state space.

Submitted 21 June, 2016; v1 submitted 6 August, 2014; originally announced August 2014.

Journal ref: IEEE Transactions on Control of Network Systems, vol. 3, no. 2, pp. 162-172, June 2016

arXiv:1406.7542 [pdf]

Crowdsourcing for Participatory Democracies: Efficient Elicitation of Social Choice Functions

Authors: David Lee, Ashish Goel, Tanja Aitamurto, Helene Landemore

Abstract: We present theoretical and empirical results demonstrating the usefulness of voting rules for participatory democracies. We first give algorithms which efficiently elicit ε-approximations to two prominent voting rules: the Borda rule and the Condorcet winner. This result circumvents previous prohibitive lower bounds and is surprisingly strong: even if the number of ideas is as large as the number of participants, each participant will only have to make a logarithmic number of comparisons, an exponential improvement over the linear number of comparisons previously needed. We demonstrate the approach in an experiment in Finland's recent off-road traffic law reform, observing that the total number of comparisons needed to achieve a fixed ε-approximation is linear in the number of ideas and that the constant is not large. Finally, we note a few other experimental observations which support the use of voting rules for aggregation. First, we observe that rating, one of the common alternatives to ranking, manifested effects of bias in our data. Second, we show that very few of the topics lacked a Condorcet winner, one of the prominent negative results in voting. Finally, we show data hinting at a potential future direction: the use of partial rankings as opposed to pairwise comparisons to further decrease the elicitation time.

Submitted 16 July, 2014; v1 submitted 29 June, 2014; originally announced June 2014.

Report number: ci-2014/14

arXiv:1404.3181 [pdf, other]

FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs

Authors: Peter Lofgren, Siddhartha Banerjee, Ashish Goel, C. Seshadhri

Abstract: We propose a new algorithm, FAST-PPR, for estimating personalized PageRank: given start node $s$ and target node $t$ in a directed graph, and given a threshold $δ$, FAST-PPR estimates the Personalized PageRank $π_s(t)$ from $s$ to $t$, guaranteeing a small relative error as long as $π_s(t)>δ$. Existing algorithms for this problem have a running-time of $Ω(1/δ)$; in comparison, FAST-PPR has a provable average running-time guarantee of $O(\sqrt{d/δ})$ (where $d$ is the average in-degree of the graph). This is a significant improvement, since $δ$ is often $O(1/n)$ (where $n$ is the number of nodes) for applications. We also complement the algorithm with an $Ω(1/\sqrt{δ})$ lower bound for PageRank estimation, showing that the dependence on $δ$ cannot be improved. We perform a detailed empirical study on numerous massive graphs, showing that FAST-PPR dramatically outperforms existing algorithms. For example, on the 2010 Twitter graph with 1.5 billion edges, for target nodes sampled by popularity, FAST-PPR has a $20$ factor speedup over the state of the art. Furthermore, an enhanced version of FAST-PPR has a $160$ factor speedup on the Twitter graph, and is at least $20$ times faster on all our candidate graphs.

Submitted 21 August, 2014; v1 submitted 11 April, 2014; originally announced April 2014.

Comments: KDD 2014

ACM Class: G.2.2; F.2.2

arXiv:1305.1318 [pdf]

Meta-Analysis of Gene Level Association Tests

Authors: Dajiang J. Liu, Gina M. Peloso, Xiaowei Zhan, Oddgeir Holmen, Matthew Zawistowski, Shuang Feng, Majid Nikpay, Paul L. Auer, Anuj Goel, He Zhang, Ulrike Peters, Martin Farrall, Marju Orho-Melander, Charles Kooperberg, Ruth McPherson, Hugh Watkins, Cristen J. Willer, Kristian Hveem, Olle Melander, Sekar Kathiresan, Gonçalo R. Abecasis

Abstract: The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large sample sizes while protecting against common artifacts due to population structure, repeated small sample analyses, and/or limitations with sharing individual level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the unit of analysis. Here, we propose and evaluate new approaches for meta-analysis of rare variant association. We show that our approach retains useful features of single variant meta-analytic approaches and demonstrate its utility in a study of blood lipid levels in ~18,500 individuals genotyped with exome arrays.

Submitted 6 May, 2013; originally announced May 2013.

arXiv:1304.4658 [pdf, other]

Personalized PageRank to a Target Node

Authors: Peter Lofgren, Ashish Goel

Abstract: Personalized PageRank uses random walks to determine the importance or authority of nodes in a graph from the point of view of a given source node. Much past work has considered how to compute personalized PageRank from a given source node to other nodes. In this work we consider the problem of computing personalized PageRanks to a given target node from all source nodes. This problem can be interpreted as finding who supports the target or who is interested in the target. We present an efficient algorithm for computing personalized PageRank to a given target up to any given accuracy. We give a simple analysis of our algorithm's running time in both the average case and the parameterized worst-case. We show that for any graph with $n$ nodes and $m$ edges, if the target node is randomly chosen and the teleport probability $α$ is given, the algorithm will compute a result with $ε$ error in time $O\left(\frac{1}{αε} \left(\frac{m}{n} + \log(n)\right)\right)$. This is much faster than the previously proposed method of computing personalized PageRank separately from every source node, and it is comparable to the cost of computing personalized PageRank from a single source. We present results from experiments on the Twitter graph which show that the constant factors in our running time analysis are small and our algorithm is efficient in practice.

Submitted 11 April, 2014; v1 submitted 16 April, 2013; originally announced April 2013.

arXiv:1303.6512 [pdf]

Web Service Interface for Data Collection

Authors: Ruchika Thukral, Anita Goel

Abstract: Data collection is a key component of an information system. The widespread penetration of ICT tools in organizations and institutions has resulted in a shift in the way data is collected. Data may be collected in printed form, by e-mail, on a compact disk, or by direct upload to the management information system. Since web services are platform-independent, they can access data stored in the XML format from any platform. In this paper, we present an interface which uses web services for data collection. It requires interaction between a web service deployed for the purposes of data collection, and the web address where the data is stored. Our interface requires that the web service has pre-knowledge of the address from where the data is to be collected. Also, the data to be accessed must be stored in XML format. Since our interface uses computer-supported interaction on both sides, it eases the task of regular and ongoing data collection. We apply our framework to the Education Management Information System, which collects data from schools spread across the country.

Submitted 26 March, 2013; originally announced March 2013.

Comments: 6 pages, 3 figures, International Journal in computer Science and issues

Journal ref: IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 3, No 3, May 2012, IJCSI-9-3-3-525-530

arXiv:1211.6526 [pdf, ps, other]

Complexity Measures for Map-Reduce, and Comparison to Parallel Computing

Authors: Ashish Goel, Kamesh Munagala

Abstract: The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity measures that allow us to talk about Map-Reduce algorithms formally, and second, we want to point out why this model is actually different from other models of parallel programming, most notably the PRAM (Parallel Random Access Memory) model. We are looking for complexity measures that are detailed enough to make fine-grained distinction between different algorithms, but which also abstract away many of the implementation details.

Submitted 28 November, 2012; originally announced November 2012.

arXiv:1210.7057 [pdf, ps, other]

Efficient Distributed Locality Sensitive Hashing

Authors: Bahman Bahmani, Ashish Goel, Rajendra Shinde

Abstract: Distributed frameworks are gaining increasingly widespread use in applications that process large amounts of data. One important example application is large scale similarity search, for which Locality Sensitive Hashing (LSH) has emerged as the method of choice, especially when the data is high-dimensional. At its core, LSH is based on hashing the data points to a number of buckets such that similar points are more likely to map to the same buckets. To guarantee high search quality, the LSH scheme needs a rather large number of hash tables. This entails a large space requirement, and in the distributed setting, with each query requiring a network call per hash bucket look up, this also entails a big network load. The Entropy LSH scheme proposed by Panigrahy significantly reduces the number of required hash tables by looking up a number of query offsets in addition to the query itself. While this improves the LSH space requirement, it does not help with (and in fact worsens) the search network efficiency, as now each query offset requires a network call. In this paper, focusing on the Euclidean space under $l_2$ norm and building on Entropy LSH, we propose the distributed Layered LSH scheme, and prove that it exponentially decreases the network cost, while maintaining a good load balance between different machines. Our experiments also verify that our scheme results in a significant network traffic reduction that brings about large runtime improvement in real world applications.

Submitted 26 October, 2012; originally announced October 2012.

Comments: A short version of this paper will appear in CIKM 2012

arXiv:1210.0664 [pdf, ps, other]

Triadic Consensus: A Randomized Algorithm for Voting in a Crowd

Authors: Ashish Goel, David Lee

Abstract: Typical voting rules do not work well in settings with many candidates. If there are just several hundred candidates, then even a simple task such as choosing a top candidate becomes impractical. Motivated by the hope of developing group consensus mechanisms over the internet, where the numbers of candidates could easily number in the thousands, we study an urn-based voting rule where each participant acts as a voter and a candidate. We prove that when participants lie in a one-dimensional space, this voting protocol finds a $(1-ε/\sqrt{n})$ approximation of the Condorcet winner with high probability while only requiring an expected $O(\frac{1}{ε^2}\log^2 \frac{n}{ε^2})$ comparisons on average per voter. Moreover, this voting protocol is shown to have a quasi-truthful Nash equilibrium: namely, a Nash equilibrium exists which may not be truthful, but produces a winner with the same probability distribution as that of the truthful strategy.

Submitted 2 October, 2012; originally announced October 2012.

arXiv:1209.5998 [pdf, other]

doi 10.1073/pnas.1217220110

Biased Assimilation, Homophily and the Dynamics of Polarization

Authors: Pranav Dandekar, Ashish Goel, David Lee

Abstract: Are we as a society getting more polarized, and if so, why? We try to answer this question through a model of opinion formation. Empirical studies have shown that homophily results in polarization. However, we show that DeGroot's well-known model of opinion formation based on repeated averaging can never be polarizing, even if individuals are arbitrarily homophilous. We generalize DeGroot's model to account for a phenomenon well-known in social psychology as biased assimilation: when presented with mixed or inconclusive evidence on a complex issue, individuals draw undue support for their initial position thereby arriving at a more extreme opinion. We show that in a simple model of homophilous networks, our biased opinion formation process results in either polarization, persistent disagreement or consensus depending on how biased individuals are. In other words, homophily alone, without biased assimilation, is not sufficient to polarize society. Quite interestingly, biased assimilation also provides insight into the following related question: do internet based recommender algorithms that show us personalized content contribute to polarization? We make a connection between biased assimilation and the polarizing effects of some random-walk based recommender algorithms that are similar in spirit to some commonly used recommender algorithms.

Submitted 26 September, 2012; originally announced September 2012.

arXiv:1208.5471 [pdf, other]

Finite Bisimulations for Switched Linear Systems

Authors: Ebru Aydin Gol, Xuchu Ding, Mircea Lazar, Calin Belta

Abstract: In this paper, we consider the problem of constructing a finite bisimulation quotient for a discrete-time switched linear system in a bounded subset of its state space. Given a set of observations over polytopic subsets of the state space and a switched linear system with stable subsystems, the proposed algorithm generates the bisimulation quotient in a finite number of steps with the aid of sublevel sets of a polyhedral Lyapunov function. Starting from a sublevel set that includes the origin in its interior, the proposed algorithm iteratively constructs the bisimulation quotient for any larger sublevel set. The bisimulation quotient can then be further used for synthesis of the switching law and system verification with respect to specifications given as syntactically co-safe Linear Temporal Logic formulas over the observed polytopic subsets.

Submitted 27 August, 2012; originally announced August 2012.

arXiv:1206.2082 [pdf, ps, other]

Dimension Independent Similarity Computation

Authors: Reza Bosagh Zadeh, Ashish Goel

Abstract: We present a suite of algorithms for Dimension Independent Similarity Computation (DISCO) to compute all pairwise similarities between very high dimensional sparse vectors. All of our results are provably independent of dimension, meaning apart from the initial cost of trivially reading in the data, all subsequent operations are independent of the dimension, thus the dimension can be very large. We study Cosine, Dice, Overlap, and the Jaccard similarity measures. For Jaccard similarity we include an improved version of MinHash. Our results are geared toward the MapReduce framework. We empirically validate our theorems at large scale using data from the social networking site Twitter. At time of writing, our algorithms are live in production at twitter.com.

Submitted 23 May, 2013; v1 submitted 10 June, 2012; originally announced June 2012.

Showing 151–200 of 223 results for author: Goel, A