-
Causal Conceptions of Fairness and their Consequences
Authors:
Hamed Nilforoshan,
Johann Gaebler,
Ravi Shroff,
Sharad Goel
Abstract:
Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not immediately clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions almost always (in a measure-theoretic sense) result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness.
Submitted 12 July, 2022;
originally announced July 2022.
-
Simultaneously sorting overlapping quantum states of light
Authors:
Suraj Goel,
Max Tyler,
Feng Zhu,
Saroch Leedumrongwatthanakun,
Mehul Malik,
Jonathan Leach
Abstract:
The efficient manipulation, sorting, and measurement of optical modes and single-photon states is fundamental to classical and quantum science. Here, we realise simultaneous and efficient sorting of non-orthogonal, overlapping states of light, encoded in the transverse spatial degree of freedom. We use a specifically designed multi-plane light converter (MPLC) to sort states encoded in dimensions ranging from $d = 3$ to $d = 7$. Through the use of an auxiliary output mode, the MPLC simultaneously performs the unitary operation required for unambiguous discrimination and the basis change for the outcomes to be spatially separated. Our results lay the groundwork for optimal image identification and classification via optical networks, with potential applications ranging from self-driving cars to quantum communication systems.
Submitted 11 April, 2023; v1 submitted 8 July, 2022;
originally announced July 2022.
-
RAPid-Learn: A Framework for Learning to Recover for Handling Novelties in Open-World Environments
Authors:
Shivam Goel,
Yash Shukla,
Vasanth Sarathy,
Matthias Scheutz,
Jivko Sinapov
Abstract:
We propose RAPid-Learn (Learning to Recover and Plan Again), a hybrid planning and learning method, to tackle the problem of adapting to sudden and unexpected changes in an agent's environment (i.e., novelties). RAPid-Learn is designed to formulate and solve modifications to a task's Markov Decision Process (MDP) on the fly, and it exploits domain knowledge to learn any new dynamics caused by the environmental changes. It also uses this domain knowledge to learn action executors, which can be used to resolve execution impasses and lead to successful plan execution. This novelty information is reflected in its updated domain model. We demonstrate its efficacy by introducing a wide variety of novelties in a gridworld environment inspired by Minecraft, and compare our algorithm with transfer learning baselines from the literature. Our method is (1) effective even in the presence of multiple novelties, (2) more sample efficient than transfer learning RL baselines, and (3) robust to incomplete model information, as opposed to pure symbolic planning approaches.
Submitted 24 June, 2022;
originally announced June 2022.
-
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Authors:
Shashank Goel,
Hritik Bansal,
Sumit Bhatia,
Ryan A. Rossi,
Vishwa Vinay,
Aditya Grover
Abstract:
Recent advances in contrastive representation learning over paired image-text data have led to models such as CLIP that achieve state-of-the-art performance for zero-shot classification and distributional robustness. Such models typically require joint reasoning in the image and text representation spaces for downstream inference tasks. Contrary to prior beliefs, we demonstrate that the image and text representations learned via a standard contrastive objective are not interchangeable and can lead to inconsistent downstream predictions. To mitigate this issue, we formalize consistency and propose CyCLIP, a framework for contrastive representation learning that explicitly optimizes for the learned representations to be geometrically consistent in the image and text space. In particular, we show that consistent representations can be learned by explicitly symmetrizing (a) the similarity between the two mismatched image-text pairs (cross-modal consistency); and (b) the similarity between the image-image pair and the text-text pair (in-modal consistency). Empirically, we show that the improved consistency in CyCLIP translates to significant gains over CLIP, with gains ranging from 10%-24% for zero-shot classification accuracy on standard benchmarks (CIFAR-10, CIFAR-100, ImageNet1K) and 10%-27% for robustness to various natural distribution shifts. The code is available at https://github.com/goel-shashank/CyCLIP.
Submitted 26 October, 2022; v1 submitted 28 May, 2022;
originally announced May 2022.
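As we read the abstract, the two consistency terms are squared gaps between similarity matrices of a batch of N paired, L2-normalized embeddings. The cross-modal term compares sim(I_j, T_k) with sim(I_k, T_j), and the in-modal term compares sim(I_j, I_k) with sim(T_j, T_k). The following is a minimal NumPy sketch under that reading. The function names, the mean-over-N-by-N scaling, and the omission of the CLIP loss and any weighting coefficients are our assumptions. For the authors' implementation, see the linked repository.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cyclip_regularizers(image_emb, text_emb):
    """Return (cross_modal, in_modal) consistency penalties for N image-text pairs.

    cross_modal: mean of (<I_j, T_k> - <I_k, T_j>)^2 over all j, k.
    in_modal:    mean of (<I_j, I_k> - <T_j, T_k>)^2 over all j, k.
    (Hypothetical helper; scaling is an assumption.)
    """
    I = l2_normalize(image_emb)
    T = l2_normalize(text_emb)
    cross = I @ T.T        # cross[j, k] = <I_j, T_k>
    img_img = I @ I.T      # image-image similarities
    txt_txt = T @ T.T      # text-text similarities
    cross_modal = np.mean((cross - cross.T) ** 2)
    in_modal = np.mean((img_img - txt_txt) ** 2)
    return float(cross_modal), float(in_modal)
```

Both penalties vanish when the image and text embeddings coincide, which is the geometric consistency the regularizers push toward.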
-
Optimal grading contests
Authors:
Sumit Goel
Abstract:
We study the design of grading contests between agents with private information about their abilities under the assumption that the value of a grade is determined by the information it reveals about the agent's productivity. Towards the goal of identifying the effort-maximizing grading contest, we study the effect of increasing prizes and increasing competition on effort and find that the effects depend qualitatively on the distribution of abilities in the population. Consequently, while the optimal grading contest always uniquely identifies the best performing agent, it may want to pool or separate the remaining agents depending upon the distribution. We identify sufficient conditions under which a rank-revealing grading contest, a leaderboard-with-cutoff type grading contest, and a coarse grading contest with at most three grades are optimal. In the process, we also identify distributions under which there is a monotonic relationship between the informativeness of a grading scheme and the effort induced by it.
Submitted 16 September, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Inverse-design of high-dimensional quantum optical circuits in a complex medium
Authors:
Suraj Goel,
Saroch Leedumrongwatthanakun,
Natalia Herrera Valencia,
Will McCutcheon,
Claudio Conti,
Pepijn W. H. Pinkse,
Mehul Malik
Abstract:
Programmable optical circuits form a key part of quantum technologies today, ranging from transceivers for quantum communication to integrated photonic chips for quantum information processing. As the size of such circuits is increased, maintaining precise control over every individual component becomes challenging, leading to a reduction in the quality of the operations performed. In parallel, minor imperfections in circuit fabrication are amplified in this regime, dramatically inhibiting their performance. Here we show how embedding an optical circuit in the higher-dimensional space of a large, ambient mode-mixer using inverse-design techniques allows us to forgo control over each individual circuit element, while retaining a high degree of programmability over the circuit. Using this approach, we implement high-dimensional linear optical circuits within a complex scattering medium consisting of a commercial multi-mode fibre placed between two controllable phase planes. We employ these circuits to manipulate high-dimensional spatial-mode entanglement in up to seven dimensions, demonstrating their application as fully programmable quantum gates. Furthermore, we show how their programmability allows us to turn the multi-mode fibre itself into a generalised multi-outcome measurement device, allowing us to both transport and certify entanglement within the transmission channel. Finally, we discuss the scalability of our approach, numerically showing how a high circuit fidelity can be achieved with a low circuit depth by harnessing the resource of a high-dimensional mode-mixer. Our work serves as an alternative yet powerful approach for realising precise control over high-dimensional quantum states of light, with clear applications in next-generation quantum communication and computing technologies.
Submitted 1 April, 2022;
originally announced April 2022.
-
On the distribution of index of Farey Sequences
Authors:
Bittu,
Sneha Chaubey,
Shivani Goel
Abstract:
In this article, we study the distribution of the index of Farey fractions, which was first introduced and studied by Hall and Shiu. We provide asymptotic formulas for moments of the index of Farey fractions twisted by Dirichlet characters, for Farey fractions with $\mathcal{B}$-free denominators. Additionally, we revisit the squarefree case treated earlier in [ALVZ08] and obtain new results for moments of indices with squarefree denominators. We also study higher-level correlations of the index function, generalizing earlier known results on two-level correlations.
Submitted 30 March, 2022;
originally announced March 2022.
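The following is a small sketch of the index being studied. We take Hall and Shiu's index of the interior term x_i = a_i/q_i of the Farey sequence F_Q to be nu(x_i) = (q_{i-1} + q_{i+1}) / q_i. We believe this matches their definition, but treat it as an assumption; endpoint conventions are skipped. Consecutive Farey denominators satisfy q_{i+1} = k q_i - q_{i-1} with k = floor((Q + q_{i-1}) / q_i), so the index is always the integer k. The helper names are hypothetical.

```python
def farey_sequence(Q):
    """Farey sequence F_Q as (numerator, denominator) pairs, from 0/1 to 1/1,
    generated with the standard next-term recurrence."""
    a, b, c, d = 0, 1, 1, Q
    seq = [(a, b)]
    while c <= Q:
        k = (Q + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append((a, b))
    return seq

def farey_indices(Q):
    """Index nu(x_i) = (q_{i-1} + q_{i+1}) / q_i for every interior term of F_Q."""
    qs = [q for _, q in farey_sequence(Q)]
    indices = []
    for i in range(1, len(qs) - 1):
        assert (qs[i - 1] + qs[i + 1]) % qs[i] == 0  # the ratio is always an integer
        nu = (qs[i - 1] + qs[i + 1]) // qs[i]
        assert nu == (Q + qs[i - 1]) // qs[i]       # equals the recurrence coefficient k
        indices.append(nu)
    return indices
```

For Q = 5 the interior terms are 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, with indices 1, 2, 3, 1, 5, 1, 3, 2, 1.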
-
Topological Duality for Distributive Lattices: Theory and Applications
Authors:
Mai Gehrke,
Sam van Gool
Abstract:
This book is a course in Stone-Priestley duality theory, with applications to logic and theoretical computer science. Our target audience is graduate students and researchers in mathematics and computer science. Our aim is to present a fairly full palette of duality tools as directly and quickly as possible, and then to illustrate and further elaborate these tools within the setting of three emblematic applications: semantics of propositional logics, domain theory in logical form, and the theory of profinite monoids for the study of regular languages and automata.
Submitted 5 April, 2023; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Understanding Contrastive Learning Requires Incorporating Inductive Biases
Authors:
Nikunj Saunshi,
Jordan Ash,
Surbhi Goel,
Dipendra Misra,
Cyril Zhang,
Sanjeev Arora,
Sham Kakade,
Akshay Krishnamurthy
Abstract:
Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations than augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees that depend on properties of augmentations and the value of the contrastive loss of representations. We demonstrate that such analyses, which ignore the inductive biases of the function class and training algorithm, cannot adequately explain the success of contrastive learning, even provably leading to vacuous guarantees in some settings. Extensive experiments on image and text domains highlight the ubiquity of this problem: different function classes and algorithms behave very differently on downstream tasks, despite having the same augmentations and contrastive losses. Theoretical analysis is presented for the class of linear representations, where incorporating inductive biases of the function class allows contrastive learning to work under less stringent conditions than prior analyses.
Submitted 28 February, 2022;
originally announced February 2022.
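To make "more similar representations for views of the same input" concrete, here is one common instantiation of a contrastive objective: an InfoNCE/SimCLR-style loss. The analyses discussed in the abstract may use a different contrastive loss, and the function name and temperature value are our own choices. In each row, the matching view is the positive and the other rows of the batch act as negatives.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE loss where z1[i] and z2[i] are representations of two views of input i.

    Row i is scored against every row of z2; the loss is the mean negative
    log-probability assigned to the matching view (the diagonal).
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

The same loss value can be reached by very different encoders. That is why, as the abstract argues, a loss-only analysis cannot distinguish function classes that generalize differently downstream.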
-
Stable allocations in discrete exchange economies
Authors:
Federico Echenique,
Sumit Goel,
SangMok Lee
Abstract:
We study stable allocations in an exchange economy with indivisible goods. The problem is well known to be challenging, and rich enough to encode fundamentally unstable economies, such as the roommate problem. Our approach stems from generalizing the original study of an exchange economy with unit demand and unit endowments, the housing model. Our first approach uses Scarf's theorem and proposes sufficient conditions under which a "convexify then round" technique ensures that the core is nonempty. The upshot is that a core allocation exists in categorical economies with dichotomous preferences. Our second approach uses a generalization of the top trading cycles (TTC) algorithm: it works under general conditions and finds a solution that is a version of the stable set.
Submitted 21 February, 2024; v1 submitted 9 February, 2022;
originally announced February 2022.
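For reference, here is a sketch of the classic top trading cycles procedure in the unit-demand, unit-endowment housing model that the paper starts from. The paper's generalization for richer economies is not reproduced here. We assume agent i owns house i and ranks every house, including its own.

```python
def top_trading_cycles(prefs):
    """Gale's top trading cycles for a Shapley-Scarf housing market.

    prefs maps each agent i to a list of houses, best first; agent i owns house i.
    Returns a dict mapping each agent to the house it receives.
    """
    remaining = set(prefs)
    assignment = {}
    while remaining:
        # Each remaining agent points to the owner of its favourite remaining house.
        points_to = {i: next(h for h in prefs[i] if h in remaining) for i in remaining}
        # Follow pointers from any agent until some agent repeats; that closes a cycle.
        path, node = [], next(iter(remaining))
        while node not in path:
            path.append(node)
            node = points_to[node]
        cycle = path[path.index(node):]
        # Agents on the cycle trade along it and leave the market.
        for i in cycle:
            assignment[i] = points_to[i]
        remaining -= set(cycle)
    return assignment
```

For example, if agents 0 and 1 each prefer the other's house and agent 2 also wants house 0, then 0 and 1 swap in the first round, and agent 2 keeps its own house.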
-
Adaptive Sampling Strategies to Construct Equitable Training Datasets
Authors:
William Cai,
Ro Encarnacion,
Bobbie Chern,
Sam Corbett-Davies,
Miranda Bogen,
Stevie Bergman,
Sharad Goel
Abstract:
In domains ranging from computer vision to natural language processing, machine learning models have been shown to exhibit stark disparities, often performing worse for members of traditionally underserved groups. One factor contributing to these performance gaps is a lack of representation in the data the models are trained on. It is often unclear, however, how to operationalize representativeness in specific applications. Here we formalize the problem of creating equitable training datasets, and propose a statistical framework for addressing this problem. We consider a setting where a model builder must decide how to allocate a fixed data collection budget to gather training data from different subgroups. We then frame dataset creation as a constrained optimization problem, in which one maximizes a function of group-specific performance metrics based on (estimated) group-specific learning rates and costs per sample. This flexible approach incorporates preferences of model-builders and other stakeholders, as well as the statistical properties of the learning task. When data collection decisions are made sequentially, we show that under certain conditions this optimization problem can be efficiently solved even without prior knowledge of the learning rates. To illustrate our approach, we conduct a simulation study of polygenic risk scores on synthetic genomic data -- an application domain that often suffers from non-representative data collection. We find that our adaptive sampling strategy outperforms several common data collection heuristics, including equal and proportional sampling, demonstrating the value of strategic dataset design for building equitable models.
Submitted 31 January, 2022;
originally announced February 2022.
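The budget-allocation framing above can be illustrated with a deliberately simple heuristic. It assumes each group's error follows a known power-law learning curve, error_g(n) = a_g * n^(-b_g), and it always spends the next batch on the affordable group with the worst predicted error, which is a minimax, equity-focused objective. The learning-curve form, the greedy rule, and all names are hypothetical. This is not the paper's method, which estimates learning rates and makes decisions sequentially.

```python
def allocate_budget(curves, costs, budget, batch=10):
    """Greedy minimax allocation of a data-collection budget across groups.

    curves[g] = (a, b) defines a hypothetical learning curve a * n ** (-b);
    costs[g] is the per-sample cost for group g. Returns samples per group.
    """
    n = {g: batch for g in curves}  # seed every group with one batch
    spent = sum(costs[g] * batch for g in curves)
    if spent > budget:
        raise ValueError("budget cannot cover one batch per group")

    def error(g):
        a, b = curves[g]
        return a * n[g] ** (-b)

    while True:
        # Groups whose next batch still fits within the remaining budget.
        affordable = [g for g in curves if spent + costs[g] * batch <= budget]
        if not affordable:
            return n
        worst = max(affordable, key=error)  # worst predicted error among affordable groups
        n[worst] += batch
        spent += costs[worst] * batch
```

With equal costs and curves 1/sqrt(n) and 2/sqrt(n), the harder group ends up with roughly four times as many samples. Equal or proportional sampling would not produce this allocation.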
-
Towards Adversarial Evaluations for Inexact Machine Unlearning
Authors:
Shashwat Goel,
Ameya Prabhu,
Amartya Sanyal,
Ser-Nam Lim,
Philip Torr,
Ponnurangam Kumaraguru
Abstract:
Machine Learning models face increased concerns regarding the storage of personal user data and adverse impacts of corrupted data like backdoors or systematic bias. Machine Unlearning can address these by allowing post-hoc deletion of affected training data from a learned model. Achieving this task exactly is computationally expensive; consequently, recent works have proposed inexact unlearning algorithms to solve this approximately as well as evaluation methods to test the effectiveness of these algorithms.
In this work, we first outline some necessary criteria for evaluation methods and show that no existing evaluation satisfies them all. Then, we design a stronger black-box evaluation method called the Interclass Confusion (IC) test, which adversarially manipulates data during training to detect the insufficiency of unlearning procedures. We also propose two analytically motivated baseline methods (EU-k and CF-k), which outperform several popular inexact unlearning methods. Overall, we demonstrate how adversarial evaluation strategies can help in analyzing various unlearning phenomena, which can guide the development of stronger unlearning algorithms.
Submitted 22 February, 2023; v1 submitted 17 January, 2022;
originally announced January 2022.
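The data manipulation behind an interclass-confusion style test can be sketched as follows. Swap the labels of some training examples between two classes, treat the swapped examples as the forget set, and after unlearning measure how often the model still predicts the swapped (other) class for them. The paper's exact protocol may differ: how the classes and the forget-set size are chosen, and which quantity is reported. The function names here are hypothetical.

```python
import numpy as np

def interclass_confusion_labels(labels, class_a, class_b, n_per_class, rng):
    """Swap labels of n_per_class random examples from each of two classes.

    Returns (confused_labels, forget_idx); the swapped examples form the forget set.
    """
    labels = np.asarray(labels)
    idx_a = rng.choice(np.flatnonzero(labels == class_a), n_per_class, replace=False)
    idx_b = rng.choice(np.flatnonzero(labels == class_b), n_per_class, replace=False)
    confused = labels.copy()
    confused[idx_a] = class_b
    confused[idx_b] = class_a
    return confused, np.concatenate([idx_a, idx_b])

def confusion_rate(preds, true_labels, forget_idx, class_a, class_b):
    """Fraction of forget-set examples still predicted as the other class of the pair.

    After successful unlearning this should be low; a high value is a residual
    trace of the injected confusion.
    """
    preds = np.asarray(preds)[forget_idx]
    truth = np.asarray(true_labels)[forget_idx]
    other = np.where(truth == class_a, class_b, class_a)
    return float(np.mean(preds == other))
```

A model that memorized the swapped labels scores 1.0, and one that predicts the true labels scores 0.0.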
-
First-order separation over countable ordinals
Authors:
Thomas Colcombet,
Sam van Gool,
Rémi Morvan
Abstract:
We show that the existence of a first-order formula separating two monadic second-order formulas over countable ordinal words is decidable. This extends the work of Henckell and Almeida on finite words, and of Place and Zeitoun on $ω$-words. For this, we develop the algebraic concept of monoid (resp. $ω$-semigroup, resp. ordinal monoid) with aperiodic merge, an extension of monoids (resp. $ω$-semigroups, resp. ordinal monoids) that explicitly includes a new operation capturing the loss of precision induced by first-order indistinguishability. We also show the computability of FO-pointlike sets and the decidability of the covering problem for first-order logic on countable ordinal words.
Submitted 9 January, 2022;
originally announced January 2022.
-
On resistance matrices of weighted balanced digraphs
Authors:
R. Balaji,
R. B. Bapat,
Shivani Goel
Abstract:
Let $G$ be a connected graph with $V(G)=\{1,\dotsc,n\}$. Then the resistance distance between any two vertices $i$ and $j$ is given by $r_{ij}:=l_{ii}^{\dagger} + l_{jj}^{\dagger}-2 l_{ij}^{\dagger}$, where $l_{ij}^{\dagger}$ is the $(i,j)^{\rm th}$ entry of the Moore-Penrose inverse of the Laplacian matrix $L$ of $G$.
For the resistance matrix $R:=[r_{ij}]$, there is an elegant formula for the inverse of $R$: \[R^{-1}=-\frac{1}{2}L + \frac{1}{\tau' R \tau}\,\tau\tau', \] where \[\tau:=(\tau_1,\dotsc,\tau_n)'~~\mbox{and}~~ \tau_{i}:=2- \sum_{\{j \in V(G):(i,j) \in E(G)\}} r_{ij},~~~i=1,\dotsc,n. \] A far-reaching generalization of this result, giving an inverse formula for a generalized resistance matrix of a strongly connected, matrix-weighted balanced directed graph, is obtained in this paper. When the weights are scalars, it is shown that the generalized resistance is a non-negative real number. We also obtain a perturbation result involving resistance matrices of connected graphs and Laplacians of digraphs.
Submitted 3 November, 2021;
originally announced November 2021.
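The classical inverse formula quoted above can be checked numerically for an unweighted connected graph, with unit weight on every edge. The sketch builds $R$ from the Moore-Penrose inverse of $L$ and compares $-\frac{1}{2}L + \frac{1}{\tau' R \tau}\tau\tau'$ with a direct inverse. The paper's matrix-weighted, directed generalization is not reproduced, and the helper names are hypothetical.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of an unweighted undirected graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

def resistance_matrix(L):
    """r_ij = l_ii^+ + l_jj^+ - 2 l_ij^+, using the Moore-Penrose inverse of L."""
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

def inverse_via_formula(L, R, edges):
    """R^{-1} = -L/2 + tau tau' / (tau' R tau), with tau_i = 2 - sum_{j ~ i} r_ij."""
    tau = np.full(L.shape[0], 2.0)
    for i, j in edges:  # each undirected edge contributes r_ij to both endpoints
        tau[i] -= R[i, j]
        tau[j] -= R[i, j]
    return -0.5 * L + np.outer(tau, tau) / (tau @ R @ tau)
```

On the path 0-1-2, for instance, $R$ has entries $r_{01}=r_{12}=1$ and $r_{02}=2$, and $\tau=(1,0,1)'$.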
-
PARIS: Personalized Activity Recommendation for Improving Sleep Quality
Authors:
Meghna Singh,
Saksham Goel,
Abhiraj Mohan,
Jaideep Srivastava
Abstract:
The quality of sleep has a deep impact on people's physical and mental health. People with insufficient sleep are more likely to report physical and mental distress, activity limitation, anxiety, and pain. Moreover, in the past few years, there has been an explosion of applications and devices for activity monitoring and health tracking. Signals collected from these wearable devices can be used to study and improve sleep quality. In this paper, we use the relationship between physical activity and sleep quality to find ways of helping people improve their sleep using machine learning techniques. People usually have several behavior modes into which their bio-functions can be divided. By performing time series clustering on activity data, we find cluster centers that correlate with the most evident behavior modes for a specific subject. Activity recipes for good sleep quality are then generated for each behavior mode within each cluster. These activity recipes are supplied to an activity recommendation engine that suggests a mix of relaxed to intense activities to subjects during their daily routines. The recommendations are further personalized based on the subjects' lifestyle constraints (e.g., age, gender, body mass index (BMI), and resting heart rate), with the objective of improving that night's quality of sleep. This would in turn serve a longer-term health objective, such as lowering heart rate or improving the overall quality of sleep.
Submitted 28 May, 2024; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Anti-Concentrated Confidence Bonuses for Scalable Exploration
Authors:
Jordan T. Ash,
Cyril Zhang,
Surbhi Goel,
Akshay Krishnamurthy,
Sham Kakade
Abstract:
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in large action spaces. This bonus scheme cannot be directly transferred to high-dimensional exploration problems, however, due to the computational cost of maintaining the inverse covariance matrix of action features. We introduce \emph{anti-concentrated confidence bounds} for efficiently approximating the elliptical bonus, using an ensemble of regressors trained to predict random noise from policy network-derived features. Using this approximation, we obtain stochastic linear bandit algorithms which obtain $\tilde O(d \sqrt{T})$ regret bounds for $\mathrm{poly}(d)$ fixed actions. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic reward heuristics on Atari benchmarks.
Submitted 11 April, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
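The idea of replacing the LinUCB elliptical bonus sqrt(x^T A^{-1} x) with an ensemble of regressors trained on random noise can be illustrated in the linear case. The sketch fits ridge regressors to Gaussian noise targets, perturbing the ridge prior as well, so that each regressor's weight vector has covariance exactly A^{-1}. The spread of their predictions at x then estimates the bonus. Reading off the ensemble's standard deviation and adding the prior-perturbation term are our own simplifications for a clean check. They are not necessarily the paper's construction, and the function names are hypothetical.

```python
import numpy as np

def elliptical_bonus(X, x, lam=1.0):
    """Exact LinUCB-style bonus sqrt(x^T A^{-1} x), with A = X^T X + lam * I."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return float(np.sqrt(x @ np.linalg.solve(A, x)))

def noise_ensemble_bonus(X, x, lam=1.0, n_models=2000, seed=0):
    """Estimate the same bonus from an ensemble of ridge regressors fit to noise.

    Regressor m has weights w_m = A^{-1}(X^T eps_m + sqrt(lam) * xi_m), with
    eps_m, xi_m standard Gaussian, so Cov(w_m) = A^{-1} and
    Var(x^T w_m) = x^T A^{-1} x.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    eps = rng.normal(size=(n, n_models))  # random regression targets
    xi = rng.normal(size=(d, n_models))   # random perturbation of the ridge prior
    A = X.T @ X + lam * np.eye(d)
    # Closed-form ridge fits for brevity; a deep-RL variant would train by SGD.
    W = np.linalg.solve(A, X.T @ eps + np.sqrt(lam) * xi)  # shape (d, n_models)
    return float((x @ W).std())
```

At query time, the ensemble needs only one dot product per regressor. This avoids maintaining an inverse covariance matrix, which is the bottleneck the abstract mentions for high-dimensional features.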
-
Inductive Biases and Variable Creation in Self-Attention Mechanisms
Authors:
Benjamin L. Edelman,
Surbhi Goel,
Sham Kakade,
Cyril Zhang
Abstract:
Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. This work provides a theoretical analysis of the inductive biases of self-attention modules. Our focus is to rigorously establish which functions and long-range dependencies self-attention blocks prefer to represent. Our main result shows that bounded-norm Transformer networks "create sparse variables": a single self-attention head can represent a sparse function of the input sequence, with sample complexity scaling only logarithmically with the context length. To support our analysis, we present synthetic experiments to probe the sample complexity of learning sparse Boolean functions with Transformers.
Submitted 23 June, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
Authors:
Jasmine Collins,
Shubham Goel,
Kenan Deng,
Achleshwar Luthra,
Leon Xu,
Erhan Gundogdu,
Xi Zhang,
Tomas F. Yago Vicente,
Thomas Dideriksen,
Himanshu Arora,
Matthieu Guillaumin,
Jitendra Malik
Abstract:
We introduce Amazon Berkeley Objects (ABO), a new large-scale dataset designed to help bridge the gap between real and virtual 3D worlds. ABO contains product catalog images, metadata, and artist-created 3D models with complex geometries and physically based materials that correspond to real household objects. We derive challenging benchmarks that exploit the unique properties of ABO and measure the current limits of the state of the art on three open problems for real-world 3D object understanding: single-view 3D reconstruction, material estimation, and cross-domain multi-view object retrieval.
Submitted 24 June, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Differentiable Stereopsis: Meshes from multiple views using differentiable rendering
Authors:
Shubham Goel,
Georgia Gkioxari,
Jitendra Malik
Abstract:
We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras. We pair traditional stereopsis and modern differentiable rendering to build an end-to-end model which predicts textured 3D meshes of objects with varying topologies and shape. We frame stereopsis as an optimization problem and simultaneously update shape and cameras via simple gradient descent. We run an extensive quantitative analysis and compare to traditional multi-view stereo techniques and state-of-the-art learning based methods. We show compelling reconstructions on challenging real-world scenes and for an abundance of object types with complex shape, topology and texture. Project webpage: https://shubham-goel.github.io/ds/
Submitted 23 September, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
On the distribution of Ramanujan Sums over number fields
Authors:
Sneha Chaubey,
Shivani Goel
Abstract:
For a number field $\mathbb{K}$, and integral ideals $\mathcal{I}$ and $\mathcal{J}$ in its number ring $\mathcal{O}_{\mathbb{K}}$, Nowak studied the asymptotic behaviour of the average of Ramanujan sums $C_{\mathcal{J}}({\mathcal{I}})$ over both ideals $\mathcal{I}$ and $\mathcal{J}$. In this article, we extend this investigation by establishing asymptotic formulas for the second moment of averages of Ramanujan sums over quadratic and cubic number fields, thereby generalizing previous works of Chen, Kumchev, Robles, and Roy on moments of averages of Ramanujan sums over rationals. Additionally, using a special property of certain integral domains, we obtain second moment results for Ramanujan sums over some other number fields.
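For orientation, the classical case $\mathbb{K} = \mathbb{Q}$ reduces to the integer Ramanujan sums $c_q(n) = \sum_{d \mid \gcd(q,n)} \mu(q/d)\, d$. The sketch below evaluates these sums and a double sum over $1 \le n \le x$, $1 \le q \le y$. It is only an illustration of the rational case: the helper names are our own, and it does not implement the number-field sums $C_{\mathcal{J}}(\mathcal{I})$ or the second-moment asymptotics studied in the paper.

```python
from math import gcd


def mobius(n: int) -> int:
    """Möbius function by trial division (adequate for small n)."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # n has a squared prime factor
            result = -result
        p += 1
    if m > 1:  # one prime factor left over
        result = -result
    return result


def ramanujan_sum(q: int, n: int) -> int:
    """c_q(n) = sum over d | gcd(q, n) of mu(q/d) * d."""
    g = gcd(q, n)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)


def double_sum(x: int, y: int) -> int:
    """Sum of c_q(n) over 1 <= n <= x and 1 <= q <= y."""
    return sum(ramanujan_sum(q, n) for q in range(1, y + 1) for n in range(1, x + 1))


print(ramanujan_sum(4, 2), ramanujan_sum(6, 6))  # -2 2
```

The divisor formula avoids the defining trigonometric sum $\sum_{\gcd(k,q)=1} \cos(2\pi k n / q)$, which gives the same integer values.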
Submitted 20 September, 2021;
originally announced September 2021.
-
Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making
Authors:
Alex Chohlas-Wood,
Madison Coots,
Henry Zhu,
Emma Brunskill,
Sharad Goel
Abstract:
In an attempt to make algorithms fair, the machine learning literature has largely focused on equalizing decisions, outcomes, or error rates across race or gender groups. To illustrate, consider a hypothetical government rideshare program that provides transportation assistance to low-income people with upcoming court dates. Following this literature, one might allocate rides to those with the highest estimated treatment effect per dollar, while constraining spending to be equal across race groups. That approach, however, ignores the downstream consequences of such constraints, and, as a result, can induce unexpected harms. For instance, if one demographic group lives farther from court, enforcing equal spending would necessarily mean fewer total rides provided, and potentially more people penalized for missing court. Here we present an alternative framework for designing equitable algorithms that foregrounds the consequences of decisions. In our approach, one first elicits stakeholder preferences over the space of possible decisions and the resulting outcomes--such as preferences for balancing spending parity against court appearance rates. We then optimize over the space of decision policies, making trade-offs in a way that maximizes the elicited utility. To do so, we develop an algorithm for efficiently learning these optimal policies from data for a large family of expressive utility functions. In particular, we use a contextual bandit algorithm to explore the space of policies while solving a convex optimization problem at each step to estimate the best policy based on the available information. This consequentialist paradigm facilitates a more holistic approach to equitable decision-making.
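The baseline the abstract critiques (rank people by estimated treatment effect per dollar, then force equal spending across groups) can be sketched in a few lines. This is a greedy heuristic with made-up costs and effects, not the authors' contextual-bandit method. The class `Person` and both allocation functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    group: str
    cost: float    # hypothetical cost of one ride, in dollars
    effect: float  # hypothetical estimated gain in P(appearing in court)


def greedy_allocate(people, budget):
    """Fund rides in decreasing order of estimated effect per dollar, skipping any
    ride that no longer fits in the remaining budget."""
    chosen, spent = [], 0.0
    for p in sorted(people, key=lambda p: p.effect / p.cost, reverse=True):
        if spent + p.cost <= budget:
            chosen.append(p)
            spent += p.cost
    return chosen


def allocate_equal_spending(people, budget):
    """Same greedy rule, but every group receives an equal share of the budget."""
    groups = sorted({p.group for p in people})
    share = budget / len(groups)
    chosen = []
    for g in groups:
        chosen += greedy_allocate([p for p in people if p.group == g], share)
    return chosen


# Hypothetical example: group "B" lives farther from court, so its rides cost more.
people = [Person("A", 10.0, 0.2) for _ in range(10)] + [Person("B", 40.0, 0.3) for _ in range(5)]
unconstrained = greedy_allocate(people, 100.0)
equal_spend = allocate_equal_spending(people, 100.0)
print(len(unconstrained), len(equal_spend))  # 10 6
```

With these invented numbers, equal spending funds 6 rides instead of 10 and leaves part of the budget unspent, which is the kind of downstream consequence the abstract describes.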
Submitted 12 February, 2024; v1 submitted 17 September, 2021;
originally announced September 2021.
-
Steiner distance matrix of caterpillar graphs
Authors:
Ali Azimi,
R. B. Bapat,
Shivani Goel
Abstract:
For a connected graph $G:=(V,E)$, the Steiner distance $d_G(X)$ among a set of vertices $X$ is the minimum size among all the connected subgraphs of $G$ whose vertex set contains $X$. The $k$-Steiner distance matrix $D_k(G)$ of $G$ is a matrix whose rows and columns are indexed by $k$-subsets of $V$. For $k$-subsets $X_1$ and $X_2$, the $(X_1,X_2)$-entry of $D_k(G)$ is $d_G(X_1 \cup X_2)$. In this paper, we show that the rank of the $2$-Steiner distance matrix of a caterpillar graph on $N$ vertices and with $p$ pendant vertices is $2N-p-1$.
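The definitions above can be checked on small trees. The sketch below assumes "size" means the number of edges and uses the fact that, in a tree, an edge belongs to the smallest subtree containing $X$ exactly when removing it leaves vertices of $X$ on both sides. It builds $D_2(G)$ and computes its exact rank over the rationals; all function names are our own.

```python
from fractions import Fraction
from itertools import combinations


def steiner_distance(adj, X):
    """Number of edges in the smallest subtree of the tree `adj` containing every vertex of X."""
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for u in order:                 # BFS: `order` grows while we iterate over it
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
    target = set(X)
    below = {v: int(v in target) for v in adj}  # |X ∩ subtree(v)|, filled bottom-up
    for v in reversed(order):       # children are processed before their parents
        if parent[v] is not None:
            below[parent[v]] += below[v]
    # The edge from v to its parent is needed iff it separates two vertices of X.
    return sum(1 for v in order if parent[v] is not None and 0 < below[v] < len(target))


def steiner_matrix(adj, k=2):
    """D_k(G): rows and columns indexed by k-subsets, entry d_G(X1 ∪ X2)."""
    subsets = list(combinations(sorted(adj), k))
    return [[steiner_distance(adj, set(a) | set(b)) for b in subsets] for a in subsets]


def rank(matrix):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def pendant_count(adj):
    return sum(1 for v in adj if len(adj[v]) == 1)


path3 = {0: [1], 1: [0, 2], 2: [1]}            # P_3, a caterpillar with p = 2
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # K_{1,3}, a caterpillar with p = 3
for g in (path3, star):
    print(rank(steiner_matrix(g)), 2 * len(g) - pendant_count(g) - 1)  # 3 3, then 4 4
```

Both toy caterpillars agree with $2N-p-1$; the script only checks small instances and is no substitute for the proof.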
Submitted 29 August, 2021;
originally announced August 2021.
-
From Pivots to Graphs: Augmented CycleDensity as a Generalization to One Time InverseConsultation
Authors:
Shashwat Goel,
Kunwar Shaanjeet Singh Grover
Abstract:
This paper describes an approach used to generate new translations from raw bilingual dictionaries as part of the 4th Task Inference Across Dictionaries (TIAD 2021) shared task. We propose Augmented Cycle Density (ACD) as a framework that combines insights from two state-of-the-art methods that require neither sense information nor parallel corpora: Cycle Density (CD) and One Time Inverse Consultation (OTIC). The task results show that, across 3 unseen language pairs, ACD's predictions have more than double (74%) the coverage of OTIC at almost the same precision (76%). ACD combines CD's scalability (leveraging rich multilingual graphs for better predictions) with OTIC's data efficiency (producing good results with the minimum possible resource of one pivot language).
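To make the pivot idea concrete, here is a sketch of an OTIC-style score in its commonly cited form: a candidate target word $t$ for a source word $w$ scores $2|P(w) \cap P(t)| / (|P(w)| + |P(t)|)$, where $P(\cdot)$ is the set of pivot-language translations. The dictionaries use placeholder tokens, and the exact scoring and thresholds used in the paper may differ; ACD's cycle-density component is not shown.

```python
def otic_scores(src_to_pivot, tgt_to_pivot, word):
    """OTIC-style scores 2|P(w) ∩ P(t)| / (|P(w)| + |P(t)|) for every target word t
    that shares at least one pivot translation with the source word w."""
    pivots = src_to_pivot.get(word, set())
    return {
        t: 2 * len(pivots & ps) / (len(pivots) + len(ps))
        for t, ps in tgt_to_pivot.items()
        if pivots & ps  # only words reachable through the pivot language
    }


# Placeholder dictionaries: source -> pivot and target -> pivot.
src_to_pivot = {"w": {"p1", "p2"}}
tgt_to_pivot = {"t1": {"p1"}, "t2": {"p1", "p2"}, "t3": {"p1", "p3"}, "t4": {"p4"}}
print(otic_scores(src_to_pivot, tgt_to_pivot, "w"))  # t2 scores 1.0, t3 scores 0.5
```

Because it needs only one pivot dictionary per direction, this score reflects the data efficiency the abstract attributes to OTIC.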
Submitted 27 August, 2021;
originally announced August 2021.
-
Statistical Estimation from Dependent Data
Authors:
Yuval Dagan,
Constantinos Daskalakis,
Nishanth Dikkala,
Surbhi Goel,
Anthimos Vardis Kandiros
Abstract:
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where, e.g., these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e., we do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution, we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e., external fields and interaction strengths) of Ising models from a {\em single} sample. We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer and Pubmed.
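A standard way to fit such a model from a single labelled sample is maximum pseudo-likelihood: each label is predicted from its features and its neighbours' labels, $P(y_i \mid y_{-i}) = \sigma(2 y_i m_i)$ with $m_i = \theta^\top x_i + \beta \sum_{j \sim i} y_j$ and $y_i \in \{-1, +1\}$. The sketch below is a plain gradient-descent version of that estimator on an invented 4-node graph; it is an assumption-labelled illustration, not necessarily the authors' exact procedure.

```python
import math


def _field(theta, beta, x_i, s_i):
    """Local field m_i = theta . x_i + beta * (sum of neighbouring labels)."""
    return sum(t * x for t, x in zip(theta, x_i)) + beta * s_i


def neg_log_pseudo_likelihood(theta, beta, X, y, nbrs):
    """sum_i -log P(y_i | y_{-i}), with P(y_i | y_{-i}) = sigmoid(2 * y_i * m_i)."""
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        m = _field(theta, beta, x_i, sum(y[j] for j in nbrs[i]))
        total += math.log1p(math.exp(-2 * y_i * m))
    return total


def fit_mple(X, y, nbrs, lr=0.05, steps=200):
    """Plain gradient descent on the (convex) negative log-pseudo-likelihood."""
    d = len(X[0])
    theta, beta = [0.0] * d, 0.0
    for _ in range(steps):
        g_theta, g_beta = [0.0] * d, 0.0
        for i, (x_i, y_i) in enumerate(zip(X, y)):
            s_i = sum(y[j] for j in nbrs[i])
            m = _field(theta, beta, x_i, s_i)
            w = -2 * y_i / (1 + math.exp(2 * y_i * m))  # d/dm of log(1 + exp(-2 y m))
            g_theta = [g + w * x for g, x in zip(g_theta, x_i)]
            g_beta += w * s_i
        theta = [t - lr * g for t, g in zip(theta, g_theta)]
        beta -= lr * g_beta
    return theta, beta


# A single labelled sample on a 4-cycle (toy data): neighbours always disagree.
X = [[1.0], [-1.0], [1.0], [-1.0]]
y = [1, -1, 1, -1]
nbrs = [[1, 3], [0, 2], [1, 3], [0, 2]]
theta, beta = fit_mple(X, y, nbrs)
```

On this toy graph the fitted interaction strength comes out negative, matching the disagreeing neighbours; the rate guarantees in the paper concern far larger graphs.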
Submitted 20 July, 2021;
originally announced July 2021.
-
Investigating the Role of Negatives in Contrastive Representation Learning
Authors:
Jordan T. Ash,
Surbhi Goel,
Akshay Krishnamurthy,
Dipendra Misra
Abstract:
Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding as to how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
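Two quantities help make the trade-off tangible: the usual InfoNCE loss for one positive and $k$ negatives, and, under a simplifying assumption of $C$ equally likely latent concepts, the chance that some negative "collides" with the positive's concept, $1 - (1 - 1/C)^k$. Both functions below are our own illustrative helpers; the uniform-concept model is an assumption, and the coverage side of the trade-off is not modelled.

```python
import math


def info_nce_loss(pos_score, neg_scores):
    """-log(e^{s+} / (e^{s+} + sum_j e^{s-_j})), computed with a stable log-sum-exp."""
    scores = [pos_score, *neg_scores]
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_denominator - pos_score


def collision_probability(num_concepts, num_negatives):
    """P(at least one of k uniformly drawn negatives shares the positive's concept),
    under a hypothetical model with equally likely latent concepts."""
    return 1 - (1 - 1 / num_concepts) ** num_negatives


# More negatives sharpen the contrast but raise the chance of a "false negative".
for k in (1, 4, 16, 64):
    print(k, round(collision_probability(10, k), 3))
```

Under this toy model the collision rate depends on $k$ relative to $C$, which is consistent with the claim that the optimal number of negatives should scale with the number of underlying concepts.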
Submitted 18 June, 2021;
originally announced June 2021.
-
Gone Fishing: Neural Active Learning with Fisher Embeddings
Authors:
Jordan T. Ash,
Surbhi Goel,
Akshay Krishnamurthy,
Sham Kakade
Abstract:
There is an increasing need for effective active learning algorithms that are compatible with deep neural networks. This paper motivates and revisits a classic, Fisher-based active selection objective, and proposes BAIT, a practical, tractable, and high-performing algorithm that makes it viable for use with neural models. BAIT draws inspiration from the theoretical analysis of maximum likelihood estimators (MLE) for parametric models. It selects batches of samples by optimizing a bound on the MLE error in terms of the Fisher information, which we show can be implemented efficiently at scale by exploiting linear-algebraic structure especially amenable to execution on modern hardware. Our experiments demonstrate that BAIT outperforms the previous state of the art on both classification and regression problems, and is flexible enough to be used with a variety of model architectures.
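The Fisher-based objective can be illustrated in its simplest setting, linear regression, where a sample $x$ contributes Fisher information $x x^\top$. The sketch greedily picks points that minimise $\operatorname{tr}\big((\lambda I + \sum_{\text{selected}} x x^\top)^{-1} F_{\text{pool}}\big)$. This is a simplification we chose for clarity: it omits BAIT's neural-network gradient embeddings, its backward pruning step, and its efficient low-rank updates, and all names are hypothetical.

```python
def mat_inv(A):
    """Inverse of a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def outer(x):
    return [[a * b for b in x] for a in x]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def trace_of_product(A, B):
    """tr(A B) = sum_ij A_ij B_ji."""
    return sum(A[i][j] * B[j][i] for i in range(len(A)) for j in range(len(A)))


def greedy_fisher_select(pool, budget, lam=1.0):
    """Greedily pick `budget` points minimising tr((lam*I + sum_selected x x^T)^{-1} F_pool),
    where F_pool = sum over the pool of x x^T (linear-regression Fisher information)."""
    d = len(pool[0])
    f_pool = [[0.0] * d for _ in range(d)]
    for x in pool:
        f_pool = add(f_pool, outer(x))
    current = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    chosen = []
    for _ in range(budget):
        best = min(
            (i for i in range(len(pool)) if i not in chosen),
            key=lambda i: trace_of_product(mat_inv(add(current, outer(pool[i]))), f_pool),
        )
        chosen.append(best)
        current = add(current, outer(pool[best]))
    return chosen


print(greedy_fisher_select([[1.0, 0.0], [1.0, 0.1], [0.0, 2.0]], budget=2))  # [2, 1]
```

The first pick covers the direction the pool's Fisher information weights most heavily, even though no labels are used.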
Submitted 14 December, 2021; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Probability Paths and the Structure of Predictions over Time
Authors:
Zhiyuan Jerry Lin,
Hao Sheng,
Sharad Goel
Abstract:
In settings ranging from weather forecasts to political prognostications to financial projections, probability estimates of future binary outcomes often evolve over time. For example, the estimated likelihood of rain on a specific day changes by the hour as new information becomes available. Given a collection of such probability paths, we introduce a Bayesian framework -- which we call the Gaussian latent information martingale, or GLIM -- for modeling the structure of dynamic predictions over time. Suppose, for example, that the likelihood of rain in a week is 50 %, and consider two hypothetical scenarios. In the first, one expects the forecast to be equally likely to become either 25 % or 75 % tomorrow; in the second, one expects the forecast to stay constant for the next several days. A time-sensitive decision-maker might select a course of action immediately in the latter scenario, but may postpone their decision in the former, knowing that new information is imminent. We model these trajectories by assuming predictions update according to a latent process of information flow, which is inferred from historical data. In contrast to general methods for time series analysis, this approach preserves important properties of probability paths such as the martingale structure and appropriate amount of volatility and better quantifies future uncertainties around probability paths. We show that GLIM outperforms three popular baseline methods, producing better estimated posterior probability path distributions measured by three different metrics. By elucidating the dynamic structure of predictions over time, we hope to help individuals make more informed choices.
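The two rain scenarios can be reproduced with a minimal Gaussian latent-information model that we assume for illustration (it is not the authors' GLIM): the event occurs if a latent total $Z_T \sim N(0, V)$ is positive, and the forecast at time $t$ is $P(Z_T > 0 \mid Z_t)$. How much variance is revealed by tomorrow controls whether tomorrow's forecast spreads out or stays near 50%, while its mean stays at 50% (the martingale property).

```python
import random
import statistics
from statistics import NormalDist

_PHI = NormalDist().cdf


def forecast(z, remaining_var):
    """P(Z_T > 0 | Z_t = z) when the not-yet-revealed information is N(0, remaining_var)."""
    if remaining_var <= 0:
        return 1.0 if z > 0 else 0.0
    return _PHI(z / remaining_var ** 0.5)


def tomorrows_forecasts(total_var, revealed_tomorrow, n=20000, seed=0):
    """Sample tomorrow's forecast, given that today's forecast is 50% (Z_0 = 0)."""
    rng = random.Random(seed)
    sd = revealed_tomorrow ** 0.5
    return [forecast(rng.gauss(0.0, sd), total_var - revealed_tomorrow) for _ in range(n)]


fast = tomorrows_forecasts(total_var=1.0, revealed_tomorrow=0.5)   # news is imminent
slow = tomorrows_forecasts(total_var=1.0, revealed_tomorrow=0.01)  # forecast barely moves
for name, path in (("fast", fast), ("slow", slow)):
    print(name, round(statistics.mean(path), 3), round(statistics.stdev(path), 3))
```

A decision-maker seeing the "fast" information schedule has reason to wait a day; in the "slow" one, waiting buys little.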
Submitted 4 November, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Time Warps, from Algebra to Algorithms
Authors:
Sam van Gool,
Adrien Guatto,
George Metcalfe,
Simon Santschi
Abstract:
Graded modalities have been proposed in recent work on programming languages as a general framework for refining type systems with intensional properties. In particular, continuous endomaps of the discrete time scale, or time warps, can be used to quantify the growth of information in the course of program execution. Time warps form a complete residuated lattice, with the residuals playing an important role in potential programming applications. In this paper, we study the algebraic structure of time warps, and prove that their equational theory is decidable, a necessary condition for their use in real-world compilers. We also describe how our universal-algebraic proof technique lends itself to a constraint-based implementation, establishing a new link between universal algebra and verification technology.
Submitted 19 August, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Room-temperature spin injection and spin-to-charge conversion in a ferromagnetic semiconductor / topological insulator heterostructure
Authors:
Shobhit Goel,
Nguyen Huynh Duy Khang,
Le Duc Anh,
Pham Nam Hai,
Masaaki Tanaka
Abstract:
Spin injection using ferromagnetic semiconductors at room temperature is a building block for the realization of spin-functional semiconductor devices. Nevertheless, this has been very challenging due to the lack of reliable room-temperature ferromagnetism in well-known group IV and III-V based semiconductors. Here, we demonstrate room-temperature spin injection by using spin pumping in a (Ga,Fe)Sb / BiSb heterostructure, where (Ga,Fe)Sb is a ferromagnetic semiconductor (FMS) with high Curie temperature (TC) and BiSb is a topological insulator (TI). Despite the very small magnetization of (Ga,Fe)Sb at room temperature (45 emu/cc), we are able to detect spin injection from (Ga,Fe)Sb by utilizing the inverse spin Hall effect (ISHE) in the topological surface states of BiSb with a large inverse spin Hall angle of 2.5. Our study provides the first demonstration of spin injection as well as spin-to-charge conversion at room temperature in an FMS/TI heterostructure.
Submitted 10 June, 2021;
originally announced June 2021.
-
Surveilling Surveillance: Estimating the Prevalence of Surveillance Cameras with Street View Data
Authors:
Hao Sheng,
Keniel Yao,
Sharad Goel
Abstract:
The use of video surveillance in public spaces -- both by government agencies and by private citizens -- has attracted considerable attention in recent years, particularly in light of rapid advances in face-recognition technology. But it has been difficult to systematically measure the prevalence and placement of cameras, hampering efforts to assess the implications of surveillance on privacy and public safety. Here, we combine computer vision, human verification, and statistical analysis to estimate the spatial distribution of surveillance cameras. Specifically, we build a camera detection model and apply it to 1.6 million street view images sampled from 10 large U.S. cities and 6 other major cities around the world, with positive model detections verified by human experts. After adjusting for the estimated recall of our model, and accounting for the spatial coverage of our sampled images, we are able to estimate the density of surveillance cameras visible from the road. Across the 16 cities we consider, the estimated number of surveillance cameras per linear kilometer ranges from 0.2 (in Los Angeles) to 0.9 (in Seoul). In a detailed analysis of the 10 U.S. cities, we find that cameras are concentrated in commercial, industrial, and mixed zones, and in neighborhoods with higher shares of non-white residents -- a pattern that persists even after adjusting for land use. These results help inform ongoing discussions on the use of surveillance technology, including its potential disparate impacts on communities of color.
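The density estimate combines two corrections: divide verified detections by the model's estimated recall to account for missed cameras, then divide by the length of road covered by the sampled images. The sketch below shows only that arithmetic, with hypothetical inputs; the paper's treatment of spatial coverage and sampling is more involved.

```python
def camera_density(verified_detections, recall, road_km_covered):
    """Recall-adjusted estimate of cameras per linear kilometre of road.

    Dividing by recall corrects for cameras the detector misses; dividing by the
    road length covered by the sampled images converts counts to a density."""
    if not 0 < recall <= 1:
        raise ValueError("recall must lie in (0, 1]")
    if road_km_covered <= 0:
        raise ValueError("road_km_covered must be positive")
    return verified_detections / recall / road_km_covered


# Hypothetical numbers, not taken from the paper.
print(camera_density(verified_detections=45, recall=0.6, road_km_covered=150))
```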
Submitted 30 August, 2021; v1 submitted 4 May, 2021;
originally announced May 2021.
-
Blocks as geographic discontinuities: The effect of polling place assignment on voting
Authors:
Sabina Tomkins,
Keniel Yao,
Johann Gaebler,
Tobias Konitzer,
David Rothschild,
Marc Meredith,
Sharad Goel
Abstract:
A potential voter must incur a number of costs in order to successfully cast an in-person ballot, including the costs associated with identifying and traveling to a polling place. In order to investigate how these costs affect voting behavior, we introduce two quasi-experimental designs that can be used to study how the political participation of registered voters is affected by differences in the relative distance that registrants must travel to their assigned Election Day polling place and whether their polling place remains at the same location as in a previous election. Our designs make comparisons of registrants who live on the same residential block, but are assigned to vote at different polling places. We find that living farther from a polling place and being assigned to a new polling place reduce in-person Election Day voting, but that registrants largely offset this by casting more early in-person and mail ballots.
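The core of the same-block design is a within-block comparison: only blocks containing registrants assigned to different polling places contribute, and each contributes the turnout gap between its farther-assigned and nearer-assigned registrants. The unweighted estimator and the records below are hypothetical simplifications; the paper's designs may weight blocks differently or add controls.

```python
from collections import defaultdict


def within_block_difference(records):
    """Average over blocks of turnout(farther-assigned) - turnout(nearer-assigned).

    records: iterable of (block_id, is_farther, voted) tuples, voted in {0, 1}.
    Blocks without both kinds of registrant are dropped."""
    by_block = defaultdict(lambda: {True: [], False: []})
    for block, farther, voted in records:
        by_block[block][farther].append(voted)
    diffs = []
    for groups in by_block.values():
        if groups[True] and groups[False]:  # block spans two polling-place assignments
            far = sum(groups[True]) / len(groups[True])
            near = sum(groups[False]) / len(groups[False])
            diffs.append(far - near)
    return sum(diffs) / len(diffs) if diffs else float("nan")


records = [
    ("A", True, 1), ("A", True, 0), ("A", False, 1), ("A", False, 1),
    ("B", True, 1), ("B", False, 1),
    ("C", False, 0),  # only one assignment on this block: excluded
]
print(within_block_difference(records))  # -0.25
```

Comparing neighbours on the same block is what lets the design attribute the gap to polling-place assignment rather than to neighbourhood differences.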
Submitted 4 May, 2021;
originally announced May 2021.
-
Generalized Euclidean distance matrices
Authors:
R. Balaji,
R. B. Bapat,
Shivani Goel
Abstract:
Euclidean distance matrices (EDMs) are symmetric nonnegative matrices with several interesting properties. In this article, we introduce a wider class of matrices called generalized Euclidean distance matrices (GDMs) that include EDMs. Each GDM is an entry-wise nonnegative matrix. A GDM is not symmetric unless it is an EDM. Using some new techniques, we show that many significant results on Euclidean distance matrices can be extended to generalized Euclidean distance matrices. These include results about eigenvalues, inverse, determinant, spectral radius, Moore-Penrose inverse and some majorization inequalities. We finally give an application by constructing infinitely divisible matrices using generalized Euclidean distance matrices.
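For reference, a classical EDM is $D_{ij} = \lVert p_i - p_j \rVert^2$ for points $p_1, \dots, p_n$. A key property is that $x^\top D x = -2 \lVert \sum_i x_i p_i \rVert^2 \le 0$ whenever $\sum_i x_i = 0$ (conditional negative semidefiniteness). The sketch below checks this on three points; it covers classical EDMs only, since the abstract does not define GDMs.

```python
def edm(points):
    """Classical EDM: D_ij = ||p_i - p_j||^2."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def quadratic_form(D, x):
    n = len(x)
    return sum(x[i] * D[i][j] * x[j] for i in range(n) for j in range(n))


# For any x with sum(x) == 0:  x^T D x = -2 * ||sum_i x_i p_i||^2 <= 0.
points = [(0, 0), (1, 0), (0, 1)]
D = edm(points)
x = [1, 1, -2]
combo = [sum(xi * p[k] for xi, p in zip(x, points)) for k in range(2)]
print(quadratic_form(D, x), -2 * sum(c * c for c in combo))  # -10 -10
```

The identity follows by expanding $\lVert p_i - p_j \rVert^2 = \lVert p_i \rVert^2 + \lVert p_j \rVert^2 - 2 p_i^\top p_j$: the first two terms vanish once $\sum_i x_i = 0$.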
Submitted 19 August, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
-
Blockchain Based Accounts Payable Platform for Goods Trade
Authors:
Krishnasuri Narayanam,
Seep Goel,
Abhishek Singh,
Yedendra Shrinivasan,
Parameswaram Selvam
Abstract:
Goods trade is a supply chain transaction that involves shippers buying goods from suppliers and carriers providing goods transportation. Shippers are issued invoices from suppliers and carriers. Shippers carry out goods receiving and invoice processing before payment processing of bills for suppliers and carriers, where invoice processing includes tasks like processing claims and adjusting the bill payments. Goods receiving involves verification of received goods by the Shipper's receiving team. Invoice processing is carried out by the Shipper's accounts payable team, which in turn is verified by the accounts receivable teams of suppliers and carriers. This paper presents a blockchain-based accounts payable system that generates claims for the deficiency in the goods received and accordingly adjusts the payment in the bills for suppliers and carriers. Primary motivations for these supply chain organizations to adopt blockchain-based accounts payable systems are to eliminate the process redundancies (accounts payable vs. accounts receivable), to reduce the number of disputes among the transacting participants, and to accelerate the accounts payable processes via optimizations in the claims generation and blockchain-based dispute reconciliation.
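The claim-generation step described above (compare received against invoiced quantities, raise a claim for each shortage, and adjust the bill) can be sketched independently of any ledger. Everything here is hypothetical: the `LineItem` fields, the use of integer cents, and the choice to value shortages at the unit price. The blockchain, smart-contract, and dispute-reconciliation layers are omitted.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    sku: str
    qty_invoiced: int
    qty_received: int       # as recorded by the shipper's receiving team
    unit_price_cents: int   # integer cents avoid floating-point rounding in money


def generate_claims(items):
    """One shortage claim (sku, missing_qty, amount_cents) per under-delivered line."""
    return [
        (it.sku, it.qty_invoiced - it.qty_received,
         (it.qty_invoiced - it.qty_received) * it.unit_price_cents)
        for it in items
        if it.qty_received < it.qty_invoiced
    ]


def adjusted_payment_cents(items):
    """Invoice total minus the value of all shortage claims."""
    invoiced = sum(it.qty_invoiced * it.unit_price_cents for it in items)
    return invoiced - sum(amount for _, _, amount in generate_claims(items))


invoice = [LineItem("widget", 10, 8, 500), LineItem("bolt", 100, 100, 25)]
print(generate_claims(invoice))          # [('widget', 2, 1000)]
print(adjusted_payment_cents(invoice))   # 6500
```

If shipper and supplier run the same deterministic rule on shared data, they compute the same claims, which is the kind of redundancy the abstract aims to remove.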
Submitted 4 March, 2021;
originally announced March 2021.
-
Acceleration via Fractal Learning Rate Schedules
Authors:
Naman Agarwal,
Surbhi Goel,
Cyril Zhang
Abstract:
In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune. We demonstrate the presence of these subtleties even in the innocuous case when the objective is a convex quadratic. We reinterpret an iterative algorithm from the numerical analysis literature as what we call the Chebyshev learning rate schedule for accelerating vanilla gradient descent, and show that the problem of mitigating instability leads to a fractal ordering of step sizes. We provide some experiments to challenge conventional beliefs about stable learning rates in deep learning: the fractal schedule enables training to converge with locally unstable updates which make negative progress on the objective.
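The Chebyshev schedule can be illustrated on a quadratic whose curvatures lie in $[m, M]$: take step sizes $\eta_i = 1/\lambda_i$, where $\lambda_i$ are the $T$ Chebyshev nodes mapped onto $[m, M]$. The error after $T$ steps is then a rescaled Chebyshev polynomial, bounded by $1/T_T\big((M+m)/(M-m)\big)$ across the whole interval, even though several individual steps exceed the stability limit $2/M$. This sketch uses the nodes in their natural order; the paper's fractal ordering matters for numerical stability but does not change the product in exact arithmetic, and it is not implemented here.

```python
import math


def chebyshev_step_sizes(m, M, T):
    """Step sizes 1/lambda_i, with lambda_i the T Chebyshev nodes mapped onto [m, M]."""
    return [
        1.0 / ((M + m) / 2 + (M - m) / 2 * math.cos((2 * i - 1) * math.pi / (2 * T)))
        for i in range(1, T + 1)
    ]


def gd_residual(lam, steps):
    """|x_T / x_0| after gradient descent on f(x) = lam * x**2 / 2 with the given steps."""
    r = 1.0
    for eta in steps:
        r *= 1 - eta * lam  # x <- x - eta * f'(x) = (1 - eta * lam) * x
    return abs(r)


m, M, T = 0.1, 1.0, 8
cheb = chebyshev_step_sizes(m, M, T)
const = [2 / (M + m)] * T          # best constant step for curvatures in [m, M]
grid = [m + (M - m) * k / 200 for k in range(201)]
print(max(cheb) > 2 / M)           # True: some steps are locally unstable
print(max(gd_residual(lam, cheb) for lam in grid), max(gd_residual(lam, const) for lam in grid))
```

The worst-case residual for the Chebyshev steps is roughly an order of magnitude below that of the best constant step over the same eight iterations.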
Submitted 11 June, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
One Shot Audio to Animated Video Generation
Authors:
Neeraj Kumar,
Srishti Goel,
Ankur Narang,
Brejesh Lall,
Mujtaba Hasan,
Pranshu Agarwal,
Dipankar Sarkar
Abstract:
We consider the challenging problem of audio to animated video generation. We propose a novel method OneShotAu2AV to generate an animated video of arbitrary length using an audio clip and a single unseen image of a person as an input. The proposed method consists of two stages. In the first stage, OneShotAu2AV generates the talking-head video in the human domain given an audio and a person's image. In the second stage, the talking-head video from the human domain is converted to the animated domain. The model architecture of the first stage consists of spatially adaptive normalization based multi-level generator and multiple multilevel discriminators along with multiple adversarial and non-adversarial losses. The second stage leverages attention based normalization driven GAN architecture along with temporal predictor based recycle loss and blink loss coupled with lipsync loss, for unsupervised generation of animated video. In our approach, the input audio clip is not restricted to any specific language, which gives the method multilingual applicability. OneShotAu2AV can generate animated videos that have: (a) lip movements that are in sync with the audio, (b) natural facial expressions such as blinks and eyebrow movements, (c) head movements. Experimental evaluation demonstrates superior performance of OneShotAu2AV as compared to U-GAT-IT and RecycleGan on multiple quantitative metrics, including KID (Kernel Inception Distance), word error rate, and blinks/sec.
Submitted 18 February, 2021;
originally announced February 2021.
-
SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning
Authors:
Vasanth Sarathy,
Daniel Kasenberg,
Shivam Goel,
Jivko Sinapov,
Matthias Scheutz
Abstract:
Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains. However, they are typically handcrafted and tend to require precise formulations that are not robust to human error. Reinforcement learning (RL) approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards. However, RL approaches tend to require millions of episodes of experience and often learn policies that are not easily transferable to other tasks. In this paper, we address one aspect of the open problem of integrating these approaches: how can decision-making agents resolve discrepancies in their symbolic planning models while attempting to accomplish goals? We propose an integrated framework named SPOTTER that uses RL to augment and support ("spot") a planning agent by discovering new operators needed by the agent to accomplish goals that are initially unreachable for the agent. SPOTTER outperforms pure-RL approaches while also discovering transferable symbolic knowledge and does not require supervision, successful plan traces or any a priori knowledge about the missing planning operator.
Submitted 23 December, 2020;
originally announced December 2020.
-
On distance matrices of helm graphs obtained from wheel graphs with an even number of vertices
Authors:
Shivani Goel
Abstract:
Let $n \geq 4$. The helm graph $H_n$ on $2n-1$ vertices is obtained from the wheel graph $W_n$ by adjoining a pendant edge to each vertex of the outer cycle of $W_n$. Suppose $n$ is even. Let $D := [d_{ij}]$ be the distance matrix of $H_n$. In this paper, we first show that $\det(D) = 3(n-1)2^{n-1}.$ Next, we find a matrix $L$ and a vector $u$ such that \[D^{-1} = -\frac{1}{2}L+\frac{4}{3(n-1)}uu'.\] We also prove an interlacing property between the eigenvalues of $L$ and $D$.
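The determinant formula can be checked numerically for small even $n$. The sketch assumes the convention implied by the vertex count: $W_n$ has a hub plus an outer cycle of $n-1$ vertices, so $H_n$ has $2n-1$ vertices. It builds $H_n$, computes all distances by breadth-first search, and evaluates $\det(D)$ exactly with fractions. The function names are our own.

```python
from collections import deque
from fractions import Fraction


def helm_graph(n):
    """H_n: hub 0, outer cycle 1..n-1 (together the wheel W_n), pendant n-1+i on cycle vertex i."""
    adj = {v: set() for v in range(2 * n - 1)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(1, n):
        link(0, i)                # spoke
        link(i, i % (n - 1) + 1)  # cycle edge to the next outer vertex
        link(i, n - 1 + i)        # pendant edge
    return adj


def distance_matrix(adj):
    """All-pairs shortest-path distances by breadth-first search from every vertex."""
    D = [[0] * len(adj) for _ in adj]
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for t, d in dist.items():
            D[s][t] = d
    return D


def det(A):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return sign * result


n = 4
print(det(distance_matrix(helm_graph(n))), 3 * (n - 1) * 2 ** (n - 1))  # 72 72
```

Exact rational arithmetic avoids the rounding error a floating-point determinant would introduce, so agreement with $3(n-1)2^{n-1}$ is an exact check for each tested $n$.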
Submitted 23 December, 2020;
originally announced December 2020.
-
Robust One Shot Audio to Video Generation
Authors:
Neeraj Kumar,
Srishti Goel,
Ankur Narang,
Mujtaba Hasan
Abstract:
Audio to Video generation is an interesting problem that has numerous applications across industry verticals including film making, multi-media, marketing, education and others. High-quality video generation with expressive facial movements is a challenging problem that involves complex learning steps for generative adversarial networks. Further, enabling one-shot learning for an unseen single image increases the complexity of the problem while simultaneously making it more applicable to practical scenarios. In this paper, we propose a novel approach OneShotA2V to synthesize a talking person video of arbitrary length using as input: an audio signal and a single unseen image of a person. OneShotA2V leverages curriculum learning to learn movements of expressive facial components and hence generates a high-quality talking-head video of the given person. Further, it feeds the features generated from the audio input directly into a generative adversarial network and adapts to any given unseen selfie by applying few-shot learning with only a few output update epochs. OneShotA2V leverages spatially adaptive normalization based multi-level generator and multiple multi-level discriminators based architecture. The input audio clip is not restricted to any specific language, which gives the method multilingual applicability. Experimental evaluation demonstrates superior performance of OneShotA2V as compared to Realistic Speech-Driven Facial Animation with GANs (RSDGAN) [43], Speech2Vid [8], and other approaches, on multiple quantitative metrics including: SSIM (structural similarity index), PSNR (peak signal to noise ratio) and CPBD (image sharpness). Further, qualitative evaluation and Online Turing tests demonstrate the efficacy of our approach.
Submitted 14 December, 2020;
originally announced December 2020.
-
Multi Modal Adaptive Normalization for Audio to Video Generation
Authors:
Neeraj Kumar,
Srishti Goel,
Ankur Narang,
Brejesh Lall
Abstract:
Speech-driven facial video generation is a complex problem because it spans two modalities: audio and video. The audio carries many underlying features, such as expression, pitch, loudness, and prosody (speaking style), while facial video varies widely in head movement, eye blinks, lip synchronization, and the movements of various facial action units, and must also remain temporally smooth. Synthesizing highly expressive facial videos from audio input and a static image is still a challenging task for generative adversarial networks. In this paper, we propose a multi-modal adaptive normalization (MAN) based architecture to synthesize a talking-person video of arbitrary length from two inputs: an audio signal and a single image of a person. The architecture uses multi-modal adaptive normalization, a keypoint heatmap predictor, an optical flow predictor, and class activation map [58] based layers to learn the movements of expressive facial components and hence generates a highly expressive talking-head video of the given person. The multi-modal adaptive normalization uses various audio and video features, such as the Mel spectrogram, pitch, and energy from the audio signal, together with the predicted keypoint heatmap/optical flow and a single image, to learn the respective affine parameters that generate highly expressive video. Experimental evaluation demonstrates superior performance of the proposed method compared to Realistic Speech-Driven Facial Animation with GANs (RSDGAN) [53], Speech2Vid [10], and other approaches on multiple quantitative metrics, including SSIM (structural similarity index), PSNR (peak signal-to-noise ratio), CPBD (image sharpness), WER (word error rate), blinks/sec, and LMD (landmark distance). Further, qualitative evaluation and online Turing tests demonstrate the efficacy of our approach.
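LMD (landmark distance) is commonly computed as the mean Euclidean distance between corresponding facial landmarks of generated and ground-truth frames, averaged over landmarks and frames. The sketch below assumes landmarks have already been extracted as (x, y) pairs per frame; the exact normalization used in the paper may differ, and `landmark_distance` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def landmark_distance(generated: List[List[Point]], ground_truth: List[List[Point]]) -> float:
    """Mean Euclidean distance between matching landmarks, averaged over all frames."""
    if len(generated) != len(ground_truth) or not generated:
        raise ValueError("need the same, non-zero number of frames")
    total, count = 0.0, 0
    for gen_frame, gt_frame in zip(generated, ground_truth):
        if len(gen_frame) != len(gt_frame):
            raise ValueError("frames must have the same number of landmarks")
        for (gx, gy), (tx, ty) in zip(gen_frame, gt_frame):
            total += math.hypot(gx - tx, gy - ty)
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


gen = [[(0.0, 0.0), (3.0, 4.0)]]
gt = [[(0.0, 0.0), (0.0, 0.0)]]
print(landmark_distance(gen, gt))  # 2.5
```

Lower LMD indicates that the generated mouth and face shapes track the ground truth more closely.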
Submitted 14 December, 2020;
originally announced December 2020.
-
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Authors:
Neeraj Kumar,
Srishti Goel,
Ankur Narang,
Brejesh Lall
Abstract:
Speaking style varies from person to person: every person exhibits their own style of speaking, determined by language, geography, culture, and other factors. Style is best captured by the prosody of a signal. High-quality multi-speaker speech synthesis that accounts for prosody and works in a few-shot manner is an area of active research with many real-world applications. While multiple efforts have been made in this direction, it remains an interesting and challenging problem. In this paper, we present a novel few-shot multi-speaker speech synthesis approach (FSM-SS) that leverages an adaptive normalization architecture with a non-autoregressive multi-head attention model. Given an input text and a reference speech sample of an unseen person, FSM-SS can generate speech in that person's style in a few-shot manner. Additionally, we demonstrate how the affine parameters of normalization help capture prosodic features such as energy and fundamental frequency in a disentangled fashion and can be used to generate morphed speech output. We demonstrate the efficacy of our proposed architecture on the multi-speaker VCTK and LibriTTS datasets, using multiple quantitative metrics that measure generated speech distortion and mean opinion score (MoS), along with speaker embedding analysis of the generated speech vs. the actual speech samples.
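In adaptive normalization, activations are normalized and then scaled and shifted by affine parameters (gamma, beta) that are predicted from a conditioning input rather than stored as fixed constants. The sketch below illustrates that general idea in a layer-norm style, assuming a hypothetical reference-speech style embedding and a hypothetical linear projection; it is not the FSM-SS architecture itself.

```python
import math
from typing import List, Sequence


def linear(weights: List[List[float]], bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Hypothetical learned projection y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def adaptive_norm(features: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
                  eps: float = 1e-5) -> List[float]:
    """Normalize to zero mean / unit variance, then scale by gamma and shift by beta."""
    n = len(features)
    mean = sum(features) / n
    var = sum((f - mean) ** 2 for f in features) / n
    normed = [(f - mean) / math.sqrt(var + eps) for f in features]
    return [g * z + b for g, z, b in zip(gamma, normed, beta)]


# The affine parameters come from a (made-up) reference-speech embedding
# instead of being fixed learned constants.
style = [1.0, 0.5]
gamma = linear([[1.0, 0.0]] * 3, [0.0] * 3, style)   # -> [1.0, 1.0, 1.0]
beta = linear([[0.0, 2.0]] * 3, [0.0] * 3, style)    # -> [1.0, 1.0, 1.0]
print(adaptive_norm([1.0, 2.0, 3.0], gamma, beta))
```

Because gamma and beta are the only path through which the style embedding reaches the features, editing or interpolating them is one plausible way to alter prosody-related attributes, in the spirit of the morphing experiments the abstract mentions.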
Submitted 13 December, 2020;
originally announced December 2020.
-
A Novel Tool for the Accurate and Affordable Early Diagnosis of Pancreatic Cancer via Machine Learning and Bioinformatics
Authors:
Siya Goel,
Clark Gedney,
Jean Honorio
Abstract:
Pancreatic cancer (PC) is the fourth leading cause of cancer death in the United States, with a five-year survival rate of 10%. Late diagnosis, associated with the asymptomatic nature of the early stages and the location of the cancer with respect to the pancreas, makes current widely accepted screening methods unavailable. Prior studies have achieved low (70-75%) diagnostic accuracy, possibly because 80% of PC cases are associated with diabetes, leading to misdiagnosis. To address the problems of frequent late diagnosis and misdiagnosis, we developed an accessible, accurate, and affordable diagnostic tool for PC by analyzing the expression of nineteen genes in PC and diabetes. First, machine learning algorithms were trained on four groups of subjects, defined by the presence or absence of PC and diabetes. The models were evaluated on 400 PC subjects at varying stages to ensure validity. Naive Bayes, Neural Network, and K-Nearest Neighbors models achieved the highest testing accuracy, around 92.6%. Second, the biological implications of the nineteen genes were investigated using bioinformatics tools. These genes were found to be significantly involved in regulating the cytoplasm, cytoskeleton, and nuclear receptor activity in the pancreas, specifically in acinar and ductal cells. Our novel tool is the first in the literature to achieve a PC diagnostic accuracy above 90%, and it has the potential to significantly improve the detection of PC in the presence of diabetes and increase the five-year survival rate.
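Naive Bayes was among the best-performing models. As a generic illustration of that classifier family, the sketch below implements Gaussian Naive Bayes over continuous gene-expression features, assuming features are conditionally independent and normally distributed within each class. The toy two-gene values and the class labels are hypothetical and are not the study's data or results.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[Sequence[float]], y: List[str]) -> Model:
    """Per class: log prior, per-feature means, per-feature variances."""
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model: Model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model: Model, x: Sequence[float]) -> str:
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label: str) -> float:
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Toy two-gene expression profiles (illustrative only).
X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
y = ["no_PC", "no_PC", "PC", "PC"]
model = fit_gaussian_nb(X, y)
print(predict(model, [5.1, 5.9]))  # PC
```

A four-group setup like the one described would simply use four labels instead of two; the fitting code is unchanged.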
Submitted 13 December, 2020;
originally announced December 2020.
-
Emotion Detection using Image Processing in Python
Authors:
Raghav Puri,
Archit Gupta,
Manas Sikri,
Mohit Tiwari,
Nitish Pathak,
Shivendra Goel
Abstract:
In this work, a user's emotion is detected from their facial expressions. These expressions can be taken from the live feed of the system's camera or from any pre-existing image stored in memory. Human emotions can be recognized automatically, and emotion recognition is a broad area of study in computer vision on which considerable research has already been done. The work has been implemented using Python 2.7, the Open Source Computer Vision Library (OpenCV), and NumPy. The scanned image (testing dataset) is compared with the training dataset to predict the emotion. The objective of this paper is to develop a system that can analyze an image and predict the expression of the person. The study shows that this procedure is workable and produces valid results.
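The abstract says the test image is compared with the training dataset to predict the emotion but does not say which comparison is used. One simple reading is nearest-neighbour matching on equally sized, flattened grayscale images; the pure-Python 3 sketch below encodes that assumption (in practice the pixels would come from OpenCV/NumPy), and the emotion labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_squared_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def predict_emotion(test_pixels: Sequence[float], training: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the training image closest to the test image (1-nearest neighbour)."""
    _, label = min(training, key=lambda item: mean_squared_difference(test_pixels, item[0]))
    return label


training = [([0, 0, 0, 0], "neutral"), ([255, 255, 255, 255], "happy")]  # toy 2x2 images
print(predict_emotion([250, 240, 255, 230], training))  # happy
```

Raw-pixel distance is sensitive to lighting and alignment, so real systems usually detect and crop the face (for example with OpenCV's face detectors) before any comparison.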
Submitted 1 December, 2020;
originally announced December 2020.
-
Tight Hardness Results for Training Depth-2 ReLU Networks
Authors:
Surbhi Goel,
Adam Klivans,
Pasin Manurangsi,
Daniel Reichman
Abstract:
We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of $k$ ReLUs minimizing the squared error (for $k>1$) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error $ε$. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest $κ$-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in $1/ε^2$. Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks in both the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on $ε$.
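To make the objects concrete: a depth-2 ReLU network here is f(x) = Σⱼ aⱼ · ReLU(wⱼ · x), where the output coefficients aⱼ may be negative, and training means minimizing the square loss over a data set. The sketch below only evaluates that objective (it does not train anything); the bias-free parameterization and function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def relu(z: float) -> float:
    return max(0.0, z)


def depth2_relu(x: Sequence[float], weights: List[Sequence[float]], coeffs: Sequence[float]) -> float:
    """f(x) = sum_j a_j * ReLU(w_j . x); coefficients a_j may be negative."""
    return sum(a * relu(sum(wi * xi for wi, xi in zip(w, x))) for w, a in zip(weights, coeffs))


def square_loss(data: List[Tuple[Sequence[float], float]],
                weights: List[Sequence[float]], coeffs: Sequence[float]) -> float:
    """Total squared error of the network on a training set of (x, y) pairs."""
    return sum((depth2_relu(x, weights, coeffs) - y) ** 2 for x, y in data)


# ReLU(x) - ReLU(-x) = x, so this 2-unit network realizes the identity on R;
# a labelled set generated by it is therefore in the realizable setting.
weights = [[1.0], [-1.0]]
coeffs = [1.0, -1.0]
data = [([2.0], 2.0), ([-3.0], -3.0)]
print(square_loss(data, weights, coeffs))  # 0.0
```

The hardness results concern finding the weights and coefficients that minimize this loss; evaluating it for given parameters, as here, is easy.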
Submitted 26 November, 2020;
originally announced November 2020.
-
A Search for Technosignatures Around 31 Sun-like Stars with the Green Bank Telescope at 1.15-1.73 GHz
Authors:
Jean-Luc Margot,
Pavlo Pinchuk,
Robert Geil,
Stephen Alexander,
Sparsh Arora,
Swagata Biswas,
Jose Cebreros,
Sanjana Prabhu Desai,
Benjamin Duclos,
Riley Dunne,
Kristy Kwan Lin,
Shashwat Goel,
Julia Gonzales,
Alexander Gonzalez,
Rishabh Jain,
Adrian Lam,
Briley Lewis,
Rebecca Lewis,
Grace Li,
Mason MacDougall,
Christopher Makarem,
Ivan Manan,
Eden Molina,
Caroline Nagib,
Kyle Neville
, et al. (15 additional authors not shown)
Abstract:
We conducted a search for technosignatures in April of 2018 and 2019 with the L-band receiver (1.15-1.73 GHz) of the 100 m diameter Green Bank Telescope. These observations focused on regions surrounding 31 Sun-like stars near the plane of the Galaxy. We present the results of our search for narrowband signals in this data set as well as improvements to our data processing pipeline. Specifically, we applied an improved candidate signal detection procedure that relies on the topographic prominence of the signal power, which nearly doubles the signal detection count of some previously analyzed data sets. We also improved the direction-of-origin filters that remove most radio frequency interference (RFI) to ensure that they uniquely link signals observed in separate scans. We performed a preliminary signal injection and recovery analysis to test the performance of our pipeline. We found that our pipeline recovers 93% of the injected signals over the usable frequency range of the receiver and 98% if we exclude regions with dense RFI. In this analysis, 99.73% of the recovered signals were correctly classified as technosignature candidates. Our improved data processing pipeline classified over 99.84% of the ~26 million signals detected in our data as RFI. Of the remaining candidates, 4539 were detected outside of known RFI frequency regions. The remaining candidates were visually inspected and verified to be of anthropogenic nature. Our search compares favorably to other recent searches in terms of end-to-end sensitivity, frequency drift rate coverage, and signal detection count per unit bandwidth per unit integration time.
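The improved detection step relies on the topographic prominence of the signal power. The sketch below computes prominence for a 1-D power array following the definition used by, e.g., SciPy's `scipy.signal.peak_prominences` (height of a peak above the higher of the two lowest points reached before meeting a higher sample or the edge), then keeps peaks above a threshold. The 1-D setting, the strict local-maximum test, and the threshold are assumptions, not the pipeline's actual parameters.

```python
from typing import List, Sequence, Tuple


def prominences(power: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (index, prominence) for each strict local maximum of a 1-D power array."""
    results = []
    n = len(power)
    for i in range(1, n - 1):
        peak = power[i]
        if not (power[i - 1] < peak > power[i + 1]):
            continue
        # Walk left until a strictly higher sample or the edge, tracking the lowest value.
        left_min, j = peak, i - 1
        while j >= 0 and power[j] <= peak:
            left_min = min(left_min, power[j])
            j -= 1
        # Same walk to the right.
        right_min, j = peak, i + 1
        while j < n and power[j] <= peak:
            right_min = min(right_min, power[j])
            j += 1
        # The higher of the two bases is the peak's lowest contour line.
        results.append((i, peak - max(left_min, right_min)))
    return results


def detect_candidates(power: Sequence[float], min_prominence: float) -> List[int]:
    """Indices of peaks that stand out by at least min_prominence."""
    return [i for i, p in prominences(power) if p >= min_prominence]


spectrum = [0, 3, 1, 5, 0]
print(prominences(spectrum))           # [(1, 2), (3, 5)]
print(detect_candidates(spectrum, 4))  # [3]
```

Unlike a fixed power threshold, prominence measures how far a peak rises above its local surroundings, so a narrowband signal sitting on a raised noise floor can still be picked up.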
Submitted 17 November, 2020; v1 submitted 10 November, 2020;
originally announced November 2020.
-
Joint Spatio-Textual Reasoning for Answering Tourism Questions
Authors:
Danish Contractor,
Shashank Goel,
Mausam,
Parag Singla
Abstract:
Our goal is to answer real-world tourism questions that seek Points-of-Interest (POI) recommendations. Such questions express various kinds of spatial and non-spatial constraints, necessitating a combination of textual and spatial reasoning. In response, we develop the first joint spatio-textual reasoning model, which combines geo-spatial knowledge with information in textual corpora to answer questions. We first develop a modular spatial-reasoning network that uses the geo-coordinates of location names mentioned in a question, and of candidate answer POIs, to reason over spatial constraints alone. We then combine our spatial reasoner with a textual reasoner in a joint model and present experiments on a real-world POI recommendation task. We report substantial improvements over existing models without joint spatio-textual reasoning.
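The paper's spatial reasoner is a learned network over geo-coordinates. As a simple, hand-written baseline for one kind of spatial constraint ("near X"), a great-circle distance filter can be written as below. The haversine formula, the 6371 km mean Earth radius, the radius threshold, and the function names are illustrative assumptions, not the authors' model.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def within_radius(anchor: LatLon, candidates: Dict[str, LatLon], radius_km: float) -> List[str]:
    """Keep candidate POIs that satisfy a 'near <location>' constraint."""
    return [name for name, coords in candidates.items()
            if haversine_km(anchor, coords) <= radius_km]


pois = {"A": (0.0, 0.01), "B": (0.0, 1.0)}  # hypothetical POIs
print(within_radius((0.0, 0.0), pois, radius_km=5.0))  # ['A']
```

A fixed radius is brittle ("near" means different things in a city centre and in the countryside), which is one motivation for learning spatial reasoning jointly with text instead of hard-coding it.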
Submitted 19 October, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Analysis of Emotional Content in Indian Political Speeches
Authors:
Sharu Goel,
Sandeep Kumar Pandey,
Hanumant Singh Shekhawat
Abstract:
Emotions play an essential role in public speaking. The emotional content of speech has the power to influence minds. Accordingly, we present an analysis of the emotional content of politicians' speeches in the Indian political scenario. We investigate the emotional content present in the speeches of politicians using an attention-based CNN+LSTM network. Experimental evaluations on a dataset of eight Indian politicians show how politicians incorporate emotions in their speeches to strike a chord with the masses. We also present an analysis of the vote share and victory margin each politician received and their relation to the emotional content of the politician's speeches.
Submitted 27 July, 2020;
originally announced July 2020.
-
From Boltzmann Machines to Neural Networks and Back Again
Authors:
Surbhi Goel,
Adam Klivans,
Frederic Koehler
Abstract:
Graphical models are powerful tools for modeling high-dimensional data, but learning graphical models in the presence of latent variables is well known to be difficult. In this work, we give new results for learning Restricted Boltzmann Machines (RBMs), probably the most well-studied class of latent variable models. Our results are based on new connections to learning two-layer neural networks under $\ell_{\infty}$-bounded inputs; for both problems, we give nearly optimal results under the conjectured hardness of sparse parity with noise. Using the connection between RBMs and feedforward networks, we also initiate the theoretical study of supervised RBMs [Hinton, 2012], a version of neural-network learning that couples distributional assumptions induced from the underlying graphical model with the architecture of the unknown function class. We then give an algorithm for learning a natural class of supervised RBMs with better runtime than is possible for the related class of networks without distributional assumptions.
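One textbook way to see why RBMs and two-layer networks are related: with binary hidden units, summing out h gives a free energy whose hidden part is a sum of softplus units applied to linear functions of v, i.e., a one-hidden-layer network in v. The sketch below assumes the standard energy E(v, h) = -b·v - c·h - vᵀWh with W indexed [visible][hidden]; it illustrates that identity, not the paper's learning algorithm.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hidden_preactivations(v: Sequence[float], W: List[List[float]], c: Sequence[float]) -> List[float]:
    # c_j + sum_i v_i * W[i][j]
    return [c_j + sum(v_i * W[i][j] for i, v_i in enumerate(v)) for j, c_j in enumerate(c)]


def hidden_probs(v: Sequence[float], W: List[List[float]], c: Sequence[float]) -> List[float]:
    # p(h_j = 1 | v) for binary hidden units.
    return [sigmoid(z) for z in hidden_preactivations(v, W, c)]


def free_energy(v: Sequence[float], W: List[List[float]], b: Sequence[float], c: Sequence[float]) -> float:
    # F(v) = -b.v - sum_j softplus(c_j + (W^T v)_j); p(v) is proportional to exp(-F(v)).
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    return -visible_term - sum(softplus(z) for z in hidden_preactivations(v, W, c))


W = [[0.0] * 3 for _ in range(2)]  # 2 visible units, 3 hidden units, all weights zero
print(hidden_probs([1.0, 0.0], W, [0.0] * 3))  # [0.5, 0.5, 0.5]
```

With all parameters at zero, each hidden unit is a fair coin and F(v) = -3 log 2 for every v, so the model is uniform over visible configurations.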
Submitted 24 July, 2020;
originally announced July 2020.
-
Shape and Viewpoint without Keypoints
Authors:
Shubham Goel,
Angjoo Kanazawa,
Jitendra Malik
Abstract:
We present a learning framework that learns to recover the 3D shape, pose, and texture from a single image, trained on an image collection without any ground-truth 3D shape, multi-view, camera viewpoint, or keypoint supervision. We approach this highly under-constrained problem in an "analysis by synthesis" framework, where the goal is to predict the likely shape, texture, and camera viewpoint that could produce the image, using various learned category-specific priors. Our particular contribution in this paper is a representation of the distribution over cameras, which we call "camera-multiplex". Instead of picking a point estimate, we maintain a set of camera hypotheses that are optimized during training to best explain the image given the current shape and texture. We call our approach Unsupervised Category-Specific Mesh Reconstruction (U-CMR), and present qualitative and quantitative results on CUB, Pascal 3D, and new web-scraped datasets. We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects using an image collection without any keypoint annotations or 3D ground truth. Project page: https://shubham-goel.github.io/ucmr
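A camera-multiplex replaces a single camera estimate with a set of hypotheses. The sketch below shows one way such a set can be turned into a distribution, assuming each hypothesis is scored by a per-camera reconstruction loss and the scores are converted to probabilities with a softmax over negative losses. The softmax weighting, the `temperature` parameter, and the loss values are assumptions, not details stated in the abstract.

```python
import math
from typing import List


def camera_probabilities(losses: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over negative per-hypothesis losses: lower loss gets higher probability."""
    scaled = [-loss / temperature for loss in losses]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multiplex_loss(losses: List[float], temperature: float = 1.0) -> float:
    """Probability-weighted loss over all camera hypotheses."""
    probs = camera_probabilities(losses, temperature)
    return sum(p * loss for p, loss in zip(probs, losses))


losses = [0.8, 0.2, 1.5, 0.9]  # hypothetical per-camera reconstruction losses
probs = camera_probabilities(losses, temperature=0.5)
best = max(range(len(losses)), key=probs.__getitem__)
print(best)  # 1
```

Keeping every hypothesis alive, rather than committing to the current best one, lets a camera that is poor early in training recover once shape and texture improve.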
Submitted 21 July, 2020;
originally announced July 2020.
-
Project selection with partially verifiable information
Authors:
Sumit Goel,
Wade Hann-Caruthers
Abstract:
We consider a principal-agent project selection problem with asymmetric information. There are $N$ projects, and the principal must select exactly one of them. Each project provides some profit to the principal and some payoff to the agent, and these profits and payoffs are the agent's private information. We consider the principal's problem of finding an optimal mechanism for two different objectives: maximizing expected profit and maximizing the probability of choosing the most profitable project. Importantly, we assume partial verifiability, so that the agent cannot report a project to be more profitable to the principal than it actually is. Under this no-overselling constraint, we characterize the set of implementable mechanisms. Using this characterization, we find that in the case of two projects, the optimal mechanism under both objectives takes the form of a simple cutoff mechanism. The simple structure of the optimal mechanism also allows us to find evidence in support of the well-known ally principle, which says that a principal delegates more authority to an agent who shares their preferences.
Submitted 25 February, 2022; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Optimality of the coordinate-wise median mechanism for strategyproof facility location in two dimensions
Authors:
Sumit Goel,
Wade Hann-Caruthers
Abstract:
We consider the facility location problem in two dimensions. In particular, we consider a setting where agents have Euclidean preferences, defined by their ideal points, for a facility to be located in $\mathbb{R}^2$. We show that for the $p$-norm ($p \geq 1$) objective, the coordinate-wise median mechanism (CM) has the lowest worst-case approximation ratio in the class of deterministic, anonymous, and strategyproof mechanisms. For the minisum objective and an odd number of agents $n$, we show that CM has a worst-case approximation ratio (AR) of $\sqrt{2}\frac{\sqrt{n^2+1}}{n+1}$. For the $p$-norm social cost objective ($p\geq 2$), we find that the AR for CM is bounded above by $2^{\frac{3}{2}-\frac{2}{p}}$. We conjecture that the AR of CM actually equals the lower bound $2^{1-\frac{1}{p}}$ (as is the case for $p=2$ and $p=\infty$) for any $p\geq 2$.
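The mechanism and the closed-form bounds above are easy to evaluate directly. The sketch below implements the coordinate-wise median for an odd number of agents, the minisum (sum of Euclidean distances) cost, the stated minisum worst-case ratio, and the stated $p$-norm bounds; function names are illustrative, and the bounds are transcribed from the abstract rather than derived here.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def coordinate_wise_median(points: List[Point]) -> Point:
    """Place the facility at (median of x-coordinates, median of y-coordinates)."""
    if len(points) % 2 == 0:
        raise ValueError("this sketch assumes an odd number of agents")
    xs, ys = zip(*points)
    return (median(xs), median(ys))


def minisum_cost(facility: Point, points: List[Point]) -> float:
    """Sum of Euclidean distances from each agent's ideal point to the facility."""
    return sum(math.dist(facility, p) for p in points)


def cm_minisum_worst_case_ratio(n: int) -> float:
    """sqrt(2) * sqrt(n^2 + 1) / (n + 1), stated for an odd number of agents n."""
    if n % 2 == 0:
        raise ValueError("the stated formula is for odd n")
    return math.sqrt(2) * math.sqrt(n ** 2 + 1) / (n + 1)


def pnorm_bounds(p: float) -> Tuple[float, float]:
    """(lower, upper) = (2^(1 - 1/p), 2^(3/2 - 2/p)) for the p-norm objective, p >= 2."""
    return 2 ** (1 - 1 / p), 2 ** (1.5 - 2 / p)


agents = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
facility = coordinate_wise_median(agents)
print(facility)                                    # (1.0, 1.0)
print(round(cm_minisum_worst_case_ratio(3), 4))    # 1.118
```

As $n$ grows, the minisum ratio approaches $\sqrt{2}$, and at $p = 2$ the two $p$-norm bounds coincide at $\sqrt{2}$, consistent with the abstract's remark that the conjecture holds for $p=2$.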
Submitted 11 July, 2022; v1 submitted 2 July, 2020;
originally announced July 2020.